"error in importRdata" using isoformswitchanalyzR

Margaret_M · July 4, 2023, 11:47am

Hello,

I’m trying to run IsoformSwitchAnalyzeR; my history is here: Galaxy | Europe

I seem to be having a problem with the reference transcriptome not matching with my quantified data.

My error message:

“Step 1 of 7: Checking data…
Step 2 of 7: Obtaining annotation…
importing GTF (this may take a while)…
Error in importRdata(isoformCountMatrix = quantificationData$counts, isoformRepExpression = quantificationData$abundance, :
The annotation and quantification (count/abundance matrix and isoform annotation) seems to be different (Jaccard similarity < 0.925).
Either isforoms found in the annotation are not quantifed or vise versa.
Specifically:
63009 isoforms were quantified.
60127 isoforms are annotated.
Only 60049 overlap.
2960 isoforms quantifed had no corresponding annoation”
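A minimal sketch, in Python, of one way to list the isoform IDs behind a mismatch like the one reported above. It assumes the annotation is a standard GTF whose ninth column carries `transcript_id "..."` attributes and that the quantified IDs are already available as a list. The function names are hypothetical, and the Jaccard value uses the standard set definition, which is not necessarily how the tool computes it.

```python
import re

def gtf_transcript_ids(gtf_lines):
    """Collect transcript_id values from the attribute column of GTF lines."""
    ids = set()
    for line in gtf_lines:
        # Skip comment/header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:
            ids.add(match.group(1))
    return ids

def compare_ids(quantified, annotated):
    """Summarise the overlap between quantified and annotated isoform IDs."""
    quantified, annotated = set(quantified), set(annotated)
    shared = quantified & annotated
    union = quantified | annotated
    return {
        "shared": len(shared),
        "only_quantified": sorted(quantified - annotated),
        "only_annotated": sorted(annotated - quantified),
        # Standard Jaccard index: |intersection| / |union|.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }
```

The `only_quantified` list holds the IDs that were quantified but have no annotation, which is the group the error message counts in its last line.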

I’ve double-checked my filtering steps and they seem fine. I’m not sure whether I’ve missed something or whether the WormBase gene IDs are messing something up.

Thanks for any help

Tutorial:

jennaj · July 6, 2023, 11:04pm

Hi @Margaret_M

StringTie creates placeholder transcript and gene IDs for novel data. Novel means there is no known annotation for it.

Using the setting Use Reference transcripts only? = YES will restrict the data to known annotation, so that is one choice.

Alternatively, an annotation GTF and transcript FASTA can be created that include the novel predictions. That is what happens in this section of the tutorial (near the end, please scroll down): https://training.galaxyproject.org/topics/transcriptomics/tutorials/differential-isoform-expression/tutorial.html#transcriptome-assembly-quantification-and-evaluation

I don’t see those manipulations in your history – or did I miss it?

Margaret_M · July 21, 2023, 4:28pm

Thank you. I did indeed skip the latter manipulation, because it doesn’t seem to cut the transcripts fully contained in a reference intron out of the reference - just identifies them. Am I misunderstanding the tutorial?

jennaj · July 21, 2023, 5:04pm

Yes, this is what the analysis is doing. Identifying novels.

I’m not sure what this means. Could you explain more?

Margaret_M · July 21, 2023, 5:21pm

Yes. As far as I understand, the problem IsoformSwitchAnalyzeR is having is that one of its inputs contains unannotated isoforms and the other does not. I noticed that in the tutorial “Use Reference transcripts only?” is set to No at the beginning of the StringTie assembly but later set to Yes. I’m now re-running the workflow with “Use Reference transcripts only?” always set to Yes.

jennaj · July 21, 2023, 5:28pm

Those settings were specified on purpose since they fit the data in the tutorial example. Setting both to Yes is what I would suggest as well. I had forgotten about that bit (it has come up before) but definitely recognize what is going on now, so thanks for clarifying. Hope that works out!

Margaret_M · July 21, 2023, 7:28pm

Well, that does seem to resolve that error, thank you! However, I now get the following error, and I’m not sure how to interpret it or what to do about it. Any suggestions would be gratefully appreciated.

My history: Galaxy | Europe

"Step 1 of 3: Identifying which algorithm was used…
The quantification algorithm used was: StringTie
Found 6 quantification file(s) of interest
Step 2 of 3: Reading data…
reading in files with read_tsv
1 Error in tximport::tximport(files = localFiles, type = tolower(dataAnalyed$orign), :
all(c(abundanceCol, countsCol, lengthCol) %in% names(raw)) is not TRUE
Calls: importIsoformExpression -> <Anonymous> -> stopifnot
Warning message:
One or more parsing issues, call problems() on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat) "

jennaj · July 21, 2023, 10:23pm

Hum… this is about importing a file and the data frame (“table”) doesn’t match what the tool is expecting. Meaning, the data needs to be R friendly. We can catch some of that for free-text custom values entered on the tool form but not all cases. So, simple is better.
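As an illustration of the mismatch described here, the failing check in the log (`all(c(abundanceCol, countsCol, lengthCol) %in% names(raw))`) amounts to asking whether the table's column names include the ones the importer needs. The Python sketch below mimics that check on a single header line. The default column names are an assumption about what is expected for StringTie tables and should be confirmed against the tximport documentation; `missing_columns` is a hypothetical helper.

```python
def missing_columns(header_line, required=("t_name", "length", "cov", "FPKM")):
    """Return the required column names absent from a tab-separated header line.

    The default names are an assumption about StringTie transcript tables,
    not something confirmed in this thread.
    """
    present = header_line.rstrip("\r\n").split("\t")
    return [name for name in required if name not in present]
```

An empty result means every required name was found. A non-empty result names the columns whose absence would trip a check of this kind.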

What I usually check first:

No empty files or header-only files (it happens…)
Do any files have a header? Try removing those, leaving just data lines.
Are all custom values entered on the tool form (labels) formatted OK? Alphanumeric characters and underscores, not starting with a number, tend to work best. Also, keep them short-ish to avoid another gotcha.
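The checks above can be sketched in Python as follows. This is only an illustration under stated assumptions: header lines are taken to be lines starting with `#`, the "safe label" rule is the one described above (letters, digits, underscores, not starting with a digit), and the length limit of 20 is an arbitrary placeholder rather than a documented Galaxy limit. Both function names are hypothetical.

```python
import re

# Letters, digits and underscores only, and not starting with a digit.
LABEL_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def label_is_safe(label, max_length=20):
    """Check a custom label against the conservative naming rule above."""
    return len(label) <= max_length and LABEL_PATTERN.fullmatch(label) is not None

def classify_table(text):
    """Return 'empty', 'header_only', or 'has_data' for a file's text content.

    Assumption: header lines are those that start with '#'.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return "empty"
    if all(line.startswith("#") for line in lines):
        return "header_only"
    return "has_data"
```

Running `classify_table` over the content of each input, and `label_is_safe` over each condition or sample label, flags the cases in the list before the tool is re-run.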

If that doesn’t work, please post back a link to the Dataset Details view (“i” in a circle icon in an expanded dataset). That has all the technical details, plus the inputs, parameters, and full job logs. The data itself will only have a peek view, which is sometimes enough.

Margaret_M · August 19, 2023, 3:52am

I’ve checked and removed the one header I could find. I’ve made sure that the files are not empty and the values are all formatted fine.

Link to the details here.

Thanks for any suggestions

jennaj · August 21, 2023, 4:59pm

Hi @Margaret_M

This GTF still appears to have headers. Please try correcting this data and any others like it. You can run the manipulation on the collection.

tool: Select
option: Not Matching
regular expression: ^#
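For reference, the effect of those Select settings can be sketched outside Galaxy in a few lines of Python: keep every line that does not match `^#`, that is, every line that does not start with `#`. The function name is hypothetical.

```python
def drop_comment_lines(lines):
    """Keep only lines that do not start with '#' (same effect as 'Not Matching ^#')."""
    return [line for line in lines if not line.startswith("#")]
```

Applied to a GTF, this removes header lines such as those beginning with `#!` and leaves the feature lines untouched.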

Note: Including dots . in the element identifiers of collections can cause different problems, especially when using a workflow. When creating a collection, always use the default option to remove file extensions. From here, you could recreate the collection or use the Collection Manipulation tools to adjust them.

gallardoalba · August 22, 2023, 12:57pm

Hi @Margaret_M,

I’m trying to re-run your analysis from the beginning; I’m not sure if the problem could be a result of a deficient annotation, because IsoformSwitchAnalyzeR seems to be very stringent about it. Could you re-upload the Caenorhabditis paired-end datasets and share the fresh history with me?

Regards

Margaret_M · August 28, 2023, 6:22pm

Certainly, here is a history with just the paired-end datasets and the WBcel235 assembly files. I had wondered whether the problem was caused by not using Ensembl IDs.

gallardoalba · August 29, 2023, 12:31pm

Hi @Margaret_M,
Did you remove the hidden files?

Margaret_M · September 5, 2023, 4:56pm

I’m very sorry, which files would you like to see?

gallardoalba · September 12, 2023, 5:22pm

The original FASTQ files.
