I’m trying to run IsoformSwitchAnalyzeR; my history is here: Galaxy | Europe
I seem to be having a problem with the reference transcriptome not matching with my quantified data.
My error message:
“Step 1 of 7: Checking data…
Step 2 of 7: Obtaining annotation…
importing GTF (this may take a while)…
Error in importRdata(isoformCountMatrix = quantificationData$counts, isoformRepExpression = quantificationData$abundance, :
The annotation and quantification (count/abundance matrix and isoform annotation) seems to be different (Jaccard similarity < 0.925).
Either isforoms found in the annotation are not quantifed or vise versa.
Specifically:
63009 isoforms were quantified.
60127 isoforms are annotated.
Only 60049 overlap.
2960 isoforms quantifed had no corresponding annoation”
I’ve double-checked my filtering steps and they seem fine. I’m not sure whether I’ve missed something or whether the WormBase gene IDs are messing something up.
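One way to see which IDs are behind those numbers is to compare the two ID sets directly. The following is only a minimal Python sketch, assuming the isoform IDs have already been pulled out of the quantification table and the GTF into two plain lists; the function name and the toy IDs are made up for illustration and are not part of IsoformSwitchAnalyzeR.

```python
def compare_isoform_ids(quantified_ids, annotated_ids):
    """Summarise how two collections of isoform IDs overlap."""
    quantified = set(quantified_ids)
    annotated = set(annotated_ids)
    return {
        "quantified": len(quantified),
        "annotated": len(annotated),
        "overlap": len(quantified & annotated),
        # IDs present on one side only, sorted for stable output
        "quantified_only": sorted(quantified - annotated),
        "annotated_only": sorted(annotated - quantified),
    }


# Toy IDs for illustration only (hypothetical, not from the history above)
report = compare_isoform_ids(
    ["tx1", "tx2", "tx3", "novel_1"],
    ["tx1", "tx2", "tx3", "tx4"],
)
print(report["quantified_only"])
```

In the error above, 63009 quantified minus 60049 overlapping gives the 2960 quantified isoforms with no annotation; listing the `quantified_only` entries would show which ones they are.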
Thank you. I did indeed skip the latter manipulation, because it doesn’t seem to cut the transcripts fully contained in a reference intron out of the reference - just identifies them. Am I misunderstanding the tutorial?
Yes, as far as I understand, the problem that IsoformSwitchAnalyzeR is having is that there are unannotated isoforms in one of its inputs and not in the other. I noticed in the tutorial that “Use reference transcripts only?” is toggled to No at the beginning of the StringTie assembly, but later toggled to Yes. I’m attempting to re-run the workflow with “Use reference transcripts only?” always set to Yes.
Those settings were specified on purpose since they fit the data in the tutorial example. Setting both to Yes is what I would suggest as well. I had forgotten about that bit (it has come up before) but definitely recognize what is going on now, so thanks for clarifying. Hope that works out!
Well, that does seem to resolve that error, thank you! However, I now get the following error, and I’m not sure how to interpret it or what to do about it. Any suggestions would be gratefully appreciated.
"Step 1 of 3: Identifying which algorithm was used…
The quantification algorithm used was: StringTie
Found 6 quantification file(s) of interest
Step 2 of 3: Reading data…
reading in files with read_tsv
1 Error in tximport::tximport(files = localFiles, type = tolower(dataAnalyed$orign), :
all(c(abundanceCol, countsCol, lengthCol) %in% names(raw)) is not TRUE
Calls: importIsoformExpression → → stopifnot
Warning message:
One or more parsing issues, call problems() on your data frame for details,
e.g.:
dat <- vroom(…)
problems(dat) "
Hmm… this is about importing a file: the data frame (“table”) doesn’t match what the tool is expecting. In other words, the data needs to be R-friendly. We can catch some of that for free-text custom values entered on the tool form, but not all cases. So, simpler is better.
What I usually check first:
No empty files or header only files (it happens…)
Do any files have a header? Try removing those, leaving just data lines.
Are all custom values entered on the tool form (labels) formatted OK? Alphanumeric characters and underscores, not starting with a number, tend to work best. Also, keep them short-ish to avoid another gotcha.
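The file checks in this list can be scripted before re-running the job. The `stopifnot` line in the log suggests the importer looks up the abundance, count and length columns by name in each file, so the sketch below reports empty files, header-only files, and missing column names. This is only a sketch: the function name is hypothetical, and the column names `FPKM`, `cov` and `length` are an assumption about what StringTie tables contain, so adjust them to whatever your tool version expects.

```python
import csv
import io

# ASSUMPTION: column names expected in a StringTie quantification table.
# Change these if your importer looks for different names.
REQUIRED_COLUMNS = ("FPKM", "cov", "length")


def check_quant_table(text, required=REQUIRED_COLUMNS):
    """Return a list of problems found in one tab-separated table."""
    # Parse the text as TSV and ignore completely blank lines
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    if not rows:
        return ["file is empty"]
    problems = []
    header = rows[0]
    missing = [c for c in required if c not in header]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    if len(rows) == 1:
        problems.append("header only, no data lines")
    return problems


# Toy table for illustration only
example = "t_id\tt_name\tlength\tcov\tFPKM\n1\ttx1\t100\t2.5\t3.1\n"
print(check_quant_table(example))
```

For files downloaded from a history, the same function could be called on the contents of each file, for example `check_quant_table(open(path).read())`.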
If that doesn’t work, please post back a link to the Dataset Details view (“i” in a circle icon in an expanded dataset). That has all the technical details, plus the inputs, parameters, and full job logs. The data will only have a peek view, which is sometimes enough.
Note: Including dots (.) in the element identifiers of collections can cause different problems, especially when using a workflow. When creating a collection, always use the default option to remove file extensions. From here, you could recreate the collection or use the Collection Manipulation tools to adjust them.
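The naming advice above (letters, digits and underscores only, no leading number, no dots, not too long) can be checked with a few lines of Python before building a collection. This is a rough sketch, not something Galaxy itself runs: the function name is hypothetical and the 30-character limit is an arbitrary stand-in for “short-ish”.

```python
import re

# ASSUMPTION: arbitrary cut-off standing in for "short-ish"
MAX_LENGTH = 30


def label_problems(label, max_length=MAX_LENGTH):
    """Return a list of reasons why a label or element identifier may cause trouble."""
    if not label:
        return ["empty label"]
    problems = []
    if label[0].isdigit():
        problems.append("starts with a number")
    # Anything other than letters, digits and underscores (this includes dots)
    bad = sorted(set(re.findall(r"[^A-Za-z0-9_]", label)))
    if bad:
        problems.append("unsupported characters: " + " ".join(bad))
    if len(label) > max_length:
        problems.append("longer than %d characters" % max_length)
    return problems


# Toy identifiers for illustration only
for name in ["control_rep1", "1_control", "sample.fastq"]:
    print(name, label_problems(name))
```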
I’m trying to re-run your analysis from the beginning; I’m not sure if the problem could be a result of a deficient annotation, because IsoformSwitchAnalyzeR seems to be very stringent about it. Could you re-upload the Caenorhabditis paired-end datasets and share the fresh history with me?
Certainly, here is a history with just the paired-end datasets and the WBcel235 assembly files. I had wondered whether the problem was that I’m not using Ensembl IDs.