Looks great! I’m glad the server could chomp through those RNA-Star jobs! I’ll keep checking back on yours and mine. Then we can follow up if any errors come up (after one rerun try for stray cluster issues during the event). Thanks!!
Hey Jennifer,
So I noticed the RNAquast tool still does not work as the job keeps failing, but everything else seems to be running smoothly.
Hi @SehajR
The job and the rest of the history look really great! I’m really glad to see the IsoformSwitcher results as successful!
The rnaQuast tool is exceeding the runtime limit of the UseGalaxy.org server. That runtime limit cannot be extended (for any tool) and is not related to the job memory (which we can sometimes adjust). FAQ: Understanding walltime error messages
Instead, you can try running that particular tool at a server with a longer runtime allocation. UseGalaxy.eu is probably the best choice, although UseGalaxy.org.au might work too.
You don’t need to start over – just copy the inputs for that tool into a new history (using the gear icon above the list of datasets). Then transfer that smaller history over to the EU server and try running the tool there.
I started a test at EU to see what happens. We can follow up about it more. I’m curious if the full sized human run is even possible at all on the public servers.
This is certainly a stress test with rnaQuast!
Notes:
I happened to notice an example that helps with understanding the complexity of this type of analysis. There seemed (just by eye, I don’t “recommend” this method!) to be low complexity regions in the predicted transcripts, and sometimes the entire transcript looked low complexity!
I took the first transcript from the final sample in collection 2104, sample Wt26_Rep3.fastq, and ran a BLAT against the human genome at UCSC. The hit was for a simple repeat, over 200 bases long, found in multiple places of the genome. So I clicked through to the browser to examine the first hit in the listing. Can you notice why this is interesting but also maybe difficult for tools to process, even on really massive clusters? Toggle on the GRC Incident track for some interesting details. Clearly conserved, too. The nature of sequences like these is exactly why full scale transcriptomics on humans is so complex: it isn’t automatic, and can require a bit of curation, especially when casting a wider net for discovery purposes. Processing everything all at once is challenging, anywhere, and can be limited by the tools/methods themselves. This is part of why scientists tend to focus on smaller genomic regions (horizontal slices of the genome) and/or feature types/clusters (vertical slices of content meaning).
>STRG.1.1
TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC
CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCTAACCCTAACCCTAACCCTAA
CCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCTAACCCTAACCCTAACCCTAACCCTAACCCT
AACCC
How to solve this is complex. Maybe the RNA-Star alignments could have some parameter tuning to avoid this. Maybe low complexity regions can be filtered at later steps. Maybe some data is processed the way you have done already, and some is put off into a different slice for custom manipulations. But let’s see what happens with my test job – I’d like to see if rnaQuast can report about situations like this in real data, and how well.
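To make the "filter at later steps" idea a bit more concrete, here is a minimal Python sketch that flags transcripts dominated by a short tandem repeat, like STRG.1.1 above. This is only an illustration, not something rnaQuast or any Galaxy tool does: the function names, the 0.8 threshold, and the 10-base maximum period are all assumptions I picked for the example, and a dedicated repeat masker would be more robust on real data.

```python
# Illustrative sketch only: names, threshold, and max_period are assumptions,
# not settings taken from any Galaxy tool.

# The transcript posted above, copied as-is.
EXAMPLE_FASTA = """>STRG.1.1
TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC
CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAACCCTAACCCTAACCCTAACCCTAACCCTAA
CCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCTAACCCTAACCCTAACCCTAACCCTAACCCT
AACCC
"""


def parse_fasta(text):
    """Parse FASTA text into a dict of {record name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def best_repeat_period(seq, max_period=10):
    """Return (period, score) for the shift that best matches the sequence to itself.

    score is the fraction of positions i where seq[i] == seq[i + period].
    A perfect tandem repeat of unit length k scores 1.0 at period k.
    """
    seq = seq.upper()
    best = (0, 0.0)
    for period in range(1, max_period + 1):
        compared = len(seq) - period
        if compared <= 0:
            break
        matches = sum(1 for i in range(compared) if seq[i] == seq[i + period])
        score = matches / compared
        if score > best[1]:
            best = (period, score)
    return best


def is_low_complexity(seq, threshold=0.8, max_period=10):
    """Flag a sequence whose best self-match score reaches the threshold."""
    _, score = best_repeat_period(seq, max_period)
    return score >= threshold


if __name__ == "__main__":
    for name, seq in parse_fasta(EXAMPLE_FASTA).items():
        period, score = best_repeat_period(seq)
        flag = "low complexity" if is_low_complexity(seq) else "ok"
        print(f"{name}\tlength={len(seq)}\tperiod={period}\tscore={score:.2f}\t{flag}")
```

A check like this tolerates the small insertions and deletions visible in the sequence above, since each one only disturbs the few positions that straddle it. Flagged transcripts could then be set aside into their own "slice" for closer inspection rather than discarded outright.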
Hey Jennifer!
Thank you so much for following up, I really appreciate the troubleshooting on your end! I’ll monitor the rerun on the EU server from the history link you shared, it seems to be running still.
As for the BLAT results, it is indeed really interesting! A lot of these concepts are new to me as I do not come from a bioinformatics background, but definitely insightful, I really wasn’t aware of the level of complexity we’re working with…
Sehaj