convert to CRAM (for some reason they say they aren’t cram)
samtools sort
convert to BAM
samtools fasta
trimmomatic
bowtie2
htseq-count
and then with other data I am
uploading fastq files
uncompressing
trimmomatic
bowtie
htseq-count
and then combining all this data into deseq2.
Everything was going smoothly until I started to use bowtie. It kept coming up with ‘Uncaught exception in exposed API method’. I tried other tools and they came up with the same error. All my jobs are complete up to the trimmomatic step on the fastq files, but it won’t even let me stop them.
It is hard to guess but maybe the fastq files have some problem? These two tools are good for confirming format and scientific content: FastQC and Fastq info. We have a QA tutorial here that explains how to run the process on batches of data → Hands-on: Quality Control / Quality Control / Sequence analysis. You can swap out the trimming tool to adapt this to your process (although either should be fine to prepare reads for DE analysis).
Then, for the mapping step, these are RNA-seq reads, correct? If true, you’ll want to use a splice-aware mapping tool like HISAT2 or RNA STAR that works better for RNA-to-DNA alignments.
We have a complete end-to-end pipeline for differential expression analysis covered in our tutorials (known features), along with protocols that involve more transcript discovery steps (known + novel features, or just novel). See → Transcriptomics / Tutorial List. Note: these tutorials all have workflows that you can adapt, plus some important tips for mapper parameters – so even if you just review some parts, they are worth it.
I also see some extra steps in the tools you list out. Some advice:
Avoid uncompressing data or adding in sorting steps unless the downstream tool requires it. This should be somewhat rare.
All BAM files are coordinate sorted upon upload or when converted from other formats – unless you are using the tool to coordinate sort the BAM for some reason? This has very specific utility, and I can think of just one tool that requires that and it wasn’t in your listing. But maybe I am misunderstanding this.
Most tools can work with compressed fastq data, including the QC, trimming, and mapping tools you mentioned. For what you are doing, the datatype would be fastqsanger.gz, and Galaxy should “guess” that without extra intervention. The QC tools above can confirm the format, along with the scientific checks.
When using the Upload tool, consider using all defaults. If the datatype guess is wrong, that can be an important clue that something is unexpected in the new data. This guide has more details → Getting Data into Galaxy
If you could load up BAM instead of CRAM that might be a better place to start since it seems the data isn’t being recognized correctly.
Example: if you want to try loading CRAM, I am not sure what the uncompress step is doing or how you are doing that (pencil icon?). You won’t need a SAM file for most tools – BAM is enough: even if the tool has some legacy naming that includes “sam”, it will work with BAM (with one exception I can think of, and it wasn’t a listed tool).
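If you want to check locally what a file really is before uploading it, here is a minimal sketch in Python. It only looks at the leading "magic" bytes: CRAM files start with the text CRAM, gzip-compressed files (including BAM, which is block-gzipped) start with the bytes 1f 8b, and a decompressed BAM stream starts with BAM followed by a 01 byte. The function name and the labels it returns are my own for illustration, and this is a rough hint about the container only, not a full validation.

```python
import gzip


def sniff_format(path):
    """Guess a sequencing file's container format from its first bytes."""
    with open(path, "rb") as handle:
        head = handle.read(4)

    if head == b"CRAM":
        return "cram"

    if head[:2] == b"\x1f\x8b":
        # gzip or BGZF: peek at the decompressed start to tell BAM apart
        # from compressed text such as fastq.gz.
        with gzip.open(path, "rb") as handle:
            inner = handle.read(4)
        if inner == b"BAM\x01":
            return "bam"
        return "gzip"

    if head[:1] == b"@":
        # Both fastq records and SAM headers start with "@".
        return "fastq-or-sam-text"

    return "unknown"
```

A file that has a .cram name but reports "gzip" here is probably compressed fastq (or something else) with a misleading extension, which would explain an unexpected datatype guess after upload.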
Please give that a review and we can follow up more once you are oriented. If you get stuck at a particular step, we can help to troubleshoot the error. This is how to share your work for feedback here. But first try loading your data again, and running some QA once you have reads in Galaxy, since I think that is where things are going wrong.
The API error is very odd! Even “bad” data shouldn’t cause this kind of error so it is a bit confusing. So I’m now wondering where you are working. The UseGalaxy.org server seems fine to me today, and the UseGalaxy.eu server was undergoing an update earlier. That seems finished now, so a rerun might be enough if that is where you are working and the fastq data checks out as OK. If you can capture a screenshot when this happens, and describe where you are working (URL) and what you were doing, that might help us to figure out what was going on.
Thank you for this help! It is very helpful and I will be applying it.
I am working on getting a screenshot of the error and will put it on here shortly. It won’t let me sign out, delete any jobs or use any tools, so I’m not sure how to start over.
Additionally, do you have any guidelines about working with CRAM files? That is the format of the majority of my files and I don’t have BAM versions.
Then, if you want to extract the reads, use SamToFastq (BAM input is fine)
Followed by the QA checks, and downstream steps
I think others have had problems with CRAM data but I haven’t seen any discussion about that recently, so I would give this a try. If you have problems we can follow up more about it and find out the actual status.
@jennaj,
I tried to use the Convert CRAM to BAM tool, but when I uploaded the data they were recognized as fastqsanger.gz. I tried to put these through the suggested workflow in ‘Reference-based RNA Seq data analysis’, but when I mapped them using STAR, the alignment score was about 7%. Do you know why Galaxy would recognize my CRAM files as fastqsanger.gz? Is there something I should do about this?
To make sure I understand: You had CRAM files locally, used the Upload tool to get those into Galaxy, and the result from that was fastq reads in your history. Maybe that is how CRAM files are handled now – automatically converted to reads.
For the alignment result, anytime the very first tool you use after loading new data has a problem, the first thing to check is what is actually inside that newly loaded data.
So, you could perform some QA on those reads. But maybe just start by looking at them – if you click on the “eye” icon Galaxy will display the top of the file in an uncompressed version. Do those look like fastq files? How to check, and how to determine what kind of reads and the format (single-end, paired-end, interleaved paired-end) → Hands-on: NGS data logistics / NGS data logistics / Introduction to Galaxy Analyses
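If you would rather do the same "look at the top of the file" check on a local copy, this is a rough sketch in Python. It assumes plain four-line fastq records (header starting with @, sequence, a line starting with +, then a quality string of the same length as the sequence), and it handles both gzipped and uncompressed files. The function names are made up for this example, and it is not a replacement for FastQC or the other QA tools.

```python
import gzip
from itertools import islice


def open_maybe_gzip(path):
    """Open a text file, transparently handling gzip compression."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def check_fastq_head(path, n_records=1000):
    """Inspect up to n_records four-line records; return (checked, problems)."""
    problems = []
    checked = 0
    with open_maybe_gzip(path) as handle:
        while checked < n_records:
            record = [line.rstrip("\r\n") for line in islice(handle, 4)]
            if not record:
                break  # end of file
            if len(record) < 4:
                problems.append(f"record {checked + 1}: truncated")
                break
            header, seq, plus, qual = record
            if not header.startswith("@"):
                problems.append(f"record {checked + 1}: header lacks '@'")
            if not plus.startswith("+"):
                problems.append(f"record {checked + 1}: third line lacks '+'")
            if len(seq) != len(qual):
                problems.append(f"record {checked + 1}: seq/qual length differ")
            checked += 1
    return checked, problems
```

An empty problem list only means the first records are well-formed; it says nothing about read quality or whether the reads are single-end, paired-end, or interleaved.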
You can swap out that specific trimming tool for others, but the tools that generate statistics based on the quality scores are used with any method, even if you don’t plan on trimming. Sort of a sanity check: are the files intact (technically) and is the quality good enough to use as-is (scientifically). That tutorial section also includes a really simple mapping section that can help to get the job set up correctly. Be sure to notice the parts about single-end and paired-end data. Most tools do not process interleaved paired-end reads, so you will need those split out, then you can put the files into a paired-end collection to process in batches.
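To make the interleaved case concrete: in an interleaved file the forward and reverse read of each pair sit one after the other, so splitting just means sending alternating four-line records to two outputs. Inside Galaxy you would use a tool for this rather than a script, but here is a minimal Python sketch of the idea. It assumes uncompressed, four-line fastq records in strict forward/reverse order; the function name and file paths are hypothetical.

```python
from itertools import islice


def split_interleaved(in_path, r1_path, r2_path):
    """Write alternating fastq records to two files; return the pair count."""
    pairs = 0
    with open(in_path) as src, \
            open(r1_path, "w") as r1, \
            open(r2_path, "w") as r2:
        while True:
            first = list(islice(src, 4))   # forward read of the pair
            second = list(islice(src, 4))  # reverse read of the pair
            if not first:
                break  # clean end of file
            if len(first) < 4 or len(second) < 4:
                raise ValueError(
                    "truncated record or unpaired read: "
                    "file is not cleanly interleaved"
                )
            r1.writelines(first)
            r2.writelines(second)
            pairs += 1
    return pairs
```

The sketch does not compare read names between mates, so it will not notice a file where the pairs are out of order. That is one reason the QA step on the loaded reads still matters.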
All this format standardization can seem tedious, but you only need to do it once. It makes using a bunch of tools from different original tool authors possible. Meaning, set the organization to fit a datatype, then any tool will know how to work with it later on.
You can explain more if I misunderstood some of this.