I have been trying to run a pipeline recently. Unfortunately, I don’t know the strandedness of the samples (fastq files) I have. Is there any way to check this? I already tried converting my fastqs to BAMs with Bowtie and then running Infer Experiment on them. I was wondering if there is any other way.
Hi, I am new to this area too! I have been reading about the same thing. I did an RNA-seq experiment using the Illumina TruSeq Stranded mRNA kit. As part of this protocol, strand specificity is achieved by replacing dTTP with dUTP in the SMM (Second Strand Marking Mix), followed by second strand cDNA synthesis using DNA Polymerase I and RNase H. The incorporation of dUTP in second strand synthesis quenches the second strand during amplification, so only the first strand (the one complementary to the original RNA) is amplified. As far as I understand, that means read 1 maps to the antisense strand, so most tools describe this kind of library as reverse stranded rather than forward stranded. Hope this helps or that I can be corrected!
To check actual strandedness, either as a QA/QC step for data from an unknown protocol or to confirm your own data from a known protocol, try this:
Map the reads to the reference genome as unstranded
Run the tool Infer Experiment on the resulting bam
Note this will require a bed dataset with 12 columns
Depending on how you obtained/created the bed data, it may have the datatype bed or bed12 assigned. Either is OK, but check to make sure the data really has 12 columns
UCSC’s Table Browser is one source for complete bed data (tool: Get Data > UCSC main).
If your genome is not supported by UCSC, or you want to base the gene model on some other data provider’s genome annotation, you can convert any gtf dataset to a bed12 dataset (tool: Convert GTF to BED12)
Do not obtain gtf data from the UCSC Table Browser for most purposes
UCSC has a gtf appropriate for RNA-seq tools for selected genomes in their Downloads area
If you are working outside of Galaxy, or you want a downloaded copy, you can use the Table Browser directly (https://genome.ucsc.edu/cgi-bin/hgTables) and leave the box that sends the output to Galaxy unchecked. Once the file is downloaded, use the Upload tool to get it into Galaxy.
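If you have a downloaded copy of the bed file, the 12-column check mentioned above can be scripted before uploading. This is only a minimal sketch: the function names are made up for illustration, it assumes the file is tab-separated, and it only looks at the first 1000 data lines.

```python
import io


def count_bed_columns(handle, max_lines=1000):
    """Return the set of column counts seen in the first data lines of a BED file."""
    counts = set()
    seen = 0
    for line in handle:
        line = line.rstrip("\r\n")
        # Skip blank lines and the optional header lines a BED file may carry
        if not line or line.startswith(("#", "track", "browser")):
            continue
        # Assumption: columns are tab-separated
        counts.add(len(line.split("\t")))
        seen += 1
        if seen >= max_lines:
            break
    return counts


def is_bed12(handle):
    """True only if every inspected data line has exactly 12 columns."""
    return count_bed_columns(handle) == {12}


# Illustrative record, not taken from any particular annotation
example = (
    "chr1\t11868\t14409\tgeneA\t0\t+\t11868\t11868\t0\t3\t"
    "359,109,1189,\t0,744,1352,\n"
)
print(is_bed12(io.StringIO(example)))
```

For a real file, something like `with open("genes.bed") as fh: print(is_bed12(fh))` would do the same check (`genes.bed` is a placeholder name).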
Generating and Interpreting the results from `Infer Experiment`
See these “Galaxy Training Network” (GTN) tutorials:
A search with the keyword “gtf” will find more topics, too.
Converting Fastq > BAM produces an unaligned bam dataset without any mapping information, and Infer Experiment needs mapped reads. If you actually mapped already with Bowtie (this wasn’t clear to me!), maybe there was some other problem. RNA-seq data might not map well enough with a DNA mapping tool. Use an RNA mapping tool like HISAT2 instead to see if that helps.
Try the method above to produce a bam with mapping results for QA/QC purposes, run Infer Experiment on that, interpret the results, then run the analysis mapping with strand settings that match your data.
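For the "interpret the results" step, here is a rough sketch of how the Infer Experiment report could be read programmatically. It assumes the usual text report with one "failed to determine" line and two "explained by" lines. The function names, the 0.8 cutoff, and the numbers in the example report are my own illustration, not something the tool defines.

```python
import re

# Strand patterns as they appear in the report (single-end and paired-end forms)
FORWARD_PATTERNS = {"++,--", "1++,1--,2+-,2-+"}
REVERSE_PATTERNS = {"+-,-+", "1+-,1-+,2++,2--"}


def parse_infer_experiment(text):
    """Extract the 'failed' fraction and the two 'explained by' fractions."""
    failed = re.search(r"failed to determine:\s*([0-9.]+)", text)
    explained = re.findall(r'explained by "([^"]+)":\s*([0-9.]+)', text)
    if failed is None or len(explained) != 2:
        raise ValueError("unrecognised Infer Experiment report")
    fractions = {pattern: float(value) for pattern, value in explained}
    return float(failed.group(1)), fractions


def classify_strandedness(text, threshold=0.8):
    """Label a report as 'forward', 'reverse' or 'unstranded'.

    threshold is an arbitrary cutoff (assumption) applied to the share of
    determined reads that support the forward pattern.
    """
    _failed, fractions = parse_infer_experiment(text)
    forward = sum(v for k, v in fractions.items() if k in FORWARD_PATTERNS)
    reverse = sum(v for k, v in fractions.items() if k in REVERSE_PATTERNS)
    determined = forward + reverse
    if determined == 0:
        raise ValueError("no reads could be assigned to either pattern")
    forward_share = forward / determined
    if forward_share >= threshold:
        return "forward"
    if forward_share <= 1 - threshold:
        return "reverse"
    return "unstranded"


# Made-up report for illustration only
report = """This is PairEnd Data
Fraction of reads failed to determine: 0.0172
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0161
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9667
"""
print(classify_strandedness(report))
```

If the two "explained by" fractions come out roughly equal, the library is most likely unstranded. For the follow-up mapping with HISAT2, a forward result corresponds to the strand setting FR (F for single-end) and a reverse result to RF (R for single-end).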