Check for strandedness

doctorasgr · September 8, 2020, 12:20pm

Hi all,
I have been trying to run a pipeline recently. Unfortunately, I don’t know the strandedness of my sample (fastq files). Is there any way to check this? I have already tried converting my fastqs to BAMs with Bowtie and then running Infer Experiment on them. I was wondering if there is any other way.

Thank you,
Athanasios

Alysha · September 8, 2020, 2:18pm

Hi, I am new to this area too, and I have been reading about the same thing. I did an RNA-seq experiment using the Illumina TruSeq Stranded mRNA kit. In this protocol, strand specificity is achieved by replacing dTTP with dUTP in the SMM (Second Strand Marking Mix), followed by second-strand cDNA synthesis using DNA Polymerase I and RNase H. The incorporation of dUTP in second-strand synthesis quenches the second strand during amplification. I think that means only the first (antisense) cDNA strand is carried forward, so the library and resulting data are reverse stranded: read 1 maps to the strand opposite the transcript. Hope this helps, or that I can be corrected!
Alysha
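A toy Python sketch of the dUTP logic (a simplification that ignores fragmentation, adapters and PCR, and is not taken from Illumina's documentation): the first-strand cDNA is the reverse complement of the mRNA, and because the dUTP-marked second strand is not amplified, read 1 carries that antisense sequence. The transcript sequence is made up.

```python
# Toy model of why dUTP (second-strand marking) libraries come out reverse
# stranded. Simplification: ignores fragmentation, adapters and PCR details.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def dutp_read1(transcript: str) -> str:
    """Sequence carried by read 1 in a dUTP library, for a given transcript.

    First-strand cDNA is the reverse complement of the mRNA. The dUTP-marked
    second strand is not amplified, so read 1 reports the first-strand
    (antisense) sequence.
    """
    first_strand_cdna = reverse_complement(transcript)
    return first_strand_cdna


if __name__ == "__main__":
    mrna = "ATGGCCAAGTTTGAC"  # hypothetical transcript, written as DNA
    read1 = dutp_read1(mrna)
    print(read1)  # GTCAAACTTGGCCAT
    # Read 1 is the reverse complement of the transcript, so it aligns to the
    # strand opposite the gene: "reverse" (R single-end, RF paired-end).
    print(read1 == reverse_complement(mrna))  # True
```

Under this model reads land on the strand opposite their gene, which Infer Experiment reports as "+-,-+" (single-end) or "1+-,1-+,2++,2--" (paired-end).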

jennaj · September 9, 2020, 5:04pm

Hi @doctorasgr and @Alysha

To check the actual strandedness, either as a QA/QC step for data from an unknown protocol or to confirm a known protocol for your own data, try this:

1. Map the reads to the reference genome as unstranded.
2. Run the tool Infer Experiment on the resulting bam.

   Note that this requires a bed dataset with 12 columns.
   - Depending on how you obtained or created the bed data, it may have the datatype bed or bed12 assigned. Either is OK, but check that the data really has 12 columns.
   - UCSC’s Table Browser is one source of complete bed data (tool: Get Data > UCSC main).
   - If your genome is not supported by UCSC, or you want to base the gene model on another data provider’s genome annotation, you can convert any gtf dataset to a bed12 dataset (tool: Convert GTF to BED12).
     - Do not obtain gtf data from the UCSC Table Browser for most purposes.
     - UCSC has a gtf appropriate for RNA-seq tools for selected genomes in their Downloads area.
     - Or, you can choose a gtf from another source.
     - FAQ: Help for Differential Expression Analysis - Galaxy Community Hub (the help there applies to any use of a reference annotation: format, content, and “matched” inputs are important, otherwise expect problems).

3. Map the reads again to your reference genome, setting the strand correctly based on the results of Infer Experiment. Use this bam for analysis.
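What a GTF to BED12 conversion does can be sketched roughly as below. This is a minimal illustration, not the code of Galaxy's Convert GTF to BED12 tool: it assumes every exon line carries a transcript_id attribute, uses only exon features, and ignores CDS (so thickStart = thickEnd = chromStart). The main points are the coordinate shift from GTF (1-based, inclusive) to BED (0-based, half-open) and how blockSizes/blockStarts are built relative to chromStart.

```python
import re
from collections import defaultdict

# Matches the transcript_id attribute in GTF column 9.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_to_bed12(gtf_lines):
    """Return BED12 lines, one per transcript, built from GTF 'exon' rows.

    Simplifications (assumptions, unlike a full converter): CDS features are
    ignored, so thickStart = thickEnd = chromStart (UCSC's convention for
    non-coding items); score and itemRgb are written as 0.
    """
    exons = defaultdict(list)  # transcript_id -> [(chrom, start0, end, strand)]
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        # GTF coordinates are 1-based and inclusive; BED is 0-based, half-open.
        exons[match.group(1)].append(
            (fields[0], int(fields[3]) - 1, int(fields[4]), fields[6])
        )

    bed_lines = []
    for tx_id, parts in exons.items():
        parts.sort(key=lambda part: part[1])  # blocks must be in ascending order
        chrom, strand = parts[0][0], parts[0][3]
        chrom_start = parts[0][1]
        chrom_end = max(part[2] for part in parts)
        sizes = ",".join(str(end - start) for _, start, end, _ in parts) + ","
        starts = ",".join(str(start - chrom_start) for _, start, _, _ in parts) + ","
        bed_lines.append("\t".join(str(value) for value in (
            chrom, chrom_start, chrom_end, tx_id, 0, strand,
            chrom_start, chrom_start, 0, len(parts), sizes, starts,
        )))
    return bed_lines


if __name__ == "__main__":
    # Hypothetical two-exon transcript
    example_gtf = [
        'chr1\tdemo\texon\t101\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
        'chr1\tdemo\texon\t301\t350\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    ]
    for row in gtf_to_bed12(example_gtf):
        print(row)
```

For real data, the Galaxy tool (or UCSC's own utilities) is the safer choice; the sketch only shows why the result has exactly 12 columns.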

More Help:

UCSC Table Browser options to get a bed with 12 columns

Set the “clade + genome + assembly”
Pick a track from the “group” Gene and Gene Predictions
Set “region” = genome
Set “output format” = bed and check the box to send the output to Galaxy
Submit the query by clicking on the button “get output”
Not done yet! There will be another sub-form presented next to specify more details…
Choose “Create one BED record per” > “Whole Gene” to output a bed with 12 columns
Finish by clicking on the button for “get bed”
If you are logged into a public Galaxy server known by UCSC (usegalaxy.org, usegalaxy.eu, and usegalaxy.org.au are “known”), the output will be sent to your active History.
If you are working somewhere else, or you want a downloaded copy, use the Table Browser directly and do not check the box to send the output to Galaxy. Once the file is downloaded, use the Upload tool to get it into Galaxy.
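To double-check the "really has 12 columns" point on a downloaded file before uploading, a small Python sketch like this can scan it. The checks (exactly 12 tab-separated columns, blockCount matching the block lists) are an assumed minimal set of sanity checks, not what Galaxy itself validates.

```python
def check_bed12(lines):
    """Return (ok, problems) for data that should be BED12.

    Skips blank lines and 'track', 'browser' and '#' header lines. Each data
    line must have exactly 12 tab-separated columns, and blockCount (column
    10) must match the number of blockSizes and blockStarts entries.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 12:
            problems.append(f"line {number}: {len(fields)} columns, expected 12")
            continue
        try:
            block_count = int(fields[9])
        except ValueError:
            problems.append(f"line {number}: blockCount {fields[9]!r} is not an integer")
            continue
        sizes = [s for s in fields[10].split(",") if s]
        starts = [s for s in fields[11].split(",") if s]
        if len(sizes) != block_count or len(starts) != block_count:
            problems.append(f"line {number}: blockCount does not match the block lists")
    return not problems, problems


if __name__ == "__main__":
    # Hypothetical lines; an open file handle works the same way.
    sample = [
        "track name=genes\n",
        "chr1\t100\t350\tt1\t0\t+\t100\t100\t0\t2\t100,50,\t0,200,\n",
        "chr1\t500\t900\tt2\t0\t-\n",
    ]
    ok, problems = check_bed12(sample)
    print(ok)        # False
    print(problems)  # ['line 3: 6 columns, expected 12']
```

A file that passes can still have other issues (for example, chromosome names that do not match your reference), so this only covers the column count.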

Generating and Interpreting the results from `Infer Experiment`

See these “Galaxy Training Network” (GTN) tutorials:

1: RNA-Seq reads to counts

De novo transcriptome reconstruction with RNA-Seq
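Once Infer Experiment has run, the decision can be sketched in Python, assuming the report uses the wording printed by RSeQC's infer_experiment.py (which Galaxy's Infer Experiment tool wraps). The 0.8 cut-off and 0.1 tolerance are assumptions for illustration only; always look at the actual numbers before deciding.

```python
import re

# Wording of the report written by RSeQC's infer_experiment.py, e.g.
#   Fraction of reads explained by "+-,-+": 0.9669
FRACTION = re.compile(r'Fraction of reads explained by "([^"]+)": ([0-9.]+)')

FORWARD = {"++,--", "1++,1--,2+-,2-+"}   # reads follow the transcript strand
REVERSE = {"+-,-+", "1+-,1-+,2++,2--"}   # reads are antisense to the transcript


def classify_strandedness(report, min_fraction=0.8, tolerance=0.1):
    """Return 'forward', 'reverse', 'unstranded' or 'unclear'.

    min_fraction and tolerance are assumed cut-offs, not RSeQC defaults.
    """
    matches = FRACTION.findall(report)
    if not matches:
        return "unclear"  # nothing recognisable in the report
    forward = reverse = 0.0
    for pattern, value in matches:
        if pattern in FORWARD:
            forward = float(value)
        elif pattern in REVERSE:
            reverse = float(value)
    if forward >= min_fraction:
        return "forward"
    if reverse >= min_fraction:
        return "reverse"
    if abs(forward - reverse) <= tolerance:
        return "unstranded"  # roughly a 50/50 split
    return "unclear"


if __name__ == "__main__":
    example = """This is SingleEnd Data
Fraction of reads failed to determine: 0.0170
Fraction of reads explained by "++,--": 0.0161
Fraction of reads explained by "+-,-+": 0.9669
"""
    print(classify_strandedness(example))  # reverse
```

An "unclear" result (for example 0.65 vs 0.33) usually deserves a closer look at the reference annotation and the mapping before picking a setting.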

More More Help:

A search with the keyword “gtf” will find more topics, too.

Converting fastq to BAM (without mapping) produces a bam dataset that contains no mapping information, so Infer Experiment has nothing to work with. If you actually mapped with Bowtie already (this wasn’t clear to me!), maybe there was some other problem. RNA-seq data may not map well with a DNA mapping tool, because reads that span exon-exon junctions need a splice-aware mapper. Try an RNA mapping tool like HISAT2 instead to see if that helps.
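To see whether a bam actually holds alignments, one quick check is the SAM FLAG column. This sketch assumes SAM text as input, for example from `samtools view -h` run on your bam; a bam made straight from fastq has every record flagged unmapped (bit 0x4), so it returns 0.0.

```python
def mapped_fraction(sam_lines):
    """Fraction of primary SAM records whose FLAG says 'mapped'.

    Header lines start with '@'. FLAG bit 0x4 marks an unmapped read;
    secondary (0x100) and supplementary (0x800) records are skipped so each
    read is counted once.
    """
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        if flag & (0x100 | 0x800):
            continue
        total += 1
        if not flag & 0x4:
            mapped += 1
    return mapped / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical records from an unaligned (fastq-converted) bam
    records = [
        "@HD\tVN:1.6\tSO:unsorted",
        "r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
    ]
    print(mapped_fraction(records))  # 0.0
```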

Try the method above to produce a bam with mapping results for QA/QC purposes, run Infer Experiment on that, interpret the results, then run the analysis mapping with strand settings that match your data.

Hope that helps!

ysrbrs · February 12, 2023, 8:16pm

Hello,
HISAT2 has three options under “specify strand information” for a single-end library:

Unstranded
Forward (F)
Reverse (R)

However, the table of strandedness and software settings in the GTN tutorial Reference-based RNA-Seq data analysis does not use the same names as the options listed above.

For example, after I run Infer Experiment, the result (+-,-+) points me to “First Strand R/RF” for running HISAT2. In that case, am I supposed to select “Reverse (R)”?
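For reference, the two single-end patterns pair the strand a read mapped to with the strand of the gene it overlaps: "++,--" means same strand, "+-,-+" means opposite strand. Below is a simplified sketch of that counting idea, assuming each single-end read has already been matched to one gene's strand; the real tool also samples reads from the bam and looks up gene overlaps in the bed12 model, which is skipped here.

```python
from collections import Counter


def strand_fractions(pairs):
    """pairs: iterable of (read_strand, gene_strand), each '+' or '-'.

    Returns the fraction explained by "++,--" (read on the gene's strand)
    and by "+-,-+" (read on the opposite strand) for single-end reads.
    """
    counts = Counter(read + gene for read, gene in pairs)
    total = sum(counts.values())
    if total == 0:
        return {"++,--": 0.0, "+-,-+": 0.0}
    same = counts["++"] + counts["--"]
    opposite = counts["+-"] + counts["-+"]
    return {"++,--": same / total, "+-,-+": opposite / total}


if __name__ == "__main__":
    # Hypothetical reads: three antisense to their gene, one sense
    reads = [("-", "+"), ("+", "-"), ("-", "+"), ("+", "+")]
    print(strand_fractions(reads))  # {'++,--': 0.25, '+-,-+': 0.75}
```

A result dominated by "+-,-+" means most reads are antisense to their genes, i.e. the library is reverse stranded.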

jennaj · February 13, 2023, 5:46pm

Yes, from what you explain, “Reverse” is probably the correct HISAT2 choice. You could try it, compare the results with the other settings, and confirm.

‘F’ means a read corresponds to a transcript. ‘R’ means a read corresponds to the reverse complemented counterpart of a transcript. With this option being used, every read alignment will have an XS attribute tag: ‘+’ means a read belongs to a transcript on ‘+’ strand of genome. ‘-’ means a read belongs to a transcript on ‘-’ strand of genome. (HISAT2 manual, `--rna-strandness`)
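The usual translation from an Infer Experiment call to tool options can be written as a small lookup. The option values are those of the command-line tools (HISAT2 `--rna-strandness`, htseq-count `--stranded`, featureCounts `-s`); Galaxy forms show them under their own labels, such as “Reverse (R)”. For unstranded data, `--rna-strandness` is simply left unset.

```python
# Strandedness label -> option values for a few common tools.
# None for HISAT2 means: leave --rna-strandness unset (unstranded default).
SETTINGS = {
    "forward": {"se": "F", "pe": "FR", "htseq": "yes", "featurecounts": 1},
    "reverse": {"se": "R", "pe": "RF", "htseq": "reverse", "featurecounts": 2},
    "unstranded": {"se": None, "pe": None, "htseq": "no", "featurecounts": 0},
}


def tool_settings(strandedness: str, paired_end: bool) -> dict:
    """Return option values for HISAT2, htseq-count and featureCounts."""
    entry = SETTINGS[strandedness]
    return {
        "hisat2 --rna-strandness": entry["pe" if paired_end else "se"],
        "htseq-count --stranded": entry["htseq"],
        "featureCounts -s": entry["featurecounts"],
    }


if __name__ == "__main__":
    # Single-end data where Infer Experiment is dominated by "+-,-+"
    print(tool_settings("reverse", paired_end=False))
```

For the single-end "+-,-+" case above, this gives R for HISAT2, matching the “Reverse (R)” choice.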
