Check for strandedness

Hi all,
I have been trying to run a pipeline recently. Unfortunately, I don’t know the strandedness of my samples (fastq files). Is there any way to check this? I already tried converting my fastqs to BAMs with Bowtie and then running Infer Experiment on them. I was wondering if there is any other way.

Thank you,
Athanasios

Hi, I am new to this area too! I have been reading about the same thing. I did an RNA-seq experiment using the Illumina TruSeq Stranded mRNA kit. As part of this protocol, strand specificity is achieved by replacing dTTP with dUTP in the SMM (Second Strand Marking Mix), followed by second strand cDNA synthesis using DNA Polymerase I and RNase H. The incorporation of dUTP in second strand synthesis quenches the second strand during amplification. I think that means only the first strand is carried forward. Since the first cDNA strand is the complement of the original RNA, the resulting reads (read 1 for paired-end data) should map to the strand opposite the gene, which most tools call reverse stranded. Hope this helps or that I can be corrected!
Alysha

Hi @doctorasgr and @Alysha

To check actual strandedness, either as a QA/QC step for data with an unknown protocol or to confirm the protocol for your own data, try this:

  1. Map the reads to the reference genome as unstranded
  2. Run the tool Infer Experiment on the resulting bam
  • Note this will require a bed dataset with 12 columns

  • Depending on how you obtained/created the bed data, it may have the datatype bed or bed12 assigned. Either is OK, but check to make sure the data really has 12 columns

  • UCSC’s Table Browser is one source for complete bed data (tool: Get Data > UCSC main).

  • If your genome is not supported by UCSC, or you want to base the gene model on some other data provider’s genome annotation, you can convert any gtf dataset to a bed12 dataset (tool: Convert GTF to BED12)

    • Do not obtain gtf data from the UCSC Table Browser for most purposes
    • UCSC has a gtf appropriate for RNA-seq tools for selected genomes in their Downloads area
    • Or, you can choose a gtf from another source
    • FAQ: https://galaxyproject.org/support/diff-expression/ (the help here applies to any reference annotation usage. Format, content, and “matched” inputs are important, or expect problems.)
  3. Map the reads again to your reference genome, setting the strand correctly based on the results of Infer Experiment. Use this bam result for analysis.
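
If you have a copy of the bed file outside of Galaxy, the 12-column check mentioned in the bullets above can be sketched in a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and it assumes the file is tab-delimited with optional `#`, `track`, or `browser` header lines.

```python
def count_bed_columns(lines):
    """Return the set of column counts seen across BED data lines."""
    counts = set()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the header lines a BED file may carry.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        # Assumption: fields are separated by tabs.
        counts.add(len(line.split("\t")))
    return counts


def is_bed12(lines):
    """True only if there is at least one data line and all have 12 columns."""
    return count_bed_columns(lines) == {12}


# Illustrative record only, not taken from any particular annotation.
example = [
    "track name=genes\n",
    "chr1\t11873\t14409\tgeneA\t0\t+\t11873\t11873\t0\t3\t354,109,1189,\t0,739,1347,\n",
]
print(is_bed12(example))
```

For a real file, pass an open file handle instead of the list, for example `is_bed12(open("genes.bed"))`, where `genes.bed` is a placeholder name.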

More Help:

UCSC Table Browser options to get a bed with 12 columns
  • Set the “clade + genome + assembly”
  • Pick a track from the “group” Gene and Gene Predictions
  • Set “region” = genome
  • Set “output format” = bed and check the box to send the output to Galaxy
  • Submit the query by clicking on the button “get output”
  • Not done yet! There will be another sub-form presented next to specify more details…
  • Choose “Create one BED record per” > “Whole Gene” to output a bed with 12 columns
  • Finish by clicking on the button for “get bed”
  • If you are logged into a public Galaxy server known by UCSC (usegalaxy.org, usegalaxy.eu, and usegalaxy.org.au are “known”), the output will be sent to your active History.
  • If working somewhere else, or you want a downloaded copy, you can use the Table Browser directly (https://genome.ucsc.edu/cgi-bin/hgTables) and not check the box to send the output to Galaxy. Once the file is downloaded, use the Upload tool to get it into Galaxy.

Generating and Interpreting the results from `Infer Experiment`

See these “Galaxy Training Network” (GTN) tutorials:

https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/rna-seq-reads-to-counts/tutorial.html#qc-strandness

https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/de-novo/tutorial.html#mapping
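
As a rough illustration of the interpretation step, here is a small Python sketch that reads the text report from Infer Experiment and labels the library. The function name, the labels, and the 0.8 cutoff are my own assumptions, not something the tool defines, and the numbers in the example report are invented. Here "forward" means the read (read 1 for paired-end data) maps to the same strand as the gene, and "reverse" means the opposite strand.

```python
import re

# Matches report lines such as: Fraction of reads explained by "++,--": 0.9840
PATTERN = re.compile(r'Fraction of reads explained by "([^"]+)":\s*([0-9.]+)')


def classify_strandedness(report, threshold=0.8):
    """Label a report as forward, reverse, or unstranded/undetermined.

    The threshold is an assumed cutoff chosen for illustration only.
    """
    fractions = {key: float(value) for key, value in PATTERN.findall(report)}
    # Paired-end reports use the "1++,..." keys, single-end reports the short keys.
    forward = fractions.get("1++,1--,2+-,2-+", fractions.get("++,--"))
    reverse = fractions.get("1+-,1-+,2++,2--", fractions.get("+-,-+"))
    if forward is None or reverse is None:
        raise ValueError("Could not find both strand fractions in the report")
    if forward >= threshold:
        return "forward"
    if reverse >= threshold:
        return "reverse"
    return "unstranded or undetermined"


# Invented numbers, shaped like a single-end report.
example_report = """This is SingleEnd Data
Fraction of reads failed to determine: 0.0170
Fraction of reads explained by "++,--": 0.0161
Fraction of reads explained by "+-,-+": 0.9669
"""
print(classify_strandedness(example_report))
```

When the two fractions are close to each other (both near 0.5), the data is most likely unstranded. Anything in between deserves a closer look rather than a blind setting.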

More More Help:

A search with the keyword “gtf” will find more topics, too.


Converting Fastq > BAM produces a bam dataset without any mapping information. If you actually mapped already with Bowtie (this wasn’t clear to me!), maybe there was some other problem. RNA-seq data might not map well enough with a DNA mapping tool – use an RNA mapping tool like HISAT2 instead to see if that helps.

Try the method above to produce a bam with mapping results for QA/QC purposes, run Infer Experiment on that, interpret the results, then run the analysis mapping with strand settings that match your data.


Hope that helps!