Check for strandedness

Hi @doctorasgr and @Alysha

To check the actual strandedness, either as a QA/QC step for data from an unknown protocol or to confirm the protocol used for your own data, try this:

  1. Map the reads to the reference genome as unstranded
  2. Run the tool Infer Experiment on the resulting bam
  • Note this will require a bed dataset with 12 columns

  • Depending on how you obtained/created the bed data, it may have the datatype bed or bed12 assigned. Either is OK, but check to make sure the data really has 12 columns.

  • UCSC’s Table Browser is one source for complete bed data (tool: Get Data > UCSC main).

  • If your genome is not supported by UCSC, or you want to base the gene model on some other data provider’s genome annotation, you can convert any gtf dataset to a bed12 dataset (tool: Convert GTF to BED12)

    • Do not obtain gtf data from the UCSC Table Browser for most purposes
    • UCSC has a gtf appropriate for RNA-seq tools for selected genomes in their Downloads area
    • Or, you can choose a gtf from another source
    • FAQ: Help for Differential Expression Analysis - Galaxy Community Hub (the help here applies to any use of reference annotation). Format, content, and “matched” inputs are important, or expect problems.
  3. Map the reads again to your reference genome, setting the strand correctly based on the results of Infer Experiment. Use this bam result for analysis.
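
If you have a downloaded copy of the bed file, the 12-column check from step 2 can also be done outside of Galaxy. This is only a minimal sketch: it assumes a tab-separated file, and the function names and the `max_lines` cutoff are made up for illustration, not part of any Galaxy tool.

```python
def count_bed_columns(path, max_lines=1000):
    """Return the set of column counts seen in the first data lines of a BED file."""
    counts = set()
    seen = 0
    with open(path) as handle:
        for line in handle:
            # Skip blank lines and the optional header lines a BED file may carry.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            # Assumption: columns are tab-separated, as in Galaxy bed datasets.
            counts.add(len(line.rstrip("\r\n").split("\t")))
            seen += 1
            if seen >= max_lines:
                break
    return counts


def is_bed12(path):
    """True only if every sampled data line has exactly 12 columns."""
    return count_bed_columns(path) == {12}
```

Calling `is_bed12("genes.bed")` (a hypothetical file name) returns `False` when any sampled line has a different number of columns, or when the file has no data lines at all.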

More Help:

UCSC Table Browser options to get a bed with 12 columns
  • Set the “clade + genome + assembly”
  • Pick a track from the “group” Gene and Gene Predictions
  • Set “region” = genome
  • Set “output format” = bed and check the box to send the output to Galaxy
  • Submit the query by clicking on the button “get output”
  • Not done yet! There will be another sub-form presented next to specify more details…
  • Choose “Create one BED record per” > “Whole Gene” to output a bed with 12 columns
  • Finish by clicking on the button for “get bed”
  • If you are logged into a public Galaxy server known by UCSC (usegalaxy.org, usegalaxy.eu, and usegalaxy.org.au are “known”), the output will be sent to your active History.
  • If working somewhere else, or you want a downloaded copy, you can use the Table Browser directly (Table Browser) and not check the box to send the output to Galaxy. Once the file is downloaded, use the Upload tool to get it into Galaxy.
Generating and Interpreting the results from `Infer Experiment`

See these “Galaxy Training Network” (GTN) tutorials:

1: RNA-Seq reads to counts

De novo transcriptome reconstruction with RNA-Seq
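
As a rough companion to those tutorials, here is a sketch of how the Infer Experiment text output could be read programmatically. It assumes the report layout written by RSeQC's `infer_experiment.py` (one "failed to determine" fraction plus two "explained by" fractions). The function names and the 0.8 cutoff are my own assumptions, not an official rule, so treat borderline results with care. The numbers in the example are made up for illustration.

```python
import re

# Strand patterns as printed in the report (single-end and paired-end forms).
FORWARD_KEYS = ("++,--", "1++,1--,2+-,2-+")
REVERSE_KEYS = ("+-,-+", "1+-,1-+,2++,2--")


def parse_infer_experiment(text):
    """Extract the 'failed' fraction and the two 'explained by' fractions."""
    failed = re.search(r"failed to determine:\s*([0-9.]+)", text)
    explained = re.findall(r'explained by "([^"]+)":\s*([0-9.]+)', text)
    if failed is None or len(explained) != 2:
        raise ValueError("unexpected Infer Experiment output")
    return float(failed.group(1)), {key: float(val) for key, val in explained}


def classify_strandedness(text, dominant=0.8):
    """Label the library forward, reverse, or unstranded/ambiguous.

    `dominant` is an assumed cutoff, not a value taken from the tool.
    """
    _, fractions = parse_infer_experiment(text)
    forward = sum(v for k, v in fractions.items() if k in FORWARD_KEYS)
    reverse = sum(v for k, v in fractions.items() if k in REVERSE_KEYS)
    if forward >= dominant:
        return "forward"
    if reverse >= dominant:
        return "reverse"
    return "unstranded or ambiguous"


# Illustrative report with made-up numbers.
EXAMPLE = '''This is PairEnd Data
Fraction of reads failed to determine: 0.0172
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0161
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9667
'''
print(classify_strandedness(EXAMPLE))  # prints "reverse"
```

When the two "explained by" fractions are both near 0.5, the data is most likely unstranded. Anything in between deserves a closer look before you pick a strand setting.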

More More Help:

A search with the keyword “gtf” will find more topics, too.


Converting Fastq > BAM produces a bam dataset without any mapping information. If you actually mapped already with Bowtie (this wasn’t clear to me!), maybe there was some other problem. RNA-seq data might not map well enough with a DNA mapping tool – use an RNA mapping tool like HISAT2 instead to see if that helps.

Try the method above to produce a bam with mapping results for QA/QC purposes, run Infer Experiment on that, interpret the results, then run the analysis mapping with strand settings that match your data.
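
For the last part, matching the strand settings to your data, this small lookup sketches how a forward/reverse/unstranded call usually translates into common command-line options. The values are the ones documented for HISAT2 (`--rna-strandness`), featureCounts (`-s`), and htseq-count (`--stranded`). The dictionary and function names are hypothetical, and the Galaxy tool forms present the same choices as menu labels rather than flags.

```python
# Strand call -> typical option values (HISAT2 has no value for unstranded;
# the option is simply left unset in that case).
STRAND_SETTINGS = {
    "forward": {"hisat2_single": "F", "hisat2_paired": "FR",
                "featurecounts_s": 0 + 1, "htseq_stranded": "yes"},
    "reverse": {"hisat2_single": "R", "hisat2_paired": "RF",
                "featurecounts_s": 2, "htseq_stranded": "reverse"},
    "unstranded": {"hisat2_single": None, "hisat2_paired": None,
                   "featurecounts_s": 0, "htseq_stranded": "no"},
}


def strand_settings(call, paired):
    """Return the option values for one strand call and library layout."""
    if call not in STRAND_SETTINGS:
        raise ValueError(f"unknown strand call: {call!r}")
    row = STRAND_SETTINGS[call]
    return {
        "hisat2_rna_strandness": row["hisat2_paired" if paired else "hisat2_single"],
        "featurecounts_s": row["featurecounts_s"],
        "htseq_stranded": row["htseq_stranded"],
    }
```

For example, `strand_settings("reverse", paired=True)` gives `RF` for HISAT2, `2` for featureCounts, and `reverse` for htseq-count.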


Hope that helps!
