Hello, I am completely new to bioinformatics but was hoping to use Salmon in Galaxy to analyze gene-level expression in a time-series RNA-seq dataset. Based on the parameters I set and the files I have included (transcriptome FASTA and annotation GTF), I am able to get quantification of both transcripts and genes. I was wondering if there is a way to obtain information on the mapping rate? I know that alignment methods such as HISAT2 or STAR produce alignment rates to assess how well the reads mapped to the genome, and subsequent assignment rates when counting with a method such as featureCounts. Is there a similar output for mapping rate to assess how well my paired-end reads were assigned to a transcript/gene? Thank you in advance!
Setting this option on the tool form to yes will output the mapping results: Write Mappings to Bam File
Then summarize the mappings with a tool in the SAM/BAM tool group, such as:
Samtools stats (generate statistics for BAM dataset)
Hi @jennaj, thank you so much for the quick reply, this is very helpful! Looking at the samtools output I see reads mapped: 14647458 and reads unmapped: 14635722. Does this suggest that my mapping rate is ~50%? Additionally, the value for reads mapped and paired is 11736. Is it correct that this number refers to the paired-end reads where both mates mapped, and that this value is included in the total reads mapped? Given these results, is it safe to assume that the counts Salmon is generating (quantification and gene quantification results) are based on everything that mapped (including reads that mapped either on their own or as part of a paired-end fragment)? Apologies if these are very basic questions. I have included part of the samtools output below for reference. Thank you again for the very helpful input!
SN raw total sequences: 29283180
SN filtered sequences: 0
SN sequences: 29283180
SN is sorted: 1
SN 1st fragments: 14641590
SN last fragments: 14641590
SN reads mapped: 14647458
SN reads mapped and paired: 11736 # paired-end technology bit set + both mates mapped
SN reads unmapped: 14635722
SN reads properly paired: 11736 # proper-pair bit set
SN reads paired: 29283180 # paired-end technology bit set
SN reads duplicated: 0 # PCR or optical duplicate bit set
SN reads MQ0: 0 # mapped and MQ=0
SN reads QC failed: 0
SN non-primary alignments: 196568514
SN total length: 1479526038 # ignores clipping
SN total first fragment length: 739701308 # ignores clipping
SN total last fragment length: 739824730 # ignores clipping
SN bases mapped: 740503012 # ignores clipping
SN bases mapped (cigar): 740502931 # more accurate
SN bases trimmed: 0
SN bases duplicated: 0
SN mismatches: 0 # from NM fields
SN error rate: 0.000000e+00 # mismatches / bases mapped (cigar)
SN average length: 50
SN average first fragment length: 51
SN average last fragment length: 51
SN maximum length: 51
SN maximum first fragment length: 51
SN maximum last fragment length: 51
SN average quality: 255.0
SN insert size average: 411.8
SN insert size standard deviation: 249.1
SN inward oriented pairs: 5859
SN outward oriented pairs: 9
SN pairs with other orientation: 0
SN pairs on different chromosomes: 0
SN percentage of properly paired reads (%): 0.0
Yes, this is all true as far as I know when using the default settings (mostly: coverage of a read against a transcript must be at least 31 bases to count).
To make the counting even more permissive, you could toggle either or both of these parameters to yes: “Allow Dovetail” or “Recover Orphans”.
To make what is counted up by Salmon stricter (less permissive), toggle one or more of the other parameters on the form to yes.
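As a quick check of the ~50% figure asked about above, here is a minimal Python sketch that parses the SN summary lines and divides reads mapped by raw total sequences. The names `STATS_TEXT`, `parse_summary`, and `mapping_rate` are made up for illustration, and the parser assumes the space-separated layout as pasted in this thread (the file produced by Samtools stats is tab-separated, which the same pattern should also tolerate).

```python
import re

# Four summary lines copied from the post above (illustrative subset).
STATS_TEXT = """\
SN raw total sequences: 29283180
SN reads mapped: 14647458
SN reads mapped and paired: 11736 # paired-end technology bit set + both mates mapped
SN reads unmapped: 14635722
"""

# "SN", then the field name up to the colon, then a numeric value.
# Anything after the value (such as a "# ..." comment) is ignored.
SN_LINE = re.compile(r"^SN\s+(?P<key>[^:]+):\s+(?P<value>[-+0-9.eE]+)")


def parse_summary(text):
    """Return {field name: numeric value} for every SN line in the text."""
    fields = {}
    for line in text.splitlines():
        match = SN_LINE.match(line)
        if match:
            fields[match.group("key").strip()] = float(match.group("value"))
    return fields


def mapping_rate(fields):
    """Fraction of all reads that samtools counted as mapped."""
    return fields["reads mapped"] / fields["raw total sequences"]


stats = parse_summary(STATS_TEXT)
rate = mapping_rate(stats)
print(f"mapping rate: {rate:.2%}")  # prints "mapping rate: 50.02%"
```

Note that reads mapped plus reads unmapped adds up to raw total sequences here, so the rate is simply the mapped share of all reads in the BAM.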
Glad the prior info helped, and hope this does too.
This absolutely helps! Thank you for the clarification and the wonderful input. I only wish I had asked sooner!