RNAseq data alignment and counting using Salmon

The mapping file needs to be just two columns: transcriptID (tab) geneID

  • What you decide to use as the transcriptID needs to exactly match the content of the transcriptome’s fasta title line (everything on the > lines, no extra content or spaces).
  • The geneID column must have content. The Salmon count input will be organized/grouped by those IDs.
  • The help in this other post may be useful, too: "An fatal error with DESeq2". I hadn't seen that earlier conversation before my original reply, but I think it covers what you'll need to address.
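
If it helps, the first two checks above can be sketched in a few lines of Python. This is only a rough illustration, not part of Salmon or DESeq2: the function names `fasta_ids` and `check_mapping` are made up for this example, and it assumes the transcriptID must equal the full title line text after the `>`.

```python
def fasta_ids(fasta_lines):
    """Collect the full text of each FASTA title line, minus the leading '>'."""
    return {line[1:].rstrip("\n") for line in fasta_lines if line.startswith(">")}


def check_mapping(mapping_lines, fasta_lines):
    """Return a list of problems found in a transcriptID<TAB>geneID mapping.

    Hypothetical helper: flags rows that are not exactly two columns,
    transcriptIDs that do not exactly match a FASTA title line, and
    empty geneIDs.
    """
    known = fasta_ids(fasta_lines)
    problems = []
    for n, line in enumerate(mapping_lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            problems.append(f"line {n}: expected 2 tab-separated columns, found {len(fields)}")
            continue
        transcript_id, gene_id = fields
        if transcript_id not in known:
            # Exact comparison, so a stray space or extra text counts as a mismatch.
            problems.append(f"line {n}: transcriptID {transcript_id!r} not found in FASTA title lines")
        if not gene_id.strip():
            problems.append(f"line {n}: geneID is empty")
    return problems


if __name__ == "__main__":
    # Toy data only; with real files, pass open(...) handles instead of lists.
    fasta = [">tx1\n", "ACGT\n", ">tx2\n", "GGCC\n"]
    mapping = ["tx1\tgeneA\n", "tx2 \tgeneA\n"]
    for problem in check_mapping(mapping, fasta):
        print(problem)
```

An empty list back means every row has two columns, every transcriptID matches a title line exactly, and every geneID has content.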

Question: should a transcriptID ever be used as a geneID?

Answer: Probably not. If multiple transcripts are actually part of the same gene/locus, reads will map to all of those transcripts, and you could end up with a scientific problem when counting (those "multi-mapped" reads may be excluded). You would also have problems mapping the "geneIDs" column to other annotation, since some of the values wouldn't actually be geneIDs.
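
To make the grouping concrete, here is a toy sketch with made-up numbers. It is not what Salmon or DESeq2 runs internally; `sum_by_gene` and the IDs are hypothetical, and it only shows how the geneID column decides which transcript rows get combined.

```python
from collections import defaultdict


def sum_by_gene(transcript_counts, tx2gene):
    """Add up transcript-level counts under the geneID each transcript maps to."""
    gene_counts = defaultdict(float)
    for transcript_id, count in transcript_counts.items():
        gene_counts[tx2gene[transcript_id]] += count
    return dict(gene_counts)


# Made-up counts for three transcripts; tx1 and tx2 belong to the same gene.
counts = {"tx1": 10.0, "tx2": 5.0, "tx3": 7.0}

# Real geneIDs: transcripts from one locus are grouped together.
by_gene = sum_by_gene(counts, {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"})

# transcriptIDs reused as "geneIDs": nothing is grouped, one row per transcript.
by_transcript = sum_by_gene(counts, {"tx1": "tx1", "tx2": "tx2", "tx3": "tx3"})

print(by_gene)
print(by_transcript)
```

With real geneIDs the two transcripts of the same locus end up in one row; with transcriptIDs in that column they stay separate, and those row names won't match gene-level annotation.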
