The mapping file needs to be just two columns: transcriptID (tab) geneID
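A minimal sketch of reading a file in that shape, assuming there is no header line. The function name read_tx2gene and the example IDs are hypothetical. Any line without exactly two tab-separated columns is rejected, so a space used in place of the tab gets caught.

```python
import io

def read_tx2gene(handle):
    """Parse a transcriptID<TAB>geneID mapping file into a dict.

    Assumes no header line. Raises ValueError if any line does not
    have exactly two tab-separated columns.
    """
    mapping = {}
    for lineno, line in enumerate(handle, start=1):
        columns = line.rstrip("\r\n").split("\t")
        if len(columns) != 2:
            raise ValueError(
                f"line {lineno}: expected 2 tab-separated columns, got {len(columns)}"
            )
        transcript_id, gene_id = columns
        mapping[transcript_id] = gene_id
    return mapping

# Hypothetical example content: three transcripts belonging to two genes.
example = "tx1\tgeneA\ntx2\tgeneA\ntx3\tgeneB\n"
tx2gene = read_tx2gene(io.StringIO(example))
print(tx2gene)  # {'tx1': 'geneA', 'tx2': 'geneA', 'tx3': 'geneB'}
```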
- What you decide to use as the transcriptID needs to exactly match the content of the transcriptome's FASTA title line (everything on the > lines, no extra content or spaces).
- The geneID column must have content. The Salmon count input will be organized/grouped by those IDs.
- The other advice in this post may help, too: "An fatal error with DESeq2". I didn't see that prior conversation before my original reply, but I think it covers what you'll need to address.
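The first two points can be checked before running anything. This is a rough sketch, not part of Salmon or DESeq2. The helper names and example data are hypothetical. It follows the advice above strictly: each transcriptID must equal the whole text after ">" on some FASTA title line, and each geneID must be non-empty.

```python
def fasta_title_lines(fasta_text):
    """Return every '>' title line with the leading '>' removed, otherwise unmodified."""
    return [line[1:] for line in fasta_text.splitlines() if line.startswith(">")]

def check_mapping(fasta_text, mapping_text):
    """List problems in a transcriptID<TAB>geneID mapping against a transcriptome FASTA."""
    titles = set(fasta_title_lines(fasta_text))
    problems = []
    for lineno, line in enumerate(mapping_text.splitlines(), start=1):
        columns = line.split("\t")
        if len(columns) != 2:
            problems.append(f"line {lineno}: expected 2 columns, got {len(columns)}")
            continue
        transcript_id, gene_id = columns
        # Exact string comparison: extra text or spaces on the title line cause a mismatch.
        if transcript_id not in titles:
            problems.append(f"line {lineno}: {transcript_id!r} does not exactly match a FASTA title line")
        if not gene_id.strip():
            problems.append(f"line {lineno}: geneID column is empty")
    return problems

# Hypothetical inputs: the second FASTA title carries extra text after the ID,
# and the third mapping row has no geneID.
fasta = ">tx1\nACGT\n>tx2 extra description\nACGT\n>tx3\nACGT\n"
mapping = "tx1\tgeneA\ntx2\tgeneA\ntx3\t\n"
for problem in check_mapping(fasta, mapping):
    print(problem)
# line 2: 'tx2' does not exactly match a FASTA title line
# line 3: geneID column is empty
```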
Question: should a transcriptID ever be used as a geneID?
Answer: Probably not. If multiple transcripts are actually part of the same gene/locus, the reads will map to all of those transcripts, and you could end up with a scientific problem when counting (for example, if those "multi-mapped" reads are excluded). You would also have problems mapping the entire "geneIDs" column to other annotation, since some entries wouldn't actually be geneIDs.
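To illustrate why the geneID column matters, here is a simplified sketch that groups transcript counts by geneID. It also flags geneIDs that are really transcriptIDs from the same mapping. It uses a plain per-gene sum and does not reproduce how Salmon or DESeq2 handle counts internally. The function names, IDs, and counts are all hypothetical.

```python
from collections import defaultdict

def sum_counts_by_gene(transcript_counts, tx2gene):
    """Add up per-transcript counts for every geneID in the mapping (plain sum)."""
    gene_counts = defaultdict(float)
    for transcript_id, count in transcript_counts.items():
        gene_counts[tx2gene[transcript_id]] += count
    return dict(gene_counts)

def transcript_ids_used_as_gene_ids(tx2gene):
    """Return geneIDs that exactly equal a transcriptID from the same mapping."""
    transcript_ids = set(tx2gene)
    return sorted(gene for gene in set(tx2gene.values()) if gene in transcript_ids)

# Hypothetical mapping: tx3 was given its own transcriptID as its "geneID".
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "tx3"}
counts = {"tx1": 10.0, "tx2": 5.5, "tx3": 2.0}
print(sum_counts_by_gene(counts, tx2gene))       # {'geneA': 15.5, 'tx3': 2.0}
print(transcript_ids_used_as_gene_ids(tx2gene))  # ['tx3']
```

Here tx3 ends up as its own "gene". That is fine only if it truly is a separate locus, and "tx3" will not match anything in a gene-level annotation.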