Salmon Output File "Name" column gene id format

I am using Salmon to TPM-normalize some count data and was able to get a .tabular output file. When reading the file, I saw that the first column is labeled Name and contains a list of genes whose IDs start with 5HIN. The first one is “5HIN8:01339:11853”. I am a little confused by the formatting. Is there a table for converting this naming format to gene IDs? Thanks!

Welcome, @alexis.alburo

The values in the Name column come from the reference annotation you are using.

What happens if you run with default settings and also provide a GTF? The gene abundance output should then contain only the original values of the “gene_id” attribute from the GTF.
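
For reference, here is a minimal Python sketch of how transcript IDs pair with “gene_id” values in a GTF. It assumes the usual GTF layout (nine tab-separated columns, with attributes written as `key "value";` in the last one). The function name `transcript_to_gene` and the example record are made up for illustration and are not part of Salmon or Galaxy.

```python
import re

# Matches attribute pairs such as: gene_id "GENE_A";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def transcript_to_gene(gtf_lines):
    """Build a {transcript_id: gene_id} dict from an iterable of GTF lines."""
    mapping = {}
    for line in gtf_lines:
        # Skip comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "transcript_id" in attrs and "gene_id" in attrs:
            mapping[attrs["transcript_id"]] = attrs["gene_id"]
    return mapping


# Hypothetical record, for illustration only.
example = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A1";',
]
print(transcript_to_gene(example))
```

With a real annotation you would pass an open file handle instead of the `example` list.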


Hi,

Thank you for your response. I am using the Ion Torrent server, which only provides FASTQ files, and I used Salmon to TPM-normalize. The Ion Torrent server does not provide a GTF file. Here is what the result looked like:

Ah, OK, thanks for clarifying.

That implementation is totally different. This forum is for troubleshooting usage issues in Galaxy.

All I have are guesses about the encoding: a gene, then coordinates for some flavour of sub-footprint, all combined into a single ID. You already know the gene ID. You’ll need help interpreting the rest, which is exactly what you were asking about originally, but I hadn’t “gotten” it yet. :slight_smile:
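
In the meantime, a minimal Python sketch for pulling those IDs apart so you can at least see which prefixes occur in the Name column. It assumes a tab-separated file with a header row containing `Name`, and it makes no claim about what each colon-separated part means. The helper names `split_name` and `prefix_counts` are hypothetical, and the second ID in the demo rows is invented.

```python
import csv
from collections import Counter


def split_name(name, sep=":"):
    """Split an ID such as '5HIN8:01339:11853' into its separator-delimited parts."""
    return name.split(sep)


def prefix_counts(rows, name_column="Name"):
    """Count how often each first part (the text before the first colon) occurs.

    `rows` is any iterable of tab-separated lines, header first,
    for example an open file handle.
    """
    reader = csv.DictReader(rows, delimiter="\t")
    return Counter(split_name(row[name_column])[0] for row in reader)


# Demo input: the first ID is from the question, the second is invented.
demo_rows = [
    "Name\tTPM",
    "5HIN8:01339:11853\t0.0",
    "5HIN8:00001:00002\t0.0",
]

print(split_name("5HIN8:01339:11853"))
print(prefix_counts(demo_rows))
```

If every row shares the same prefix, that is a useful detail to include when you ask the service’s maintainers about the format.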

Try contacting the people who run the service you are working on instead. You could also search for user docs, or for Q&A at a general bioinformatics help forum. The service itself probably has documentation somewhere, possibly a vignette with examples.

Happy science!