I am using Salmon to TPM-normalize some counts data and was able to get a .tabular output file. When reading the file, I saw that the first column is labeled Name and contains a list of genes that start with 5HIN. The first gene is “5HIN8:01339:11853”. I am a little confused by the formatting. Is there a table that converts this naming format to gene IDs? Thanks!
Welcome, @alexis.alburo
The gene values are coming from the reference annotation you are using.
What happens if you run with default settings and also provide a GTF? The gene abundance output should then contain only the original values of the “gene_id” attribute from the GTF.
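To make the “gene_id” point concrete, here is a minimal Python sketch of how transcript names could be tied back to gene IDs using the attribute column of a GTF. This is not what Salmon or Galaxy runs internally; the function name `transcript_to_gene` and the example records (`GENE_A`, `TX_A1`, and so on) are made up for illustration, and it assumes a standard nine-column, tab-separated GTF.

```python
import re

# Patterns for the two attributes of interest in the ninth GTF column.
GENE_ID = re.compile(r'gene_id "([^"]+)"')
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_to_gene(gtf_lines):
    """Build a {transcript_id: gene_id} mapping from GTF lines (hypothetical helper)."""
    mapping = {}
    for line in gtf_lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        gene = GENE_ID.search(fields[8])
        transcript = TRANSCRIPT_ID.search(fields[8])
        if gene and transcript:
            mapping[transcript.group(1)] = gene.group(1)
    return mapping


# Invented records, only to show the shape of the input.
example = [
    "#!genome-build hypothetical\n",
    'chr1\tdemo\ttranscript\t100\t900\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A1";\n',
    'chr1\tdemo\texon\t100\t400\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A1";\n',
    'chr1\tdemo\ttranscript\t2000\t2500\t.\t-\t.\tgene_id "GENE_B"; transcript_id "TX_B1";\n',
]
mapping = transcript_to_gene(example)
print(mapping)
```

With a real annotation you would pass an open file handle instead of the `example` list. Names in the quantification output that do not appear in the mapping would point to a mismatch between the reference sequences and the GTF.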
Hi,
Thank you for your response. I am using the Ion Torrent server, which only gives FASTQ files, and I used Salmon to TPM-normalize. The Ion Torrent server does not provide a GTF file. Here is what the result looked like:
Ah, Ok, thanks for clarifying.
That implementation is totally different. This forum is for troubleshooting usage issues in Galaxy.
All I have are guesses about the encoding: a gene identifier, then coordinates for some flavour of sub-footprint, all combined into a single ID. You already know the gene ID. The rest you’ll need help interpreting, which is exactly what you were asking about originally, but I hadn’t “gotten” it yet.
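While you track down the documentation, one way to look at the pieces of those IDs is to split the Name column on the colon and see how the parts vary. The Python sketch below only inspects the structure and does not assign any meaning to the fields. The helper name `split_names` is made up, the second ID and the TPM values in `sample` are invented, and it assumes the file is tab-separated with a header row containing `Name`.

```python
import csv
import io
from collections import Counter


def split_names(tabular_text, sep=":"):
    """Split each value in the Name column on `sep` (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(tabular_text), delimiter="\t")
    return [row["Name"].split(sep) for row in reader]


# First ID is the one from the question; the second ID and both TPM
# values are invented purely to have more than one row.
sample = (
    "Name\tTPM\n"
    "5HIN8:01339:11853\t1.0\n"
    "5HIN8:01340:00021\t2.0\n"
)

parts = split_names(sample)

# How many distinct values does the first field take? If it is the same
# for every row, it probably does not distinguish one feature from another.
prefix_counts = Counter(p[0] for p in parts)
print(parts)
print(prefix_counts)
```

For the real file, read its contents with `open(path).read()` and pass that in place of `sample`. The counts may help you describe the format precisely when you ask the service's support team.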
Try contacting the people who run the service you are working on instead. The service itself probably has documentation somewhere, possibly a vignette with examples. You could also search for user docs or Q&A at a general bioinformatics help forum.
Happy science!