Hi all,
I’m running into an unusual problem when analyzing mouse RNA-seq data on Galaxy. After I upload the R1 and R2 files for my samples, I first quantify counts using Salmon against the transcript reference file downloaded from GENCODE. As part of the input, I upload the GTF file (also from GENCODE) to map the Ensembl IDs to gene names. When the run finishes, the output file still has the Ensembl IDs and no gene names linked to them. I see the same outcome when I analyze the data with DESeq2 and look for fold changes.
Has anyone run into a similar problem, and could you suggest a workaround? I’d be so grateful for any help with this.
Cheers,
Lucas
hi @lucasdsouza Welcome to GalaxyHelp!!
If the tool you are talking about is Salmon quant and you supplied a GTF file for the parameter “File containing a mapping of transcripts to genes”, then that file is not used to map the IDs to gene names. With that parameter set, you get an additional output file named quant.genes.sf that summarizes the abundances at the gene level, alongside the transcript-level abundances.
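To make the difference concrete, here is a rough conceptual sketch in Python of what "summarizing at the gene level" means: transcript TPM and read counts are summed per gene using the transcript-to-gene mapping. This is not Salmon's actual code. The function name, the IDs, and the numbers are all made up for illustration, and the Length and EffectiveLength columns that Salmon also reports are left out.

```python
from collections import defaultdict

def summarize_to_genes(transcript_rows, tx2gene):
    """Sum transcript-level TPM and NumReads for each gene.

    transcript_rows: iterable of (transcript_id, tpm, num_reads)
    tx2gene: dict mapping transcript_id -> gene_id
    """
    totals = defaultdict(lambda: {"TPM": 0.0, "NumReads": 0.0})
    for transcript_id, tpm, num_reads in transcript_rows:
        # Assumption for this sketch: a transcript with no mapping
        # is reported under its own ID.
        gene_id = tx2gene.get(transcript_id, transcript_id)
        totals[gene_id]["TPM"] += tpm
        totals[gene_id]["NumReads"] += num_reads
    return dict(totals)

# Hypothetical toy data (not real identifiers or results)
tx2gene = {"TX1": "GENE_A", "TX2": "GENE_A", "TX3": "GENE_B"}
rows = [("TX1", 10.0, 100.0), ("TX2", 5.0, 50.0), ("TX3", 2.5, 20.0)]

gene_totals = summarize_to_genes(rows, tx2gene)
for gene_id, values in gene_totals.items():
    print(gene_id, values["TPM"], values["NumReads"])
```

Note that the result is still keyed by gene ID, not gene name, which is why the IDs remain in the Salmon output.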
If you want to know the names of the differentially expressed genes, please use the tool Annotate DESeq2/DEXSeq output tables after the DESeq2 run. This will annotate the DESeq2 output with gene names, their locations, etc.
Please check out this section of the RNA-seq tutorial: Hands-on: Reference-based RNA-Seq data analysis (Transcriptomics)
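If it helps to see the idea behind the annotation step outside Galaxy, below is a minimal Python sketch. It assumes a GENCODE-style GTF whose "gene" rows carry gene_id and gene_name attributes, and a results table whose first column is the gene ID. The function names, IDs, and values are hypothetical, and this is not how the Galaxy tool is implemented.

```python
import re

# Matches GTF attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def gene_names_from_gtf(gtf_lines):
    """Build a gene_id -> gene_name dict from the 'gene' rows of a GTF."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only use gene-level features
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "gene_name" in attrs:
            mapping[attrs["gene_id"]] = attrs["gene_name"]
    return mapping

def annotate(rows, mapping):
    """Insert the gene name after the ID column; 'NA' when no match."""
    return [[row[0], mapping.get(row[0], "NA")] + list(row[1:]) for row in rows]

# Hypothetical GTF line and results rows (made-up IDs and numbers)
gtf = [
    "##description: toy example\n",
    'chr1\tSRC\tgene\t100\t900\t.\t+\t.\t'
    'gene_id "GENE0001.1"; gene_type "protein_coding"; gene_name "GeneA"; level 2;\n',
]
results = [("GENE0001.1", 1.5, 0.01), ("GENE0002.1", -0.7, 0.2)]

mapping = gene_names_from_gtf(gtf)
annotated = annotate(results, mapping)
for row in annotated:
    print("\t".join(str(value) for value in row))
```

In this sketch the IDs must match exactly, so if one file carries version suffixes (such as the trailing ".1") and the other does not, nothing will be annotated.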
cheers,
Pavan
Hi Pavan,
Thank you so much for the advice! I’ll give this a run now.
Cheers!
Lucas