Hi all,
I am trying to update my RNA-seq training materials to use the salmon tool for quantification rather than mapping and then counting.
My example data are from mouse, so I have downloaded the latest reference files (cDNA FASTA and GTF annotation) from Ensembl and uploaded these to Galaxy:
http://ftp.ensembl.org/pub/release-104/fasta/mus_musculus/cdna/Mus_musculus.GRCm39.cdna.all.fa.gz
http://ftp.ensembl.org/pub/release-104/gtf/mus_musculus/Mus_musculus.GRCm39.104.chr.gtf.gz
salmon works if I specify the FASTA file from Ensembl as my reference transcriptome. There is one row of output for each transcript:
Name | Length | EffectiveLength | TPM | NumReads |
---|---|---|---|---|
ENSMUST00000178537.2 | 12 | 2.002 | 0.000000 | 0.000 |
ENSMUST00000178862.2 | 14 | 2.071 | 0.000000 | 0.000 |
ENSMUST00000196221.2 | 9 | 1.818 | 0.000000 |
However, I would like to demonstrate how to obtain gene-level estimates by specifying a transcript-to-gene mapping. The tool suggests that a GTF file can be used, but when I use the GTF from Ensembl, my “gene quantification” output still has the same number of rows as the transcript output (suggesting that it cannot map the transcripts to genes).
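One thing that may be worth checking here: the Ensembl cDNA FASTA headers carry versioned IDs (e.g. `ENSMUST00000178537.2`), whereas the Ensembl GTF stores the bare ID in `transcript_id` and the version in a separate `transcript_version` attribute. Below is a minimal sketch, assuming that mismatch is relevant, of building a versioned transcript-to-gene table from GTF lines. The function name `gtf_to_tx2gene` is hypothetical, and the coordinates in the example line are placeholders rather than real annotation.

```python
import re

# Matches GTF attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def gtf_to_tx2gene(gtf_lines):
    """Return {versioned_transcript_id: gene_id} from GTF 'transcript' records.

    Hypothetical helper: appends transcript_version (when present) so the
    keys look like the IDs in the Ensembl cDNA FASTA headers.
    """
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only one record per transcript is needed
        attrs = dict(ATTR_RE.findall(fields[8]))
        tx = attrs.get("transcript_id")
        gene = attrs.get("gene_id")
        if not tx or not gene:
            continue
        version = attrs.get("transcript_version")
        if version:
            tx = f"{tx}.{version}"
        mapping[tx] = gene
    return mapping

# Illustrative record only: start/end coordinates are placeholders.
example_line = (
    "MT\tinsdc\ttranscript\t1\t100\t.\t+\t.\t"
    'gene_id "ENSMUSG00000064336"; gene_version "1"; '
    'transcript_id "ENSMUST00000082387"; transcript_version "1";'
)
tx2gene = gtf_to_tx2gene([example_line])
```

The resulting dictionary could be written out as a two-column, tab-separated file (transcript, then gene) and supplied in place of the GTF; whether that resolves the problem in the Galaxy wrapper is untested here.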
I have also created a tab-delimited file from BioMart:
Transcript stable ID version | Gene stable ID |
---|---|
ENSMUST00000082387.1 | ENSMUSG00000064336 |
ENSMUST00000082388.1 | ENSMUSG00000064337 |
ENSMUST00000082389.1 | ENSMUSG00000064338 |
But my salmon gene output still has all the transcripts.
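As a quick diagnostic, the transcript names in the salmon output can be compared against the first column of the mapping file. This is a minimal sketch, assuming both files are tab-delimited with a header row. The helper names `read_first_column` and `overlap_report` are hypothetical, and `GENE_A` in the example is a placeholder rather than a real gene ID.

```python
import csv
import io

def read_first_column(handle, skip_header=True):
    """Collect the first tab-separated field of every non-empty row."""
    reader = csv.reader(handle, delimiter="\t")
    if skip_header:
        next(reader, None)
    return {row[0] for row in reader if row}

def overlap_report(quant_ids, map_ids):
    """Count shared IDs, both exactly and after dropping '.version' suffixes."""
    def strip_version(identifier):
        return identifier.split(".", 1)[0]
    exact = len(quant_ids & map_ids)
    ignoring_version = len(
        {strip_version(i) for i in quant_ids}
        & {strip_version(i) for i in map_ids}
    )
    return {
        "quant": len(quant_ids),
        "exact": exact,
        "ignoring_version": ignoring_version,
    }

# In-memory stand-ins for the two files; GENE_A is a placeholder value.
quant_text = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "ENSMUST00000178537.2\t12\t2.002\t0.000000\t0.000\n"
    "ENSMUST00000178862.2\t14\t2.071\t0.000000\t0.000\n"
)
map_text = "transcript\tgene\nENSMUST00000178537\tGENE_A\n"

report = overlap_report(
    read_first_column(io.StringIO(quant_text)),
    read_first_column(io.StringIO(map_text)),
)
print(report)
```

If `exact` is zero but `ignoring_version` is not, the two files disagree only on the version suffix. If both are zero, the mapping file probably describes a different set of transcripts.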
Can anyone suggest where I am going wrong, or point me to some example fasta and transcript mapping files that work as expected?
Many thanks,
Mark