Salmon quant did not work using my reference transcriptome

Hi everyone

I have a problem with the Salmon quant tool.

I am analyzing RNA-seq data with Salmon quant to get TPM values. I used the cDNA file as the reference transcriptome for my bacterium (Sinorhizobium meliloti strain 1021), but Salmon quant does not work (378 & 400 in my shared history). I tried Salmon quant with another cDNA file and it worked (356 in my shared history), but it did not work with my bacterial cDNA. Could you please tell me what the problem is with this file?

Here is the link to the bacterial cDNA in the Ensembl dataset:

https://ftp.ensemblgenomes.ebi.ac.uk/pub/release-60/bacteria//fasta/bacteria_0_collection/sinorhizobium_meliloti_1021_gca_000006965/cdna/

Can I share my history URL here? Is it safe enough to share publicly?

Welcome @lida-soltanii

Yes, we’ll need to see all of the data in place inside your history to offer specific advice. You can post the share link back here, then unshare once we are done.

This guide includes most of the technical details that we’ll be helping to review.

Some guesses: You mention the reference transcriptome, but not the reference annotation. You will want to use both at the Salmon step if the goal is to run a tool like DESeq2 afterwards. The features in the annotation need to share identifiers with the transcriptome fasta, so be sure to check that they do, and simplify the fasta > title lines as needed.
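
If it helps to see what "simplifying the title lines" can mean in practice, here is a minimal Python sketch. It assumes the identifier you want to keep is the first whitespace-delimited token after the `>`, and the example record is made up for illustration (it is not taken from your file). The function names are hypothetical, not part of Salmon or Galaxy.

```python
def simplify_header(line: str) -> str:
    """Keep only the first whitespace-delimited token of a FASTA title line."""
    if not line.startswith(">"):
        return line  # sequence lines pass through unchanged
    tokens = line[1:].split()
    return ">" + tokens[0] if tokens else line


def simplify_fasta(lines):
    """Apply simplify_header to every line of a FASTA file given as strings."""
    return [simplify_header(line.rstrip("\n")) for line in lines]


# Made-up record in the general shape of a long, descriptive title line.
example = [
    ">tx0001 cdna chromosome:assembly:1:100:500:1 gene:gene0001",
    "ATGGCTAAGCTTGGA",
]
print(simplify_fasta(example))
```

After this kind of cleanup, the identifiers in the fasta can be compared directly against the transcript column of the annotation or mapping file.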

Also, most people do not need to include the reference genome at this stage. But you can share what you have and explain a bit more about your goals as we walk through some suggestions. :slight_smile:

Hi @lida-soltanii

Thanks for sharing your history, this made it so much easier to help with exactly what is going wrong!

This is your message from the tool in the job logs (find these logs using the "i" icon inside of a dataset).

The tool is stating that it found two or more transcripts with the same sequence identifier. You should extract all the identifiers and count them up to find the duplicates, then make adjustments. Don’t forget to update your transcripts-to-genes mapping data as well, or you will run into more problems with downstream steps.
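
As a rough illustration of the "extract and count" step, here is a small Python sketch. It assumes the identifier is the first whitespace-delimited token on each `>` line, and the example records are invented. The function names are hypothetical helpers, not an existing tool.

```python
from collections import Counter


def fasta_ids(lines):
    """Yield the identifier (first token after '>') of every FASTA title line."""
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                yield tokens[0]


def find_duplicate_ids(lines):
    """Return {identifier: count} for identifiers that occur more than once."""
    counts = Counter(fasta_ids(lines))
    return {seq_id: n for seq_id, n in counts.items() if n > 1}


# Invented records: txA appears twice with different descriptions.
example = [
    ">txA some description", "ATGC",
    ">txB", "GGCC",
    ">txA another description", "ATGA",
]
duplicates = find_duplicate_ids(example)
print(duplicates)
```

Any identifier that shows up in the result is one that needs to be renamed or removed before indexing.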

Then this recent post has more about Salmon in general.


What to do from here

  1. Double check that you do not have any sequences in your transcriptome that have the same name: the tool thinks that you have at least one duplicate, so at a minimum that needs to be solved.

  2. Consider incorporating reference annotation at the Salmon step. You will need that “transcript-to-gene” mapping file when using DESeq2 later anyway. Both tool forms describe what the data is and how it is formatted, and we have prior Q&A about it, but please ask more questions if you get stuck.

  3. You have already been manipulating your fasta file to create the hybrid transcriptome. If that wasn’t done in Galaxy, you can do it in Galaxy, too! Converting to a tabular format, making changes, then converting back to fasta format is a pretty common way to do this. Your GTF or tabular transcripts-to-gene data is already tabular.
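
To make the fasta → tabular → fasta idea in step 3 concrete, here is a minimal Python sketch of the same round trip outside of Galaxy. The helper names are hypothetical, the records are invented, and "keep the first record for each identifier" is only one possible policy (an assumption here). You may prefer to rename duplicates instead, depending on what they are.

```python
def fasta_to_rows(lines):
    """Convert FASTA lines to [title, sequence] rows, one row per record."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            rows.append([line[1:], ""])
        elif rows:
            rows[-1][1] += line  # join wrapped sequence lines
    return rows


def drop_repeated_ids(rows):
    """Keep only the first row for each identifier (first token of the title)."""
    seen, kept = set(), []
    for title, seq in rows:
        seq_id = title.split(None, 1)[0] if title.strip() else title
        if seq_id not in seen:
            seen.add(seq_id)
            kept.append([title, seq])
    return kept


def rows_to_fasta(rows):
    """Convert [title, sequence] rows back to FASTA lines."""
    lines = []
    for title, seq in rows:
        lines.extend([">" + title, seq])
    return lines


# Invented records: txA occurs twice, once with a wrapped sequence.
example = [">txA first copy", "ATG", "CCC", ">txB", "GGG", ">txA second copy", "ATGCCC"]
rows = fasta_to_rows(example)
cleaned = rows_to_fasta(drop_repeated_ids(rows))
print(cleaned)
```

Whatever you change in the tabular form, apply the matching change to the transcripts-to-genes mapping so the identifiers stay in sync.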

Hope this helps! Let us know if you get this working, or have more questions :scientist: