How to analyze transcriptomic data with two reference genomes?


I am currently examining my RNA-seq data using Galaxy.

My samples consist of virus-infected human cells, resulting in a mixture of two transcriptomes (viral and host).
I could analyze them separately, but my goal is a single combined analysis that covers both transcriptomes.

Is it possible to use multiple reference genomes for RNA-seq in Galaxy?

Thank you for your assistance.

Hi @Dongjoon

Yes, you could combine the two reference genomes into one, then map against that as a custom genome.
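If you would rather prepare the combined reference outside Galaxy and upload it, the combining step can be sketched in Python. This is a minimal sketch, not a Galaxy tool. It assumes the host and viral FASTA files (and, optionally, GTF annotations) have been read into lists of lines. It treats the first word after > as the sequence identifier and stops if any identifier appears more than once. All function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def fasta_ids(lines: Iterable[str]) -> List[str]:
    """Sequence identifiers: the first word after '>' on each FASTA title line."""
    return [(line[1:].split() or [""])[0] for line in lines if line.startswith(">")]


def _stack(first: List[str], second: List[str]) -> List[str]:
    """Put `second` under `first`, making sure the join falls on a line break."""
    out = list(first)
    if out and not out[-1].endswith("\n"):
        out[-1] += "\n"  # otherwise the next file's first line is glued on
    return out + list(second)


def combine_fasta(host: List[str], virus: List[str]) -> List[str]:
    """Concatenate host and viral FASTA, refusing duplicate identifiers."""
    counts = Counter(fasta_ids(host) + fasta_ids(virus))
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate sequence identifiers: {duplicates}")
    return _stack(host, virus)


def combine_gtf(host: List[str], virus: List[str]) -> List[str]:
    """Concatenate two GTF annotations after dropping '#' header/comment lines."""
    def strip_headers(lines: List[str]) -> List[str]:
        return [line for line in lines if not line.startswith("#")]
    return _stack(strip_headers(host), strip_headers(virus))


def missing_seqnames(gtf: Iterable[str], fasta_names: Iterable[str]) -> List[str]:
    """GTF seqnames (column 1) that have no matching FASTA identifier."""
    known = set(fasta_names)
    seen = {line.split("\t")[0] for line in gtf
            if line.strip() and not line.startswith("#")}
    return sorted(seen - known)


# Hypothetical usage (file names are placeholders):
# with open("host.fa") as h, open("virus.fa") as v:
#     combined = combine_fasta(h.readlines(), v.readlines())
# with open("combined.fa", "w") as out:
#     out.writelines(combined)
```

A non-empty result from `missing_seqnames` means the annotation refers to "chromosome" names that the combined FASTA does not contain, which is the kind of mismatch worth fixing before mapping.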

Some resources and tips:

  1. Clean up both genomes → FAQ: How to use Custom Reference Genomes?
  2. Double-check that the sequence identifiers on the FASTA title lines (the lines that start with >) are all distinct across both files.
  3. Use this tool to stack one file on top of the other in a single output file → Concatenate datasets (there are two versions; either will work for this)
  4. If you have reference annotations, you can do the same: remove any header lines, then concatenate.
  5. If you have duplicate identifiers, resolve them before the concatenation steps, in both file types. Downstream tools match up the "chromosome" names between the FASTA and the annotation, so the names must be unique and must agree across both. General data manipulation tools are covered here (the last one uses regular tools from the tool panel) → GTN Materials Search
  6. Sanity-check your results for cross-over mapping (reads that align to both genomes)! It may also help to check how others have handled this in publications.
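One rough way to check for cross-over mapping is to count how many reads align only to host sequences, only to viral sequences, or to both. This minimal sketch assumes a SAM-format export of the alignments and a set of the viral sequence identifiers (the names on the viral FASTA title lines). The function name and output keys are hypothetical. Reads that hit both genomes only show up if the aligner reports multi-mapping (secondary) alignments.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def crossover_summary(sam_lines: Iterable[str], viral_names: Set[str]) -> Dict[str, int]:
    """Count reads whose alignments hit only the host, only the virus, or both."""
    hits: Dict[str, Set[str]] = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip SAM header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 0x4 or rname == "*":
            continue  # FLAG bit 0x4 marks an unmapped record
        # Record which genome this alignment (primary or secondary) landed on.
        hits[qname].add("virus" if rname in viral_names else "host")

    summary = {"host_only": 0, "virus_only": 0, "both": 0}
    for genomes in hits.values():
        if genomes == {"host", "virus"}:
            summary["both"] += 1
        elif genomes == {"virus"}:
            summary["virus_only"] += 1
        else:
            summary["host_only"] += 1
    return summary
```

Paired-end mates share a read name, so each pair is counted once. A large "both" count points to regions of similarity between the two genomes that deserve a closer look before interpreting the viral or host counts.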

Hope that helps!