Updating the mouse genome to GRCm39/mm39 on Galaxy

Hi everyone,
I was trying to map my sequences to the latest mouse genome assembly (GRCm39/mm39), but I couldn’t find it on Galaxy. How do I request that the mouse genome be updated on Galaxy?

In the meantime, I downloaded the GRCm39/mm39 fasta file from the NCBI FTP site and tried to map my sequences to it with Bowtie2, but the job failed with an error and no details about the cause.
Please help, I’m new to Galaxy :sob:
Thank you.

Hi @Sedem_Dankwa,
We are working on adding this assembly to Galaxy. Regarding the error, could you send in an error report (bug icon)?

Regards

Hi @Sedem_Dankwa

I checked your bug report sent in from UseGalaxy.org.

A very large custom genome will likely exceed resources at any public Galaxy server. That said, UseGalaxy.eu is sometimes able to scale even large jobs – so maybe try there.

You’ll need to correct a few items when you run a test there:

  1. The full genome fasta should not be extracted from the UCSC Table Browser. The result will be truncated. Instead, capture the URL for the genome’s fasta from their Downloads area and paste that into the Upload tool in Galaxy to get it into your history as a dataset. Allow Galaxy to choose the “datatype” and do not assign a “database”.
    • UCSC Genome Browser Downloads >> Index of /goldenPath/mm39/bigZips >> you’ll probably want to choose this version: mm39.fa.gz
    • If the fasta loads compressed (fasta.gz), uncompress it before using it as a custom genome (pencil icon > Edit attributes > Convert > uncompress). That will create an uncompressed version as a new dataset. Then you can purge the dataset representing the compressed version to recover the quota space.
  2. The associated reference annotation should also not be sourced from the UCSC Table Browser. The gene_id and transcript_id attributes in the GTF will be the same value, which can lead to scientific content problems later on. The output might also be truncated because of its size.
    • UCSC creates correctly annotated GTFs for priority genomes and places them in their Downloads area.
    • Find the version based on RefSeq genes/transcripts here: Index of /goldenPath/mm39/bigZips/genes >> refGene.gtf.gz
    • Again, load the data by URL into Galaxy using default settings then uncompress as needed.
  3. It looks as if the forward and reverse reads were entered on the Bowtie2 tool form in the opposite order. Instead, enter the reads as: R1 = forward and R2 = reverse.
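
If you want to sanity-check the files on your own computer first, here is a minimal Python sketch of items 1 and 2. It is not part of Galaxy. The host name `hgdownload.soe.ucsc.edu` is my assumption for the UCSC Downloads server (only the directory paths and file names come from the list above), and the helper names `gunzip` and `count_same_gene_and_transcript_ids` are made up for illustration.

```python
import gzip
import re
import shutil
from pathlib import Path

# Assumption: public UCSC download host; the thread only gives the paths.
UCSC_DOWNLOADS = "https://hgdownload.soe.ucsc.edu"

# URLs to paste into Galaxy's Upload tool (leave datatype and database at defaults).
GENOME_URL = f"{UCSC_DOWNLOADS}/goldenPath/mm39/bigZips/mm39.fa.gz"
ANNOTATION_URL = f"{UCSC_DOWNLOADS}/goldenPath/mm39/bigZips/genes/refGene.gtf.gz"

# Matches GTF attribute pairs such as: gene_id "abc";
_ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def gunzip(src, dest):
    """Write an uncompressed copy of a .gz file, leaving the original untouched."""
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return Path(dest)


def count_same_gene_and_transcript_ids(gtf_lines):
    """Return (records with both IDs, records where gene_id == transcript_id)."""
    total = same = 0
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(_ATTRIBUTE.findall(fields[8]))
        if "gene_id" in attrs and "transcript_id" in attrs:
            total += 1
            if attrs["gene_id"] == attrs["transcript_id"]:
                same += 1
    return total, same
```

Pass `count_same_gene_and_transcript_ids` an open, uncompressed GTF file. If the second number is close to the first, the file probably has the gene_id/transcript_id problem described in item 2.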

Hope that helps!

Thank you so much for taking the time to provide this detailed guide, @jennaj! I really appreciate it.

Thanks! @gallardoalba
