Updating the mouse genome to GRCm39/mm39 on Galaxy

Sedem_Dankwa · February 10, 2021, 8:48pm

Hi everyone,
I was trying to map my sequences to the latest mouse genome assembly (GRCm39/mm39), but I couldn’t find it on Galaxy. How do I request that the mouse genome be updated on Galaxy?

In the meantime, I downloaded the GRCm39/mm39 fasta file from the NCBI FTP site and tried to map my sequences to it using Bowtie2, but the job failed with an error that gave no details.
Please help, I’m new to Galaxy.
Thank you.

gallardoalba · February 11, 2021, 2:43pm

Hi @Sedem_Dankwa,
We are working on including this version in Galaxy. Regarding the error, could you send an error report (bug icon)?

Regards

jennaj · February 11, 2021, 10:14pm

Hi @Sedem_Dankwa

I checked your bug report sent in from UseGalaxy.org.

A very large custom genome will likely exceed resources at any public Galaxy server. That said, UseGalaxy.eu is sometimes able to scale even large jobs – so maybe try there.

You’ll need to correct a few items when you run a test there:

1. The full genome fasta should not be extracted from the UCSC Table Browser, because the result will be truncated. Instead, capture the URL for the genome’s fasta from their Downloads area and paste that into the Upload tool in Galaxy to get it into your history as a dataset. Allow Galaxy to choose the “datatype” and do not assign a “database”.
   - UCSC Genome Browser Downloads >> Index of /goldenPath/mm39/bigZips >> you’ll probably want to choose this version: mm39.fa.gz
   - If the fasta loads compressed (fasta.gz), uncompress it before using it as a custom genome (pencil icon > Edit attributes > Convert > uncompress). That will create an uncompressed version as a new dataset. Then you can purge the dataset representing the compressed version to recover the quota space.
2. The associated reference annotation should also not be sourced from the UCSC Table Browser. The gene_id and transcript_id attributes in the GTF will be the same value, which can lead to scientific content problems later on. The output might also be truncated due to the size.
   - UCSC creates correctly annotated GTFs for priority genomes and places them in their Downloads area.
   - Find the version based on RefSeq genes/transcripts here: Index of /goldenPath/mm39/bigZips/genes >> refGene.gtf.gz
   - Again, load the data by URL into Galaxy using default settings, then uncompress as needed.
3. It looks as if the forward and reverse reads were entered on the Bowtie2 tool form in the opposite order. Instead, enter the reads as: R1 = forward and R2 = reverse.
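
If you have a copy of the files on your own computer, the two file problems above can be spot-checked before uploading. This is only a rough Python sketch and not part of Galaxy: the function names are made up for illustration, and it assumes the standard 9-column, tab-separated GTF layout with `key "value";` attributes. It reports whether a file is gzip-compressed and how many GTF records have the same value for gene_id and transcript_id.

```python
import gzip
import re

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip file

# Matches GTF attribute pairs such as: gene_id "Xkr4"
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def is_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def open_text(path):
    """Open a plain or gzip-compressed file for reading as text."""
    if is_gzipped(path):
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_identical_ids(path):
    """Return (same, total) for records carrying both ID attributes.

    `same` counts records where gene_id equals transcript_id;
    `total` counts records that have both attributes at all.
    """
    same = 0
    total = 0
    with open_text(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip comments and blank lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9:
                continue  # not a complete GTF record
            attributes = dict(ATTRIBUTE.findall(fields[8]))
            if "gene_id" in attributes and "transcript_id" in attributes:
                total += 1
                if attributes["gene_id"] == attributes["transcript_id"]:
                    same += 1
    return same, total
```

For example, `count_identical_ids("refGene.gtf.gz")` (a hypothetical local path) returns a pair of counts. If the two numbers are equal, every record shares one value for both attributes, which is the pattern described above for Table Browser GTFs.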

Hope that helps!

Sedem_Dankwa · February 11, 2021, 10:33pm

Thank you so much for taking the time to provide this detailed guide! I really appreciate it. @jennaj

Sedem_Dankwa · February 11, 2021, 10:49pm

Thanks! @gallardoalba

bjoern.gruening · May 20, 2021, 9:45pm

UseGalaxy.eu now includes mm39. I hope that helps!
