Hi everyone,
I was trying to map my sequences to the latest mouse genome assembly (GRCm39/mm39), but I couldn’t find it on Galaxy. How do I request that the mouse genome be updated on Galaxy?
In the meantime, I downloaded the GRCm39/mm39 fasta file from the NCBI FTP site and tried to map my sequences to it with Bowtie2, but the job failed with an error that gave no details.
Please help. I’m new to Galaxy.
Thank you.
I checked your bug report sent in from UseGalaxy.org.
A very large custom genome will likely exceed resources at any public Galaxy server. That said, UseGalaxy.eu is sometimes able to scale even large jobs – so maybe try there.
You’ll need to correct a few items when you run a test there:
The full genome fasta should not be extracted from the UCSC Table Browser. The result will be truncated. Instead, capture the URL for the genome’s fasta from their Downloads area and paste that into the Upload tool in Galaxy to get it into your history as a dataset. Allow Galaxy to choose the “datatype” and do not assign a “database”.
If the fasta loads compressed (fasta.gz), uncompress it before using it as a custom genome (pencil icon > Edit attributes > Convert > uncompress). That creates an uncompressed version as a new dataset. You can then purge the compressed dataset to recover the quota space.
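If you also keep a local copy of the fasta, a quick check like the one below may help you spot a truncated or still-compressed file before using it. This is only a rough sketch and not part of Galaxy: the helper names (open_maybe_gzip, fasta_summary) and the file name in the usage comment are made up for illustration.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file, handling gzip compression transparently.

    Compression is detected from the two-byte gzip magic number,
    not from the file extension.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def fasta_summary(path):
    """Return a dict mapping each sequence name to its length in bases."""
    lengths = {}
    name = None
    with open_maybe_gzip(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first word of the title line as the name.
                name = line[1:].split()[0]
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


# Hypothetical usage (the file name is an assumption):
# for name, length in fasta_summary("mm39.fa.gz").items():
#     print(name, length)
```

Comparing the reported names and lengths against the chromosome sizes published alongside the assembly should show whether any sequence is missing or cut short.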
The associated reference annotation should also not be sourced from the UCSC Table Browser. The gene_id and transcript_id attributes in the GTF will be the same value, which can lead to scientific content problems later on. The output might also be truncated because of its size.
UCSC creates correctly annotated GTFs for priority genomes and places them in their Downloads area.
Again, load the data by URL into Galaxy using default settings, then uncompress as needed.
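To see whether a GTF has the gene_id/transcript_id problem described above, you could count the lines where the two attributes are identical. The following is a minimal sketch, assuming a standard 9-column, tab-separated GTF with attributes written as key "value"; pairs. The function name count_identical_ids and the file name in the usage comment are hypothetical.

```python
import re

# Matches attribute pairs such as: gene_id "Xkr4";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def count_identical_ids(gtf_lines):
    """Return (same, total) over GTF lines that carry both attributes.

    same  = lines where gene_id equals transcript_id
    total = lines that have both gene_id and transcript_id
    """
    same = 0
    total = 0
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "transcript_id" in attrs:
            total += 1
            if attrs["gene_id"] == attrs["transcript_id"]:
                same += 1
    return same, total


# Hypothetical usage on an uncompressed GTF:
# with open("annotation.gtf") as fh:
#     print(count_identical_ids(fh))
```

If nearly every line is counted as "same", the file was probably exported from the Table Browser and is better replaced with a GTF from the Downloads area.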
It looks as if the forward and reverse reads were entered on the Bowtie2 tool form in the opposite order. Instead, enter the reads as: R1 = forward and R2 = reverse.
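If you are unsure which file holds the forward reads, the first read header in each fastq file often tells you. Here is a small sketch that assumes Illumina-style headers, either the older "/1" and "/2" suffix or the newer "1:N:0:..." comment field. Many files (for example some downloaded from public archives) carry neither, in which case it returns None. The function name mate_number is made up for illustration.

```python
import re


def mate_number(header):
    """Return 1 or 2 if a fastq header line indicates the mate, else None."""
    parts = header.strip().split()
    if not parts:
        return None
    # Older style: @READ_NAME/1 or @READ_NAME/2
    m = re.search(r"/([12])$", parts[0])
    if m:
        return int(m.group(1))
    # Newer style: @READ_NAME 1:N:0:BARCODE (mate number leads the comment)
    if len(parts) > 1:
        m = re.match(r"([12]):[YN]:", parts[1])
        if m:
            return int(m.group(1))
    return None


# Hypothetical usage: pass the first line of each fastq file.
# The file whose header gives 1 goes in as forward (R1),
# and the one that gives 2 goes in as reverse (R2).
```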