How to upload a locally built reference genome to the Galaxy cloud server

Hi Everyone,
I just figured out how to build a reference genome from my own FASTA file on my local Galaxy. After indexing the FASTA sequence with a set of Data Manager indexers, does anyone know how to upload the built reference genome to my public Galaxy user account?

The reason I am doing this is that many plant reference genomes, such as barley and wheat, are not available on the public Galaxy server (https://usegalaxy.org/). I also don’t have the admin rights required to build my own reference genome in the public Galaxy account, so I installed Galaxy on my local PC and built the reference genome there. However, I still want to use the public server for the RNA-seq read alignment and downstream analyses, because my PC is low on both RAM and CPU. Thanks in advance :wink:


Hi @YONG_JIA

The only way to use reference genomes that are not already indexed at Galaxy Main https://usegalaxy.org (or any other public Galaxy server) is to use a custom reference genome from the history. Please be aware that in some cases the custom genome will be too large to use successfully.
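
If you prefer to script the upload rather than use the web upload tool, something roughly like this with the BioBlend Python client should work. This is an untested sketch; the API key, FASTA path, and history name are placeholders you would replace with your own.

```python
# Untested sketch: push a local FASTA into a history on Galaxy Main with
# BioBlend (pip install bioblend). The API key, file path, and history
# name below are placeholders.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")

# Create a history to hold the custom genome
history = gi.histories.create_history(name="barley-custom-genome")

# Upload the FASTA; it can then be selected as a custom reference
# genome "from the history" in the mapping tool forms
upload = gi.tools.upload_file(
    "barley_genome.fasta",
    history["id"],
    file_type="fasta",
)
print(upload["outputs"][0]["id"])
```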

If your PC does not have enough resources and the custom genome functions fail for memory reasons, then you’ll need to consider other alternatives, such as Cloudman/AWS. AWS offers a “Grant in Education” program that can help to cover the commercial costs of storage/computational work.

FAQs: https://galaxyproject.org/support/

Galaxy Choices:

Related Q&A: Reference genome import from main to local server


Hi @jennaj
Thanks a lot for listing out all the choices :slight_smile: I have applied to both the Nectar Cloud in Australia and the Google Cloud Platform for more computational power and storage. I will see how it goes, and I may try your suggested options later as well.

When using RNA STAR to align reads to my own barley FASTA file, I came across the following error:

[screenshot: “An error occurred setting the metadata for this dataset”]

STAR outputs three files: xxx.log, xxx.bed, and xxx.bam. The run completes and reports success, but the datasets show the message “An error occurred setting the metadata for this dataset”. By clicking the “Auto-detect” button in the Attributes tab I could clear the error for the xxx.log and xxx.bed outputs, but it does not work for the xxx.bam file. I then attempted to view the BAM file in a local IGV, and it reported this:
[screenshot of the IGV error message]

Do you know how to fix this? Thanks a lot. :slight_smile:


BTW, I was using the FASTA file from my history as the reference genome (without indexing). I haven’t been able to upload the indexed genome files yet :wink:


Try clicking on the “eye” icon for the BAM dataset: does the dataset contain alignment lines, and not just headers? Review the log and bed datasets as well; they may give some clues about what went wrong.
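
If you want to check outside of Galaxy, a quick pysam sketch like this (untested; "aligned.bam" is a placeholder for your downloaded BAM) will show the header and whether any alignment records are present:

```python
# Untested sketch: inspect whether the BAM has actual alignments or
# only headers. "aligned.bam" is a placeholder path.
import pysam

bam = pysam.AlignmentFile("aligned.bam", "rb", check_sq=False)
print(bam.header)            # reference names/lengths from the @SQ lines

count = 0
for read in bam:             # sequential read, no BAM index needed
    count += 1
    if count >= 5:
        break
print(f"saw at least {count} alignment record(s)")
bam.close()
```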

I suspect the job failed for memory reasons (STAR is very memory intensive); otherwise, redetecting the metadata for the BAM would have been successful. Indexed genomes use fewer resources, but even when indexed, the entire genome is held in memory. This prior Q&A from earlier today explains the resources the tool needs: RNA-STAR, hg38 GTF reference annotation, Cloudman/AWS options plus local Galaxy "Cloud Bursting" for memory intensive mapping - #7 by jennaj.
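
As a very rough way to gauge the memory involved, you could sum the contig lengths in your FASTA and apply the ~10 bytes of RAM per base rule of thumb often quoted for STAR index generation. This is an untested sketch; the file name is a placeholder and the authoritative requirements are in the STAR manual.

```python
# Untested sketch: rough STAR indexing RAM estimate from the FASTA size.
# "barley_genome.fasta" is a placeholder (uncompressed, or bgzip-compressed,
# so pysam can index it); see the STAR manual for authoritative numbers.
import pysam

fasta = pysam.FastaFile("barley_genome.fasta")  # creates a .fai index if missing
genome_bases = sum(fasta.lengths)
est_ram_gb = genome_bases * 10 / 1e9            # ~10 bytes per base rule of thumb
print(f"genome: {genome_bases / 1e9:.1f} Gbp, rough RAM estimate: ~{est_ram_gb:.0f} GB")
```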

This post also covers a new “Cloud Bursting” function, meaning attaching cloud resources to a local Galaxy for larger jobs, on demand. It is just another choice to consider, but it requires more administrative work to set up, and you would still need space on your local instance to store the data. Using a Cloud Galaxy will be simpler to configure and will offer more space and memory, so I hope one of those works out!

The “name” issue with IGV was likely due to the existing metadata problems with the failed BAM. Once you have a successful result, any dataset you want to view in IGV needs a “database” assignment (see the Custom Genome FAQ, specifically the “Custom Build” option); this lets you assign a custom “database” to your data. To have it display in IGV with the underlying genome sequence, you’ll also need to install your custom genome into a local IGV and give it the same name.

I don’t think barley is available as a pre-indexed genome in IGV, but you could check the list, and if it is there, confirm it is the same build/version as the one you are using in Galaxy for mapping. If it is the same, create your Custom Build in Galaxy with the same “database” name, aka “dbkey”, that IGV uses, so the data “match up”.
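
One way to confirm the builds really are the same is to compare the contig names and lengths in the BAM header against the FASTA behind your Custom Build, since those are what the dbkey ultimately has to “match up” for IGV. This is an untested sketch; the file names are placeholders.

```python
# Untested sketch: compare contig names/lengths in the BAM header with
# the FASTA used for the Custom Build. File names are placeholders.
import pysam

bam = pysam.AlignmentFile("aligned.bam", "rb", check_sq=False)
fasta = pysam.FastaFile("barley_genome.fasta")

bam_contigs = dict(zip(bam.references, bam.lengths))
fasta_contigs = dict(zip(fasta.references, fasta.lengths))

missing = set(bam_contigs) - set(fasta_contigs)
mismatched = {name for name, length in bam_contigs.items()
              if name in fasta_contigs and fasta_contigs[name] != length}

print("contigs in BAM but not FASTA:", sorted(missing)[:5])
print("contigs with different lengths:", sorted(mismatched)[:5])
```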

Hi @jennaj Thanks for the reply.
I had a look at the BAM file and it looks good to me. The issue is that when I manually set the database name to the “Custom Build” I created by following the Custom Genome FAQ and save, the error goes away. However, clicking “Auto-detect” brings the error back. I guess this is a problem with the custom genome?

The BAM file is ~1.1 GB, while the barley fasta.gz file is ~1.2 GB, which may indicate the alignment is actually complete?
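
Would checking the mapped-read counts be a better test than file size? I was thinking of something like this quick pysam sketch (untested; "aligned.bam" stands in for my downloaded BAM):

```python
# Untested sketch: pysam wraps "samtools flagstat", which reports total,
# mapped, and properly paired read counts for the BAM.
import pysam

print(pysam.flagstat("aligned.bam"))
```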


OK, that may be a bug. I’ve marked this for follow-up next week. Thanks for reporting the problem!! More feedback once I’ve done some testing.

Apologies for the delayed reply. For some reason I did not get a direct notification about this… but it looks important and I have it tracked now.