Reference genome import from main to local server

galaxy-local
data-manager
#1

Hi,

I’m attempting to run HISAT2 on paired RNAseq data. I have run it successfully on the main server using the built-in mm10 reference genome. However, I am now using a local server, and the built-in reference genomes apparently were not included in the set-up. I’m hoping for some help with installing the right mm10 reference genome file or, even better, with updating the local server so that the same built-in reference genomes are available as on the main server.

Thank you,
Asta

#2

Hi,

The mm10 reference genome/build is sourced from UCSC.

You’ll need to index the genome on your server with Data Manager tools.

Data Managers (DMs) are sourced from the ToolShed, where they all sit in a separate category. Install DMs like any other tool using the Admin functions, then use them to install the genome.

You’ll need these DMs at a minimum. Execute them in this order first:

  • Fasta fetcher – has an option to pick UCSC as the data source.
  • SAM indexer
  • Picard indexer
  • 2bit (twoBit) indexer

Then get the DMs that create indexes for the tools you want to use. Run these after the others above have completed for the best results.
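The run order described above can be written down as a small plan. This is only an illustrative sketch: the labels are the informal names used in this post, not ToolShed repository names, and `plan_run_order` is a hypothetical helper, not part of Galaxy.

```python
# Informal labels copied from the list above; these are NOT ToolShed repository names.
BASE_DATA_MANAGERS = [
    "Fasta fetcher (source: UCSC)",
    "SAM indexer",
    "Picard indexer",
    "2bit (twoBit) indexer",
]


def plan_run_order(tool_specific):
    """Return the base Data Managers in order, followed by the tool-specific ones."""
    # De-duplicate while preserving order, and never schedule a base step twice.
    extras = [dm for dm in dict.fromkeys(tool_specific) if dm not in BASE_DATA_MANAGERS]
    return BASE_DATA_MANAGERS + extras


if __name__ == "__main__":
    for step, name in enumerate(plan_run_order(["HISAT2 indexer"]), start=1):
        print(f"{step}. {name}")
```

The point is simply that the Fasta fetcher runs first and any tool-specific indexer runs last, once the shared steps have completed.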

The above was copied over and slightly updated from this prior Galaxy Biostars Q&A. It has more details (worth reviewing):

https://biostar.usegalaxy.org/p/19371/

A search here with the keyword string “biostars data manager fetch sam picard” will find more prior Q&A that covers many different use cases: https://galaxyproject.org/search/

#3

Hi Jenna,

This was really helpful, thank you! I have a couple of questions:
I don’t seem to be able to pick any options with the Fasta fetcher tool? Is it correct that the tool is named data_manager_fetch_genome_dbkeys_all_fasta in the ToolShed? It is the only fasta fetcher tool I can find.
Additionally, when you write ‘Execute’ do you simply mean installing the tools to my local server?

Thank you again,
Asta

#4

Hi Jenna,

I have now succeeded in executing the four listed DMs according to https://github.com/galaxyproject/dagobah-training/blob/2017-montpellier/sessions/05-reference-genomes/ex1-reference-genomes.md
However, I subsequently ran the DM for the HISAT2 index. It ran for a few hours, then failed and gave me the following message:
“Building DifferenceCoverSample
Building sPrime
Building sPrimeOrder
V-Sorting samples
V-Sorting samples time: 00:18:13
Allocating rank array
Ranking v-sort output
Fatal error: Exit code 247 ()
Settings:
Output files: “mm10.*.ht2”
Line rat”

I’m unsure what this error code means. I hope you can clarify.

Asta

#5

It looks like the tool is running out of memory. Are you running this in a local Galaxy on a personal computer? It might not have the resources you need.

The mouse genome is pretty large. If indexing (or later mapping) would fail when running HISAT2 on the command line because the jobs exceed the available resources (disk space or memory), then it will fail in Galaxy too. See here for common index sizes: https://ccb.jhu.edu/software/hisat2/index.shtml
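
To get a rough idea of what a machine has before starting a large indexing job, a quick check like the one below can help. It is a minimal sketch that assumes a Linux host (the `os.sysconf` names used here are not available on every platform), the path checked is a placeholder, and it reports the total installed memory, not the amount a cluster or job scheduler will actually grant to a Galaxy job.

```python
import os
import shutil


def total_memory_gib():
    """Total physical memory in GiB (Linux; relies on os.sysconf)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    page_count = os.sysconf("SC_PHYS_PAGES")
    return page_size * page_count / 1024**3


def free_disk_gib(path="."):
    """Free disk space in GiB on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 1024**3


if __name__ == "__main__":
    # Memory and disk are separate limits: plenty of one says nothing about the other.
    print(f"Total memory: {total_memory_gib():.1f} GiB")
    print(f"Free disk:    {free_disk_gib('.'):.1f} GiB")
```

Run it on the machine where the jobs execute, which may not be the machine that serves the Galaxy web interface.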

There are Cloud-based Galaxy options. Check to see if any of the academic clouds are available to you/your institution. AWS also offers grants to cover research projects for students, researchers, etc (a simple online form, usually turns around quickly). Galaxy itself is free – and the cloud version is designed to be easy to administer and has many indexes pre-computed – but you’ll need to connect a resource for the database-data storage and computational work. Many scientists/teachers use a cloud option every day.

For Galaxy platform choices, please see:

#6

I have been running Galaxy on a local server hosted by the university. However, I have been accessing the server from my personal computer. I thought this shouldn’t make a difference, since the server itself is an academic server and no storage limit appears to be listed.
I have also uploaded RNAseq fastqsanger files and run FastQC on them without any issues.

#7

Does this mean accessing some internal Galaxy server that is hosted by your University? Other people use it, there is an administrator, and you are also an administrator?

If this means just opening a browser window for the same server above, but from a different computer, then you are still using the same Galaxy account/server for work.

If the above is true, contact the admins that are running the technical side of the server. They can check the server logs. It is very likely that more memory needs to be allocated for this job on whatever cluster they attached.

If instead you are running your own Galaxy (whether on a university server and/or your own computer), please explain the source. Is it a GitHub install from https://getgalaxy.org, and is it current with version 19.01? Or is it some Docker version (which URL did you source it from? There are a few, including training versions)? The job is almost certainly running out of resources, most likely memory, which is different from the amount of disk space you may have available. We can point you to server administration docs/tutorials.

Thanks!

#8

Thank you again. It is an internal Galaxy server hosted by the university. I have access with personal login details, but others have access to the internal server too. I’m also an administrator for the server with my login details.
I have contacted the admins now, and hopefully they can allocate more memory for the job. Thank you again so much!

#9

So I managed to import the HISAT2 reference genome and now I’m getting the following error:
Fatal error: Exit code 127 ()
samtools: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
(ERR): hisat2-align died with signal 13 (PIPE)

#10

You’ll need to contact your admins again. This error means there is a Samtools dependency problem. They might need to uninstall/reinstall HISAT2 using the “manage dependencies” option. Also ask them to make sure they are running the latest version of Galaxy, including any point updates since the original release (19.01). Installing the most current version of HISAT2 (2.1.0+galaxy4) would also be important.

If those admins need help, they can reach the developers at this Gitter chat: https://gitter.im/galaxyproject/dev. In some cases, getting dependencies sorted out correctly takes a bit more effort. I see this kind of error reported across many different tools/platforms (from a Google search), and it seems to be linked to a laundry list of factors: OpenSSL, bioconda, conda, conda-forge, et cetera. The “fix” details differ from case to case, but people are addressing it successfully. So, please start by getting Galaxy and the tool updated, see if that resolves the problem, and if not, have them report it at Gitter for help. HISAT2 is working correctly on the public servers.
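
For admins who want to confirm the symptom before and after reinstalling, a minimal check along these lines may help. It assumes it is run inside the same environment the job uses (for example, the environment Galaxy set up for the tool's dependencies); run anywhere else, the result says nothing about the job. `can_load` is a hypothetical helper name.

```python
import ctypes
import shutil


def can_load(library_name):
    """Return True if the dynamic loader can open `library_name`."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the shared object cannot be found or opened.
        return False


if __name__ == "__main__":
    # Library name copied from the error message in the post above.
    print("libcrypto.so.1.0.0 loadable:", can_load("libcrypto.so.1.0.0"))
    # Which samtools executable comes first on PATH (None if there is none).
    print("samtools on PATH:", shutil.which("samtools"))
```

If the library is not loadable in that environment, samtools will fail in the same way as in the job, which points at the dependency install and not at the input data.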

#11

Update: I decided to start up the Gitter question just to see if someone already recognizes the problem/knows the solution. They may write back here, or in the chat: https://gitter.im/galaxyproject/dev?at=5cd5b632bdc3b64fcf2389aa

#12

Having a recent version of conda is also important, see https://docs.galaxyproject.org/en/latest/admin/conda_faq.html#how-can-i-upgrade-conda
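
To see which conda version is actually in use, something like the sketch below could be run on the server. It is an illustration under assumptions: the helper names are hypothetical, and since Galaxy usually manages its own conda installation, the full path to that conda executable probably needs to be passed in place of the default `"conda"`. What counts as "recent enough" is not stated here, so the script only reports the version.

```python
import re
import shutil
import subprocess


def parse_conda_version(text):
    """Extract a (major, minor, patch) tuple from `conda --version` output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def installed_conda_version(conda="conda"):
    """Return the version of the given conda executable, or None if not found."""
    if shutil.which(conda) is None:
        return None
    result = subprocess.run(
        [conda, "--version"], capture_output=True, text=True, check=True
    )
    # Check both streams in case the version is printed to stderr.
    return parse_conda_version(result.stdout + result.stderr)


if __name__ == "__main__":
    print("conda version:", installed_conda_version())
```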
