Indexing reference genomes with Data Managers: Resources, tutorials, troubleshooting

YONG_JIA · May 31, 2019, 3:03pm

Hi Jennaj,
I think I figured out where the issue is: these indexers require some dependencies that are missing on my Galaxy server. The most common one is samtools. I searched online and found no clear guide on how to install samtools. Could you help me here? Thanks

YONG_JIA · May 31, 2019, 4:26pm

Hi Jennaj,
Crazy, I figured out how to install the missing samtools by going through the Galaxy documentation on Conda for tool dependencies. It states that if you set “conda_auto_install” to “true”, Galaxy will look for and install Conda packages for missing tool dependencies before running a job.

I did exactly what it says and ran the indexer again, and it all works like magic. Thanks a lot for your nice guide.
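
For anyone who wants to confirm the setting before re-running a Data Manager, here is a minimal sketch that scans the text of a Galaxy config file for an uncommented `conda_auto_install` line. The function name and the sample text are hypothetical. The sample also assumes the option sits under the `galaxy:` section of `galaxy.yml`, so check that against your own install. It is a plain line scan, not a full YAML parser.

```python
import re


def conda_auto_install_enabled(config_text: str) -> bool:
    """Return True if an uncommented `conda_auto_install: true` line is present."""
    pattern = re.compile(r"^\s*conda_auto_install\s*:\s*(\S+)", re.IGNORECASE)
    for line in config_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # commented-out settings do not count
        match = pattern.match(line)
        if match:
            value = match.group(1).strip("'\"").lower()
            return value in ("true", "yes", "on")
    return False


# Hypothetical config excerpt (assumed layout, not copied from a real server).
SAMPLE = """\
galaxy:
  # conda_auto_install: false
  conda_auto_install: true
"""

if __name__ == "__main__":
    print(conda_auto_install_enabled(SAMPLE))
```

To check a real file, read it with `open(path).read()` and pass the text in. Galaxy needs a restart after the config changes.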

jennaj · May 31, 2019, 5:09pm

@YONG_JIA Super, glad you found our docs ( https://docs.galaxyproject.org ) and the help solved your problem

YONG_JIA · June 2, 2019, 1:56pm

Hi Jennaj,
Sorry to bother you again. Regarding reference genome preparation, you have been suggesting doing the four basic indexing steps first and then the tool-specific indexing. What is the reason for doing this? Can I just go straight to the tool-specific indexing, such as the RNA STAR and HISAT indexers?

I wonder whether you would be able to help with another problem. I got a "java.lang.OutOfMemoryError" error during the Picard indexing. I found that the exact same issue was posted earlier by someone else on GitHub.

There was an answer posted, but I couldn’t really understand how to apply it. I followed up on the question two days ago but haven’t had any response yet. Thanks a lot

jennaj · June 3, 2019, 5:38pm

Hi @YONG_JIA

If you haven’t created the SAMtools, Picard, and 2bit indexes, problems with tools can come up.

The error you reference comes up when using Picard tools on the command line. For your indexing with a Data Manager in Galaxy, this same error likely means that your local Galaxy does not have enough memory to index the genome. If you are trying to index wheat from most public sources, there will never be enough memory (the genome is simply too large). Using the PLAZA version can help reduce the amount of memory needed due to the way it was reorganized, but it will still be substantial.
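
To get a rough sense of how large a genome is before attempting to index it, a small sketch like the one below can count sequences and total bases in a FASTA file. The function name and the sample data are hypothetical. It only reports size and says nothing about how much memory a particular indexer will need.

```python
import io


def fasta_stats(handle):
    """Return (number_of_sequences, total_bases) for FASTA text read from `handle`."""
    sequences = 0
    bases = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            sequences += 1  # header line starts a new sequence
        else:
            bases += len(line)  # sequence line: count every character
    return sequences, bases


if __name__ == "__main__":
    # Tiny in-memory example; for a real genome, pass an open file handle instead.
    sample = io.StringIO(">chr1\nACGTACGT\nACGT\n>chr2\nNNNNAC\n")
    print(fasta_stats(sample))
```

The file is read line by line, so the script itself stays light on memory even for very large genomes.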

Related Q&A

RichardBJ · June 26, 2021, 9:03am

I had this all working nicely on my local install, then when something failed I ran the data managers again and they are now broken. Down in the details I see, essentially, “never run them twice or there will be chaos and it is tricky to fix”.
OK, so having brought this apocalypse upon my server, is there a way to fix it? Presumably by deleting a bunch of files and starting again? Can you help at all, please?
Richard

RichardBJ · June 26, 2021, 10:36pm

Seemed to get around this by rummaging around until I found the duplicate lines and deleted them. The index still says failed, but it seems to have worked, so long as you manually type the location into the file.

jennaj · June 28, 2021, 5:05pm

Yes, duplicate entries will lead to problems. As will missing entries.

Sometimes starting over is the easiest way, but correcting the data directly is also possible (just complicated!).

More details are in these FAQs if you want to attempt a manual fix: Galaxy Administration

Stopping the server, making changes, then restarting seems to always work best. Pay special attention to spaces, tabs, and the like. Using a command-line text editor that reveals whitespace characters is essential, imho.
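
As a rough aid for that kind of manual cleanup, the sketch below scans the lines of a tab-delimited `.loc` file and flags duplicate entries, rows that appear to use spaces instead of tabs, and rows with an unexpected column count. The function name, the sample rows, and the four-column layout are illustrative assumptions, so adjust `expected_columns` to match the file you are checking.

```python
from collections import Counter


def check_loc_lines(lines, expected_columns=None):
    """Report duplicate rows and suspicious whitespace in tab-delimited .loc content."""
    problems = []
    seen = Counter()
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) == 1 and " " in line:
            problems.append((number, "no tabs found; columns may be separated by spaces"))
        if any(field != field.strip() for field in fields):
            problems.append((number, "leading or trailing spaces in a field"))
        if expected_columns is not None and len(fields) != expected_columns:
            problems.append((number, f"expected {expected_columns} columns, found {len(fields)}"))
        key = tuple(field.strip() for field in fields)
        seen[key] += 1
        if seen[key] > 1:
            problems.append((number, "duplicate entry"))
    return problems


# Hypothetical rows: one clean entry, one exact duplicate, one space-separated row.
SAMPLE = [
    "#<unique_build_id>\t<dbkey>\t<display_name>\t<file_path>\n",
    "hg38\thg38\tHuman (hg38)\t/data/hg38/hg38.fa\n",
    "hg38\thg38\tHuman (hg38)\t/data/hg38/hg38.fa\n",
    "mm10 mm10 Mouse (mm10) /data/mm10/mm10.fa\n",
]

if __name__ == "__main__":
    for number, message in check_loc_lines(SAMPLE, expected_columns=4):
        print(f"line {number}: {message}")
```

The script only reports problems and does not modify anything, so the actual edits are still made by hand with the server stopped.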

RichardBJ · July 7, 2021, 2:43pm

Thanks, all sorted now. Again (my third installation of the server)!!
Kind Regards
