HISAT2 and job stages, resources, PLAZA genomes, avoiding deprecated tools

Ngmmahi_Singh · May 19, 2020, 9:53am

HISAT2 has been running for two days and has still not completed. I used the Zea mays reference genome in fasta format and an RNA-seq file in fastqsanger format. The RNA-seq file is 10 GB and my RAM is 4 GB. Please help me complete this task.

jennaj · May 19, 2020, 6:26pm

Hello @Ngmmahi_Singh

I moved your question to a distinct topic. We also got your bug report.

There were some recent server issues, but your issues are likely caused by the format of the custom genome fasta from the iGenomes source. I understand that you need to avoid the local download, uncompress, and upload of the fasta and annotation.

What I replied directly may help others. PLAZA hosts curated genome data (assembly + annotation) that can be imported to public Galaxy servers or installed into local/cloud Galaxy servers.

Hi,

If you are willing to use a different source for the genome and annotation (same base assembly), PLAZA hosts this genome fasta data in their download area in a simple “gz” compressed format. That file will load into Galaxy in full, directly server to server, by URL Upload.

The matching annotation is also available, which can help to avoid technical data issues (example: mismatched chromosome identifiers between various genome release “builds” sourced from different data providers).
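For anyone who wants to check for this kind of mismatch before starting a long job, here is a minimal Python sketch. It is not part of Galaxy or PLAZA; the function names (`fasta_ids`, `annotation_ids`, `compare_ids`) are made up for illustration. It assumes the annotation is tab-delimited GTF/GFF with the sequence name in the first column, and that the FASTA identifier is the first word after ">".

```python
import gzip
import io


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fasta_ids(handle):
    """Collect sequence identifiers (first word after '>') from a FASTA handle."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def annotation_ids(handle):
    """Collect sequence names (column 1) from a tab-delimited GTF/GFF handle."""
    ids = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comment/header lines and blank lines
        ids.add(line.rstrip("\n").split("\t", 1)[0])
    return ids


def compare_ids(genome_ids, annot_ids):
    """Report which identifiers are shared and which appear on one side only."""
    return {
        "shared": sorted(genome_ids & annot_ids),
        "annotation_only": sorted(annot_ids - genome_ids),
        "genome_only": sorted(genome_ids - annot_ids),
    }


if __name__ == "__main__":
    # Tiny in-memory example; real files would be opened with open_text(path).
    fasta = io.StringIO(">chr1 some description\nACGT\n>chr2\nGGCC\n")
    gtf = io.StringIO("#header\n1\tsrc\tgene\t1\t4\t.\t+\t.\tgene_id \"a\";\n"
                      "chr2\tsrc\tgene\t1\t4\t.\t+\t.\tgene_id \"b\";\n")
    print(compare_ids(fasta_ids(fasta), annotation_ids(gtf)))
```

If "annotation_only" is not empty, the annotation names sequences that the genome fasta does not contain (for example "1" versus "chr1"), and tools that combine the two inputs are likely to give empty or misleading results.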

Please see: https://bioinformatics.psb.ugent.be/plaza/versions/plaza_v4_5_monocots/download

HISAT2 and other computationally expensive tools will queue longer right now. Execution will also take time since the custom genome is indexed as part of the job. I’m not sure whether this particular genome is too large to use as a custom genome at the public Galaxy servers, but you can certainly try. Run the job at least twice to rule out cluster issues as a factor (example: by chance, two large jobs can run on the same cluster node and fail for resource reasons, but a rerun works). If the job fails twice for runtime or memory problems, then it is actually too large (or there is an input or setting problem).

Tophat/Tophat2 are both deprecated tools and unlikely to work. Even if putatively successful “green” results are produced, those may still be problematic content-wise. Avoid all deprecated tools if at all possible.

FAQs: https://galaxyproject.org/support/

Thanks!