Custom genome help and troubleshooting plus where to find HISAT2 alignment statistics

I have uploaded the downloaded papAnu.fa to my history as my reference genome, but why doesn't it work when I run HISAT2 to align my fastq files?


Welcome @zhwdong9

Are you sourcing the papAnu2 or papAnu4 genome from UCSC?

http://hgdownload.soe.ucsc.edu/downloads.html#baboon

Make certain that you get the fasta version of the genome. For papAnu4, this is papAnu4.fa.gz in http://hgdownload.soe.ucsc.edu/goldenPath/papAnu4/bigZips/ (the soft-masked version; a hard-masked version is also available).

This particular data does not need to be reformatted with NormalizeFasta, as the first FAQ listed below describes; it will already be formatted correctly from this source. However, you should load it with the Upload tool without setting a “datatype”. Allow Galaxy to uncompress the data and assign the final datatype (it will be “fasta”).
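If you are unsure which of the two UCSC files you downloaded, a minimal Python sketch like the one below can tell them apart, assuming the usual UCSC convention that soft-masked repeats are written in lowercase while hard-masked repeats are replaced with N. The function names `masking_summary` and `looks_soft_masked` are hypothetical, not part of Galaxy or any UCSC tool.

```python
from collections import Counter


def masking_summary(fasta_lines):
    """Count uppercase, lowercase (soft-masked) and N (hard-masked or gap) bases.

    `fasta_lines` is any iterable of FASTA text lines, such as an open file.
    """
    counts = Counter(upper=0, lower=0, N=0)
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and sequence headers
        for base in line:
            if base in "Nn":
                counts["N"] += 1
            elif base.islower():
                counts["lower"] += 1
            else:
                counts["upper"] += 1
    return counts


def looks_soft_masked(counts):
    """Any lowercase bases suggest a soft-masked genome."""
    return counts["lower"] > 0
```

For a gzipped download you could pass `gzip.open("papAnu4.fa.gz", "rt")` as the argument. Note that N also marks ordinary assembly gaps, so N bases alone do not prove that a file is hard-masked.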

FAQs: https://galaxyproject.org/support

Also, click on the tags I added to find more help and prior Q&A about related issues that other users have had when first learning to work with Custom Genomes/Builds. Some of these may help you resolve your problem.

However, if none of that helps, please share more information:

  1. Try at least one rerun to rule out transient server/cluster issues. Let us know if the job fails again (and how/why), and whether you confirmed that the genome fasta is intact (fully loaded) and has the correct datatype assigned.
  2. Where are you working? Public Galaxy (URL)? Your own Galaxy?
  3. How does the job fail?
    • Review the contents of the “bug” icon report (it does not need to be submitted, but if you do, include a link to this post for context).
    • Click into the “i” icon for job details and review the stderr and stdout reports. These often explain exactly what the problem is.
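For the “intact (fully loaded)” check in step 1, here is a minimal local sketch, assuming you still have a copy of the downloaded file on your computer. It is not what Galaxy does internally; `open_text` and `check_fasta` are hypothetical helpers that flag a truncated gzip stream, sequence data before the first header, empty sequences, and duplicate sequence names.

```python
import gzip


def open_text(path):
    """Open a file as text, decompressing it first if it is gzipped."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # gzip files start with these two bytes
        return gzip.open(path, "rt")
    return open(path, "rt")


def check_fasta(path):
    """Return (number of sequences, list of problems found) for a FASTA file."""
    problems = []
    names, lengths = [], []
    try:
        with open_text(path) as fh:
            for lineno, raw in enumerate(fh, start=1):
                line = raw.strip()
                if not line:
                    continue
                if line.startswith(">"):
                    fields = line[1:].split()
                    names.append(fields[0] if fields else "")
                    lengths.append(0)
                elif not names:
                    problems.append(f"line {lineno}: sequence before the first '>' header")
                    break
                else:
                    lengths[-1] += len(line)
    except EOFError:
        # gzip raises EOFError when the compressed stream ends early.
        problems.append("gzip stream is truncated (incomplete download?)")
    if not names and not problems:
        problems.append("no '>' headers found; is this really FASTA?")
    seen = set()
    for name, length in zip(names, lengths):
        if not name:
            problems.append("a header line has no sequence name")
        if length == 0:
            problems.append(f"sequence {name!r} is empty")
        if name in seen:
            problems.append(f"duplicate sequence name {name!r}")
        seen.add(name)
    return len(names), problems
```

For a complete download, `check_fasta("papAnu4.fa.gz")` should report the number of sequences and an empty problem list; any entry in that list is worth fixing before the file goes into the Upload tool.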

For reference, many tutorials from the Galaxy Training Network (GTN) include the HISAT2 tool. Start with those in the “Transcriptomics” group.

Thanks!

Thank you, I will try it again.

I did it the way jennaj described, but why doesn't the result show the number of reads and the alignment rate? If I use the built-in genome (papAnu2) and then run HISAT2, I get all of that information.


Hi @zhwdong9

The statistics are there for every HISAT2 run by default. But if there were many “warnings” or processing details, the statistics report comes after those lines, and you won’t see it in the expanded dataset view: only the first few lines of stderr are captured there, as a “peek” view.
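To see why the statistics can be hidden, here is a small Python illustration with a made-up stderr: twenty warning lines followed by a summary in the shape HISAT2 prints by default. The warning wording and the five-line preview length are assumptions for the example, not Galaxy's actual settings.

```python
# Made-up stderr: many warnings first, the HISAT2-style summary last.
WARNINGS = [
    f"Warning: skipping read 'read{i}' because it was < 2 characters long"
    for i in range(20)
]
SUMMARY = [
    "1000 reads; of these:",
    "  1000 (100.00%) were unpaired; of these:",
    "    50 (5.00%) aligned 0 times",
    "    900 (90.00%) aligned exactly 1 time",
    "    50 (5.00%) aligned >1 times",
    "95.00% overall alignment rate",
]
STDERR = "\n".join(WARNINGS + SUMMARY)


def peek(report, n_lines=5):
    """Keep only the first n_lines, like a short preview of a dataset."""
    return "\n".join(report.splitlines()[:n_lines])


def tail(report, n_lines=6):
    """Keep only the last n_lines, which is where the summary ends up."""
    return "\n".join(report.splitlines()[-n_lines:])


print(peek(STDERR))  # only warnings
print(tail(STDERR))  # the full summary
```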

Technically, the underlying HISAT2 tool reports the alignment statistics in a part of the job output named stderr. That’s just how this particular tool works, and it doesn’t necessarily indicate an actual error or problem, even though the name includes “err” (short for “error”). All tools have stdout and stderr outputs by default, although not all tool authors report meaningful data there. The Galaxy wrapper around a tool can capture the default stdout and/or stderr and sometimes reformat them or add more information. And some tool wrappers, like this one, have an option to write the interesting parts of that output as a distinct dataset in your history (just the statistics).
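The point that text on stderr is not itself an error can be shown with a minimal Python sketch. The child process here is a hypothetical stand-in for HISAT2, not the real tool: it writes a summary line to stderr and still exits with status 0, which is how a program signals success.

```python
import subprocess
import sys

# Hypothetical stand-in for an aligner: report on stderr, then exit normally.
CHILD_SOURCE = (
    "import sys\n"
    "sys.stderr.write('95.00% overall alignment rate\\n')\n"
)

result = subprocess.run(
    [sys.executable, "-c", CHILD_SOURCE],
    capture_output=True,  # keep stdout and stderr as two separate streams
    text=True,            # decode both streams to str
)

print("exit status:", result.returncode)  # 0 means the run succeeded
print("stdout:", repr(result.stdout))     # nothing was written here
print("stderr:", repr(result.stderr))     # the report landed here
```

In other words, judge a run by its exit status, not by whether stderr is empty.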

How to review the full stderr message (any tool).

For HISAT2, this is where the alignment statistics are captured: at the end. Any content above them is worth reviewing but doesn’t necessarily mean that anything went wrong. In my example, the warnings simply report that some reads in my fastqsanger inputs were very short, too short for the tool to map. A few warnings of this kind are not unusual, but if my reads didn’t map well overall, they could point to one reason why (perhaps QA was too aggressive, or the data was not of high quality originally).

A job’s stdout can be reviewed the same way. This report also shows the tool parameters originally set/submitted, in a simplified format (scroll further into the report than what is shown in the graphics below to find this information). It is a very useful report to learn to review, and not only when something goes wrong (an error or unexpected results).

  1. Expand the resulting BAM dataset and click on the “View details” icon. This example happens to have too many warnings for the statistics to appear, and is probably similar to one of your jobs.

  2. The “Job details” report will display in the center pane. Locate and click on the link to the stderr output.

  3. On the stderr report, scroll down past the warnings to find the statistics at the end. In this example, I deliberately left all but the last warning before the statistics out of the screenshot to keep the graphic smaller. The last line is from the wrapper and is just a processing detail that can be ignored.
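If you download the stderr text, the “scroll to the end” step can be automated. This is a minimal sketch assuming HISAT2's default summary wording, from the “N reads; of these:” line through the “X% overall alignment rate” line; the `--new-summary` style is laid out differently and would not match. `parse_summary` is a hypothetical helper, and the sample stderr is made up.

```python
import re

READS = re.compile(r"^(\d+) reads; of these:$", re.MULTILINE)
RATE = re.compile(r"^([\d.]+)% overall alignment rate$", re.MULTILINE)


def parse_summary(stderr_text):
    """Return {'reads': int, 'overall_rate': float} from the last summary in
    a HISAT2-style stderr, or None if no summary is present."""
    reads = READS.findall(stderr_text)
    rates = RATE.findall(stderr_text)
    if not reads or not rates:
        return None
    # Use the last match: the summary is printed after any warnings.
    return {"reads": int(reads[-1]), "overall_rate": float(rates[-1])}


sample = "\n".join(
    ["Warning: skipping read 'read1' because it was < 2 characters long"] * 3
    + [
        "1000 reads; of these:",
        "  1000 (100.00%) were unpaired; of these:",
        "    50 (5.00%) aligned 0 times",
        "    900 (90.00%) aligned exactly 1 time",
        "    50 (5.00%) aligned >1 times",
        "95.00% overall alignment rate",
    ]
)
print(parse_summary(sample))  # {'reads': 1000, 'overall_rate': 95.0}
```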

How to capture only the HISAT2 alignment statistics from the full stderr message as a distinct dataset in the history.

On the HISAT2 tool form, expand the “Summary Options” section to access the parameter “Print alignment summary to a file”. It is set to “No” by default. In the graphic, I changed the option to “Yes”, and the extra output dataset was created during job execution. It contains just the alignment statistics in a cleaned-up format (no “warnings” or technical run details). The statistics are exactly the same as those found at the end of the stderr output above.
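The “Yes” setting presumably relies on HISAT2's own `--summary-file` option, which writes the summary to a separate file. As a rough local equivalent for a stderr you have already saved, the sketch below copies just the summary block into its own file, dropping warnings and any wrapper messages after it. `extract_summary` and `write_summary` are hypothetical names, and the sketch assumes the default summary wording.

```python
import re
from pathlib import Path

SUMMARY_START = re.compile(r"^\d+ reads; of these:$")


def extract_summary(stderr_text):
    """Return the lines from the last 'N reads; of these:' line through the
    following 'overall alignment rate' line, or '' if there is no summary."""
    lines = stderr_text.splitlines()
    start = None
    for i, line in enumerate(lines):
        if SUMMARY_START.match(line.strip()):
            start = i  # keep updating so the last summary wins
    if start is None:
        return ""
    block = []
    for line in lines[start:]:
        block.append(line)
        if line.strip().endswith("overall alignment rate"):
            break  # anything after this (e.g. wrapper messages) is dropped
    return "\n".join(block) + "\n"


def write_summary(stderr_text, out_path):
    """Write only the summary block to out_path and return that text."""
    summary = extract_summary(stderr_text)
    Path(out_path).write_text(summary)
    return summary
```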

Hope that helps!

Hi Jenn, I really appreciate it! Based on your second suggestion, the problem is resolved. Thank you so much!
