I have uploaded the downloaded papAnu.fa to my history as my reference genome, but why does it not work when I run HISAT2 alignment with my fastq files?
Are you sourcing the papAnu4 genome from UCSC?
Make certain that you get the FASTA version of the genome. For papAnu4, this would be: http://hgdownload.soe.ucsc.edu/goldenPath/papAnu4/bigZips/ >> papAnu4.fa.gz (the soft-masked version; a hard-masked version is also available).
This particular data does not need to be reformatted with NormalizeFasta as the first FAQ listed below describes (it will already be formatted correctly from this source). But you should load it with the Upload tool without setting a “datatype”. Allow Galaxy to uncompress the data and assign the final datatype (it will be “fasta”).
- Preparing and using a Custom Reference Genome or Build
- Mismatched Chromosome identifiers (and how to avoid them)
- Extended Help for Differential Expression Analysis Tools
Also, click on the tags I added for more help and prior Q&A about related issues that users have had when first learning how to work with Custom Genomes/Builds. Some of these may help you resolve your problem.
However, if none of that helps, please share more information:
- Try at least one rerun to eliminate transient server/cluster issues. Let us know if the job fails again (and how/why) – and if you confirmed that the genome fasta is intact (fully loaded) and has the correct datatype assigned.
- Where are you working? Public Galaxy (URL)? Your own Galaxy?
- How does the job fail?
- Review the contents of the “bug” icon report (it does not need to be submitted, but if you do, include a link to this post for context).
- Click the “i” icon for job details and review the stdout reports. These often explain exactly what the problem is.
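If you also keep a local copy of the downloaded genome, a quick check outside Galaxy can confirm that the file is complete and really is FASTA before you upload it. The sketch below is a hypothetical helper (`summarize_fasta` is an illustrative name, not a Galaxy or UCSC tool) and assumes Python 3 with only the standard library. It accepts plain or gzip-compressed FASTA, counts sequences and bases, and returns the first few sequence identifiers so you can compare them with the chromosome names used by your other inputs.

```python
import gzip
from pathlib import Path


def summarize_fasta(path, max_ids=5):
    """Hypothetical helper: return (sequence_count, total_bases, first_ids).

    Works on plain or gzip-compressed FASTA (detected from the gzip magic
    bytes). Raises ValueError if the file does not look like FASTA.
    """
    path = Path(path)
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open

    n_seqs = 0
    n_bases = 0
    first_ids = []
    with opener(path, "rt") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if line.startswith(">"):
                n_seqs += 1
                if len(first_ids) < max_ids:
                    # The identifier is the header text up to the first whitespace.
                    parts = line[1:].split()
                    first_ids.append(parts[0] if parts else "")
            elif line:
                if n_seqs == 0:
                    raise ValueError(f"line {line_no}: sequence data before any '>' header")
                n_bases += len(line)
    if n_seqs == 0:
        raise ValueError("no FASTA headers found")
    return n_seqs, n_bases, first_ids
```

For example, call `summarize_fasta("papAnu4.fa.gz")` on a local copy. A truncated `.gz` download will usually stop with an `EOFError` from the `gzip` module instead of returning counts.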
For reference, many Galaxy Training Network (GTN) tutorials include the HISAT2 tool. Start with those in the “Transcriptomics” group.
Thank you, I will try it again.
I did it as jennaj mentioned, but why does the result not show the number of reads and the alignment rate? When I use the built-in genome (papAnu2) and then run HISAT2, I get all of that information.
The statistics are there for every HISAT2 run by default. But if for some reason there were many “warnings” or processing details, the statistics report will come after those lines, and you won’t see it in the expanded dataset view. Only the first few lines of stderr are captured there, as a “peek” view.
Technically, the underlying HISAT2 tool reports the alignment statistics in a part of the job output named stderr. That’s just how this particular tool works, and it doesn’t necessarily indicate an actual error or problem, even though the name of that output includes “err” (short for “error”). All tools have a stderr output by default, although not all tool authors report meaningful data there. The Galaxy wrapper around a tool can capture and (sometimes) reformat or add more information to the default stderr to provide more details. And some tool wrappers, like this one, have an option to report the interesting parts of that metadata (just the statistics) as a distinct dataset in your history.
How to review the full stderr message (any tool).
For HISAT2, this is where the alignment statistics are captured (at the end). Any content above them is worth reviewing but doesn’t necessarily mean that anything went wrong. In my example, the warnings simply report that some reads in my fastqsanger inputs were too short for the tool to map. A few warnings of this kind are not unusual, but if my reads didn’t map well overall, they could explain one reason why (perhaps QA was too aggressive, or the data was not of high quality originally).
The stdout can also be reviewed this way. This is also a way to review the tool parameters originally set and submitted, in a simplified format (scroll further down the report to find this information). This is a very useful report to learn how to read, and not only when something goes wrong (an error or unexpected results).
- Expand the result BAM dataset and click on the “View details” icon. This example happens to have too many warnings for the statistics to appear in the peek, and is probably similar to one of your jobs.
- The “Job details” report will display in the center pane. Locate and click on the link to the
- On the stdout report, scroll down past the warnings to find the statistics at the end. In this example, I deliberately left all of the warnings except the last one before the actual statistics out of the screenshot, to keep the graphic smaller. The last line is from the wrapper and is just a processing detail that can be ignored.
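If you copy the full stderr text from the Job details page into a local file, the statistics block can also be pulled off the end programmatically, skipping any warnings above it. This is a minimal sketch, assuming HISAT2's default summary layout: a block that starts with a line like `1000 reads; of these:` and ends with a line containing `overall alignment rate`. The function name is hypothetical, and the warning lines in the example are placeholders, not real HISAT2 messages.

```python
import re

# Assumed markers of HISAT2's default (not --new-summary) alignment summary.
START = re.compile(r"^\d+ reads; of these:")
END = re.compile(r"overall alignment rate")


def extract_hisat2_summary(stderr_text):
    """Hypothetical helper: return the last summary block in stderr_text, or None."""
    lines = stderr_text.splitlines()
    start = None
    for i, line in enumerate(lines):
        if START.match(line.strip()):
            start = i  # keep the last occurrence, in case of several blocks
    if start is None:
        return None
    for j in range(start, len(lines)):
        if END.search(lines[j]):
            return "\n".join(lines[start:j + 1])
    return None  # block started but never finished (e.g. truncated log)


if __name__ == "__main__":
    # Placeholder log: two made-up warning lines followed by a summary block.
    log = (
        "Warning: placeholder warning 1\n"
        "Warning: placeholder warning 2\n"
        "1000 reads; of these:\n"
        "  1000 (100.00%) were unpaired; of these:\n"
        "    50 (5.00%) aligned 0 times\n"
        "    900 (90.00%) aligned exactly 1 time\n"
        "    50 (5.00%) aligned >1 times\n"
        "95.00% overall alignment rate\n"
    )
    print(extract_hisat2_summary(log))
```

Scanning for the last start marker (rather than reading only the first few lines) is what lets this find the statistics even when many warnings come first, which is exactly the case the peek view cannot show.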
How to capture only the HISAT2 alignment statistics from the full stderr message as a distinct dataset in the history.
On the HISAT2 tool form, expand the “Summary Options” section to access the parameter “Print alignment summary to a file.” It is set to “No” by default. In the graphic, I changed the option to “Yes”, and the extra output dataset was created during job execution. It contains just the alignment statistics in a cleaned-up format (no “warnings” or technical run details). The statistics content is exactly the same as that found at the end of the stderr output above.
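Once that summary dataset is in your history (and downloaded, if you want to work with it locally), the two numbers you asked about, total reads and overall alignment rate, are easy to pull out, for example to compare the built-in papAnu2 run with the custom-genome run. This is a minimal sketch, assuming the summary uses HISAT2's default wording; HISAT2 also offers a `--new-summary` style with different wording, which this hypothetical `parse_hisat2_summary` helper does not handle.

```python
import re


def parse_hisat2_summary(text):
    """Hypothetical helper: extract total reads and overall alignment rate (%).

    Assumes HISAT2's default summary wording; raises ValueError otherwise.
    """
    total = re.search(r"^\s*(\d+) reads; of these:", text, re.MULTILINE)
    rate = re.search(r"^\s*([\d.]+)% overall alignment rate", text, re.MULTILINE)
    if not (total and rate):
        raise ValueError("not a recognised HISAT2 summary")
    return {
        "total_reads": int(total.group(1)),
        "overall_rate_pct": float(rate.group(1)),
    }
```

Calling it on the contents of each run's summary dataset gives two small dictionaries that can be compared side by side.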
Hope that helps!
Hi Jenn, I really appreciate it! Based on your second suggestion, the problem is resolved. Truly, thanks!