I have been using Unicycler for assemblies and have been following this tutorial:
However, when I try to open my assembly using “display with IGV local” (figure 14 of the tutorial) I get an error message:
It seems to say that the fasta file is available but the fai file is missing.
I do not have any jobs still running and all my assemblies are showing as green, so I don’t see why it says “additional datasets require to be be generated”.
Did I miss a step when trying to use IGV local?
Thanks for the help.
An index for IGV is being created at this preparatory step. This is another “job”, distinct from the assembly or any other jobs run in the History. Your Unicycler job created a fasta output (without an index). Display in IGV requires an index, and creating that index is what is happening at this step.
When “Dataset Status” states “new”, that input (a .fa.fai index in your case) is still being generated.
When both datasets show “Ready”, the data can be visualized.
How long this prep step takes depends on the size and content of the original dataset and on how busy the server you are working on is.
So if I understand properly, the only way for me to know whether both files are available is to check regularly using the IGV local button?
I thought that an analysis shown in green meant that it was all done, but that’s not true for index files?
I’ll just wait then. Thanks!
The assembly is done, but moving the data into any third-party application can require some reformatting or the creation of additional files that the other application expects, or it may require none. It depends on the external application.
In this case (IGV), a fasta index is required for new genomes. This would be true even if you decided to load the fasta into IGV yourself as a new genome (instead of loading it from Galaxy). The “Galaxy-to-Local IGV” web-based loading of fasta data is a convenience feature in Galaxy: it saves you from creating the fasta index as a separate step, downloading the fasta and its index, and then loading both into IGV yourself.
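To illustrate why IGV wants the index, here is a minimal Python sketch of random access with a .fai record. It assumes the standard five-column .fai layout written by samtools faidx (sequence name, sequence length, byte offset of the first base, bases per line, bytes per line) and an uncompressed fasta with fixed-width sequence lines. The function name fetch_region is hypothetical, and this is not IGV's own code; it only shows the general idea that, with an index, a viewer can seek straight to the bytes for the region on screen instead of scanning the whole assembly.

```python
import os
import tempfile


def fetch_region(fasta_path: str, fai_row: tuple, start: int, end: int) -> str:
    """Return bases [start, end) (0-based, half-open) of one indexed sequence.

    fai_row is one parsed .fai line: (name, length, offset, linebases, linewidth).
    Hypothetical helper for illustration only.
    """
    _name, length, offset, linebases, linewidth = fai_row
    end = min(end, length)
    if start >= end:
        return ""

    def byte_pos(i: int) -> int:
        # Skip whole lines (linewidth bytes each, newline included), then the remainder.
        return offset + (i // linebases) * linewidth + i % linebases

    first, last = byte_pos(start), byte_pos(end - 1)
    with open(fasta_path, "rb") as fh:
        fh.seek(first)  # jump directly to the region; earlier contigs are never read
        chunk = fh.read(last - first + 1)
    # The chunk may span line breaks; strip them to get bare bases.
    return chunk.replace(b"\r", b"").replace(b"\n", b"").decode()


if __name__ == "__main__":
    fasta = b">contig_1 length=10\nACGTACGTAC\nGTAC\n>contig_2\nGGGG\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "assembly.fasta")
        with open(path, "wb") as fh:
            fh.write(fasta)
        # .fai row for contig_1: 14 bases, first base at byte 20, 10 bases / 11 bytes per line
        row = ("contig_1", 14, 20, 10, 11)
        print(fetch_region(path, row, 8, 12))  # prints ACGT
```

The region in the example crosses a line break, which is why the sketch uses both the bases-per-line and bytes-per-line columns to turn a base position into a byte position.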
To skip the index-creation step in Galaxy, you could pre-load your assembly as a new genome in IGV. Then promote the assembly in Galaxy to a “Custom Build”. Use the exact same “database” aka “dbkey” name in both places, and assign that “database” to the datasets you wish to visualize in IGV against that new genome/assembly.
Any fasta file can be used directly as a “Custom Genome” with tools wrapped in Galaxy natively, without any fasta or tool-specific indexing (that is done at job runtime, when needed). But, sometimes promoting a fasta/Custom genome to a “Custom Build” is helpful for other reasons – there are tools in Galaxy that also require an assigned “database” aka “dbkey”.
How to create/use a Custom Genome/Build in Galaxy:
IGV is a different application. The methods to create a “New Genome” in your own IGV are below. The preparation steps require that you index the fasta with samtools (samtools faidx) as part of that process. Alternatively, you can create the fasta index in Galaxy and download it along with the fasta.
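If you are curious what the fasta index actually contains, here is a minimal Python sketch that writes a .fai file with the same five tab-separated columns samtools faidx produces (name, length, byte offset of the first base, bases per line, bytes per line). It assumes an uncompressed fasta in which every sequence line except the last one of each record has the same width, and it does no validation. build_fai and write_fai are hypothetical names; samtools faidx (or the equivalent Galaxy tool) is still what you would normally use.

```python
import os
import tempfile


def build_fai(fasta_path: str) -> list:
    """Return one (name, length, offset, linebases, linewidth) tuple per sequence."""
    records = []
    name = None
    length = offset = linebases = linewidth = 0
    pos = 0  # byte offset of the start of the current line
    with open(fasta_path, "rb") as fh:
        for raw in fh:
            if raw.startswith(b">"):
                if name is not None:
                    records.append((name, length, offset, linebases, linewidth))
                # Like samtools, use the header text up to the first whitespace.
                name = raw[1:].split()[0].decode()
                length = linebases = linewidth = 0
                offset = pos + len(raw)  # first base follows the header line
            else:
                bases = len(raw.rstrip(b"\r\n"))
                if linebases == 0 and bases:
                    # The first sequence line of a record defines its line geometry.
                    linebases, linewidth = bases, len(raw)
                length += bases
            pos += len(raw)
    if name is not None:
        records.append((name, length, offset, linebases, linewidth))
    return records


def write_fai(fasta_path: str) -> str:
    """Write <fasta>.fai next to the fasta and return its path."""
    fai_path = fasta_path + ".fai"
    with open(fai_path, "w", newline="\n") as out:
        for row in build_fai(fasta_path):
            out.write("\t".join(str(x) for x in row) + "\n")
    return fai_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "assembly.fasta")
        with open(path, "wb") as fh:
            fh.write(b">contig_1 length=10\nACGTACGTAC\nGTAC\n>contig_2\nGGGG\n")
        with open(write_fai(path)) as fai:
            # prints two tab-separated rows: contig_1 14 20 10 11 and contig_2 4 46 4 5
            print(fai.read(), end="")
```

The .fai file is tiny compared with the assembly, which is why IGV (and Galaxy's IGV display) can build it once and then use it for every region you browse.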
Note: Certain other datatypes already have an attached index. BAM data is one example: the download icon for BAM data offers two datasets, the .bam (mapping results) and the .bai (index).
Whether to spend the extra effort of loading your fasta into IGV as a new genome, and creating a matching Custom Build in Galaxy so that you can assign that custom “database” to other datasets, is up to you. It will depend partly on how often you plan to use the assembly as a reference genome for other analyses.
Hope that helps explain the options!