Hello @mmomeni
Yes, you are correct!
The "?" means the dataset has not yet been assigned to a specific named fasta index (a database, or dbkey). A custom one cannot be assigned at runtime while mapping, but it can be assigned afterward, either to individual datasets or to all of the datasets in a collection at once.
Then, for this part
There is an extra optional step that can be done to fully set up a Custom reference genome. We call this step a Custom Build.
How Custom Builds are assigned and used
- Uploading a fasta file into your history and selecting it on tool forms is enough for some analyses. You can visualize that single file, and a single-use fasta index is generated for it automatically. We want this to be fast and easy!
- But if you plan to visualize multiple result files together, for example in a local IGV, then each dataset must be assigned to the same named fasta index in Galaxy. Then, in IGV, that same reference genome is set up as a custom genome under the same name. This shared label is what lets both applications load the data into the same genome assembly coordinate system.
You might see this called a database or a dbkey in different applications, but these all refer to the same thing: the fasta file itself plus its fasta index.
genome.fasta
genome.fasta.fai
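For anyone curious about what is inside that index: below is a minimal Python sketch that builds the five-column layout used by samtools faidx (the .fai format IGV reads). It assumes an uncompressed, well-formed FASTA where every sequence line except the last line of each record has the same width. The function name build_fai is made up for illustration. In practice, just let Galaxy's convert step create the .fai for you.

```python
import io


def build_fai(fasta_bytes: bytes) -> str:
    """Hypothetical helper: return .fai text for an uncompressed FASTA.

    Columns (tab-separated): name, length, offset, linebases, linewidth.
    Assumes equal-width sequence lines within a record (last line may be shorter)
    and no blank lines inside sequences.
    """
    records = []
    name = None
    length = offset = linebases = linewidth = 0
    pos = 0  # running byte position in the file
    for line in io.BytesIO(fasta_bytes):
        line_len = len(line)  # includes the newline bytes
        if line.startswith(b">"):
            if name is not None:
                records.append((name, length, offset, linebases, linewidth))
            # The sequence name is the header text up to the first whitespace
            name = line[1:].split()[0].decode()
            length = 0
            offset = pos + line_len  # first base starts right after the header line
            linebases = linewidth = 0
        elif name is not None:
            bases = len(line.rstrip(b"\r\n"))
            if linebases == 0:
                # The first sequence line defines the line layout for this record
                linebases = bases
                linewidth = line_len
            length += bases
        pos += line_len
    if name is not None:
        records.append((name, length, offset, linebases, linewidth))
    return "".join(f"{n}\t{l}\t{o}\t{lb}\t{lw}\n" for n, l, o, lb, lw in records)


if __name__ == "__main__":
    demo = b">chr1 description\nACGTACGT\nACG\n>chr2\nGGGG\n"
    print(build_fai(demo), end="")
```

Each line records a sequence name, its length, the byte offset of its first base, and the bases and bytes per line. That is what lets IGV jump straight to any region without reading the whole genome, and it is also why the fasta and its .fai must stay paired.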
FAQs outlining the steps!
- Start here → FAQ: How to use Custom Reference Genomes?
That FAQ links to these details
- FAQ: Adding a custom database/build (dbkey)
- FAQ: Changing database/build (dbkey)
- FAQ: Changing the datatype of a collection
- FAQ: Using IGV with Galaxy
We have practical help at this forum, too! See the tags custom-genome, custom-build, and igv.
- Example for BAM datasets and IGV. → BAM index, fasta indexes, display applications, custom genome builds, IGV - #4 by jennaj
- Another example for setting up the genome in IGV. → Is there a way to use the display in IGV tool so I don't have to download the genome separately? - #2 by jennaj
- Example for getting reference annotation data matched up! → Fail to load GFF3 into IGV - #2 by jennaj
Suggestions about what to do next
This is for if you want to show your students how to do this! It really is very powerful, and I think it helps them understand what is going on technically. If the students ever work on the command line in the future, all of this manipulation in Galaxy will transfer directly: setting up a reference genome consistently across multiple tools so that those tools can communicate.
- (optional) Place a copy of the genome fasta and any reference GTF/GFF3 into a dedicated history. Name the history! This makes it easier to remember what you used for your custom dbkeys, especially if you want to reuse them later on.
- Create the Custom Build in Galaxy.
- Assign your new database (dbkey) to your datasets.
- Create a genome.fasta.fai dataset in Galaxy (from the fasta dataset in your history → pencil icon, convert).
- Download the genome.fasta and genome.fasta.fai datasets to your computer.
- Use these two files to create the custom genome in a local IGV. The option in IGV is under Genomes → Load Genome from File. Be sure to use exactly the same dbkey name as you used in Galaxy!
- Then, back in Galaxy, to add a dataset to a visualization in IGV, click the visualize icon for the dataset and choose local IGV. The dataset is transferred over, served directly from Galaxy (no separate download step).
There are many examples of this out in the wild for command-line work; a web search for "custom genome IGV" will turn up tutorials. But the instructions above should be enough when starting from Galaxy, and they don't involve downloading all of the analysis data files (just the genome, and just once).
Please let us know if this helps!