I sequenced my plasmid, which is approximately 15.2 kb, and obtained a .bed file. I would like to create a .bam.bai file to visualize the alignment in IGV. Could you please help me obtain this file? Please give me a step-by-step guide; I don't know how to do this.
If you map reads (create a BAM file) in Galaxy, the .bam.bai index file will be created automatically. However, at 15.2 kb the plasmid might be too small for standard alignment tools.
Check this tutorial: Hands-on: Mapping (Sequence analysis)
It uses a built-in genome for read mapping, but all mapping tools can take a custom genome (a FASTA file in your history). When setting up the mapping job, change the genome Source from built-in to From history.
Kind regards,
Igor
Okay, after sequencing I have the following files: .bed, .gbk, .maf, .maf.index, .tsv, .fasta and .fastq. I want to align the whole-plasmid sequencing result to my designed plasmid, which is around 15.23 kb.
I tried making a .bam.bai file with the Bowtie tool. It gives me .bam and .bai files (two separate files). After this, I do not know how to create the .bam.bai file. Please give me a step-by-step guide.
If you already mapped with Bowtie in Galaxy, then @igor's advice applies.
The output.bam will be in your history as a dataset. The output.bam.bai index is not displayed as a separate dataset, but it is part of this compound dataset. Both files already exist, so if you want to use this data outside of Galaxy, click the disk icon: you will see a choice for both files. Download them separately to get both.
[Screenshot: a BAM dataset with the disk icon activated]
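The separate .bai file that Bowtie produced is already the BAM index; a ".bam.bai" file is the same index under a different name. If the two downloads end up with unrelated names, IGV may not find the index. Below is a minimal Python sketch, assuming IGV looks for the index next to the BAM as `<name>.bam.bai` (it also accepts `<name>.bai`). `pair_index` and the example file names are hypothetical, not part of Galaxy or IGV.

```python
import shutil
from pathlib import Path


def pair_index(bam_path: str, bai_path: str) -> Path:
    """Copy a downloaded index to '<bam_path>.bai' so it sits beside its BAM.

    Hypothetical helper: assumes the BAM and its index were downloaded
    separately from Galaxy and may not share a base name.
    """
    bam = Path(bam_path)
    bai = Path(bai_path)
    if bam.suffix != ".bam":
        raise ValueError(f"expected a .bam file, got {bam.name}")
    if not bai.is_file():
        raise FileNotFoundError(bai)
    # e.g. sample.bam -> sample.bam.bai, in the same folder as the BAM
    target = bam.with_name(bam.name + ".bai")
    if bai.resolve() != target.resolve():
        # copy rather than move, so the original download is kept
        shutil.copyfile(bai, target)
    return target
```

For example, `pair_index("sample.bam", "Galaxy5-index.bai")` would create `sample.bam.bai`; you would then open `sample.bam` in IGV with File > Load from File, and IGV should pick up the index automatically.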
Now, you also have more choices!
Single file hosting for display applications WITHOUT genomic fasta indexes
To view just that single BAM output in IGV, click the visualize icon to access the display applications. This still uses Galaxy as the data host, but when there is no database assignment, a single on-demand generic fasta index will be created for transfer to the display applications.
WARNINGS:
- There will be no genomic DNA sequence reference included because there is no fasta/fasta.fai index attached to the dataset!
- Additional files cannot be loaded into the same display.
Multiple file hosting for display applications WITH genomic fasta indexes
To view multiple datasets together in a display application, you can also click the visualize icon to access the display applications. This uses Galaxy as the data host, and when there is a database assignment, it attaches your genome's specific fasta index for transfer to the display applications.
BENEFITS:
- Genomic DNA backbone included in the display.
- Any dataset sharing the same database assignment can be loaded into the same display.
- Native database keys can be assigned, and custom database dbkeys can be created and assigned. Both work the same way!
Not sure how to create and assign a Custom database?
- FAQ: How to use Custom Reference Genomes?
- Many examples on this forum: see the tags custom-genome and custom-build
What to do?
Since you already have a genome FASTA file, creating a custom database build key and then assigning it to your datasets seems like a good choice! This will allow you to load all of the data into a local IGV application and view everything together.
In IGV, you will also need to set up your custom genome! If IGV is already set up with your custom genome, just make sure to label the custom genome in Galaxy the same way. The database “dbkey” label must be identical everywhere so that the applications use the same fasta index. Avoid mixing up assemblies here, or expect problems with the data coordinates.
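Both Galaxy and IGV rely on a FASTA index (.fai) for the custom genome. Tools such as `samtools faidx` normally create it, so you should not need to write one by hand; the sketch below only illustrates what that index contains. It assumes that, within each record, every sequence line except the last has the same length. `build_fai` is a hypothetical helper.

```python
from pathlib import Path


def build_fai(fasta_path: str) -> Path:
    """Write '<fasta_path>.fai' in the samtools faidx layout and return its path.

    Columns: name, sequence length, byte offset of the first base,
    bases per line, bytes per line (bases plus line ending).
    Assumes consistent line lengths within each record (not validated here).
    """
    entries = []
    current = None  # [name, length, offset, linebases, linewidth]
    offset = 0  # byte position of the start of the current line
    with open(fasta_path, "rb") as handle:
        for raw in handle:
            line_len = len(raw)
            stripped = raw.rstrip(b"\r\n")
            if stripped.startswith(b">"):
                # sequence name is the first word after '>'
                name = stripped[1:].split()[0].decode()
                current = [name, 0, offset + line_len, 0, 0]
                entries.append(current)
            elif current is not None and stripped:
                if current[3] == 0:
                    # first sequence line defines bases/bytes per line
                    current[3] = len(stripped)
                    current[4] = line_len
                current[1] += len(stripped)
            offset += line_len
    fai = Path(fasta_path + ".fai")
    fai.write_text("".join(
        f"{name}\t{length}\t{off}\t{bases}\t{width}\n"
        for name, length, off, bases, width in entries
    ))
    return fai
```

The first column is the sequence name that the BAM and any other tracks must also use for the plasmid.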
This topic has more details, some of which overlap with what is above, but it may provide some more context.
Connecting it all together
- IGV configured with your custom genome
- Galaxy configured with your custom genome
- the custom database dbkey (fasta index) assigned to datasets in Galaxy
- then, when the database dbkey in IGV is the same term used in Galaxy and you select the local IGV display from Galaxy, your datasets can all be loaded together into IGV!
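Matching dbkeys is one part; the sequence name and length inside the files should also agree with the FASTA used as the IGV genome, or reads will not line up. A minimal sketch, assuming you have the BAM header as text (for example from `samtools view -H`) and the plasmid FASTA as text. All function names here are hypothetical.

```python
def sam_header_refs(header_text: str) -> dict:
    """Map reference name -> length from the @SQ lines of a SAM/BAM header."""
    refs = {}
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            # tag fields look like SN:name and LN:length, tab-separated
            fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
            refs[fields["SN"]] = int(fields["LN"])
    return refs


def fasta_refs(fasta_text: str) -> dict:
    """Map sequence name (first word after '>') -> sequence length."""
    refs, name = {}, None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            name = line[1:].split()[0]
            refs[name] = 0
        elif name is not None:
            refs[name] += len(line.strip())
    return refs


def mismatches(sam_refs: dict, fa_refs: dict) -> list:
    """List references in the BAM header that the FASTA lacks or sizes differently."""
    problems = []
    for name, length in sam_refs.items():
        if name not in fa_refs:
            problems.append(f"{name}: missing from FASTA")
        elif fa_refs[name] != length:
            problems.append(f"{name}: BAM says {length} bp, FASTA has {fa_refs[name]} bp")
    return problems
```

An empty list from `mismatches(...)` suggests the BAM was mapped against the same plasmid sequence that IGV is using as the reference.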
You can also just download all your files and not use Galaxy to host the data, but you will still need to configure IGV with your custom genome if you want the genomic DNA sequence as the reference and plan to view all the files together.
Please give that a try and let us know if you need more help!