Converting BAM files to FASTQ?

Thanks for clarifying, @bacterial_dna!

This protocol generates Whole Genome Sequencing (WGS) reads. The software can output just the reads, or also some of the downstream data files such as mapping results and variant calls, depending on the options chosen for the run.

Your BAM file should include the read sequences and their base qualities, which is the same information found in the original fastq reads. It may hold just the reads (unaligned), or it may also contain the mapping results. It now sounds like you have the latter.

The reference genome used for the mapping will matter, since you will need a copy of it to run the BAM through a variant calling tool.

Did the bioinformatician also give you a copy of the reference genome fasta? Or do you know its accession identifiers? If not, we may be able to locate it anyway. You could also redo the mapping from scratch using the reads in the BAM, and I can help with that too.

Where to look: the “header” or top portion of your BAM will likely specify the genome used, in special lines that start with the @ character. The goal here is to identify the reference genome used, then to source it from NCBI and get it into your Galaxy history. This will allow you to use it in the next steps.
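
If you are curious what we will be looking for, here is a minimal Python sketch of how those header lines can be read once you have them as plain text. It assumes the standard SAM layout: tab-separated lines, with `@SQ` lines describing the reference sequences and `@PG` lines describing the programs that were run. The function name and the example header are made up for illustration, so your own header will look different.

```python
def parse_sam_header(header_text):
    """Collect @SQ (reference) and @PG (program) records from SAM header text."""
    references, programs = [], []
    for line in header_text.splitlines():
        if not line.startswith("@"):
            break  # header lines come first; stop at the first alignment line
        fields = line.split("\t")
        record_type = fields[0]
        if record_type not in ("@SQ", "@PG"):
            continue  # skip @HD, @RG, @CO and anything else
        # Each remaining field looks like TAG:value, e.g. SN:chr1 or LN:12345
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if record_type == "@SQ":
            references.append(tags)
        else:
            programs.append(tags)
    return references, programs


# Hypothetical header, for illustration only
example_header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:NC_000913.3\tLN:4641652\n"
    "@PG\tID:bwa\tPN:bwa\tCL:bwa mem ref.fasta reads_1.fastq reads_2.fastq\n"
)

references, programs = parse_sam_header(example_header)
for ref in references:
    print("reference sequence:", ref.get("SN"), "length:", ref.get("LN"))
for prog in programs:
    print("program:", prog.get("PN"), "command:", prog.get("CL"))
```

The `SN` value (sequence name) is often an accession that can be searched at NCBI, and the `CL` value on a `@PG` line sometimes shows the reference file name that was given to the mapper.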

We will be getting your reference data like this. → FAQ: NCBI reference data

The graphic you posted first is from a display application tool in Galaxy. If you go to this file again and click on the eye icon, you will be able to toggle into the Raw data tab and see the header in plain text. Would you like to post back a screenshot of that? Try to get all of the header lines and the first few data lines underneath (2 or 3 of those is enough). It is ok if this takes several screenshots.
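
Those first few data lines are also how we can confirm whether the reads are mapped. As a rough sketch, assuming the standard SAM columns (the second column is the FLAG, and bit `0x4` means "read unmapped"), a check could look like this. The function name and the two example lines are hypothetical.

```python
def count_mapped(lines):
    """Return (mapped, unmapped) counts for SAM alignment lines."""
    mapped = unmapped = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        flag = int(fields[1])  # column 2 is the bitwise FLAG
        if flag & 0x4:  # 0x4 set means the read is unmapped
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


# Hypothetical data lines, for illustration only
example_lines = [
    "r1\t99\tNC_000913.3\t100\t60\t4M\t=\t300\t204\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]

mapped, unmapped = count_mapped(example_lines)
print(f"mapped: {mapped}, unmapped: {unmapped}")
```

If every line you see has a reference name in the third column and a position in the fourth, the BAM holds mapping results rather than just reads.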

Then, for the basic variant calling protocol, we have an example in this tutorial.

Notice how this protocol:

  1. starts with fastq reads
  2. maps those reads to produce a BAM
  3. calls variants to produce a VCF file
  4. annotates the result with more information

Your data seems to be from right after step 2 (mapping). We can help you proceed to the next steps! Then we can help get your data prepared for the other types of investigation you asked about.

Worst case, we can help you to back up and run from step 1. What you have is enough to do this, and we can solve the current extraction error since I think I see what was likely going wrong.
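
For background on what backing up to step 1 involves, here is a small Python sketch of how one alignment record can be turned back into a fastq record. This is only an illustration of the idea under stated assumptions, not the method a Galaxy tool uses internally: it assumes the standard SAM columns, skips secondary and supplementary records, and restores the original read orientation when the reverse-strand bit (`0x10`) is set. The function name and example line are hypothetical.

```python
# Translation table for complementing bases (N stays N)
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_line_to_fastq(line):
    """Convert one SAM alignment line to a fastq record, or None if skipped."""
    fields = line.rstrip("\n").split("\t")
    name, flag = fields[0], int(fields[1])
    seq, qual = fields[9], fields[10]
    if flag & 0x900:
        return None  # secondary (0x100) or supplementary (0x800) record
    if seq == "*" or qual == "*":
        return None  # sequence or qualities were not stored
    if flag & 0x10:
        # Read was mapped to the reverse strand: undo the reverse complement
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    return f"@{name}\n{seq}\n+\n{qual}\n"


# Hypothetical reverse-strand record, for illustration only
example = "r1\t16\tNC_000913.3\t100\t60\t4M\t*\t0\t0\tAACG\tABCD"
print(sam_line_to_fastq(example), end="")
```

One caveat: if the mapper hard-clipped any reads, the clipped bases are not stored in the BAM, so the recovered reads can be shorter than the originals.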

So, please post back some screenshots of the BAM header so we can locate your reference genome. You could also generate a history share link and post that back into a reply since that will make our advice quicker and more specific! If you would rather share your history in a chat message, I am going to start one up now. I would really like to get this solved for you. What you want to do is definitely possible in Galaxy, including generating the types of output you could export for the other applications!

Thanks! :slight_smile: