samtools view problems when extracting subset of fastq

I used Minimap2 to align a set of nanopore reads to the contigs from a Flye assembly, which produced a BAM file (about 30 GB). I have since identified contamination, and I now have a subset of contigs for which I want to extract the aligned reads as FASTQ so that I can re-assemble. However, when I run the tool “samtools view” on this BAM file, specify the subset by manually providing a space-separated list of contig names (I want all reads that align to each contig), and indicate that I want the actual reads, I can still only choose BAM as the output format, and the resulting output file seems to be the same whether I give one contig or many. It seems to consist only of one column with the contig names (possibly also the length after a semicolon), with a file size of about 300 kB. There are no errors, but other tools, like “bamtofastq”, produce zero output when I try to extract FASTQ from this file. I am not sure what I am doing wrong here. Tools like “split bam” into unmapped reads work fine on the same 30 GB BAM file.
Need help.

Hi @Roger_Meisal

There are several tools to filter BAMs. Filter BAM is one, and this is the tool link on the EU server: Galaxy | Europe

For Filter BAM, to filter out all reads mapped to a specific contig, the usage would be something like this:

If I misunderstood, would you please share some data with an example?

Hi Jennaj. Thanks for the feedback.
When working with datasets that contain thousands of contigs, this solution is not feasible, I am afraid. I found a workaround though.

I don't know why, but the tool “samtools view” is not working when the chromosome or contig ids are provided manually. However, when I provided a BED file with the contig name, start and end positions for each fragment, “samtools view” finally extracted the fastq subset for the fragments in the BED file.

By using the GFF file (I think it was) from the assembly used to build the BAM file, the id and length of each fragment can be extracted. You only need to insert an additional column with start position 0 and then use a list of contig ids to filter/split the complete list into two. For me, this was one list containing all contaminants and one containing the decontaminated genome. Then change the data type to BED. It could be that the BAM and/or BED files need sorting before running samtools view, but this solution worked for me.
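
For anyone who wants to build the two BED lists outside Galaxy, the steps above could be sketched roughly like this in Python. This is only an illustration, not the exact procedure used here: the function name `split_contigs_to_bed`, the contig names and the lengths are all made up, and it assumes you already have each contig id and its length available as pairs.

```python
def split_contigs_to_bed(contig_lengths, contaminant_ids):
    """Split (contig_id, length) pairs into two lists of BED lines.

    Each BED line is "contig_id<TAB>0<TAB>length", i.e. the whole contig,
    with the start position set to 0 as described in the post above.
    Returns (contaminant_bed, clean_bed).
    """
    contaminant_ids = set(contaminant_ids)
    contaminant_bed, clean_bed = [], []
    for name, length in contig_lengths:
        line = f"{name}\t0\t{length}"
        if name in contaminant_ids:
            contaminant_bed.append(line)
        else:
            clean_bed.append(line)
    return contaminant_bed, clean_bed


# Hypothetical example data: contig ids and lengths from the assembly.
contigs = [("contig_1", 5000), ("contig_2", 1200), ("contig_3", 800)]
# Hypothetical list of contigs flagged as contamination.
contaminants = ["contig_2"]

contam_bed, clean_bed = split_contigs_to_bed(contigs, contaminants)

# Print the BED lines for the decontaminated genome.
print("\n".join(clean_bed))
```

On the command line, a BED file written from `clean_bed` could then probably be used along the lines of `samtools view -b -L clean.bed -o clean.bam input.bam`, followed by `samtools fastq clean.bam > clean.fastq`. The file names here are placeholders, and I have not checked how this maps onto the Galaxy tool form.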
