Filtering out host genomic sequences from Illumina paired-end reads

chaic · March 7, 2022, 10:12am

Hi Galaxians. I have obtained Illumina paired-end reads of microbiome metagenomes from a human sample and would now like to remove human (contaminant) sequences from them. I have tried installing Bowtie2 on Anaconda but didn’t get very far as I am new to Anaconda and Bowtie2. I found some instructions at this link for doing this but sadly, I do not know how to implement it. Is it possible to add this function to Bowtie2 in Galaxy? Or is there a tool in Galaxy that can perform a similar function?

Thank you in advance for your help.

wm75 · March 7, 2022, 8:07pm

There is Removal of human reads from SARS-CoV-2 sequencing data. It demonstrates the approach with SARS-CoV-2 sequencing reads, but if you work through it, it should be fairly obvious how to apply it to your data, I hope.

chaic · March 8, 2022, 6:12am

Thank you @wm75. I am exploring that now. My issue is that I only have one set of paired-end reads but the example used 2 sets and grouped the data into collections. I will try to figure it out.

chaic · March 11, 2022, 8:23am

For those who are interested, I found out that Bowtie2 can perform that function in Galaxy. In the Bowtie2 window, select Yes for “Write unaligned reads (in fastq format) to separate file(s)”. All those reads that do not map to hg38 (or any other reference genome of your choice) will be written to those files.
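For anyone who would rather run this outside Galaxy, here is a minimal sketch of what I believe is the roughly equivalent Bowtie2 command line, built in Python. It assumes Bowtie2 is installed and that an hg38 index already exists. The index prefix and file names are placeholders I made up, and the function name is purely illustrative. I have not checked exactly which flags the Galaxy wrapper passes, so treat the mapping to `--un-conc` as an assumption.

```python
import shlex


def bowtie2_host_filter_cmd(index_prefix, reads_1, reads_2,
                            unaligned_template, sam_out="/dev/null",
                            threads=4):
    """Build (but do not run) a Bowtie2 command for paired-end host filtering.

    All paths are hypothetical placeholders supplied by the caller.
    """
    return [
        "bowtie2",
        "-p", str(threads),         # number of alignment threads
        "-x", index_prefix,         # prefix of the host (e.g. hg38) index
        "-1", reads_1,              # forward reads
        "-2", reads_2,              # reverse reads
        # Pairs that fail to align concordantly are written here;
        # Bowtie2 replaces "%" with 1 and 2 for the two mate files.
        "--un-conc", unaligned_template,
        "-S", sam_out,              # alignments themselves (discarded by default)
    ]


if __name__ == "__main__":
    cmd = bowtie2_host_filter_cmd(
        "hg38_index/hg38",          # assumed index location
        "sample_R1.fastq",          # assumed input file names
        "sample_R2.fastq",
        "sample_nonhost_R%.fastq",
    )
    # Print the command for inspection; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Note that `--un-conc` keeps every pair that does not align concordantly, so a pair in which only one mate maps to the host still ends up in the output files.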

I tried the method detailed in Removal of human reads from SARS-CoV-2 sequencing data but still find that the majority of scaffolds assembled from the filtered reads are human sequences. One of them is a 13 kbp human mitochondrial genome. There are also sequences like “Homo sapiens contig freeze2_XXXX genomic sequence” and “homo sapiens chromosome 5 clone RP11-455D3 complete sequence”, which I do not understand. I thought the filtering process should get rid of all human reads. Could this be because hg38 is an incomplete draft of the human genome? I read that the human genome has recently been fully completed. Will we see hg39 anytime soon?
