viral integration and overlapped human-viral genome sequences

waqar · March 4, 2019, 4:43pm

Hi all,
Galaxy is a wonderful place for NGS data analysis, but I am having trouble searching for viral integration. Briefly, I want to look for viral integration sites in the human genome. I am not a bioinformatician and I am new to this community. I found several Linux-based programs for detecting viral integration, but none of them worked for me because I do not know how to use Linux. I would be grateful if you could guide me on how to find these sites using Galaxy: how do I get the overlapping human-virus sequences, and how do I analyse them?
Thanks in advance

mvdbeek · March 4, 2019, 4:47pm

Do you already know what viral sequences to look for and what Galaxy server you are going to use?

waqar · March 4, 2019, 5:04pm

Hi mvdbeek,
I am using https://usegalaxy.org/
I know the viral sequence: I am working on MMTV. The problem is that it matches mouse sequence too. Unfortunately, it also has sequence similarity with HMTV (the human virus). We want to check whether this is a true integration or just contamination from a mouse sample. I used Bowtie2 for alignment with default settings. Then I aligned the unmapped reads against my viral sequence and found several viral hits. Although 98.1% of my data aligned to hg38 and I also found my viral sequence in the unmapped data, I am not able to find overlapping viral-human sequences.

mvdbeek · March 4, 2019, 5:12pm

If you have proper integrations you should be able to find both split reads and mate pairs at the viral integration site. I think you should actually align independently to the human genome, the viral genome, and the mouse genome. You can then find the reads that align to the viral sequence and the human genome, but not the mouse genome. This is a little tricky, so I wrote a tool to do that (https://toolshed.g2.bx.psu.edu/view/mvdbeek/bam_readtagger/08656cd6c989). I’ll see if it can be installed on usegalaxy.org.
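
To make the filtering idea concrete, here is a minimal Python sketch of the set logic only. It is not how bam_readtagger works internally. It assumes the three independent alignments have been exported as SAM text, and the function names (`mapped_read_names`, `is_clipped`, `candidate_integration_reads`) and the 20-base clip threshold are made up for illustration. Because both mates of a pair share one read name, a pair with one mate in the human genome and the other in the virus ends up in both sets.

```python
import re


def mapped_read_names(sam_lines):
    """Collect the names of reads that have at least one mapped alignment."""
    names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:  # SAM flag bit 0x4 means "read unmapped"
            continue
        names.add(fields[0])
    return names


def is_clipped(cigar, min_clip=20):
    """True if the CIGAR starts or ends with a clip of at least min_clip bases.

    A long clip is only a hint that the read might be a split read;
    min_clip=20 is an arbitrary illustrative threshold.
    """
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    if not ops:
        return False  # e.g. "*" for reads without an alignment
    return any(op in "SH" and int(n) >= min_clip for n, op in (ops[0], ops[-1]))


def candidate_integration_reads(human_sam, virus_sam, mouse_sam):
    """Read names mapped to both human and virus, but never to mouse."""
    human = mapped_read_names(human_sam)
    virus = mapped_read_names(virus_sam)
    mouse = mapped_read_names(mouse_sam)
    return (human & virus) - mouse


# Tiny hand-made SAM records: name, flag, reference, position, MAPQ, CIGAR, ...
human_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t60M40S\t*\t0\t0\t*\t*",
    "r2\t0\tchr2\t500\t60\t100M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
virus_sam = [
    "r1\t0\tMMTV\t10\t60\t60S40M\t*\t0\t0\t*\t*",
    "r2\t0\tMMTV\t900\t60\t100M\t*\t0\t0\t*\t*",
    "r3\t0\tMMTV\t300\t60\t100M\t*\t0\t0\t*\t*",
]
mouse_sam = [
    "r2\t0\tchr5\t42\t60\t100M\t*\t0\t0\t*\t*",
]

candidates = candidate_integration_reads(human_sam, virus_sam, mouse_sam)
print(sorted(candidates))
```

In this toy input only `r1` survives: `r2` also maps to mouse and `r3` never maps to human. Real data would need more care, for example mapping-quality cutoffs and a local alignment mode so that split reads are reported with clipped ends at all.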

waqar · March 4, 2019, 5:19pm

Thanks for your quick reply. I appreciate your contributions towards this and will look at the tool. Hopefully it will work with Galaxy. Could you please advise which alignment tool is best (Bowtie2 or BWA) and at what level of sensitivity? I ran Bowtie2 at the very sensitive level but got an error message after 2 days (the job took longer than expected). My paired data is almost 60GB compressed and 300GB uncompressed. I think this is causing the problem…

mvdbeek · March 4, 2019, 5:22pm

That shouldn’t be much of an issue. I usually use bwa-mem for genome alignments and bowtie2 in very sensitive mode for transposable element alignments as bowtie2 handles reads aligning to the beginning and end of contigs much better. If the alignment fails you should probably report this (using the little bug icon on the red dataset), I guess it can be fixed by allocating more resources.

waqar · March 4, 2019, 5:22pm

Thanks for the link. It is working here. Could you please let me know how to use this tool in Galaxy? I think it is a Linux-based command…

Regards

mvdbeek · March 4, 2019, 5:23pm

It’ll take a while, I’ll let you know when it is available.

waqar · July 6, 2020, 8:32am

Hi mvdbeek,
Any update on the readtagger tool? Is it available in the Galaxy Tool Shed or not?

Regards
Waqar
