Galaxy is a wonderful place for NGS data analysis, but I am having trouble searching for viral integration. Briefly, I want to look for viral integration sites in the human genome. I am not a bioinformatician and I am new to this community. I found several Linux-based programs for detecting viral integration, but none of them worked for me because I do not know how to use Linux. I would be grateful if you could guide me on how to find these sites using Galaxy: how to get the overlapping human-virus sequences and how to analyse them.
Thanks in advance
Do you already know what viral sequences to look for and which Galaxy server you are going to use?
I am using https://usegalaxy.org/
I know the viral sequence; I am working on MMTV. The problem is that it matches mouse sequence too, and unfortunately it also has sequence similarity with HMTV (a human virus). We want to check whether this is a true integration or just contamination with a mouse sample. I used Bowtie2 for alignment with default settings. Then I aligned the unmapped reads against my viral sequence and found several viral hits. Although 98.1% of my data mapped to hg38 and I also found my viral sequence in the unmapped reads, I am not able to find overlapping viral-human sequences.
If you have proper integrations you should be able to find both split reads and mate pairs at the viral integration site. I think you should actually align independently to the human genome, the viral genome, and the mouse genome. You can then find the reads that align to the viral sequence and the human genome, but not the mouse genome. This is a little tricky, so I wrote a tool to do that (https://toolshed.g2.bx.psu.edu/view/mvdbeek/bam_readtagger/08656cd6c989). I’ll see if it could be installed on usegalaxy.org.
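To make the filtering idea concrete, here is a minimal sketch in plain Python. It is not how the linked tool works internally; it only illustrates the set logic "aligned to virus and human, but not mouse". It assumes the three independent alignments are available as SAM text, and the function names and toy records are hypothetical. The only format detail relied on is that SAM header lines start with `@` and that FLAG bit 0x4 marks an unmapped record.

```python
def mapped_read_names(sam_lines):
    """Return the names of reads that have at least one mapped record."""
    names = set()
    for line in sam_lines:
        # Skip blank lines and SAM header lines (they start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        # FLAG bit 0x4 set means this record is unmapped.
        if not flag & 0x4:
            names.add(qname)
    return names


def candidate_integration_reads(human_sam, virus_sam, mouse_sam):
    """Read names mapped to both human and virus, but not to mouse."""
    human = mapped_read_names(human_sam)
    virus = mapped_read_names(virus_sam)
    mouse = mapped_read_names(mouse_sam)
    return sorted((human & virus) - mouse)


# Toy records (hypothetical): name, flag, reference, position, MAPQ, CIGAR, ...
HUMAN = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tchr2\t200\t60\t50M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
VIRUS = [
    "r1\t0\tMMTV\t10\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tMMTV\t20\t60\t50M\t*\t0\t0\t*\t*",
    "r3\t0\tMMTV\t30\t60\t50M\t*\t0\t0\t*\t*",
]
MOUSE = [
    "r1\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r2\t0\tchr5\t300\t60\t50M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]

print(candidate_integration_reads(HUMAN, VIRUS, MOUSE))
```

Because both mates of a pair share one read name, a pair with one mate on the human genome and the other on the virus ends up in both sets, so this simple intersection picks up mate-pair evidence as well as reads that align to both references. Real data would need more care (mapping quality, which part of a split read aligns where), which this sketch does not attempt.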
Thanks for your quick reply. I appreciate your contribution and will look at the tool. Hopefully it will work with Galaxy. Could you please advise which alignment tool is best (Bowtie2 or BWA) and at what sensitivity level? I ran Bowtie2 in very sensitive mode but got an error message after 2 days (taking more time than expected). My paired data is almost 60 GB compressed and 300 GB uncompressed. I think this is causing the problem…
That shouldn’t be much of an issue. I usually use bwa-mem for genome alignments and bowtie2 in very sensitive mode for transposable element alignments, as bowtie2 handles reads aligning to the beginning and end of contigs much better. If the alignment fails you should probably report this (using the little bug icon on the red dataset); I guess it can be fixed by allocating more resources.
Thanks for the link. It is working here. Could you please let me know how to use this tool in Galaxy? I think it is a Linux-based command…
It’ll take a while, I’ll let you know when it is available.