Aligning Illumina RNAseq to SARS-CoV2 genome

genomcc · April 22, 2021, 8:12pm

Hi, I have RNAseq data (Illumina Total RNA) from whole blood RNA of COVID19 patients. Based on ddPCR, I think some have detectable virus in blood. The alignment to hg38 using HISAT2 is great, but when I substitute the SARS-CoV2 genome, I get very few alignments, and the number does not differ between controls and COVID19 patients: maybe 50 alignments out of 17M paired-end reads. Any ideas on what could be wrong, or a source for a workflow? Many thanks, Tim

wm75 · April 23, 2021, 8:24am

If you’re just interested in potential SARS-CoV-2 sequences, I wouldn’t necessarily recommend HISAT2, at least not as the only tool.

Options I’d explore instead:

1. Do one pass of HISAT2 alignment, then filter for unmapped reads and align those using bwa-mem (or bowtie2).
2. Generate a combined human/SARS-CoV-2 FASTA, then use bwa-mem (or bowtie2) for a single round of alignment, after which you filter for alignments to the viral genome.
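
As a rough illustration of the single-round approach (combined reference, one bwa-mem run, then keep only the viral alignments), here is a minimal Python sketch. It assumes `bwa` and `samtools` are installed and on the PATH, that the combined FASTA already exists, and that the viral contig is named NC_045512.2. The function names, file names and thread count are hypothetical placeholders, not part of any Galaxy workflow.

```python
import subprocess


def build_commands(reference, reads_1, reads_2, out_bam, viral_bam,
                   viral_contig="NC_045512.2", threads=4):
    """Return the command lines for a single-round combined-reference mapping.

    All paths are placeholders supplied by the caller.
    """
    return {
        # Build the bwa index for the combined human + viral FASTA.
        "index_reference": ["bwa", "index", reference],
        # Map the paired-end reads; SAM is written to stdout.
        "map": ["bwa", "mem", "-t", str(threads), reference, reads_1, reads_2],
        # Coordinate-sort the SAM stream read from stdin into a BAM file.
        "sort": ["samtools", "sort", "-o", out_bam, "-"],
        # Index the sorted BAM so a region/contig can be extracted.
        "index_bam": ["samtools", "index", out_bam],
        # Keep only the alignments to the viral contig.
        "extract_viral": ["samtools", "view", "-b", "-o", viral_bam,
                          out_bam, viral_contig],
    }


def run_pipeline(cmds):
    """Run the commands in order, piping the mapper output into the sorter."""
    subprocess.run(cmds["index_reference"], check=True)
    mapper = subprocess.Popen(cmds["map"], stdout=subprocess.PIPE)
    subprocess.run(cmds["sort"], stdin=mapper.stdout, check=True)
    mapper.stdout.close()
    if mapper.wait() != 0:
        raise RuntimeError("bwa mem exited with an error")
    subprocess.run(cmds["index_bam"], check=True)
    subprocess.run(cmds["extract_viral"], check=True)
```

Calling `build_commands("combined.fa", "R1.fastq", "R2.fastq", "all.bam", "viral.bam")` only builds the command lists, so they can be inspected before anything is run with `run_pipeline`.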

The published history "SARS-CoV-2/human combined ONT reference" on Galaxy Europe is an example history for generating the combined genome. You can import it and use it as your starting point, but since your data is Illumina data (not ONT), you would want to concatenate only the hg38 and the NC_045512.2 FASTA datasets.
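
Outside Galaxy, the concatenation itself is simple. The following is a minimal sketch, assuming both references are available as uncompressed FASTA files; the function name and the file names in the usage note are hypothetical.

```python
def concatenate_fasta(inputs, output, chunk_size=1 << 20):
    """Concatenate FASTA files into one file, copying in chunks.

    A newline is added after any input that does not end with one, so the
    header of the next file always starts on its own line.
    """
    with open(output, "wb") as out:
        for path in inputs:
            last_byte = b"\n"  # an empty input needs no extra newline
            with open(path, "rb") as src:
                while True:
                    chunk = src.read(chunk_size)
                    if not chunk:
                        break
                    out.write(chunk)
                    last_byte = chunk[-1:]
            if last_byte != b"\n":
                out.write(b"\n")
```

For example, `concatenate_fasta(["hg38.fa", "NC_045512.2.fasta"], "combined.fa")` would write the human sequences followed by the viral genome. Sequence names must be unique across the inputs, which is the case for hg38 chromosome names and the NC_045512.2 accession.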

Here’s a workflow that illustrates the approach for ONT data and uses minimap2 as the mapper:
"SARS-CoV-2: map ONT reads to transcripts" (published workflow on Galaxy Europe)

It shouldn’t be hard to modify it for your use case.

The rationale behind these recommendations: HISAT2 is a great tool for aligning spliced eukaryotic RNA reads, but it isn’t optimized for viral RNA. A general-purpose mapper (like bwa-mem or bowtie2) is sufficient for the viral data and may in fact do a better job.
My first suggestion might be the best way to get optimal sensitivity for both human and viral reads. The second option should be very comparable (or even more sensitive) if you care about the viral reads only: it will do a good enough job of getting the vast majority of human reads out of the way and will discover SARS-CoV-2 alignments reliably.
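
To check whether the number of viral alignments actually differs between controls and patients, one could count mapped primary alignments per reference sequence in each sample. This is a minimal sketch, assuming the mapper output is available as SAM text and that the viral contig is named NC_045512.2; the function name is hypothetical. With paired-end data, each mate is counted separately.

```python
from collections import Counter

VIRAL_CONTIG = "NC_045512.2"  # assumption: contig name used in the reference


def count_primary_alignments(sam_lines):
    """Count mapped primary alignments per reference from SAM text lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # SAM records have 11 mandatory fields
            continue
        flag = int(fields[1])
        reference_name = fields[2]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800).
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        counts[reference_name] += 1
    return counts
```

Reading a SAM file with `with open("sample.sam") as fh: counts = count_primary_alignments(fh)` and then looking at `counts[VIRAL_CONTIG]` gives one number per sample that can be compared across the cohort.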

genomcc · April 26, 2021, 1:17pm

Thanks very much, that’s kinda what I was thinking, so the specifics really help!