DESeq2 with HISAT2 output?

Hi everybody :slight_smile:

I am working with smallRNA. I followed this protocol (Differential abundance testing of small RNAs).

My aim is to do differential expression analysis of siRNA and piRNA using DESeq2. In this protocol they use DESeq2 but for abundance estimation, which is not my aim.

There is another protocol specific for miRNA (Whole transcriptome analysis of Arabidopsis thaliana) which uses DESeq2 for differential expression analysis. I want to do the same but with smallRNA.

Problem: DESeq2 requires a count table, but I don’t have it. In the miRNA protocol, they did miRDeep2 to obtain it. But in the smallRNA protocol, they use HISAT2 against rRNA and miRNA and keep the reads not aligned, after they do transformation from BAM to FASTQ. However, DESeq2 does not accept this file as input, as it is not a count table.

Any ideas how I could obtain the count table as input for DESeq2 after doing HISAT2 to obtain the non-rRNA/miRNA reads?

Thank you!

Hi @LaiaGutierrez
assuming you mapped reads to miRNAs idxstats reports number of mapped reads in 3d column.
Hope this helps.
Kind regards,
Igor

Hi @igor ,

Thanks for your reply. The problem is, I after running HISAT2 against rRNA and miRNA databases and filtering those non-rRNA/miRNA with BAM tools, I don’t obtain a table, but a file like this:


But DESeq2 asks me for a counts table.

Any idea? Thanks!

Hi @Letizia_Iuffrida
I am sorry for unclear explanation. I mis-read your question. Do you have list of ‘targets’ for piRNAs and siRNAs? If yes, map the filtered reads (unaligned to rRNA and miRNA) to genome and count against the annotated tagets. If the targets are are in GTF/GFF format, try featureCounts. For intervals bedtools might be a good option, but filter multimappers before counting.
Kind regards,
Igor

If you deal with repeated sequences, map reads to unique set of sequences, in the same way as for miRNA. Some miRNA loci produce identical mature miRNA, hence we map to mature miRNAs to avoid ambiguity of multimapping.
Kind regards,
Igor

Sorry,
I think you tagged me by mistake instead of @LaiaGutierrez :slight_smile:

oops, sorry! It is Friday evening here. Time for a break.

1 Like

Thanks @igor,
I don’t have targets for piRNA and siRNA. I just want to identify them. Can I align them to the human genome?

Thanks!

Yes, short reads can be aligned to the human genome. If you map with HiSAT2 maybe disable spliced alignment in Alignment options or use genomic aligners such as bowtie(2) or BWA. Short reads might map to multiple positions, especially if derived from mobile elements or repeats. Usually multimapped reads are ignored during counting step. For counting table you need a file with positions of regions of interest.

1 Like

So, to sum up:

  1. HISAT2 to align against miRNA and rRNA reference databases.
  2. BAM to keep those that did not align.
  3. Feature counts using the human genome (.GFF)
  4. DESeq2

Right?

For featureCounts you need an annotation.
Maybe something like this:
filter out reads derived from miRNA and rRNA (map using HiSAT2 to miRBase and rRNA sequences and write unmapped reads to FASTQ files.
map these reads to genome
Identify regions of interest.
Maybe get a published protocol for piRNA and siRNA and reproduce all the steps in Galaxy. These RNAs might require different approach in identification. I believe piRNAs are usually longer, so consider filtering by length.
Kind regards,
Igor

1 Like