I am sequencing miRNA and other small RNA. I found a protocol to obtain the counts for smallRNA. However, this involves getting rid of miRNAs. I wanted to know if there is another protocol to obtain the miRNAs.
In this protocol for smallRNA they also align the reads to reference sequences from Drosophila. I am working with humans. I would like to know where I can download these annotation and reference files of miRNA and rRNA.
Dear @LaiaGutierrez,
I am not certain which training material you followed, but you can get the counts for your miRNAs if you have an annotation file for them. I assume you did something like feature counting with a BED or GTF/GFF file that contained small RNA regions. If you replace that file with one that contains miRNA annotations, you will get miRNA counts instead.
You can obtain annotations for dm6 (Drosophila) from UCSC or Ensembl.
I think I have to align my reads to a miRNA reference sequences file. But I am not sure where to find these reference sequences. Is it the same as the annotation file?
In my case I am working with humans. I attach a picture, so you can tell me if this is what I have to download. In the protocol I followed they use an annotation file and two different reference sequences, one for rRNA and one for miRNA. I don’t know where to find this.
I also don’t know what exactly is the difference between these files (BED, GTF/GFF…).
I am very new to bioinformatics and I have trouble with some concepts!
I think I have to align my reads to a miRNA reference sequences file.
Yes and no. You have three options: (A) map only to the reference genome; (B) map only to the miRNA sequences from a known database; (C) combine A and B. Option (B) relies on the database, so it is quicker, but it may give misleading counts because of multi-mapped reads, and it does not cover unknown miRNA transcripts. I am not 100% familiar with the detailed differences, but a combination (option C) may be the better approach for human data, as mentioned in this article.
But I am not sure where to find these reference sequences.
No, they are not the same. A reference file for alignment is typically in FASTA format. An annotation file is usually in GTF/GFF format and is used, e.g., for counting and overlap analysis.
In my case I am working with humans. I attach a picture, so you can tell me if this is what I have to download. In the protocol I followed they use an annotation file and two different reference sequences, one for rRNA and one for miRNA. I don’t know where to find this.
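To make the difference between the two file types more concrete, here is a minimal Python sketch that reads one record of each kind. The record name `example-mir-1`, the gene ID `G1`, and the coordinates are made up for illustration and do not come from any real database. A FASTA record carries the sequence itself, while a GTF line only carries coordinates and attributes that point into a reference.

```python
def parse_fasta(text):
    """Return {record_name: sequence} from FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: the record name is the first word after '>'.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            # Sequence line: a sequence may be wrapped over several lines.
            records[name] += line
    return records


def parse_gtf_line(line):
    """Split one GTF feature line into its nine tab-separated columns."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError("a GTF feature line has exactly 9 columns")
    seqname, source, feature, start, end, score, strand, frame, attributes = columns
    return {
        "seqname": seqname,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


# Hypothetical example records (names and coordinates are invented).
EXAMPLE_FASTA = ">example-mir-1 hypothetical record\nUGAGGUAGUAG\nGUUGUAUAGUU\n"
EXAMPLE_GTF = 'chr1\texample\tgene\t1000\t1080\t.\t+\t.\tgene_id "G1"; gene_biotype "miRNA";'

sequences = parse_fasta(EXAMPLE_FASTA)   # sequence data, used for alignment
feature = parse_gtf_line(EXAMPLE_GTF)    # coordinates, used for counting
```

So the aligner needs the FASTA file, and the counting step needs the GTF/GFF (or BED) file.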
I think for human it is better to go to Ensembl and download the Gene sets file (GTF), which you can then filter for miRNAs.
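If you prefer to do the filtering outside of Galaxy, a rough Python sketch could look like the one below. It assumes the GTF uses the `gene_biotype "miRNA"` attribute, as Ensembl GTF files do (other sources may name the attribute differently, e.g. `gene_type`). The function name and the file names in the usage note are placeholders I made up.

```python
import re

# Ensembl GTF files store the biotype in the 9th column as: gene_biotype "miRNA";
BIOTYPE_PATTERN = re.compile(r'gene_biotype "([^"]+)"')


def filter_gtf_by_biotype(lines, biotype="miRNA"):
    """Yield header lines plus feature lines whose gene_biotype matches."""
    for line in lines:
        if line.startswith("#"):
            # Keep the comment/header lines at the top of the file.
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            # Skip blank or malformed lines.
            continue
        match = BIOTYPE_PATTERN.search(columns[8])
        if match and match.group(1) == biotype:
            yield line


# Small invented example; a real file would be read line by line instead.
example_lines = [
    "#!genome-build example\n",
    'chr1\texample\tgene\t1000\t1080\t.\t+\t.\tgene_id "G1"; gene_biotype "miRNA";\n',
    'chr1\texample\tgene\t5000\t9000\t.\t-\t.\tgene_id "G2"; gene_biotype "protein_coding";\n',
]
mirna_lines = list(filter_gtf_by_biotype(example_lines))
```

With a downloaded file you would open it and write the result out, for example `out.writelines(filter_gtf_by_biotype(open("annotation.gtf")))`, where `annotation.gtf` stands for whatever file you downloaded.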
I also don’t know what exactly is the difference between these files (BED, GTF/GFF…).
You can find a description of various file formats here.