Files for miRNA differential expression analysis


I have some files obtained by small non-coding RNA sequencing in humans. My aim is to identify the miRNAs and perform a differential expression analysis.
I am following this protocol: Hands-on: miRNA differential expression analysis (YouTube).
It says that I need some reference files, but I’m new to the field and I don’t understand the difference between them or where I can download them. These files are:

  • annotation.gtf
  • transcriptome.fasta
  • star_miRNA_seq.fasta
  • mature_miRNA.fasta
  • miRNA_stem-loop_seq.fasta

I searched miRBase and found some reference files, but I am not sure which ones are correct or where to find the missing ones. I attach a picture.

Can anyone tell me where I can download them and what the differences between them are (especially between the annotation and transcriptome files, beyond the format, and also between the star miRNA file and the other two)? Thank you so much! 🙂

Hi @LaiaGutierrez,

Here is some background on the files:

  • Annotation.gtf
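    The Gene Transfer Format (GTF) is a file format used for describing genes and other features of DNA, RNA, and protein sequences. It contains no sequences itself, only the genomic coordinates (chromosome, start, end, strand) of genes, transcripts, exons, and other features, together with their attributes such as IDs, names, and biotypes. You can download it from GENCODE; this file corresponds to the Comprehensive gene annotation dataset.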

  • Transcriptome.fasta
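    This file contains the nucleotide sequences of all annotated human transcripts, one FASTA record per transcript. In short, the GTF tells you where each transcript is located in the genome, while the transcriptome FASTA gives you its actual sequence. You can download it from GENCODE; this file corresponds to the Transcript sequences dataset.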

  • mature_miRNA.fasta
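    This file contains the mature miRNA sequences, i.e. the short (~22 nt) functional miRNAs. You can download it from miRBase; this file corresponds to the mature miRNA sequences dataset. Note that it includes all species in miRBase, so you may need to keep only the human (hsa-) entries.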

  • miRNA_stem_loop_seq.fasta
    This file contains the hairpin (stem-loop) sequences of the precursor miRNAs (pre-miRNAs), from which the mature miRNAs are excised. You can download it from miRBase; this file corresponds to the hairpin miRNA sequences dataset.

  • star_miRNA_seq.fasta
    This file contains the miRNA* (star) sequences. Processing a pre-miRNA yields a duplex of two strands, the mature miRNA and the star strand, and the star strand was traditionally assumed to be degraded and non-functional. However, this nomenclature has since been changed, because experimental results indicate that star sequences are frequently functional; miRBase now names the two arms of a precursor -5p and -3p instead. For this reason, star sequences are no longer available as a separate file in miRBase (they have been integrated into the mature sequences). This file is optional.
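A practical note on the miRBase files: the mature and hairpin FASTA files contain sequences from every species in miRBase, and the sequences are written with U rather than T. The minimal Python sketch below keeps only the human records (IDs starting with hsa-) and can optionally convert U to T. Whether your downstream aligner needs the DNA alphabet is an assumption to check against its documentation; the function names and the two shortened example records are illustrative only, not part of any tool.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def filter_species(records: Iterable[Record], prefix: str = "hsa-",
                   rna_to_dna: bool = False) -> Iterator[Record]:
    """Keep records whose ID (first word of the header) starts with prefix."""
    for header, seq in records:
        if header.split()[0].startswith(prefix):
            yield header, (seq.replace("U", "T") if rna_to_dna else seq)


def write_fasta(records: Iterable[Record], path: str) -> None:
    """Write records to path, one header line and one sequence line each."""
    with open(path, "w") as out:
        for header, seq in records:
            out.write(f">{header}\n{seq}\n")


# Shortened, illustrative headers in miRBase style (human and mouse let-7a).
example = """>hsa-let-7a-5p Homo sapiens let-7a-5p
UGAGGUAGUAGGUUGUAUAGUU
>mmu-let-7a-5p Mus musculus let-7a-5p
UGAGGUAGUAGGUUGUAUAGUU
"""
human = list(filter_species(read_fasta(example.splitlines()), rna_to_dna=True))
print(human)
```

To apply it to a downloaded file, something like `with open("mature.fa") as fh: write_fasta(filter_species(read_fasta(fh)), "mature_hsa.fa")` should work; both file names are just examples.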

I suggest having a look at the training, since it may help you understand the pipeline: miRNA data analysis.



Hi @gallardoalba

Thanks for your reply, it was very well explained 🙂

I followed the training you suggested and I have another question; maybe you can also help me with that 😁. I want to run several analyses on my data.

First analysis:
I have cases and controls. I want to find the differentially expressed miRNAs, so I simply followed the protocol and ran DESeq2. I obtained the table with fold changes and p-values. Up to this point, everything is okay.

But now I want to do a second analysis:
In this case, I have data for all my cases and controls specifying which of them progress to a disease and which don’t. I want to do a Kaplan-Meier analysis and other tests and plots. I don’t know where to find the normalized counts for this analysis. Can I use the normalized counts that I obtained during the first analysis? And where can I find the table with these counts (for each patient and miRNA)?

I am not sure whether I can use them, because before I divided the patients into cases and controls and compared them. But now I am pooling all of them and comparing those who progress to the disease against those who don’t.

I don’t know if I explained myself well…

Thank you! 😄
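Regarding the Kaplan-Meier part of the question: one common approach is to split patients into "high" and "low" expression groups for a given miRNA (here at the median of its normalized counts) and estimate a progression-free curve for each group. The sketch below is a from-scratch Python illustration of the Kaplan-Meier estimator under stated assumptions: it assumes you also have a follow-up time per patient (not mentioned in the post), the median cutoff is only one possible choice, and all patient IDs, counts, and times are made-up values. For real analyses, including log-rank tests, an established package such as R's survival or Python's lifelines is preferable.

```python
from statistics import median
from typing import Dict, List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return the (time, survival probability) steps of the Kaplan-Meier estimator.

    events[i] is 1 if patient i progressed at times[i], 0 if censored there.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = [(0.0, 1.0)]
    i = 0
    while i < len(data):
        t = data[i][0]
        # Count progressions (d) and all patients leaving the risk set (n_t) at time t.
        d = n_t = 0
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            n_t += 1
            i += 1
        if d > 0:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= n_t
    return curve


def split_by_median(expr: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split patient IDs into high (> median) and low (<= median) expression."""
    cutoff = median(expr.values())
    high = [p for p, v in expr.items() if v > cutoff]
    low = [p for p, v in expr.items() if v <= cutoff]
    return high, low


# Hypothetical data: normalized counts of one miRNA, follow-up in months,
# and progression status (1 = progressed, 0 = censored).
expr = {"P1": 120.5, "P2": 80.2, "P3": 300.1, "P4": 45.0, "P5": 210.7, "P6": 60.3}
months = {"P1": 24, "P2": 10, "P3": 36, "P4": 6, "P5": 30, "P6": 12}
progressed = {"P1": 0, "P2": 1, "P3": 0, "P4": 1, "P5": 1, "P6": 0}

high, low = split_by_median(expr)
curve_high = kaplan_meier([months[p] for p in high], [progressed[p] for p in high])
curve_low = kaplan_meier([months[p] for p in low], [progressed[p] for p in low])
print("high:", curve_high)
print("low:", curve_low)
```

Censored patients (status 0) never lower the curve; they only shrink the number at risk for later time points, which is what distinguishes Kaplan-Meier from simply counting progressions.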

So, if I understand correctly, you are looking for the normalized counts from your DESeq2 case-control analysis? If so, there is an option in the output section of the DESeq2 tool that gives you the normalized counts table.
In R, DESeq2 also takes the raw counts as input and normalizes them internally; you can then extract the normalized counts with counts(dds, normalized = TRUE). In Galaxy, the DESeq2 tool performs this normalization automatically, so you get the normalized counts as an extra output after running it.
Hope this is what you are looking for.
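To make "normalized counts" concrete, here is a minimal Python re-implementation of DESeq2's default median-of-ratios size factor estimation, for illustration only; in practice, use the table the tool outputs. Note that it uses only the count matrix, not the sample groups, which is why the same normalized counts can be reused for a different comparison on the same samples. The miRNA IDs and counts are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List

Counts = Dict[str, List[int]]  # miRNA ID -> raw counts, one per sample


def size_factors(counts: Counts) -> List[float]:
    """Median-of-ratios size factors, one per sample (DESeq2-style sketch).

    Raises statistics.StatisticsError if every miRNA has a zero somewhere.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c == 0 for c in row):
            continue  # geometric mean would be 0, so the row is skipped
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Each sample's factor is the median ratio to the per-miRNA geometric mean.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts: Counts, factors: List[float]) -> Dict[str, List[float]]:
    """Divide each sample's raw count by that sample's size factor."""
    return {mir: [c / s for c, s in zip(row, factors)] for mir, row in counts.items()}


# Hypothetical data: sample 2 is sequenced twice as deeply as sample 1,
# sample 3 half as deeply.
counts = {
    "miR_A": [100, 200, 50],
    "miR_B": [30, 60, 15],
    "miR_C": [10, 20, 5],
    "miR_D": [0, 4, 1],
}
sf = size_factors(counts)
norm = normalize(counts, sf)
print(sf)            # approximately [1.0, 2.0, 0.5]
print(norm["miR_A"])  # approximately [100.0, 100.0, 100.0]
```

The resulting values are generally non-integer, which matters if you ever try to feed them back into DESeq2.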


Thanks @amir

If I download the normalized counts, I can use them for different types of analysis, right? I mean, I obtained these normalized counts from a case-control comparison. But now I want to compare those who progress to the disease with those who don’t. Can I use the same normalized counts that I obtained when I compared the cases with the controls?

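Yes, you can. DESeq2 estimates the normalization (size) factors from all samples in the count table, independently of how the samples are grouped in the design, so as long as you use the same set of samples, the same normalized counts are valid for your progression comparison. But remember: if you want to run another differential expression analysis, for example comparing the control condition to another condition, you should start from your raw read counts. In that case you don’t need normalized counts, because DESeq2 normalizes the read counts itself to correct for differences such as sequencing depth between samples, which would otherwise bias the differentially expressed genes (DEGs).
P.S.: If you nevertheless want to use your normalized counts in another DEG analysis, remember to round them to integers first, since DESeq2 only accepts integer counts.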



Thank you so much! Now I understand it better 😄
