Can miRNA differential expression analysis be performed using a mRNA-like pipeline with a GTF file from miRBase?

I am planning to perform differential expression analysis for the miRNA reads of an RNA-seq dataset that was not produced using deep sequencing. After reading the Whole transcriptome analysis of Arabidopsis thaliana tutorial, I noticed that the reads used for miRNA analysis were generated using deep sequencing. I would like to know if I can still use MiRDeep2 Mapper and MiRDeep2 Quantifier for my non-deep-sequencing dataset.

Additionally, I would like to know if I can use the pipeline outlined in the Reference-based RNA-Seq data analysis tutorial to perform differential analysis of miRNAs in my dataset, utilizing STAR for mapping, featureCounts for counting and using the GTF file from miRBase instead of a GTF file from ENCODE? If yes, please provide me with the link to download the appropriate GTF file.

Hi @Maryam_Momeni

As far as I know, you can use a standard RNA-seq pipeline, and the tools you mention seem fine. This assumes that the reads are not too short to map (~25 bases minimum). I am not sure about the deep sequencing part with the other pipeline … maybe check at a scientific forum where other people use that tool? Biostars or SEQanswers is where I would start! You could also ask at the GTN chat. Maybe Cristo will see your question (he helped in the other topic where you asked a question). Find that chat at the top of the tutorial website.

For the data part, you can get the reference annotation here: miRBase - Downloads

Once the GFF3 file is in your history (Upload tool, all defaults), you can convert it to GTF format with gffread. This will make it easier to use some of the tools with default settings, or you can probably use the GFF3 directly and customize the tool settings. By “customize” I mean adjusting the feature type and attribute that tools like featureCounts use for the summaries. The defaults are set for GTF format, but people do use these tools with GFF3: just match up the settings by inspecting your file for the labels used.
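If it helps, here is a minimal Python sketch of what "inspecting your file for the labels used" can look like outside of Galaxy. It only tallies the feature types (column 3) and the attribute keys (column 9) of a GFF3 file. The sample lines are made up in the general miRBase style and are not copied from a real file, and the function name `summarize_gff3` is just something I chose for illustration.

```python
from collections import Counter

# Illustrative GFF3 lines only (hypothetical IDs and coordinates).
SAMPLE_GFF3 = (
    "##gff-version 3\n"
    "# example header comment\n"
    "chr1\t.\tmiRNA_primary_transcript\t100\t180\t.\t+\t.\t"
    "ID=MI0000001;Alias=MI0000001;Name=example-mir-1\n"
    "chr1\t.\tmiRNA\t110\t131\t.\t+\t.\t"
    "ID=MIMAT0000001;Alias=MIMAT0000001;Name=example-miR-1-5p;Derives_from=MI0000001\n"
    "chr2\t.\tmiRNA_primary_transcript\t500\t590\t.\t-\t.\t"
    "ID=MI0000002;Alias=MI0000002;Name=example-mir-2\n"
)


def summarize_gff3(text):
    """Count feature types (column 3) and attribute keys (column 9)."""
    feature_types = Counter()
    attribute_keys = Counter()
    for line in text.splitlines():
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF3 feature line
        feature_types[fields[2]] += 1
        # GFF3 attributes look like key=value;key=value
        for pair in fields[8].split(";"):
            if "=" in pair:
                key = pair.split("=", 1)[0].strip()
                attribute_keys[key] += 1
    return feature_types, attribute_keys


feature_types, attribute_keys = summarize_gff3(SAMPLE_GFF3)
print("feature types:", dict(feature_types))
print("attribute keys:", dict(attribute_keys))
```

With a real file you would read the text from disk instead of using the sample string. The labels you see are the candidates for the feature type and attribute settings in the counting tool (for command-line featureCounts those are the `-t` and `-g` options, which default to `exon` and `gene_id`, so they would not match a file that only has labels like `miRNA` and `Name`).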

And, as an example, I downloaded the human GFF3 into a history here. The data is a match for the built-in UCSC hg38 reference genome at the UseGalaxy servers.

Please have a look and let us know if you have more questions! Happy to see you are following up to tackle this project again! :scientist:

Hi @Maryam_Momeni

Yes, the human file is this one: hsa.gff3

I included the full URL in the comments section of my shared history. The file is also named that way in the history.

If you or anyone else is not sure about the other species, try this:

  1. Load up the files that you think might be related to your species.
  2. Then, click on the eye-icon for those files.
  3. The reference genome that the data is based on is included in the header of the file (the # lines).
  4. From there, you can examine the chromosome names, compare to the built-in reference genomes, or your own Custom Genomes fasta file, and decide how to proceed. We have an FAQ here with all the “how-to” for a deep dive into ensuring all of your reference data and working files match up.
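The steps above can also be sketched in a few lines of Python if you prefer to check a downloaded file locally. This is only a rough sketch under assumptions: the header text, the genome build label, and the chromosome names below are all hypothetical, and the function names are mine. It collects the `#` header lines, lists the chromosome names used in the annotation, and reports any that are missing from the reference genome's names.

```python
# Illustrative annotation text only (hypothetical header and names).
SAMPLE_ANNOTATION = (
    "##gff-version 3\n"
    "# example annotation file\n"
    "# genome-build-id: ExampleBuild1\n"
    "chr1\t.\tmiRNA\t110\t131\t.\t+\t.\tID=A1;Name=example-miR-1\n"
    "chr2\t.\tmiRNA\t510\t531\t.\t-\t.\tID=A2;Name=example-miR-2\n"
    "chrMT\t.\tmiRNA\t40\t61\t.\t+\t.\tID=A3;Name=example-miR-3\n"
)

# Hypothetical sequence names from a reference genome (for example,
# the names after ">" in a Custom Genome fasta file).
GENOME_NAMES = ["chr1", "chr2", "chrM"]


def header_and_chromosomes(text):
    """Return the '#' header lines and the chromosome names in file order."""
    header = []
    chroms = []
    for line in text.splitlines():
        if line.startswith("#"):
            header.append(line)
        elif line.strip():
            name = line.split("\t", 1)[0]  # column 1 is the sequence name
            if name not in chroms:
                chroms.append(name)
    return header, chroms


def names_missing_from_genome(annotation_chroms, genome_names):
    """List annotation chromosome names that the genome does not contain."""
    known = set(genome_names)
    return [name for name in annotation_chroms if name not in known]


header, chroms = header_and_chromosomes(SAMPLE_ANNOTATION)
for line in header:
    print(line)  # the genome build is often stated in these lines
print("annotation chromosomes:", chroms)
print("missing from genome:", names_missing_from_genome(chroms, GENOME_NAMES))
```

Any name reported as missing (here the made-up `chrMT` versus `chrM`) is a sign that the annotation and the genome do not use the same naming, which is the kind of mismatch the FAQ explains how to resolve.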

Testing and examining this way should work for any reference data from any source for any analysis. The reference genome the data is based on might not always be in the header, but it will be included somewhere – maybe at the website in a guide or in the FTP directory readme files. If you are ever not sure, you can share the URLs involved, explain what is not clear, and we can try to help more at this forum. :rocket:
