Reference annotation GTF options for UCSC's mouse builds mm9, mm10

Gaia_Gentile · January 11, 2021, 1:03pm

Hello!
I am performing an RNA-seq analysis with mouse samples.
Right now I am struggling to find and download the file

UCSC Main on Mouse: wgEncodeGencodeBasicVM25 (genome)
which contains all the transcripts, so that I can run Join two Datasets between the DESeq2 files and UCSC Main on Mouse: wgEncodeGencodeBasicVM25 (genome).

Could you help me out, please?
Thanks a lot in advance!

jennaj · January 11, 2021, 9:53pm

Hello @Gaia_Gentile

Gencode annotations for the mouse builds mm9 and mm10 are available from the Gencode website.

Instructions for how to load and prepare that annotation for use with tools are covered in this prior Q&A. It is about a specific tool and the human genome, but the same advice applies:

RNA-STAR and hg38 GTF reference annotation

You may also find this FAQ helpful:

Extended Help for Differential Expression Analysis Tools

I also added a few tags to your post that will lead to more related topics. Or, review the results of this search:

Search results for 'reference gtf' - Galaxy Community Help

If that doesn’t answer your question, please explain more about the steps you have done so far (tools), and which data is involved (reference genome, reference annotation). If you are following a tutorial, please include a link to that too.

Thanks!

Gaia_Gentile · January 12, 2021, 10:35am

Good morning,

Thank you for your kind reply.

I am performing an RNA-seq data analysis using Galaxy.

My samples are mouse samples.

So the first step I did was to upload the file: UCSC Main on Mouse: wgEncodeGencodeBasicVM25 (genome).

Then I did the quality control of the raw reads using the tool FastQC.

Subsequently I performed the genome-based read alignment using RNA STAR Gapped-read mapper for RNA-seq data (Galaxy Version 2.7.6a).

Then I performed the quality control of the aligned reads using MultiQC aggregate results from bioinformatics analyses into a single report (Galaxy Version 1.9).

After that I did the read quantification using htseq-count - Count aligned reads in a BAM file that overlap features in a GFF file (Galaxy Version 0.9.1+galaxy1). In the field Aligned SAM/BAM File I selected my file of interest, and in the field GFF File I selected UCSC Main on Mouse: wgEncodeGencodeBasicVM25 (genome).

Then I ran DESeq2 Determines differentially expressed features from count tables (Galaxy Version 2.11.40.6+galaxy1).

After this I would like to run the tool Join two Datasets side by side on a specified field (Galaxy Version 2.1.3), and this is the part of the analysis where I have trouble. In the field Join I selected my DESeq2 file of interest, and in the field with I selected the file UCSC Main on Mouse: wgEncodeGencodeBasicVM25 (genome).

But this is not right, because after running the join I do not get, for each ID (for example ENSMUST00000021332.9), the name of the gene it is associated with, which I need in order to know which genes are upregulated or downregulated.

I am missing the UCSC Main on Mouse: wgEncodeGencodeBasicVM25 (genome) file that has the names of the transcripts.

I hope it is clear now.

Looking forward to hearing from you.

Best regards,

Gaia Gentile

jennaj · January 13, 2021, 1:17am

This file does not contain the gene name, just the transcript name and the location on the genome.

Try the file wgEncodeGencodeAttrsVM25 instead.
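
For anyone who wants to check the join outside of Galaxy, here is a minimal Python sketch of the same idea: look up each ID from the DESeq2 table in the attributes table and append the gene name. Everything in it is an assumption for illustration: the sample rows are made up, the gene name is a placeholder, and the header names `transcriptId` and `geneName` should be checked against the file you actually download. It also assumes the DESeq2 table is tab-separated with the ID in the first column and no header line.

```python
import csv
import io

# Made-up stand-in for a DESeq2 result table (tab-separated, no header).
# Column 1 is assumed to hold the transcript ID used during counting.
DESEQ2_TSV = (
    "ENSMUST00000021332.9\t150.2\t1.8\t0.3\t6.0\t1e-9\t1e-7\n"
    "ENSMUST00000099999.1\t80.5\t-2.1\t0.4\t-5.2\t2e-7\t1e-5\n"
)

# Made-up stand-in for the attributes table. The header names and the
# gene name below are placeholders, not values from the real file.
ATTRS_TSV = (
    "geneName\ttranscriptId\n"
    "PlaceholderGeneA\tENSMUST00000021332.9\n"
)


def load_names(attrs_file, id_col="transcriptId", name_col="geneName"):
    """Build a dict mapping transcript ID -> gene name from a headed TSV."""
    reader = csv.DictReader(attrs_file, delimiter="\t")
    return {row[id_col]: row[name_col] for row in reader}


def annotate(deseq_file, names, missing="NA"):
    """Append the gene name (or `missing`) to every DESeq2 row."""
    annotated_rows = []
    for row in csv.reader(deseq_file, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        annotated_rows.append(row + [names.get(row[0], missing)])
    return annotated_rows


names = load_names(io.StringIO(ATTRS_TSV))
annotated = annotate(io.StringIO(DESEQ2_TSV), names)
for row in annotated:
    print("\t".join(row))
```

With real files, replace the `io.StringIO(...)` objects with `open(path, newline="")`. The IDs must match exactly, including the version suffix (the `.9` in `ENSMUST00000021332.9`). If every row comes back as `NA`, that is the first thing to check.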

Thanks!

Gaia_Gentile · March 18, 2021, 1:14pm

Good afternoon,

I am having some trouble making a heatmap from RNA-seq data. I read the tutorial, but it still did not work.
Could you explain how to do it?

Thanks a lot.

Best regards,

Gaia Gentile