Tool to use in place of Cuffdiff

I am analysing RNA-seq data on the Galaxy platform. I have merged all my transcripts using StringTie merge, and I now want to compare multiple conditions. I cannot use the Cuffdiff tool because I did not use Cuffmerge or Cufflinks to create the earlier files. What other tool can I use in place of Cuffdiff?

Hi @NIKITA_JHAVERI, could you give me more information about the previous steps of your analysis? If you used StringTie to assemble the RNA-Seq alignments, I recommend having a look at the tutorial “De novo transcriptome reconstruction with RNA-Seq”.

Regards

I used HISAT2 to align reads to the genome, StringTie to assemble the transcripts, and StringTie merge to combine all the assemblies. I want to use the DESeq2 tool to analyse differentially expressed genes. However, it asks for a “counts file”. When I used the output from StringTie as input, I got an error. I then used featureCounts to generate the count file from the HISAT2 alignments, but again got an error. So how do I generate count files that can be used as input for DESeq2?
Or what other tool can I use that takes the outputs from StringTie to compute differentially expressed genes?

Hi @NIKITA_JHAVERI,

after reviewing your analysis, I would like to offer a few comments. Let me know if you have any additional questions.

  1. In the featureCounts tool you should specify “gene” as GFF feature type filter and “Gene” as GFF gene identifier in the advanced options section, since the default values are not present in your GFF3 file.

  2. To use DESeq2 with datasets generated by StringTie, I recommend following the tutorial that I mentioned previously. It explains all the steps required to carry out this workflow successfully.

  3. Unless you are specifically interested in identifying new transcripts, I would recommend following the workflow developed in the tutorial “Reference-based RNA-Seq data analysis”. StringTie introduces a lot of noise into differential expression analysis as a result of the large number of unknown potential new transcripts and the incorrect assignment of identifiers.

  4. When using DESeq2 with Kallisto, it is necessary to specify “TPM values” in the Choice of Input data option.

Regards

Thank you very much. I will go through the tutorial you mentioned and make the changes you suggested. This helps!

Hello,
I have taken your suggestion and am following the steps in the tutorial you linked
[Reference-based RNA-Seq data analysis].

I have a question about the following options in STAR:
“Custom or built-in reference genome”: Use a built-in index
“Reference genome with or without an annotation”: use genome reference without builtin gene-model
“Select reference genome”: My reference organism is not listed here.

I have imported my reference genome, but for it I do not get the “without builtin gene-model” or “without an annotation” options.

How should I go about it?

Hi @NIKITA_JHAVERI,
in the Custom or built-in reference genome section, you should select the Use reference genome from my history option, and after that it will be possible to select the reference genome from your history.

Regards

Thank you
For running featureCounts, I am following your advice:
In the featureCounts tool you should specify “gene” as GFF feature type filter and “Gene” as GFF gene identifier in the advanced options section, since the default values are not present in your GFF3 file.
I still get an error and am unable to run it.

I also ran htseq-count and got an error there as well. Can you please let me know whether the BAM files I created using STAR and HISAT2 are correct?

Hi @NIKITA_JHAVERI
you need to specify “sequence_name” as GFF gene identifier (instead of “Gene”). The problem is not the alignment, but how the information is provided in the annotation file. Now it should work fine.
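To see which feature types and attribute keys an annotation file actually carries before filling in the featureCounts options, a minimal Python sketch such as the following may help. It only assumes the standard nine-column, tab-separated GFF3 layout; the function name `summarize_gff3` and the example records (including the `CBG00001` identifier) are made up for illustration and are not taken from your file.

```python
from collections import Counter

def summarize_gff3(lines):
    """Tally feature types (column 3) and, per feature type,
    the attribute keys found in column 9 of GFF3 records."""
    feature_types = Counter()
    attribute_keys = {}
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and directive/comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF3 record
        ftype, attrs = fields[2], fields[8]
        feature_types[ftype] += 1
        keys = attribute_keys.setdefault(ftype, Counter())
        # Attributes are "key=value" pairs separated by ";".
        for pair in attrs.split(";"):
            if "=" in pair:
                keys[pair.split("=", 1)[0].strip()] += 1
    return feature_types, attribute_keys

# Hypothetical records, for illustration only.
example = [
    "##gff-version 3",
    "I\tWormBase\tgene\t100\t900\t.\t+\t.\tID=gene:G1;sequence_name=CBG00001;biotype=protein_coding",
    "I\tWormBase\tmRNA\t100\t900\t.\t+\t.\tID=transcript:T1;Parent=gene:G1",
]
types, keys = summarize_gff3(example)
print(dict(types))           # feature types available for the feature type filter
print(sorted(keys["gene"]))  # attribute keys usable as the gene identifier
```

For a real, compressed annotation the same function can be fed an open file handle, for example `gzip.open(path, "rt")`. A key that never appears for the chosen feature type (such as “Gene” in the example records) cannot serve as the gene identifier.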

Regards

Thank you. I will try that

Hello,
I still couldn’t get featureCounts to work with the GFF file. When I used a GTF file as input, it ran fine. However, the proportion of assigned reads is very low (only 2-4%) and the majority are reported as unassigned_no features (>85%). I am not using the latest version of the GTF file. Could this be the reason?
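For reference, percentages like these can be worked out from the featureCounts summary output with a short Python sketch. It assumes the usual two-column, tab-separated summary with a `Status` header row; the function name `summary_percentages` and the numbers in the example are invented for illustration and are not the values from this analysis.

```python
def summary_percentages(text):
    """Convert a single-sample featureCounts summary table
    into a dict mapping each status to its percentage of all reads."""
    counts = {}
    for line in text.strip().splitlines():
        status, value = line.split("\t")[:2]
        if status == "Status":
            continue  # header row
        counts[status] = int(value)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: 100.0 * n / total for status, n in counts.items()}

# Made-up summary, for illustration only.
example_summary = (
    "Status\tsample.bam\n"
    "Assigned\t30\n"
    "Unassigned_Ambiguity\t100\n"
    "Unassigned_NoFeatures\t870\n"
)
for status, pct in summary_percentages(example_summary).items():
    print(f"{status}\t{pct:.1f}%")
```

A high “Unassigned_NoFeatures” share means that most reads did not overlap any feature of the selected type in the annotation.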

Hi @NIKITA_JHAVERI,
Can you repeat the analysis with this annotation file? ftp://ftp.ensemblgenomes.org/pub/metazoa/release-49/gff3/caenorhabditis_briggsae/Caenorhabditis_briggsae.CB4.49.gff3.gz I think that it is the most recent version.

Thank you. I will do so and let you know.

Hello, I tried the file you mentioned, but as it is a GFF3 file, I was not able to run featureCounts with it.
Can you please let me know what to do?
With an older version of the GTF file, featureCounts runs, but more than 85% of the reads are reported as “unassigned_nofeatures”.

Hi @NIKITA_JHAVERI,
please could you share your history with me? My email is gallardo@informatik.uni-freiburg.de.

Regards