I built a pipeline, ‘fastp > RNA STAR > featureCounts > DESeq2’, and used the built-in genome in both RNA STAR and featureCounts.
The built-in genomes of RNA STAR and featureCounts are presumably different. Will this cause any problems?
(I don’t know, but I was told that the same genome should be used for mapping and counting).
Yes, this will be a problem, and it may not be obvious or easy to detect. DESeq2 may not technically fail, but may instead produce seemingly successful “green” datasets whose results have scientific content issues.
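One mismatch that can be checked mechanically is a difference in sequence names between the annotation and the reference used for mapping (for example `1` versus `chr1`). Below is a minimal sketch of such a check. The function names are hypothetical, the input lines are made up, and the list of reference names is assumed to come from wherever you can obtain it (for example the header of the mapped BAM). It only catches naming differences and does not prove that both inputs come from the same assembly.

```python
from typing import Iterable, Set


def gtf_seqnames(gtf_lines: Iterable[str]) -> Set[str]:
    """Collect the first column (sequence name) of every GTF feature line."""
    names = set()
    for line in gtf_lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def unmatched_seqnames(gtf_lines: Iterable[str], reference_names: Iterable[str]) -> Set[str]:
    """Return sequence names used in the annotation but absent from the reference."""
    return gtf_seqnames(gtf_lines) - set(reference_names)


# Made-up example records, for illustration only.
example_gtf = [
    "##format: gtf\n",
    'chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\tgene_id "G1";\n',
    '1\tensembl\tgene\t300\t400\t.\t-\t.\tgene_id "G2";\n',
]
example_reference = ["chr1", "chr2"]

missing = unmatched_seqnames(example_gtf, example_reference)
print(sorted(missing))
```

An empty result means every sequence name in the annotation also exists in the reference; anything listed would receive no counts for its features.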
What is your use-case/goal?
Thank you for your answer.
I built the above pipeline to find differentially expressed genes between the control and treatment groups.
Then, in order to use the same reference genome, should I use the Comprehensive gene annotation (CHR), listed at the top of ‘GTF/GFF3 files’, as input for featureCounts, and the Transcript sequences (CHR), listed at the top of ‘Fasta files’, as input for RNA STAR, both from GENCODE - Human Release 38?
GENCODE is a good source of reference annotation for hg38. Be sure to format the data correctly after loading it into Galaxy: specifically for this source, that means removing the header lines.
Then, use the built-in hg38 reference genome when mapping your reads.
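As an illustration of what "removing the header lines" means, here is a minimal sketch that drops the leading `#` comment lines from a GTF file. The function names and file paths are hypothetical and the sample records are made up. Inside Galaxy you would normally do this with a text-manipulation tool rather than a script, so treat this only as a description of the transformation.

```python
from typing import Iterable, Iterator


def strip_gtf_header(lines: Iterable[str]) -> Iterator[str]:
    """Yield only feature records, skipping '#' header lines and blank lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield line


def strip_gtf_file(src_path: str, dst_path: str) -> None:
    """Write a copy of the GTF at src_path to dst_path without header lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(strip_gtf_header(src))


# Made-up example input, for illustration only.
sample = [
    "##provider: GENCODE\n",
    "##format: gtf\n",
    'chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\tgene_id "G1";\n',
    'chr1\tHAVANA\texon\t100\t150\t.\t+\t.\tgene_id "G1";\n',
]

cleaned = list(strip_gtf_header(sample))
print(len(cleaned))
```

For a real file, a call such as `strip_gtf_file("annotation.gtf", "annotation.noheader.gtf")` (both paths are placeholders) would write the cleaned copy.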
FAQs: Galaxy Support
Tutorials: Galaxy Training!
- Choose the topic Transcriptomics
I also added some tags to your post that link to prior Q&A around this type of analysis. You can also try a search with tool names, as that sometimes finds more exact examples of troubleshooting and use-cases.