I have RNA-seq results that include many different gene symbols. I want to omit the predicted genes, but finding them by hand in a large, high-throughput dataset like this is impractical. In addition, some genes have strange accession numbers such as NM_90098_dup, and others have names that begin with LOC-, BC- or Gm-, or end with -Rik.
Can anyone help me?
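The naming patterns listed in the question can be matched with a regular expression and used to drop rows from a tab-separated results table. The sketch below is only an illustration under a few assumptions. It assumes the table has a header row with a gene-symbol column named `gene`, which is the column name in Cuffdiff's `gene_exp.diff`; adjust `symbol_col` for other files. It also assumes that LOC, BC and Gm are followed by digits in the predicted names. Requiring the digits keeps real genes whose names happen to start with "Gm", such as Gmnn. The pattern list and the `filter_rows` helper are hypothetical; extend the pattern to cover your own annotation.

```python
import csv
import io
import re

# Hypothetical pattern built from the cases in the question:
# LOC/BC/Gm followed by digits, names ending in "Rik", accessions ending in "_dup".
PREDICTED = re.compile(r"^(?:LOC|BC|Gm)\d+|Rik$|_dup$")


def is_predicted(symbol):
    """Return True if the symbol looks like a predicted or placeholder gene."""
    return bool(PREDICTED.search(symbol))


def filter_rows(lines, symbol_col="gene"):
    """Keep the header plus every row whose gene symbol is not predicted.

    `lines` is any iterable of tab-separated text lines (e.g. an open file).
    `symbol_col` is the header name of the gene-symbol column (assumed).
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    idx = header.index(symbol_col)  # raises ValueError if the column is missing
    kept = [header]
    for row in reader:
        if not is_predicted(row[idx]):
            kept.append(row)
    return kept


sample = io.StringIO("gene\tlog2fc\nActb\t1.2\nLOC100503496\t0.5\nGmnn\t1.1\n")
for row in filter_rows(sample):
    print("\t".join(row))
```

For a real file, pass the open file handle instead of `sample` and write the kept rows back out with `csv.writer(..., delimiter="\t")`.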
Are these results from Cufflinks/Cuffdiff? If so, there is an option to perform the calculations only on the known genes in your reference annotation. If the reference annotation itself contains predicted genes, you could filter those out before using it in the analysis (tools: Select, or one of the “Filter” tools in the Text Manipulation tool group).
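Outside Galaxy, the same pre-filtering of the reference annotation can be sketched in a few lines of Python. This is a minimal, hypothetical example, not the Select or Filter tool's behavior. It assumes a standard nine-column GTF whose attribute field carries `gene_name` and, when that is absent, falls back to `gene_id` as the symbol. The pattern is the same illustrative one used for the predicted-gene names in the question, so adjust it to your annotation.

```python
import re

# Hypothetical predicted-gene pattern (LOC/BC/Gm + digits, "...Rik", "..._dup").
PREDICTED = re.compile(r"^(?:LOC|BC|Gm)\d+|Rik$|_dup$")
# Matches GTF attribute pairs such as: gene_name "Actb"
ATTR = re.compile(r'(\S+) "([^"]*)"')


def gene_symbol(attr_field):
    """Return gene_name if present, otherwise gene_id, otherwise ''."""
    attrs = dict(ATTR.findall(attr_field))
    return attrs.get("gene_name", attrs.get("gene_id", ""))


def filter_gtf(lines):
    """Yield GTF lines, skipping features whose gene symbol looks predicted.

    Comment lines, blank lines and lines with fewer than nine
    tab-separated fields are passed through unchanged.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if (not line.startswith("#") and len(fields) >= 9
                and PREDICTED.search(gene_symbol(fields[8]))):
            continue  # drop this predicted-gene feature
        yield line
```

With hypothetical file names, usage would look like `with open("genes.gtf") as src, open("genes.filtered.gtf", "w") as dst: dst.writelines(filter_gtf(src))`. Because it is a generator, the whole GTF never has to be held in memory.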
The GTF annotation datatype is explained in this FAQ: Common datatypes explained.
Note: All of the Tuxedo suite tools are considered deprecated. Consider using more current differential expression (DE) methods instead; see the Transcriptomics tutorials.