I am working on seq data and trying to use stringtie to assemble transcripts. I tried using gffread to convert my GFF file to GTF, but I get either an output with 0 lines or, when I select -E, a copy of the GFF file. Using gffcompare when converting to a BED file also gives me no results.
My error with stringtie looks like this: Error: could not any valid reference transcripts in guide.gff (invalid GTF/GFF file?)
What you have right now are predicted open reading frames (ORFs). Those haven’t been organized into transcript and gene relationships (“annotated GFF”), which is what StringTie will need as an input. StringTie compares where reads map with respect to the footprints of transcripts, then summarizes by transcript and then by gene, so that samples can be compared afterwards (differential expression, aka DE).
I guess you could fake it and consider all of those ORFs as distinct transcripts and give them a label in the 9th attribute column, and consider each transcript as a distinct gene also with a label in that 9th column. They shouldn’t overlap for this kind of use. Or you could identify which do overlap and use that grouping as the gene.
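To make the "fake it" idea concrete, here is a minimal Python sketch. It assumes every non-comment line in the ORF file is one ORF in the standard 9-column tab-separated GFF layout. The function names (overlap_groups, orfs_to_gtf), the ORFG00001-style IDs, and the "orf_guide" source label are all made up for illustration. It groups ORFs that overlap on the same sequence and strand into one "gene", then writes each ORF as a transcript with a single exon, with gene_id and transcript_id in the 9th column. I have not tested the result against StringTie, so treat it as a starting point only.

```python
def overlap_groups(orfs):
    """Group (seqid, start, end, strand) tuples that overlap on the same seqid and strand."""
    groups = []
    last_key, last_end = None, -1
    for orf in sorted(orfs, key=lambda o: (o[0], o[3], o[1], o[2])):
        key = (orf[0], orf[3])
        if key == last_key and orf[1] <= last_end:
            # Overlaps the running group: extend it.
            groups[-1].append(orf)
            last_end = max(last_end, orf[2])
        else:
            # New sequence/strand, or a gap: start a new group.
            groups.append([orf])
            last_key, last_end = key, orf[2]
    return groups


def orfs_to_gtf(gff_lines, source="orf_guide"):
    """Turn ORF lines into GTF transcript/exon lines (hypothetical IDs, one exon per ORF)."""
    orfs = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a feature line
        # Assumption: strand is in column 7; anything else is written as "."
        strand = cols[6] if cols[6] in ("+", "-") else "."
        orfs.append((cols[0], int(cols[3]), int(cols[4]), strand))

    out = []
    for g, group in enumerate(overlap_groups(orfs), start=1):
        gene_id = f"ORFG{g:05d}"
        for t, (seqid, start, end, strand) in enumerate(group, start=1):
            attrs = f'gene_id "{gene_id}"; transcript_id "{gene_id}.{t}";'
            for feature in ("transcript", "exon"):
                out.append("\t".join(
                    [seqid, source, feature, str(start), str(end), ".", strand, ".", attrs]
                ))
    return out
```

You would read the ORF file into a list of lines, pass it to orfs_to_gtf, and write the returned lines to a new file with one line each. If you would rather treat every ORF as its own gene, skip the grouping and give each ORF a unique gene_id instead.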
Both gffread and gffcompare expect an annotated GFF as an input.
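A quick way to see whether a file already carries that kind of hierarchy is to count the feature types in column 3 and check the 9th column for Parent= (GFF3) or transcript_id (GTF). The Python sketch below does only that. The function name summarize_gff is made up, and it assumes a plain tab-separated file; it is a rough diagnostic, not a validator.

```python
from collections import Counter


def summarize_gff(lines):
    """Count feature types and how many lines carry Parent= or transcript_id."""
    features = Counter()
    with_parent = 0
    with_transcript_id = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            # Feature lines should have exactly 9 tab-separated columns.
            features["<malformed>"] += 1
            continue
        features[cols[2]] += 1
        if "Parent=" in cols[8]:
            with_parent += 1
        if "transcript_id" in cols[8]:
            with_transcript_id += 1
    return {
        "features": dict(features),
        "with_parent": with_parent,
        "with_transcript_id": with_transcript_id,
    }
```

If the summary shows no exon features, or no lines with Parent= or transcript_id, the file probably does not describe transcripts in the way these tools are looking for.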
What tool are you using for the ORF prediction? Does it have options to output other formats?
Update: Oh, I see the tool name “orfFinder”. Ok, from here you can consider a tool like Maker. Scroll down on the tool form into the help section to learn what it does, plus it happens to have associated tutorials that can get you oriented. The “Annotation” section of the tool panel will have more tool choices to consider – and some of that depends on which public server you are working at so maybe review the usegalaxy.* servers first.
Thank you very much for your help. After following your instructions, I still receive an error when using stringtie indicating an invalid GFF/GTF format. Here is what my GFF3 file looks like using orfFinder and Maker.
Maker has a toggle above the different input sections to add in gene predictions.
But bigger picture – what you are attempting to do is somewhat complicated, and will need other lines of evidence to be meaningful as input to downstream analysis.
If there is public annotation available for your genome assembly, that would be a better place to start, or to layer into your gene modeling as additional evidence.