Hi @s316052000
Maybe have a look at TransDecoder. Activate all outputs and try it with the 'output only longest ORF' option. It can produce GFF and BED files. The GFF file annotates the UTRs, and you can also get the UTR positions from the BED file. The sequences can then be extracted with any FASTA extraction tool, such as getfastabed.
You will probably miss transcripts with short ORFs, and some complicated cases, such as polycistronic transcripts, might be difficult to process correctly. Also keep in mind that transcripts assembled by some tools, such as Trinity, can be in either orientation, so if you use the BED file, make sure you take the strand of the ORF into account when selecting the 3' UTR.
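To illustrate the orientation point, here is a minimal Python sketch. It assumes the BED file is in the standard 12-column layout, with the coding region stored in the thickStart/thickEnd columns and the ORF strand in column 6; please check that against your own TransDecoder output. The function name `three_prime_utr` is just something I made up for this example.

```python
def three_prime_utr(bed_line):
    """Return (name, start, end, strand) of the 3' UTR for one BED12 line,
    or None if there is no sequence after the stop codon.

    Assumption: columns 7 and 8 (thickStart, thickEnd) delimit the CDS.
    """
    fields = bed_line.rstrip("\n").split("\t")
    name = fields[0]
    start, end = int(fields[1]), int(fields[2])
    strand = fields[5]
    thick_start, thick_end = int(fields[6]), int(fields[7])

    if strand == "+":
        # ORF reads left to right: the 3' UTR follows the CDS.
        utr_start, utr_end = thick_end, end
    elif strand == "-":
        # ORF is on the reverse strand: the 3' UTR sits before the CDS
        # in transcript coordinates.
        utr_start, utr_end = start, thick_start
    else:
        raise ValueError(f"unexpected strand: {strand!r}")

    if utr_start >= utr_end:
        return None  # CDS runs to the end of the transcript, no 3' UTR
    return name, utr_start, utr_end, strand


# Made-up example records, not real TransDecoder output.
plus = "tx1\t0\t1000\tORF1\t0\t+\t100\t700\t0\t1\t1000\t0"
minus = "tx2\t0\t1000\tORF2\t0\t-\t200\t900\t0\t1\t1000\t0"
print(three_prime_utr(plus))
print(three_prime_utr(minus))
```

For minus-strand ORFs the extracted sequence also has to be reverse-complemented. On the command line, `bedtools getfasta -s` does that if the strand column is kept in the intervals you pass to it.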
Hope that makes sense.
Kind regards,
Igor