Can I extract specific sequences from a FASTA proteome using their transcript_ids in Galaxy?

So I have a 42K-sequence proteome from a non-model organism, and I need to extract the 1000 sequences that I obtained from a differential expression analysis so that I can use eggNOG-mapper for the GO enrichment analysis. Can I do that in Galaxy?

Hi @Nicolas_Romero_Villa

Do you mean that you want to subset the 42k protein sequences before running eggNOG?

Are those protein sequences annotated at all already? Associated with a Gene identifier?

Did the DE analysis involve using any known annotation? Meaning, transcript to gene associations are known?

If you have both, then you could use the gene identifiers to subset. But that could be limiting: you won’t find any potential novel genes, since they wouldn’t be included in the subset. This would be true wherever you run these tools. Data cannot be subset until you learn how they are associated (as orthologs, in this use case).
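For reference, subsetting a FASTA file by a list of identifiers can be sketched in plain Python as below. This is only an illustration, not a Galaxy tool: the function name `subset_fasta`, the toy records, and the IDs are hypothetical, and it assumes the identifier is the first whitespace-delimited token of each header line.

```python
def subset_fasta(fasta_lines, wanted_ids):
    """Yield the lines of every FASTA record whose ID is in wanted_ids."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # Assumption: the ID is the first whitespace-delimited
            # token after ">" in the header line.
            parts = line[1:].split()
            keep = bool(parts) and parts[0] in wanted_ids
        if keep:
            yield line


# Hypothetical toy data standing in for the proteome and the DE ID list.
proteome = [
    ">tx1 hypothetical protein\n",
    "MKTAYIAK\n",
    ">tx2\n",
    "MSLLNV\n",
    "QRSTV\n",
    ">tx3\n",
    "MAAAK\n",
]
de_ids = {"tx2", "tx3"}

subset = list(subset_fasta(proteome, de_ids))
print("".join(subset), end="")
```

With real files, the same function could be fed an open file handle for the proteome and a set of IDs read one per line from the DE result.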

You’ll need to run eggNOG on the entire set that you expect to be connected to the genes of interest, discover the ortholog groups, and then filter afterwards for the actual associations (per eggNOG) with your genes of interest (the DE result).
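The "annotate everything, then filter" step could look roughly like the sketch below. It assumes the annotation output is a tab-delimited table whose first column is the query sequence ID and whose comment or header lines start with `#`; the column layout, the function name `filter_annotations`, and the toy rows are assumptions for illustration, so check them against your actual eggNOG output.

```python
import csv


def filter_annotations(rows, wanted_ids):
    """Keep table rows whose first column (query ID) is in wanted_ids."""
    kept = []
    for row in rows:
        # Skip blank lines and comment/header lines starting with "#".
        if not row or row[0].startswith("#"):
            continue
        if row[0] in wanted_ids:
            kept.append(row)
    return kept


# Hypothetical toy table standing in for the full-proteome annotation.
table_text = (
    "#query\tGOs\n"
    "tx1\tGO:0008150\n"
    "tx2\tGO:0003674,GO:0005575\n"
    "tx3\t-\n"
)
de_ids = {"tx2", "tx3"}

rows = csv.reader(table_text.splitlines(), delimiter="\t")
de_hits = filter_annotations(rows, de_ids)
for row in de_hits:
    print("\t".join(row))
```

Keeping the full annotated table also gives you the background set that a GO enrichment test needs, with the filtered rows as the foreground.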

Please explain more … am I misunderstanding the goal?