workflow: genes and protein sequences

Hi, I need help. I have the genome of Lens ervoides (a legume) and need to find the genes and protein sequences involved in symbiosis with rhizobia, using known protein sequences from other legumes. How do I start? What does the workflow look like?

Hi @Dardo_Dallachiesa,
do you have a list of genes of interest?

Yes @gallardoalba. NFR1 and NFR5 (Nod factor receptors, kinases), NSP1 (nodulation signaling pathway, transcription factor) and CCaMK (another kinase). Thanks

Hi @Dardo_Dallachiesa,
I think the best approach would be to download the gene sequences from the most closely related species from the NCBI (in this case, by restricting the search to the Fabaceae family), and then use the NCBI BLAST+ blastn tool in Galaxy to identify the target sequences in the reference genome. I performed a similar analysis, which may be useful as a guide.
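Once blastn has run, the tabular output usually contains several hits per gene. As a rough illustration of how the strongest hit per gene could be picked out, here is a minimal Python sketch. It assumes the standard 12-column tabular BLAST output, with the genome sequence as the query (column 1) and the known gene as the subject (column 2). The function name `best_hits` and the sample rows are hypothetical, not real results.

```python
def best_hits(lines, key_col=1, bitscore_col=11):
    """Keep the highest-bitscore row for each value found in key_col.

    Column indexes are 0-based. key_col=1 assumes the known gene is the
    BLAST subject; use key_col=0 if the genes were used as the query.
    """
    best = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        key = fields[key_col]
        score = float(fields[bitscore_col])
        if key not in best or score > float(best[key][bitscore_col]):
            best[key] = fields
    return best


# Hypothetical rows, for illustration only.
sample = [
    "contig_1\tNFR1_example\t91.2\t500\t44\t0\t12001\t12500\t1\t500\t1e-150\t620",
    "contig_7\tNFR1_example\t80.0\t200\t40\t0\t300\t499\t1\t200\t1e-30\t150",
    "contig_3\tCCaMK_example\t88.5\t400\t46\t0\t8601\t9000\t1\t400\t1e-120\t480",
]
hits = best_hits(sample)
for gene, fields in hits.items():
    print(gene, fields[0], fields[6], fields[7])
```

Bitscore is only one possible ranking criterion; filtering by e-value or alignment length first may be more appropriate for short or partial hits.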

Regards

Great @gallardoalba. Now my problem is this: I have the BLAST results, which indicate the nucleotide position where the CDS starts. But I only have the nucleotide sequence of Lens ervoides, the whole genome. How can I extract this region? Can I get a .gff file? How? Thanks for your help, I appreciate it.

Regards

Hi @Dardo_Dallachiesa,
you can generate a GFF file from the BLAST tabular output by using the awk tool with the following expression:
{print $1"\tblast\tgene\t"$7"\t"$8"\t.\t.\t.\tID=Gene"$7";Name="$2}
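For anyone who prefers to see the same conversion outside awk, here is a minimal Python sketch of what that expression does. It assumes the standard 12-column tabular BLAST output in which column 1 is the genome sequence (the query) and columns 7 and 8 are its start and end coordinates. The function name `blast_to_gff` and the sample row are hypothetical.

```python
def blast_to_gff(lines):
    """Convert tabular BLAST rows to GFF lines, mirroring the awk expression.

    Uses column 1 as the sequence ID, columns 7 and 8 as start and end,
    and column 2 as the feature name.
    """
    gff = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        f = line.split("\t")
        seqid, name, start, end = f[0], f[1], f[6], f[7]
        attributes = "ID=Gene" + start + ";Name=" + name
        # Score, strand and phase are left as "." exactly as in the awk version.
        gff.append("\t".join(
            [seqid, "blast", "gene", start, end, ".", ".", ".", attributes]
        ))
    return gff


# Hypothetical row, for illustration only.
sample = [
    "contig_1\tNFR1_example\t91.2\t500\t44\t0\t12001\t12500\t1\t500\t1e-150\t620",
]
gff_lines = blast_to_gff(sample)
print("\n".join(gff_lines))
```

If the genome was used as the BLAST subject rather than the query, the sequence ID and coordinates would instead come from columns 2, 9 and 10, and the start and end would need swapping for hits on the reverse strand, since GFF requires start to be less than or equal to end. The resulting GFF intervals can then be passed, together with the genome FASTA, to a tool that extracts sequences by interval, such as bedtools getfasta.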

Regards
