How to get specific genes to show up - Mapping or replacing gene identifiers

annotation
transcriptomics
gene

#1

Hey, can anyone tell me how to get the specific gene to show up in the file? I have tried merging, joining, and using UniProt, but cannot seem to find a method that works. If anyone knows how to do it, please let me know.


#2

What you highlighted is the RefSeq gene identifier (symbol) for the gene, presumably provided by the reference annotation database used.

From the highlighted example:

What gene label/name do you want to convert it to?


#3

Hello Jennaj!

I am so sorry for not being specific. I want to convert the gene symbol to the functional protein that corresponds to it.


#4

Thanks for the clarification. The method below will work for any organism/identifier format. The UniProt API tool will work with many organisms and identifier formats, but not all.

To replace the value in the dataset, first, find a data source or file that provides the annotation for both the gene value you have and the gene value you want. This data might be your original annotation GTF/GFF, or available at NCBI, or from some other source (like the one I linked above - it requires a login so I didn’t check it fully).

Wherever you source this, reformat the annotation into a two-column tabular dataset. The first column should hold the value exactly as it currently appears in your dataset, and the second column should hold the value you want to replace it with. Then use the tool Text Manipulation > Replace column by values which are defined in a convert file.
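If it helps to see the idea outside of Galaxy, here is a minimal Python sketch of the same lookup-and-replace step. It is not the tool's implementation, just an illustration under assumptions: the function names and the identifiers (geneA, protein_A, and so on) are made up, and values missing from the convert file are simply left unchanged.

```python
import csv
import io


def load_mapping(handle):
    """Read a two-column tab-separated file into a dict: current value -> new value."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2:
            mapping[row[0]] = row[1]
    return mapping


def replace_column(handle, mapping, column=0):
    """Yield rows with the chosen column swapped for its mapped value.

    Values that are not in the mapping are left as they are.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if column < len(row):
            row[column] = mapping.get(row[column], row[column])
        yield row


# Hypothetical convert file and dataset, held in memory for the example.
convert_file = io.StringIO("geneA\tprotein_A\ngeneB\tprotein_B\n")
dataset = io.StringIO("geneA\t12.5\ngeneC\t3.1\n")

mapping = load_mapping(convert_file)
result = list(replace_column(dataset, mapping, column=0))
for row in result:
    print("\t".join(row))
```

With real files, you would pass open file handles instead of the `io.StringIO` objects.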

The UCSC Microbial genome browser’s Table Browser http://microbes.ucsc.edu/ does have RefSeq Gene annotation for this genome. In the primary table, name is the RefSeq transcript identifier and name2 is the gene symbol. You could use that instead as the annotation input for the Cuff* tools, but you’ll need to construct your own GTF file from the primary table, because the Table Browser outputs GTF files with the same value (the transcript) populated for both the transcript_id and gene_id attributes.
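One possible way to repair such a GTF, sketched in Python: look up each line's transcript_id in a name -> name2 mapping taken from the primary table and rewrite gene_id with the gene symbol. The helper name `fix_gene_id`, the identifiers, and the example line are all hypothetical, and the sketch assumes the attributes are written as `key "value";` pairs.

```python
import re


def fix_gene_id(gtf_line, transcript_to_gene):
    """Replace gene_id with the gene symbol mapped from the line's transcript_id.

    Lines without a transcript_id, or with an unmapped transcript, are returned unchanged.
    """
    match = re.search(r'transcript_id "([^"]+)"', gtf_line)
    if match is None:
        return gtf_line
    gene = transcript_to_gene.get(match.group(1))
    if gene is None:
        return gtf_line
    # A function is used as the replacement so the gene symbol is inserted literally.
    return re.sub(r'gene_id "[^"]*"', lambda _: f'gene_id "{gene}"', gtf_line, count=1)


# Hypothetical mapping built from the primary table (name -> name2).
transcript_to_gene = {"TX_0001": "geneA"}

# Hypothetical GTF line with the transcript in both attributes.
line = 'chr\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "TX_0001"; transcript_id "TX_0001";'
fixed = fix_gene_id(line, transcript_to_gene)
print(fixed)
```

Only the gene_id attribute changes; the nine GTF columns and the transcript_id stay as they were.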


#5

Hello Jennaj!

I am fairly new to bioinformatics. Could you please elaborate on what you mean by the first value and second value in my dataset? Are you talking about my GTF file? Also, would you by chance have any examples of what it should look like?


#6

A plain text file with two columns separated by a “tab” character, no extra whitespace.

first_value <hidden_tab> second_value
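If you want to double-check a convert file before uploading it, a small Python check along these lines could work. It is only a sketch: the function name `check_convert_file` and the example values are made up, and "no extra whitespace" is taken here to mean no leading or trailing spaces around either value.

```python
def check_convert_file(text):
    """Return a list of (line_number, problem) tuples for a two-column tabular file."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append((number, "expected exactly 2 tab-separated columns"))
        elif any(field == "" or field != field.strip() for field in fields):
            problems.append((number, "empty value or extra whitespace"))
    return problems


# Hypothetical file contents: "\t" stands for the hidden tab character.
good = "geneA\tprotein_A\ngeneB\tprotein_B\n"
bad = "geneA protein_A\ngeneB\t protein_B\n"

print(check_convert_file(good))
print(check_convert_file(bad))
```

The first line of `bad` uses a space instead of a tab, and the second has a stray space after the tab, so both are reported.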

See “Tabular” in these FAQs: