How to get specific genes to show up - Mapping or replacing gene identifiers

brandonkarbs34 · February 2, 2019, 3:12pm

Hey, can anyone tell me how to get the specific gene to show up in the file? I have tried merging, joining, and using UniProt, but cannot seem to find a method that works. If anyone knows how to get it, please let me know.

jennaj · February 4, 2019, 3:52pm

What you highlighted is the RefSeq gene identifier ~~symbol~~ for the gene, presumably provided by the reference annotation database used.

From the highlighted example:

What gene label/name do you want to convert it to?

brandonkarbs34 · February 4, 2019, 5:48pm

Hello Jennaj!

I am so sorry for not being specific. I want to convert the gene symbol to the functional protein that corresponds to it.

jennaj · February 4, 2019, 8:31pm

Thanks for the clarification. The method below will work for any organism/identifier format. The UniProt API tool will work with many of these, but not all.

To replace the value in the dataset, first, find a data source or file that provides the annotation for both the gene value you have and the gene value you want. This data might be your original annotation GTF/GFF, or available at NCBI, or from some other source (like the one I linked above - it requires a login so I didn’t check it fully).

Wherever you source this, reformat the annotation into a two-column tabular dataset. The first column should hold the value exactly as it currently appears in your dataset, and the second column should hold the value you want to replace it with. Then use the tool Text Manipulation > Replace column by values which are defined in a convert file.
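For anyone who wants to see the replacement logic outside of Galaxy, here is a minimal Python sketch of the same idea. It is not the Galaxy tool's implementation: the function names and the identifiers in the example data are made up for illustration, and keeping unmatched values unchanged is an assumption of this sketch.

```python
import csv
import io


def load_mapping(handle):
    """Read a two-column tab-separated convert file into a dict (old -> new)."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 2:
            mapping[row[0]] = row[1]
    return mapping


def replace_column(handle, mapping, column=0):
    """Yield rows with one column replaced; values without a match are kept."""
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if column < len(row):
            row[column] = mapping.get(row[column], row[column])
        yield row


# Hypothetical example data: identifiers and values are placeholders.
convert_file = io.StringIO("geneA_id\tgeneA_symbol\ngeneB_id\tgeneB_symbol\n")
dataset = io.StringIO("geneA_id\t12.5\ngeneC_id\t3.1\n")

mapping = load_mapping(convert_file)
result = list(replace_column(dataset, mapping, column=0))
for row in result:
    print("\t".join(row))
```

With real data, the two `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`.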

The UCSC Microbial genome browser’s Table Browser http://microbes.ucsc.edu/ does have RefSeq Gene annotation for this genome. In the primary table, name is the RefSeq transcript identifier and name2 is the gene symbol. You could use that instead as the annotation input for the Cuff* tools, but you’ll need to construct your own GTF file from the primary table, because the Table Browser outputs GTF files with the same value (the transcript identifier) populated for both the transcript_id and gene_id attributes.
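One possible way to do that GTF fix-up is sketched below in Python. It assumes you have already exported the name and name2 columns from the primary table and loaded them as a transcript-to-gene dictionary; the function name, the identifiers, and the example line are hypothetical, and the regular expressions assume the usual `key "value";` attribute layout.

```python
import re

# Patterns for the two GTF attributes that need to agree with the annotation.
TRANSCRIPT_RE = re.compile(r'transcript_id "([^"]+)"')
GENE_RE = re.compile(r'gene_id "[^"]*"')


def fix_gene_id(gtf_line, transcript_to_gene):
    """Replace gene_id with the gene symbol looked up by transcript_id.

    Comment lines, lines without a transcript_id, and transcripts missing
    from the mapping are returned unchanged.
    """
    if gtf_line.startswith("#"):
        return gtf_line
    match = TRANSCRIPT_RE.search(gtf_line)
    if match is None:
        return gtf_line
    gene = transcript_to_gene.get(match.group(1))
    if gene is None:
        return gtf_line
    # A function is used as the replacement so the symbol is inserted literally.
    return GENE_RE.sub(lambda _m: f'gene_id "{gene}"', gtf_line, count=1)


# Hypothetical mapping built from the table's name (transcript) and name2 (gene).
transcript_to_gene = {"TX_0001": "geneA"}

line = 'chr\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "TX_0001"; transcript_id "TX_0001";'
fixed = fix_gene_id(line, transcript_to_gene)
print(fixed)
```

Running each line of the exported GTF through `fix_gene_id` leaves transcript_id untouched and only rewrites gene_id where a symbol is known.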

brandonkarbs34 · February 5, 2019, 4:38pm

Hello Jennaj!

I am fairly new to bioinformatics. Could you please elaborate on what you mean by the first value and second value in my dataset? Are you talking about my GTF file? Also, would you by chance have any examples of what it should look like?

jennaj · February 5, 2019, 4:46pm

A plain text file with two columns separated by a “tab” character, no extra whitespace.

first_value <hidden_tab> second_value
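If it helps, a convert file can be checked before uploading it. The following is a small Python sketch that flags lines that are not exactly two tab-separated values or that carry extra whitespace; the function name and the sample lines are only for illustration.

```python
def check_convert_lines(lines):
    """Return (line_number, message) pairs for lines that break the format."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        fields = text.split("\t")
        if len(fields) != 2:
            problems.append((number, "expected exactly two tab-separated columns"))
        elif any(field == "" or field != field.strip() for field in fields):
            problems.append((number, "empty value or extra whitespace"))
    return problems


# Hypothetical sample: line 1 is valid, line 2 uses spaces instead of a tab,
# line 3 has a stray space before the second value.
sample = [
    "first_value\tsecond_value\n",
    "first_value  second_value\n",
    "first_value\t second_value\n",
]

problems = check_convert_lines(sample)
for number, message in problems:
    print(f"line {number}: {message}")
```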

See “Tabular” in these FAQs:
