Hello! I'm trying to get Galaxy to search existing databases for the transcript type of each gene symbol found in 15 CSV files. I have tried annotateMyIDs, but there's no readout for that. Does anyone know of a plugin or approach I could try, please?
I've also been experimenting with using AI to write Python code to query MyGene.info and Ensembl directly, without success.
Welcome, @Verdani
Each gene could have several transcripts, yes? Are you just looking for all the known types that have been identified? Annotation sources like UCSC, Ensembl, or even NCBI host annotation tracks. These are usually in GTF or GFF3 files, and you could intersect those with a list of gene identifiers.
This tool pulls in some known annotation for a few model organisms. If your organism is not supported with a native index, then yes, it will not create the full output.
If you can retrieve the annotation in a standard format, intersecting that with a list of gene identifiers, or genomic coordinates, is definitely possible in Galaxy.
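Outside of Galaxy, the same intersection can be sketched in a few lines of Python. This is only a rough illustration, not a tested pipeline: it assumes an Ensembl- or GENCODE-style GTF where `transcript` rows carry a `gene_name` attribute plus either `transcript_biotype` (Ensembl) or `transcript_type` (GENCODE). The function name `transcript_types_by_gene` and the toy gene symbols are made up for the example.

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def transcript_types_by_gene(gtf_lines, gene_symbols):
    """Map each requested gene symbol to the sorted transcript types seen in the GTF.

    Hypothetical helper: assumes 'gene_name' and 'transcript_biotype' or
    'transcript_type' attributes, which not every GTF provides.
    """
    wanted = set(gene_symbols)
    found = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only transcript-level rows carry the type we want
        attrs = dict(ATTR_RE.findall(fields[8]))
        symbol = attrs.get("gene_name")
        ttype = attrs.get("transcript_biotype") or attrs.get("transcript_type")
        if symbol in wanted and ttype:
            found[symbol].add(ttype)
    # Symbols with no match get an empty list, so gaps are easy to spot.
    return {s: sorted(found[s]) for s in sorted(wanted)}


# Toy GTF rows with invented identifiers, for illustration only.
toy_gtf = [
    "#!genome-build toy",
    '1\ttoy\ttranscript\t1\t100\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_name "GENE_A"; transcript_biotype "protein_coding";',
    '1\ttoy\ttranscript\t1\t90\t.\t+\t.\tgene_id "G1"; transcript_id "T2"; gene_name "GENE_A"; transcript_biotype "nonsense_mediated_decay";',
    '2\ttoy\ttranscript\t5\t50\t.\t-\t.\tgene_id "G2"; transcript_id "T3"; gene_name "GENE_B"; transcript_type "lncRNA";',
]

result = transcript_types_by_gene(toy_gtf, ["GENE_A", "GENE_B", "GENE_X"])
for symbol, types in result.items():
    print(symbol, ", ".join(types) if types else "(not found)")
```

With a real annotation you would pass an open file handle for the GTF instead of the toy list, and read the gene symbols from the relevant column of each CSV file. Symbols that come back empty are usually aliases or belong to a different annotation release.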
A few of the introductory tutorials here perform similar manipulations: comparing two data files to associate labels and features.
These tutorials cover general text manipulations, first with web tools, then in R and other interactive environments.
- Data Manipulation Olympics at GTN Materials Search
To help more, my questions are:
- What genome assembly are you working with?
- What are the current gene identifiers?
- Where is the source of the annotation that you want to associate?
Let’s start there, thanks!