Automatically acquiring and adding NCBI data

I need to be able to add the map locations of genes to a transcription data set. If there is anything that could read an identifier for a certain gene in a certain E. coli strain and add its listed map location, that would be great.


Hi @Jacob_DeVries

The position of genes/transcripts in a reference annotation dataset (GTF, GFF3) will be with respect to the reference genome.
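If you end up doing the lookup outside of Galaxy, here is a minimal Python sketch of the idea: read the GTF, collect one location per gene identifier, and append it to each row of your table. It assumes a standard 9-column tab-separated GTF with a `gene_id` attribute and that your identifiers match that attribute exactly. The function names (`gene_locations`, `annotate`) and the demo gene IDs and coordinates are made up for illustration.

```python
import re

def gene_locations(gtf_lines, attribute="gene_id"):
    """Map each gene identifier to (seqname, start, end, strand).

    When a gene has several features (exon, CDS, ...), the span is
    widened to cover all of them.
    """
    pattern = re.compile(re.escape(attribute) + r' "([^"]+)"')
    locations = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        seqname, start, end, strand, attrs = (
            fields[0], int(fields[3]), int(fields[4]), fields[6], fields[8])
        match = pattern.search(attrs)
        if match is None:
            continue
        gene = match.group(1)
        if gene in locations:
            _, old_start, old_end, _ = locations[gene]
            start, end = min(start, old_start), max(end, old_end)
        locations[gene] = (seqname, start, end, strand)
    return locations

def annotate(rows, locations, id_column=0):
    """Append seqname, start, end, strand to each row (blank if unknown)."""
    for row in rows:
        loc = locations.get(row[id_column], ("", "", "", ""))
        yield list(row) + [str(value) for value in loc]

# Invented demo data: two genes in the GTF, three rows in the count table.
demo_gtf = [
    'chr\tdemo\texon\t190\t255\t.\t+\t.\tgene_id "geneA"; transcript_id "tA";\n',
    'chr\tdemo\tCDS\t337\t2799\t.\t+\t0\tgene_id "geneB"; transcript_id "tB";\n',
    'chr\tdemo\texon\t300\t2800\t.\t+\t.\tgene_id "geneB"; transcript_id "tB";\n',
]
demo_counts = [["geneA", "12"], ["geneB", "340"], ["geneX", "5"]]

locations = gene_locations(demo_gtf)
annotated = list(annotate(demo_counts, locations))
for row in annotated:
    print("\t".join(row))
```

With real files you would pass open file handles instead of the demo lists. Rows whose identifier is not in the annotation get blank location columns, which makes mismatched identifiers easy to spot.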

iGenomes hosts several E. coli genomes/builds: https://support.illumina.com/sequencing/sequencing_software/igenome.html

Download the archive, unpack it, then upload the files you want to work with to Galaxy.

None of these are based on the UCSC version of the genome that is indexed at Galaxy Main https://usegalaxy.org, so load not just the annotation GTF data but the genome fasta as well. Do not use/assign the pre-indexed genomes to your data, or expect mismatch problems.

Then use the genome fasta as a Custom Genome+Build with tools as needed. The fasta may need to be run through NormalizeFasta first. The genome and annotation need to be an exact match at the chromosome level.
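A quick way to check the chromosome-level match before uploading is to compare the sequence names in the two files. This is only a rough sketch: it assumes the name is the first whitespace-delimited word on each fasta title line and the first tab-separated column of the GTF, and the helper names and demo lines are invented.

```python
def fasta_names(fasta_lines):
    """Sequence names: first word after '>' on each fasta title line."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                names.add(words[0])
    return names

def gtf_names(gtf_lines):
    """Sequence names: first column of each non-comment GTF line."""
    return {
        line.split("\t", 1)[0]
        for line in gtf_lines
        if line.strip() and not line.startswith("#")
    }

def unmatched_names(fasta_lines, gtf_lines):
    """Names used in the annotation that are missing from the genome."""
    return gtf_names(gtf_lines) - fasta_names(fasta_lines)

# Invented demo data: the annotation mentions a plasmid the fasta lacks.
demo_fasta = [">chr demo genome description\n", "ACGTACGT\n"]
demo_gtf = [
    'chr\tdemo\texon\t190\t255\t.\t+\t.\tgene_id "geneA";\n',
    'plasmid1\tdemo\texon\t10\t90\t.\t-\t.\tgene_id "geneP";\n',
]

missing = unmatched_names(demo_fasta, demo_gtf)
print(sorted(missing))
```

An empty result means every sequence name in the annotation is also present in the fasta. Anything listed is a name that tools will not be able to find in the genome.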

Details in these FAQs:

Galaxy tutorials for reference:

It is not clear what format your inputs are in or if you assembled the transcriptome yourself. Please explain more about your data (how created and/or source) and your larger goals if the above doesn’t help.

Thanks!