query sequence to new custom genomic scaffold

duncan-d · January 4, 2022, 5:32pm

Hi. It’s been a couple of years since I used Galaxy regularly, so I’m a bit rusty. I have just uploaded a custom genomic scaffold file in fasta format to my history. I need to know how to process the file so that I can query it for promoter regions upstream of two common genes (cytoplasmic actin, polyubiquitin), using the coding sequences from closely related species. Thanks for any help!

Flow · January 5, 2022, 9:04am

Dear @duncan-d,
You would probably need to do a genome assembly first. Galaxy has a large section on this topic: Galaxy Training! Maybe start with the introductory tutorial: An Introduction to Genome Assembly.

Once you have your contigs, you can investigate your promoters. If the genome is well studied, you could use annotated promoters from a database. If you are looking for de novo promoters or have a less prominent genome (e.g., prokaryotes), you could search for the typical promoter sequences of your organism.
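As a rough illustration of the "investigate your promoters" step: once a gene has been located on a contig (for example by aligning a coding sequence to it), the candidate promoter region is simply the stretch of sequence 5' of the hit, reverse-complemented if the gene lies on the minus strand. The sketch below is only that, a sketch. The function names, the 2000 bp default, and the coordinate convention (1-based, inclusive, start <= end on the forward strand) are assumptions for illustration; if your alignment tool reports minus-strand hits with start greater than end, swap them first. Keep in mind that the start of a coding-sequence hit is not necessarily the transcription start site.

```python
# Translation table for complementing DNA (upper and lower case, N kept as N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(contig_seq, hit_start, hit_end, strand, length=2000):
    """Return up to `length` bases located 5' of a hit, in gene orientation.

    Assumed convention (hypothetical, adapt to your tool's output):
    hit_start and hit_end are 1-based, inclusive, forward-strand
    coordinates with hit_start <= hit_end; strand is "+" or "-".
    """
    if hit_start < 1 or hit_end > len(contig_seq) or hit_start > hit_end:
        raise ValueError("hit coordinates fall outside the contig")
    if strand == "+":
        # Upstream lies to the left of the hit; clip at the contig start.
        begin = max(0, hit_start - 1 - length)
        return contig_seq[begin:hit_start - 1]
    if strand == "-":
        # Upstream lies to the right of the hit; clip at the contig end,
        # then flip so the result reads 5' to 3' relative to the gene.
        end = min(len(contig_seq), hit_end + length)
        return reverse_complement(contig_seq[hit_end:end])
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    toy_contig = "TTTACGATGCAT"
    print(upstream_region(toy_contig, 7, 9, "+", length=3))
    print(upstream_region(toy_contig, 7, 9, "-", length=3))
```

For a real genome you would read the contig sequence from the fasta file first; the slicing logic stays the same.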

Have a good day and best wishes,
Florian

duncan-d · January 5, 2022, 3:29pm

Hi Florian

Thanks for the quick response! What I have is an unannotated, assembled genome from another lab. The genome is Schistocerca americana (American locust), which has a very large genome full of repetitive DNA. There are 1700 sequence contigs in a fasta file. I’ve managed to use NCBI discontiguous megablast to find the polyubiquitin gene using the Drosophila coding sequence, but of course it doesn’t give me the upstream sequence data. I tried opening the file in IGV, but it tells me that I’m missing a .fai index file and that it can’t recreate that file. I emailed the lab asking for the original .fai file but haven’t heard back yet. Is there a way to regenerate that index file with just the fasta file I have?

Dianne

(redacted contact info for privacy)

jennaj · January 7, 2022, 12:15am

Hi @duncan-d

Galaxy creates fasta indexes (fa.fai) at runtime, whenever a tool needs one.

You can also create the index directly and have it show up as a dataset in your history.

The help in this topic covers all the use cases (any fasta, not just from Unicycler): Opening Unicycler assemblies with IGV local
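For reference, a .fai index is a small tab-separated text file with one line per sequence: name, length in bases, byte offset of the first base, bases per line, and bytes per line (including the line ending). On the command line, `samtools faidx` on the fasta file writes this index. The Python below is only a minimal sketch of the same idea, to show that nothing beyond the fasta itself is needed. The function names are made up for illustration, and unlike samtools it does not check that every sequence line within a record has the same width, so treat it as a learning aid rather than a replacement.

```python
def build_fai(fasta_path):
    """Return (name, length, offset, linebases, linewidth) per FASTA record."""
    records = []
    name = None
    length = offset = linebases = linewidth = 0
    pos = 0  # byte position of the start of the current line
    with open(fasta_path, "rb") as handle:  # binary mode keeps offsets exact
        for raw in handle:
            if raw.startswith(b">"):
                if name is not None:
                    records.append((name, length, offset, linebases, linewidth))
                # The sequence name is the header text up to the first whitespace.
                name = raw[1:].split()[0].decode()
                length = linebases = linewidth = 0
                offset = pos + len(raw)  # first base starts after the header line
            elif name is not None:
                bases = len(raw.rstrip(b"\r\n"))
                if bases and linewidth == 0:
                    # The first sequence line defines the line geometry.
                    linebases, linewidth = bases, len(raw)
                length += bases
            pos += len(raw)
    if name is not None:
        records.append((name, length, offset, linebases, linewidth))
    return records


def write_fai(fasta_path):
    """Write the index next to the FASTA as <fasta_path>.fai and return its path."""
    fai_path = fasta_path + ".fai"
    with open(fai_path, "w") as out:
        for record in build_fai(fasta_path):
            out.write("\t".join(str(field) for field in record) + "\n")
    return fai_path
```

Calling `write_fai("scaffolds.fasta")` (a placeholder file name) would write `scaffolds.fasta.fai` alongside the fasta, which is the naming that tools such as IGV look for.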

Hope that helps
