Hi. It’s been a couple years since I used Galaxy regularly so I’m a bit rusty. I have just uploaded a custom genomic scaffold file in fasta format to my history and I need to know how to process the file so that I can query it looking for promoter regions upstream of two common genes (cytoplasmic actin, polyubiquitin) using the coding sequence from closely related species. Thanks for any help!
Dear @duncan-d,
You would probably need to do a genome assembly. Galaxy has a large section on this topic in the Galaxy Training! materials. A good place to start is the introductory tutorial: An Introduction to Genome Assembly.
Once you have your contigs, you can investigate your promoters. If the genome is well studied, you could use annotated promoters from a database. If you are looking for de novo promoters or are working with a less well-studied genome (e.g., prokaryotes), you could search for promoter sequences that are typical of your organism.
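As a rough illustration of that last option, here is a minimal Python sketch that scans a sequence for a TATA-box-like motif. The pattern `TATA[AT]A[AT]` is only an assumed, simplified consensus, and `find_motif` is a hypothetical helper name; the right motifs depend on your organism, and short motifs like this will produce many chance hits in a large, repetitive genome.

```python
import re

# Assumed, simplified TATA-box-like consensus (TATAWAW); adjust for your organism.
# The lookahead lets overlapping matches be reported.
TATA_LIKE = re.compile(r"(?=(TATA[AT]A[AT]))")


def find_motif(seq, pattern=TATA_LIKE):
    """Return (0-based start, matched text) for every motif hit on the forward strand."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]


# Example: one hit starting at position 2 of this toy sequence.
hits = find_motif("ggTATAAATcc")
```

This only looks at the forward strand; for a gene on the reverse strand you would scan the reverse complement of the upstream region instead.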
Have a good day and best wishes,
Florian
Hi Florian
Thanks for the quick response! What I have is an unannotated, assembled genome from another lab. The genome is Schistocerca americana (American locust), which has a very large genome full of repetitive DNA. There are 1700 sequence contigs in a fasta file. I’ve managed to use NCBI discontiguous megablast to find the polyubiquitin gene using the Drosophila coding sequence, but of course it doesn’t give me the upstream sequence data. I tried opening the file in IGV, but it tells me that I’m missing a .fai index file and that it can’t regenerate that file. I emailed the lab asking for the original .fai file but haven’t heard back yet. Is there a way to regenerate that index file with just the fasta file I have?
Dianne
Hi @duncan-d
Galaxy creates fa.fai indexes at runtime (when needed).
You can also create the index directly and have it show up as a dataset in your history.
The help in this topic covers all the use cases (any fasta, not just from Unicycler): Opening Unicycler assemblies with IGV local
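If you would rather build the index on your own machine, the usual tool is `samtools faidx`, which writes a `.fai` file next to the fasta. In case samtools is not available, below is a minimal Python sketch of the same idea. It assumes the standard five-column `.fai` layout (name, length, byte offset of the first base, bases per line, bytes per line); `build_fai` is a hypothetical helper name, and unlike samtools it does not check that every line within a sequence has the same length, so treat it as an illustration rather than a replacement.

```python
from pathlib import Path


def build_fai(fasta_path, fai_path=None):
    """Write a .fai-style index for a fasta file and return its records.

    Each record is (name, length, offset, line_bases, line_width).
    Assumes all full lines within one sequence share the same length.
    """
    fasta_path = Path(fasta_path)
    fai_path = Path(fai_path) if fai_path else Path(str(fasta_path) + ".fai")

    records = []
    name = None
    length = offset = line_bases = line_width = 0
    pos = 0  # byte position of the start of the current line

    # Read in binary mode so offsets are true byte offsets.
    with open(fasta_path, "rb") as handle:
        for raw in handle:
            if raw.startswith(b">"):
                if name is not None:
                    records.append((name, length, offset, line_bases, line_width))
                parts = raw[1:].split()
                name = parts[0].decode() if parts else ""  # name ends at first whitespace
                length = line_bases = line_width = 0
                offset = pos + len(raw)  # first base follows the header line
            elif name is not None:
                bases = len(raw.rstrip(b"\r\n"))
                if bases and line_bases == 0:
                    # The first sequence line defines the line layout.
                    line_bases, line_width = bases, len(raw)
                length += bases
            pos += len(raw)

    if name is not None:
        records.append((name, length, offset, line_bases, line_width))

    with open(fai_path, "w") as out:
        for rec in records:
            out.write("\t".join(map(str, rec)) + "\n")
    return records
```

Calling `build_fai("scaffolds.fasta")` (a placeholder filename) would write `scaffolds.fasta.fai` in the same folder, which is where IGV looks for the index when you load the fasta as a genome.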
Hope that helps