Hi
I want to analyse the genome sequence of a bacterial strain (say X). I want to see whether X has proteins that are present in other standard bacterial strains such as Y and Z. How do I search for each of these proteins individually against the genome sequence of X using Galaxy?
I would first try a tblastn search (from NCBI BLAST+), with the proteins as the query and the genome(s) as the target. This is a targeted search.
NCBI BLAST+ tblastn Search translated nucleotide database with protein query sequence(s)
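To decide which query proteins appear to be present in X, one option is to download the tblastn result and filter it. The following is only a minimal sketch: it assumes the job was run with the standard 12-column tabular output, and the function names, the sample rows, and the cutoffs (E-value at most 1e-5, identity at least 30%) are illustrative assumptions to adjust, not recommended values.

```python
# Column order of the standard 12-column BLAST tabular output.
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

def parse_hits(text):
    """Parse tab-separated BLAST hits into a list of dicts."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, line.split("\t")))
        # Convert the numeric fields used for filtering.
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        hits.append(hit)
    return hits

def proteins_with_hits(hits, max_evalue=1e-5, min_identity=30.0):
    """Return the sorted query IDs with at least one hit passing both cutoffs.

    The default cutoffs are placeholders, not recommendations.
    """
    return sorted({
        h["qseqid"] for h in hits
        if h["evalue"] <= max_evalue and h["pident"] >= min_identity
    })

# Hypothetical example rows (protein IDs and contig names are made up).
sample = (
    "protA\tcontig1\t85.5\t200\t29\t0\t1\t200\t3000\t3600\t1e-50\t350\n"
    "protB\tcontig2\t25.0\t50\t37\t1\t1\t50\t100\t250\t0.5\t30\n"
)
found = proteins_with_hits(parse_hits(sample))
```

A query protein that is missing from the filtered list is not proof that the protein is absent from X; it only means no hit passed the chosen cutoffs.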
I would also probably create a makeblastdb index of each genome (individually). Whether I bothered with that extra preparation step would depend on the size of the genomes, how many times I plan to search against them this way (how many proteins I have, or how many repeat runs I plan on doing), and whether the original job ran into memory or timeout errors (the index can help avoid both).
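For comparison, the same two steps on the command line can be sketched as below. This is a hedged illustration, not the Galaxy implementation: it assumes BLAST+ is installed locally, and the file names, the database prefix, the helper function names, and the E-value cutoff are all hypothetical placeholders. The flags themselves (`-in`, `-dbtype`, `-out`, `-query`, `-db`, `-evalue`, `-outfmt`) are standard BLAST+ options.

```python
import shutil
import subprocess

def makeblastdb_cmd(genome_fasta, db_prefix):
    """Build the command that indexes a genome as a nucleotide database."""
    return ["makeblastdb", "-in", genome_fasta, "-dbtype", "nucl", "-out", db_prefix]

def tblastn_cmd(protein_fasta, db_prefix, out_tsv, evalue=1e-5):
    """Build the tblastn command: protein queries against a nucleotide database.

    -outfmt 6 requests the standard 12-column tabular output.
    The default E-value here is a placeholder, not a recommendation.
    """
    return [
        "tblastn", "-query", protein_fasta, "-db", db_prefix,
        "-evalue", str(evalue), "-outfmt", "6", "-out", out_tsv,
    ]

def run_search(genome_fasta, protein_fasta, db_prefix, out_tsv):
    """Index the genome, then search it with the protein queries."""
    for tool in ("makeblastdb", "tblastn"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH; is BLAST+ installed?")
    subprocess.run(makeblastdb_cmd(genome_fasta, db_prefix), check=True)
    subprocess.run(tblastn_cmd(protein_fasta, db_prefix, out_tsv), check=True)

# Hypothetical usage (file names are made up):
# run_search("X_genome.fasta", "YZ_proteins.fasta", "X_db", "X_hits.tsv")
```

Building the index once and reusing the same database prefix for every later protein set is what saves time on repeated searches.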
We don’t have a tutorial for this, but the tool form has detailed instructions, and the process is about the same as anywhere else, so the BLAST+ documentation should mostly apply as well. Also, I’ve been doing exactly this across organisms, both on the command line and in Galaxy, for longer than I want to state, so please ask questions if you get stuck and we can try to help!
Ah, I found a very clean, small example history. Please see dataset 76 in here; I just started it up fresh. I used a built-in index, but the earlier examples in that history use a small makeblastdb result to demonstrate that part.