Hi @wee
Good questions! Let’s go through each.
The output description is at NCBI but you can also find it down in the Help section on any of the BLAST+ tool forms. I’ve quoted that here:
Output format
Because Galaxy focuses on processing tabular data, the default output of this tool is tabular. The standard BLAST+ tabular output contains 12 columns:
| Column | NCBI name | Description |
|---|---|---|
| 1 | qaccver | Query accession dot version |
| 2 | saccver | Subject accession dot version (database hit) |
| 3 | pident | Percentage of identical matches |
| 4 | length | Alignment length |
| 5 | mismatch | Number of mismatches |
| 6 | gapopen | Number of gap openings |
| 7 | qstart | Start of alignment in query |
| 8 | qend | End of alignment in query |
| 9 | sstart | Start of alignment in subject (database hit) |
| 10 | send | End of alignment in subject (database hit) |
| 11 | evalue | Expectation value (E-value) |
| 12 | bitscore | Bit score |
Until BLAST+ 2.5.0, the first two columns were qseqid and sseqid, which were usually strings containing multiple pipe-separated entries. In BLAST+ 2.5.0, the first two columns became qacc and sacc (accession only), while in BLAST+ 2.6.0 this was changed again to use qaccver and saccver (accession dot version).
The BLAST+ tools can optionally output additional columns of information, but this takes longer to calculate. Many commonly used extra columns are included by selecting the extended tabular output. The extra columns are included after the standard 12 columns. This is so that you can write workflow filtering steps that accept either the 12 or 25 column tabular BLAST output. Galaxy now uses this extended 25 column output by default.
| Column | NCBI name | Description |
|---|---|---|
| 13 | sallseqid | All subject Seq-id(s), separated by a ‘;’ |
| 14 | score | Raw score |
| 15 | nident | Number of identical matches |
| 16 | positive | Number of positive-scoring matches |
| 17 | gaps | Total number of gaps |
| 18 | ppos | Percentage of positive-scoring matches |
| 19 | qframe | Query frame |
| 20 | sframe | Subject frame |
| 21 | qseq | Aligned part of query sequence |
| 22 | sseq | Aligned part of subject sequence |
| 23 | qlen | Query sequence length |
| 24 | slen | Subject sequence length |
| 25 | salltitles | All subject title(s), separated by a ‘<>’ |
The third option is to customise the tabular output by selecting which columns you want, from the standard set of 12, the default set of 25, or any of the additional columns BLAST+ offers (including species name).
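If you ever download the tabular output and want to work with it outside Galaxy, here is a minimal Python sketch of a reader that accepts either the 12 or the 25 column layout described above. The column names come from the help text; the function names, the thresholds in `filter_hits`, and the two example rows are made up for illustration and are not part of any Galaxy or BLAST+ tool.

```python
import csv
import io

# Column names as listed in the BLAST+ tool help quoted above.
STANDARD_COLUMNS = [
    "qaccver", "saccver", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
EXTENDED_COLUMNS = STANDARD_COLUMNS + [
    "sallseqid", "score", "nident", "positive", "gaps", "ppos",
    "qframe", "sframe", "qseq", "sseq", "qlen", "slen", "salltitles",
]


def read_blast_tabular(handle):
    """Yield one dict per hit from 12 or 25 column tabular BLAST output."""
    # QUOTE_NONE: titles may contain quote characters that are not CSV quoting.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if len(row) == len(STANDARD_COLUMNS):
            names = STANDARD_COLUMNS
        elif len(row) == len(EXTENDED_COLUMNS):
            names = EXTENDED_COLUMNS
        else:
            raise ValueError(f"expected 12 or 25 columns, got {len(row)}")
        yield dict(zip(names, row))


def filter_hits(hits, min_pident=90.0, max_evalue=1e-5):
    """Keep hits that pass the (hypothetical) identity and E-value cutoffs."""
    return [
        hit for hit in hits
        if float(hit["pident"]) >= min_pident
        and float(hit["evalue"]) <= max_evalue
    ]


# Two made-up 12 column rows, only to show the call pattern.
example = (
    "query1\tP12345.1\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t380\n"
    "query2\tQ99999.2\t45.0\t150\t80\t2\t5\t154\t10\t159\t0.002\t50.1\n"
)

kept = filter_hits(read_blast_tabular(io.StringIO(example)))
for hit in kept:
    print(hit["qaccver"], hit["saccver"], hit["pident"], hit["evalue"])
```

Because the extra columns come after the standard 12, the filter only touches names that exist in both layouts, so the same code should work on either file.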
The function per gene is updated, while the gene structure itself is usually not. That means the protein database you are mapping against changes less frequently, and BLAST only cares about that part of it. The meaning (function) is another layer that is added in later.
You can try a tool like annotateMyIDs when the target is one of the supported types and the mapping is against a single species. For Swiss-Prot, this won’t work, and you would need to pull in annotation files from the database source, then merge them with the BLAST results on a common identifier column (the hit sequence ID). The tool NCBI Datasets Gene might work with Swiss-Prot IDs, but I’m not sure, so it is worth a try.
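To make the merge idea concrete, here is a rough Python sketch that joins BLAST hits to an annotation table on the hit sequence ID (column 2 of the tabular output). The annotation layout, the `accession` and `function` column names, and all the example values are assumptions for illustration; a real annotation file from the database source will have its own columns, and the IDs need to be in the same form in both files (for example, both with or both without the version suffix).

```python
import csv
import io


def load_annotations(handle, id_column="accession"):
    """Index tab-separated annotation rows by ID (column name is assumed)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[id_column]: row for row in reader}


def annotate_hits(blast_handle, annotations, subject_index=1):
    """Yield (blast_row, annotation_row or None) for each BLAST hit.

    subject_index=1 is the second column (saccver) of the tabular output.
    """
    reader = csv.reader(blast_handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        yield row, annotations.get(row[subject_index])


# Made-up annotation table and BLAST rows, only to show the call pattern.
annotation_text = (
    "accession\tfunction\n"
    "P12345.1\texample kinase\n"
)
blast_text = (
    "query1\tP12345.1\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t380\n"
    "query2\tQ99999.2\t45.0\t150\t80\t2\t5\t154\t10\t159\t0.002\t50.1\n"
)

annotations = load_annotations(io.StringIO(annotation_text))
results = list(annotate_hits(io.StringIO(blast_text), annotations))
for row, annotation in results:
    function = annotation["function"] if annotation else "no annotation found"
    print(row[0], row[1], function)
```

Hits without a matching annotation are kept and paired with `None`, so you can see which IDs failed to map instead of silently losing them.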
You can’t get the predicted sequence itself from the tabular output, but you can get it from the XML output using Parse blast XML output. You could also explore something like BlastXML to gapped GFF3 for more details.
Other options for bacterial assembly annotation include:
- Bakta.
- Methods like this one → Hands-on: Bacterial Genome Annotation / Genome Annotation
- Publications. Replicating methods is usually possible.
Hope this helps! 