Understanding BLASTX Tabular Output Against Swiss-Prot (25 Columns)

wee · July 6, 2025, 2:22pm

Hi everyone,

I’ve used the NCBI BLAST+ blastx tool to compare an assembled bacterial genome against the Swiss-Prot database. The output is in tabular format with a total of 25 columns.

I have a few questions:

1. Could someone help explain the meaning or description of columns 13 to 25?
2. I noticed there’s no column directly describing gene function. Why is that the case?
3. Is it possible to retrieve the predicted protein sequences from any of the output files generated during the run?

Thank you in advance for your help!

jennaj · July 8, 2025, 1:16am

Hi @wee

Good questions! Let’s go through each.

The column descriptions are documented at NCBI, and you can also find them in the Help section at the bottom of any BLAST+ tool form. I’ve quoted that here:

Output format

Because Galaxy focuses on processing tabular data, the default output of this tool is tabular. The standard BLAST+ tabular output contains 12 columns:

Column	NCBI name	Description
1	qaccver	Query accession dot version
2	saccver	Subject accession dot version (database hit)
3	pident	Percentage of identical matches
4	length	Alignment length
5	mismatch	Number of mismatches
6	gapopen	Number of gap openings
7	qstart	Start of alignment in query
8	qend	End of alignment in query
9	sstart	Start of alignment in subject (database hit)
10	send	End of alignment in subject (database hit)
11	evalue	Expectation value (E-value)
12	bitscore	Bit score

Until BLAST+ 2.5.0, the first two columns were qseqid and sseqid, which were usually strings containing multiple pipe-separated entries. In BLAST+ 2.5.0, the first two columns became qacc and sacc (accession only), while in BLAST+ 2.6.0 this was changed again to use qaccver and saccver (accession dot version).
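If you are working with older output that still has the pipe-separated form in the first two columns, a small helper can pull out the accession. This is only a sketch: the function name `accession_from_seqid` and the example identifier are illustrative, and it assumes the `db|accession|entry_name` layout. Other layouts (for example older `gi|...|ref|...|` identifiers) would need different handling.

```python
def accession_from_seqid(seqid: str) -> str:
    """Return the accession from a pipe-separated identifier.

    Assumes a 'db|accession|entry_name' layout (an assumption, not
    something BLAST guarantees). Identifiers without pipes are
    returned unchanged.
    """
    parts = seqid.split("|")
    if len(parts) >= 2 and parts[1]:
        return parts[1]
    return seqid


# Illustrative identifiers only.
print(accession_from_seqid("sp|P0A7G6|RECA_ECOLI"))
print(accession_from_seqid("P0A7G6.1"))
```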

The BLAST+ tools can optionally output additional columns of information, but this takes longer to calculate. Many commonly used extra columns are included by selecting the extended tabular output. The extra columns are included after the standard 12 columns. This is so that you can write workflow filtering steps that accept either the 12 or 25 column tabular BLAST output. Galaxy now uses this extended 25 column output by default.
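Because the extra columns come after the standard 12, a filter that only looks at columns 1 to 12 works on both layouts. Here is a minimal sketch of that idea outside Galaxy. The function name `filter_hits`, the thresholds, and the sample rows are all made up for illustration.

```python
import csv
import io


def filter_hits(lines, max_evalue=1e-5, min_pident=30.0):
    """Yield rows passing the thresholds, reading only columns 1-12.

    Any columns after the 12th are carried along untouched, so the
    same code accepts 12-column and 25-column tabular output.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 12:
            continue  # skip blank or malformed lines
        pident = float(row[2])   # column 3: pident
        evalue = float(row[10])  # column 11: evalue
        if evalue <= max_evalue and pident >= min_pident:
            yield row


# Made-up rows: one 12-column hit, the same hit padded to 25 columns,
# and one weak hit that should be dropped.
twelve = "contig1\tP0A7G6.1\t85.0\t100\t15\t0\t1\t300\t1\t100\t1e-50\t200\n"
extended = twelve.rstrip("\n") + "\t" + "\t".join(["x"] * 13) + "\n"
weak = "contig1\tQ00000.1\t25.0\t40\t30\t0\t1\t120\t1\t40\t0.5\t20\n"

kept = list(filter_hits(io.StringIO(twelve + extended + weak)))
print([len(row) for row in kept])
```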

Column	NCBI name	Description
13	sallseqid	All subject Seq-id(s), separated by a ‘;’
14	score	Raw score
15	nident	Number of identical matches
16	positive	Number of positive-scoring matches
17	gaps	Total number of gaps
18	ppos	Percentage of positive-scoring matches
19	qframe	Query frame
20	sframe	Subject frame
21	qseq	Aligned part of query sequence
22	sseq	Aligned part of subject sequence
23	qlen	Query sequence length
24	slen	Subject sequence length
25	salltitles	All subject title(s), separated by a ‘<>’
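If you download the 25 column file and want to work with it in a script, it can help to attach the NCBI names from the two tables above to each row. A minimal sketch using only the Python standard library follows. The function name `read_hits` and the sample row are made up for illustration, and the file is assumed to have no header line.

```python
import csv
import io

# Column names in order, taken from the two tables above.
COLUMNS = [
    "qaccver", "saccver", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
    "sallseqid", "score", "nident", "positive", "gaps", "ppos",
    "qframe", "sframe", "qseq", "sseq", "qlen", "slen", "salltitles",
]


def read_hits(handle):
    """Yield one dict per hit, keyed by NCBI column name."""
    reader = csv.DictReader(
        handle, fieldnames=COLUMNS, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        yield row


# A made-up row, only to show the shape of the data.
SAMPLE = "\t".join([
    "contig1", "P0A7G6.1", "100.0", "7", "0", "0", "1", "21", "1", "7",
    "1e-03", "20.0", "sp|P0A7G6|RECA_ECOLI", "40", "7", "7", "0", "100.0",
    "1", "0", "MAIDENK", "MAIDENK", "5000", "353",
    "RecName: Full=Protein RecA",
]) + "\n"

for hit in read_hits(io.StringIO(SAMPLE)):
    # Column 25 carries the subject title(s) from the database.
    print(hit["saccver"], hit["evalue"], hit["salltitles"])
```

All values come back as strings, so convert numeric columns (for example `float(hit["evalue"])`) before comparing them.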

The third option is to customise the tabular output by selecting which columns you want, from the standard set of 12, the default set of 25, or any of the additional columns BLAST+ offers (including species name).
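In Galaxy you pick the columns on the tool form. If you ever run BLAST+ locally, the equivalent is the `-outfmt` option with format `6` (tabular) followed by the column names. The sketch below only builds and prints such a command and does not run BLAST. The file names, database name, and helper name `blastx_command` are hypothetical placeholders.

```python
import shlex


def blastx_command(query, db, out, columns):
    """Build a blastx argument list with a custom tabular column set.

    Format '6' is tabular; the names after it select the columns.
    """
    outfmt = "6 " + " ".join(columns)
    return ["blastx", "-query", query, "-db", db, "-outfmt", outfmt, "-out", out]


# Hypothetical file and database names.
cmd = blastx_command(
    "assembly.fasta",
    "swissprot",
    "hits.tsv",
    # A custom selection drawn from the columns listed above. To my
    # knowledge the species name specifier is 'sscinames', which needs
    # the taxonomy database to be available; check `blastx -help`.
    ["qaccver", "saccver", "pident", "evalue", "bitscore", "salltitles"],
)
print(shlex.join(cmd))
```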

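There is no function column because the functional annotation of a gene gets updated over time, while the sequence itself usually does not. That means the sequence data you are mapping against changes less frequently, and BLAST only cares about that part of it. The meaning (function) is another layer that is added in later. The subject titles in column 25 (salltitles) do carry the sequence descriptions from the database, though.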

You can try a tool like annotateMyIDs when the target is one of the supported identifier types and the mapping is against a single species. For Swiss-Prot, this won’t work, and you would need to pull in annotation files from the database source, then merge them with the BLAST output on a common identifier column (the hit sequence ID). The tool NCBI Datasets Gene might work with Swiss-Prot IDs, but I’m not sure, so you can try.
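The merge step can be sketched like this. Everything here is an assumption for illustration: the annotation table is imagined as a simple accession-to-description mapping (real annotation downloads will have their own layout), and the names `strip_version` and `annotate` are hypothetical. One practical detail: column 2 (saccver) includes a version suffix, so it may need trimming before it matches the IDs in an annotation file.

```python
def strip_version(accession: str) -> str:
    """Drop a trailing '.N' version so 'P0A7G6.1' matches 'P0A7G6'."""
    return accession.split(".", 1)[0]


def annotate(hit_rows, annotations, missing="NA"):
    """Append a description to each hit row, joined on the hit ID.

    hit_rows: lists of column values, with saccver in column 2.
    annotations: dict mapping accession -> description (assumed layout).
    """
    for row in hit_rows:
        key = strip_version(row[1])
        yield row + [annotations.get(key, missing)]


# Made-up data, trimmed to a few columns for readability.
hits = [
    ["contig1", "P0A7G6.1", "85.0"],
    ["contig2", "Q00000.2", "40.0"],
]
annotations = {"P0A7G6": "DNA repair and recombination"}

for row in annotate(hits, annotations):
    print("\t".join(row))
```

In Galaxy itself, the same idea would be a join of two tabular datasets on the shared ID column.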

The tabular output won’t give you the predicted sequence itself, but you can get sequences from the BLAST XML output using Parse blast XML output. You could also explore something like BlastXML to gapped GFF3 for more details.
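To give an idea of what is inside the XML, here is a minimal sketch that reads the aligned query segment of each hit with the Python standard library. It is not what the Galaxy tools do internally. The sample XML is a heavily trimmed, made-up fragment that keeps only a few of the element names BLAST XML uses, and the function name `aligned_query_segments` is hypothetical. Note that this returns only the aligned portion of the (translated) query with gap characters removed, not a full predicted protein.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily trimmed fragment in the shape of BLAST XML output.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>contig1</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_accession>P0A7G6</Hit_accession>
          <Hit_def>RecName: Full=Protein RecA</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_qseq>MAID-ENK</Hsp_qseq>
              <Hsp_hseq>MAIDKENK</Hsp_hseq>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def aligned_query_segments(xml_text):
    """Yield (query, hit accession, ungapped aligned query segment)."""
    root = ET.fromstring(xml_text)
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        for hit in iteration.iter("Hit"):
            accession = hit.findtext("Hit_accession")
            for hsp in hit.iter("Hsp"):
                qseq = hsp.findtext("Hsp_qseq", default="")
                # Remove alignment gap characters from the segment.
                yield query, accession, qseq.replace("-", "")


for query, accession, segment in aligned_query_segments(SAMPLE_XML):
    print(f">{query} hit={accession}")
    print(segment)
```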

Other options for bacterial assembly annotation include:

Bakta.
Methods like this one → Hands-on: Bacterial Genome Annotation (Genome Annotation)
Publications. Replicating methods is usually possible.

Hope this helps!

wee · July 9, 2025, 8:07am

Thanks so much

jennaj · July 9, 2025, 5:36pm

Ah, yes, the sequence descriptions are a good place to start. You can layer in more later if you want to.