RPStblastn subject defline(s)

Marie_Lefebvre · April 26, 2024, 1:32pm

Hi,
I use the RPStblastn tool.
The output format I selected is BLAST XML. I would like the ‘query-def’ to be moved into ‘query-ID’ by setting the advanced option ‘Should the query and subject defline(s) be parsed?’ to ‘yes’ (as on the command line).
This doesn’t work.
When I compare the output with and without the option, I get exactly the same output, as if the option were not taken into account.
Is this normal?

Here’s an example:
How it should work:
Without option:
<Iteration_query-ID>Query_1</Iteration_query-ID>
<Iteration_query-def>NODE_1_length_506_cov_10.687361</Iteration_query-def>
With option:
<Iteration_query-ID>NODE_1_length_506_cov_10.687361</Iteration_query-ID>
<Iteration_query-def>No definition line</Iteration_query-def>

Thanks

jennaj · April 26, 2024, 5:46pm

Hi @Marie_Lefebvre

These look different to me, so I think the option is being applied:

<Iteration_query-ID>Query_1</Iteration_query-ID>

versus

<Iteration_query-ID>NODE_1_length_506_cov_10.687361</Iteration_query-ID>

then

<Iteration_query-def>NODE_1_length_506_cov_10.687361</Iteration_query-def>

versus

<Iteration_query-def>No definition line</Iteration_query-def>

On FASTA “>” title lines, everything before the first whitespace is the “identifier”, and everything after it is the “description”. This is how it works everywhere, not just in Galaxy and not just in BLAST.
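The identifier/description split can be sketched in a few lines of Python. This is a simplified illustration (the function name `split_fasta_title` is hypothetical); real tools, including BLAST+ when it parses sequence IDs, may additionally interpret structured identifiers such as `ref|...|`, which this sketch ignores.

```python
def split_fasta_title(title_line: str) -> tuple[str, str]:
    """Split a FASTA '>' title line into (identifier, description).

    Simplified sketch: the identifier is everything before the first
    whitespace; the description is everything after it ('' if nothing).
    """
    text = title_line.rstrip("\r\n")
    if text.startswith(">"):
        text = text[1:]  # drop the leading '>' marker
    # split(None, 1): split once, on the first run of whitespace
    parts = text.split(None, 1)
    if not parts:
        return "", ""  # empty title line
    identifier = parts[0]
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


print(split_fasta_title(">NODE_1_length_506_cov_10.687361"))
# ('NODE_1_length_506_cov_10.687361', '')
print(split_fasta_title(">seq42 putative kinase domain"))
# ('seq42', 'putative kinase domain')
```

With the title line from this thread there is no whitespace, so the description comes back empty.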

So it looks like the tool split the title line on the first whitespace, found only one value, and placed it in a different XML tag depending on that advanced option. The first version carries slightly more information (an auto-generated unique key for the query sequence). The option matters more when the sequence identifiers are public accessions you want to use downstream, or when the query FASTA has a meaningful description. Since this data has no description, the choice is mostly a matter of preference.
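The mapping between the title line and the two XML tags, as shown in the example in the question, can be modelled as below. This is a hypothetical sketch, not BLAST+'s actual implementation: the function name and parameters are made up, the "No definition line" placeholder is copied from the example output, and the command-line counterpart of the Galaxy option is assumed to be BLAST+'s `-parse_deflines`.

```python
def iteration_query_tags(title_line: str, query_index: int,
                         parse_deflines: bool) -> tuple[str, str]:
    """Hypothetical model of how Iteration_query-ID and Iteration_query-def
    relate to the FASTA title line in the example. Not BLAST+'s real code."""
    text = title_line.rstrip("\r\n")
    if text.startswith(">"):
        text = text[1:]
    parts = text.split(None, 1)
    if not parts:
        raise ValueError("empty FASTA title line")
    if parse_deflines:
        # Identifier goes into query-ID; the description (if any) into query-def
        query_id = parts[0]
        query_def = parts[1] if len(parts) > 1 else "No definition line"
    else:
        # Auto-generated key in query-ID; the whole title line in query-def
        query_id = f"Query_{query_index}"
        query_def = text
    return query_id, query_def


title = ">NODE_1_length_506_cov_10.687361"
print(iteration_query_tags(title, 1, parse_deflines=False))
# ('Query_1', 'NODE_1_length_506_cov_10.687361')
print(iteration_query_tags(title, 1, parse_deflines=True))
# ('NODE_1_length_506_cov_10.687361', 'No definition line')
```

Under this model, a title line without whitespace only moves between the two tags, which is why the choice looks cosmetic for this data.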

Hope this helps!

Marie_Lefebvre · April 29, 2024, 2:15pm

Hi @jennaj

Actually, my example shows the expected behaviour: that is what I get on the command line.
But when I use the tool in Galaxy I get:
With option

  <Iteration_query-ID>Query_1</Iteration_query-ID>
  <Iteration_query-def>NODE_1_length_506_cov_10.687361</Iteration_query-def>

Without option

  <Iteration_query-ID>Query_1</Iteration_query-ID>
  <Iteration_query-def>NODE_1_length_506_cov_10.687361</Iteration_query-def>

The two outputs are identical, which is why I opened this topic.
I hope this clarifies my previous explanations.
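To check whether two Galaxy BLAST XML outputs really differ in these tags, the query ID/def pairs can be extracted and compared. A minimal sketch using Python's standard-library `xml.etree.ElementTree`; the function names and file paths are hypothetical, and it assumes both files list the queries in the same order.

```python
import xml.etree.ElementTree as ET


def query_id_def_pairs(xml_data):
    """Return (Iteration_query-ID, Iteration_query-def) for every Iteration.

    xml_data may be str or bytes containing a BLAST XML document.
    """
    root = ET.fromstring(xml_data)
    pairs = []
    # .iter() walks all descendants, so nesting under
    # BlastOutput_iterations does not matter
    for iteration in root.iter("Iteration"):
        query_id = iteration.findtext("Iteration_query-ID", default="")
        query_def = iteration.findtext("Iteration_query-def", default="")
        pairs.append((query_id, query_def))
    return pairs


def differing_queries(path_with, path_without):
    """Return the (with, without) tag pairs that differ between two XML files."""
    with open(path_with, "rb") as fa, open(path_without, "rb") as fb:
        pairs_with = query_id_def_pairs(fa.read())
        pairs_without = query_id_def_pairs(fb.read())
    return [(pw, po) for pw, po in zip(pairs_with, pairs_without) if pw != po]
```

For example, `differing_queries("with_option.xml", "without_option.xml")` returning an empty list would confirm that the option had no visible effect on the query tags.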

peterjc · April 29, 2024, 5:14pm

Cross-reference: issue #165 on peterjc/galaxy_blast (GitHub), “RPStblastn subject defline(s) option not taken into account”, where I suggested this might depend on how the database was built (with or without the `-parse_seqids` option).

Does anyone know where the PFAM-A BLAST database on https://usegalaxy.eu/ came from, or how it was built?
