RPStblastn subject defline(s)

Hi,
I use the RPStblastn tool.
I set the output format to BLAST XML. I would like the ‘query-def’ to be extracted to ‘query-id’ by setting the advanced option ‘Should the query and subject defline(s) be parsed?’ to ‘yes’ (as on the command line).
This doesn’t work.
When I compare the output with and without the option, I get exactly the same output, as if the option was not taken into account.
Is this normal?

Here’s an example:
How it should work:
Without option:
<Iteration_query-ID>Query_1</Iteration_query-ID>
<Iteration_query-def>NODE_1_length_506_cov_10.687361</Iteration_query-def>
With option:
<Iteration_query-ID>NODE_1_length_506_cov_10.687361</Iteration_query-ID>
<Iteration_query-def>No definition line</Iteration_query-def>

Thanks

Hi @Marie_Lefebvre

These seem different to me, so I think the option is being applied.


<Iteration_query-ID>Query_1</Iteration_query-ID>

versus

<Iteration_query-ID>NODE_1_length_506_cov_10.687361</Iteration_query-ID>


then


<Iteration_query-def>NODE_1_length_506_cov_10.687361</Iteration_query-def>

versus

<Iteration_query-def>No definition line</Iteration_query-def>


On FASTA title lines (the lines starting with `>`), everything before the first whitespace is the “identifier”, and everything after it is the “description”. This is how it works everywhere, not just Galaxy and not just BLAST.

So, it looks like the tool attempted to split the title line on the first whitespace, only found one value, and sorted that out differently between the two XML tags based on that advanced option. The first version has a tiny bit more information (auto-generated unique key for the query sequence). This option might matter more if the sequence identifiers were public keys that you wanted to do something with, or if the query fasta actually had a meaningful description, but since this data doesn’t have anything, this seems like a preference. :slight_smile:
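The identifier/description split described above can be sketched in a few lines of Python. This is only an illustration of the convention, not the code BLAST or Galaxy actually runs; the function name `split_defline` and the second title line (`seq1 hypothetical protein`) are made up for the example.

```python
def split_defline(title_line):
    """Split a FASTA title line into (identifier, description).

    Illustrative helper only: everything before the first whitespace
    is treated as the identifier, the rest as the description.
    """
    # Drop the leading '>' if present, then trim surrounding whitespace.
    text = title_line[1:] if title_line.startswith(">") else title_line
    text = text.strip()
    # Split once on the first run of whitespace.
    parts = text.split(None, 1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


# Title line from this thread: no whitespace, so the description is empty.
print(split_defline(">NODE_1_length_506_cov_10.687361"))
# Hypothetical title line with a description after the identifier.
print(split_defline(">seq1 hypothetical protein"))
```

With the title line from this thread the description comes back empty, which matches the "only found one value" case above.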

Hope this helps!

Hi @jennaj

In fact, the example in my first post shows the expected behaviour, because that is what I get on the command line.
But when I use the tool in Galaxy I get:
With option

  <Iteration_query-ID>Query_1</Iteration_query-ID>
  <Iteration_query-def>NODE_1_length_506_cov_10.687361</Iteration_query-def>

Without option

  <Iteration_query-ID>Query_1</Iteration_query-ID>
  <Iteration_query-def>NODE_1_length_506_cov_10.687361</Iteration_query-def>

The two outputs are absolutely identical, which is why I opened this topic.
I hope this clarifies my previous explanations.
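For anyone who wants to check this on their own outputs, here is a minimal Python sketch that pulls the two tags out of BLAST XML and compares them. It assumes each query sits inside an `<Iteration>` element, as in the snippets above; the function name `query_fields` and the inline strings are only for illustration, and in practice the text would be read from the two Galaxy output files.

```python
import xml.etree.ElementTree as ET


def query_fields(xml_text):
    """Return a list of (query-ID, query-def) pairs, one per <Iteration>."""
    root = ET.fromstring(xml_text)
    return [
        (it.findtext("Iteration_query-ID"), it.findtext("Iteration_query-def"))
        for it in root.iter("Iteration")
    ]


# Inline stand-ins for the two Galaxy outputs reported above.
with_option = """<Iteration>
  <Iteration_query-ID>Query_1</Iteration_query-ID>
  <Iteration_query-def>NODE_1_length_506_cov_10.687361</Iteration_query-def>
</Iteration>"""

without_option = """<Iteration>
  <Iteration_query-ID>Query_1</Iteration_query-ID>
  <Iteration_query-def>NODE_1_length_506_cov_10.687361</Iteration_query-def>
</Iteration>"""

# True means the option made no difference to these two tags.
print(query_fields(with_option) == query_fields(without_option))
```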

Cross-reference: “RPStblastn subject defline(s) option not taken into account” (Issue #165, peterjc/galaxy_blast on GitHub), where I suggested this might depend on how the database was built (with or without the `-parse_seqids` option).

Does anyone know where the PFAM-A BLAST database on https://usegalaxy.eu/ came from, or how it was built?