Use blastx on Galaxy

bart_joosten · October 26, 2021, 9:35am

Hi,

I’m new to RNA-seq and I’m trying to use blastx on Galaxy to search a FASTA file of possibly novel lncRNAs against the NCBI nr database. I want to remove transcripts with significant homology to known proteins (e-value < 1e-10, target coverage > 80%, and identity > 90%) so that I keep only the transcripts that are most likely lncRNAs. I set the e-value (expectation value) cutoff to 0.0000000001 and the query coverage to 80%, but I cannot find a setting for pident (percentage of identical matches). How do I do this? Also, when I run the tool with default settings, it runs for more than a day.

gallardoalba · October 27, 2021, 9:05am

Hi @bart_joosten,
could you describe your pipeline so far? Do you have raw RNA-seq data or assembled transcripts? Which species do the samples come from?

Regards

wm75 · October 27, 2021, 9:29am

In addition to @gallardoalba's question:
yes, runtime can be substantial for this tool and will depend on how many input sequences you’re trying to match against the database. This is simply how it is.
Regarding your identity threshold: is % identity part of the output you’re getting? If so, you can use the standard Galaxy Filter (and possibly Text Processing) tools to filter the results afterwards.
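
For anyone who prefers to do this filtering outside Galaxy, here is a minimal Python sketch. It assumes the output is in the standard 12-column BLAST tabular layout (that column order is an assumption about your result file), and the function name and example rows are made up for illustration. Target coverage is not one of those 12 columns, so only identity and e-value are checked here.

```python
import csv
import io

# Standard 12-column BLAST tabular layout (assumed; check your own output).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def queries_with_protein_hits(handle, min_pident=90.0, max_evalue=1e-10):
    """Return the query IDs that have at least one hit passing both thresholds."""
    hits = set()
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(COLUMNS, row))
        if float(rec["pident"]) > min_pident and float(rec["evalue"]) < max_evalue:
            hits.add(rec["qseqid"])
    return hits


# Made-up example rows, only to show the behaviour.
example = io.StringIO(
    "tx1\tsp|P1\t95.0\t200\t10\t0\t1\t600\t1\t200\t1e-50\t400\n"
    "tx2\tsp|P2\t70.0\t150\t45\t0\t1\t450\t1\t150\t1e-20\t120\n"
    "tx3\tsp|P3\t99.0\t30\t0\t0\t1\t90\t1\t30\t1e-3\t40\n"
)
flagged = queries_with_protein_hits(example)
print(sorted(flagged))  # query IDs to remove from the lncRNA candidates
```

Transcripts whose IDs end up in `flagged` are the ones with protein homology; everything else stays in the candidate list.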

gbbio · October 27, 2021, 9:51am

I can also add my two cents. As already mentioned, long runtimes are expected. The NCBI nt and nr databases are growing insanely fast. If I need to blast a large number of sequences, I always make a subselection first, so I only blast against Homo sapiens sequences, for example. But this may not be so easy to do in Galaxy.

As an alternative, you could check out the diamond tool. I don’t have experience with it, but they claim it is 2,500 times faster than blastx. I think the key to your question is what @gallardoalba is suggesting. You could reduce your input by removing duplicate sequences, for example.
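
A rough sketch of the duplicate-removal idea in plain Python, in case it helps. It treats two records as duplicates only when their sequences are identical (ignoring case), which is an assumption about what "duplicate" should mean here; the helper names and the example records are invented.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def deduplicate(records):
    """Keep the first record for each distinct sequence (case-insensitive)."""
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


# Made-up example: tx2 has the same sequence as tx1 and is dropped.
example = io.StringIO(">tx1\nACGT\nACGT\n>tx2\nacgtacgt\n>tx3\nGGGG\n")
unique = deduplicate(read_fasta(example))
for header, seq in unique:
    print(f">{header}\n{seq}")
```

This only catches exact duplicates. Near-identical transcripts would need a clustering tool instead.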

bart_joosten · October 27, 2021, 12:51pm

@gallardoalba Hi, thanks for your reply. I’m trying to find novel lncRNAs. I started from fastq files from human samples and performed trimming (trimmomatic), alignment (hisat2), and assembly (stringtie), then used the merged assembly (stringtie-merge) in the FEELnc package to find potential new lncRNAs (output as a GTF file). I converted the GTF file with possible novel lncRNAs into a FASTA file and uploaded it to Galaxy to run blastx with the parameters I specified. As has been mentioned, the running time is very long (now 2 days) because my FASTA file is very large. When I tried to use blastx from NCBI (blastx: search protein databases using a translated nucleotide query), it would only allow files with a total query length of 100k, so I understand what’s causing the delay.

@wm75 Yes, you are correct. If I can get an output from blastx on Galaxy, I’ll filter it afterwards. Thanks for the suggestion.

bart_joosten · October 27, 2021, 12:51pm

@gbbio I like your suggestion of using the diamond tool! It has all the parameters that I wanted to use with blastx and should run faster. I’m trying it right now on Galaxy.

bart_joosten · October 29, 2021, 9:36am

@gbbio I have tried diamond and it works well. However, by default diamond reports matching proteins from all taxa, and I am of course only interested in human ones. I’m having some problems selecting the human taxon ID in the diamond tool: entering the human taxon ID 9606 gives an error in the diamond aligner tool (use --taxonmap parameter for the makedb command).

From what I understand, I have to build a database with the diamond makedb tool in Galaxy, using the NCBI nr protein database as a FASTA file plus several other files (taxonmap, taxonnames, taxonnodes). However, I have some problems uploading these files. When I try to upload the taxonmap file with the choose remote file option (ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.FULL.gz), I get an error: Unsupported Media Type (415). Do you know how to solve this problem?
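
For reference, this is roughly what the two steps look like on the diamond command line, written as a small Python sketch that only builds and prints the commands (it does not run diamond). All file names except the 9606 taxon ID are placeholders, and the thresholds mirror the ones from my first post. Note that `--id 90` keeps hits with at least 90% identity, which is slightly different from "> 90%". Please check the options against the help of your diamond version before relying on them.

```python
import shlex


def makedb_command(nr_fasta, db_prefix, taxonmap, taxonnodes, taxonnames):
    """Build a `diamond makedb` call that embeds taxonomy in the database."""
    return ["diamond", "makedb",
            "--in", nr_fasta,
            "--db", db_prefix,
            "--taxonmap", taxonmap,
            "--taxonnodes", taxonnodes,
            "--taxonnames", taxonnames]


def blastx_command(db_prefix, query_fasta, out_tsv, taxon_id=9606):
    """Build a `diamond blastx` call restricted to one taxon."""
    return ["diamond", "blastx",
            "--db", db_prefix,
            "--query", query_fasta,
            "--out", out_tsv,
            "--outfmt", "6",               # BLAST-style tabular output
            "--taxonlist", str(taxon_id),  # 9606 = human
            "--evalue", "1e-10",
            "--id", "90",                  # minimum % identity
            "--subject-cover", "80"]       # minimum % target coverage


# Placeholder file names, for illustration only.
makedb = makedb_command("nr.gz", "nr_tax", "prot.accession2taxid.FULL.gz",
                        "nodes.dmp", "names.dmp")
blastx = blastx_command("nr_tax", "candidate_lncRNAs.fasta", "hits.tsv")
print(shlex.join(makedb))
print(shlex.join(blastx))
```

The taxon filter only works if the database was built with the taxonomy files, which is what the error message above is pointing at.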

gallardoalba · October 29, 2021, 4:19pm

Hi @bart_joosten,
I tried to upload the file and didn’t run into any problem. How did you try to upload it?

Regards

bart_joosten · October 29, 2021, 10:15pm

@gallardoalba I tried it with the choose remote file option and it worked perfectly.

jennaj · September 21, 2022, 4:36pm

A post was split to a new topic: Troubleshooting use of FEELnc
