Blastn issue; what is the alternative tool?

Sutrisha_Kundu · April 24, 2024, 2:55pm

I have a Trinity assembly with more than 200,000 sequences. I wanted to run it through Blast2GO after running TransDecoder. However, after completing TransDecoder, I found that Blast2GO is no longer available in the Galaxy community. So I ran BLASTN separately. Since my assembly contains many more than 200,000 sequences, I had asked earlier at “usegalaxy-eu” and was told that, because my data is so large, BLASTN might never finish running. I was advised to ask here which tool I should use for further analysis. Could you please tell me how to annotate the CDS or the assembled nucleotide sequences? Which tool should I use?

jennaj · April 24, 2024, 6:54pm

Hi @Sutrisha_Kundu

I would suggest splitting the query sequences up into smaller batches. If you output tabular results from BLASTN, those could be concatenated after.

Tools involved:

Split file to dataset collection
BLASTN
Concatenate datasets tail-to-head or Collapse Collection into single dataset in order of the collection
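
For anyone who wants to see what these three steps do to the data, here is a minimal Python sketch of the same split and concatenate logic outside Galaxy. It is only an illustration, not how the Galaxy tools are implemented; the function names (`read_fasta`, `split_records`, `concatenate_tabular`) and the round-robin batching are assumptions made for this example.

```python
def read_fasta(lines):
    """Yield (header, sequence_lines) records from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, seq
            header, seq = line, []
        elif line and header is not None:
            seq.append(line)
    if header is not None:
        yield header, seq


def split_records(records, n_batches):
    """Distribute records round-robin into n_batches lists (hypothetical batching scheme)."""
    batches = [[] for _ in range(n_batches)]
    for i, record in enumerate(records):
        batches[i % n_batches].append(record)
    return batches


def concatenate_tabular(chunks):
    """Join per-batch tabular results (lists of lines) in collection order, skipping blank lines."""
    merged = []
    for chunk in chunks:
        merged.extend(line for line in chunk if line.strip())
    return merged


# Toy input: three transcripts split into two batches.
fasta = [">t1", "ACGT", "GG", ">t2", "TTTT", ">t3", "CCCC"]
records = list(read_fasta(fasta))
batches = split_records(records, 2)
```

Each batch would be searched separately, and the tabular outputs joined with `concatenate_tabular` in the same order as the collection.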

You might need to experiment to see how many collection elements (files) are needed to break up the data into jobs that will run on the public clusters. Also, be careful with the BLASTN parameters – it is very easy to “blow up” the results by setting match criteria that are too permissive. You can always filter your results and run BLASTN again on a smaller set of target sequences if you are interested in sub-hits (this gets rid of reads that only capture non-specific hits).
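
As a rough illustration of filtering tabular results, the sketch below keeps only hits that pass an E-value and percent-identity cutoff. It assumes the standard 12-column BLAST tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the function name `filter_hits` and the threshold values are hypothetical and would need tuning for real data.

```python
def filter_hits(lines, max_evalue=1e-10, min_identity=90.0):
    """Keep tabular BLAST hits passing both cutoffs (thresholds are illustrative assumptions)."""
    kept = []
    for line in lines:
        # Skip blank lines and comment lines, if the output has any.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        pident = float(fields[2])    # column 3: percent identity
        evalue = float(fields[10])   # column 11: E-value
        if evalue <= max_evalue and pident >= min_identity:
            kept.append(line.rstrip("\n"))
    return kept


# Two made-up hits: one strong, one weak.
hits = [
    "t1\tsubjA\t98.5\t200\t3\t0\t1\t200\t10\t209\t1e-50\t350",
    "t2\tsubjB\t75.0\t40\t10\t0\t1\t40\t5\t44\t0.5\t30",
]
strong = filter_hits(hits)
```

In Galaxy itself the same kind of filtering can be done on the tabular dataset with a column filter tool; the Python version just makes the criteria explicit.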

Hope this helps!

Sutrisha_Kundu · April 25, 2024, 6:31am

Thank you. I will try to follow these steps. If I face any problem, I will contact you further. Please assist me.

Sutrisha_Kundu · April 25, 2024, 11:52am

I have split the file into a collection of 50 datasets. Then I ran BLASTN on 1 file, but the BLASTN job is still running. The job is yellow in colour. When will it finish? Is it running OK?

jennaj · April 25, 2024, 6:18pm

Hi @Sutrisha_Kundu

It sounds like the jobs are executing. These would process like any other tool, and turn green at the end once done. Since you are running a collection, those jobs will process individually and have different states until done. FAQ: Understanding job statuses

However, after talking with the EU person who helped you before, it came up that running BLAST against a larger public mixed reference is probably not going to produce the results you are most interested in.

To get your assembled transcripts annotated, using annotation tools as described in this tutorial may be a better choice: Hands-on: De novo transcriptome assembly, annotation, and differential expression analysis / Transcriptomics. You may also want to scroll up to the prior section, which covers assembly, if you haven’t done any post-assembly quality filtering yet.
