Genes to transcript file for building expression matrix / DESeq2 / edgeR differential expression

Satya_Mitra · March 8, 2020, 4:50pm

I assembled a de novo transcriptome using Trinity (Bridges) at usegalaxy.org and obtained a FASTA, a log, and a GFF file. I ran TransDecoder on my assembly and obtained CDS and peptide FASTA files…
How do I obtain a gene ID to transcript ID file for my Trinity transcripts? I tried making a BLAST database and running blastp/blastx with my TransDecoder outputs (which is taking forever). BLAT only allows 25 sequences at a time. I have >100k.
Is there a way I can do this on usegalaxy?
Thanks

jennaj · March 12, 2020, 1:54am

Hello @Satya_Mitra

It sounds like you are still using the older version of Trinity. A new version is available. Trinity itself will now output a genes-to-transcript mapping.
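For assemblies that are already in hand, the same mapping can usually be rebuilt from the FASTA headers. The sketch below is only an illustration, not the Galaxy tool's code: it assumes the transcript IDs follow the newer Trinity naming pattern (for example `TRINITY_DN1000_c115_g5_i1`, where dropping the trailing `_i<N>` gives the gene ID) and it writes a tab-separated gene/transcript file. The function names and file paths are hypothetical. Output from older Trinity versions may use a different ID scheme, in which case the pattern would need adjusting.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumed header shape: >TRINITY_DN1000_c115_g5_i1 len=247 ...
# The gene ID is taken to be the transcript ID minus its trailing _i<N>.
_ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def gene_trans_pairs(fasta_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (gene_id, transcript_id) for every FASTA header line."""
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        if not fields:
            continue  # header with no ID
        transcript_id = fields[0]
        # If the ID has no _i<N> suffix, gene_id equals transcript_id.
        gene_id = _ISOFORM_SUFFIX.sub("", transcript_id)
        yield gene_id, transcript_id


def write_gene_trans_map(fasta_path: str, map_path: str) -> int:
    """Write gene<TAB>transcript lines; return how many were written."""
    count = 0
    with open(fasta_path) as fasta, open(map_path, "w") as out:
        for gene_id, transcript_id in gene_trans_pairs(fasta):
            out.write(f"{gene_id}\t{transcript_id}\n")
            count += 1
    return count
```

A call such as `write_gene_trans_map("Trinity.fasta", "Trinity.fasta.gene_trans_map")` (both paths are placeholders) would then produce one line per transcript. Spot-check a few lines against the headers before feeding the file to downstream tools.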

I added more details to a prior post about this same topic. Please review. If you are mixing the older Trinity output with the newer tools included in the updated tool suite, unexpected issues may come up. It would be better to use all of the updated tools to avoid technical problems (errors) or scientific problems (content) – and the latter may be difficult to detect.

Start there, then come back and ask another question if you are still stuck.

Do keep in mind that very large unfiltered assembly datasets may not process well at public servers because they exceed the available resources (or you may run into other limits, like the web-Blat query at UCSC). But there are methods to reduce the data in meaningful ways. Review the Trinity suite's tool forms: each explains the expected inputs and which tools produce them, and those all correspond to the original Trinity tool author’s workflow guidance (linked in the help section of each tool form).
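As one illustration of such a reduction (this particular method is not named in the post, so treat it as an assumption), an assembly can be cut down to the longest isoform per Trinity gene. The sketch below again assumes transcript IDs of the form `TRINITY_DN..._c..._g..._i...`; the helper names are hypothetical. Whether longest-isoform filtering suits a given analysis is a scientific judgement, so check the Trinity workflow guidance first.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

# Assumed ID shape: TRINITY_DN1000_c115_g5_i1 (gene ID = ID without _i<N>).
_ISOFORM_SUFFIX = re.compile(r"_i\d+$")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header_without_>, sequence) records from FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_isoform_per_gene(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map each gene ID to the (header, sequence) of its longest isoform.

    On a length tie, the isoform seen first is kept.
    """
    best: Dict[str, Tuple[str, str]] = {}
    for header, seq in read_fasta(lines):
        fields = header.split()
        if not fields:
            continue  # skip records with an empty header
        gene_id = _ISOFORM_SUFFIX.sub("", fields[0])
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (header, seq)
    return best
```

Writing the values of the returned dictionary back out as FASTA gives a smaller file with one sequence per gene.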

If your work is actually too large to run at public Galaxy servers, you can set up your own Galaxy and allocate sufficient resources. There are a few ways to do this. Start with the resources below if this option interests you.

Overview: Galaxy Choices - Galaxy Community Hub
Platforms: Galaxy Platform Directory: Servers, Clouds, and Deployable Resources - Galaxy Community Hub
CloudMan Galaxy is a popular option for scientists: Amazon Web Services (AWS) - Galaxy Community Hub. AWS has a simple grant program that can help cover costs.

Thanks!

Satya_Mitra · March 13, 2020, 2:18pm

Thanks for your reply. I do see the genes-to-transcripts file appearing on the newer version of Trinity. Wish we could remove the old version…
Regarding capacity, I also seem to have a ghost using 38 GB in my account: that is the difference between the usage shown when hovering over the status bar in the top right corner and the running tally in my history. That also seems to pause new analyses before I hit the real cap.
Any suggestions?
Thanks

jennaj · March 13, 2020, 5:25pm

Hi @Satya_Mitra

Glad this resolved the original issue!

The old version will be addressed soon. Once done, this ticket will close out and things will be less confusing: Update Trinity once in MTS to capture a new tool wrapper in the suite · Issue #143 · galaxyproject/usegalaxy-playbook · GitHub

Please see this FAQ for how to locate and manage all of your account’s data. Sometimes just logging out then back in again is enough. Sometimes more is required. Both are covered in detail:

The account usage quota seems incorrect

Thanks!

Satya_Mitra · March 14, 2020, 1:25pm

Thanks for replying.
While the newer Trinity does generate a genes-to-transcripts file, my job has been waiting to run for ~2 days.
Can you help?
Thanks,
Satya

jennaj · March 16, 2020, 5:29pm

Hi @Satya_Mitra Has your job completed by now?

Satya_Mitra · March 18, 2020, 1:52am

No, it was held up for 2 days doing nothing.
At that point I moved on with HISAT2. Thanks for your help

jennaj · March 18, 2020, 6:08pm

@Satya_Mitra

Odd. Yesterday when you asked, we checked the cluster these jobs execute on to make sure there wasn’t a problem. Trinity and other tools that run on that specific cluster resource are processing normally from the usegalaxy.org server. Meaning, there are currently no known server-side or cluster issues.

All resources are busy, but that is normal. The optimal job execution strategy remains the same: get jobs queued up then allow them to fully process. Avoid deleting/restarting, as that only places jobs back at the end of the queue again (further extending wait time).

A job queued 4 days ago should have at least started by now. The exception could be if you had a very large number of large jobs all queued at the same time (~6 or more) – for situations like that, a few jobs will run, then a few more, repeating until all are completed. This “batching” allows for fair access to computational resources for all people working on the same server.

FAQ that may help next time: Datasets and how jobs execute

If you think jobs are not queuing and executing properly in your account specifically, in a way that cannot be explained by the advice above or in the FAQ, please let us know. The same goes if you are simply not sure (some of it is a bit technical). We can certainly take a closer look and try to help.
