setting up GFF3 output and fasta chunks
doing tblastx of alt-ESTs
couldn't close /tmp/maker_OjPCEj/0/CM044431%2E1.0
No space left on device at /usr/local/bin/../lib/FastaFile.pm line 60.
--> rank=NA, hostname=galaxy-main-set03-14.novalocal
ERROR: Failed while doing tblastx of alt-ESTs
ERROR: Chunk failed at level:4, tier_type:3
FAILED CONTIG:CM044431.1
ERROR: Chunk failed at level:4, tier_type:0
FAILED CONTIG:CM044431.1
examining contents of the fasta file and run log
In short, the cluster node was not able to handle the data: the log shows it ran out of disk space ("No space left on device") while writing temporary files under /tmp. The topic linked above has some resources and a discussion of strategies. Maybe compare your run to the example workflow and explain what you are doing differently? You can also share the job and I’ll double-check that these are the same.
Sorry for getting back to you so late. I tried condensing my transcript files and running the job on the EU server, and after a month the job simply returned “job failed”. I have shared the job history with you. I masked repeats locally and uploaded the masked genome to Galaxy, so the job was run with RepeatMasker disabled. As evidence, I supplied a file of protein sequences from a closely related species and a file of transcripts, also from a closely related species. The transcripts were downloaded in bulk from NCBI TSA. My first MAKER run, which used the bulk transcripts as they were, is the one that failed, so I clustered them with mmseqs2 and filtered them with seqkit. I also used TransDecoder to keep only coding regions and cutadapt to remove adapters. Could you please advise whether something in my pipeline is causing the runs to fail, or whether I should try running multiple annotations, each with only part of the transcript evidence? Thank you so much for your help.
It sounds like you have done quite a bit of troubleshooting!
Are you working with all chromosomes or just one for these tests? (Sorry, I forgot, but was it four in total before?) You could limit this to just one until you start to get a successful result.
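If it helps, here is a rough sketch of how a single chromosome could be pulled out of a multi-FASTA file locally before uploading. It uses only the Python standard library. The function name `extract_record` and the file names in the usage note are made up for illustration, and it assumes the record ID is the first word of the header line (for example `CM044431.1` from your log).

```python
def extract_record(fasta_in, fasta_out, wanted_id):
    """Copy the record whose ID equals wanted_id from fasta_in to fasta_out.

    fasta_in and fasta_out are open text handles. The ID is taken to be the
    first whitespace-delimited word after '>' (an assumption about the headers).
    Returns True if a matching record was found.
    """
    keep = False
    found = False
    for line in fasta_in:
        if line.startswith(">"):
            fields = line[1:].split()
            # Start copying at a matching header, stop at any other header.
            keep = bool(fields) and fields[0] == wanted_id
            found = found or keep
        if keep:
            fasta_out.write(line)
    return found
```

For example (hypothetical file names): `with open("genome.masked.fa") as src, open("CM044431.1.fa", "w") as dst: extract_record(src, dst, "CM044431.1")`. The file is read line by line, so memory use stays small even for a large genome.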
Then, I guess the next option would be to explore how closely related that other species actually is and the quality of those annotations. Do these align to your assembly with a tool like BLAST? What if you subsample the annotation (protein + transcript for some smaller set/genomic region)? Will the tool produce any clues in the job logs?
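As one possible way to do the subsampling locally, here is a minimal sketch that keeps a random fraction of the records in a FASTA file. It is only an illustration: the helper names `read_fasta` and `subsample_fasta` are invented here, the fraction and seed are arbitrary, and random sampling is just one option (picking the evidence that maps to one genomic region would need an alignment step first).

```python
import random


def read_fasta(handle):
    """Yield (header, sequence) tuples from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def subsample_fasta(handle, fraction, seed=0):
    """Return a list of records, each kept with probability `fraction`.

    A fixed seed makes the selection reproducible between runs.
    """
    rng = random.Random(seed)
    return [rec for rec in read_fasta(handle) if rng.random() < fraction]
```

Each kept record can then be written back out as the header line followed by the sequence. Starting with a small fraction of the transcripts and proteins, and increasing it once a run succeeds, should show whether the evidence size is what makes the job fail.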