I’m fairly new to this, so I’m sorry if this is a newbie question.
I tried to download and use the Version 19 16S rRNA Reference (PDS) instead of the older taxonomy files provided in the Galaxy 16S Microbial Analysis with mothur (extended) tutorial (Hands-on: 16S Microbial Analysis with mothur (extended)), and the Classify.seqs step failed. At first it wouldn’t run because the file type was wrong, but I fixed that (I think). Is there something else I need to do to get Classify.seqs to work with the new reference files in the tutorial?
I got the following error for the failed taxonomy file:
/corral4/main/jobs/058/867/58867357/tool_script.sh: line 23: 11594 Done echo 'classify.seqs( fasta=fasta.dat, reference=alignment.template.dat, taxonomy=tax.taxonomy.dat, method=wang, ksize=8, iters=100, cutoff=80, probs=true, count=count.dat, relabund=false, output=simple, printlevel=-1, processors='${GALAXY_SLOTS:-8}' )'
11595 | sed 's/ //g'
11596 Segmentation fault (core dumped) | ./mothur
11597 | tee mothur.out.log
A “segmentation fault” message like this one can have a few causes:
Input issue: format, content, or size. The problem could be in the query or in the target (reference).
Parameter issue: sometimes the default choices don’t “fit the data” well. The tool Help section has resource links, and the general scientific use and logic are the same in Galaxy, since it is the same underlying tool.
Resource issue: the job was so large that it overwhelmed the cluster node it executed on, and it died. The cluster resources at all of the public Galaxy servers are significant, but it is still possible to create a job that is “too large”.
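For the input-issue case with a custom reference, one quick check is whether the reference FASTA and taxonomy files agree with each other before they reach mothur. Below is a minimal Python sketch, assuming mothur’s usual reference layout: the sequence ID is the first token after “>” in each FASTA header, and the taxonomy file has two tab-separated columns whose lineage is semicolon-separated and ends in “;”. The function names are hypothetical, and a clean report does not prove that mothur will accept the files.

```python
"""Sanity-check a mothur-style reference FASTA + taxonomy pair (hypothetical helper)."""
from collections import Counter


def fasta_ids(lines):
    """Return the ID (first whitespace-delimited token after '>') of each FASTA header."""
    return [line[1:].split()[0] for line in lines
            if line.startswith(">") and line[1:].strip()]


def check_reference(fasta_lines, tax_lines):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    ids = fasta_ids(fasta_lines)
    for seq_id, n in Counter(ids).items():
        if n > 1:
            problems.append(f"duplicate FASTA id: {seq_id}")

    tax = {}
    for lineno, raw in enumerate(tax_lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            # e.g. spaces instead of a tab between ID and lineage
            problems.append(f"taxonomy line {lineno}: expected 2 tab-separated fields, got {len(fields)}")
            continue
        seq_id, lineage = fields
        if not lineage.endswith(";"):
            problems.append(f"taxonomy line {lineno}: lineage does not end with ';'")
        if any(not rank.strip() for rank in lineage.rstrip(";").split(";")):
            problems.append(f"taxonomy line {lineno}: empty rank in lineage")
        if seq_id in tax:
            problems.append(f"duplicate taxonomy id: {seq_id}")
        tax[seq_id] = lineage

    # Every reference sequence should have exactly one lineage, and vice versa.
    fasta_set, tax_set = set(ids), set(tax)
    for seq_id in sorted(fasta_set - tax_set):
        problems.append(f"in FASTA but not in taxonomy: {seq_id}")
    for seq_id in sorted(tax_set - fasta_set):
        problems.append(f"in taxonomy but not in FASTA: {seq_id}")
    return problems


def check_files(fasta_path, tax_path):
    """Run check_reference on two files on disk."""
    with open(fasta_path) as fa, open(tax_path) as tx:
        return check_reference(fa, tx)
```

Running something like check_files("reference.fasta", "reference.tax") (placeholder file names) on the downloaded reference pair before uploading it would surface mismatched IDs or space-delimited taxonomy lines, which are common reasons a reference works in one place but not another.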
What to try
Check the full logs on the job details page (the “i” info icon). Sometimes more is reported there. You can also see a peek of all your input files and parameters there in a table.
Run a very small query through the tool (the tutorial data, or a subset of your data) against the same custom reference files. If it works, that suggests the full dataset is too large to process and the custom reference files are fine.
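To build such a subset from your own data, a sketch like the following keeps the first N query sequences and, because this job also passes a count table (count=count.dat), the matching count-table rows. It assumes a count table whose first non-comment line is the header and whose first tab-separated column is the sequence name; lines starting with “#” are passed through unchanged. The helper names are hypothetical, and the first N reads are not a representative sample, only a quick smoke test.

```python
"""Make a small test query: first N FASTA records plus matching count-table rows (hypothetical helper)."""


def first_n_records(fasta_lines, n):
    """Yield the lines of the first n FASTA records (multi-line sequences kept whole)."""
    seen = 0
    for line in fasta_lines:
        if line.startswith(">"):
            seen += 1
            if seen > n:
                break
        if seen:  # skip anything before the first header
            yield line


def record_ids(fasta_lines):
    """Return the set of sequence IDs (first token after '>') in FASTA lines."""
    return {line[1:].split()[0] for line in fasta_lines
            if line.startswith(">") and line[1:].strip()}


def filter_count_table(count_lines, keep_ids):
    """Keep '#' lines, the header line, and rows whose first column is a kept ID."""
    header_done = False
    for line in count_lines:
        if line.startswith("#"):
            yield line
        elif not header_done:
            header_done = True  # first non-comment line is assumed to be the header
            yield line
        elif line.split("\t", 1)[0].strip() in keep_ids:
            yield line
```

Write the two filtered outputs to new files, upload them to Galaxy, and rerun Classify.seqs with the same reference and parameters, so that only the query size changes between the two runs.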