Proteogenomics FASTA file generation with CustomProDB

Hi,

I’m new to Galaxy and trying to create a FASTA file from transcriptomics data. However, I’m encountering an error when using CustomProDB:

“This job has terminated because it used more memory than it was allocated. Please click the bug icon to report this problem if you need help.”

You can view my history here: Galaxy

I would greatly appreciate any help.

Welcome, @Odysseus_1

This error message usually means there is a data or parameter problem to resolve. It can happen with any tool. In short, the requested analysis completely overwhelmed the underlying tool, and it quit without further details.

Thanks for sharing your history. I’m reviewing it right now; let us know if you’ve solved this already! More soon. :slight_smile:

Hello @Odysseus_1

The root problem is that the simplified processing in the tutorial you are following doesn’t include all of the steps needed for working with read data. It is fine for showing the basic outline of steps with the tutorial example data, but with full-sized data you’ll need to do a bit more, or there will be a lot of noise in the data that this tool doesn’t know how to handle.

For read QC, see this tutorial, or use a workflow that applies the same manipulations to batch data (one or more pairs in a collection).

  • Hands-on: Quality Control / Quality Control / Sequence analysis

  • Go to Workflows → Public workflows and search with the keyword quality. You can run this directly without even importing it, and input the paired-end collection output from the NCBI Faster tool that you already have. One pair or 100 pairs, this will all work fine.

  • The output will be some reports about your data, plus the trimmed reads ready to use with BWA-MEM. If the reports look great, you are ready for the next step; in your case, that is mapping the reads.

  • If the reports do not look good, then import the workflow, add more filters to Cutadapt, save, and rerun until you like the results!
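If you are curious what the quality-trimming step in that workflow actually does, here is a minimal Python sketch of phred decoding plus BWA-style 3′ quality trimming, which is the algorithm Cutadapt’s quality-cutoff option uses. The read and quality string are made-up examples, not tutorial data:

```python
def phred_scores(qual_string):
    """Decode a FASTQ quality string (Sanger / Illumina 1.8+, ASCII offset 33)."""
    return [ord(c) - 33 for c in qual_string]

def trim_index(quals, cutoff=20):
    """BWA-style 3' trimming: find the cut point that maximizes the
    sum of (cutoff - quality) over the trimmed tail."""
    best_score, best_index = 0, len(quals)
    score = 0
    for i in range(len(quals) - 1, -1, -1):
        score += cutoff - quals[i]
        if score > best_score:
            best_score, best_index = score, i
    return best_index

seq  = "ACGTGA"
qual = "?????+"   # '?' is Q30; the final '+' is Q10, below the cutoff
idx  = trim_index(phred_scores(qual), cutoff=20)
print(seq[:idx])  # → ACGTG (low-quality last base removed)
```

A read whose tail stays above the cutoff is left untouched, which is why clean tutorial data sails through this step with no visible change.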

For mapping and variant calling, you can follow the steps in this tutorial. Make adjustments for the target database you are using and for your single input.


Tool Parameters

  • BAM dataset(s) to filter: MyBam.bam
  • Select BAM property to filter on: isProperPair
  • Select properly paired reads: true
  • Select BAM property to filter on: isPrimaryAlignment
  • Select BAM records for primary alignments: true
  • Select BAM property to filter on: mapQuality
  • Filter on read mapping quality (phred scale): >=20
  • Would you like to set rules?: true
  • Enter rules here: Not available.
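Those three filters map directly onto fields of a SAM/BAM record: the FLAG bits for proper pairing and secondary alignments, and the MAPQ column. The Galaxy tool wraps BAMTools, but the logic can be sketched in plain Python on SAM-format text lines (field positions follow the SAM specification; the example record is hypothetical):

```python
PROPER_PAIR   = 0x2    # both mates mapped in the expected orientation/distance
SECONDARY     = 0x100  # secondary alignment (not the primary record)
SUPPLEMENTARY = 0x800  # chimeric/supplementary alignment

def keep(sam_line, min_mapq=20):
    """Apply the same three filters as the parameters above to one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq = int(fields[1]), int(fields[4])
    return (
        bool(flag & PROPER_PAIR)                     # isProperPair
        and not flag & (SECONDARY | SUPPLEMENTARY)   # primary alignments only
        and mapq >= min_mapq                         # mapQuality >= 20
    )

# FLAG 99 = paired + properly paired + mate reverse + first in pair; MAPQ 60
record = "read1\t99\tchr1\t100\t60\t50M\t=\t250\t200\tACGT\tIIII"
print(keep(record))  # → True
```

MAPQ is phred-scaled, so the >=20 cutoff keeps reads with at most an estimated 1% chance of being placed at the wrong location, which is exactly the kind of noise reduction the downstream variant caller benefits from.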


Finally, now that you have mapping results and variant calls created from reads with a bit of polishing, and a BAM filtered in a scientifically appropriate way, the variant calls derived from that BAM will be much more meaningful, and the CustomProDB tool should be able to process them.

Please give this a try and let us know if it actually helps or if you need more help. :slight_smile:

I’m running a test with your original HISAT2 BAM file through some of those modified steps. That will take some time to run, and you can see what I did if it happens to turn out OK! HISAT2 is a fine choice, or you can try BWA-MEM instead.

Then I have another test using the tutorial for this tool as the baseline. The sample data processes fine through those steps, because the initial reads were cleaned up a bit in advance to make the flow easier.