Proteogenomics FASTA file generation with CustomProDB

Hi,

I’m new to Galaxy and trying to create a FASTA file from transcriptomics data. However, I’m encountering an error when using CustomProDB:

“This job has terminated because it used more memory than it was allocated. Please click the bug icon to report this problem if you need help.”

You can view my history here: Galaxy

I would greatly appreciate any help.

Welcome, @Odysseus_1

This error message usually means there is a data or parameter problem to resolve. It can happen with any tool. In short, the requested analysis completely overwhelmed the underlying tool, and it quit without further details.

Thanks for sharing your history. I’m reviewing it right now; let us know if you’ve solved this already! More soon. :slight_smile:

Hello @Odysseus_1

The root problem is that the simplified processing in the tutorial you are following doesn’t include all of the steps needed for working with read data. It is fine for showing the basic outline of steps with the tutorial example data, but with full-sized data you’ll need to do a bit more, or there will be a lot of noise in the data that this tool doesn’t know how to handle.

For read QC, see this tutorial, or use a workflow that applies the same manipulations to batch data (one or more pairs in a collection).

  • Hands-on: Quality Control / Quality Control / Sequence analysis

  • Go to Workflows → Public workflows and search with the keyword quality. You can run this directly without even importing it, and input the paired-end collection output from the NCBI Faster tool that you already have. One pair or 100 pairs, this will all work fine.

  • The output will be some reports about your data, plus the trimmed reads ready to use with BWA-MEM. If the reports look great, you are ready for the next step; in your case, that is mapping the reads.

  • If the reports do not look good, then import the workflow, add more filters to Cutadapt, save, and rerun until you like the results!
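If you are curious what the quality-trimming step in that workflow actually does, here is a minimal Python sketch of phred decoding plus BWA-style 3′ quality trimming, which is the algorithm Cutadapt’s quality-cutoff option uses. The read and quality string are made-up examples, not tutorial data:

```python
def phred_scores(qual_string):
    """Decode a FASTQ quality string (Sanger / Illumina 1.8+, ASCII offset 33)."""
    return [ord(c) - 33 for c in qual_string]

def trim_index(quals, cutoff=20):
    """BWA-style 3' trimming: find the cut point that maximizes the
    sum of (cutoff - quality) over the trimmed tail."""
    best_score, best_index = 0, len(quals)
    score = 0
    for i in range(len(quals) - 1, -1, -1):
        score += cutoff - quals[i]
        if score > best_score:
            best_score, best_index = score, i
    return best_index

seq  = "ACGTGA"
qual = "?????+"   # '?' is Q30; the final '+' is Q10, below the cutoff
idx  = trim_index(phred_scores(qual), cutoff=20)
print(seq[:idx])  # → ACGTG (low-quality last base removed)
```

A read whose tail stays above the cutoff is left untouched, which is why clean tutorial data sails through this step with no visible change.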

For mapping and variant calling, you can follow the steps in this tutorial. Make adjustments for the target database you are using and for your single input.


Tool Parameters

  • BAM dataset(s) to filter: MyBam.bam
  • Select BAM property to filter on: isProperPair
  • Select properly paired reads: true
  • Select BAM property to filter on: isPrimaryAlignment
  • Select BAM records for primary alignments: true
  • Select BAM property to filter on: mapQuality
  • Filter on read mapping quality (phred scale): >=20
  • Would you like to set rules?: true
  • Enter rules here: Not available.
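Those three filters map directly onto fields of a SAM/BAM record: the FLAG bits for proper pairing and secondary alignments, and the MAPQ column. The Galaxy tool wraps BAMTools, but the logic can be sketched in plain Python on SAM-format text lines (field positions follow the SAM specification; the example record is hypothetical):

```python
PROPER_PAIR   = 0x2    # both mates mapped in the expected orientation/distance
SECONDARY     = 0x100  # secondary alignment (not the primary record)
SUPPLEMENTARY = 0x800  # chimeric/supplementary alignment

def keep(sam_line, min_mapq=20):
    """Apply the same three filters as the parameters above to one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    flag, mapq = int(fields[1]), int(fields[4])
    return (
        bool(flag & PROPER_PAIR)                     # isProperPair
        and not flag & (SECONDARY | SUPPLEMENTARY)   # primary alignments only
        and mapq >= min_mapq                         # mapQuality >= 20
    )

# FLAG 99 = paired + properly paired + mate reverse + first in pair; MAPQ 60
record = "read1\t99\tchr1\t100\t60\t50M\t=\t250\t200\tACGT\tIIII"
print(keep(record))  # → True
```

MAPQ is phred-scaled, so the >=20 cutoff keeps reads with at most an estimated 1% chance of being placed at the wrong location, which is exactly the kind of noise reduction the downstream variant caller benefits from.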


Finally, now that you have mapping results and variant calls created from reads with a bit of polishing, and a BAM filtered in a scientifically appropriate way, the variant calls derived from that BAM will be much more meaningful, and the CustomProDB tool should be able to process them.

Please give this a try and let us know if it actually helps or if you need more help. :slight_smile:

I’m running a test with your original HISAT2 BAM file through some of those modified steps. That will take some time to run, and you can see what I did if it happens to turn out OK! HISAT2 is a fine choice, or you can try BWA-MEM instead.

Then I have another test using the tutorial for this tool as the baseline. The sample data processes fine through those steps, because the initial reads were cleaned up a bit in advance to make the flow easier.