Optimizing STAR Mapping for Large-Scale Genome Annotation: Managing Memory Overflows in RNA-Seq Workflows

Naibin_Duan · August 27, 2024, 1:51am

I am currently working on annotating a new genome.
The genome of this crop is extremely large, between 15 and 18 Gb.
My approach is to align RNA-seq data from multiple samples to this genome,
then pass the BAM files to StringTie to assemble a GTF file for each sample, and finally use StringTie Merge to integrate them into a single comprehensive genome annotation file.
I have designed a workflow for this.

{
    "a_galaxy_workflow": "true",
    "annotation": "As a basis for this workflow, I used the Galaxy tutorial made by Anthony Bretaudeau.\n\nReferences: - https://training.galaxyproject.org/training-material/topics/genome-annotation/tutorials/annotation-with-maker/tutorial.html",
    "comments": [
        {
            "child_steps": [
                4,
                5,
                8
            ],
            "color": "black",
            "data": {
                "title": "Structural Annotation"
            },
            "id": 1,
            "position": [
                600,
                560
            ],
            "size": [
                240,
                740
            ],
            "type": "frame"
        },
        {
            "child_steps": [
                2,
                3
            ],
            "color": "black",
            "data": {
                "title": "Genome Assembly Quality Analysis"
            },
            "id": 0,
            "position": [
                600,
                0
            ],
            "size": [
                250,
                440
            ],
            "type": "frame"
        }
    ],
    "creator": [
        {
            "class": "Person",
            "email": "mailto:teixeiratuchinski@gmail.com",
            "name": "Giovanna Teixeira Tuchinski"
        }
    ],
    "format-version": "0.1",
    "license": "MIT",
    "name": "Plant Genome Structural Annotation and NLR Annotation",
    "report": {
        "markdown": "\n# Workflow Execution Report\n\n## Workflow Inputs\n```galaxy\ninvocation_inputs()\n```\n\n## Workflow Outputs\n```galaxy\ninvocation_outputs()\n```\n\n## Workflow\n```galaxy\nworkflow_display()\n```\n"
    },
    "steps": {
        "0": {
            "annotation": "Input a genome file in fasta format. \n\nUnless changes are made, this workflow is best suited to genomes from plants genetically similar to Solanum lycopersicum.",
            "content_id": null,
            "errors": null,
            "id": 0,
            "input_connections": {},
            "inputs": [
                {
                    "description": "Input a genome file in fasta format. \n\nUnless changes are made, this workflow is best suited to genomes from plants genetically similar to Solanum lycopersicum.",
                    "name": "Input genome"
                }
            ],
            "label": "Input genome",
            "name": "Input dataset",
            "outputs": [],
            "position": {
                "left": 0,
                "top": 960
            },
            "tool_id": null,
            "tool_state": "{\"optional\": false, \"format\": [\"fasta\"], \"tag\": null}",
            "tool_version": null,
            "type": "data_input",
            "uuid": "4a353865-cafd-4c54-be39-2e4d762786c2",
            "when": null,
            "workflow_outputs": []
        },
        "1": {
            "annotation": "Use the NLR-Annotator tool to predict NLR-associated loci in a plant genome.\n\nSteuernagel, Burkhard, et al. \u201cThe NLR-Annotator Tool Enables Annotation of the Intracellular Immune Receptor Repertoire.\u201d Plant Physiology, vol. 183, no. 2, Oxford University Press, Mar. 2020, pp. 468\u201382, https://doi.org/10.1104/pp.19.01273.",
            "content_id": null,
            "errors": null,
            "id": 1,
            "input_connections": {},
            "inputs": [
                {
                    "description": "Use the NLR-Annotator tool to predict NLR-associated loci in a plant genome.\n\nSteuernagel, Burkhard, et al. \u201cThe NLR-Annotator Tool Enables Annotation of the Intracellular Immune Receptor Repertoire.\u201d Plant Physiology, vol. 183, no. 2, Oxford University Press, Mar. 2020, pp. 468\u201382, https://doi.org/10.1104/pp.19.01273.",
                    "name": " NLR-Annotator"
                }
            ],
            "label": " NLR-Annotator",
            "name": "Input dataset",
            "outputs": [],
            "position": {
                "left": 1000,
                "top": 600
            },
            "tool_id": null,
            "tool_state": "{\"optional\": false, \"format\": [\"gff\"], \"tag\": null}",
            "tool_version": null,
            "type": "data_input",
            "uuid": "3ad456db-a124-40b3-b216-a102e2b48b45",
            "when": null,
            "workflow_outputs": []
        },
        "2": {
            "annotation": "",
            "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/fasta_stats/fasta-stats/2.0",
            "errors": null,
            "id": 2,
            "input_connections": {
                "fasta": {
                    "id": 0,
                    "output_name": "output"
                }
            },
            "inputs": [],
            "label": null,
            "name": "Fasta Statistics",
            "outputs": [
                {
                    "name": "stats_output",
                    "type": "tabular"
                }
            ],
            "position": {
                "left": 625.3913880237574,
                "top": 65.90583574348494
            },
            "post_job_actions": {},
            "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/fasta_stats/fasta-stats/2.0",
            "tool_shed_repository": {
                "changeset_revision": "0dbb995c7d35",
                "name": "fasta_stats",
                "owner": "iuc",
                "tool_shed": "toolshed.g2.bx.psu.edu"
            },
            "tool_state": "{\"fasta\": {\"__class__\": \"ConnectedValue\"}, \"gaps_option\": false, \"genome_size\": null, \"__page__\": null, \"__rerun_remap_job_id__\": null}",
            "tool_version": "2.0",
            "type": "tool",
            "uuid": "5acf6d10-ec62-46df-8b00-e56d5af7622d",
            "when": null,
            "workflow_outputs": []
        },
        "3": {
            "annotation": "",
            "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.5.0+galaxy0",
            "errors": null,
            "id": 3,
            "input_connections": {
                "input": {
                    "id": 0,
                    "output_name": "output"
                }
            },
            "inputs": [],
            "label": null,
            "name": "Busco",
            "outputs": [
                {
                    "name": "busco_sum",
                    "type": "txt"
                },
                {
                    "name": "busco_table",
                    "type": "tabular"
                },
                {
                    "name": "busco_missing",
                    "type": "tabular"
                },
                {
                    "name": "summary_image",
                    "type": "png"
                },
                {
                    "name": "busco_miniprot",
                    "type": "gff3"
                }
            ],
            "position": {
                "left": 630,
                "top": 180
            },
            "post_job_actions": {},
            "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.5.0+galaxy0",
            "tool_shed_repository": {
                "changeset_revision": "ea8146ee148f",
                "name": "busco",
                "owner": "iuc",
                "tool_shed": "toolshed.g2.bx.psu.edu"
            },
            "tool_state": "{\"adv\": {\"evalue\": \"0.001\", \"limit\": \"3\", \"contig_break\": \"10\"}, \"busco_mode\": {\"mode\": \"geno\", \"__current_case__\": 0, \"miniprot\": true, \"use_augustus\": {\"use_augustus_selector\": \"yes\", \"__current_case__\": 1, \"aug_prediction\": {\"augustus_mode\": \"builtin\", \"__current_case__\": 2, \"augustus_species\": \"tomato\"}, \"long\": true}}, \"input\": {\"__class__\": \"ConnectedValue\"}, \"lineage\": {\"lineage_mode\": \"select_lineage\", \"__current_case__\": 1, \"lineage_dataset\": \"eudicots_odb10\"}, \"lineage_conditional\": {\"selector\": \"download\", \"__current_case__\": 1}, \"outputs\": [\"short_summary\", \"missing\", \"image\"], \"__page__\": null, \"__rerun_remap_job_id__\": null}",
            "tool_version": "5.5.0+galaxy0",
            "type": "tool",
            "uuid": "8c3bccc3-711c-47b5-a080-1322487c9c6a",
            "when": null,
            "workflow_outputs": []
        },
        "4": {
            "annotation": "",
            "content_id": "toolshed.g2.bx.psu.edu/repos/bgruening/augustus/augustus/3.4.0+galaxy1",
            "errors": null,
            "id": 4,
            "input_connections": {
                "input_genome": {
                    "id": 0,
                    "output_name": "output"
                }
            },
            "inputs": [],
            "label": null,
            "name": "Augustus",
            "outputs": [
                {
                    "name": "output",
                    "type": "gtf"
                },
                {
                    "name": "protein_output",
                    "type": "fasta"
                },
                {
                    "name": "codingseq_output",
                    "type": "fasta"
                }
            ],
            "position": {
                "left": 620,
                "top": 610
            },
            "post_job_actions": {},
            "tool_id": "toolshed.g2.bx.psu.edu/repos/bgruening/augustus/augustus/3.4.0+galaxy1",
            "tool_shed_repository": {
                "changeset_revision": "28433faa6e42",
                "name": "augustus",
                "owner": "bgruening",
                "tool_shed": "toolshed.g2.bx.psu.edu"
            },
            "tool_state": "{\"genemodel\": \"complete\", \"gff\": true, \"hints\": {\"usehints\": \"F\", \"__current_case__\": 1}, \"input_genome\": {\"__class__\": \"ConnectedValue\"}, \"model\": {\"augustus_mode\": \"builtin\", \"__current_case__\": 1, \"organism\": \"tomato\"}, \"noInFrameStop\": false, \"outputs\": [\"protein\", \"codingseq\", \"introns\", \"start\", \"stop\", \"cds\"], \"range\": {\"userange\": \"F\", \"__current_case__\": 1}, \"singlestrand\": false, \"softmasking\": false, \"strand\": \"both\", \"utr\": false, \"__page__\": null, \"__rerun_remap_job_id__\": null}",
            "tool_version": "3.4.0+galaxy1",
            "type": "tool",
            "uuid": "2e533046-439d-4076-99aa-b31fda343d71",
            "when": null,
            "workflow_outputs": [
                {
                    "label": "annotation",
                    "output_name": "output",
                    "uuid": "67f1c1df-af43-4115-af38-bc5f3b68c1ee"
                }
            ]
        },
        "5": {
            "annotation": "",
            "content_id": "toolshed.g2.bx.psu.edu/repos/devteam/gffread/gffread/2.2.1.4+galaxy0",
            "errors": null,
            "id": 5,
            "input_connections": {
                "input": {
                    "id": 4,
                    "output_name": "output"
                },
                "reference_genome|genome_fasta": {
                    "id": 0,
                    "output_name": "output"
                }
            },
            "inputs": [
                {
                    "description": "runtime parameter for tool gffread",
                    "name": "chr_replace"
                },
                {
                    "description": "runtime parameter for tool gffread",
                    "name": "reference_genome"
                }
            ],
            "label": null,
            "name": "gffread",
            "outputs": [
                {
                    "name": "output_exons",
                    "type": "fasta"
                }
            ],
            "position": {
                "left": 620,
                "top": 810
            },
            "post_job_actions": {},
            "tool_id": "toolshed.g2.bx.psu.edu/repos/devteam/gffread/gffread/2.2.1.4+galaxy0",
            "tool_shed_repository": {
                "changeset_revision": "3e436657dcd0",
                "name": "gffread",
                "owner": "devteam",
                "tool_shed": "toolshed.g2.bx.psu.edu"
            },
            "tool_state": "{\"chr_replace\": {\"__class__\": \"RuntimeValue\"}, \"decode_url\": true, \"expose\": true, \"filtering\": null, \"full_gff_attribute_preservation\": true, \"gffs\": {\"gff_fmt\": \"none\", \"__current_case__\": 0}, \"input\": {\"__class__\": \"ConnectedValue\"}, \"maxintron\": null, \"merging\": {\"merge_sel\": \"none\", \"__current_case__\": 0}, \"reference_genome\": {\"source\": \"history\", \"__current_case__\": 2, \"genome_fasta\": {\"__class__\": \"ConnectedValue\"}, \"ref_filtering\": null, \"fa_outputs\": [\"-w exons.fa\"]}, \"region\": {\"region_filter\": \"none\", \"__current_case__\": 0}, \"__page__\": null, \"__rerun_remap_job_id__\": null}",
            "tool_version": "2.2.1.4+galaxy0",
            "type": "tool",
            "uuid": "0cf62ad6-a114-420f-bbe0-a5dbf5b86e46",
            "when": null,
            "workflow_outputs": []
        },
        "6": {
            "annotation": "",
            "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_intersectbed/2.31.1+galaxy0",
            "errors": null,
            "id": 6,
            "input_connections": {
                "inputA": {
                    "id": 4,
                    "output_name": "output"
                },
                "reduce_or_iterate|inputB": {
                    "id": 1,
                    "output_name": "output"
                }
            },
            "inputs": [
                {
                    "description": "runtime parameter for tool bedtools Intersect intervals",
                    "name": "inputA"
                },
                {
                    "description": "runtime parameter for tool bedtools Intersect intervals",
                    "name": "reduce_or_iterate"
                }
            ],
            "label": null,
            "name": "bedtools Intersect intervals",
            "outputs": [
                {
                    "name": "output",
                    "type": "input"
                }
            ],
            "position": {
                "left": 1000,
                "top": 690
            },
            "post_job_actions": {},
            "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_intersectbed/2.31.1+galaxy0",
            "tool_shed_repository": {
                "changeset_revision": "64e2edfe7a2c",
                "name": "bedtools",
                "owner": "iuc",
                "tool_shed": "toolshed.g2.bx.psu.edu"
            },
            "tool_state": "{\"bed\": false, \"count\": false, \"fraction_cond\": {\"fraction_select\": \"default\", \"__current_case__\": 0}, \"genome_file_opts\": {\"genome_file_opts_selector\": \"loc\", \"__current_case__\": 0, \"genome\": null}, \"header\": false, \"inputA\": {\"__class__\": \"RuntimeValue\"}, \"invert\": false, \"once\": false, \"overlap_mode\": [\"-wa\"], \"reduce_or_iterate\": {\"reduce_or_iterate_selector\": \"iterate\", \"__current_case__\": 0, \"inputB\": {\"__class__\": \"RuntimeValue\"}}, \"sorted\": false, \"split\": false, \"strand\": \"-s\", \"__page__\": null, \"__rerun_remap_job_id__\": null}",
            "tool_version": "2.31.1+galaxy0",
            "type": "tool",
            "uuid": "68ec912f-3eaf-446c-af71-a1a950689c5c",
            "when": null,
            "workflow_outputs": []
        },
        "7": {
            "annotation": "",
            "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/jbrowse/jbrowse/1.16.11+galaxy1",
            "errors": null,
            "id": 7,
            "input_connections": {
                "reference_genome|genome": {
                    "id": 0,
                    "output_name": "output"
                },
                "track_groups_0|data_tracks_0|data_format|annotation": {
                    "id": 4,
                    "output_name": "output"
                }
            },
            "inputs": [
                {
                    "description": "runtime parameter for tool JBrowse",
                    "name": "reference_genome"
                }
            ],
            "label": null,
            "name": "JBrowse",
            "outputs": [
                {
                    "name": "output",
                    "type": "html"
                }
            ],
            "position": {
                "left": 1000,
                "top": 960
            },
            "post_job_actions": {},
            "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/jbrowse/jbrowse/1.16.11+galaxy1",
            "tool_shed_repository": {
                "changeset_revision": "a6e57ff585c0",
                "name": "jbrowse",
                "owner": "iuc",
                "tool_shed": "toolshed.g2.bx.psu.edu"
            },
            "tool_state": "{\"action\": {\"action_select\": \"create\", \"__current_case__\": 0}, \"gencode\": \"1\", \"jbgen\": {\"defaultLocation\": \"\", \"trackPadding\": \"20\", \"shareLink\": true, \"aboutDescription\": \"\", \"show_tracklist\": true, \"show_nav\": true, \"show_overview\": true, \"show_menu\": true, \"hideGenomeOptions\": false}, \"plugins\": {\"BlastView\": true, \"ComboTrackSelector\": false, \"GCContent\": false}, \"reference_genome\": {\"genome_type_select\": \"history\", \"__current_case__\": 1, \"genome\": {\"__class__\": \"ConnectedValue\"}}, \"standalone\": \"minimal\", \"track_groups\": [{\"__index__\": 0, \"category\": \"Default\", \"data_tracks\": [{\"__index__\": 0, \"data_format\": {\"data_format_select\": \"gene_calls\", \"__current_case__\": 2, \"annotation\": {\"__class__\": \"ConnectedValue\"}, \"match_part\": {\"match_part_select\": false, \"__current_case__\": 1}, \"index\": false, \"track_config\": {\"track_class\": \"NeatHTMLFeatures/View/Track/NeatFeatures\", \"__current_case__\": 3, \"html_options\": {\"topLevelFeatures\": null}}, \"jbstyle\": {\"style_classname\": \"feature\", \"style_label\": \"product,name,id\", \"style_description\": \"note,description\", \"style_height\": \"10px\", \"max_height\": \"600\"}, \"jbcolor_scale\": {\"color_score\": {\"color_score_select\": \"none\", \"__current_case__\": 0, \"color\": {\"color_select\": \"automatic\", \"__current_case__\": 0}}}, \"jb_custom_config\": {\"option\": []}, \"jbmenu\": {\"track_menu\": []}, \"track_visibility\": \"default_off\", \"override_apollo_plugins\": \"False\", \"override_apollo_drag\": \"False\"}}]}], \"uglyTestingHack\": \"\", \"__page__\": null, \"__rerun_remap_job_id__\": null}",
            "tool_version": "1.16.11+galaxy1",
            "type": "tool",
            "uuid": "f7335355-3558-43c9-b672-c9e0b3dcc5df",
            "when": null,
            "workflow_outputs": []
        },
        "8": {
            "annotation": "",
            "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.5.0+galaxy0",
            "errors": null,
            "id": 8,
            "input_connections": {
                "input": {
                    "id": 5,
                    "output_name": "output_exons"
                }
            },
            "inputs": [],
            "label": null,
            "name": "Busco",
            "outputs": [
                {
                    "name": "busco_sum",
                    "type": "txt"
                },
                {
                    "name": "busco_table",
                    "type": "tabular"
                },
                {
                    "name": "busco_missing",
                    "type": "tabular"
                },
                {
                    "name": "summary_image",
                    "type": "png"
                },
                {
                    "name": "busco_gff",
                    "type": "gff3"
                }
            ],
            "position": {
                "left": 620,
                "top": 1040
            },
            "post_job_actions": {},
            "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.5.0+galaxy0",
            "tool_shed_repository": {
                "changeset_revision": "ea8146ee148f",
                "name": "busco",
                "owner": "iuc",
                "tool_shed": "toolshed.g2.bx.psu.edu"
            },
            "tool_state": "{\"adv\": {\"evalue\": \"0.001\", \"limit\": \"3\", \"contig_break\": \"10\"}, \"busco_mode\": {\"mode\": \"tran\", \"__current_case__\": 1}, \"input\": {\"__class__\": \"ConnectedValue\"}, \"lineage\": {\"lineage_mode\": \"select_lineage\", \"__current_case__\": 1, \"lineage_dataset\": \"eudicots_odb10\"}, \"lineage_conditional\": {\"selector\": \"download\", \"__current_case__\": 1}, \"outputs\": [\"short_summary\", \"missing\", \"image\", \"gff\"], \"__page__\": null, \"__rerun_remap_job_id__\": null}",
            "tool_version": "5.5.0+galaxy0",
            "type": "tool",
            "uuid": "764c2a67-8c20-43f1-ba5c-ce54b4db4dc4",
            "when": null,
            "workflow_outputs": []
        },
        "9": {
            "annotation": "In a basic editor or spreadsheet, filter the gene IDs into a txt file.",
            "content_id": "export_remote",
            "errors": null,
            "id": 9,
            "input_connections": {
                "export_type|infiles": {
                    "id": 6,
                    "output_name": "output"
                }
            },
            "inputs": [
                {
                    "description": "runtime parameter for tool Export datasets",
                    "name": "export_type"
                }
            ],
            "label": "ID list",
            "name": "Export datasets",
            "outputs": [
                {
                    "name": "out",
                    "type": "txt"
                }
            ],
            "position": {
                "left": 1300,
                "top": 650
            },
            "post_job_actions": {},
            "tool_id": "export_remote",
            "tool_state": "{\"d_uri\": \"\", \"export_type\": {\"export_type_selector\": \"datasets_auto\", \"__current_case__\": 0, \"infiles\": {\"__class__\": \"RuntimeValue\"}}, \"include_metadata_files\": true, \"invalid_chars\": \"/\", \"__page__\": null, \"__rerun_remap_job_id__\": null}",
            "tool_version": "0.1.0",
            "type": "tool",
            "uuid": "f6a1f688-0c13-4af0-ae03-f4dc45441cde",
            "when": null,
            "workflow_outputs": []
        },
        "10": {
            "annotation": "",
            "content_id": "toolshed.g2.bx.psu.edu/repos/galaxyp/filter_by_fasta_ids/filter_by_fasta_ids/2.3",
            "errors": null,
            "id": 10,
            "input_connections": {
                "header_criteria|identifiers": {
                    "id": 9,
                    "output_name": "out"
                },
                "input": {
                    "id": 4,
                    "output_name": "codingseq_output"
                }
            },
            "inputs": [
                {
                    "description": "runtime parameter for tool Filter FASTA",
                    "name": "header_criteria"
                },
                {
                    "description": "runtime parameter for tool Filter FASTA",
                    "name": "input"
                }
            ],
            "label": null,
            "name": "Filter FASTA",
            "outputs": [
                {
                    "name": "output",
                    "type": "fasta"
                }
            ],
            "position": {
                "left": 1600,
                "top": 550
            },
            "post_job_actions": {},
            "tool_id": "toolshed.g2.bx.psu.edu/repos/galaxyp/filter_by_fasta_ids/filter_by_fasta_ids/2.3",
            "tool_shed_repository": {
                "changeset_revision": "dff7df6fcab5",
                "name": "filter_by_fasta_ids",
                "owner": "galaxyp",
                "tool_shed": "toolshed.g2.bx.psu.edu"
            },
            "tool_state": "{\"dedup\": false, \"header_criteria\": {\"header_criteria_select\": \"id_list\", \"__current_case__\": 1, \"identifiers\": {\"__class__\": \"RuntimeValue\"}, \"id_regex\": {\"find\": \"beginning\", \"__current_case__\": 0}}, \"input\": {\"__class__\": \"RuntimeValue\"}, \"output_discarded\": false, \"sequence_criteria\": {\"sequence_criteria_select\": \"\", \"__current_case__\": 0}, \"__page__\": null, \"__rerun_remap_job_id__\": null}",
            "tool_version": "2.3",
            "type": "tool",
            "uuid": "46ac74cf-b556-425a-ba21-09cf24f06eb0",
            "when": null,
            "workflow_outputs": [
                {
                    "label": "predicted-nlr-CDS",
                    "output_name": "output",
                    "uuid": "48d672c0-bb28-49bb-99bd-413b09ac0bce"
                }
            ]
        },
        "11": {
            "annotation": "",
            "content_id": "toolshed.g2.bx.psu.edu/repos/galaxyp/filter_by_fasta_ids/filter_by_fasta_ids/2.3",
            "errors": null,
            "id": 11,
            "input_connections": {
                "header_criteria|identifiers": {
                    "id": 9,
                    "output_name": "out"
                },
                "input": {
                    "id": 4,
                    "output_name": "protein_output"
                }
            },
            "inputs": [
                {
                    "description": "runtime parameter for tool Filter FASTA",
                    "name": "header_criteria"
                },
                {
                    "description": "runtime parameter for tool Filter FASTA",
                    "name": "input"
                }
            ],
            "label": null,
            "name": "Filter FASTA",
            "outputs": [
                {
                    "name": "output",
                    "type": "fasta"
                }
            ],
            "position": {
                "left": 1600,
                "top": 730
            },
            "post_job_actions": {},
            "tool_id": "toolshed.g2.bx.psu.edu/repos/galaxyp/filter_by_fasta_ids/filter_by_fasta_ids/2.3",
            "tool_shed_repository": {
                "changeset_revision": "dff7df6fcab5",
                "name": "filter_by_fasta_ids",
                "owner": "galaxyp",
                "tool_shed": "toolshed.g2.bx.psu.edu"
            },
            "tool_state": "{\"dedup\": false, \"header_criteria\": {\"header_criteria_select\": \"id_list\", \"__current_case__\": 1, \"identifiers\": {\"__class__\": \"RuntimeValue\"}, \"id_regex\": {\"find\": \"beginning\", \"__current_case__\": 0}}, \"input\": {\"__class__\": \"RuntimeValue\"}, \"output_discarded\": false, \"sequence_criteria\": {\"sequence_criteria_select\": \"\", \"__current_case__\": 0}, \"__page__\": null, \"__rerun_remap_job_id__\": null}",
            "tool_version": "2.3",
            "type": "tool",
            "uuid": "ef6ee489-a704-46bf-849d-5c0379ba423f",
            "when": null,
            "workflow_outputs": [
                {
                    "label": "predicted-nlr-proteins",
                    "output_name": "output",
                    "uuid": "bcd73363-1da0-4566-9c6e-55b5c7c3eabc"
                }
            ]
        }
    },
    "tags": [
        "Genome",
        "Plant",
        "DNA",
        "QualityAssessment",
        "genome-annotation"
    ],
    "uuid": "0901a080-c94b-4f68-8923-3a6e5059420b",
    "version": 19
}

The difficulty I am currently facing is:

STAR mapping requires loading the index of this extremely large genome along with the sequencing data, so a single STAR process can consume over 200 GB of memory.

When I invoke this workflow, multiple independent STAR mappings are launched in parallel. This easily exhausts the available memory, and the jobs are interrupted.

I have thought of two solutions:

1. Enable shared memory when calling STAR (the STAR --genomeLoad option), so that multiple STAR processes share one copy of the index. This would require modifying the rg_rnaStar.xml file.
2. Prevent the workflow's jobs from running concurrently. Instead, each STAR + StringTie assembly would be launched sequentially from a queue, and StringTie Merge would run after the last STAR + StringTie assembly completes. However, I am not sure whether workflows have an option for a wait/trigger mechanism between jobs.

jennaj · August 27, 2024, 6:49pm

Hi @Naibin_Duan

This is being run in your personal Galaxy server, correct? And it is not connected to a cluster? Meaning, the database and jobs are all running on the same single computer? Please clarify any of this if I have it wrong.

And, you seem to be using the custom genome feature. Creating an index will make a huge difference! Custom genomes require the fasta to be indexed with every job – creating a dedicated index allows all the jobs to reuse the same index. See here for how that works → Galaxy Server administration / Tutorial List / Data Management & Reference Data

Let’s start there, thanks!

Naibin_Duan · August 28, 2024, 1:23am

@jennaj

Thank you for your response.
Firstly, I can confirm that I have already built the STAR index for this reference genome using the data_manager_star_index_builder.

Therefore, each time I perform STAR alignment for this genome, I do not need to re-index it; I only need to load the index.
However, even so, it still consumes a significant amount of memory. Yes, I am currently using only a single host and do not have access to a compute cluster.
I hope to resolve two issues:

1. Is it possible to implement memory sharing when loading the genome in STAR within Galaxy?
   In STAR's command-line mode, the option --genomeLoad LoadAndKeep reuses a genome that is already loaded.
   https://github.com/alexdobin/STAR/issues/583#issuecomment-472406039
2. Can the different jobs within a workflow be executed sequentially instead of concurrently?
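
For reference, outside Galaxy the shared-memory approach could look roughly like the sketch below. It only builds the command lines. The sample names, read file names, index directory, and the helper `shared_memory_commands` are hypothetical. `--genomeLoad LoadAndKeep` and `--genomeLoad Remove` are STAR options; unsorted BAM output is used here because coordinate-sorted output in shared-memory mode additionally needs `--limitBAMsortRAM` to be set.

```python
from typing import List, Tuple


def shared_memory_commands(genome_dir: str,
                           samples: List[Tuple[str, str, str]],
                           threads: int = 8) -> List[List[str]]:
    """Build STAR command lines that share one in-memory copy of the index.

    Each sample is a (name, reads_1, reads_2) tuple; all names are placeholders.
    """
    commands = []
    for name, reads_1, reads_2 in samples:
        commands.append([
            "STAR",
            "--genomeLoad", "LoadAndKeep",   # load the index once, keep it in shared memory
            "--genomeDir", genome_dir,
            "--runThreadN", str(threads),
            "--readFilesIn", reads_1, reads_2,
            "--outFileNamePrefix", f"{name}.",
            "--outSAMtype", "BAM", "Unsorted",
        ])
    # Final step: release the shared-memory copy of the index.
    commands.append(["STAR", "--genomeLoad", "Remove", "--genomeDir", genome_dir])
    return commands


if __name__ == "__main__":
    demo = [("sample1", "s1_1.fq", "s1_2.fq"), ("sample2", "s2_1.fq", "s2_2.fq")]
    for cmd in shared_memory_commands("/path/to/star_index", demo):
        print(" ".join(cmd))
```

Sharing the index removes the per-process copy of the genome, but each process still needs its own working memory for reads and output, and the operating system's shared-memory limits may need to be raised for an index of this size.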

Naibin_Duan · August 28, 2024, 7:25am

@jennaj I have tested running three parallel STAR processes on a small genome (600 Mb), and top shows that the three STAR processes together consume nearly 9% of the memory. The crop genome I want to study is 16 Gb, so running three STAR processes in parallel is clearly not feasible.

Therefore, I want the multiple STAR processes in my workflow to run one by one, rather than in parallel.
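
The extrapolation behind this conclusion can be written as a quick calculation. It is only a rough sketch that assumes STAR's memory use grows linearly with genome size, which is an approximation: index size tracks genome length fairly closely, while per-read buffers do not. The function name is made up for illustration.

```python
def scaled_memory_percent(observed_percent: float,
                          observed_genome_bases: float,
                          target_genome_bases: float) -> float:
    """Scale an observed memory share linearly with genome size (an assumption)."""
    return observed_percent * target_genome_bases / observed_genome_bases


# Figures from the test above: three jobs, 600 Mb genome, about 9% of RAM in total.
three_jobs = scaled_memory_percent(9.0, 600e6, 16e9)
one_job = three_jobs / 3

print(f"three parallel jobs: ~{three_jobs:.0f}% of RAM")  # ~240%
print(f"one job:             ~{one_job:.0f}% of RAM")     # ~80%
```

Under that assumption, three parallel jobs would need about 2.4 times the installed memory, while a single job would fit in roughly 80% of it, which is why running them one at a time is the workable option on this host.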

jennaj · August 29, 2024, 4:50pm

Hi @Naibin_Duan

Thanks for explaining more, I think I understand.

I think this is a server configuration function to tune. Big picture: you want to limit your server’s concurrent job execution, yes? So I would suggest looking into that more instead. Please see here Galaxy Job Configuration — Galaxy Project 24.2.dev0 documentation

That said, controlling how many jobs run concurrently in the workflow itself might be possible using workflow parameters. You’ll need to come up with the logic. One job finishes, you do something with that output to confirm it is completed, toggle a true/false parameter that controls the next tool, repeat. See Hands-on: Using Workflow Parameters / Using Workflow Parameters / Using Galaxy and Managing your Data

Naibin_Duan · August 30, 2024, 6:10am

@jennaj I’ll take a closer look at the workflow parameters in the link and then adjust my workflow. Thanks!

Naibin_Duan · September 10, 2024, 8:47am

@jennaj
Hello Jennifer,
I modified the local job_conf.xml file by changing:

<plugin id="multilocal" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
to:
<plugin id="multilocal" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="1"/>

With this modification, STAR tasks are now executed one by one, and the memory overflow issue no longer occurs.

Of course, it would not be difficult to create a dedicated job configuration specifically for STAR.
Anyway, the problem has been successfully solved.
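
A dedicated configuration for STAR might look roughly like the XML embedded in the sketch below, which is then parsed to check which runner each tool would use. This is an untested assumption rather than a verified Galaxy setup: the plugin id `star_serial`, the destination ids, and the short tool id `rna_star` are placeholders, and the tool id in particular should be checked against the installed tool. The helper `workers_for_tool` is only a sanity check written for this example and is not part of Galaxy.

```python
import xml.etree.ElementTree as ET

# Hypothetical job configuration: a one-worker runner reserved for STAR,
# while every other tool keeps the four-worker runner.
JOB_CONF = """\
<job_conf>
    <plugins>
        <plugin id="multilocal" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
        <plugin id="star_serial" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="1"/>
    </plugins>
    <destinations default="multilocal_dest">
        <destination id="multilocal_dest" runner="multilocal"/>
        <destination id="star_dest" runner="star_serial"/>
    </destinations>
    <tools>
        <tool id="rna_star" destination="star_dest"/>
    </tools>
</job_conf>
"""


def workers_for_tool(conf_xml: str, tool_id: str) -> int:
    """Return the worker count of the runner that a tool is routed to."""
    root = ET.fromstring(conf_xml)
    workers = {p.get("id"): int(p.attrib["workers"]) for p in root.iter("plugin")}
    runner_of = {d.get("id"): d.get("runner") for d in root.iter("destination")}
    destination = root.find("destinations").get("default")
    for tool in root.iter("tool"):
        if tool.get("id") == tool_id:
            destination = tool.get("destination")
    return workers[runner_of[destination]]


print(workers_for_tool(JOB_CONF, "rna_star"))  # 1
print(workers_for_tool(JOB_CONF, "augustus"))  # 4
```

With a layout like this, only STAR jobs would be serialized, and the other tools in the workflow could still run in parallel.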
