ChatGXY returns malformed workflow files

I tried to use the chatbot to create a new workflow, and it produced malformed JSON:

{
  "format_version": "1.0",
  "steps": {
    "0": {
      "type": "tool",
      "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/thermorawfileparser/thermorawfileparser/1.4.0",
      "tool_state": {
        "input_file": {"src": "hda", "id": "__raw_input__"},
        "output_format": "mzML",
        "centroid": "true"
      },
      "outputs": {
        "output_file": {"src": "hda", "id": "__mzml_output__"}
      },
      "annotation": "Convert Thermo *.raw* to centroided mzML"
    },

    "1": {
      "type": "tool",
      "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/msqc/msqc/1.0.0",
      "tool_state": {
        "input_file": {"src": "hda", "id": "__mzml_output__"},
        "report_type": "HTML"
      },
      "inputs": {
        "input_file": {"src": "output", "id": "0", "output_name": "output_file"}
      },
      "outputs": {
        "report": {"src": "hda", "id": "__qc_report__"}
      },
      "annotation": "Basic QC of converted mzML"
    },

    "2": {
      "type": "tool",
      "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/msfragger/msfragger/3.5.0",
      "tool_state": {
        "input_file": {"src": "hda", "id": "__mzml_output__"},
        "fasta_file": {"src": "hda", "id": "__fasta_input__"},
        "enzyme": "Trypsin",
        "missed_cleavages": "2",
        "fixed_mods": ["Carbamidomethyl (C)", "TMT6plex (K)", "TMT6plex (N-term)"],
        "variable_mods": ["Oxidation (M)", "Acetyl (Protein N-term)"],
        "precursor_mass_tolerance": "10ppm",
        "fragment_mass_tolerance": "0.02Da",
        "tmt_label": "__tmt_plex__",
        "output_format": "pepXML"
      },
      "inputs": {
        "input_file": {"src": "output", "id": "0", "output_name": "output_file"},
        "fasta_file": {"src": "hda", "id": "__fasta_input__"}
      },
      "outputs": {
        "pepXML": {"src": "hda", "id": "__pepXML_output__"}
      },
      "annotation": "MSFragger search (TMT‑aware)"
    },

    "3": {
      "type": "tool",
      "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/peptide_shaker/peptide_shaker/2.2.0",
      "tool_state": {
        "input_file": {"src": "hda", "id": "__pepXML_output__"},
        "fasta_file": {"src": "hda", "id": "__fasta_input__"},
        "decoy_tag": "rev_",
        "fdr_cutoff": "0.01",
        "minimum_peptides_per_protein": "2"
      },
      "inputs": {
        "input_file": {"src": "output", "id": "2", "output_name": "pepXML"},
        "fasta_file": {"src": "hda", "id": "__fasta_input__"}
      },
      "outputs": {
        "filtered_peptide_report": {"src": "hda", "id": "__filtered_peptides__"},
        "protein_report": {"src": "hda", "id": "__protein_report__"}
      },
      "annotation": "PeptideShaker – FDR filtering and protein inference"
    },

    "4": {
      "type": "tool",
      "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/msstats_tmt/msstats_tmt/2.0.0",
      "tool_state": {
        "protein_report": {"src": "hda", "id": "__protein_report__"},
        "tmt_correction_matrix": {"src": "hda", "id": "__tmt_correction_matrix__"},
        "reference_channel": "__reference_channel__",
        "normalization_method": "median",
        "summarization_method": "sum",
        "output_format": "TSV"
      },
      "inputs": {
        "protein_report": {"src": "output", "id": "3", "output_name": "protein_report"},
        "tmt_correction_matrix": {"src": "hda", "id": "__tmt_correction_matrix__"}
      },
      "outputs": {
        "quant_table": {"src": "hda", "id": "__quant_table__"},
        "summary_report": {"src": "hda", "id": "__msstats_summary__"}
      },
      "annotation": "MSstatsTMT – reporter ion extraction, impurity correction, normalization, protein‑level summarization"
    },

    "5": {
      "type": "tool",
      "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/limma/limma_voom/1.0.0",
      "tool_state": {
        "input_table": {"src": "hda", "id": "__quant_table__"},
        "design_matrix": {"src": "hda", "id": "__design_matrix__"},
        "contrast_matrix": {"src": "hda", "id": "__contrast_matrix__"},
        "adjust_method": "BH",
        "fdr_cutoff": "0.05"
      },
      "inputs": {
        "input_table": {"src": "output", "id": "4", "output_name": "quant_table"},
        "design_matrix": {"src": "hda", "id": "__design_matrix__"},
        "contrast_matrix": {"src": "hda", "id": "__contrast_matrix__"}
      },
      "outputs": {
        "differential_results": {"src": "hda", "id": "__diff_results__"},
        "volcano_plot": {"src": "hda", "id": "__volcano_plot__"},
        "heatmap": {"src": "hda", "id": "__heatmap__"}
      },
      "annotation": "limma‑voom – statistical testing, volcano plot & heatmap"
    }
  },

  "inputs": {
    "__raw_input__": {
      "label": "Raw file(s) (Thermo *.raw*)",
      "type": "data"
    },
    "__fasta_input__": {
      "label": "Protein FASTA (with decoys if desired)",
      "type": "data"
    },
    "__tmt_correction_matrix__": {
      "label": "TMT correction matrix (CSV from manufacturer)",
      "type": "data"
    },
    "__tmt_plex__": {
      "label": "TMT plex (e.g., 10plex, 11plex, 16plex)",
      "type": "select",
      "options": ["10plex", "11plex", "16plex"]
    },
    "__reference_channel__": {
      "label": "Reference channel (e.g., 126, 127N, 128C)",
      "type": "text"
    },
    "__design_matrix__": {
      "label": "Design matrix (CSV, samples × conditions)",
      "type": "data"
    },
    "__contrast_matrix__": {
      "label": "Contrast matrix (CSV, comparisons)",
      "type": "data"
    }
  },

  "annotations": {
    "name": "TMT‑DDA Proteomics Pipeline",
    "tags": ["proteomics", "TMT", "DDA", "quantification", "MSstatsTMT", "limma"],
    "description": "End‑to‑end workflow for TMT‑labelled DDA proteomics: raw → mzML conversion → QC → MSFragger search → PeptideShaker filtering → MSstatsTMT quantification → limma‑

Auto (Router) gpt-oss-120b-llmlb 6852 tokens
Query
Can you create the workflow for me?
Auto (Router)

Below is a ready‑to‑import Galaxy workflow definition (in the standard JSON “.ga” format) that implements the end‑to‑end TMT‑DDA pipeline described earlier.
You can copy the JSON into a text file, name it something like tmt_dda_workflow.ga, and then Import it into any Galaxy instance (via Workflows → Import → Choose File).

After importing, you’ll only need to set a few “user‑input” parameters (e.g., the FASTA file, TMT plex, and correction matrix) before running the workflow on your own data.
1️⃣ Full Workflow JSON

{
  "format_version": "1.0",
  "steps": {
    "0": {
      "type": "tool",
      "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/thermorawfileparser/thermorawfileparser/1.4.0",
      "tool_state": {
        "input_file": {"src": "hda", "id": "__raw_input__"},
        "output_format": "mzML",
        "centroid": "true"
      },
      "outputs": {
        "output_file": {"src": "hda", "id": "__mzml_output__"}
      },
      "annotation": "Convert Thermo *.raw* to centroided mzML"
    },

    "1": {
      "type": "tool",
      "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/msqc/msqc/1.0.0",
      "tool_state": {
        "input_file": {"src": "hda", "id": "__mzml_output__"},
        "report_type": "HTML"
      },
      "inputs": {
        "input_file": {"src": "output", "id": "0", "output_name": "output_file"}
      },
      "outputs": {
        "report": {"src": "hda", "id": "__qc_report__"}
      },
      "annotation": "Basic QC of converted mzML"
    },

    "2": {
      "type": "tool",
      "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/msfragger/msfragger/3.5.0",
      "tool_state": {
        "input_file": {"src": "hda", "id": "__mzml_output__"},
        "fasta_file": {"src": "hda", "id": "__fasta_input__"},
        "enzyme": "Trypsin",
        "missed_cleavages": "2",
        "fixed_mods": ["Carbamidomethyl (C)", "TMT6plex (K)", "TMT6plex (N-term)"],
        "variable_mods": ["Oxidation (M)", "Acetyl (Protein N-term)"],
        "precursor_mass_tolerance": "10ppm",
        "fragment_mass_tolerance": "0.02Da",
        "tmt_label": "__tmt_plex__",   // user input, e.g., 10plex, 11plex, 16plex
        "output_format": "pepXML"
      },
      "inputs": {
        "input_file": {"src": "output", "id": "0", "output_name": "output_file"},
        "fasta_file": {"src": "hda", "id": "__fasta_input__"}
      },
      "outputs": {
        "pepXML": {"src": "hda", "id": "__pepXML_output__"}
      },
      "annotation": "MSFragger search (TMT‑aware)"
    },

    "3": {
      "type": "tool",
      "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/peptide_shaker/peptide_shaker/2.2.0",
      "tool_state": {
        "input_file": {"src": "hda", "id": "__pepXML_output__"},
        "fasta_file": {"src": "hda", "id": "__fasta_input__"},
        "decoy_tag": "rev_",
        "fdr_cutoff": "0.01",
        "minimum_peptides_per_protein": "2"
      },
      "inputs": {
        "input_file": {"src": "output", "id": "2", "output_name": "pepXML"},
        "fasta_file": {"src": "hda", "id": "__fasta_input__"}
      },
      "outputs": {
        "filtered_peptide_report": {"src": "hda", "id": "__filtered_peptides__"},
        "protein_report": {"src": "hda", "id": "__protein_report__"}
      },
      "annotation": "PeptideShaker – FDR filtering and protein inference"
    },

    "4": {
      "type": "tool",
      "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/msstats_tmt/msstats_tmt/2.0.0",
      "tool_state": {
        "protein_report": {"src": "hda", "id": "__protein_report__"},
        "tmt_correction_matrix": {"src": "hda", "id": "__tmt_correction_matrix__"},
        "reference_channel": "__reference_channel__",   // user input (e.g., 126)
        "normalization_method": "median",
        "summarization_method": "sum",
        "output_format": "TSV"
      },
      "inputs": {
        "protein_report": {"src": "output", "id": "3", "output_name": "protein_report"},
        "tmt_correction_matrix": {"src": "hda", "id": "__tmt_correction_matrix__"}
      },
      "outputs": {
        "quant_table": {"src": "hda", "id": "__quant_table__"},
        "summary_report": {"src": "hda", "id": "__msstats_summary__"}
      },
      "annotation": "MSstatsTMT – reporter ion extraction, impurity correction, normalization, protein‑level summarization"
    },

    "5": {
      "type": "tool",
      "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/limma/limma_voom/1.0.0",
      "tool_state": {
        "input_table": {"src": "hda", "id": "__quant_table__"},
        "design_matrix": "__design_matrix__",          // user‑provided CSV (samples vs. conditions)
        "contrast_matrix": "__contrast_matrix__",      // user‑provided CSV (comparisons)
        "adjust_method": "BH",
        "fdr_cutoff": "0.05"
      },
      "inputs": {
        "input_table": {"src": "output", "id": "4", "output_name": "quant_table"}
      },
      "outputs": {
        "differential_results": {"src": "hda", "id": "__diff_results__"},
        "volcano_plot": {"src": "hda", "id": "__volcano_plot__"},
        "heatmap": {"src": "hda", "id": "__heatmap__"}
      },
      "annotation": "limma‑voom – statistical testing, volcano plot & heatmap"
    }
  },

  "inputs": {
    "__raw_input__": {
      "label": "Raw file(s) (Thermo *.raw*)",
      "type": "data"
    },
    "__fasta_input__": {
      "label": "Protein FASTA (with decoys if desired)",
      "type": "data"
    },
    "__tmt_correction_matrix__": {
      "label": "TMT correction matrix (CSV from manufacturer)",
      "type": "data"
    },
    "__tmt_plex__": {
      "label": "TMT plex (e.g., 10plex, 11plex, 16plex)",
      "type": "select",
      "options": ["10plex", "11plex", "16plex"]
    },
    "__reference_channel__": {
      "

Even after removing the cut-off section, I was not able to import the generated workflow.

Maybe it would be good to detect this automatically and reprompt in this case? Also a download button would be much appreciated.

That being said, I am unsure whether that is the intended purpose of the chatbot.
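
To make the "detect this automatically" suggestion concrete, here is a rough sketch of a pre-check the chatbot could run on its own output before showing it. This is not how ChatGXY works internally. The function name `find_workflow_problems` is made up for illustration, and the list of required top-level keys is my assumption about what a native `.ga` export contains, not an official schema.

```python
import json

# Top-level keys assumed to be present in a native Galaxy ".ga" export.
# This list is an assumption for illustration, not an official schema.
REQUIRED_KEYS = ("a_galaxy_workflow", "format-version", "steps")


def find_workflow_problems(text):
    """Return a list of human-readable problems; an empty list means the text passed."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        # Truncated output and JavaScript-style // comments both end up here.
        return [f"not valid JSON (line {err.lineno}, column {err.colno}): {err.msg}"]
    if not isinstance(data, dict):
        return ["top level is not a JSON object"]
    problems = [f"missing top-level key: {key}" for key in REQUIRED_KEYS if key not in data]
    if "steps" in data and not isinstance(data["steps"], dict):
        problems.append("'steps' is not a JSON object")
    return problems


# A cut-off reply like the one above fails the very first check.
truncated = '{"format_version": "1.0", "steps": {"0": {"type": "tool"'
print(find_workflow_problems(truncated))
```

A non-empty result could then be used to trigger a reprompt instead of showing the broken file to the user.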

Hi @t0mdavid-m

The ChatGXY rover bot is not quite ready to design workflows for you, but this is great feedback! I’ll make sure to pass it on to the developers as a desired use case example.

The current best way to create a workflow is to do one of these:

  1. Find an existing workflow and customize it!

  2. Run through your analysis steps with some representative data, then Extract a workflow and polish it more.

Most GTN Tutorials include reusable workflow templates. Once oriented, explore the IWC Workflow Library and try a production workflow for streamlined batch analysis. Any of these can be imported and customized. You can also go to Workflows → Public Workflows at any public Galaxy server to run the search, since most of this is cross-indexed. Just be aware that anyone can publish a workflow! It may be worth reviewing the version chain log to make sure you have the “root” version you really want.

A simple example of extracting a workflow is in this tutorial. Search the forum for workflow-extract, or search the GTN with the keyword “workflow”, for much more. → Hands-on: A short introduction to Galaxy / A short introduction to Galaxy / Introduction to Galaxy Analyses

Finally, once you have your workflow and are testing it, or should you prefer working on the command line, more sophisticated ways to process batches of data can be done with Planemo. → Running Galaxy workflows — Planemo 0.75.44.dev0 documentation

Please let us know if this helps or not! :slight_smile: