Analyzing ARTIC data with Galaxy Issue

The workflow for analyzing ARTIC data with Galaxy (https://covid19.galaxyproject.org/artic/) was initiated and the following message was received:

The server could not complete the request. Please contact the Galaxy Team if this error persists.
{
“new_history_name”: “COVID-19: variation analysis on ARTIC PE data (PC Test)”,
“history_id”: null,
“resource_params”: {},
“replacement_params”: {},
“parameters”: {
“0”: {
“input”: {
“values”: [
{
“id”: “4d82e63dc7881c73”,
“src”: “hdca”,
“hid”: 92,
“name”: “070920_PC (fastsanger)”,
“keep”: true,
“tags”: []
}
],
“batch”: false
}
},
“1”: {
“input”: {
“values”: [
{
“hid”: 39,
“id”: “bbd44e69cb8906b58ea8b3203e84bbaa”,
“keep”: false,
“name”: “MN_908947.fasta”,
“src”: “hda”,
“tags”: []
}
],
“batch”: false
}
},
“2”: {
“input”: {
“values”: [
{
“hid”: 38,
“id”: “bbd44e69cb8906b5b267746e3ff3a99f”,
“keep”: false,
“name”: “nCoV-2019_v3.bed (as bed)”,
“src”: “hda”,
“tags”: []
}
],
“batch”: false
}
},
“3”: {
“input”: {
“values”: [
{
“hid”: 93,
“id”: “bbd44e69cb8906b56c9107552d482465”,
“keep”: false,
“name”: “ARTIC_SARS_CoV-2_amplicon_info_v3_Prime.tsv”,
“src”: “hda”,
“tags”: []
}
],
“batch”: false
}
},
“5”: {
“single_paired|single_paired_selector”: “paired”,
“single_paired|adapter_trimming_options|disable_adapter_trimming”: “false”,
“single_paired|adapter_trimming_options|adapter_sequence1”: “”,
“single_paired|adapter_trimming_options|adapter_sequence2”: “”,
“single_paired|global_trimming_options|trim_front1”: “”,
“single_paired|global_trimming_options|trim_tail1”: “”,
“single_paired|global_trimming_options|trim_front2”: “”,
“single_paired|global_trimming_options|trim_tail2”: “”,
“overrepresented_sequence_analysis|overrepresentation_analysis”: “false”,
“overrepresented_sequence_analysis|overrepresentation_sampling”: “”,
“filter_options|quality_filtering_options|disable_quality_filtering”: “false”,
“filter_options|quality_filtering_options|qualified_quality_phred”: “”,
“filter_options|quality_filtering_options|unqualified_percent_limit”: “”,
“filter_options|quality_filtering_options|n_base_limit”: “”,
“filter_options|length_filtering_options|disable_length_filtering”: “false”,
“filter_options|length_filtering_options|length_required”: “”,
“filter_options|low_complexity_filter|enable_low_complexity_filter”: “false”,
“filter_options|low_complexity_filter|complexity_threshold”: “”,
“read_mod_options|polyg_tail_trimming|trimming_select”: “”,
“read_mod_options|polyg_tail_trimming|poly_g_min_len”: “”,
“read_mod_options|polyx_tail_trimming|polyx_trimming_select”: “”,
“read_mod_options|umi_processing|umi”: “false”,
“read_mod_options|umi_processing|umi_loc”: “”,
“read_mod_options|umi_processing|umi_len”: “”,
“read_mod_options|umi_processing|umi_prefix”: “”,
“read_mod_options|cutting_by_quality_options|cut_by_quality5”: “false”,
“read_mod_options|cutting_by_quality_options|cut_by_quality3”: “false”,
“read_mod_options|cutting_by_quality_options|cut_window_size”: “”,
“read_mod_options|cutting_by_quality_options|cut_mean_quality”: “”,
“read_mod_options|base_correction_options|correction”: “false”,
“output_options|report_html”: “true”,
“output_options|report_json”: “true”,
“__job_resource|__job_resource__select”: “no”
},
“6”: {
“reference_source|reference_source_selector”: “history”,
“reference_source|index_a”: “auto”,
“fastq_input|fastq_input_selector”: “paired”,
“fastq_input|iset_stats”: “”,
“rg|rg_selector”: “do_not_set”,
“analysis_type|analysis_type_selector”: “illumina”,
“__job_resource|__job_resource__select”: “no”
},
“8”: {
“mode|outtype”: “selected_reads”,
“mode|filter_config|cond_region|select_region”: “no”,
“mode|filter_config|cond_rg|select_rg”: “no”,
“mode|filter_config|quality”: “20”,
“mode|filter_config|library”: “”,
“mode|filter_config|cigarcons”: “”,
“mode|filter_config|inclusive_filter”: [
“1”
],
“mode|filter_config|exclusive_filter”: [
“4”,
“8”,
“256”
],
“mode|filter_config|exclusive_filter_all”: null,
“mode|subsample_config|subsampling_mode|select_subsample”: “fraction”,
“mode|subsample_config|subsampling_mode|factor”: “1.0”,
“mode|subsample_config|subsampling_mode|seed”: “”,
“mode|output_options|reads_report_type”: “retained”,
“mode|output_options|complementary_output”: “false”,
“mode|output_options|adv_output|collapsecigar”: “false”,
“mode|output_options|output_format|oformat”: “bam”,
“mode|output_options|output_format|fmtopt”: “-b”,
“addref_cond|addref_select”: “no”
},
“9”: {
“coverage_cond|coverage_select”: “no”,
“remove_dups”: “false”,
“split_output_cond|split_output_selector”: “no”,
“filter_by_flags|filter_flags”: “nofilter”,
“gc_depth”: “”,
“insert_size”: “”,
“read_length”: “”,
“most_inserts”: “”,
“trim_quality”: “”,
“addref_cond|addref_select”: “no”,
“cond_region|select_region”: “no”,
“sparse”: “false”,
“remove_overlaps”: “false”,
“cov_threshold”: “”
},
“10”: {
“primer|source”: “history”,
“min_len”: “1”,
“min_qual”: “0”,
“window_width”: “4”,
“inc_primers”: “true”
},
“11”: {
“reference_source|ref_selector”: “history”,
“adv_options|keepflags”: “false”,
“adv_options|bq2_handling|replace_bq2”: “keep”,
“adv_options|bq2_handling|defqual”: “2”,
“__job_resource|__job_resource__select”: “no”
},
“12”: {
“strategy|selector”: “dindel”,
“strategy|reference_source|ref_selector”: “history”
},
“13”: {
“stats_regions|region_select”: “all”,
“per_base_coverage”: “false”,
“duplicate_skipping”: “0”,
“plot_specific|n_bins”: “400”,
“plot_specific|paint_chromosome_limits”: “true”,
“plot_specific|genome_gc_distr”: null,
“plot_specific|homopolymer_size”: “3”,
“__job_resource|__job_resource__select”: “no”
},
“14”: {
“reference_source|ref_selector”: “history”,
“regions|restrict_to_region”: “genome”,
“variant_types”: “--call-indels”,
“call_control|set_call_options”: “yes”,
“call_control|coverage|min_cov”: “5”,
“call_control|coverage|max_depth”: “1000000”,
“call_control|pe|use_orphan”: “false”,
“call_control|bc_quals|min_bq”: “30”,
“call_control|bc_quals|min_alt_bq”: “30”,
“call_control|bc_quals|alt_bq|modify”: “”,
“call_control|align_quals|alnqual|use_alnqual”: “”,
“call_control|align_quals|alnqual|alnqual_choice|alnquals_to_use”: “”,
“call_control|align_quals|alnqual|alnqual_choice|extended_baq”: “true”,
“call_control|map_quals|min_mq”: “20”,
“call_control|map_quals|use_mq|no_mq”: “”,
“call_control|map_quals|use_mq|max_mq”: “255”,
“call_control|source_qual|use_src_qual|src_qual”: “”,
“call_control|joint_qual|min_jq”: “0”,
“call_control|joint_qual|min_alt_jq”: “0”,
“call_control|joint_qual|def_alt_jq”: “0”,
“filter_control|filter_type”: “set_custom”,
“filter_control|sig”: “0.0005”,
“filter_control|bonf”: “0”,
“filter_control|others”: “false”,
“__job_resource|__job_resource__select”: “no”
},
“15”: {
“join_identifier”: “_”
},
“16”: {
“filter_by_type|keep_only”: “”,
“filter_by_type|qual|snvqual_filter|snvqual”: “no”,
“filter_by_type|qual|indelqual_filter|indelqual”: “no”,
“coverage|cov_min”: “5”,
“coverage|cov_max”: “0”,
“af|af_min”: “0.05”,
“af|af_max”: “0.95”,
“sb|sb_filter|strand_bias”: “no”,
“flag_or_drop”: “”
},
“17”: {
“results_0|software_cond|software”: “fastp”,
“results_1|software_cond|software”: “samtools”,
“results_1|software_cond|output_0|type|type”: “stats”,
“results_2|software_cond|software”: “qualimap”,
“title”: “”,
“comment”: “”,
“flat”: “false”,
“saveLog”: “false”
},
“19”: {
“reference_source|ref_selector”: “history”,
“regions|restrict_to_region”: “genome”,
“variant_types”: “--call-indels”,
“call_control|set_call_options”: “yes”,
“call_control|coverage|min_cov”: “5”,
“call_control|coverage|max_depth”: “1000000”,
“call_control|pe|use_orphan”: “false”,
“call_control|bc_quals|min_bq”: “30”,
“call_control|bc_quals|min_alt_bq”: “30”,
“call_control|bc_quals|alt_bq|modify”: “”,
“call_control|align_quals|alnqual|use_alnqual”: “”,
“call_control|align_quals|alnqual|alnqual_choice|alnquals_to_use”: “”,
“call_control|align_quals|alnqual|alnqual_choice|extended_baq”: “true”,
“call_control|map_quals|min_mq”: “20”,
“call_control|map_quals|use_mq|no_mq”: “”,
“call_control|map_quals|use_mq|max_mq”: “255”,
“call_control|source_qual|use_src_qual|src_qual”: “”,
“call_control|joint_qual|min_jq”: “0”,
“call_control|joint_qual|min_alt_jq”: “0”,
“call_control|joint_qual|def_alt_jq”: “0”,
“filter_control|filter_type”: “set_custom”,
“filter_control|sig”: “0.0005”,
“filter_control|bonf”: “0”,
“filter_control|others”: “false”,
“__job_resource|__job_resource__select”: “no”
},
“20”: {
“reference_source|reference_source_selector”: “history”,
“isect_union”: “-i”,
“invert”: “true”,
“window_size”: “0”,
“loci”: “false”,
“adv_options|adv_options_selector”: “no”
},
“21”: {
“replacements_0|find_pattern”: “^(#CHROM.+)$”,
“replacements_0|replace_pattern”: “##FILTER=<ID=AmpliconRemoval,Description=“Variant removed upon removal of amplicon”>\n\1”,
“replacements_1|find_pattern”: “(.+\t)PASS(\t.+)”,
“replacements_1|replace_pattern”: “\1AmpliconRemoval\2”
},
“22”: {
“reference_source|reference_source_selector”: “history”,
“isect_union”: “-u”,
“invert”: “false”,
“window_size”: “0”,
“loci”: “false”,
“adv_options|adv_options_selector”: “no”
},
“23”: {
“filter_by_type|keep_only”: “”,
“filter_by_type|qual|snvqual_filter|snvqual”: “no”,
“filter_by_type|qual|indelqual_filter|indelqual”: “no”,
“coverage|cov_min”: “5”,
“coverage|cov_max”: “0”,
“af|af_min”: “0.05”,
“af|af_max”: “0.0”,
“sb|sb_filter|strand_bias”: “mtc”,
“sb|sb_filter|sb_alpha”: “0.001”,
“sb|sb_filter|sb_mtc”: “fdr”,
“sb|sb_filter|sb_compound”: “true”,
“sb|sb_filter|sb_indels”: “false”,
“flag_or_drop”: “--print-all”
},
“24”: {
“inputFormat”: “vcf”,
“genome_version”: “NC_045512.2”,
“outputConditional|outputFormat”: “vcf”,
“csvStats”: “false”,
“udLength”: “0”,
“annotations”: [
“-formatEff”,
“-classic”
],
“intervals”: null,
“transcripts”: null,
“filterOut”: [
“-no-downstream”,
“-no-intergenic”,
“-no-upstream”
],
“filter|specificEffects”: “no”,
“offset”: “default”,
“chr”: “”,
“generate_stats”: “true”,
“noLog”: “true”
}
},
“parameters_normalized”: true,
“batch”: true
}

Has anyone encountered this before? If so, what was the workaround to get the analysis running?

Hi @gvestal,
this looks less like an issue with the particular analysis you are trying to do and more like a problem with the server or the input data you are using.
A couple of diagnostic questions:

  1. Which server are you working on?
  2. Can you still reproduce the issue?
  3. Are all your input datasets in an ok (green) state, including the individual datasets inside the collection of fastq files?
  4. Did the workflow run configuration complain about anything before you clicked on run?

Cheers,
Wolfgang

Good morning @wm75!

  1. I am in the United States and use the usegalaxy.org portal, but I am unsure about the exact server.
  2. I have tried the workflow with the source FASTQ files and the post processed samples from this workflow (https://covid19.galaxyproject.org/genomics/1-PreProcessing/) with different samples and get the same issue.
  3. The FASTQ and processed fastsanger files are all in green state.
  4. There was a notice stating “…Some tools are being executed with different versions compared to those available when this workflow was last saved because the other versions are not or no longer available on this Galaxy instance. To upgrade your workflow and dismiss this message simply edit the workflow and re-save it…”
    I’ve tried re-saving the workflow and running it, as well as running from the original workflow file, and get the same result.
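
For anyone who wants to see which tool versions a workflow pins before comparing them with what a server offers, here is a minimal sketch. It assumes the workflow has been downloaded as a .ga file (JSON) in which each step carries `tool_id` and `tool_version` keys. The function name and the example tool IDs are made up for illustration, and this is not the check Galaxy itself performs behind the version notice quoted above.

```python
import json


def list_tool_versions(workflow):
    """Return (step_number, tool_id, tool_version) for every tool step.

    `workflow` is the dict obtained by parsing an exported .ga file.
    Input steps have no tool_id and are skipped.
    """
    rows = []
    steps = workflow.get("steps", {})
    for key in sorted(steps, key=int):
        step = steps[key]
        tool_id = step.get("tool_id")
        if tool_id is None:
            continue  # data input / collection input step
        rows.append((int(key), tool_id, step.get("tool_version")))
    return rows


def load_workflow(path):
    """Parse an exported workflow file (hypothetical path supplied by caller)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Tiny stand-in for a real .ga file; tool IDs and versions are placeholders.
example_workflow = {
    "steps": {
        "0": {"type": "data_collection_input", "tool_id": None, "tool_version": None},
        "5": {"type": "tool", "tool_id": "example_trimmer", "tool_version": "1.0"},
        "10": {"type": "tool", "tool_id": "example_caller", "tool_version": "2.3"},
    }
}

for step_number, tool_id, version in list_tool_versions(example_workflow):
    print(f"step {step_number}: {tool_id} (version {version})")
```

Running the same listing on the file exported from each server would show whether any step ended up on a different tool version.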

I can reproduce your error at least on usegalaxy.org and it seems to happen with any arbitrary input files.
I’m really busy right now and it’s not obvious what happens, but if you give me time over the weekend I’ll investigate this further and let you know.

I can’t actually see from your message which specific step gave an error. This seems to be a problem with the versions of tools available on usegalaxy.org - and the workflow in question is listed as available on usegalaxy.eu and usegalaxy.org.au (https://covid19.galaxyproject.org/artic/).

So which files are in a red state and what error messages are you getting? Also, that page that you link to is not ARTIC ampliseq data. The workflow you mentioned is specifically designed for data generated using the ARTIC amplicon sequencing protocol.

@pvanheus there are no datasets! The workflow does not even start scheduling anything - that’s the problem.
I’ll do more testing on different servers and check the wf file carefully as soon as I have a little bit of time.

@gvestal Now this is a bit absurd!
The workflow runs just fine on usegalaxy.eu and all it takes to make it work on usegalaxy.org is saving it again on that server (to make that happen, open the workflow in the editor, then move any of the step boxes a little just to introduce a change and click on the save icon).
In my hands everything works afterwards.
usegalaxy.eu is currently still running Galaxy version 19.09, while usegalaxy.org is on 20.05, so my guess is that there is a tiny incompatibility with the workflow format somewhere (which would be a bug) that prevents some older workflows from running on new server versions. I’m going to investigate this further, but at least you should be able to continue with the covid19 data.

Best wishes,
Wolfgang

Good morning @wm75!

I made the suggested alterations and everything seems to be working. Thank you for the prompt reply and attention to this issue. I appreciate it.

If you have time to consider it, have you considered using the data from your ARTIC PE analysis and how it might apply to doing a reference-based SARS-CoV-2 assembly? I’m relatively new to bioinformatics, but if bcftools consensus was used with the final vcf file from the ARTIC PE workup and the MN_908947.3 FASTA, could that generate a consensus assembled genome?

Yes, in fact I have generated genomes that way in the past.

In an earlier workflow I did for SARS-CoV-2 / ARTIC I used bcftools consensus for a consensus genome. Also I used bedtools Genome Coverage to compute low coverage regions and bedtools MaskFastaBed to mask out low coverage regions of the genome.
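
To make the masking idea concrete, here is a small pure-Python sketch of the logic, not the actual workflow. It assumes per-position depth is available as bedGraph-style lines (chrom, start, end, depth; 0-based, half-open), which is the kind of output `bedtools genomecov -bga` produces. The depth threshold, function names and toy data are assumptions for illustration; in practice the bedtools and bcftools tools do this work.

```python
def low_coverage_intervals(bedgraph_lines, min_depth):
    """Collect intervals whose depth is below min_depth.

    Adjacent low-coverage intervals on the same sequence are merged.
    Returns a list of (chrom, start, end) tuples, 0-based half-open.
    """
    intervals = []
    for line in bedgraph_lines:
        line = line.strip()
        if not line:
            continue
        chrom, start, end, depth = line.split("\t")[:4]
        start, end = int(start), int(end)
        if float(depth) >= min_depth:
            continue  # well covered, nothing to mask
        if intervals and intervals[-1][0] == chrom and intervals[-1][2] == start:
            # Extend the previous interval instead of starting a new one.
            intervals[-1] = (chrom, intervals[-1][1], end)
        else:
            intervals.append((chrom, start, end))
    return intervals


def mask_sequence(sequence, intervals, chrom):
    """Replace bases inside the given intervals of `chrom` with N."""
    bases = list(sequence)
    for name, start, end in intervals:
        if name != chrom:
            continue
        for i in range(start, min(end, len(bases))):
            bases[i] = "N"
    return "".join(bases)


# Toy data: a 12-base "genome" with poor coverage at both ends.
coverage = [
    "ref\t0\t3\t0",
    "ref\t3\t5\t2",
    "ref\t5\t10\t50",
    "ref\t10\t12\t1",
]
to_mask = low_coverage_intervals(coverage, min_depth=5)
masked = mask_sequence("ACGTACGTACGT", to_mask, "ref")
print(to_mask)
print(masked)
```

The masked positions show up as N in the consensus, so regions without enough reads are not silently reported as matching the reference.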