Wrapper for datasets_download_genome --force

Hello again, Galaxyans! Long time no see…

Bug in the NCBI Datasets Genomes tool

It seems the --force option broke the wrapper after the new version of the tool was released:

New version of client (18.25.1) available at https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/linux-amd64/datasets.
Error: field(s) [--force] not recognized.

Use dataformat tsv genome <command> --help for detailed help about a command.

Tool:

toolshed.g2.bx.psu.edu/repos/iuc/ncbi_datasets/datasets_download_genome/18.24.0+galaxy0

Command:

[ -f /usr/local/ssl/cacert.pem ] && export SSL_CERT_FILE="/usr/local/ssl/cacert.pem"; datasets download genome accession 'GCF_904425465.1' --assembly-version latest --include genome,rna,protein,cds,gff3,gtf,gbff,seq-report --no-progressbar --dehydrated && dataformat tsv genome --package ncbi_dataset.zip --fields --force > genome_data_report.tsv && unzip ncbi_dataset.zip && datasets rehydrate --directory ./ --gzip --max-workers ${NCBI_DATASETS_MAX_WORKERS:-10} && find ncbi_dataset \( -name "*.faa" -o -name "*.fna" -o -name "*.faa.gz" -o -name "*.fna.gz" \) -exec sh -c 'mv {} $(echo {} | sed "s/.f[an]a\(.gz\)\?$/.fasta\1/")' \; && find ncbi_dataset -name "*.jsonl.gz" -exec sh -c 'mv {} $(dirname {})/$(basename {} .gz)' \; && find ncbi_dataset \( -name "*.gz" ! -name "*fasta.gz" \) -exec gunzip {} \; && find ncbi_dataset -name "*fasta.gz" -exec gunzip {} \; && find ncbi_dataset -name sequence_report.jsonl -exec sh -c 'dataformat tsv genome-seq --inputfile {} > $(dirname {})/$(basename {} .jsonl).tsv' \; && true

Thanks!

Hi @David ,

That command line looks as if you haven’t selected anything under “Output options” → “Columns in the report”. At least that would explain why nothing follows --fields: the CLI parser then takes --force as the field list and rejects it as an unrecognized field.
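To illustrate the point, here is a minimal shell sketch of how a guard around the field list could avoid the `--fields --force` situation. The helper name `build_report_args` is hypothetical and not part of the Galaxy wrapper, and the example field names are only illustrative values.

```shell
# Hypothetical helper, not part of the Galaxy wrapper: build the arguments
# for `dataformat tsv genome` and refuse to continue when no columns are given.
build_report_args() {
    fields="$1"   # comma-separated column names; may be empty
    if [ -z "$fields" ]; then
        # An empty list would yield "--fields --force", so --force would be
        # read as the field name and rejected.
        echo "no report columns selected" >&2
        return 1
    fi
    printf '%s\n' "--package ncbi_dataset.zip --fields $fields --force"
}

# Example field names are illustrative.
build_report_args "accession,organism-name"
build_report_args "" || echo "skipped: empty column list"
```

With a non-empty list the function prints the arguments as they appear in the job command; with an empty list it fails early instead of handing `--force` to `--fields`.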

Does that help?

Wolfgang

That’s weird. I did actually select a big bunch of columns before running it. I’ll check it again soon. Tnx

@wm75 , I can’t reproduce the error. So weird. I suspect something browser-related clearing the inputs (somehow).

I tested running with all ‘Columns in the report’ options selected and got a new error:

New version of client (18.26.0) available at https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/LATEST/linux-amd64/datasets.
Error: field(s) [annotinfo-busco-ver], [assminfo-linked-assmaccession], [assminfo-linked-assmtype] not recognized.

Use dataformat tsv genome <command> --help for detailed help about a command.

But it just worked fine when I deselected the missing options.

Thank you :call_me_hand:

Hi @David

I was able to reproduce your error when selecting all fields. Let’s create a PR!

Error: field(s) [annotinfo-busco-ver], [assminfo-linked-assmaccession], [assminfo-linked-assmtype] not recognized.

Ref → Create a table from the genome data reports

PR → Adjust fields in ncbi-datasets to match current NCBI terms by jennaj · Pull Request #7988 · galaxyproject/tools-iuc · GitHub

Similar reported by others → annotinfo-release-version missing · Issue #503 · ncbi/datasets · GitHub

Thanks for following up! For now, the only workaround would be to not include those fields in your query. :rocket:
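For anyone scripting the same report outside Galaxy, the workaround can be sketched in shell as below. This is only an illustration: `filter_fields` is a hypothetical helper, the list of rejected names is copied from the error message in this thread, and the other field names in the example call are illustrative.

```shell
# Field names that dataformat rejected in the error message in this thread.
UNSUPPORTED="annotinfo-busco-ver assminfo-linked-assmaccession assminfo-linked-assmtype"

# Hypothetical helper: drop the rejected names from a comma-separated list.
filter_fields() {
    kept=""
    old_ifs=$IFS
    IFS=','
    for field in $1; do
        case " $UNSUPPORTED " in
            *" $field "*) ;;                      # skip a rejected field
            *) kept="${kept:+$kept,}$field" ;;    # keep everything else
        esac
    done
    IFS=$old_ifs
    printf '%s\n' "$kept"
}

filter_fields "accession,annotinfo-busco-ver,organism-name,assminfo-linked-assmtype"
# prints: accession,organism-name
```

The cleaned list can then be passed to `--fields` in place of the full selection.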

Thanks @David for the actionable report!
@jennaj's PR looks like the correct fix; we just need to get the tests to pass. They seem to be failing because of NCBI connection issues atm.

Glad that we could help!