I have been running a few different workflows successfully as of late (read alignment/quantification against the transcriptome with Salmon). Recently, however, my fasta.gz transcriptome input file seems to get deleted partway through the workflow, producing the Salmon error “Input dataset ‘gencode.vM23.transcripts.fa uncompressed’ was deleted before the job started.” I have run this same workflow with Salmon before, using the same transcriptome file, without any issue.
Within the same workflow I am also running a genome-based alignment (STAR), which takes an input GTF as the gene model for alignment. This part of the workflow executes without issue.
I also noticed that when I attempt to rerun the workflow using the same input history, I have to reselect the transcriptome file before launching, since Galaxy claims it no longer exists (I just reselect the same file from the history; I do not re-upload the fasta.gz).
I’m just wondering whether this is a known bug that can be worked around by re-uploading the file, or whether something in how I am executing the workflow is causing the issue.
Is the dataset actually deleted from the history (check your hidden datasets)?
If so, have you set “Output cleanup” to “Yes” for the Salmon tool, or for any upstream tool that uses this dataset? The default is “No”.
Salmon (and every other tool I am aware of) requires uncompressed fasta as input. When a compressed fasta.gz is supplied, the tool creates an uncompressed version of the data at runtime as a hidden dataset. If an upstream tool created that hidden uncompressed dataset and has output cleanup set to “Yes”, the Salmon tool may still expect the uncompressed version to be available. Whether that is buggy or expected behavior is hard to tell right now.
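If you want to check the history programmatically instead of through the UI, here is a minimal BioBlend sketch that lists all datasets, including hidden and deleted ones, so the runtime “uncompressed” copy shows up too. The URL, API key, and history ID below are placeholders you would need to substitute:

```python
from bioblend.galaxy import GalaxyInstance

# Placeholders: substitute your server URL, your API key
# (User -> Preferences -> Manage API Key), and the history ID.
gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")

# contents=True returns the datasets in the history; deleted=None and
# visible=None mean "do not filter on either state", so hidden and
# deleted datasets (like the runtime-uncompressed fasta) are included.
contents = gi.histories.show_history(
    "YOUR_HISTORY_ID", contents=True, deleted=None, visible=None
)
for ds in contents:
    print(ds["hid"], ds["name"],
          "deleted:", ds["deleted"], "visible:", ds["visible"])
```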
Please try two things:
1. A rerun, just to make sure that this wasn’t some transient server issue.
2. Check the “output cleanup” option of any upstream tools that use this same fasta.gz input. If any are set to “Yes”, change them to “No”, and instead do the cleanup step after Salmon (or after the last tool in your workflow that uses that same fasta.gz data). A sketch for auditing this from the API is below.
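If the cleanup is configured as a post-job action in the workflow rather than on the tool form, you can also inspect the workflow definition itself. A rough BioBlend sketch, assuming the exported dict follows the usual .ga layout with a “steps” map and per-step “post_job_actions” (the workflow ID and key are placeholders):

```python
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")

# Export the workflow as a dict and print each step's post-job actions;
# a delete/cleanup action on a step that feeds Salmon is what to look for.
wf = gi.workflows.export_workflow_dict("YOUR_WORKFLOW_ID")
for step_id, step in wf["steps"].items():
    pjas = step.get("post_job_actions") or {}
    if pjas:
        print(step_id, step.get("name"), "->", list(pjas.keys()))
```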
We can follow up if those do not resolve the issue. Are you working at https://usegalaxy.org? I may ask for share links to 1) a history that contains the inputs plus the failed run and 2) the workflow used, plus, if still available, 3) a history where this exact same workflow executed successfully before. You can share those privately in a direct message here.