I have been running a few different workflows successfully as of late (read alignment/quantification against the transcriptome with Salmon). Recently, however, my fasta.gz transcriptome input file seems to get deleted partway through the workflow, producing the Salmon error “Input dataset ‘gencode.vM23.transcripts.fa uncompressed’ was deleted before the job started.” I have run this same workflow with Salmon before, using the same transcriptome file, without any issue.
Within the same workflow I am also running a genome-based alignment (STAR), which takes an input GTF as the gene model for alignment. This part of the workflow executes without issue.
I also noticed that when I attempt to rerun the workflow using the same input history, I have to reselect the transcriptome file before launching, since Galaxy claims it no longer exists (I just reselect the same file from the history; I do not re-upload the fasta.gz).
I’m just wondering whether this is a known bug that can be worked around by re-uploading the file, or whether something in how I am executing the workflow is causing the issue.
Is the dataset actually deleted from the history (check your hidden datasets)?
If so, have you set “Output cleanup” to “Yes” for the Salmon tool, or for any upstream tool that uses this dataset? The default is “No”.
Salmon (and every other tool I am aware of) requires uncompressed fasta as input. When a compressed fasta.gz is supplied, the tool creates an uncompressed version of the data at runtime as a hidden dataset. If an upstream tool created that hidden uncompressed dataset and has output cleanup set to “Yes”, the Salmon tool may still expect the uncompressed version to be available. Whether that is buggy or expected behavior is hard to tell right now.
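If you want to check the history programmatically instead of through the UI, here is a minimal BioBlend sketch that lists all datasets, including hidden and deleted ones, so the runtime “uncompressed” copy shows up too. The URL, API key, and history ID below are placeholders you would need to substitute:

```python
from bioblend.galaxy import GalaxyInstance

# Placeholders: substitute your server URL, your API key
# (User -> Preferences -> Manage API Key), and the history ID.
gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")

# contents=True returns the datasets in the history; deleted=None and
# visible=None mean "do not filter on either state", so hidden and
# deleted datasets (like the runtime-uncompressed fasta) are included.
contents = gi.histories.show_history(
    "YOUR_HISTORY_ID", contents=True, deleted=None, visible=None
)
for ds in contents:
    print(ds["hid"], ds["name"],
          "deleted:", ds["deleted"], "visible:", ds["visible"])
```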
Please try two things:
1. A rerun, just to make sure that this wasn’t some transient server issue.
2. Check the “output cleanup” option of any upstream tools that use this same fasta.gz input. If any are set to “Yes”, change them to “No”, and instead do the cleanup step after Salmon (or after the last tool in your workflow that uses that same fasta.gz data). A sketch for auditing this from the API is below.
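If the cleanup is configured as a post-job action in the workflow rather than on the tool form, you can also inspect the workflow definition itself. A rough BioBlend sketch, assuming the exported dict follows the usual .ga layout with a “steps” map and per-step “post_job_actions” (the workflow ID and key are placeholders):

```python
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")

# Export the workflow as a dict and print each step's post-job actions;
# a delete/cleanup action on a step that feeds Salmon is what to look for.
wf = gi.workflows.export_workflow_dict("YOUR_WORKFLOW_ID")
for step_id, step in wf["steps"].items():
    pjas = step.get("post_job_actions") or {}
    if pjas:
        print(step_id, step.get("name"), "->", list(pjas.keys()))
```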
We can follow up if those do not resolve the issue. Are you working at https://usegalaxy.org? I may ask for share links to 1) a history that contains the inputs plus the failed run and 2) the workflow used, plus, if still available, 3) a history where this exact same workflow executed successfully before. You can share those privately in a direct message here.