Ok, thanks @Ken_Saville
Thanks for somehow getting this to me – I’ll review/adjust the trust level on your account so you can post links.
Reviewing the jobs involved, it looks like Snippy itself didn’t report any variants. I reran the job (with all the optional outputs toggled on, including the commands) and the job ran, but the BAM was empty for your run. We are investigating why – it might be just some transient cluster hiccup.
But you don’t need to wait. If I rerun everything using the exact same data files throughout the entire process, I am able to get the results I think you are expecting in your custom tutorial. I think I’ve reviewed these data before - really good to see the progress!
I would suggest running through the training in this order:
- Gather and prepare all of the reference data first (including the SnpEFF build step)
- Run the tools using only that prepared reference data (including using the fasta output by the SnpEFF build process with Snippy)
This is how a workflow will work later on: data prep first, then the analysis.
This is my copy of your history with all of the original data plus the tests I ran, tagged in a way that should explain in more detail what I did.
Please let me know once you have a copy, and I can toggle the sharing off and purge my copy. If you need more help, it would be good to start with this as a baseline, but I’ll understand if that is hard to do.
If I were going to test this more, I would probably compare those two versions of the genome fasta. Just from a quick inspection, it seems the original had line wrapping applied and the SnpEFF build version didn’t, but the bigger difference is the upper versus lower case of the bases. My guess is that one or both tools were interpreting the lower case bases as “soft masked” and ignoring the content, but you could confirm.
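If you want to check this outside of Galaxy on downloaded copies of the two fasta files, a minimal Python sketch along these lines could do it. It is only an illustration: the function names and the file names in the usage comment are hypothetical, and it assumes both files are plain (uncompressed) fasta. It joins wrapped lines per sequence, then reports whether the two versions match exactly, whether they match once case is ignored, and how many lower case bases each one has.

```python
def read_fasta(path):
    """Return {sequence_id: sequence} with any line wrapping removed."""
    records = {}
    name = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks)
                # Use the first word of the header as the sequence id.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                chunks = []
            elif name is not None:
                chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def compare_fasta(path_a, path_b):
    """Compare two fasta files sequence by sequence, ignoring wrapping."""
    a = read_fasta(path_a)
    b = read_fasta(path_b)
    report = {}
    for name in sorted(set(a) | set(b)):
        seq_a = a.get(name)
        seq_b = b.get(name)
        if seq_a is None or seq_b is None:
            # The id is present in only one of the two files.
            report[name] = {"in_both": False}
            continue
        report[name] = {
            "in_both": True,
            "identical": seq_a == seq_b,
            "same_ignoring_case": seq_a.upper() == seq_b.upper(),
            "lowercase_a": sum(ch.islower() for ch in seq_a),
            "lowercase_b": sum(ch.islower() for ch in seq_b),
        }
    return report


# Example usage (hypothetical file names):
# for name, result in compare_fasta("original.fasta", "snpeff_build.fasta").items():
#     print(name, result)
```

If `same_ignoring_case` is true while `identical` is false, the two files differ only in case (wrapping is already ignored), which would fit the soft-masking guess. It would not prove that guess, since that depends on how each tool treats lower case bases.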
I can’t remember exactly how case is handled by each tool, but whatever the original tool documentation describes is how it will behave when run in Galaxy, too. To find tools to modify or inspect fasta, search the tool panel for the datatype “fasta”, and please ask if you can’t find something. There is also a “diff” utility, but it works line-by-line, and since the entire unwrapped sequence is on “one line” it won’t be very informative by itself. Genome assembly tools are another way to inspect/compare, though.
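One way to make a line-by-line diff more useful is to first write comparison copies of both files with the same wrapping and case. The sketch below is a hedged illustration in Python for files downloaded from the history, not a Galaxy tool; the function names, the 60-base width, and the file names in the usage comment are all assumptions. Note that upper-casing throws away any soft-masking information, so use the normalized copies only for comparison, not as analysis inputs.

```python
import difflib


def normalize_fasta(in_path, out_path, width=60):
    """Write a copy with upper case bases wrapped at a fixed width."""

    def flush(out, chunks):
        # Join the collected sequence lines, upper-case, and rewrap.
        seq = "".join(chunks).upper()
        for start in range(0, len(seq), width):
            out.write(seq[start:start + width] + "\n")

    with open(in_path) as src, open(out_path, "w") as out:
        chunks = []
        for line in src:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                flush(out, chunks)
                chunks = []
                out.write(line + "\n")  # header lines are kept as-is
            else:
                chunks.append(line)
        flush(out, chunks)


def diff_files(path_a, path_b):
    """Return unified diff lines; an empty list means no differences."""
    with open(path_a) as a, open(path_b) as b:
        return list(
            difflib.unified_diff(
                a.readlines(), b.readlines(), fromfile=path_a, tofile=path_b
            )
        )


# Example usage (hypothetical file names):
# normalize_fasta("original.fasta", "original.norm.fasta")
# normalize_fasta("snpeff_build.fasta", "snpeff_build.norm.fasta")
# print("".join(diff_files("original.norm.fasta", "snpeff_build.norm.fasta")))
```

A base substitution will show up as a single changed line. An inserted or deleted base shifts every following line of that sequence, so the diff gets noisy from that point on. Header lines are compared verbatim, so any difference in the description text will also be reported.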
In short: SnpEFF is extremely picky, so something super minor to a person could be throwing it off. I also toggled on all of the optional parameters – including the log with SnpEFF. We don’t use that data for “reporting” beyond job troubleshooting and certainly don’t send any data from any tool back to the original development team (all of that kind of original tool behavior is isolated on our academic servers, so logs are captured but not sent anywhere else).
Apologies for length! But I wanted to split out the odd Snippy failure from the rest. Hope this helps but we can follow up more too. 