snippy tutorial - snpeff step

I went through the Snippy tutorial and am trying to use SnpEff at the end to summarize variants. The tutorial says to use the .gbk file because it leads automatically into the SnpEff step. But using this option produces no Snippy output, or rather an empty VCF file.

When trying the .fna genome sequence as the reference, snippy works, generating a vcf file with some variants.

I then tried making a SnpEff database for stapha. I did this in a different history, then copied the database step here. I then ran SnpEff, but it showed no variants.

I generally have trouble using snpeff when not using a built-in database.

Here’s my history for this attempt -

This forum won’t let me post the link to my history

Hi @Ken_Saville

It sounds like a problem with the SnpEff database. I’d like to review and determine whether this is some wrinkle on the server (and which server and tool version) or some data issue we can resolve.

The copied database step is the part that seems to be the trouble. The copy was just between histories in the same account? If yes, then there is probably some reference mismatch and not a problem with the copy itself.

You should be able to generate and post a history share link. Do you want to try again as a reply? How to create the link: FAQ: Sharing your History

Let’s start there! :slight_smile:

I can’t post the link to this email either

It got kicked back to me saying I can’t post links

Ok, I’m going to open a direct message chat that you should be able to use. Find it under your account. :slight_smile:

Wow. Thanks. I’ll check it out tomorrow.

Ken Saville, PhD
Professor
Biology Department
Albion College

I got a message saying I can’t post links.

so the history is public as

Saville snippy tutorial

Ok, thanks @Ken_Saville

Thanks for somehow getting this to me – I’ll review/adjust the trust level with your account so you can post links.

Reviewing the jobs involved, it looks like Snippy itself didn’t report any variants. I reran the job (with all the optional outputs toggled on, including the commands) and the job ran, but the BAM was empty for your run. We are investigating why; it might be just some transient cluster hiccup.

But you don’t need to wait. If I rerun everything using the exact same data files throughout the entire process, I am able to get the results I think you are expecting in your custom tutorial. I think I’ve reviewed these data before - really good to see the progress!

I would suggest running through the training in this order:

  1. Gather and prepare all of the reference data first (including the SnpEff build step)
  2. Run the tools using only that prepared reference data (including using the fasta output by the SnpEff build process with Snippy)

This is how a workflow will work later on: data prep first, then the analysis.

This is my copy of your history with all of the original data plus the tests I ran, tagged in a way that should explain in more detail what I did.

Please let me know once you have a copy, and I can toggle the sharing off and purge my copy. If you need more help, it would be good to start with this as a baseline, but I’ll understand if that is hard to do.

If I were going to test this more, I would probably compare those two versions of the genome fasta. Just from a quick inspection, it seems the original had line wrapping applied and the SnpEff build version didn’t, but the bigger difference is the upper versus lower case of the bases. My guess is that one or both tools were interpreting the lower case bases as “soft masked” and ignoring the content, but you could confirm.
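A minimal sketch of that comparison, assuming both fasta files have been downloaded from the history. The helper names `read_fasta` and `compare_fasta` are hypothetical (they are not part of Snippy, SnpEff, or Galaxy), and the two small in-memory records stand in for the real files. It joins wrapped lines so wrapping no longer matters, then checks each shared record both exactly and ignoring case.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA lines into {record_id: sequence}, joining wrapped lines."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use only the first word of the header as the record id
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(chunks) for key, chunks in records.items()}


def compare_fasta(a, b):
    """Compare two parsed FASTA dicts record by record."""
    report = {}
    for name in sorted(set(a) & set(b)):
        report[name] = {
            "identical": a[name] == b[name],
            "identical_ignoring_case": a[name].upper() == b[name].upper(),
        }
    # Record ids present in only one of the two files
    report["_only_in_a"] = sorted(set(a) - set(b))
    report["_only_in_b"] = sorted(set(b) - set(a))
    return report


# Toy stand-ins for the two genome files (wrapped upper case vs unwrapped lower case)
wrapped = StringIO(">chr1 original\nACGT\nACGT\n")
unwrapped = StringIO(">chr1 from_build\nacgtacgt\n")
print(compare_fasta(read_fasta(wrapped), read_fasta(unwrapped)))
```

For real files, `open("reference.fna")` (a placeholder name) can be passed in place of the `StringIO` objects. If a record is identical only when case is ignored, that would point at case handling rather than a true sequence difference.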

I can’t remember exactly how case is handled by each tool, but however the original tool documentation says the data is handled is how it will work when run in Galaxy, too. To find tools to modify or inspect fasta, search the tool panel with that datatype, “fasta”, and please ask if you can’t find something. There is also a “diff” utility, but it works line-by-line, and since the entire unwrapped sequence is on “one line” it won’t be very informative by itself. Genome assembly tools are another way to inspect/compare, though.
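If you would rather check this outside Galaxy, here is a rough sketch of what such a fasta inspection/modification step does. It assumes the file fits in memory, and the function names `lowercase_fraction` and `normalize_fasta` are made up for illustration. Whether either tool actually treats lower case as soft masked is still unconfirmed, so treat the upper-casing as an experiment rather than a known fix.

```python
def lowercase_fraction(sequence):
    """Fraction of characters in the sequence that are lower case."""
    if not sequence:
        return 0.0
    return sum(1 for base in sequence if base.islower()) / len(sequence)


def normalize_fasta(text, width=60):
    """Return FASTA text with upper-case bases re-wrapped at `width` columns."""
    out = []
    seq = []

    def flush():
        # Emit the sequence collected so far, upper-cased and re-wrapped
        joined = "".join(seq).upper()
        for i in range(0, len(joined), width):
            out.append(joined[i:i + width])
        seq.clear()

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            out.append(line)  # header lines are kept unchanged
        else:
            seq.append(line)
    flush()
    return "\n".join(out) + "\n" if out else ""


example = ">chr1\nacgt\nACgt\n"
print(lowercase_fraction("acgtACgt"))
print(normalize_fasta(example, width=6), end="")
```

Running both versions of the genome through the same normalization gives two files that “diff” can compare line by line.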

In short: SnpEff is extremely picky, so something super minor to a person could be throwing it off. I also toggled on all of the optional parameters, including the log with SnpEff. We don’t use that data for “reporting” beyond job troubleshooting and certainly don’t send any data from any tool back to the original development team (all of that kind of original tool behavior is isolated on our academic servers, so logs are captured but not sent anywhere else).

Apologies for length! But I wanted to split out the odd Snippy failure from the rest. Hope this helps but we can follow up more too. :slight_smile:

This certainly gives me some things to work with. Will go through your suggestions this afternoon. Thank you.


I have imported your version of it, so you can purge away.


Great, glad that helps! I’ve purged my copy of your data and adjusted your account preferences. Posting a link next time should be fine but let us know of course! :slight_smile:

And for this pending update issue

We don’t have the original detailed logs, but we don’t have any other jobs like it either. It was probably just a cluster crash on our side. Thousands of jobs are processing every hour and some tiny fraction (one or two) fail for no meaningful reason. If you ever end up with empty results or an odd failure, please try a rerun first; for anything persistent, you can ask for feedback the same as this time.