Freyja module not working

I am trying to incorporate the Freyja demix (Galaxy Version 1.4.4+galaxy0) tool into a workflow. Every time I try to run it, it crashes. See the full workflow attached for reference. I am also attaching the error I got. It looks like it’s trying to access a dictionary item that doesn’t exist.

For the inputs I am using:

  1. Tabular ivar variants output with options -t 0.0 and -q 20
  2. Samtools depths tabular output with options -aa and -m 0
  3. Usher barcode file downloaded from the Freyja github

I get correct looking outputs for the variant file and the depth file.

Any suggestions are appreciated!
Thanks,
Chris


Hi @chrisbioinfo

If you run the same tool, with the same data (a rerun of the original failed job in the history panel would test this), does that work? That will let you know if the workflow is a problem versus the tool/inputs/parameters.

From there, do you notice anything odd about the inputs for that job? Meaning, did they process correctly during the upstream steps? I think this is where the problem is but doing the other steps confirms nothing else is going on.

If the inputs look Ok, and the tool is still failing, even when run by itself, do this:

  1. post back a shared history link for an example error
  2. or send in a bug report from the error and include a link to this topic in the comments, then let us know here once that is done and I’ll look for it

This might be a tool configuration issue but I can’t tell yet since I haven’t looked at many errors for this tool yet. But that won’t matter … we can figure it out and fix whatever might be going wrong, data or the tool (maybe dependencies).

Let’s start there :slight_smile:

Hi @chrisbioinfo,

some other specific things you could try are:

  • provide VCF input instead of tabular (the ivar variants Galaxy tool offers both I think)
  • try to run the tool again, but with its shipped UShER barcodes data instead of a custom one

If it works in one of these scenarios, then we would at least know which input is causing the problem.

Otherwise, @jennaj is correct that a shared history or a bug report would give us more details for debugging.

Thank you both for your suggestions!

I did what you suggested and made the history public ( Galaxy )

First I checked that the format of the outputs matches what I get when I use the command line. The variant file is correct (at least for the tabular output; I assume the VCF option just converts this to VCF format and is right too). However, samtools depth in Galaxy outputs a depth file with three columns (refname, position, depth), whereas freyja variants outputs the depth file with four columns (refname, position, refnucleotide, depth). This could easily be the source of the problem.
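A quick way to check the column layout of a depth file before handing it to freyja demix is to count the tab-separated fields per row. This is only a minimal sketch: `column_counts` is a hypothetical helper name, and the two inline tables are made-up examples of the three-column and four-column layouts described above.

```python
import csv
import io
from collections import Counter


def column_counts(handle):
    """Count how many non-empty rows have each number of tab-separated fields."""
    reader = csv.reader(handle, delimiter="\t")
    return Counter(len(row) for row in reader if row)


# Made-up miniature tables, for illustration only.
galaxy_depth = io.StringIO("NC_045512.2\t1\t0\nNC_045512.2\t2\t5\n")
freyja_depth = io.StringIO("NC_045512.2\t1\tA\t0\nNC_045512.2\t2\tT\t5\n")

print(column_counts(galaxy_depth))  # Counter({3: 2})
print(column_counts(freyja_depth))  # Counter({4: 2})
```

For a real file, pass an open file handle instead of the `io.StringIO` objects.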

I checked all your other suggestions as well: The issue is reproducible across runs on the same data as well as on different datasets I tried, for tabular and vcf output, and using the barcodes shipped with the package.

If the source of the problem is the depth file, I would suggest adding the freyja variants command as detailed here (freyja variants — Freyja documentation). I use this at the command line to create the depth and ivar variants files.

Note that there are other issues that I have found in how galaxy implements freyja and ivar that could possibly be fixed with this. I can make a separate thread for a feature request if you want, but ivar variants actually wants to take an mpileup as its input instead of a bam. The current ivar variants galaxy implementation cannot accept an mpileup.

When you run freyja variants it actually outputs the command it uses, which is: samtools mpileup -aa -A -d 600000 -Q 20 -q 0 -B -f ref.fa sorted_trimmed.bam | tee >(cut -f1-4 > prefix_depths.tsv) | ivar variants -p prefix -q 20 -t 0.0 -r ref.fa

That command would create both the depths and the variant file needed for freyja.

(Also I should note that I did create a bug report for this last week, but I never heard anything back so I went ahead and started this thread… sorry! If this ends up solving the issue you can ignore/close the bug report)

Thank you both again for your help! Let me know what you want me to do next, or if you want any further info.

-Chris

Hmm, the samtools depth output could make for a good suspect, but there is currently an open pull request for a Galaxy-based wastewater analysis workflow at add new workflow for WW sars-cov-2 amliconic analysis by PlushZ · Pull Request #154 · galaxyproject/iwc · GitHub which also connects samtools depth to freyja demix and I suppose it’s working fine.

I’ll explore your shared history a bit more over the weekend and see what I can find.

I have a bit more info to share. I generated the variant and depth files with the freyja tool via the command line and uploaded them to Galaxy.

  • When using the two freyja-CL (freyja command line) generated files freyja demix-Galaxy works
  • When using the variant file from ivar variants-galaxy and the depth file from freyja variants-CL, freyja demix-Galaxy works
  • When using the variant file from ivar variants-galaxy and the depth file from samtools depths-Galaxy, freyja demix-Galaxy does not work.

To me this indicates the problem is most likely the output of samtools depth. I hope this extra information helps!

Yeah, I guess you’re correct. I just took a closer look, and the workflow I linked to above works on multiple samples simultaneously: it feeds multiple BAMs into samtools depth, which then emits one coverage column per sample, so freyja finds the column at index 3 that it is expecting. It isn’t yet entirely clear to me what kind of silent bug this introduces, but I guess it just masks the fact that samtools depth does not produce the expected format.

I need to confirm this with the workflow author, but it sounds convincing.
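To illustrate the masking effect, here is a small sketch. It is not Freyja’s actual parsing code; `field_at_index_3` is a hypothetical stand-in for any reader that expects a fourth column, and the rows are invented. With two BAMs the lookup succeeds but returns the second sample’s depth, while the single-BAM row has nothing at index 3.

```python
def field_at_index_3(line):
    """Return the fourth tab-separated field, as a reader expecting 4 columns would."""
    return line.rstrip("\n").split("\t")[3]


# Hypothetical rows; the reference name and depths are invented.
one_bam = "NC_045512.2\t100\t250"        # refname, position, depth
two_bams = "NC_045512.2\t100\t250\t310"  # refname, position, depth_1, depth_2

# A value is found, but it is the second sample's depth, not the first's.
print(field_at_index_3(two_bams))

try:
    field_at_index_3(one_bam)
except IndexError as err:
    # With a single BAM there is no fourth column at all.
    print("single-BAM row fails:", err)
```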

Now, immediate solutions for you:

  • offering freyja variants should be trivial, the tool is actually installed on usegalaxy.eu
  • as kind of a hack you could insert the required extra column into the samtools depth output - its content will likely not be read by freyja anyway

Finally regarding:

The current ivar variants galaxy implementation cannot accept an mpileup

This is intentional since mpileup is not a nice format for storing in a user’s history. ivar variants simply implements a pipe samtools mpileup | ivar variants instead. You can look at the command line generated by Galaxy by clicking on the (i) icon, then scrolling down to the Job Information section.

I see now. I think adding a column would work. Can you suggest a way to do this within a Galaxy workflow? I see that Galaxy has an awk module, so I will start with that. In particular, is there a way to insert the correct nucleotide, since I have the reference sequence FASTA anyway?

If this doesn’t work I may just make an account on usegalaxy.eu, but I would rather get this figured out so I can use freyja demix on other variant callers.

Secondly, I am also now seeing why ivar variant’s Galaxy output does not match the command line. (the predicted lineages were very wrong compared to what the command line gave)

All of the samtools mpileup command options are hard-coded. Would it be possible to add options for these in the ivar tool? In particular (and I am not an expert on this, just speculating), the CL version calls samtools using the option -aa to output all possible positions in the mpileup. I believe this becomes critical for demixing variants, as low-frequency mutations can be indicative of low-abundance lineages in a sample. I think the -Q 0 is fine since it’s probably being overridden by -q in ivar variants.

My galaxy pipeline is an exact copy of my command line freyja pipeline, so I would not expect significantly different results unless some options have been changed. I will keep doing some checks to see if there is some other possible cause for the differences aside from the mpileup options.

Update: Adding a dummy variable (I used N) fixed the issue with freyja demix. Thank you for your help!

The abundances are still incorrect but this is definitely to do with upstream issues with ivar variants and potentially the mapping.
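For anyone who wants to apply the same dummy-column fix outside Galaxy, an equivalent standalone sketch in Python follows. It assumes a tab-separated, three-column samtools depth table (refname, position, depth); `add_placeholder_base` is a hypothetical helper name and the example rows are made up.

```python
import io


def add_placeholder_base(src, dst, placeholder="N"):
    """Copy a 3-column depth table to dst, inserting a placeholder base as column 3."""
    for line in src:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        refname, position, depth = line.split("\t")
        dst.write(f"{refname}\t{position}\t{placeholder}\t{depth}\n")


# Made-up miniature depth table, for illustration only.
src = io.StringIO("NC_045512.2\t1\t0\nNC_045512.2\t2\t5\n")
dst = io.StringIO()
add_placeholder_base(src, dst)
print(dst.getvalue(), end="")
```

The unpacking into exactly three names means a row with any other number of columns raises a `ValueError` instead of being silently rewritten.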

Cool, thanks a lot for reporting back!

So to summarize:

  • the samtools depth output is incompatible with freyja, but can be made compatible with a bit of massaging (I don’t know of any straightforward way though to populate the new column with the correct bases, but N is a good alternative I’d say).

  • you could still simplify your workflow if freyja variants (toolshed.g2.bx.psu.edu/repos/iuc/freyja_variants/freyja_variants/1.4.4+galaxy0) was available on usegalaxy.org (@jennaj could you help here please)
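Outside of Galaxy, filling in the real reference base instead of N should not take much code. A minimal sketch, assuming a three-column depth table with 1-based positions (as samtools depth reports them) and a FASTA whose sequence IDs match the refname column; `read_fasta` and `add_reference_base` are hypothetical helper names, not part of Freyja or Galaxy, and the data is a toy example.

```python
import io


def read_fasta(handle):
    """Parse FASTA into {sequence_id: sequence}; the id is the header up to the first space."""
    seqs, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def add_reference_base(depth_in, depth_out, reference):
    """Insert the reference base as column 3 of a 3-column depth table."""
    for line in depth_in:
        line = line.rstrip("\r\n")
        if not line:
            continue
        refname, position, depth = line.split("\t")
        # Depth positions are 1-based; Python string indices are 0-based.
        base = reference[refname][int(position) - 1]
        depth_out.write(f"{refname}\t{position}\t{base}\t{depth}\n")


# Made-up miniature reference and depth table, for illustration only.
fasta = io.StringIO(">ref1 toy sequence\nACGT\nTTGA\n")
depths = io.StringIO("ref1\t1\t10\nref1\t5\t7\nref1\t8\t0\n")
out = io.StringIO()
add_reference_base(depths, out, read_fasta(fasta))
print(out.getvalue(), end="")
```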

As for your observations with ivar variants:
technically, we could expose the samtools mpileup params, but I’m not convinced we should. The recommended usage of ivar variants is documented as Usage: samtools mpileup -aa -A -d 0 -B -Q 0 --reference [<reference-fasta] <input.bam> | ivar variants -p <prefix> [-q <min-quality>] [-t <min-frequency-threshold>] [-m <minimum depth>] [-r <reference-fasta>] [-g GFF file], i.e. with fixed parameter values, and the Galaxy wrapper does just that - except for the -aa.
From my intuition, and if I understand -aa flag inconsistency of depth 0 and unused · Issue #82 · jaleezyy/covid-19-signal · GitHub correctly, -aa should only matter for ivar consensus, not ivar variants, because the latter is not expected to produce a variant call at zero-coverage positions anyway. It’s likely better to use the exact documented call to avoid confusion, though I’d be surprised if that’s causing differences. Feel free to try on the command line and convince me otherwise :slight_smile:

Done, please follow Install freyja_variants at ORG · Issue #748 · galaxyproject/usegalaxy-tools · GitHub