What is the correct input for Stacks: populations?

Hello there! I am trying to compute population genetic statistics using Stacks: populations. I am specifying the input as type “Stacks output,” the data as the “full output from denovo_map…” and I am specifying my correct population map. However, the function keeps failing. I keep getting the error “Unable to open ‘stacks_outputs/batch_1.catalog’” or something similar.

My question is: What am I supposed to be inputting for Stacks: populations???

Thank you!
-Melissa

Hi @melissajbetters

The message is stating that an expected file wasn’t available. That could be due to a content problem with the inputs. But, this also looks a bit like a tool bug, and I’d like to eliminate or confirm that.

Would you be able to share your history please? If you want to do that privately, ask and I’ll set up a private message. That said, posting publicly in this topic is much better, since more people can help out. Either way, you can unshare after we are done. See: Sharing your History

Hello!

I have made my history viewable here: Galaxy | Europe

I am working with paired-end RAD-seq data. Items 2969-2984 are de novo assembly of the reverse reads. I intended to use these to calculate population statistics using ‘populations’ but when I could not get this to work I tried to use the forward reads instead (items 3719-3734). However, as you can see in this history, the forward reads are not working, either. (Note: I’ve purged most of my failed attempts to retain space on the Galaxy server).

The most recent error I’ve received is: “Error: Unable to locate any file in input directory ‘stacks_outputs/’” after specifying item 3723 as the input (full output of forward read de novo map)

Thank you for your help!
~Melissa

Hi @melissajbetters

Ok, thanks. This doesn’t look like a tool problem so far – it looks more like a data content problem.

The tool form has a linkout to the developer’s tutorial. Try following that to sort out which inputs go where. If you run into another error when translating the process to Galaxy, you can share that example. Stacks: Stacks Manual

That serves a few different purposes:

  1. Tool bugs are eliminated
  2. Example analysis can be extracted into a workflow
  3. Then, you can run that workflow to process your own data in batch.

You could even help us to translate that tutorial into the GTN if you wanted to later on (that’s how most are created – from dev resources or publications) … but let’s make sure everything is working as expected first.

Right now, it seems that the “wrong” input is being selected on the form, so when the tool looks for specific files, they are not found. I don’t see an obvious alternative to what you are doing now but that doesn’t mean the correct way cannot be discovered!

And, let’s ping an EU admin. Maybe this really is some tool dependency problem but I missed it. @wm75

Hi jennaj,

Thank you for having a look at the history. I’ve been following the Galaxy tutorial by Le Bras (2023) (RAD-Seq de-novo data analysis) which unfortunately omits what input you load into ‘populations.’ :sweat: The Stacks manual (e.g. Stacks: populations) is also not clear as to what I should be using as the input.

Specifying ‘Stacks output’ as the input type, the input files that have not worked are:

  1. The full output from denovo_map.pl
  2. “Matches to the catalog…” (all x.matches files)
  3. “Haplotypes/alleles recorded for each locus…” (all x.alleles files)
  4. “Model calls from each locus…” (all x.snps files)
  5. “Assembled loci…” (all .tag files)

The files themselves look perfectly fine. I am at my wits’ end…

*Note: The files above have not worked for either the forward or the reverse reads.

Sorry that I missed that tutorial originally!

The population file format is described in this section. It isn’t automatically created, and instead needs to be constructed. Some instructions are in the “Comment: Data formatting”.

Datasets 816 and 1078 look like they contain what you want.

Current format 816

tag sample
TCCGGAGCGC guaymas_1
CTAACACGGC guaymas_2
AGCTTCGATT guaymas_3

Current format 1078

sample count
guaymas_1 3
guaymas_2 3
guaymas_3 3

Combine those, and get rid of the underscore contained in the sample name (since it is interpreted by these tools). When demultiplexing, use sample names that are all oneWord without any underscores so things match up. Some sample names in your history even contain dots. Get rid of those too: this is not just for this tool, since collections in general work more reliably without dots in the element identifiers/sample names.

The result would look something like this (but with no header, I’m just labeling for clarity). The idea is to match the example format in the tutorial. The actual sample, tag, and counts seem to come from the upstream demultiplexing steps.

sample_tag count
guaymas1_TCCGGAGCGC 3
guaymas2_CTAACACGGC 3
guaymas3_AGCTTCGATT 3

You can do the manipulations in Galaxy, or on your computer and then upload the result.
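If you go the "on your computer" route, the combination described above could be sketched in Python roughly like this. It is only an illustration: the function name `combine_tables` is made up, the inputs are assumed to be whitespace-separated text without header lines, and the tab-separated output is an assumption about what the downstream step expects.

```python
def combine_tables(tags_text, counts_text):
    """Join a 'tag sample' table with a 'sample count' table.

    Returns lines of the form '<sample>_<tag><TAB><count>', with
    underscores and dots stripped from the sample name.
    Both inputs are assumed to have no header line.
    """
    # First table: barcode tag in column 1, sample name in column 2.
    tag_by_sample = {}
    for line in tags_text.strip().splitlines():
        tag, sample = line.split()
        tag_by_sample[sample] = tag

    rows = []
    # Second table: sample name in column 1, count in column 2.
    for line in counts_text.strip().splitlines():
        sample, count = line.split()
        # Remove characters that the tools may interpret specially.
        clean = sample.replace("_", "").replace(".", "")
        rows.append(f"{clean}_{tag_by_sample[sample]}\t{count}")
    return "\n".join(rows)


# Example data copied from datasets 816 and 1078 as shown above.
tags = "TCCGGAGCGC guaymas_1\nCTAACACGGC guaymas_2\nAGCTTCGATT guaymas_3\n"
counts = "guaymas_1 3\nguaymas_2 3\nguaymas_3 3\n"
print(combine_tables(tags, counts))
```

A sample that appears in the count table but not in the tag table raises a `KeyError`, which is a quick way to spot names that do not match up between the two files.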

Hope that helps!

Hi jennaj,

Thank you again for your help! What you are referring to is the population map, rather than the actual data file(s) being processed. My current population map for this process is item 3686 in the history.

When reads are demultiplexed, process_radtags automatically names the samples by their radtag, as is seen in the tutorial linked above (e.g. sample_CCCC). However, from what I understand, this tag is not actually supplying the program with any information other than being able to identify your samples. The Stacks manual states:

“If, in addition to your barcodes, you also supply a sample name in an extra column within the barcodes file, process_radtags will name your output files according to sample name instead of barcode.” (section 4.1)

This is what I did (item 816), which is why my samples are named guaymas_1, guaymas_2, etc. They are appended with .1 or .2 automatically by process_radtags because they were processed as paired-end reads. My population map (item 3686) should be correctly formatted, as it successfully ran for denovo_map.pl.
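One thing that could be checked at this point is whether every sample name in the population map corresponds exactly to a file name in the Stacks output, given the .1/.2 suffixes mentioned above. The following is a minimal Python sketch, not a diagnosis. It assumes a tab-separated population map with the sample name in the first column, and it assumes per-sample files named `<sample>.matches.tsv` (optionally gzipped); that naming may differ between Stacks versions. The function name, the population label `pop1`, and the example file names are all hypothetical.

```python
def samples_without_files(popmap_text, filenames, suffix=".matches.tsv"):
    """Return popmap sample names with no file named <sample><suffix>.

    The suffix is an assumption about how the Stacks output files
    are named; adjust it to whatever the collection actually contains.
    """
    available = set(filenames)
    missing = []
    for line in popmap_text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        sample = line.split("\t")[0].strip()
        expected = sample + suffix
        # Accept either the plain or the gzipped file name.
        if expected not in available and expected + ".gz" not in available:
            missing.append(sample)
    return missing


# Hypothetical example: one file carries an extra '.1' in its name.
popmap = "guaymas_1\tpop1\nguaymas_2\tpop1\n"
files = ["guaymas_1.1.matches.tsv", "guaymas_2.matches.tsv"]
print(samples_without_files(popmap, files))
```

An empty list would mean every sample in the map has a matching file under the assumed naming, so the cause would have to be looked for elsewhere.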

It may be that I just cannot process the paired-end data at all. I may need to start a new history where I just process the forward reads, since I cannot figure out how to get populations to run on the current history.

Best,
Melissa