What is the correct input for Stacks: populations??

melissajbetters · July 27, 2023, 3:04am

Hello there! I am trying to compute population genetic statistics using Stacks: populations. I am specifying the input as type “Stacks output,” the data as the “full output from denovo_map…” and I am specifying my correct population map. However, the function keeps failing. I keep getting the error “Unable to open ‘stacks_outputs/batch_1.catalog’” or something similar.

My question is: What am I supposed to be inputting for Stacks: populations???

Thank you!
-Melissa

jennaj · July 27, 2023, 5:07pm

Hi @melissajbetters

The message is stating that an expected file wasn’t available. That could be due to a content problem with the inputs. But, this also looks a bit like a tool bug, and I’d like to eliminate or confirm that.

Would you be able to share your history please? If you want to do that privately, ask and I’ll set up a private message. That said, posting publicly in this topic is much better, so that more people can help out. You can unshare after we are done either way. Sharing your History

melissajbetters · July 27, 2023, 7:57pm

Hello!

I have made my history viewable here: Galaxy | Europe

I am working with paired-end RAD-seq data. Items 2969-2984 are de novo assembly of the reverse reads. I intended to use these to calculate population statistics using ‘populations’ but when I could not get this to work I tried to use the forward reads instead (items 3719-3734). However, as you can see in this history, the forward reads are not working, either. (Note: I’ve purged most of my failed attempts to save space on the Galaxy server).

The most recent error I’ve received is: “Error: Unable to locate any file in input directory ‘stacks_outputs/’” after specifying item 3723 as the input (full output of forward read de novo map)

Thank you for your help!
~Melissa

jennaj · July 28, 2023, 12:09am

Hi @melissajbetters

Ok, thanks. This doesn’t look like a tool problem so far – it looks more like a data content problem.

The tool form has a linkout to the developer’s tutorial. Try following that to sort out which inputs go where. You can share that example if you run into another error when translating the process to Galaxy. Stacks: Stacks Manual

That serves a few different purposes:

Tool bugs are eliminated
Example analysis can be extracted into a workflow
Then, you can run that workflow to process your own data in batch.

You could even help us to translate that tutorial into the GTN if you wanted to later on (that’s how most are created – from dev resources or publications) … but let’s make sure everything is working as expected first.

Right now, it seems that the “wrong” input is being selected on the form, so when the tool looks for specific files, they are not found. I don’t see an obvious alternative to what you are doing now but that doesn’t mean the correct way cannot be discovered!

And, let’s ping an EU admin. Maybe this really is some tool dependency problem but I missed it. @wm75

melissajbetters · July 28, 2023, 4:20pm

Hi jennaj,

Thank you for having a look at the history. I’ve been following the Galaxy tutorial by Le Bras (2023) (RAD-Seq de-novo data analysis) which unfortunately omits what input you load into ‘populations.’ The Stacks manual (e.g. Stacks: populations) is also not clear as to what I should be using as the input.

Specifying ‘Stacks output’ as the input type, the input files that have not worked are:

The full output from denovo_map.pl
“Matches to the catalog…” (all x.matches files)
“Haplotypes/alleles recorded for each locus…” (all x.alleles files)
“Model calls from each locus…” (all x.snps files)
“Assembled loci…” (all .tag files)

The files themselves look perfectly fine. I am at my wits’ end…

*Note: The files above have not worked for either the forward or the reverse reads.

jennaj · July 28, 2023, 4:51pm

Sorry that I missed that tutorial originally!

The population file format is described in this section. It isn’t automatically created, and instead needs to be constructed. Some instructions are in the “Comment: Data formatting”.

Datasets 816 and 1078 look like they contain what you want.

Current format 816

tag	sample
TCCGGAGCGC	guaymas_1
CTAACACGGC	guaymas_2
AGCTTCGATT	guaymas_3

Current format 1078

sample	count
guaymas_1	3
guaymas_2	3
guaymas_3	3

Combine those, and get rid of the underscore contained in the sample name (since it is interpreted by these tools). When demultiplexing, use sample names that are all oneWord without any underscores so things match up. Some sample names even contain dots in your history – get rid of those too, not just for this tool but collections in general work more reliably without dots in the element identifiers/sample names.

The result would look something like this (but with no header, I’m just labeling for clarity). The idea is to match the example format in the tutorial. The actual sample, tag, and counts seem to come from the upstream demultiplexing steps.

sample_tag	count
guaymas1_TCCGGAGCGC	3
guaymas2_CTAACACGGC	3
guaymas3_AGCTTCGATT	3

You can do the manipulations in Galaxy, or on your computer and then upload the result.
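If you go the local route, the combination could be sketched in Python roughly as follows. This is only an illustration under assumptions: the function name `build_sample_tag_counts` is made up for this example, the two inputs are assumed to be tab-separated with the header rows shown above for datasets 816 and 1078, and underscores and dots are simply dropped from the sample names.

```python
import csv
import io


def build_sample_tag_counts(tags_tsv, counts_tsv):
    """Join a tag/sample table with a sample/count table on the sample name.

    Hypothetical helper for illustration; not part of Stacks or Galaxy.
    """
    # Map each sample name to its barcode (columns: tag, sample).
    tags = {
        row["sample"]: row["tag"]
        for row in csv.DictReader(io.StringIO(tags_tsv), delimiter="\t")
    }
    combined = []
    for row in csv.DictReader(io.StringIO(counts_tsv), delimiter="\t"):
        sample = row["sample"]
        # Drop underscores and dots from the sample name itself.
        clean = sample.replace("_", "").replace(".", "")
        combined.append((f"{clean}_{tags[sample]}", int(row["count"])))
    return combined


# The first three rows of datasets 816 and 1078 as shown above.
tags_816 = (
    "tag\tsample\n"
    "TCCGGAGCGC\tguaymas_1\n"
    "CTAACACGGC\tguaymas_2\n"
    "AGCTTCGATT\tguaymas_3\n"
)
counts_1078 = (
    "sample\tcount\n"
    "guaymas_1\t3\n"
    "guaymas_2\t3\n"
    "guaymas_3\t3\n"
)

# Print without a header line, one tab-separated record per sample.
for name, count in build_sample_tag_counts(tags_816, counts_1078):
    print(f"{name}\t{count}")
```

Run on the three example rows, this prints the same three records as the table above, without the header.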

Hope that helps!

melissajbetters · July 28, 2023, 6:05pm

Hi jennaj,

Thank you again for your help! What you are referring to is the population map, rather than the actual data file(s) being processed. My current population map for this process is item 3686 in the history.

When reads are demultiplexed, process_radtags automatically names the samples by their radtag, as is seen in the tutorial linked above (e.g. sample_CCCC). However, from what I understand, this tag is not actually supplying the program with any information other than being able to identify your samples. The Stacks manual states:

“If, in addition to your barcodes, you also supply a sample name in an extra column within the barcodes file, process_radtags will name your output files according to sample name instead of barcode.” (section 4.1)

This is what I did (item 816), which is why my samples are named guaymas_1, guaymas_2, etc. They are appended with .1 or .2 automatically by process_radtags because they were processed as paired-end reads. My population map (item 3686) should be correctly formatted, as it successfully ran for denovo_map.pl.
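For anyone who wants to check this kind of naming locally, here is a minimal Python sketch that compares the sample names in a population map against a list of dataset names after stripping a trailing `.1` or `.2`. It is only a diagnostic aid under assumptions: the population map is taken to be tab-separated with the sample name in the first column, the population label `guaymas` and the helper names are hypothetical, and nothing in this thread confirms that a name mismatch is what makes populations fail.

```python
import re


def popmap_samples(popmap_text):
    """Return the first (sample name) column of a tab-separated population map."""
    samples = []
    for line in popmap_text.splitlines():
        line = line.strip()
        if line:
            samples.append(line.split("\t")[0])
    return samples


def strip_read_suffix(name):
    """Remove a trailing .1 or .2 paired-end suffix, if present."""
    return re.sub(r"\.[12]$", "", name)


def compare_names(popmap_text, element_names):
    """Return (names only in the map, names only in the data), both sorted."""
    in_map = set(popmap_samples(popmap_text))
    in_data = {strip_read_suffix(name) for name in element_names}
    return sorted(in_map - in_data), sorted(in_data - in_map)


# Hypothetical inputs: the population label "guaymas" is an assumption.
popmap = "guaymas_1\tguaymas\nguaymas_2\tguaymas\n"
elements = ["guaymas_1.1", "guaymas_1.2", "guaymas_2.1", "guaymas_3.1"]

only_in_map, only_in_data = compare_names(popmap, elements)
print("In population map but not in data:", only_in_map)
print("In data but not in population map:", only_in_data)
```

An empty first list means every sample in the map has a matching dataset name; anything in the second list is a dataset with no entry in the map.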

It may be that I just cannot process the paired-end data at all. I may need to start a new history where I just process the forward reads, since I cannot figure out how to get populations to run on the current history.

Best,
Melissa
