Error loading population file in Stacks: Reference map - Reference-based RAD-seq Tutorial

Hello everyone,

I’m following the RAD-seq tutorial and am having trouble with the “Stacks: Reference Map” section of the population file.

My main problem:
I don’t quite understand how the population assignment works for each sample in the Population_map.txt file. In the tutorial, some samples have a value of “1” and others “2,” but I’m not sure about the following:

How do I know which population to assign to each sample?
What criteria should be used for this assignment?

The error I’m getting:
When I try to run “Stacks: Reference Map” with my Population_map.txt file and the BAM data collection

Any guidance would be very helpful! Thanks in advance

Hi @DIANA_ALEJANDRA_ALVA

Hopefully we can help!

For the population file: do you mean the file created at this step? → Hands-on: RAD-Seq Reference-based data analysis / RAD-Seq Reference-based data analysis / Ecology

I see that the link in the tutorial is not working, and I’ve opened a pull request to fix the tutorial → ref-based-rad-seq tutorial.md by jennaj · Pull Request #6435 · galaxyproject/training-material · GitHub

But that same file can be found here. → Stacks: ref_map.pl

From the example in the guide, you can see that the different populations (column 2) can be named in other ways. The important part is that the term is all oneWord1 with no spaces or special characters. Using numbers was just one easy way to do that in the tutorial (instead of all these words I’m using here :slight_smile: ). Using the full scientific name would be the first impulse for scientists, but that would break one of the “rules”: no spaces allowed! And, a full name might contain dots or other characters, which are also not allowed. Computers like whole, plain, simple terms for important data values so very simple is better and numbers are very simple!

The tool form also has another example under one of the input areas for this kind of data: POP1, POP2, ... POPN
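If it helps, here is a minimal Python sketch for checking a population file before uploading it. It assumes the file is two tab-separated columns (sample name, then population) and treats "simple" as letters, digits, and underscores only. That pattern is just my reading of the rule of thumb above, not an official Stacks check, and the sample names are made up.

```python
import re

# Assumption: a "simple" label is letters, digits, and underscores only.
# This mirrors the rule of thumb in this reply, not an official Stacks rule.
SIMPLE_LABEL = re.compile(r"[A-Za-z0-9_]+")


def check_population_map(text):
    """Return a list of human-readable problems found in a population map."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append(
                f"line {number}: expected 2 tab-separated columns, found {len(fields)}"
            )
            continue
        sample, population = fields
        if not SIMPLE_LABEL.fullmatch(sample):
            problems.append(f"line {number}: sample name {sample!r} is not a simple label")
        if not SIMPLE_LABEL.fullmatch(population):
            problems.append(f"line {number}: population {population!r} is not a simple label")
    return problems


# Hypothetical sample names, two populations labelled 1 and 2.
example = "sample_A\t1\nsample_B\t1\nsample_C\t2\n"
print(check_population_map(example))  # prints []
```

An empty list means every line has two columns and both values are plain terms. A line such as a full scientific name with a space would be reported.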

Then, for your question about the error:

I can’t see the error you got, but in general the population file is tied to the mapped BAMs through the sample names: the “sampleName” for each BAM file should match the first column of the file. Again, simple is better here. Notice in the tutorial example how an underscore was used to fill in what was likely a space originally. Underscores are the one special character that most tools can process in key relational terms.

Your BAM files are in a collection, correct? If not yet, try putting them into a list collection.

From here you can check to see what the current sample labels are in your new collection. If these values do not match what is in your population file yet, you can manipulate the sample label values (“element identifiers”) in a mapping file and replace them. Next time, you can set up your sample labels with the original collection of reads, then process through the downstream QA and mapping steps, and the data will keep those sample labels throughout.
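To make the comparison concrete, here is a small Python sketch that lists which sample labels appear in only one of the two places. It assumes you have the population file contents as text and the collection's element identifiers as a plain list (for example, copied from the output of Extract element identifiers). The function name and sample values are hypothetical.

```python
def compare_labels(population_map_text, identifiers):
    """Return (only_in_map, only_in_collection) as two sorted lists."""
    # First column of each non-blank line in the population map.
    map_samples = {
        line.split("\t")[0].strip()
        for line in population_map_text.splitlines()
        if line.strip()
    }
    # Element identifiers from the collection, ignoring empty entries.
    collection = {name.strip() for name in identifiers if name.strip()}
    return sorted(map_samples - collection), sorted(collection - map_samples)


# Hypothetical values: one label matches, one does not.
population_map = "sample_A\t1\nsample_B\t2\n"
element_identifiers = ["sample_A", "sample B.bam"]

only_in_map, only_in_collection = compare_labels(population_map, element_identifiers)
print("In population file only:", only_in_map)
print("In collection only:", only_in_collection)
```

If both lists come back empty, the labels already match and no relabeling is needed.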

The steps could look like:

  1. Create a list collection → FAQ: Creating a dataset collection
  2. Get current labels → Collection Operations → Extract element identifiers
  3. Create a new label next to the old label in a two column file → Hands-on: Data Manipulation Olympics / Data Manipulation Olympics / Introduction to Galaxy Analyses
  4. Update the label → Collection Operations → Relabel identifiers
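
Step 3 can also be done outside of Galaxy. Below is a rough Python sketch that builds the two-column (old label, new label) table as tab-separated text. The cleanup rule (drop a trailing ".bam", turn any other run of special characters into one underscore) is only an assumption about what your current labels look like, so adjust it to your data and make sure the new labels are the ones used in your population file.

```python
import re


def simplify(label):
    """Hypothetical cleanup: drop a trailing .bam, replace other characters with '_'."""
    stem = re.sub(r"\.bam$", "", label.strip(), flags=re.IGNORECASE)
    return re.sub(r"[^A-Za-z0-9]+", "_", stem).strip("_")


def build_relabel_table(old_labels):
    """Return tab-separated 'old<TAB>new' lines, one per label."""
    new_labels = [simplify(old) for old in old_labels]
    # Two different old labels must not collapse into the same new label.
    if len(set(new_labels)) != len(new_labels):
        raise ValueError("cleanup produced duplicate labels; rename those by hand")
    rows = [f"{old}\t{new}" for old, new in zip(old_labels, new_labels)]
    return "\n".join(rows) + "\n"


# Hypothetical current element identifiers.
print(build_relabel_table(["Sample A.bam", "sample_B"]), end="")
```

The resulting text can be saved as a tabular file and uploaded for the relabel step.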

Please give that a try and let us know if you need more help. Seeing the actual error helps to offer more specific help. This is how to share your work for feedback. → How to get faster help with your question :scientist: