I’m having trouble analysing my sequencing results with LEfSe. I’ve been trying to work out what I’m doing wrong, but I keep hitting roadblocks, whether I use Galaxy, Conda, Python or R.
I think (hope) it’s down to my actual data structure.
Mostly a guess, but the labels don’t seem to match between your files. The first file uses “High_Protein” where the second uses “High Protein”. Likewise you have “KD10__” versus “KD10”, “Sample” versus “NAME”, and “Condition” versus “Protein_group”.
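Before renaming anything, it can help to list exactly which labels fail to match. Below is a minimal Python sketch. It assumes you have already pulled the label values out of each file into plain lists. The helper name `report_mismatches` and the example lists are hypothetical and are modelled on the labels above.

```python
def report_mismatches(labels_a, labels_b):
    """Return (labels only in A, labels only in B), each sorted."""
    set_a, set_b = set(labels_a), set(labels_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)


# Hypothetical label lists, modelled on the examples in the post
file1 = ["High_Protein", "KD10__"]
file2 = ["High Protein", "KD10"]

only_in_1, only_in_2 = report_mismatches(file1, file2)
print("Only in file 1:", only_in_1)
print("Only in file 2:", only_in_2)
```

An empty result for both lists means the labels line up exactly. Anything that shows up here is a label a merging tool will treat as different.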
Tools that merge data from several files need the labels to match exactly. R tools also don’t like values that contain spaces or unusual characters, or that start with a number. So keep every label as a single word (OneWord) that doesn’t start with a number. If you need a compound name, join the parts with an underscore (One_Word) and don’t use any other separator.
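Those rules can be applied mechanically. Here is a rough Python sketch of a label cleaner. The function name `clean_label` is hypothetical. Prefixing an "X" to labels that start with a digit is an assumed convention, borrowed from the prefix R’s `make.names()` adds. Note that `make.names()` itself uses dots rather than underscores for other characters.

```python
import re


def clean_label(label: str) -> str:
    """Turn a free-text label into an R-friendly One_Word style name (sketch)."""
    # Trim surrounding whitespace
    label = label.strip()
    # Replace each run of characters other than letters, digits or "_" with one "_"
    label = re.sub(r"[^A-Za-z0-9_]+", "_", label)
    # Collapse repeated underscores, then drop leading/trailing ones
    label = re.sub(r"_+", "_", label).strip("_")
    # Assumed convention: R names can't start with a digit, so prefix an "X"
    if label and label[0].isdigit():
        label = "X" + label
    return label


print([clean_label(x) for x in ["High Protein", "KD10__", "10mg dose"]])
```

With this, “High Protein” and “High_Protein” both become `High_Protein`, and “KD10__” becomes `KD10`. Labels that only differed in spacing or trailing underscores then match. Run the same function over both files so they are cleaned the same way.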
The Galaxy versions of these tools include an extra step that tries to clean up naming like this. It can’t catch everything, though, especially values that have to match across different files. If you still run into trouble, simplify the names yourself.
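For column names that mean the same thing but are spelled differently (“Sample” vs “NAME”, “Condition” vs “Protein_group”), one approach is to pick a single name for each and rewrite the header of every file with the same mapping. This is a sketch using Python’s standard `csv` module. It assumes a tab-separated file with the names in the first row. The mapping, the function name `harmonise_header` and the sample data are all hypothetical. If your table is laid out the other way round, with the metadata names in the first column, apply the map to that column instead.

```python
import csv
import io

# Hypothetical choice: map each variant name onto the one you want to keep
HEADER_MAP = {"NAME": "Sample", "Protein_group": "Condition"}


def harmonise_header(tsv_text: str, header_map: dict) -> str:
    """Rename header cells found in header_map; leave everything else untouched."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if rows:
        rows[0] = [header_map.get(col, col) for col in rows[0]]
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical contents of the second file
second_file = "NAME\tProtein_group\nKD10\tHigh_Protein\n"
print(harmonise_header(second_file, HEADER_MAP), end="")
```

Keeping the mapping in one place means every file gets renamed consistently, instead of each one being fixed by hand.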