Hello, my problem is as follows: from a collection containing several samples assembled with MEGAHIT, I want to retrieve the name of each sample (the element identifier) and add it in front of all the contig IDs in my fasta files, while maintaining the correspondence with the original sample.
Example: in my collection I have these 2 samples, sample_1 and sample_2. I want to rename >k141_0 flag=1 multi=3.0000 len=313 from the assembly of sample 1 to >sample_1_k141_0 flag=1 multi=3.0000 len=313, and >k141_4921 flag=1 multi=5.0000 len=329 from the assembly of sample 2 to >sample_2_k141_4921 flag=1 multi=5.0000 len=329, and so on for the rest of the fasta files.
I already tried something with the Extract element identifiers tool, but I can’t get it to work for the moment.
This should be possible, partly with existing tools and partly with some custom manipulations, all within your Galaxy history. Since this involves the metadata first and then the actual data, there isn’t one dedicated tool, but that is OK!
Would you be able to explain a bit more about this part so far? That is the place to start.
Are you having trouble getting this output? If the issue is the datatype, you can convert the text file to a tabular format, to allow the other manipulation tools to “see” it.
The downstream manipulation will involve a series of Text Manipulation tools, and it can be run on the entire collection in a batch. The easiest would be to convert your fasta data to tabular, then add in the element identifiers, merge the first two columns with an underscore, and finally convert back to fasta.
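To make the logic of those steps concrete, here is a minimal Python sketch of the same idea outside Galaxy. It is only an illustration, not what the Galaxy tools run internally: the function names fasta_to_rows and prefix_contigs are made up for this example, and it assumes the header splits at the first space into identifier and description.

```python
def fasta_to_rows(fasta_text):
    """Parse FASTA text into [identifier, description, sequence] rows."""
    rows = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Split the title line at the first space: ID, then description.
            identifier, _, description = line[1:].partition(" ")
            rows.append([identifier, description, ""])
        elif rows:
            # Sequence lines are appended to the most recent record.
            rows[-1][2] += line
    return rows


def prefix_contigs(fasta_text, sample_name):
    """Return FASTA text with sample_name and an underscore prepended to each ID."""
    out = []
    for identifier, description, sequence in fasta_to_rows(fasta_text):
        header = f">{sample_name}_{identifier}"
        if description:
            header += f" {description}"
        out.append(header)
        out.append(sequence)
    return "\n".join(out) + "\n"


# Toy input: the header comes from the question, the sequence is invented.
example = ">k141_0 flag=1 multi=3.0000 len=313\nACGT\nTTGA\n"
renamed = prefix_contigs(example, "sample_1")
print(renamed)
```

Note that this sketch writes each sequence on a single line, which is also what a round trip through a tabular format tends to produce.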
Thank you for your quick reply. I had something like that in mind, but it’s the part about adding the element identifier that’s causing me trouble. I did not specify this, but I would like to do it automatically, in a workflow for example:
My problem with the Extract element identifiers tool is that it converts the collection into a single text file containing the names of all the samples. I’m stuck here because I don’t understand how to find the correspondence so that the correct sample name in the text file goes to the correct fasta file corresponding to that sample in the collection, automatically in a workflow.
I agree about the element identifier method. What about this instead?
FASTA-to-tabular (split the title to 2 columns)
Add input name as column to an existing tabular file (toggle to prepend)
Tabular-to-FASTA converts tabular file to FASTA format (select both column 1 and column 2; these will be joined with an underscore)
The description content is lost from the new title line, but it would most likely be lost with the downstream analysis tools anyway. You can always look up the original identifier and its full description content in the tabular file.
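If it helps to see what this route does to the data, below is a rough Python sketch that mimics it for a whole collection. It is an assumption-laden illustration rather than the tools' actual code: the collection is modelled as a plain dict of element identifier to FASTA text, rename_collection is a hypothetical name, and the sequences are invented. Each file is handled together with its own identifier, so the correspondence with the sample is never lost, and the dropped descriptions are kept in a separate lookup list.

```python
def rename_collection(collection):
    """collection: dict mapping element identifier -> FASTA text.

    Returns (renamed, lookup): renamed has the same keys with prefixed,
    description-free headers; lookup holds one tuple per contig:
    (new identifier, original identifier, description).
    """
    renamed = {}
    lookup = []
    for sample_name, fasta_text in collection.items():
        records = []
        for line in fasta_text.splitlines():
            if line.startswith(">"):
                identifier, _, description = line[1:].partition(" ")
                new_id = f"{sample_name}_{identifier}"
                lookup.append((new_id, identifier, description))
                records.append(f">{new_id}")  # description is dropped here
            elif line.strip():
                records.append(line.strip())
        renamed[sample_name] = "\n".join(records) + "\n"
    return renamed, lookup


# Toy collection: headers from the question, sequences invented.
collection = {
    "sample_1": ">k141_0 flag=1 multi=3.0000 len=313\nACGT\n",
    "sample_2": ">k141_4921 flag=1 multi=5.0000 len=329\nGGCC\n",
}
renamed, lookup = rename_collection(collection)
for row in lookup:
    print("\t".join(row))
```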