Hi,
I am a Windows user working with FASTA files. I want to correct the total number of bases in all sequences.
Thx
What do you mean by "correct number of bases"? Do you mean counting them, or do you want to remove bases?
I mean counting.
Hi,
You can use this tool, for example: https://usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/devteam/fasta_compute_length/fasta_compute_length/1.0.1
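If you would rather count locally (Python also runs on Windows), the same idea can be sketched in a few lines. This is only an illustrative sketch, not what the Galaxy tool does internally: the function name `fasta_lengths` and the tiny in-memory example are made up for the illustration, and it assumes a plain, uncompressed FASTA file.

```python
import io

def fasta_lengths(handle):
    """Yield (identifier, length) for each record in a FASTA stream."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # The identifier is the first "word" of the ">" title line.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            length = 0
        elif name is not None:
            # Sequence may be wrapped over several lines; add them up.
            length += len(line)
    if name is not None:
        yield name, length

# Hypothetical two-record example; replace with open("your_file.fasta").
example = io.StringIO(">seq1 len=5\nACGTA\nCC\n>seq2\nGG\n")
lengths = list(fasta_lengths(example))
total = sum(n for _, n in lengths)
print(lengths, total)
```

Here `lengths` holds one length per sequence and `total` is the number of bases over all sequences.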
Cheers,
Bjoern
Hi
I used https://usegalaxy.eu/root?tool_id=toolshed.g2.bx.psu.edu/repos/devteam/fasta_compute_length/fasta_compute_length to compute the lengths of the sequences. It generated a file like this:
Title line | Computed length |
---|---|
c3386_g1_i2 len=1028 path=[159:0-115 79:116-1027] | 984 |
c3389_g1_i1 len=1737 path=[1:0-1736] | 1650 |
c3389_g2_i1 len=280 path=[3686:0-279] | 268 |
c3391_g1_i1 len=473 path=[1:0-160 162:161-234 518:235-398 162:399-472] | 455 |
c3391_g1_i2 len=270 path=[1:0-160 162:161-234 236:235-269] | 258 |
c3392_g1_i1 len=397 path=[349:0-248 224:249-396] | 385 |
c3393_g1_i1 len=223 path=[201:0-159 361:160-179 381:180-222] | 209 |
c3397_g1_i1 len=617 path=[595:0-616] | 581 |
Now I want to correct the length value in the FASTA file so that it matches the computed count.
For most use cases, description line content in FASTA datasets will cause problems with tools and should be removed. Only the identifier is used (the first “word” in the “>” title line).
FAQ: https://galaxyproject.org/support/
If for some reason you really do need the lengths in the FASTA headers, the “len=NNN” portion of the title line could be recreated from this data, but not the “path=[coordinates]” portion.
Use tools from the GENERAL TEXT TOOLS groups. The manipulation would probably involve a workflow such as: Fasta-to-Tabular > Add column to an existing dataset > Merge Columns together > Tabular-to-Fasta. Or, if you are able to construct substitution expressions, use the tool Text transformation with sed.
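Outside Galaxy, the same substitution could be sketched in Python. This is a rough sketch under assumptions, not a tested pipeline: the function name `fix_len_fields` and the example record are invented for illustration, it reads the whole file into memory, and it only rewrites the “len=NNN” field. The “path=[...]” text is left untouched, so it may no longer agree with the sequence.

```python
import io
import re

# Matches the "len=NNN" field in a title line (illustrative pattern).
LEN_FIELD = re.compile(r"\blen=\d+")

def fix_len_fields(handle):
    """Return FASTA text with each 'len=NNN' set to the real sequence length."""
    records = []  # each entry: [title line, list of sequence lines]
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            records.append([line, []])
        elif records and line.strip():
            records[-1][1].append(line.strip())
    out = []
    for title, seq_lines in records:
        n = sum(len(s) for s in seq_lines)
        # Replace only the first len= field; titles without one are unchanged.
        out.append(LEN_FIELD.sub(f"len={n}", title, count=1))
        out.extend(seq_lines)
    return "\n".join(out) + "\n"

# Hypothetical record whose header claims 10 bases but holds 6.
example = io.StringIO(">c1_g1_i1 len=10 path=[1:0-9]\nACGT\nAC\n")
fixed = fix_len_fields(example)
print(fixed)
```

To use it on a real file, pass `open("input.fasta")` instead of the in-memory example and write `fixed` to a new file (both file names are placeholders).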
Thanks!