total bases in all sequences

I am a Windows user working with FASTA files. I want to get the correct total number of bases across all sequences.


What do you mean by "correct number of bases"? Do you mean counting the bases, or do you want to remove bases?

I mean counting.
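
For anyone who would rather count with a script than a Galaxy tool, here is a minimal Python sketch, assuming a plain-text FASTA file where each record starts with a ">" title line. The function names `sequence_lengths` and `total_bases` and the toy records are made up for illustration; they do not come from any tool mentioned in this thread.

```python
from typing import Dict, Iterable


def sequence_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA identifier to the number of bases in its sequence."""
    lengths: Dict[str, int] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # The identifier is the first "word" after ">".
            words = line[1:].split()
            current = words[0] if words else ""
            lengths[current] = 0
        elif current is not None:
            # Sequence may be wrapped over several lines; add them all up.
            lengths[current] += len(line)
    return lengths


def total_bases(lines: Iterable[str]) -> int:
    """Sum the lengths of all sequences."""
    return sum(sequence_lengths(lines).values())


# Toy input (hypothetical records, not real data).
example = [
    ">seq1 some description",
    "ACGTAC",
    "GT",
    ">seq2",
    "NNAC",
]
print(sequence_lengths(example))
print(total_bases(example))
```

To run it on a real file, pass an open file handle instead of the list, for example `total_bases(open("sequences.fasta"))`, where the file name is a placeholder. Note that every character on a sequence line is counted, including N and any other ambiguity codes.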


You can use this tool, for example:



I used: [] to compute the lengths of the sequences. The output file looks like this:

c3386_g1_i2 len=1028 path=[159:0-115 79:116-1027] 984
c3389_g1_i1 len=1737 path=[1:0-1736] 1650
c3389_g2_i1 len=280 path=[3686:0-279] 268
c3391_g1_i1 len=473 path=[1:0-160 162:161-234 518:235-398 162:399-472] 455
c3391_g1_i2 len=270 path=[1:0-160 162:161-234 236:235-269] 258
c3392_g1_i1 len=397 path=[349:0-248 224:249-396] 385
c3393_g1_i1 len=223 path=[201:0-159 361:160-179 381:180-222] 209
c3397_g1_i1 len=617 path=[595:0-616] 581

Now I want to correct the counts in the FASTA file.



For most use cases, description line content in FASTA datasets will cause problems with tools and should be removed. Only the identifier is used (the first “word” in the “>” title line).
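
As a rough illustration of removing the description content outside Galaxy, the Python sketch below keeps only the first word of each ">" title line. It assumes the title lines look like `>c3389_g2_i1 len=280 path=[...]`, which is a guess based on the table posted above; the function name `strip_descriptions` and the toy sequence are hypothetical.

```python
def strip_descriptions(lines):
    """Yield FASTA lines with each '>' title line reduced to its identifier."""
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            # Keep only the first "word" after ">"; drop everything else.
            words = line[1:].split()
            yield ">" + (words[0] if words else "")
        else:
            # Sequence lines pass through unchanged.
            yield line


# Toy record: the header format is assumed, the sequence is made up.
example = [
    ">c3389_g2_i1 len=280 path=[3686:0-279]",
    "ACGTACGT",
]
cleaned = list(strip_descriptions(example))
print(cleaned)
```

The sequence lines are left untouched, so the base counts are the same before and after.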


If for some reason you really do need the lengths in the FASTA headers, the “len=NNN” portion of the title line could be recreated from this data, but not the “path=[coordinates]” portion.

Use tools from the GENERAL TEXT TOOLS groups. The manipulation would probably involve a workflow such as: Fasta-to-Tabular > Add column to an existing dataset > Merge Columns together > Tabular-to-Fasta. Or, if you are able to construct substitution expressions, use the tool Text transformation with sed.
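
For comparison, the same idea can be sketched in Python outside Galaxy: read each record, count its bases, and write the title line back as the identifier plus a recomputed “len=N”. This is only an approximation of what the workflow above would produce, not the Galaxy tools themselves. The function name `relabel_with_length` and the toy records are hypothetical, and the “path=[...]” part is dropped because it cannot be recreated from the lengths.

```python
def relabel_with_length(lines):
    """Return FASTA lines whose title lines read '>identifier len=N'."""
    records = []  # each entry: [identifier, list of sequence lines]
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            words = line[1:].split()
            records.append([words[0] if words else "", []])
        elif records:
            records[-1][1].append(line)

    out = []
    for identifier, seq_lines in records:
        # Recount the bases from the sequence itself.
        n_bases = sum(len(s) for s in seq_lines)
        out.append(f">{identifier} len={n_bases}")
        out.extend(seq_lines)
    return out


# Toy records (made up) whose stated lengths are deliberately wrong.
example = [
    ">toy_1 len=99 path=[1:0-98]",
    "ACGTAC",
    "GTAA",
    ">toy_2 len=5",
    "ACG",
]
relabelled = relabel_with_length(example)
print("\n".join(relabelled))
```

Keeping anything after the identifier brings back the description-line problem mentioned earlier, so this is only worth doing if a downstream step really needs the length in the header.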