Estimate repeat sequence length from an annotated genome (.gff3) - need help

Hi, I am trying to get an estimate of the repeat sequences from an annotated genome. This has been run through RepeatMasker, so I filtered and pulled out repeatmasker annotations.

There are two feature types in the filtered output, match and match_part (please see screenshot below). From there, I am not quite sure how to get the total length of the repeat-masked sequence.

Has anyone done this before, or does anyone have any suggestions?

Many thanks

Hi @kdcs

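You have the coordinates of each region, and those give you the length of each. GFF3 coordinates are 1-based and inclusive, so length = end - start + 1. Filter on match or match_part, not both, to avoid double counting (a match_part is normally a sub-segment of its parent match). You can then filter or group by contig name for further summaries.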
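
If you would rather script it, here is a minimal Python sketch of that idea. It assumes a standard 9-column, tab-separated GFF3 that has already been filtered down to the RepeatMasker rows. The function name, the contig names and the example rows are all made up for illustration, so swap in your own file. It also merges overlapping match intervals on the same contig so that a base covered by two annotations is counted once, which goes a step beyond simply summing the lengths.

```python
from collections import defaultdict


def repeat_length_by_contig(gff_lines, feature_type="match"):
    """Return {contig: bases covered} for rows of one feature type.

    Hypothetical helper: counts only `feature_type` rows (default "match")
    so that match and match_part rows are not both added up.
    """
    intervals = defaultdict(list)
    for line in gff_lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/directive lines such as ##gff-version.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        # Columns 4 and 5 are start and end (1-based, inclusive).
        intervals[cols[0]].append((int(cols[3]), int(cols[4])))

    totals = {}
    for seqid, spans in intervals.items():
        spans.sort()
        covered = 0
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlaps the current block: extend it instead of recounting.
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        covered += cur_end - cur_start + 1
        totals[seqid] = covered
    return totals


# Made-up example rows; with a real file use: open("your_file.gff3")
rows = [
    ("ctg1", "repeatmasker", "match", "1", "100", ".", "+", ".", "ID=r1"),
    ("ctg1", "repeatmasker", "match_part", "1", "100", ".", "+", ".", "ID=r1p;Parent=r1"),
    ("ctg1", "repeatmasker", "match", "51", "150", ".", "+", ".", "ID=r2"),
    ("ctg2", "repeatmasker", "match", "10", "19", ".", "-", ".", "ID=r3"),
]
example = ["##gff-version 3"] + ["\t".join(r) for r in rows]

totals = repeat_length_by_contig(example)
print(totals)                # {'ctg1': 150, 'ctg2': 10}
print(sum(totals.values()))  # 160
```

If you want the plain sum of annotation lengths instead (overlaps counted twice), drop the merging loop and add up end - start + 1 for every match row.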