Filter reference/subject sequences based on mapping

Hi all,

I’m trying to map some reads against a set of RefSeq sequences (many sequences, not a genome). Is there a tool I can use to find which RefSeq sequences have, for example, at least 100 mapped reads, or to filter them by coverage?

Hi @vebaev

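You could try using NCBI BLAST+ blastn for this. The tabular output has the metrics you want to examine: each hit is one row with the read (query) ID, the subject ID, percent identity, alignment length, and so on.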

Thanks, yes, BLASTN can produce a tabular file, but it has one line per read (hundreds of lines for a single subject sequence). How can I summarize all reads per RefSeq/subject sequence so I can decide which subjects to keep or filter out?

Hi @vebaev

This is a good tutorial that is really more of a cheatsheet for tabular data manipulations. It covers grouping, counting, filtering, etc. → Data Manipulation Olympics
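
If you would rather do the grouping and counting outside of a workflow, here is a minimal Python sketch of the same idea. It assumes the default 12-column BLAST tabular layout, where column 1 is the read (query) ID and column 2 is the subject ID. The function name `subjects_with_min_reads` and the example rows are made up for illustration. Each read is counted once per subject, even if it produces several hits against that subject.

```python
from collections import defaultdict


def subjects_with_min_reads(lines, min_reads=100):
    """Return {subject_id: distinct_read_count} for subjects that have
    at least `min_reads` distinct reads among the tabular BLAST hits.

    Assumes tab-separated rows with the query ID in column 1 and the
    subject ID in column 2 (the default tabular column order).
    """
    reads_per_subject = defaultdict(set)
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines (present in commented tabular output).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query_id, subject_id = fields[0], fields[1]
        # A set keeps each read only once per subject.
        reads_per_subject[subject_id].add(query_id)
    return {
        subject: len(reads)
        for subject, reads in reads_per_subject.items()
        if len(reads) >= min_reads
    }


# Made-up example rows: read2 hits refA twice but is counted once.
example = [
    "read1\trefA\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-50\t180",
    "read2\trefA\t98.0\t100\t2\t0\t1\t100\t201\t300\t1e-48\t175",
    "read2\trefA\t95.0\t60\t3\t0\t1\t60\t501\t560\t1e-20\t90",
    "read3\trefB\t97.0\t100\t3\t0\t1\t100\t1\t100\t1e-45\t170",
]

print(subjects_with_min_reads(example, min_reads=2))  # {'refA': 2}
```

For a real file you could pass the open file handle instead of the list, for example `subjects_with_min_reads(open("hits.tsv"), min_reads=100)`, where `hits.tsv` is a placeholder name. Filtering by coverage would need the alignment coordinates and the subject lengths as well, which this sketch does not attempt.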