How to perform samtools view tagged.bam | grep -c "miRNAname" >> genome-basedcounts.tsv

Hello, I need to do the following in Galaxy: run the command ‘samtools view tagged.bam | grep -c "miRNAname" >> genome-basedcounts.tsv’, then apply ‘sort -k1 | uniq’ to the counts file to retain only the unique miRNA counts.
Would you please tell me which tool I need and give me some instructions?

Hi @lotus

It sounds like you want to find all of the query miRNA sequences that only had one hit in the BAM file, correct? Unique mappers? You can use the tool Filter BAM for this.

Any query with a mapQ value over 20 is unlikely to have other meaningful hits, but you can use 30, or an even higher value such as 60, if you want more confidence or stringency. Maybe give that a try to see how it works for you?
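To make the mapQ idea concrete, here is a minimal Python sketch of the same kind of filter applied to SAM-formatted text. It is only an illustration, not what Filter BAM does internally. The records and the function names `filter_by_mapq` and `query_names` are made up for the example; the only fact relied on is that MAPQ is the fifth tab-separated column of a SAM alignment line. The sketch keeps records with MAPQ greater than or equal to the threshold, which is also how `samtools view -q` treats its cutoff.

```python
# Hypothetical SAM lines for illustration; fields are tab-separated
# and column 5 (index 4) holds the MAPQ value.
SAM_LINES = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t60\t22M\t*\t0\t0\t*\t*",
    "read2\t0\tchr1\t200\t3\t22M\t*\t0\t0\t*\t*",
    "read3\t16\tchr1\t300\t30\t22M\t*\t0\t0\t*\t*",
]


def filter_by_mapq(sam_lines, min_mapq=20):
    """Keep header lines and alignments whose MAPQ is at least min_mapq."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)  # header lines pass through unchanged
            continue
        fields = line.split("\t")
        if int(fields[4]) >= min_mapq:
            kept.append(line)
    return kept


def query_names(sam_lines):
    """Return the query name (column 1) of every alignment line."""
    return [line.split("\t", 1)[0] for line in sam_lines if not line.startswith("@")]


print(query_names(filter_by_mapq(SAM_LINES, min_mapq=20)))
print(query_names(filter_by_mapq(SAM_LINES, min_mapq=60)))
```

Raising `min_mapq` trades sensitivity for stringency: in this toy data, the stricter threshold drops the read that mapped with MAPQ 30.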

The method explained with samtools would involve converting the BAM to SAM format, then counting up the number of rows in the SAM file (per unique query name), then filtering based on the count values (number of reported lines). You can do it this way too in Galaxy, maybe with a mini-workflow, and compare to the filter above.
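As a rough sketch of that counting approach, the Python below tallies how many SAM lines each query name has and then keeps the names that appear exactly once. The input records and the function names `count_lines_per_query` and `names_with_count` are hypothetical; in Galaxy the same steps would be done with text-manipulation tools rather than a script.

```python
from collections import Counter

# Hypothetical SAM lines; the query name is the first tab-separated column.
SAM_LINES = [
    "@SQ\tSN:chr1\tLN:1000",
    "readA\t0\tchr1\t100\t60\t22M\t*\t0\t0\t*\t*",
    "readB\t0\tchr1\t200\t1\t22M\t*\t0\t0\t*\t*",
    "readB\t256\tchr1\t500\t1\t22M\t*\t0\t0\t*\t*",
    "readC\t16\tchr1\t300\t40\t22M\t*\t0\t0\t*\t*",
]


def count_lines_per_query(sam_lines):
    """Count how many alignment lines are reported for each query name."""
    counts = Counter()
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        counts[line.split("\t", 1)[0]] += 1
    return counts


def names_with_count(counts, wanted=1):
    """Return the sorted query names reported on exactly `wanted` lines."""
    return sorted(name for name, n in counts.items() if n == wanted)


counts = count_lines_per_query(SAM_LINES)
print(dict(counts))
print(names_with_count(counts, wanted=1))
```

Note that a count of one only says the query appears on one line; it does not say whether that line is a real hit.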

Remember that each query will always be present at least once, including lines that represent “no hit” for that query sequence, so you would need to add another filter to distinguish between lines reporting a single valid hit (a single primary hit, usually above some threshold, such as the CIGAR or other) and lines reporting no hit at all. Most of this is captured in the mapQ metric.
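One way to sketch that extra filter in Python is to check the SAM FLAG column: bit 0x4 marks a record as unmapped, so a query whose only line carries that bit had no hit. The records and the function name `classify_queries` are hypothetical, and this simplified version counts every mapped line as a hit; a stricter version might also apply a mapQ cutoff or ignore secondary and supplementary records.

```python
from collections import defaultdict

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"

# Hypothetical SAM lines; column 2 (index 1) is the bitwise FLAG.
SAM_LINES = [
    "@SQ\tSN:chr1\tLN:1000",
    "readA\t0\tchr1\t100\t60\t22M\t*\t0\t0\t*\t*",
    "readB\t0\tchr1\t200\t1\t22M\t*\t0\t0\t*\t*",
    "readB\t256\tchr1\t500\t1\t22M\t*\t0\t0\t*\t*",
    "readC\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]


def classify_queries(sam_lines):
    """Label each query as 'no_hit', 'single_hit', or 'multi_hit'."""
    mapped_counts = defaultdict(int)
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        name, flag = fields[0], int(fields[1])
        mapped_counts[name] += 0  # make sure unmapped queries are still listed
        if not flag & UNMAPPED_FLAG:
            mapped_counts[name] += 1
    labels = {}
    for name, n in mapped_counts.items():
        if n == 0:
            labels[name] = "no_hit"
        elif n == 1:
            labels[name] = "single_hit"
        else:
            labels[name] = "multi_hit"
    return labels


print(classify_queries(SAM_LINES))
```

In this toy data, a plain line count would put the unmapped query in the same bucket as the uniquely mapped one, which is exactly the case the flag check separates.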

This is a nice summary at the Biostars forum → Is there a way to do read filtering (MAPQ> certain value) on a BAM instead of SAM file?. Just keep in mind that different mapping tools may use different “magic numbers” to designate certain mapping conditions. An internet search will tend to find these. Please ask if you get stuck. We’ll need the full name/version of the tool you used in Galaxy to help look it up!

Xref → Hands-on: Data Manipulation Olympics / Data Manipulation Olympics / Introduction to Galaxy Analyses

There is an example of using mapQ to filter hits before variant calling in here → Hands-on: NGS data logistics / NGS data logistics / Introduction to Galaxy Analyses

Hope this helps! :slight_smile:
