I’m trying to use Text reformatting with awk (version 1.1.2) to keep aligned DNA fragments within a certain size range. I want to keep all fragments between 1 and 120 bp. I ran BAM to SAM on the aligned files and then Text reformatting with awk with the following program:
'$9 <= 120 && $9 >= 1 || $9 >= -120 && $9 <= -1'
However, only fragments between 100 and 120 bp are retained.
I know I have fragments smaller than 100 bp because I can see them in the SAM file and in other size-distribution analyses. I tried removing the second half of the expression (the negative range) and still got the same result. Am I missing something? Any help would be appreciated.
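One way to reproduce this exact symptom (values from 100 to 120 pass, smaller values fail) is awk falling back to string comparison instead of numeric comparison. This is only an assumption; the thread never says what the syntax problem turned out to be. The sketch below uses standard awk to show the difference: when a comparison is done on strings, "99" sorts after "120" because the character 9 comes after 1, while "110" still sorts before "120".

```shell
#!/bin/sh
# Sketch: how awk evaluates "99 <= 120" in three situations.

# 1) A field that looks like a number is compared numerically: 99 <= 120 is true.
field_cmp=$(printf '99\n' | awk '{ print ($1 <= 120) }')

# 2) A quoted string constant forces a string comparison: "99" vs "120"
#    is decided at the first character, and "9" > "1", so the test is false.
string_cmp=$(awk 'BEGIN { print ("99" <= 120) }')

# 3) Adding 0 converts the value to a number, restoring numeric comparison.
forced_cmp=$(awk 'BEGIN { print ("99" + 0 <= 120) }')

echo "field: $field_cmp  string: $string_cmp  forced numeric: $forced_cmp"
```

If string comparison is the cause, writing the test as $9 + 0 instead of $9 is a common way to force awk to treat the column as a number.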
Maybe consider using Sample, Slice or Filter BAM on flags, fields, and tags using Sambamba with the filter expression template_length > -120 and template_length < 120. It works directly on BAM files, so there is no need for a BAM to SAM conversion.
As for the awk: maybe try something like $9 > -120 && $9 < 120 or ($9 > -120) && ($9 < 120).
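For anyone running the same filter outside Galaxy, here is a minimal sketch of the original 1 to 120 bp range as a standalone awk command. It assumes tab-separated SAM input; the file names example.sam and filtered.sam and the reads in them are made up for illustration. Adding 0 to column 9 makes the comparison explicitly numeric, the parentheses make the && / || grouping explicit, and lines starting with @ are passed through so the output stays a valid SAM file.

```shell
#!/bin/sh
# Hypothetical SAM excerpt: one header line, a read pair with TLEN 85 / -85,
# and a read with TLEN 300 that should be filtered out.
printf '@HD\tVN:1.6\n' > example.sam
printf 'r1\t99\tchr1\t100\t60\t4M\t=\t181\t85\tACGT\tIIII\n' >> example.sam
printf 'r1\t147\tchr1\t181\t60\t4M\t=\t100\t-85\tACGT\tIIII\n' >> example.sam
printf 'r2\t99\tchr1\t500\t60\t4M\t=\t796\t300\tACGT\tIIII\n' >> example.sam

# Keep header lines, then keep reads whose template length (column 9)
# is between 1 and 120 or between -120 and -1.
awk -F'\t' '
  /^@/ { print; next }   # pass header lines through unchanged
  { tlen = $9 + 0 }      # force a numeric value for column 9
  (tlen >= 1 && tlen <= 120) || (tlen >= -120 && tlen <= -1)
' example.sam > filtered.sam

cat filtered.sam
```

Here filtered.sam keeps the header and both r1 records and drops r2. In the Galaxy tool itself, presumably only the program text between the single quotes would go into the program field.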
I am somewhat curious how you got mapped fragments of size 1.
Thanks for the reply, Igor.
For the awk program, neither of the examples you gave me worked either; I’m guessing there is something about the syntax that I’m missing. However, Sample, Slice or Filter BAM worked perfectly, so thanks for that! I would not have found that tool on my own.
As for the size range, nothing mapped below ~20 bp. I wanted all fragments below 120 bp, and the lower bound of 1 was just arbitrary.
Again, thanks for your help!
Here is my history with the awk filtering if you want to check the syntax:
Look at the top three datasets, which were filtered for +/-400.
Hope this helps.
Thanks for that! Looking at your work, I was able to fix my awk syntax issue and got the filter working.