Hi,
We have a single-end fastq collection with thousands of files and want to keep only the files that have more than 200 reads. Is there a way to do this with existing tools in Galaxy?
Thanks!
Saurabh
Hi @microfuge
There isn’t a single tool that does exactly this, but you could string together several tools into a workflow.
The process will be something like: count the number of lines per file, filter on the line-count values, capture the identifiers of the elements/files that pass the filter, then filter the original collection with those identifiers.
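The same logic can be sketched outside Galaxy as a plain shell loop, just to make the steps concrete. This assumes uncompressed single-end FASTQ files (4 lines per read) sitting in the current directory; the file names and the 200-read threshold are illustrative:

```shell
# For each FASTQ file: count lines, divide by 4 to get reads,
# and print the names of files that pass the threshold.
min_reads=200
for f in *.fastq; do
    lines=$(wc -l < "$f")
    reads=$((lines / 4))
    if [ "$reads" -gt "$min_reads" ]; then
        echo "$f"   # identifier of a file that passes the filter
    fi
done
```

In Galaxy the per-file counting, the threshold filter, and the final subsetting are each handled by a separate tool on the collection rather than a loop.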
The tools you will need are covered in these:
Thanks @jennaj
I used toolshed.g2.bx.psu.edu/repos/iuc/seqkit_stats/seqkit_stats/2.2.0+galaxy0, then “Collapse collection into a single dataset” to obtain a tsv file, kept the required entries with awk, and finally used “Filter collection”.
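For anyone following the same route, the awk step might look roughly like this. It assumes the collapsed tsv has seqkit stats’ usual tabular columns (file, format, type, num_seqs, sum_len, …), so the read count is column 4; the file names `collapsed_stats.tsv` and `passing.tsv` are placeholders:

```shell
# Keep the header row plus rows whose num_seqs (column 4) is over 200;
# column position assumes seqkit stats tabular output.
awk -F '\t' 'NR == 1 || $4 > 200' collapsed_stats.tsv > passing.tsv
# The first column of passing.tsv then holds the element identifiers
# to feed into "Filter collection".
```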