Hi everyone,
I am running a UMI-tools extract command that works on HPC:
(example of nextflow command)
umi_tools extract --extract-method=regex
--bc-pattern=".+(?P<discard_1>AACTGTAGGCACCATCAAT){s<=2}(?P<umi_1>.{12})$"
-I ${sample_name}_trimmed.fastq
-S ${sample_name}_umi_cleaned.fastq > ${sample_name}_umi_tools.log
but it does not run properly on Galaxy.
I need to specify the following regex so that UMI-tools detects the UMI at the end of the read:
.+(?P<discard_1>AACTGTAGGCACCATCAAT){s<=2}(?P<umi_1>.{12})$
However, when I run it on GA, the + and the $ seem to be dropped:
In the log, the pattern comes up as: .(?P<discard_1>AACTGTAGGCACCATCAAT){s<=2}(?P<umi_1>.{12})
Although UMI-tools runs, it fails to correctly detect the adapter and UMI at the end of the read: I end up with a few hundred reads that are 1-2 bp long instead of several million reads with their UMI properly extracted.
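To illustrate what I think the dropped characters change, here is a minimal sketch in plain Python. It makes several assumptions: it uses the standard-library `re` module, so the fuzzy `{s<=2}` term is left out (`re` cannot parse it; UMI-tools relies on the third-party `regex` module for that); it assumes the pattern is applied anchored at the start of the read, as Python's `match` does; and the read sequences and the `extract` helper are made up for the example, not taken from UMI-tools.

```python
import re

ADAPTER = "AACTGTAGGCACCATCAAT"

# Intended pattern, minus the fuzzy {s<=2} term (not supported by stdlib re).
intended = re.compile(r".+(?P<discard_1>" + ADAPTER + r")(?P<umi_1>.{12})$")
# Pattern as it shows up in the Galaxy log: the "+" and the "$" are gone.
stripped = re.compile(r".(?P<discard_1>" + ADAPTER + r")(?P<umi_1>.{12})")


def extract(pattern, read):
    """Hypothetical helper: return (kept_sequence, umi), or None if no match.

    The discard and UMI groups are removed from the read and whatever
    surrounds them is kept.
    """
    m = pattern.match(read)  # anchored at the start of the read
    if m is None:
        return None
    kept = read[:m.start("discard_1")] + read[m.end("umi_1"):]
    return kept, m.group("umi_1")


# Made-up example reads (assumptions, not real data).
insert = "TGAGGTAGTAGGTTGTATAGTT"   # 22-nt insert
umi = "ACGTACGTACGT"                # 12-nt UMI
read = insert + ADAPTER + umi       # typical read: insert, adapter, UMI
short = "A" + ADAPTER + umi + "G"   # adapter starts at the second base

print(extract(intended, read))   # insert kept, UMI recovered
print(extract(stripped, read))   # None: adapter is not at position 2
print(extract(stripped, short))  # ('AG', 'ACGTACGTACGT'): a 2 bp read
```

Under these assumptions, the stripped pattern only matches reads where the adapter starts at the second base, which would be consistent with getting a few hundred reads of 1-2 bp.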
I also tried to use backslashes: .\+(?P<discard_1>AACTGTAGGCACCATCAAT){s<=2}(?P<umi_1>.{12})\$
and an asterisk:
`.*(?P<discard_1>AACTGTAGGCACCATCAAT){s<=2}(?P<umi_1>.{12})`
But both failed.
Thanks for your help