Can I use BWA-MEM on concatenated files?

Hi everyone! I'm still learning to work in Galaxy.
I have 4 R1 files and 4 R2 files from 1 patient.
I need to join all the R1 files and all the R2 files into 2 separate files, so I used the Concatenate multiple datasets tool. After that I tried to map with BWA-MEM, but it gives me an error. (I also tried other concatenate tools, but they don't work either.)
I also want to use the "Set read groups information" option (Picard style) for handling duplicates (I think I need LB), but I really don't understand what to enter in the empty ID and LB fields.
I read the BWA-MEM2 help, but I still get this error…
My history https://usegalaxy.org/u/sofi_23/h/panel-12392

What am I doing wrong? Can I mark duplicates in BWA-MEM, or do I need the MarkDuplicates tool?
I’d appreciate any help

Hi @Sofi

Clicking into the job logs shows more details about the error. How to find job logs: Troubleshooting errors

Tool Standard Error
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR…
[M::mem_pestat] (25, 50, 75) percentile: (1178, 3439, 6248)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 16388)
[M::mem_pestat] mean and std.dev: (3927.75, 2950.43)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 21458)
[M::mem_pestat] analyzing insert size distribution for orientation RF…
[M::mem_pestat] (25, 50, 75) percentile: (1329, 3521, 6080)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 15582)
[M::mem_pestat] mean and std.dev: (3874.29, 2817.50)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 20333)
[M::mem_pestat] skip orientation RR as there are not enough pairs
[mem_sam_pe] paired reads have different names: “NB501410:184:HJY53BGXB:3:11401:17202:1034”, “NB501410:184:HJY53BGXB:2:11101:5667:1046”

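The tool reports in a few places that it isn’t finding “enough” R1/R2 pairs, and the last line is the clearest clue: the two mates it tried to pair have different names, and they even come from different lanes (the fourth field of the read name is lane 3 in one and lane 2 in the other). I’m guessing the individual fastq files were not stacked together in the same order.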
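To see where the mates fall out of step, the pairing can also be checked outside Galaxy. Below is a minimal Python sketch, assuming uncompressed FASTQ with 4-line records (for `.fastq.gz`, open the files with `gzip.open(path, "rt")` instead); the function names are hypothetical. It compares read names record by record, the same kind of check behind BWA-MEM's "paired reads have different names" message.

```python
import io
from itertools import zip_longest
from typing import Iterator, Optional, TextIO, Tuple


def fastq_names(handle: TextIO) -> Iterator[str]:
    """Yield one read name per 4-line FASTQ record.

    The name is the header text after '@' up to the first whitespace,
    with a trailing '/1' or '/2' mate suffix removed.
    """
    for i, line in enumerate(handle):
        if i % 4 != 0:
            continue  # skip sequence, '+' and quality lines
        parts = line[1:].split()
        name = parts[0] if parts else ""
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def first_mismatch(
    r1: TextIO, r2: TextIO
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (record_index, r1_name, r2_name) for the first pair whose names
    differ (None stands in for a missing record if one file is shorter).
    Return None if every record pairs up."""
    pairs = zip_longest(fastq_names(r1), fastq_names(r2))
    for idx, (n1, n2) in enumerate(pairs):
        if n1 != n2:
            return idx, n1, n2
    return None


if __name__ == "__main__":
    # Toy data: the second R2 record comes from a different lane (2 vs 3).
    r1 = io.StringIO("@A:1:FC:3:1:1:1 1:N:0:1\nACGT\n+\nFFFF\n"
                     "@A:1:FC:3:1:1:2 1:N:0:1\nACGT\n+\nFFFF\n")
    r2 = io.StringIO("@A:1:FC:3:1:1:1 2:N:0:1\nTTTT\n+\nFFFF\n"
                     "@A:1:FC:2:1:5:9 2:N:0:1\nTTTT\n+\nFFFF\n")
    print(first_mismatch(r1, r2))  # (1, 'A:1:FC:3:1:1:2', 'A:1:FC:2:1:5:9')
```

A `None` result means every R1 record has a same-named R2 record at the same position, which is what the mapper expects.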

Suggested ways to troubleshoot:

  1. Concatenate the reads again, and make sure the R1 and R2 files are stacked in the same order.
  2. Then try running Fastq Info on both datasets together. This is a top-level format check.
  3. Once that passes, run some QA on the reads to assess the scientific content. At minimum, run FastQC; it doesn’t change the data.
  4. Proceed with trimming if wanted. If you trim, run FastQC again afterwards to review what changed.
  5. After all of the above is done, try mapping again.
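In Galaxy, step 1 just means selecting the R1 datasets and the R2 datasets in the same order in the two Concatenate runs. The same idea as a local Python sketch, assuming a hypothetical Illumina-style naming scheme `<sample>_L00<lane>_R<1|2>_001.fastq` (adjust the pattern to your real file names): group the files by lane, require exactly one R1 and one R2 per lane, then append them to the two outputs in one shared lane order.

```python
import re
import shutil
from pathlib import Path
from typing import Dict, List, Tuple

# Hypothetical naming scheme: "<sample>_L00<lane>_R<1|2>_001.fastq".
NAME_RE = re.compile(r"_L(\d{3})_R([12])_")


def pair_by_lane(paths: List[str]) -> List[Tuple[str, str]]:
    """Group files into (R1, R2) tuples, sorted by lane number."""
    lanes: Dict[int, Dict[str, str]] = {}
    for p in paths:
        m = NAME_RE.search(Path(p).name)
        if m is None:
            raise ValueError(f"cannot parse lane/read from {p!r}")
        lane, read = int(m.group(1)), "R" + m.group(2)
        if read in lanes.setdefault(lane, {}):
            raise ValueError(f"duplicate {read} file for lane {lane}")
        lanes[lane][read] = p
    pairs = []
    for lane in sorted(lanes):
        entry = lanes[lane]
        if set(entry) != {"R1", "R2"}:
            raise ValueError(f"lane {lane} is missing a mate file: {entry}")
        pairs.append((entry["R1"], entry["R2"]))
    return pairs


def concat_in_order(pairs: List[Tuple[str, str]], out_r1: str, out_r2: str) -> None:
    """Append each lane's R1 to out_r1 and its R2 to out_r2, in the same lane order."""
    with open(out_r1, "wb") as o1, open(out_r2, "wb") as o2:
        for r1, r2 in pairs:
            for src, dst in ((r1, o1), (r2, o2)):
                with open(src, "rb") as fh:
                    shutil.copyfileobj(fh, dst)  # byte-for-byte copy


if __name__ == "__main__":
    files = ["pt_L002_R2_001.fastq", "pt_L001_R1_001.fastq",
             "pt_L002_R1_001.fastq", "pt_L001_R2_001.fastq"]
    print(pair_by_lane(files))
    # [('pt_L001_R1_001.fastq', 'pt_L001_R2_001.fastq'),
    #  ('pt_L002_R1_001.fastq', 'pt_L002_R2_001.fastq')]
```

Driving both outputs from one list of pairs is the design point: there is no way for the R1 and R2 stacks to end up in different orders.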

These tutorials can help with understanding fastq data (format/content/variants) along with example QA steps/tools.

Also see domain/topic tutorials for protocol-specific QA steps. That entire training site can be searched with keywords like tool names or datatypes, and most tutorials have a workflow. Using tutorial data along with a workflow is a quick way to create a “reference history” to compare to when running your own data through similar tools.


Notes:

If the reads are all from the same sample but sequenced across multiple lanes, what you are doing now is probably the correct way to process the reads.
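One way to check which case applies: the read names in the error log follow the Illumina (Casava 1.8+) layout `instrument:run:flowcell:lane:tile:x:y`. A minimal Python sketch (the helper name is hypothetical) that parses a header, so the first read of each file can be compared:

```python
from typing import NamedTuple


class IlluminaReadId(NamedTuple):
    instrument: str
    run: int
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int


def parse_read_id(header: str) -> IlluminaReadId:
    """Parse a Casava 1.8+ style read name: instrument:run:flowcell:lane:tile:x:y.

    Anything after the first whitespace (e.g. '1:N:0:ATCACG') is ignored.
    """
    name = header.lstrip("@").split()[0]
    fields = name.split(":")
    if len(fields) != 7:
        raise ValueError(f"unexpected read-name layout: {name!r}")
    inst, run, fc, lane, tile, x, y = fields
    return IlluminaReadId(inst, int(run), fc, int(lane), int(tile), int(x), int(y))


if __name__ == "__main__":
    a = parse_read_id("NB501410:184:HJY53BGXB:3:11401:17202:1034")
    b = parse_read_id("NB501410:184:HJY53BGXB:2:11101:5667:1046")
    print(a.flowcell == b.flowcell, a.lane, b.lane)  # True 3 2
```

Both names from the log share the instrument, run, and flowcell and differ only in lane, which is consistent with one library split across lanes. Flowcell and lane alone do not identify the sample, though; that comes from the index/barcode in the part of the header after the space.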

If the reads are all from different samples (could be the same “patient”), analyzing those samples distinctly at this stage in the processing is usually important. You can use a paired-end list collection instead (R1 + R2 in the same collection), or two list collections (R1 in one, R2 in another), and run those through in batch. See Using dataset collections and search the site with “collection” to find more. There are special ways to organize and tag data that are likely a good choice for you.



For your questions about read groups (ID, LB) and marking duplicates, the Variant Analysis tutorials have answers.
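As a quick orientation for the read group fields, here is a minimal Python sketch of what a SAM `@RG` header line contains. The values are placeholders (hypothetical sample and library names). Using `flowcell.lane` as the ID is one common convention, not a requirement; if all lanes are concatenated before mapping, they end up under a single read group, so any unique ID (for example the sample name) works. LB just needs to name the library prep consistently. BWA-MEM itself only maps reads, so duplicates are marked in a separate step such as MarkDuplicates, which uses LB to group reads by library.

```python
from typing import Optional


def read_group_line(rg_id: str, sample: str, library: str,
                    platform: str = "ILLUMINA",
                    platform_unit: Optional[str] = None) -> str:
    """Build a SAM-style @RG header line from tab-separated TAG:value fields.

    ID - unique identifier for this read group
    SM - sample name
    LB - library name (duplicate marking groups reads by library)
    PL - sequencing platform
    PU - optional platform unit, e.g. flowcell.lane
    """
    fields = ["@RG", f"ID:{rg_id}", f"SM:{sample}", f"LB:{library}", f"PL:{platform}"]
    if platform_unit:
        fields.append(f"PU:{platform_unit}")
    return "\t".join(fields)


if __name__ == "__main__":
    # Hypothetical values: one library ("lib1") from one patient ("patient1").
    print(read_group_line("HJY53BGXB.3", "patient1", "lib1",
                          platform_unit="HJY53BGXB.3"))
```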

And … maybe consider joining the training that is happening this week. All of the tutorials above are supported, with live help from the authors and instructors. Once you learn the best-practice methods, it will be much easier to understand how to use the tools, and in what order, with your own data (potentially with adapted workflows, which are so much faster and reusable!). Smörgåsbord 3: A week of Free, Online, Galaxy Training supported by the Global Galaxy Training Community

Hope that helps!