Clicking into the logs will show more details about the error. How to find job logs: Troubleshooting errors
Tool Standard Error
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR…
[M::mem_pestat] (25, 50, 75) percentile: (1178, 3439, 6248)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 16388)
[M::mem_pestat] mean and std.dev: (3927.75, 2950.43)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 21458)
[M::mem_pestat] analyzing insert size distribution for orientation RF…
[M::mem_pestat] (25, 50, 75) percentile: (1329, 3521, 6080)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 15582)
[M::mem_pestat] mean and std.dev: (3874.29, 2817.50)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 20333)
[M::mem_pestat] skip orientation RR as there are not enough pairs
[mem_sam_pe] paired reads have different names: "NB501410:184:HJY53BGXB:3:11401:17202:1034", "NB501410:184:HJY53BGXB:2:11101:5667:1046"
The tool reports in a few places that it isn't finding "enough" pairs of R1 and R2 reads, and the final error shows a pair whose two reads have different names. My guess is that the individual fastq files were not concatenated in the same order for R1 and R2.
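Outside of Galaxy, a quick way to confirm this kind of mismatch is to compare the read names line by line between the two files. A minimal sketch with toy data (all filenames and read names here are hypothetical, just to show the check):

```shell
# Create two tiny toy FASTQ files whose read names are out of order,
# mimicking the bwa error above.
printf '@read1\nACGT\n+\nIIII\n@read2\nTTTT\n+\nIIII\n' > R1.fastq
printf '@read2\nAAAA\n+\nIIII\n@read1\nGGGG\n+\nIIII\n' > R2.fastq

# Pull the read name (line 1 of every 4-line FASTQ record) from each file.
awk 'NR % 4 == 1 {print $1}' R1.fastq > r1_names.txt
awk 'NR % 4 == 1 {print $1}' R2.fastq > r2_names.txt

# Any diff output means record N in R1 is not paired with record N in R2.
diff r1_names.txt r2_names.txt || echo "R1/R2 read order mismatch"
```

On real (usually gzipped) data you would stream with `zcat` instead of writing toy files; the idea is the same, and any diff output confirms the pairing is broken.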
Suggested ways to troubleshoot:
- Concatenate the reads again, and make sure that the ordering is the same between the R1 and R2 files.
- Then try running Fastq Info on both datasets together. This is a top-level format check.
- Once that passes, run some QA on the reads to assess the scientific content. At least run FastQC; it doesn't change the data.
- Proceed with trimming if wanted. If you do that, run FastQC again afterward to review what changed.
- After everything above is done, try mapping again.
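For reference, the concatenation and sanity-check idea behind the first bullets can be sketched on the command line, using toy lane files (all filenames hypothetical; in Galaxy you would use the Concatenate tool, but the ordering rule is the same):

```shell
# Toy lane files standing in for real per-lane fastqs (hypothetical names).
printf '@a/1\nAC\n+\nII\n' > L001_R1.fastq
printf '@b/1\nGT\n+\nII\n' > L002_R1.fastq
printf '@a/2\nTG\n+\nII\n' > L001_R2.fastq
printf '@b/2\nCA\n+\nII\n' > L002_R2.fastq

# Concatenate lanes in the SAME order for R1 and R2, so that record N
# in each output file belongs to the same read pair.
cat L001_R1.fastq L002_R1.fastq > all_R1.fastq
cat L001_R2.fastq L002_R2.fastq > all_R2.fastq

# Quick sanity check: both files must contain the same number of
# records (FASTQ records are 4 lines each).
echo "R1 reads: $(( $(wc -l < all_R1.fastq) / 4 ))"
echo "R2 reads: $(( $(wc -l < all_R2.fastq) / 4 ))"
```

Matching read counts don't guarantee matching order, but mismatched counts guarantee a problem, so it is a cheap first check before rerunning QA and mapping.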
These tutorials can help with understanding fastq data (format/content/variants) along with example QA steps/tools.
Also see domain/topic tutorials for protocol-specific QA steps. That entire training site can be searched with keywords like tool names or datatypes, and most tutorials have a workflow. Using tutorial data along with a workflow is a quick way to create a “reference history” to compare to when running your own data through similar tools.
If the reads are all from the same sample but sequenced across multiple lanes, what you are doing now is probably the correct way to process the reads.
If the reads are all from different samples (could be the same “patient”), analyzing those samples distinctly at this stage in the processing is usually important. You can use a paired-end list collection instead (R1 + R2 in the same collection), or two list collections (R1 in one, R2 in another), and run those through in batch. See Using dataset collections and search the site with “collection” to find more. There are special ways to organize and tag data that are likely a good choice for you.
The Variant Analysis tutorials have answers for these.
And … maybe consider joining the training that is happening this week. All of the tutorials above are supported, with live help from the authors and instructors. Once you learn the best-practice methods, it will be much easier to understand how to use tools, and in what order, with your own data (potentially with adapted workflows – so, so much faster, and reusable!). Smörgåsbord 3: A week of Free, Online, Galaxy Training supported by the Global Galaxy Training Community
Hope that helps!