Hi @Sofi
Clicking into the job logs will show more details about the error. How to find job logs: Troubleshooting errors
Tool Standard Error
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR…
[M::mem_pestat] (25, 50, 75) percentile: (1178, 3439, 6248)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 16388)
[M::mem_pestat] mean and std.dev: (3927.75, 2950.43)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 21458)
[M::mem_pestat] analyzing insert size distribution for orientation RF…
[M::mem_pestat] (25, 50, 75) percentile: (1329, 3521, 6080)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 15582)
[M::mem_pestat] mean and std.dev: (3874.29, 2817.50)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 20333)
[M::mem_pestat] skip orientation RR as there are not enough pairs
[mem_sam_pe] paired reads have different names: "NB501410:184:HJY53BGXB:3:11401:17202:1034", "NB501410:184:HJY53BGXB:2:11101:5667:1046"
The tool is reporting in a few places that it isn't finding enough R1/R2 read pairs, and the last line is the fatal error: an R1 read and an R2 read at the same position have different names. My guess is that the individual fastq files were not concatenated in the same order, so the reads in R1 and R2 no longer line up pair-for-pair.
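If you happen to have the files on your own computer, one quick sanity check outside of Galaxy is to compare the read names in R1 and R2 record by record. This is only a minimal sketch, assuming gzipped fastq files with the placeholder names `R1.fastq.gz` and `R2.fastq.gz`:

```python
import gzip
from itertools import islice

def read_names(path):
    """Yield the read name from each 4-line fastq record:
    the first token of the header line, with any /1 or /2 suffix removed."""
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0:  # header lines: @name ...
                name = line.rstrip("\n").split()[0]
                if name.endswith("/1") or name.endswith("/2"):
                    name = name[:-2]
                yield name

# Placeholder filenames: point these at your own concatenated R1/R2 files.
pairs = zip(read_names("R1.fastq.gz"), read_names("R2.fastq.gz"))

for record, (name1, name2) in enumerate(islice(pairs, 100_000), start=1):
    if name1 != name2:
        print(f"First mismatch at record {record}: {name1} vs {name2}")
        break
else:
    print("The first 100,000 records are paired in the same order.")
```

If the names drift apart after some record, that usually points at the lane files having been concatenated in different orders for R1 and R2.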
Suggested ways to troubleshoot:
- Concatenate the reads again, and make sure that the per-lane files are added in the same order for the R1 and R2 files (see the sketch after this list for one way to keep the ordering consistent).
- Then try running Fastq Info on both datasets together. This is a top-level format check.
- Once that passes, run some QA on the reads to assess the scientific content. At least run FastQC; it doesn't change the data.
- Proceed with trimming if wanted. If you do, run FastQC again afterwards to review what changed.
- After everything above is done, try mapping again.
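For the concatenation step, the key point is simply that the lane files go in the same order for the R1 run and for the R2 run of the tool. If you are preparing the files locally instead, here is a small sketch with the same caveats as above (gzipped fastq, hypothetical filenames following a `sample_L00X_R1/R2` pattern):

```python
import glob
import shutil

def concat_in_order(pattern, output_path):
    """Concatenate per-lane gzipped fastq files in sorted filename order.
    Concatenated gzip streams are still a valid gzip stream, so a raw byte copy is enough."""
    lane_files = sorted(glob.glob(pattern))
    print(output_path, "<-", lane_files)  # show the order actually used
    with open(output_path, "wb") as out:
        for lane_file in lane_files:
            with open(lane_file, "rb") as part:
                shutil.copyfileobj(part, out)

# Placeholder filename pattern (sample_L001_R1.fastq.gz, sample_L002_R1.fastq.gz, ...).
# Sorting both globs the same way keeps the lane order identical for R1 and R2.
concat_in_order("sample_L00?_R1.fastq.gz", "sample_R1.fastq.gz")
concat_in_order("sample_L00?_R2.fastq.gz", "sample_R2.fastq.gz")
```

This only keeps the pairing intact if each per-lane R1/R2 pair was already in matching order, which is normally the case for files coming straight off the sequencer.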
These tutorials can help with understanding fastq data (format/content/variants) along with example QA steps/tools.
Also see domain/topic tutorials for protocol-specific QA steps. That entire training site can be searched with keywords like tool names or datatypes, and most tutorials have a workflow. Using tutorial data along with a workflow is a quick way to create a “reference history” to compare to when running your own data through similar tools.
- Concatenate tool is listed here with usage examples: Data Manipulation Olympics
- Search Tutorials (query=bwa-mem)
- or start at the home page and navigate by domain
Notes:
If the reads are all from the same sample but sequenced across multiple lanes, what you are doing now is probably the correct way to process the reads.
If the reads are all from different samples (could be the same “patient”), analyzing those samples distinctly at this stage in the processing is usually important. You can use a paired-end list collection instead (R1 + R2 in the same collection), or two list collections (R1 in one, R2 in another), and run those through in batch. See Using dataset collections and search the site with “collection” to find more. There are special ways to organize and tag data that are likely a good choice for you.
The Variant Analysis tutorials cover both of these cases.
And … maybe consider joining the training that is happening this week. All of the tutorials above are supported, with live help from the authors and instructors. Once you learn the best-practice methods, it will be much easier to understand how to use the tools, and in what order, with your own data (potentially with adapted workflows, which are so much faster and reusable!). Smörgåsbord 3: A week of Free, Online, Galaxy Training supported by the Global Galaxy Training Community
Hope that helps!