I have 4 paired-end ATAC-seq datasets that I would like to align to the same reference genome (hg38 Canonical) using Bowtie2 under the same conditions. Of the 4 datasets, 1 is uploaded to the Galaxy library for teaching purposes, so I am using it as my control dataset. The other 3 are published datasets.
Before mapping, I ran all datasets through pre-trimming FastQC, Cutadapt/Trimmomatic, and post-trimming FastQC, and it all went well. However, 2 of the test datasets failed to map in Bowtie2; both give this error message: 'samtools sort: failed to read header from "-"'. The control dataset and the other test dataset map fine.
I couldn't figure out why, but here are my guesses:
- In the failed datasets, the starting files (ID_1 or ID_2 fastq.gz) are either 1.4 GB or 3.6 GB, whereas the fastq.gz files of the other 2 datasets are only 100 MB or 15 MB. Does the size matter?
- When I compared the Bowtie2 command lines between the failed and passed datasets, the input path “/mnt/pulsar/files/staging/3529357/inputs/dataset_8950718.dat” appeared in the failed ones and “/mnt/user-data-4/008/909/dataset_8909088.dat” appeared in the passed ones.
I'm not sure whether these are related.
Does anyone know how to fix it?
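In case it helps narrow things down, here is a minimal sketch of a sanity check that could be run on a local copy of the input files before mapping. It only tests whether each fastq.gz decompresses completely and whether the two mates contain the same number of reads; it does not by itself explain the samtools error. The function names `count_fastq_reads` and `check_pair` and the file names in the usage note are hypothetical, and the sketch assumes plain 4-line FASTQ records.

```python
import gzip


def count_fastq_reads(path):
    """Count reads in a gzipped FASTQ file.

    Reading to the end forces full decompression, so a truncated or
    corrupt gzip stream raises an exception (EOFError or OSError)
    instead of passing silently.
    """
    n_lines = 0
    with gzip.open(path, "rt") as handle:
        for _ in handle:
            n_lines += 1
    # Assumption: standard FASTQ with exactly 4 lines per read.
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def check_pair(r1_path, r2_path):
    """Return (counts_match, n_reads_r1, n_reads_r2) for a paired-end dataset."""
    n1 = count_fastq_reads(r1_path)
    n2 = count_fastq_reads(r2_path)
    return n1 == n2, n1, n2
```

For example, `check_pair("ID_1.fastq.gz", "ID_2.fastq.gz")` (placeholder file names) returns `True` as its first element only when both files decompress fully and hold the same number of reads.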