Recently, I tried to use the Galaxy Bowtie2 tool to map ChIP-seq data. However, for some ChIP-seq data downloaded from SRA, it reported warnings like the following:
##################################
Warning: skipping read 'Run2_FC2_20100512_Sample1B_1279_21_112_F3/1' because length (0) <= # seed mismatches (0)
Warning: skipping read 'Run2_FC2_20100512_Sample1B_1279_21_112_F3/1' because it was < 2 characters long
Warning: skipping read 'Run2_FC1_2010
################
I am curious about these warnings. Is it fine to go on to the next step (samtools), or do I need to map to the genome again with Bowtie2? And how can I solve this problem?
It appears that some of the reads are very short. This may or may not indicate a quality problem and could impact your decision to use this particular data. It could also just be how the data was already trimmed and/or filtered (or not). If you did some trimming on the data yourself, consider examining a sample of the reads before and after, to confirm the appropriate settings were used.
Reads that are too short are skipped during mapping. This isn't a problem to worry about unless 1) a significant portion of the data is very short/unusable or 2) a mapping job fails for memory/resource reasons. In the second case, filter out the short reads that would never pass the initial mapping criteria, to reduce the size of the job.
In short, run FastQC and review the results.
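If you want a quick number for how much of the data is affected, here is a minimal sketch in Python that counts reads below a length cutoff. It assumes a plain, uncompressed FASTQ with exactly four lines per record; the function names and the default cutoff of 2 (taken from the "< 2 characters long" warning) are my own choices for illustration, not anything Galaxy or Bowtie2 provides.

```python
import io


def fastq_lengths(handle):
    """Yield the sequence length of each record.

    Assumes a 4-line-per-record FASTQ (header, sequence, '+', quality).
    """
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        yield len(seq)


def short_read_fraction(handle, min_len=2):
    """Return (short, total, fraction) for reads shorter than min_len."""
    total = 0
    short = 0
    for length in fastq_lengths(handle):
        total += 1
        if length < min_len:
            short += 1
    fraction = short / total if total else 0.0
    return short, total, fraction


if __name__ == "__main__":
    # Tiny made-up example: one normal read, one empty read, one 1-base read.
    example = "@r1\nACGT\n+\nIIII\n@r2\n\n+\n\n@r3\nA\n+\nI\n"
    print(short_read_fraction(io.StringIO(example), min_len=2))
```

For a real file you would pass an open file handle instead of the `io.StringIO` example. If the fraction is tiny, the warnings can most likely be ignored; if it is large, look at how the data was trimmed.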
Galaxy Tutorials >> Start here: NGS logistics - this is an introduction to Galaxy’s functionality for the analysis of Next Generation Sequencing data.
In fact, FastQC reported that these data have an average read length of 50 bp. So I trimmed the reads and filtered out low-quality reads and those below the minimum length (<30). However, Bowtie2 reported the same warnings. According to your advice, this warning would not affect the following steps. So I can continue to the next step; am I understanding that right?
Did you retain the FASTQ reads that are 30 bases and over, or those under? Double-check the settings used for filtering. Bowtie2 is reporting that it found a read that is 0 bases long. My guess is that you probably intended to keep reads >= 30 bases?
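To make the intended direction of the filter concrete, here is a rough Python sketch that keeps reads of at least 30 bases and drops everything shorter, including 0-length reads. It again assumes an uncompressed 4-line-per-record FASTQ; `filter_fastq_by_length` is a hypothetical helper written for this post, not the Galaxy filtering tool itself.

```python
import io


def filter_fastq_by_length(src, dst, min_len=30):
    """Copy records with sequence length >= min_len from src to dst.

    Assumes a 4-line-per-record FASTQ. Returns (kept, dropped).
    """
    kept = 0
    dropped = 0
    while True:
        header = src.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = src.readline()
        plus = src.readline()
        qual = src.readline()
        if len(seq.strip()) >= min_len:
            # Normalize line endings so every kept record is four clean lines.
            record = [line.rstrip("\r\n") for line in (header, seq, plus, qual)]
            dst.write("\n".join(record) + "\n")
            kept += 1
        else:
            dropped += 1
    return kept, dropped


if __name__ == "__main__":
    # Made-up reads of length 35 and 0; only the first should survive.
    example = "@r1\n" + "A" * 35 + "\n+\n" + "I" * 35 + "\n@r2\n\n+\n\n"
    out = io.StringIO()
    print(filter_fastq_by_length(io.StringIO(example), out, min_len=30))
    print(out.getvalue(), end="")
```

Note the comparison is `>= min_len` for the reads that are kept. If a filter is set up the other way around (keeping reads under 30), or if trimming runs after the length filter, empty reads can still end up in the output and Bowtie2 will keep warning about them.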