I am at the step of using the TopHat tool to align my RNA-seq reads to the genome. I have paired-end reads. The TopHat tool asks me for the “Mean Inner Distance between Mate Pairs”. How do I calculate this? I know my read length is 101 bp. I do not know anything more. Do I need to ask the sequencing company for this information?
One approach to infer the inner distance is to perform a preliminary alignment with bowtie2 using a subsample of your dataset (you can generate it with the seqtk_sample tool). The alignment output includes the ISIZE field (ninth column), which is the inferred insert size. ISIZE is positive for one mate and negative for the other, so use its absolute value (or count each pair once). By subtracting the lengths of both mate reads (2 × 101 bp in your case) from the mean ISIZE value, you can infer the inner distance between mate pairs. Negative values indicate that the two reads overlap (that is, there is no unsequenced gap between the mates).
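The calculation above can be sketched in Python. This is a minimal sketch, assuming the preliminary bowtie2 alignment was saved as an uncompressed SAM file and that both mates are 101 bp; the script name `inner_distance.py` and the function name `inner_distance_stats` are hypothetical. It keeps only properly paired primary alignments and uses the positive ISIZE (TLEN) value so each pair is counted once. Because bowtie2 does not do spliced alignment, pairs spanning an intron can inflate ISIZE, so the median is reported alongside the mean.

```python
import statistics
import sys


def inner_distance_stats(sam_lines, read_length=101):
    """Estimate the inner distance between mates from paired-end SAM records.

    Assumes both mates have length `read_length` (101 bp in the question).
    """
    inner = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        tlen = int(fields[8])  # ISIZE/TLEN, ninth column
        # Keep properly paired (0x2) records where both mates are mapped
        # (not 0x4/0x8) and the alignment is primary (not 0x100/0x800).
        if not flag & 0x2 or flag & 0x4 or flag & 0x8 or flag & 0x900:
            continue
        # TLEN is positive for the leftmost mate and negative for the other;
        # keeping only positive values counts each pair once.
        if tlen <= 0:
            continue
        # Inner distance = insert size minus the lengths of both mates.
        inner.append(tlen - 2 * read_length)

    if not inner:
        return {"n": 0, "mean": None, "median": None, "stdev": None}
    return {
        "n": len(inner),
        "mean": statistics.mean(inner),
        "median": statistics.median(inner),
        "stdev": statistics.stdev(inner) if len(inner) > 1 else None,
    }


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: inner_distance.py aligned.sam [read_length]")
    read_len = int(sys.argv[2]) if len(sys.argv) > 2 else 101
    with open(sys.argv[1]) as handle:
        print(inner_distance_stats(handle, read_len))
```

For example, `python inner_distance.py subsample.sam 101` prints the number of pairs used plus the mean, median and standard deviation of the inner distance; a negative mean suggests that most mates overlap.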
Thank you for the reply. Can I use the default value of 50 as another option?
Hi @NIKITA_JHAVERI, I recommend testing different values (e.g. 100, 50 and 0) and evaluating the resulting alignments to infer the optimal parameter value.
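If you run TopHat from the command line rather than through the Galaxy form, the comparison could be scripted along these lines. This is a sketch only: the index name and read file names are hypothetical placeholders, `tophat_commands` is an illustrative helper, and it assumes TopHat's `-r`/`--mate-inner-dist`, `--mate-std-dev` (default 20) and `-o` options. It only builds and prints the commands; it does not run TopHat.

```python
import shlex


def tophat_commands(index, reads1, reads2, inner_dists=(100, 50, 0), std_dev=20):
    """Build one TopHat command per candidate mean inner distance.

    `index`, `reads1` and `reads2` are placeholders for your own files.
    """
    commands = []
    for r in inner_dists:
        out_dir = f"tophat_r{r}"  # separate output directory per value
        commands.append([
            "tophat",
            "-r", str(r),                     # mean inner distance between mates
            "--mate-std-dev", str(std_dev),   # std dev of the inner distance
            "-o", out_dir,
            index, reads1, reads2,
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical file names; replace with your own index and FASTQ files.
    for cmd in tophat_commands("genome_index", "reads_1.fq", "reads_2.fq"):
        print(shlex.join(cmd))
```

Running each printed command gives one output directory per value; in TopHat 2 each directory contains an `align_summary.txt`, and comparing the overall and concordant pair alignment rates across runs is one way to judge which value suits your library.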