Can someone tell me how aligners, such as Bowtie, work on duplicated genes? Let’s say that there are two identical copies of a gene, A and B.
Does Bowtie assign reads that map to this gene randomly to A and B?
Is the mapping quality reduced for reads that map to duplicated genes?
Basically, I am trying to obtain reads that map to duplicated elements such as ribosomal RNA genes, and I would appreciate advice on the best ways to analyze such genes.
Using default parameters, Bowtie2 will report one mapping site per read; when several sites are equally good, it picks one of them pseudo-randomly. This and other reporting options, including reporting all mapping sites, are configurable under the advanced options on the tool form. Reads that map equally well to multiple locations will have lower MAPQ scores.
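As a rough illustration of how MAPQ can be used to tell likely multi-mapped reads apart from uniquely placed ones, here is a minimal sketch that reads SAM text lines and splits read names by a MAPQ cutoff. The function name `split_by_mapq`, the cutoff of 10, and the example records are all assumptions made for illustration, not values taken from Bowtie2 or Galaxy. For rRNA work you would probably want to keep the low-MAPQ group rather than discard it.

```python
def split_by_mapq(sam_lines, min_mapq=10):
    """Split mapped reads into (confident, ambiguous) name lists by MAPQ.

    min_mapq=10 is an arbitrary illustrative cutoff, not a recommendation.
    """
    confident, ambiguous = [], []
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])             # column 2: bitwise FLAG
        if flag & 0x4:                    # 0x4 = read is unmapped
            continue
        mapq = int(fields[4])             # column 5: mapping quality
        if mapq >= min_mapq:
            confident.append(fields[0])
        else:
            ambiguous.append(fields[0])
    return confident, ambiguous


# Made-up records: one unique hit, one repeat-like hit, one unmapped read.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "read1\t0\tchrom\t100\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t0\trRNA_copy\t500\t1\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\tIIIIIIII",
]
confident, ambiguous = split_by_mapq(example)
print(confident, ambiguous)
```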
Ribosomal content is considered to be contamination in standard transcriptomics protocols, and reference genomes/transcriptomes and reference annotation will exclude rRNA unless specifically constructed to include it.
UCSC includes rRNA data tracks, as does Ensembl. Try mapping against just the rRNA sequences as a custom genome (FASTA). This post is a good example of the reverse of what it sounds like you want to do: "how can i download human ribosomal reference?"
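If the rRNA sequences are already part of a larger FASTA file, a custom rRNA-only reference could be pulled out along these lines. This is only a sketch: the helper names (`read_fasta`, `select_records`, `to_fasta`) and the record names in the example are hypothetical, and it assumes the record IDs you want are known in advance.

```python
def read_fasta(lines):
    """Yield (record_id, sequence) pairs from FASTA text lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name = line[1:].split()[0]    # ID = first word after ">"
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def select_records(lines, wanted_ids):
    """Keep only the records whose ID is in wanted_ids."""
    return [(n, s) for n, s in read_fasta(lines) if n in wanted_ids]


def to_fasta(records, width=60):
    """Format records as FASTA text, wrapping sequences at `width` columns."""
    out = []
    for name, seq in records:
        out.append(">" + name)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


# Made-up input: the record names are placeholders, not real accessions.
example = [
    ">rRNA_16S example description",
    "ACGTACGT",
    "TTGG",
    ">some_mRNA",
    "GGGGCCCC",
    ">rRNA_23S",
    "CCAATTGG",
]
rrna_only = select_records(example, {"rRNA_16S", "rRNA_23S"})
custom_genome = to_fasta(rrna_only)
print(custom_genome)
```

The resulting text could then be saved as a FASTA file and supplied to the mapper as a custom genome.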
If that doesn’t help, please explain what your analysis goals are with more detail.
Thank you, Jenna. The goal in this case is to use PAR-CLIP data to identify contacts between an E. coli protein and ribosomal RNA (rRNA), since the protein used to generate the CLIP data is known to bind to rRNA. Therefore, I was wondering if there is an optimal way to map cross-link-induced mutations on rRNA. E. coli has seven nearly identical copies of its rRNA genes.
While I have your attention, I wonder if Galaxy offers a workflow or pipeline for analyzing PAR-CLIP data. If not, is there a way to separate aligned reads mapping to mRNAs that contain diagnostic T>C PAR-CLIP mutations from reads that don’t, and to identify binding motifs based on the filtered reads?
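To make the T>C filtering idea concrete, here is a minimal sketch of what such a filter might look like. It assumes each alignment is ungapped and that the matching reference segment has already been extracted, so the two strings have the same length; the function names and example sequences are hypothetical. Note also that SAM stores read sequences in reference-forward orientation, so for reads from the reverse strand the same conversion would show up as A>G, which this sketch does not handle.

```python
def tc_positions(ref_segment, read_seq):
    """Return 0-based positions where the reference has T and the read has C.

    Assumes an ungapped alignment: both strings must have equal length.
    """
    if len(ref_segment) != len(read_seq):
        raise ValueError("reference segment and read must have equal length")
    pairs = zip(ref_segment.upper(), read_seq.upper())
    return [i for i, (r, q) in enumerate(pairs) if r == "T" and q == "C"]


def split_by_tc(alignments):
    """Split (name, ref_segment, read_seq) tuples by presence of any T>C."""
    with_tc, without_tc = [], []
    for name, ref_segment, read_seq in alignments:
        if tc_positions(ref_segment, read_seq):
            with_tc.append(name)
        else:
            without_tc.append(name)
    return with_tc, without_tc


# Made-up alignments for illustration only.
example = [
    ("read1", "ACGTTGCA", "ACGTCGCA"),   # T>C at position 4
    ("read2", "ACGTTGCA", "ACGTTGCA"),   # perfect match
    ("read3", "ACGTTGCA", "ACGATGCA"),   # T>A mismatch, not diagnostic
]
with_tc, without_tc = split_by_tc(example)
print(with_tc, without_tc)
```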