Hi, I need to identify the insertion sites of my transgene in the genome of cultured CHO cells. I have performed whole genome sequencing on my cells. Is there any tool in Galaxy that would be useful for this? I hope to identify the reads that map to my transgene as a reference, and then assemble those reads into contigs that contain the transgene and its flanking sequences.
Are you following a published protocol? We don’t have a tutorial for exactly this, so having some outline of the basic steps and an idea of what final outputs you want is a good idea. Galaxy will have the exact or analogous tools, and we can help you to identify them.
For practical first steps, you’ll want to load up your reads, reference genome, transgene sequence, and baseline reference annotation, maybe create a custom annotation record for your transgene, and then optionally prepare a SnpEff reference.
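As a rough illustration of the custom annotation record mentioned above, here is a minimal Python sketch that writes one GFF3 line spanning each sequence in a transgene FASTA. The function names, the `custom` source label, and the `region` feature type are my own assumptions for the example, not anything Galaxy requires, so adjust them to whatever your downstream tools expect.

```python
def fasta_lengths(fasta_text):
    """Return {sequence_name: length} for FASTA-formatted text."""
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Sequence name is the first word after ">"
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def transgene_gff3(fasta_text, source="custom", feature_type="region"):
    """Build GFF3 text with one full-length feature per FASTA sequence.

    'source' and 'feature_type' are illustrative defaults (assumptions).
    """
    lines = ["##gff-version 3"]
    for name, length in fasta_lengths(fasta_text).items():
        attributes = f"ID={name};Name={name}"
        # GFF3 columns: seqid, source, type, start, end, score, strand, phase, attributes
        columns = [name, source, feature_type, "1", str(length), ".", "+", ".", attributes]
        lines.append("\t".join(columns))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical transgene sequence, for demonstration only
    demo = ">my_transgene demo construct\nACGT\nAC\n"
    print(transgene_gff3(demo), end="")
```

The resulting text can be uploaded to Galaxy as a `gff3` dataset alongside the transgene FASTA.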
UCSC hosts a version of your reference genome, or you can get the data from NCBI, or you can use what you may already have. Try to prepare all of your baseline reference data at the very start!
If you are completely new to Galaxy, I would strongly suggest taking an hour or so to go through at least one tutorial! Galaxy hosts the common bioinformatics tools that many people use for projects like yours, plus utilities for intermediate/custom data parsing, and a robust GUI-based workflow design and execution engine.
Then, full assembly is covered in the tutorials under Assembly / Tutorial List. You might not need it, and I wouldn’t attempt the full genome, just the region of interest plus the junctions, since the remainder can be derived directly or characterized with variants.
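To make the "region of interest + junctions" idea more concrete, here is a hedged Python sketch of one way to pick candidate junction reads before any assembly. It assumes you have already mapped the reads (for example with a mapper such as BWA-MEM) against a reference that includes the transgene as an extra sequence, and that the alignments are available as SAM text. The function names, the `min_clip` threshold of 20, and the sequence name `tg` are all hypothetical choices for the example, not a published protocol.

```python
import re

# One CIGAR operation: a length followed by an operation code
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar):
    """Return (left, right) soft-clip lengths for a CIGAR string."""
    # Hard clips (H) can sit outside soft clips, so ignore them here
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if len(ops) < 2:
        return 0, 0
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if ops[-1][1] == "S" else 0
    return left, right


def junction_candidates(sam_lines, transgene_name, min_clip=20):
    """Yield (read_name, position, left_clip, right_clip, mate_elsewhere)
    for reads aligned to the transgene that may span an insertion junction.

    A read is kept if it has a long soft clip (part of the read did not
    match the transgene) or if its mate maps to a different sequence.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        qname, flag, rname, pos, _mapq, cigar, rnext = fields[:7]
        if int(flag) & 0x4 or rname != transgene_name:
            continue  # unmapped, or mapped somewhere other than the transgene
        left, right = soft_clips(cigar)
        mate_elsewhere = rnext not in ("=", "*", transgene_name)
        if max(left, right) >= min_clip or mate_elsewhere:
            yield qname, int(pos), left, right, mate_elsewhere


if __name__ == "__main__":
    # Hypothetical records, for demonstration only
    demo = [
        "@SQ\tSN:tg\tLN:100",
        "r1\t0\ttg\t1\t60\t30S70M\t*\t0\t0\tSEQ\tQUAL",
        "r2\t99\ttg\t10\t60\t100M\tchr1\t5000\t0\tSEQ\tQUAL",
        "r3\t0\ttg\t20\t60\t100M\t*\t0\t0\tSEQ\tQUAL",
    ]
    for hit in junction_candidates(demo, "tg"):
        print(hit)
```

The read names this produces could then be used to pull the matching reads out of the original FASTQ data and pass only those to an assembler, which keeps the assembly limited to the transgene and its flanks.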
That’s a lot of information! Please review and let us know if you have any questions!