Goseq and genome

Hi all,

I am currently analysing RNA-seq data from mouse samples and would like to use the goseq tool. The available genome is mm10, but the alignment was done with what I understand to be the previous version of the genome (GRCm38.p6), so I don’t think I can use the mm10 version for goseq. Do you know if I have any other option, or should I use external tools for the GO analysis (profiler, DAVID, etc.)?
Thank you!

Welcome, @des_b

GRCm38 and mm10 are the same genomic assembly. What can differ between data providers is the “build”, meaning how the chromosomes are labelled. The annotation can also differ between sources, meaning the gene and transcript identifiers.

See the mouse assemblies here for what is used in Galaxy for GRCm38/mm10: UCSC Genome Browser Downloads

If your data is based on a non-UCSC source for GRCm38/mm10, you might be able to convert the chromosome labels using the tool Replace column by values which are defined in a convert file. Common convert file mappings can be obtained through the link down in the tool's help section.
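To make the idea of a label conversion concrete, here is a minimal Python sketch of what such a step does. It is not the Galaxy tool's implementation; the mapping dictionary, the function name, and the sample rows are all made up for illustration, and a real convert file would cover every chromosome and scaffold.

```python
# Hypothetical mapping from Ensembl-style labels to UCSC-style labels.
# A real convert file would list every chromosome and scaffold.
ENSEMBL_TO_UCSC = {"1": "chr1", "2": "chr2", "X": "chrX", "MT": "chrM"}


def relabel_chromosomes(lines, mapping, column=0):
    """Replace the chromosome label in one tab-separated column.

    In this sketch, rows whose label is missing from the mapping are
    dropped, and lines starting with '#' are kept unchanged.
    """
    converted = []
    for line in lines:
        if line.startswith("#"):
            converted.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        new_label = mapping.get(fields[column])
        if new_label is None:
            continue  # label unknown to the target build
        fields[column] = new_label
        converted.append("\t".join(fields))
    return converted


# Made-up BED-like rows, for illustration only.
rows = [
    "1\t100\t200\tgeneA",
    "MT\t5\t50\tgeneB",
    "GL456210.1\t1\t10\tgeneC",
]
relabelled = relabel_chromosomes(rows, ENSEMBL_TO_UCSC)
print("\n".join(relabelled))
```

Dropping unmapped rows is a choice made for this sketch only. Whether to drop, keep, or report them depends on your data, so it is worth checking how many rows are affected before moving on.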

Mapping between annotation sources is a bit trickier, since the identifiers are usually not 1-1 pairings. You can try annotateMyIDs first. If that does not work well, either switch to an annotation that is directly supported by GOseq or create a custom GO mapping input (see next).
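One way to see how far a given ID conversion is from 1-1 is to count the pairings before relying on them. The Python sketch below assumes you already have a two-column table of (source ID, target ID) pairs from whatever conversion you ran. The identifiers and the function name are hypothetical.

```python
from collections import defaultdict

# Made-up (source ID, target ID) pairs; real pairs would come from
# the output of an ID conversion step.
pairs = [
    ("src_A", "GeneA"),
    ("src_B", "GeneB"),
    ("src_B", "GeneB2"),  # one source ID, two targets
    ("src_C", "GeneA"),   # two source IDs share one target
    ("src_D", "GeneD"),
]


def split_by_cardinality(id_pairs):
    """Separate clean 1-1 pairings from ambiguous source IDs."""
    forward = defaultdict(set)   # source -> targets
    backward = defaultdict(set)  # target -> sources
    for source, target in id_pairs:
        forward[source].add(target)
        backward[target].add(source)

    one_to_one = {}
    for source, targets in forward.items():
        if len(targets) == 1:
            (target,) = targets
            if len(backward[target]) == 1:
                one_to_one[source] = target
    ambiguous = sorted(s for s in forward if s not in one_to_one)
    return one_to_one, ambiguous


one_to_one, ambiguous = split_by_cardinality(pairs)
print("1-1:", one_to_one)
print("ambiguous:", ambiguous)
```

If most IDs end up in the ambiguous list, that is a hint that switching annotation or building a custom GO mapping may be less work than resolving the conversions by hand.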

The Galaxy wrapped GOseq tool allows for custom GO mappings. Examples of the file formats are down in the tool form help.

The other two tools are also wrapped in Galaxy.

Whether these tools are used in Galaxy or directly, both cases involve making sure that the data inputs and the reference files are based on the same genomic assembly build and a common annotation source.

TL;DR The idea is to make sure all data is from the same assembly (for example GRCm37 or GRCm38) and that the build identifiers “match up” as a technical consideration before the scientific algorithms are applied. Computers are literal when processing important values.

  • Chr1 and chr1 and 1 all mean “Chromosome 1” to a person, but are considered different by tools.
  • Same for feature annotations: GeneA and GeneA.2 and GeneA.20 are all different to tools but could all mean “Gene footprint A” with no particular version to a person.
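The two points above can be shown in a few lines of Python. This is only a sketch: the helper name is made up, and removing a trailing numeric version suffix is an assumption that suits some annotation sources and not others, so check what the suffix means in your annotation before stripping it.

```python
# Three labels a person reads as "Chromosome 1" are three distinct strings.
labels = ["Chr1", "chr1", "1"]
distinct_labels = len(set(labels))
print(distinct_labels)


def strip_version(gene_id):
    """Drop a trailing '.<digits>' suffix, if there is one.

    Hypothetical helper: only use it when the suffix really is a version
    number in your annotation source.
    """
    base, sep, suffix = gene_id.rpartition(".")
    if base and sep and suffix.isdigit():
        return base
    return gene_id


gene_ids = ["GeneA", "GeneA.2", "GeneA.20"]
normalised = {strip_version(g) for g in gene_ids}
print(normalised)
```

Before normalising, a tool sees three unrelated gene identifiers. After normalising, all three collapse to the same one.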

Resources