CustomProDB troubleshooting

Hi, I’m trying to use CustomProDB to create a protein FASTA file from my transcriptomic variant data. The transcripts have been mapped to the genome annotation GRCh38 version 105, which is not included in the current list of annotations in CustomProDB. The latest version listed is GRCh38. Is there going to be an upload of a more recent genome annotation? I’m not able to use the current genome annotation since it returns an error every time I try to run the tool.

I would appreciate your help.

Best regards,
Daniel

Welcome, @Daniel_Flender

I’m wondering if we can solve the problem another way, specifically by sorting out whether a reference genome mismatch is leading to the error.

You can share your entire history for troubleshooting, or we might be able to determine the reason for the error by reviewing:

  1. The header of your BAM input
  2. The header of your VCF input
  3. The job details page, including the full logs
  4. The URL of the server you are working at (UseGalaxy.eu?)
  5. The version of the tool you are using - find this at the top of the tool form (Galaxy Version 1.22.0 ?)

A shared history link will cover all of those details, or the details could be in screenshots or copy/paste of each view. How-to is in the banner at this forum, or see directly → How to get faster help with your question

Reference: Tutorials using galaxyp/custom_pro_db

Let’s start there :slight_smile:

Dear Jennaj,

Thank you for your reply! I’ll be happy to share my galaxy history with you. Here is the link: https://usegalaxy.eu/u/dflender/h/nsclc-hla-seq

I hope this helps.

Kind regards,
Daniel

Hi @Daniel_Flender

Thank you for sharing, I see the problem.

Your data is based on a version of GRCh38 that uses chromosome identifiers like: 1, 2, … MT.

In contrast, the version of GRCh38 hosted at UseGalaxy.eu was sourced from UCSC (UCSC Genome Browser Downloads) and uses chromosome identifiers like: chr1, chr2, … chrM.

How did I notice this? By reviewing the identifiers in your data versus the database assignment the tool made for the error output datasets → hg38. The database key hg38 is reserved for the UCSC version of the GRCh38 human genome at UseGalaxy servers.
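The same check can be done outside of Galaxy. Below is a minimal Python sketch, assuming an uncompressed, text-format VCF; the function names `vcf_contigs` and `contig_style` are made up for illustration and are not part of Galaxy or CustomProDB. It collects the chromosome identifiers from the `##contig` header lines and the data rows, then reports which naming style they follow.

```python
def vcf_contigs(lines):
    """Collect chromosome identifiers from ##contig header lines and data rows."""
    found = []
    for line in lines:
        line = line.rstrip("\n")
        name = None
        if line.startswith("##contig=<"):
            # Header form, e.g. ##contig=<ID=1,length=248956422>
            body = line[len("##contig=<"):].rstrip(">")
            for field in body.split(","):
                if field.startswith("ID="):
                    name = field[len("ID="):]
                    break
        elif line and not line.startswith("#"):
            # Data row: the first tab-separated column is CHROM
            name = line.split("\t", 1)[0]
        if name is not None and name not in found:
            found.append(name)
    return found


def contig_style(names):
    """Return 'ucsc' (chr1, chrM), 'ensembl' (1, MT), 'mixed', or 'unknown'."""
    names = list(names)
    if not names:
        return "unknown"
    prefixed = [n.startswith("chr") for n in names]
    if all(prefixed):
        return "ucsc"
    if not any(prefixed):
        return "ensembl"
    return "mixed"


example = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=1,length=248956422>",
    "##contig=<ID=MT,length=16569>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t.\t.\t.",
]
print(contig_style(vcf_contigs(example)))
```

A result of "ensembl" for data assigned to the hg38 database key would point to the same mismatch described here.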

More details about these reference data differences, and what to do now, are described in this guide → Reference genomes at public Galaxy servers: GRCh38/hg38 example

If you are able to adjust your data labels, I think that will be enough. Why? Your data is limited to the primary autosomes, sex chromosomes, and the mitochondrial chromosome. The actual sequence bases for those chromosomes are usually identical across all versions of GRCh38 (only the labels differ). You could also confirm this by comparing your reference genome to the UCSC version of the reference genome.

About the suspected patch release difference: since the patch releases only layer in more non-primary sequences, and do not modify the primary sequences, and your data doesn’t even include those extra sequences, there shouldn’t be a secondary issue about “extra sequences in the data that are not represented in the indexed reference”.

So, a data labeling problem is what you’ll need to address next. Please see the guide above for details.
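If you would rather relabel a VCF outside of Galaxy, here is a minimal Python sketch of the idea, not the method from the guide. It assumes an uncompressed, text-format VCF that only contains the primary chromosomes (1 to 22, X, Y, MT); `to_ucsc` and `relabel_vcf_line` are hypothetical helper names. Any other sequence name is left untouched, since the non-primary sequences are named differently in the UCSC build and a simple prefix would not be correct for them. A BAM input would need its header and records relabeled with a BAM-aware tool, which this sketch does not cover.

```python
import re

# Primary chromosomes only; an assumption based on the data described above.
PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y"}


def to_ucsc(name):
    """Map an Ensembl-style name (1, X, MT) to a UCSC-style name (chr1, chrX, chrM)."""
    if name == "MT":
        return "chrM"
    if name in PRIMARY:
        return "chr" + name
    return name  # already prefixed or not a primary chromosome: leave as-is


def relabel_vcf_line(line):
    """Relabel the chromosome in one VCF line (contig header or data row)."""
    if line.startswith("##contig=<"):
        # Rewrite only the ID=... value inside the contig header
        return re.sub(
            r"(ID=)([^,>]+)",
            lambda m: m.group(1) + to_ucsc(m.group(2)),
            line,
            count=1,
        )
    if line.startswith("#"):
        return line  # other header lines are unchanged
    parts = line.split("\t", 1)
    parts[0] = to_ucsc(parts[0])
    return "\t".join(parts)


for row in ["##contig=<ID=MT,length=16569>", "1\t100\t.\tA\tG\t.\t.\t."]:
    print(relabel_vcf_line(row))
```

After relabeling, the identifiers in the data should match those of the hg38 reference on the server (chr1, chr2, … chrM).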

Hope this helps!