Hi, I’m trying to use CustomProDB to create a protein FASTA file from my transcriptomic variant data. The transcripts were mapped to the GRCh38 genome annotation, version 105, which is not included in the current list of annotations in CustomProDB; the latest one listed is GRCh38. Will a more recent genome annotation be added? I can’t use the current genome annotation, because the tool returns an error every time I try to run it.
I’m wondering if we can solve the problem another way, specifically by sorting out whether a reference genome mismatch is causing the error.
You can share your entire history for troubleshooting, or we might be able to determine the reason for the error by reviewing:
The header of your BAM input
The header of your VCF input
The job details page, including the full logs
The URL of the server you are working at (UseGalaxy.eu?)
The version of the tool you are using: find this at the top of the tool form (Galaxy Version 1.22.0?)
A shared history link covers all of those details; alternatively, you can post screenshots or copy/paste each view. Instructions are in the banner on this forum, or see directly → How to get faster help with your question
How did I notice this? By comparing the chromosome identifiers in your data with the database the tool assigned to the error output datasets: hg38. On UseGalaxy servers, the database key hg38 is reserved for the UCSC version of the GRCh38 human genome.
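To see which naming convention your VCF uses before changing anything, a small Python sketch like the one below could help. It assumes an uncompressed VCF read as a list of text lines; `contig_names` and `naming_style` are hypothetical helpers (not part of CustomProDB or Galaxy), the “chr” prefix test is only a heuristic, and the example lines are made up for illustration.

```python
import re

# Matches the ID field of a VCF "##contig=<ID=...,...>" header line.
CONTIG_RE = re.compile(r"^##contig=<ID=([^,>]+)")


def contig_names(vcf_lines):
    """Collect chromosome identifiers, in order of first appearance,
    from ##contig header lines and the CHROM column of data lines."""
    names = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        match = CONTIG_RE.match(line)
        if match:
            name = match.group(1)
        elif line and not line.startswith("#"):
            name = line.split("\t", 1)[0]  # first column is CHROM
        else:
            continue  # other header lines and blank lines
        if name not in names:
            names.append(name)
    return names


def naming_style(names):
    """Heuristic: UCSC labels start with 'chr'; Ensembl labels do not."""
    if not names:
        return "unknown"
    prefixed = sum(1 for name in names if name.startswith("chr"))
    if prefixed == len(names):
        return "UCSC (chr-prefixed)"
    if prefixed == 0:
        return "Ensembl-style (no chr prefix)"
    return "mixed"


example = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=1,length=248956422>",
    "##contig=<ID=MT,length=16569>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t12345\t.\tA\tG\t.\tPASS\t.",
]
print(contig_names(example))  # ['1', 'MT']
print(naming_style(contig_names(example)))  # Ensembl-style (no chr prefix)
```

If the result is “Ensembl-style” while the job was assigned hg38, that points to a labeling mismatch rather than a sequence difference. A BAM header lists the same identifiers on its `@SQ` lines (the `SN:` field), which this sketch does not parse.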
If you are able to adjust your data labels, I think that will be enough. Why? Your data is limited to the primary autosomes, sex chromosomes, and the mitochondrial chromosome. The actual sequence bases for those chromosomes are usually identical across all versions of GRCh38; only the labels differ (for example, Ensembl names chromosome 1 “1” and the mitochondrial chromosome “MT”, while UCSC uses “chr1” and “chrM”). You can confirm this by comparing your reference genome to the UCSC version of the reference genome.
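As a sketch of the relabeling, assuming your VCF uses Ensembl-style names (“1”, “X”, “MT”) and the target is the UCSC-style hg38 labels (“chr1”, “chrX”, “chrM”), the CHROM column and the `##contig` header lines could be rewritten as below. The `ENSEMBL_TO_UCSC` mapping and the `relabel_vcf_lines` helper are hypothetical illustrations that cover only the primary chromosomes; they are not a Galaxy or CustomProDB tool.

```python
import re

# Hypothetical mapping: Ensembl-style primary GRCh38 names to UCSC hg38 names.
ENSEMBL_TO_UCSC = {str(i): f"chr{i}" for i in range(1, 23)}
ENSEMBL_TO_UCSC.update({"X": "chrX", "Y": "chrY", "MT": "chrM"})

# Splits a "##contig=<ID=NAME,...>" line into prefix, NAME, and the rest.
CONTIG_RE = re.compile(r"^(##contig=<ID=)([^,>]+)(.*)$")


def relabel_vcf_lines(vcf_lines, mapping=ENSEMBL_TO_UCSC):
    """Yield VCF lines with chromosome names translated through `mapping`.

    Only the labels change; positions, alleles, and other fields pass
    through untouched. Names missing from `mapping` are kept as-is.
    """
    for line in vcf_lines:
        line = line.rstrip("\n")
        header = CONTIG_RE.match(line)
        if header:
            prefix, name, rest = header.groups()
            yield f"{prefix}{mapping.get(name, name)}{rest}"
        elif not line or line.startswith("#"):
            yield line  # other header lines are copied unchanged
        else:
            chrom, sep, rest = line.partition("\t")
            yield f"{mapping.get(chrom, chrom)}{sep}{rest}"


example = [
    "##contig=<ID=MT,length=16569>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "MT\t73\t.\tA\tG\t.\tPASS\t.",
]
for out in relabel_vcf_lines(example):
    print(out)
```

Because only the labels are rewritten, this relies on the sequences being identical between the two GRCh38 versions, as described above. A BAM input would need the same renaming in its `@SQ` header lines, which this sketch does not handle.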
About the suspected patch release difference: patch releases only add non-primary sequences and do not modify the primary ones. Since your data doesn’t include any of those extra sequences, there shouldn’t be a secondary issue with “extra sequences in the data that are not represented in the indexed reference”.
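To double-check that the data really contains no patch or other non-primary sequences, you could list any identifiers outside the primary set. This is a minimal sketch assuming the identifiers were already collected from your VCF (or BAM) header; `non_primary` is a hypothetical helper that accepts either Ensembl-style or UCSC-style labels, and “chrUn_hypothetical” is a made-up placeholder for a non-primary scaffold name.

```python
# Primary GRCh38 sequences (autosomes, sex chromosomes, mitochondrion)
# in Ensembl-style and UCSC-style labels.
PRIMARY_ENSEMBL = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}
PRIMARY_UCSC = {f"chr{c}" for c in PRIMARY_ENSEMBL - {"MT"}} | {"chrM"}


def non_primary(contigs):
    """Return sorted identifiers that are not primary GRCh38 sequences,
    e.g. patch, alt, or unplaced scaffolds."""
    return sorted(set(contigs) - PRIMARY_ENSEMBL - PRIMARY_UCSC)


print(non_primary(["1", "X", "MT"]))  # []
print(non_primary(["chr1", "chrM", "chrUn_hypothetical"]))  # ['chrUn_hypothetical']
```

An empty result would be consistent with the point above: with only primary sequences present, a patch release difference should not introduce unmatched sequences.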
So, a data labeling problem is what you’ll need to address next. Please see the guide above for details.