No link for UCSC to visualise datasets

moonmoondeb · April 19, 2024, 1:33pm

In Galaxy Europe, I have ChIP-Seq analysis data (bigWig and BED files) in my histories. Under visualisation I only have links for IGV and IGB, but I want to view the files in UCSC.
I am not sure how to fix this. Thanks for any advice.

jennaj · April 19, 2024, 4:45pm

Welcome, @moonmoondeb

The connection to UCSC is based on the assigned database key of the dataset.

Check these

Is there a database assigned to the dataset? Seems so since the other applications are connected.
Is that genome hosted by UCSC? Maybe not…

Make sure that what you assign is actually a match for your data and what UCSC hosts.

If you are using a custom genome build for the database key instead, IGV is probably a better choice for a genome browser since you can set up the custom genome in both applications, and Galaxy will connect based on that.

I’m guessing that UCSC doesn’t support the reference genome but we can help to confirm that. Or, you can check directly here → https://genome.ucsc.edu/

You can ask more questions about any of this if something is not clear. The database assignment and a sample of the dataset files are usually best for this type of troubleshooting. Shared history or screenshots please → How to get faster help with your question

moonmoondeb · April 22, 2024, 2:39pm

Thank you very much for your reply. I am a new user. I did not set a database, as I cannot find it in the dropdown menu.
I used a FASTA file for my alignment, but the assembly is available in UCSC (Genome assembly GGswu; GCA_024206055.2). How can I visualise this data in UCSC?

jennaj · April 22, 2024, 8:28pm

This is the full name of the genome at UCSC

UCSC Genome Browser on ASM2420605v2 Feb. 2023 chicken (Huxu 2023) (GCA_024206055.2)

These are the supported chicken genomes hosted in the Downloads area at UCSC

UCSC Genome Browser Downloads

I don’t know if a link can be created between UCSC and this genome…

A custom genome in Galaxy that is created with this database key is what I would suggest trying.

hub_3953719_GCA_024206055.2

If that is not enough, you can send an email to the UCSC support mailing list and ask them what to do. You are interested in what the database key should be to link from Galaxy to a hub genome. Please include a link to this topic for context, and so they can decide to post back the solution here if they want. I’ll also watch the mailing list for this.

Link to UCSC Help resources. I didn’t check prior Q&A but you could. Maybe someone has already asked.

Genome Browser Contacts

If you do figure this out, and no one else posts about it, I would be interested in what ends up working.

Thanks!

moonmoondeb · April 23, 2024, 12:51pm

Hi Jennaj
Thank you very much for the help.
I am trying to use this database key: hub_3953719_GCA_024206055.2.
I will let you know how it goes.
With regards
Moonmoon

Maximilian_Haeussler · April 23, 2024, 2:15pm

Hi @jennaj, these are our new assemblies (“Genark”), they have an “internal” ID and the “official” ID and one is automatically mapped to the other. Can you use genome=GCA_024206055 in the UCSC link? Or can someone send us an example link that Galaxy uses to load the file as a custom track and we can suggest a fix to the link to make them work?
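
For readers following along, here is a minimal sketch of what a UCSC link using `genome=GCA_024206055` could look like. The `hgTracks` path is the standard UCSC browser endpoint; the helper name `ucsc_genome_link` is made up for illustration, and nothing in this thread confirms that Galaxy builds its links this way.

```python
from urllib.parse import urlencode

# Standard UCSC Genome Browser endpoint.
UCSC_HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def ucsc_genome_link(accession: str) -> str:
    """Build an hgTracks URL that selects an assembly via the `genome` parameter.

    Hypothetical helper: it only assembles the URL and does not check
    that UCSC actually hosts the accession.
    """
    return f"{UCSC_HGTRACKS}?{urlencode({'genome': accession})}"


print(ucsc_genome_link("GCA_024206055"))
# https://genome.ucsc.edu/cgi-bin/hgTracks?genome=GCA_024206055
```

Opening the printed URL in a browser is a quick way to see whether UCSC resolves the accession to the hub assembly.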

jennaj · April 23, 2024, 7:15pm

This is where I am stuck as well: which identifier format to use for track hub genomes. UCSC labels this a “dbkey” and Galaxy labels this a “database” but both reference the same bit of metadata.

My example guess is above in my other reply. That is the portion of the URL that is usually matched up between UCSC and Galaxy for standard genomes. Did you try that yet? Then if it doesn’t work, I suggested asking UCSC.

If you asked UCSC already, maybe I just missed the Q&A at their Google group? You could post back the link if you want to for extra context. I’m curious now, too!

Thanks!

p.s. Once this is solved (and it should be solvable), let’s be sure to post back what worked here to the Galaxy forum. I might also create an FAQ for that solution since I have a guess that this will start to come up more in the future given the ongoing expansion of the track hub genome suite.

Maximilian_Haeussler · April 24, 2024, 1:54am

The identifier should be GCA_024206055. The hub_xxx is internal. Can we get an example of a link that doesn’t work?

moonmoondeb · April 24, 2024, 12:12pm

Dear Maximilian
Thank you very much.

moonmoondeb · April 24, 2024, 12:14pm

Dear Jennaj
I have tried the key that you mentioned above (hub_3953719_GCA_024206055.2), but that did not create the visualisation link with UCSC. So yesterday I sent a mail to UCSC, but I have not received a reply yet.

moonmoondeb · April 24, 2024, 12:18pm

Thank you Jenny

Maximilian_Haeussler · April 24, 2024, 3:19pm

Hey @moonmoondeb, this is UCSC. You sent us the link to this forum, so I’m replying here. I think this is something that probably needs a change on the Galaxy side, but I need a link from Galaxy to UCSC that does work, e.g. for hg38 or hg19, or a description of how I can find such a link in the Galaxy UI. Once we have that, we can explain to Galaxy how to construct this type of link to UCSC with GCA_xxxx accessions, or we can do something on our side to make the link work. I believe the links already work, but Galaxy is just not showing them, because they haven’t noticed yet that we have thousands of Genbank genomes now. It’s also possible that something on our side was broken at some point and the links got removed from Galaxy because of that, idk.

jennaj · April 24, 2024, 7:45pm

Oh thanks so much @Maximilian_Haeussler !!

This is an example link for an hg38 BED file in Galaxy over to UCSC

https://usegalaxy.org/datasets/127912746/display_at/ucsc_main?redirect_url=http%3A%2F%2Fgenome.ucsc.edu%2Fcgi-bin%2FhgTracks%3Fdb%3Dhg38%26position%3Dchr7%3A155799979-155812463%26hgt.customText%3D%25s&display_url=https%3A%2F%2Fusegalaxy.org%2F%2Froot%2Fdisplay_as%3Fid%3D127912746%26display_app%3Ducsc%26authz_method%3Ddisplay_at
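
The link is easier to read once decoded. As a rough sketch using only the Python standard library, the snippet below splits it into the two nested URLs it carries and then pulls the `db` value out of the UCSC one. The variable names are illustrative only.

```python
from urllib.parse import parse_qs, urlsplit

# The example link from above, split across lines for readability.
galaxy_link = (
    "https://usegalaxy.org/datasets/127912746/display_at/ucsc_main"
    "?redirect_url=http%3A%2F%2Fgenome.ucsc.edu%2Fcgi-bin%2FhgTracks"
    "%3Fdb%3Dhg38%26position%3Dchr7%3A155799979-155812463"
    "%26hgt.customText%3D%25s"
    "&display_url=https%3A%2F%2Fusegalaxy.org%2F%2Froot%2Fdisplay_as"
    "%3Fid%3D127912746%26display_app%3Ducsc%26authz_method%3Ddisplay_at"
)

# Outer layer: the Galaxy URL carries two percent-encoded URLs as parameters.
outer = parse_qs(urlsplit(galaxy_link).query)
redirect_url = outer["redirect_url"][0]  # where the user ends up at UCSC
display_url = outer["display_url"][0]    # where UCSC fetches the dataset from

# Inner layer: the UCSC URL holds the database key and the position.
inner = parse_qs(urlsplit(redirect_url).query)

print(redirect_url)
print(display_url)
print(inner["db"][0])        # hg38
print(inner["position"][0])  # chr7:155799979-155812463
```

The `db=hg38` value inside `redirect_url` is the piece that comes from the dataset's assigned database.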

A history with that dataset is here → https://usegalaxy.org/u/jen-galaxyproject/h/test-4-ucsc-links

This is how/where to view that link in the application. It uses the assigned database metadata to “match” with an external application’s dbkey for the connection.

Max, feel free to message me privately here. Or, maybe start up a ticket and we can bring in the developers on that. This would be a good repository for the proposed changes → GitHub - galaxyproject/galaxy: Data intensive science for everyone. If you want me to start that up instead, no problem, just let me know.

That so many genomes are now hosted at UCSC is something we definitely want to get connected with Galaxy!

moonmoondeb · April 25, 2024, 10:04am

Dear Maximilian
Thank you very much.
With regards
Moonmoon

moonmoondeb · April 25, 2024, 10:06am

Dear Jennaj
Thank you very much for all the support.
Still I am not very sure what I should do.
Kindly let me know.

jennaj · April 25, 2024, 6:31pm

The link to/from Galaxy and UCSC for this different kind of browser hosting is being investigated. We don’t know which of these is the use case yet:

If there is a simple solution that will work now, either @Maximilian_Haeussler or I will post back and explain how to do that.
If instead this needs some coding changes on either side (very likely), that will take some time to implement.

We might get some more clarification today, or it might take longer. We don’t know since we are still reviewing.

What I would recommend for immediate use is an alternative application like IGV to visualize the data in a web browser. How-tos are in other posts at this forum (igv), and we have tutorials here → GTN Materials Search (query=igv)

Your question was really good, and I’m very glad you asked about how to do this! Getting this working will be a nice new functionality to have.

Maximilian_Haeussler · April 26, 2024, 11:27am

Hi Jennifer @jennaj, the link that you gave us is not a link to UCSC. It seems to go through a round of redirects at Galaxy, so it’s not the final link used to get the user to UCSC. But I can replace hg38 by hg19 in this link and that seems to work, so it’s probably enough for us for debugging. We’ll get back to you.

moonmoondeb · April 26, 2024, 2:40pm

Dear Jennaj
Thank you very much.
I have been using IGV from the beginning, as UCSC was not working. But I feel that UCSC is better than IGV for visualisation, and UCSC also has a few additional tools for making the visualisation nice.
I am also using the “UCSC Genome Browser on bGalGal1.mat.broiler.GRCg7b Jan. 2021 chicken white leghorn layer X broiler (broiler haplotype 2021 v2 2021) (GCF_016699485.2)” assembly. It has the same problem: no link to UCSC. If you can figure this one out along with the previous one, that will be a great help for me. Thank you very much again for all the support.
With regards
Moonmoon

jennaj · April 26, 2024, 6:01pm

Thanks @Maximilian_Haeussler

Thanks! The link I shared is what is parsed out between the apps, part of the “handshake” I think. But you will know more about this than me.

Also – I was playing around with data, and that UCSC browser was still opened to this genome, and when I clicked in the Table Browser I was both surprised and sorta delighted that the hub genome was available in that view!!

So, as a test I extracted a file to see what dbkey would be assigned to the output. It is the value but with that extra “hub_” portion at the front (hub_3953719_GCA_024206055.2). Here is the history with that data you can share with the developers → https://usegalaxy.eu/u/jenj/h/test-gca024206055

Why this matters: We want data to be able to go through a loop. UCSC > Galaxy > UCSC > Galaxy > repeat. The unique part of that connection is the database/dbkey value if we want to be able to reuse what we have now. Also, I’m not sure how we would be able to even do this a different way – meaning, how would Galaxy know where to route if not using the dbkey? But maybe we can be smart about it and make it work even if some genomes require different handling. Maybe the “hub_” part is used or similar.

Thanks again and great that this is being worked on! Thanks! Jen

Maximilian_Haeussler · April 26, 2024, 9:23pm

Hi Jen, thanks for looking into this. We discussed this here and have a plan forward. The db-key that we send to Galaxy will be something like hub_, unless we are able to change that. The problem is that the hub is server specific, for genome-euro, genome and genome-asia. So I think we will suggest that you remove this part and keep only the Genbank-id or we do this on our end.

On our side, when you send us data, we need to make a change that accepts the genbank-id as the db-parameter and internally replaces that with the hub_id_genbankId, so you don’t have to worry about our internal hub-IDs.

You will also need all genbank-ids that we have here on our side… as a text file: https://hgdownload.soe.ucsc.edu/hubs/UCSC_GI.assemblyHubList.txt

Does this plan make sense?
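
To make the "remove this part and keep only the Genbank-id" step concrete, here is a minimal sketch. It assumes hub dbkeys always look like `hub_<number>_<accession>`, which is inferred from the single example in this thread (hub_3953719_GCA_024206055.2). The function name and the `drop_version` option are hypothetical and are not part of Galaxy or UCSC.

```python
import re

# Assumed shape: "hub_" + server-specific number + "_" + GCA/GCF accession,
# with an optional ".<version>" suffix on the accession.
HUB_DBKEY = re.compile(r"^hub_\d+_(?P<accession>GC[AF]_\d+(?:\.\d+)?)$")


def accession_from_dbkey(dbkey: str, drop_version: bool = False) -> str:
    """Return the accession inside a hub dbkey, or the dbkey unchanged.

    Keys that do not match the assumed hub pattern (e.g. "hg38") pass
    through untouched, so standard genomes keep working as before.
    """
    match = HUB_DBKEY.match(dbkey)
    if match is None:
        return dbkey
    accession = match.group("accession")
    return accession.split(".", 1)[0] if drop_version else accession


print(accession_from_dbkey("hub_3953719_GCA_024206055.2"))
# GCA_024206055.2
print(accession_from_dbkey("hub_3953719_GCA_024206055.2", drop_version=True))
# GCA_024206055
print(accession_from_dbkey("hg38"))
# hg38
```

Whether the version suffix should be kept is an open question here: the identifier suggested earlier in the thread was the unversioned GCA_024206055, so both forms are shown.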