Building custom track for BAM files

I have uploaded two files to Galaxy:

  1. mybam.bam
  2. mybam.bam.bai

When I exported the URL in the UCSC Genome Browser, I got a warning: **Host usegalaxy.bai not found → Name or service not known netConnect() failed failed to read index file (.bai) corresponding to https://usegalaxy.org/api/datasets/hahahahahaha/display?to_ext=bam**

I do have the corresponding index file, but I still get this warning and can’t see the reads in the UCSC Genome Browser.

Any help or leads would be appreciated!

Best,
Deep

Hi Deep,

Galaxy can act as a track hub for UCSC Genome Browser and IGV. Also, do not upload the index file to Galaxy. Try the following:

Upload the mybam.bam file to Galaxy. While setting up the upload job, specify the genome build (dbkey) in the third box (unspecified (?)). For example, if the reads were mapped to GRCh38, select hg38. From what I see, the boxes in the upload menu are no longer labeled, so if that is confusing, you can add the dbkey after the upload: click on the uploaded bam file, click the “?” next to database, and in the Build pull-down menu (in the middle window) select the appropriate dbkey (for example, hg38 or mm10, whatever you work with).

To connect the bam file with the UCSC Genome Browser, click on the name of the bam file, click the Visualize (bar graph) icon, and in the middle window click UCSC GB main. Galaxy will establish a connection with UCSC GB and select the genome corresponding to the dbkey.

A bam file produced in Galaxy using a built-in genome (index) should have a proper dbkey compatible with UCSC GB, because many genomes in Galaxy were sourced from UCSC GB.
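
As a small illustration of the assembly-to-dbkey naming, here is a minimal Python sketch. The helper name `ucsc_dbkey` and the lookup table are hypothetical and cover only a few common human and mouse assemblies; the Build menu in Galaxy remains the authoritative list.

```python
# Hypothetical lookup: assembly names as released by the Genome Reference
# Consortium, mapped to the UCSC-style dbkeys shown in Galaxy's Build menu.
GRC_TO_UCSC_DBKEY = {
    "GRCh37": "hg19",
    "GRCh38": "hg38",
    "GRCm38": "mm10",
    "GRCm39": "mm39",
}


def ucsc_dbkey(assembly: str) -> str:
    """Return the UCSC dbkey for a GRC assembly name, e.g. GRCh38 -> hg38."""
    # Drop surrounding whitespace and patch suffixes such as ".p13".
    base = assembly.strip().split(".")[0]
    try:
        return GRC_TO_UCSC_DBKEY[base]
    except KeyError:
        raise ValueError(f"no dbkey known for assembly {assembly!r}") from None


print(ucsc_dbkey("GRCh38"))     # hg38
print(ucsc_dbkey("GRCm38.p6"))  # mm10
```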

Hope that helps.

Kind regards,

Igor

Hi @Deep_Patel

As @igor explained, uploading mybam.bam will trigger the automatic creation of a mybam.bam.bai index. This index is not exposed as a separate dataset in your history. Instead, it is part of a composite dataset. You never need to upload a bai or fai index to Galaxy.

You can capture the data URL links to both parts of that composite dataset (the bam and its index) from the disk icon used for dataset downloads.

You will need to copy and paste both URLs into the UCSC custom track data loading form. Importantly, your Galaxy history must also be set to a shared state; otherwise, no outside application can read the data. The first level, “accessible”, is enough.
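
For reference, here is a minimal Python sketch of how the two URLs could be combined into a single UCSC custom track line. It relies on the UCSC `bigDataUrl` and `bigDataIndex` track settings, where `bigDataIndex` points at an index that does not sit next to the data file as `<bam URL>.bai`. The function name `ucsc_bam_track_line` and both URLs are placeholders I made up for illustration; paste in the links you copied from your own dataset instead.

```python
def ucsc_bam_track_line(bam_url: str, bai_url: str, name: str = "Galaxy BAM") -> str:
    """Build a one-line UCSC custom track definition for a BAM and its index."""
    # Track-line settings are separated by whitespace, so a URL containing
    # whitespace would be split into several (invalid) settings.
    for url in (bam_url, bai_url):
        if not url or any(ch.isspace() for ch in url):
            raise ValueError(f"URL is empty or contains whitespace: {url!r}")
    # The name is wrapped in double quotes, so it must not contain one itself.
    if '"' in name:
        raise ValueError("track name must not contain a double quote")
    return (
        f'track type=bam name="{name}" '
        f"bigDataUrl={bam_url} bigDataIndex={bai_url}"
    )


if __name__ == "__main__":
    # Placeholder URLs (assumptions): replace with the links copied from Galaxy.
    bam = "https://usegalaxy.org/api/datasets/DATASET_ID/display?to_ext=bam"
    bai = "https://example.org/PASTE_INDEX_URL_HERE"
    print(ucsc_bam_track_line(bam, bai, name="mybam"))
```

The printed line can be pasted into the text box of the custom track form. If the history is not shared, UCSC will still fail to read either URL, whatever the track line looks like.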

There is a prior discussion of an issue around this function: "send files from Galaxy to UCSC's EU mirror - #11 by jennaj". I don’t think this is what is going on now, but if you still have problems after capturing the URLs from the same dataset in a Galaxy history that is set to a shared state, please let us know the URL of the Galaxy server you are working at and the URL of the UCSC Genome Browser you are using. I’ll see if I can reproduce the problem, and we can follow up from there. The US/EU/AU hosted servers from both projects should all work together fine, but if some combination has a problem, we can get that reported and sorted out.

Please let us know how this goes!