Which server did you run `Convert GTF to BED12` on? That is a server-side dependency problem.
Update: I just ran the `Convert GTF to BED12` tool successfully at Galaxy EU https://usegalaxy.eu. It isn't available at Galaxy Main https://usegalaxy.org (but probably should be; I'll make a request to add it). Converting to a BED6 is possible at both, but you need a full BED12 for this particular operation.
If your error was at usegalaxy.eu, try a rerun. There may have been a transient cluster issue that led to the dependency not being found at runtime.
Update2: There are many ways to compare coordinates between files, find overlaps, then report, reformat, summarize, etc.
For example, the first genome you mentioned has annotation at NCBI. One of the formats available is "Tabular" (here: Proteins - Genome - NCBI). That data could be loaded into Galaxy and converted into an interval format (or the more stringent bed6 format).
This would involve a few steps: reformatting the chromosome names (probably; it depends on what these are in your peaks file), subtracting "1" from the start coordinate (start coordinates are 0-based in interval/bed files but 1-based in the NCBI file), rearranging/restricting columns of data (needed for bed format; for interval it wouldn't matter), then assigning the proper datatype at the end.
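To make the logic of those steps concrete, here is a minimal Python sketch of the same manipulations. In Galaxy you would do this with the text manipulation tools rather than a script. The column names (`accession`, `start`, `stop`, `strand`, `locus_tag`), the sample rows, and the `chrom_map` renaming table are all hypothetical placeholders, so check them against the header of your own NCBI download and the chromosome names in your peaks file.

```python
import csv
import io

# Hypothetical sample data: the column names and values are assumptions,
# not the guaranteed layout of the NCBI "Tabular" download.
SAMPLE = (
    "accession\tstart\tstop\tstrand\tlocus_tag\n"
    "NC_EXAMPLE.1\t101\t200\t+\tgeneA\n"
    "NC_EXAMPLE.1\t500\t650\t-\tgeneB\n"
)

def tabular_to_bed6(text, chrom_map=None):
    """Turn 1-based tab-separated annotation rows into BED6 lines."""
    chrom_map = chrom_map or {}
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    lines = []
    for row in reader:
        # Step 1: rename the chromosome so it matches the peaks file.
        chrom = chrom_map.get(row["accession"], row["accession"])
        # Step 2: shift the start from 1-based to 0-based.
        start = int(row["start"]) - 1
        # The end stays as-is, because BED end coordinates are exclusive.
        end = int(row["stop"])
        # Step 3: keep only the six BED6 columns, in BED order.
        fields = [chrom, str(start), str(end), row["locus_tag"], "0", row["strand"]]
        lines.append("\t".join(fields))
    return lines

bed_lines = tabular_to_bed6(SAMPLE, chrom_map={"NC_EXAMPLE.1": "chr1"})
print("\n".join(bed_lines))
```

The last step, assigning the datatype, has no equivalent in the script: in Galaxy it is a metadata change made on the dataset after the columns are in place.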
Much of what this tutorial is describing is how to format data into compatible formats so that their genomic coordinates can be compared accurately. It uses functions/tools under the top-level tool grouping "GENERAL TEXT TOOLS". The manipulations in the tutorial are specific to those particular files/datatypes but the reason why it is in the "Introduction" topic section, and contains so many manipulations, is to help people get familiar with some of those tools and manipulating data in general. Many of these tools are command-line utility analogs.
Some of this is explained in "Part2" of the tutorial. Biomart doesn't have annotation for your two particular genomes, but NCBI does. If you are confused about what the dataset (file) formats should be like, or how to change metadata, or why primary keys like "chromosome" names need to match up, these FAQs should help:
Please try to reformat the annotation yourself. It is important to learn how to do this, and that will take some trial and error. But if you get completely stuck, write back and we can help more. I might ask for a history share link (can be sent privately). Keep the history as small as possible (just this analysis) and make sure it contains your peak file and the gene annotation files you have been working with (should include the original GFF3 and the NCBI tabular annotation plus your attempts to manipulate those). It can just be for one of the genomes. I'm assuming that both have the same format for the peak data, so whatever solution works for one will work for the other.
If your peak data is from a public source, or if you don't mind making it public (at least in part/some subset), we could work out a clean solution then post all of that back here, so others can learn from the example. Or, we could just post back the steps to manipulate the NCBI tabular annotation into an interval dataset (simple history + workflow).