I have a problem when trying to get genomic DNA sequences from a list of genomic coordinates. The file contains regions around ChIP-seq peaks and was obtained through several Galaxy functions (Map with Bowtie, MACS2 callpeak, compute and cut). It is correctly formatted (as .interval, but I have also tried .bed) with columns corresponding to chr, start, end and name, and its genome version is indicated: [format:interval database:mm9].
When I apply the function “Extract Genomic DNA using coordinates from assembled/unassembled genomes” selecting mm9 from the locally cached genomes I get an empty file with this warning:
*622 warnings, 1st is: Chromosome by name ‘chr19’ was not found for build ‘mm9’. * Skipped 622 invalid lines, 1st is #1, “chr19 3204403 3204904 SAM-to-BAM_on_data_7__converted_BAM_peak_1”.
I get the same when I try to redo the analysis selecting mm10 instead of mm9.
Note that in https://usegalaxy.org/ there are two instances of “Extract Genomic DNA using coordinates from assembled/unassembled genomes”. One is version 3.0.3 (the same one available in https://usegalaxy.eu/), but there is also a version 2.2.4, which works perfectly fine on the same .interval file above and returns the correct FASTA file with all the sequences.
Moreover, in https://usegalaxy.org/ version 3.0.3 of “Extract Genomic DNA using coordinates from assembled/unassembled genomes” does not even display mm9 among the available cached genomes (even if I do the entire pipeline, including mapping).
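For anyone who wants to rule out the input file before blaming the tool, here is a minimal sketch that lists the chromosome names used in an interval/BED file and flags any that fall outside the primary mm9 chromosomes. The `MM9_PRIMARY` set is hand-written for illustration (an assumption, not something read from Galaxy), and it deliberately leaves out random/unplaced contigs. The second example line is hypothetical.

```python
from collections import Counter

# Assumption: primary mm9 chromosome names in UCSC style (chr1..chr19, chrX, chrY, chrM).
# Random/unplaced contigs are deliberately not included in this hand-written set.
MM9_PRIMARY = {f"chr{i}" for i in range(1, 20)} | {"chrX", "chrY", "chrM"}


def chromosome_counts(lines):
    """Count regions per chromosome name, taken from the first column of each line."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and UCSC-style header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        counts[line.split()[0]] += 1
    return counts


def unexpected_names(counts, expected=MM9_PRIMARY):
    """Return the chromosome names that are not in the expected set, sorted."""
    return sorted(name for name in counts if name not in expected)


example = [
    "chr19\t3204403\t3204904\tSAM-to-BAM_on_data_7__converted_BAM_peak_1",
    "19\t100\t200\thypothetical_peak",  # hypothetical line without the 'chr' prefix
]
counts = chromosome_counts(example)
print(counts)
print(unexpected_names(counts))
```

If every name in the file is in the expected set (as `chr19` is here), a “Chromosome by name … was not found” warning is more likely to come from the tool or its genome index than from the input.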
I noticed a problem myself with a mouse genome build a short time ago at https://usegalaxy.org (this ticket has details). It could be related. It is not clear which server is reporting that chr19 was not found; could you clarify?
I’ll be looking at this more today and will include figuring out the problems you are seeing with the Extract tool(s). Both servers updated how the tools are organized and it seems some small issues were introduced (duplicated tools + unconnected indexes).
For now: Use the version of the tool on the server that is working for you. Make sure the “database” metadata attribute is assigned to your interval/bed input. Either server could be used – these are distinct servers/accounts, so that might involve copying the data from one server to the other.
More feedback soon and thanks for reporting the problem.
Thanks Jennifer.
I get the “Chromosome by name ‘chr19’ was not found etc.” message when using https://usegalaxy.eu/, with version 3.0.3 of the extract tool.
Related to your ticket: when I choose the source for the reference genome from the locally cached genomes, I see two occurrences of mm9 (see picture), but I get an empty file with both.
Dear Helena,
you can access the complete history at /u/mforcato/h/historyerrors (I cannot post the entire link).
Dataset 34 contains the regions that I want to give as input to “extract genomic dna” and Dataset 35 is the empty output I get.
I suspect there is a problem with the updated Extract version 3.0.3, a problem with the indexes these tools use, or both. Since the prior version 2.2.4 was working and an updated Galaxy release was pending, I had set this aside to focus on other issues, but I will go back in and figure this one out.
@hexylena I’ll chat with you direct, or put details in this ticket and ping you there, and you could do the same, or we might need to create an issue ticket against the Extract tool and link everything up if the problem is rooted there…
One small note: The Extract tool (at least up to version 2.2.4) was based on a UCSC tool that technically would accept data with a bed OR interval datatype assigned, but the data content had to be formatted as a bed dataset would be (specifically, the column ordering). Not sure yet if/how version 3.0.3 behaves differently. Likely the same but not confirmed. And this shouldn’t impact finding indexes but those will be part of the review.
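To make the column-ordering point concrete, here is a rough sketch of a check that a line is laid out the way a bed dataset would be: chromosome in column 1, start in column 2, end in column 3, tab-separated. The function name `check_bed_like` is hypothetical and this is not how the Extract tool itself validates input; it only illustrates the expected layout.

```python
def check_bed_like(line):
    """Return a list of problems for one line; an empty list means it looks BED-ordered."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return [f"expected at least 3 tab-separated columns, found {len(fields)}"]

    chrom, start, end = fields[0], fields[1], fields[2]
    problems = []
    if not chrom:
        problems.append("empty chromosome name")
    # Columns 2 and 3 must be non-negative integers with start < end.
    if not (start.isdigit() and end.isdigit()):
        problems.append("start/end are not non-negative integers")
    elif int(start) >= int(end):
        problems.append("start is not smaller than end")
    return problems


good = "chr19\t3204403\t3204904\tSAM-to-BAM_on_data_7__converted_BAM_peak_1"
reordered = "SAM-to-BAM_on_data_7__converted_BAM_peak_1\tchr19\t3204403\t3204904"
print(check_bed_like(good))
print(check_bed_like(reordered))
```

An interval dataset can declare its chrom/start/end columns in any position through metadata, so a file like `reordered` may be a valid interval yet still not match the bed layout described above.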