I’m getting this error message when I try and run this tool. I have tried converting my bed file to gif and running the previous version of the tool. Can anyone help:
Here’s my history:
Hi @kate2
There is another tool that does the same computation, and a bit more, that you should try instead: bedtools getfasta
Since the extract_genomic_dna tool is retained in the tool panel for legacy reasons, but deprecated, it may not run at all on certain servers (to conserve resources)! At others it may attempt to run, then eventually fail (the failure log has a message to try the other tool).
It seems the UseGalaxy.eu administrators do this for now, despite the updated tool form. I’ll follow up, but for now, the other tool is what to try!
The # header lines at the top of GTF are ignored by some tools and not by others! If any tool produces a strange message in the logs about column counts being off, or if the annotation seems to have been ignored, try removing the header. Simple tools that run on columns of coordinate data are the most sensitive.
Remove the headers (lines that start with a “#”) with the Select tool using the option “NOT Matching” with the regular expression:
^#
If you were using the headers to capture information like version or sample, reverse the search and keep the “header-only” files somewhere you can reference (a dedicated history), then use a persistent #tag on your related datasets. Headers are out of specification for GTF data (see the Genome Browser FAQ), but data providers still include them for these provenance reasons. Most bioinformatics people will remove them once the annotation is input to an HTP workflow to avoid strange format issues. Galaxy will try to smooth this out for you, but it isn’t always possible and you may need to do some direct data preparation.
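If you prefer to do this outside Galaxy, the same split can be sketched in a few lines of Python. This is only an illustration of the idea, not what the Select tool runs internally; the function name `split_headers` and the sample lines are made up for the example.

```python
def split_headers(lines):
    """Separate '#' header lines from data lines, preserving order."""
    headers, records = [], []
    for line in lines:
        # Same test as the regular expression ^# : "#" must be the first character.
        if line.startswith("#"):
            headers.append(line)
        else:
            records.append(line)
    return headers, records


# Hypothetical GTF content, for illustration only.
gtf_lines = [
    "#!genome-build EXAMPLE\n",
    "#!annotation-version v0\n",
    'chr1\tsource\tgene\t100\t200\t.\t+\t.\tgene_id "geneA";\n',
]
headers, records = split_headers(gtf_lines)
```

Here `headers` is the “header-only” part you can keep for provenance, and `records` is the header-free annotation to pass on to tools.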
We hope this helps! Please let us know how this works out for you.
Hi @jennaj thank you, bedtools getfasta worked for me, thanks for all the information and quick response.
Hi @jennaj, one more thing! When I convert my interval file to bed (or use that interval file in Bedtools getfasta), the bed (and the fasta file) replace gene name with that term region_nn. I attach a screenshot below. Is there a way to avoid this as I have to then produce a crosslink file to convert back when I get my results back from FIMO.
Hi @kate2
Yes, you can use the 4th column (the “name” attribute in a bed file) to use for the identifier in the outputs.
The toggles are on the tool form. You can decide which to use. Maybe test them out to see the exact format differences.
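To make the idea concrete, here is a rough Python sketch of using the 4th column as the sequence identifier. It is not how bedtools getfasta is implemented, and it ignores strand; the function name `extract_named_fasta` and the tiny in-memory genome are assumptions for the example.

```python
def extract_named_fasta(bed_lines, genome):
    """Yield FASTA records whose header is the BED name column (column 4)."""
    for line in bed_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and headers
        chrom, start, end, name = line.rstrip("\n").split("\t")[:4]
        # BED coordinates are 0-based and half-open, which matches Python slicing.
        sequence = genome[chrom][int(start):int(end)]
        yield f">{name}\n{sequence}"


# Hypothetical data: one short chromosome and one named region.
genome = {"chr1": "ACGTACGTAC"}
bed = ["chr1\t2\t6\tgeneA\t0\t+\n"]
fasta_records = list(extract_named_fasta(bed, genome))
```

If the name column holds `region_01` instead of the gene name, that is exactly what ends up in the FASTA header, so the name has to be right in the BED file before this step.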
Does this help? And glad this tool is working for you!
Hi @jennaj
Sadly not! I am using that toggle, but unfortunately my gene names get swopped out for region_01, region_02 etc. when my interval file is converted to BED. I’m not sure why this happens, the gene names are definitely in the fourth column of my interval file.
The tool is great and gives me what I need but I have to create a crosslink file to get my gene names back which is a bit clunky.
Kate
Thanks for explaining more @kate2
I’m wondering how you are doing this:
But then I reviewed your screenshot again! It seems the Get Flanks tool is involved not just getfasta. And the datatype is being converted. We can’t see the exact details for these steps yet.
There are other ways to update an interval format to bed format. One way is cutting out the 6 columns (or 3, or 4, or 5!), then just reassigning the datatype. Empty columns can be filled with default values using direct text manipulation tools.
Your interval file in dataset 7 is missing the 5th column (score). Adding in that column would allow the direct reassignment of the datatype. Why? With the auto-converter, some information can be lost when the tool attempts to guess and pad these columns out for you. The tools will apply that auto-converter, too, if you input a “bed-like” coordinate file.
Try standardizing your initial bed input yourself first to have more control over how these tools are interpreting your data. The score column is any number between 0-1000. Using 0 is common for “undefined” values like yours.
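As a sketch of that standardization outside Galaxy, the snippet below inserts a score of 0 as the 5th column. It assumes a hypothetical 5-column interval layout (chrom, start, end, name, strand); your file may be laid out differently, so check the column order first. The function name `interval_to_bed6` is made up for the example.

```python
def interval_to_bed6(line, default_score="0"):
    """Insert a score column so a 5-column interval line becomes BED6.

    Assumed input columns (hypothetical): chrom, start, end, name, strand.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-separated columns, got {len(fields)}")
    chrom, start, end, name, strand = fields
    # Score goes in column 5, between name and strand; 0 stands for "undefined".
    return "\t".join([chrom, start, end, name, default_score, strand])


bed_line = interval_to_bed6("chr1\t100\t200\tgeneA\t+\n")
```

The gene name stays in column 4 untouched, which is the point of doing the padding yourself rather than leaving it to the auto-converter.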
These steps can be put into a mini-workflow if you have a batch of files to process the same way. But first look at the steps I included in this shared history and I think this will be clearer. Everything is tagged.
https://usegalaxy.org/u/jen-galaxyproject/h/example-bedtools-getfasta
I used two simple text manipulation tools, then a direct datatype reassignment, to bring the interval data into strict bed specification. There are many other ways to do this (sed, awk, Jupyter, R), so please do what works best for you! A little workflow you “favorite” is easy to find and reuse: hide the intermediate outputs and consider it analogous to a custom tool.
BED format was originally defined by UCSC, so this is the best guide to follow, as it is the same that developers will use when designing tools. “Most” tools will respect the specification. → Genome Browser FAQ (format1, BED)
Then, INTERVAL is just like BED for the first three columns, then the rules are only that the data is tabular.
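If you want a quick sanity check before reassigning the datatype, something like the following could flag lines that are not strict 6-column BED. It is a minimal sketch covering only the column count, coordinates, the 0-1000 score range, and the strand symbols; it is not a full validator, and the function name `bed6_problems` is an assumption for the example.

```python
def bed6_problems(line):
    """Return a list of reasons a line is not strict 6-column BED (empty if none found)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        return [f"expected at least 6 columns, found {len(fields)}"]
    chrom, start, end, name, score, strand = fields[:6]
    problems = []
    if not (start.isdecimal() and end.isdecimal()):
        problems.append("start and end must be non-negative integers")
    elif int(start) > int(end):
        problems.append("start must not exceed end")
    if not (score.isdecimal() and 0 <= int(score) <= 1000):
        problems.append("score must be an integer from 0 to 1000")
    if strand not in {"+", "-", "."}:
        problems.append("strand must be '+', '-' or '.'")
    return problems


ok = bed6_problems("chr1\t100\t200\tgeneA\t0\t+\n")
missing_score = bed6_problems("chr1\t100\t200\tgeneA\t+\n")
```

A line that passes here can still be rejected by a particular tool, so treat an empty list as “no obvious problems” rather than a guarantee.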
Hope this helps again and apologies for not spotting the original issue for this one!
@jennaj many, many thanks for such a comprehensive answer. Interestingly, I found that when I saved the interval file from Notepad++ using a slightly different parameter (save as ‘all files’ with a .bed extension), the file loaded properly into Galaxy with the name header above the gene-id. This was then maintained throughout, and my resulting fasta file was suitably annotated at the top of each sequence with the correct associated gene-id.
Thanks again for your time, it really is much appreciated and every time I mess stuff up your answers teach me more!
Great @kate2 glad this all helped!
And yes, big picture: there are a few ways to adjust data. Labeling the data before Upload can help with the datatype assignment (to get the fields annotated correctly once in Galaxy). Sometimes adjusting the field assignments on the dataset’s Attributes page using the pencil icon can help too. These are not quite as reliable, so I wouldn’t recommend them as first-pass solutions, but you can certainly try!