Extract Genomic DNA expects two inputs. If the formatting or content is off for either, you will not get valid results.
Query: Genomic coordinates in
- The tool states interval format as a valid input, which is a less strict version of the bed datatype. Even so, the data must have bed column format/ordering for at least the first 3-6 columns (six columns if including strand).
- The tool also states gff format as a valid input, which is a less strict version of the gtf datatype. This means that the 9th column (attributes) does need to include the stricter minimum values (gene_id and transcript_id). Do not use a gff3 input; expect errors if you do.
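If you want to check the query file outside of Galaxy before running the tool, something like the rough sketch below could help. It is not what the tool itself does; the function names `check_bed_like` and `has_gtf_ids` are made up for this example, and the "start must be less than end" rule is my own assumption about what a usable interval looks like.

```python
import re

def check_bed_like(line):
    """Return a list of problems found in one tab-separated interval line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return ["fewer than 3 columns"]
    problems = []
    chrom, start, end = fields[0], fields[1], fields[2]
    if not chrom:
        problems.append("empty chromosome name")
    if not (start.isdecimal() and end.isdecimal()):
        problems.append("start/end are not non-negative integers")
    elif int(start) >= int(end):
        # Assumption: an interval needs a positive length to extract sequence.
        problems.append("start is not less than end")
    if len(fields) >= 6 and fields[5] not in ("+", "-", "."):
        problems.append("column 6 is not a strand (+, - or .)")
    return problems

def has_gtf_ids(line):
    """True if column 9 has both gene_id and transcript_id in key "value" form."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return False
    # gtf attributes look like: gene_id "G1"; transcript_id "T1";
    keys = set(re.findall(r'(\w+)\s+"', fields[8]))
    return {"gene_id", "transcript_id"} <= keys
```

Run `check_bed_like` over the first few lines of an interval/bed dataset and `has_gtf_ids` over the first few lines of a gff/gtf dataset. A gff3 line (attributes written as `ID=...;Parent=...`) will come back `False` from the second check.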
VCF to pgSnp is fine. The first four columns of data are in interval format, and if you use Cut to restrict to the first four columns, the data will then be in bed format. But using Get flanks will also restrict the output to be in interval format with bed column ordering.
Target: A locally-cached index on the server or a Custom genome in
- To use a “locally-cached” genome, that genome must be assigned as the “database” metadata to the query input.
- To use a Custom genome, you might need to run NormalizeFasta on your fasta to remove the description line content (data on the “>” title line after the first whitespace) and wrap the bases to a consistent length (80 is good). The tool will only be able to interpret data that is actually in fasta format, not tabular. If you need to, transform with Tabular-to-Fasta first.
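To make it clearer what that normalization step changes, here is a small sketch of the same idea in plain Python. This is only an illustration and not the NormalizeFasta tool's actual code; `normalize_fasta` is a hypothetical name, and it assumes the fasta fits comfortably in memory one sequence at a time.

```python
def normalize_fasta(lines, width=80):
    """Yield fasta lines with short title lines and bases wrapped to `width`."""
    seq = []

    def flush():
        # Emit the sequence collected so far, re-wrapped to a fixed width.
        joined = "".join(seq)
        seq.clear()
        for i in range(0, len(joined), width):
            yield joined[i:i + width]

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            yield from flush()
            # Keep only the identifier: drop everything after the first whitespace.
            parts = line[1:].split()
            yield ">" + (parts[0] if parts else "")
        else:
            seq.append(line)
    yield from flush()
```

For example, a title line like `>chr1 Homo sapiens chromosome 1` becomes `>chr1`, and the bases under it are rewritten as lines of at most 80 characters.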
Also, check your data for chromosome/identifier naming mismatches. Between the two inputs, the “chromosome” names must be formatted exactly the same, and the overall content must be based on the same reference genome/transcriptome version/build. This means that the identifiers in the first column of your interval/bed dataset must exactly match what is on the “>” title line of the fasta.
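A quick way to spot naming mismatches is to compare the two sets of identifiers directly. The sketch below is one possible approach, not part of any Galaxy tool; `unmatched_chromosomes` is a hypothetical helper, and it compares against the whole title line after the “>”, so leftover description text will show up as a mismatch too.

```python
def unmatched_chromosomes(interval_lines, fasta_lines):
    """Return sorted first-column names that have no identical fasta title."""
    # Everything after ">" on each title line, with trailing whitespace removed.
    titles = {
        line[1:].rstrip()
        for line in fasta_lines
        if line.startswith(">")
    }
    seen = set()
    for line in interval_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        seen.add(line.split("\t")[0].strip())
    return sorted(seen - titles)
```

An empty list means every name in the first column has an exact match. Names like `2` versus `chr2`, or a title line that still carries a description, will be listed and point to what needs fixing.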
The FAQs below have more details:
Hope that helps, but if not, share some more details. You could copy/paste the first few lines of both inputs and/or post back screenshots in a reply. Make sure to expand the datasets to show the currently assigned datatypes or state exactly what those are for each.