MAKER options in Galaxy vs. command line, repeat masking

For the Maker genome annotation tool in Galaxy, in the Repeat Masking section, there are options for choosing a repeat library source.

The listed options are:
DFam (curated)
DFam (full version)
Custom library of repeats
Disable repeats

When using MAKER on the command line, here are the options for Repeat Masking:

model_org=all #select a model organism for RepBase masking in RepeatMasker
rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein= #provide a fasta file of transposable element proteins for RepeatRunner
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

How would I use a DFam database with the command-line version?

For example, on Galaxy I use a DFam curated database for arthropoda, which I can type as the repeat source species.
But if I repeat this on the command line, do I run RepeatMasker separately with the database I want, and then input this into Maker?

Hi @ssell

Maker runs RepeatMasker as part of the processing.
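
Since MAKER calls RepeatMasker itself, one thing to try on the command line is setting `model_org` in `maker_opts.ctl` to the same species name you type in Galaxy. Below is a minimal Python sketch of that edit. It rests on assumptions: the helper `set_ctl_option` is a hypothetical name written for this post, not part of MAKER, and `arthropoda` is simply the species from your question. Whether RepeatMasker then draws on a DFam library depends on how your local RepeatMasker was installed and configured, so please check that separately.

```python
def set_ctl_option(ctl_text: str, key: str, value: str) -> str:
    """Set `key=value` in MAKER control-file text, keeping any trailing #comment.

    Hypothetical helper for illustration; not part of MAKER itself.
    """
    updated_lines = []
    found = False
    for line in ctl_text.splitlines():
        name, sep, rest = line.partition("=")
        if sep and name.strip() == key:
            # Split off the inline comment so it can be re-attached unchanged.
            _, hash_mark, comment = rest.partition("#")
            line = f"{key}={value}" + (f" #{comment}" if hash_mark else "")
            found = True
        updated_lines.append(line)
    if not found:
        raise KeyError(f"option {key!r} not found in control file text")
    return "\n".join(updated_lines)


# Two lines copied from the Repeat Masking options quoted in the question.
example = (
    "model_org=all #select a model organism for RepBase masking in RepeatMasker\n"
    "rmlib= #provide an organism specific repeat library in fasta format for RepeatMasker\n"
)

# Assumption: "arthropoda" is a species name your RepeatMasker install accepts.
updated = set_ctl_option(example, "model_org", "arthropoda")
print(updated)
```

To apply this to a real file, you would read `maker_opts.ctl` into a string, pass it through the function, and write the result back. Editing the line by hand in a text editor does the same thing.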

If you are curious about how Galaxy wrapped the tool, details are here: Galaxy | Tool Shed

  • There is test data available at the development repository, and there are test cases within the tool’s XML file. You could load that data into a Galaxy history, then run simple jobs using the same parameters as the tests. Galaxy constructs a command line that is reported on the “Job Details” page for most results. You wouldn’t use that command line yourself, since it is specific to the server, but it may be informative.