RepeatMasker with Dfam H5 database

Hi,
I am trying to run RepeatMasker (RM) using a Dfam H5 database, but it always runs into an error.

I already tested with a custom repeat family file and it works fine, so the problem must be with the Dfam H5 database.

I uploaded the Dfam database directly into Galaxy and ran the tool using that as the database. The error output does not give any clear message.

Is there any tutorial you could suggest?

Many thanks
Lapo

Welcome, @Lapo_Ragionieri

Tutorials are linked at the bottom of tool forms, if that tool is included in any.

  • Tutorials for RepeatMasker: screen DNA sequences for interspersed repeats and low complexity regions → Galaxy Training!

You could also try using server-indexed databases. If the server where you are working doesn’t have any, you could run this step at a public server that does host some. Any of the UseGalaxy servers should have the Dfam curated indexes for model organisms.

Then for your specific error, we would need to see more about the job to help with troubleshooting it: inputs, outputs, parameters. My guess is that the job is exceeding the processing resources but we can try to confirm that. How to share your work is in the banner at this forum, or see here directly. → How to get faster help with your question

Let’s start there :slight_smile:

Dear Jennaj,
I have successfully run it using my own library produced with RepeatModeler, and with the available databases (e.g. Drosophila). The problem only occurs when I use the downloaded Dfam database. I can share my history so you can check directly; how can I share it with you or with the team?

Best
Lapo

Hi @Lapo_Ragionieri

This topic has the link to the FAQ that explains how to generate a shared history link. You can post that back here in your next reply.

The query can be something public if you are worried about your own data privacy (that said, very few people will want to analyze your data for you! :star_struck:). But the query doesn’t matter for what we are investigating – as long as that data doesn’t fail on its own, it will work for this. Maybe consider grabbing data from one of the tutorials for this tool? Then run that query against a database you know works, and then one that doesn’t, and share the history with both runs in it so I can compare and try to isolate what is going wrong.

My guess is that the job is too large to process on the cluster nodes, but we can investigate. Now, if a job is “too large” for the ORG server, you can try the EU server too, since it can sometimes allocate more resources. That is what I would suggest if this guess is backed up by your example error. So you could go ahead and run that as a test as well if you wanted to, then share both histories back if it is still a problem.
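One other quick thing you can rule out before re-uploading anything: a Dfam download that was truncated, or that silently saved an HTML error page instead of the real file, will make RepeatMasker fail with an unhelpful message. A minimal local sanity check (just a sketch; the `Dfam.h5` path is an example, not your actual dataset name) is to confirm the file starts with the standard HDF5 magic bytes:

```python
# Sanity-check that a downloaded Dfam .h5 file is really an HDF5 container
# and not a truncated or HTML error-page download. (The path "Dfam.h5" in
# the demo below is hypothetical -- substitute your own file.)
from pathlib import Path

# Every valid HDF5 file begins with these 8 signature bytes.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

def looks_like_hdf5(path):
    """Return True if the file exists and starts with the HDF5 magic bytes."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size < len(HDF5_SIGNATURE):
        return False
    with p.open("rb") as fh:
        return fh.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE

if __name__ == "__main__":
    import tempfile, os
    # Demo with a fake "download" that is actually an error page:
    with tempfile.NamedTemporaryFile(delete=False, suffix=".h5") as tmp:
        tmp.write(b"<html>404 not found</html>")
    print(looks_like_hdf5(tmp.name))  # prints False: not a real HDF5 file
    os.unlink(tmp.name)
```

If the check fails, re-download the file before doing anything else in Galaxy; if it passes, the file itself is at least structurally an HDF5 container and the resource-limit theory above becomes more likely.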

Thanks! We’d like to figure out exactly what is going wrong. This has come up before, but it didn’t get far enough along to trap the problem, and I can’t remember those details anymore. So, I’ll watch for your reply!