Missing Pfam-A reference data in Galaxy PfamScan

Hi,

I’m trying to run PfamScan in Galaxy, but I see that there are no built-in reference datasets for Pfam. Specifically, I cannot find:

  • Pfam-A.hmm
  • Pfam-A.hmm.dat
  • Pfam-A.hmm Stockholm file

Do I need to manually download and upload these files to my history for PfamScan to work, or are these supposed to be pre-installed on the server?

I want to make sure I’m following the correct procedure. Thanks for your guidance!

Welcome @SIDDIQA_FATHIMA_A_AB

Yes, the reference data is supplied by the user at runtime when using PfamScan in Galaxy. This allows you to capture the specific versions you want to use.

An example is in this tutorial from the Galaxy Training Network → Hands-on: Genome-wide alternative splicing analysis (Transcriptomics)

Down in the Help section of tool forms, you’ll find short notes about how the underlying tool works plus links to external resources. Importantly, you’ll also find any special notes about using the tool form, followed by links to GTN tutorials. Not every tool has tutorial links yet, but many of the most common tools do!

The tutorial uses reference data trimmed down to just one chromosome, but the URLs it links to are a good clue about where to find the full data if you are not sure.

  • The files are in here → Index of /pub/databases/Pfam
  • Or you can start at the top here → InterPro. Notice how the file URLs (hover over them to see) point to the same place.
  • Copy the links and paste them into the Upload tool using all default settings! You can then adjust the datatype format assignments afterward if needed (not common, but I can’t remember for this specific data, so please ask if you get stuck!). → Getting Data into Galaxy
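If you’d rather script the link-gathering step, here is a minimal Python sketch. It assumes the files sit under a `current_release/` folder on the EBI FTP host behind that index page (the host URL and folder name are my assumption, so check the listing first). `pfam_urls` builds the links to paste into Upload, and `sniff_pfam_file` is a hypothetical helper that peeks at the first line of a local copy so you can tell which datatype to assign. Neither is part of Galaxy or PfamScan.

```python
import gzip

# Assumption: this is the host behind "Index of /pub/databases/Pfam",
# with the newest files in a current_release/ folder.
PFAM_BASE = "https://ftp.ebi.ac.uk/pub/databases/Pfam"

# The two files asked about above; on the FTP site they are gzip-compressed.
PFAM_FILES = ["Pfam-A.hmm.gz", "Pfam-A.hmm.dat.gz"]


def pfam_urls(release: str = "current_release") -> list[str]:
    """Build one download URL per file for the given release folder."""
    return [f"{PFAM_BASE}/{release}/{name}" for name in PFAM_FILES]


def sniff_pfam_file(path: str) -> str:
    """Guess what a downloaded Pfam file holds from its first line.

    Works on both plain and .gz copies (gzip is chosen by file extension).
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        first = handle.readline()
    if first.startswith("HMMER3"):
        # HMMER3 profile libraries start with a "HMMER3/..." version line.
        return "HMMER3 profile library (like Pfam-A.hmm)"
    if first.startswith("# STOCKHOLM"):
        # Stockholm-style files start with a "# STOCKHOLM 1.0" header.
        return "Stockholm-style annotation (like Pfam-A.hmm.dat)"
    return "unrecognised"
```

For example, `print("\n".join(pfam_urls()))` prints one URL per line, which is the layout I’d paste into the Upload tool’s Paste/Fetch box.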

Hope this helps get you oriented, and please let us know if it does or if you have any follow-up questions! :slight_smile: