Using one of the files from a composite dataset (e.g. pbed) without downloading it

Hi all,
I wonder if there is any way to use just one file from a composite dataset (e.g. pbed) without downloading it, so that I can use this file in an automated workflow.

Hi @Alex_Shap

I don’t think this is what you are asking about, but it is what we can help with just in case.

  1. Galaxy has a couple of file format conversion tools that accept pbed as an input. Searching the tool panel for that datatype will find them.
  2. Options to convert any datatype are also available under the :pencil2: → Edit Attributes → Datatype tab if the format conversion is a direct action, e.g. it doesn’t involve parameter choices or other datasets.
  3. If you are not sure what datatype(s) a tool accepts: start an empty history, then load the tool form. The expected datatypes are listed in each input section of the tool form.

I’m not sure what this part means. Are you trying to input the data to a tool in a Galaxy workflow? Or trying to make the data available by URL to input/visualize at some remote website? Or do you want to manipulate the data directly with other tools (text manipulations and similar)?

If the above does not help, would you please explain a bit more about what you are trying to do? I’m guessing that this is a follow-up to the prior question "Galaxy Plink doesn't accept pbed composite dataset when uploaded from local disk". I asked for more details there but did not receive them, and they would help greatly. Galaxy hosts 1000s of tools and 100s of datatypes, so we need to narrow down the use case a bit please :slight_smile:

Thank you very much for your answers,

  1. Since my previous topic has been closed (Galaxy Plink doesn’t accept pbed composite dataset when uploaded from local disk), I will reply here. I raised that issue with the developer of the Galaxy Plink tool on GitHub. He advised looking at the files in the working directory, and that is how we found the cause of the problem:
    The error occurs because Galaxy does not name the files in the composite dataset the way the Plink tool expects. When we upload a composite pbed dataset to Galaxy, we get the following files: “Composite Dataset.bim”, “Composite Dataset.bed”, “Composite Dataset.fam”. The Plink tool expects these names: “RgeneticsData.bim”, “RgeneticsData.bed”, “RgeneticsData.fam”.
    When we manually changed the file names to the expected ones on our server, Plink accepted the dataset and worked correctly. But there is no option to change file names in a composite dataset using Galaxy’s GUI, so ordinary users without admin rights are not able to work with a pbed composite dataset when it is uploaded from local disk.
  2. This is my reply about this topic. My aim is to build a workflow using plink. However, I have a couple of steps in my workflow where I must process .bim files directly as text files (.bim is a part of the pbed dataset). The Galaxy version of plink is somewhat restricted in functionality, so it cannot produce just the .bim file, only the whole dataset (there is no analogue of the command-line plink option --make-just-bim). So I wonder if there are any options to break a pbed dataset into separate files. It seems to me that I will have to make a custom tool to extract the .bim file from pbed.
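
For anyone hitting the same naming mismatch, here is a minimal Python sketch of the manual rename described in point 1. It assumes the three pbed files sit together in one directory that you can write to (on a Galaxy server this would be wherever the composite dataset's files are stored; that location is an assumption and is not confirmed in this thread). `rename_composite_members` is a hypothetical helper name, not part of Galaxy or Plink.

```python
from pathlib import Path

# Extensions of the three files that make up a pbed composite dataset.
PBED_EXTENSIONS = (".bed", ".bim", ".fam")


def rename_composite_members(directory, old_base, new_base,
                             extensions=PBED_EXTENSIONS):
    """Rename <old_base><ext> to <new_base><ext> for each extension.

    Checks that every member exists before renaming anything, so a
    missing file does not leave the dataset half renamed.
    Returns the list of new paths.
    """
    directory = Path(directory)
    pairs = [
        (directory / f"{old_base}{ext}", directory / f"{new_base}{ext}")
        for ext in extensions
    ]
    missing = [str(src) for src, _ in pairs if not src.is_file()]
    if missing:
        raise FileNotFoundError(f"missing composite members: {missing}")
    for src, dst in pairs:
        src.rename(dst)
    return [dst for _, dst in pairs]
```

With the names from this thread, the call would be `rename_composite_members(some_dir, "Composite Dataset", "RgeneticsData")`, where `some_dir` is a placeholder for the directory holding the files.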

Ok, thanks for explaining. All is very helpful.

I also found issue tickets about this, and one is yours? That is the best way to work with the developers. This forum is mostly about helping to solve tool usage issues for end users :slight_smile: Hopefully the details you have explained can be modeled differently in later updates.

cc @bernt-matthias

Thank you for the prompt replies.
The fastest way to solve my problems was to create custom Galaxy tools: 1) for renaming files in a composite dataset and 2) for extracting files from a composite dataset. It did not seem easy at first, but I succeeded in the end.
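
The custom tools themselves were not posted in this thread. As a rough illustration only, the core of an extraction step might look like the Python sketch below. It assumes the script is handed the directory that holds the composite dataset's files, and that the .bim file follows the usual six-column PLINK layout (chromosome, variant ID, genetic position, base-pair position, allele 1, allele 2). The names `extract_member` and `read_bim` are hypothetical; the Galaxy tool wrapper needed to run this inside a workflow is not shown.

```python
import shutil
from pathlib import Path


def extract_member(composite_dir, extension, destination):
    """Copy the single file ending in `extension` out of `composite_dir`."""
    matches = sorted(Path(composite_dir).glob(f"*{extension}"))
    if len(matches) != 1:
        raise ValueError(
            f"expected one {extension} file, found {len(matches)}"
        )
    shutil.copyfile(matches[0], destination)
    return Path(destination)


def read_bim(path):
    """Parse a .bim file into a list of dicts, one per variant.

    Fields are split on any whitespace and kept as strings.
    """
    fields = ("chrom", "variant_id", "cm", "bp", "allele1", "allele2")
    rows = []
    with open(path) as handle:
        for line in handle:
            parts = line.split()
            if parts:  # skip blank lines
                rows.append(dict(zip(fields, parts)))
    return rows
```

Once the .bim file is a standalone text file, it can be passed to ordinary text-processing steps in the same way as any other tabular dataset.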
