#Tags not propagated when building Dataset List

I just spent quite a bit of time manually adding multiple tags to each of 60 samples. The tags have a # in front of them, and I expected that this would cause the tags to stay with the samples in downstream analyses. However, when I grouped the 60 samples into a Dataset List, I noticed that none of the tags were carried over.

1) Why weren’t my #tags carried over to the samples when I grouped them into a dataset list?

My first step is to feed each of the 60 samples into Trimmomatic. If I don’t have a dataset list for input with this tool, then I will have to select each dataset individually by clicking.

2) Is there a way to simultaneously select multiple samples when building a Dataset List or when selecting “Multiple Datasets” in the input field with various tools (e.g. shift+click to highlight all samples, etc)?

Thanks,
Jake

Hi @JVGen!

Dataset “hashtags” are useful for many cases, but when using collections, “group” and “name” tags or “list identifiers” are a better option.

Please see the tutorials here for how to use these:

When creating a “Dataset List”, there is an “all/none” toggle. Is that what you are looking for?

[Screenshot: operations-multiple-datasets]

For “Multiple Datasets” input select fields, multiple datasets can definitely be selected at once. It depends on what OS you are using, but on Mac OSX, holding down “shift” will select everything between the first selection and the last. Holding down “command” will let you pick individual datasets that may not be listed in a contiguous block.
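To make the expected behavior concrete, here is a minimal Python sketch of the conventional selection rules described above. It is only a model for illustration, not Galaxy's actual code; the function name `click` and its arguments are hypothetical.

```python
def click(selected, anchor, index, modifier=None):
    """Model one click on row `index` of a list (hypothetical helper).

    selected: set of currently selected row indices
    anchor:   row of the last plain or command click, or None
    Returns (new_selection, new_anchor).
    """
    if modifier == "shift" and anchor is not None:
        # Shift: add every row between the anchor and the clicked row.
        lo, hi = sorted((anchor, index))
        return selected | set(range(lo, hi + 1)), anchor
    if modifier == "command":
        # Command: toggle just the clicked row, keep the rest.
        return selected ^ {index}, index
    # Plain click: the selection becomes the clicked row only.
    return {index}, index


sel, anchor = click(set(), None, 2)            # plain click on row 2
sel, anchor = click(sel, anchor, 6, "shift")   # shift+click on row 6
print(sorted(sel))
```

With these rules, a plain click followed by a shift+click selects the whole contiguous block, while command+click adds or removes single rows.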

Hope that helps!

Hi Jenna,

Thanks for writing. I was working with one large collection, but its samples have multiple different hashtags. The reason is that there are multiple ways to combine the samples (each sample belongs to multiple different groups).

I’m working on MacOS and using the Google Chrome browser. When working within the “Multiple Datasets” pop-up window, for some reason shift+click does not select all samples between the first and second selected. Instead, it acts as a command+click and just selects the first and second samples, but not those lying in-between.

If I could get the multiple-selection to work, then I could work directly with the samples without adding them to a list, and my hashtags would be retained. Any ideas?

Thanks!

Right, I also noticed that the pop-up version of multiple select is not working with the “shift” option (Mac OSX/Chrome) after writing you. This is a brand new function. I just reported it to our developers. This is something they will work on.

Just using the original multiple select will work though – it is the middle button here:

[Screenshot: multiple-indiv-dataset-input]

Hi Jenna,

That middle, “multiple datasets” button is actually what I’ve been using. After I click it, I see an input window and a folder to the right (see image). To select samples, I click the folder, and I am once again greeted with the pop-up window in which shift+click does not work for sample selection.

How are you adding samples after clicking “multiple datasets”?

Thanks,
Jake


Hi - The datasets that are appropriate for the expected input datatype, in your current history, will be listed in a window after clicking just on that middle button.

No need to click on the folder. That will currently list all datasets in the history. Some may be the correct datatype, and some will not be. If a dataset that does not match the expected input datatype is selected, you’ll see something like this on the tool form (I selected a vcf dataset in this example; notice that it is labeled as “(unavailable)”):

For your case, it appears that you do not have any fastqsanger or fastqsanger.gz datasets in your history. Fastq data, in the proper format, are required input for the tool Trimmomatic.

Here are FAQs if you’d like more help understanding datatypes. It is a bit of a laundry list :slight_smile: but check out the first one to start with for help with fastq data, then review the others as needed (for this tool now and for others later): Galaxy Support - Galaxy Community Hub

Thanks! Didn’t realize that Trimmomatic didn’t like fastq.gz files.

On this note, I am trying to use bedtools Intersect intervals to remove repetitive regions from my mapped BAM files. BAM files should be an accepted input, but these also aren’t listed when selecting “multiple datasets”. Only my .gff file is found, which contains the regions that I want removed from my BAM files. Maybe it is because my BAM files are within a dataset list, and the individual entries are not listed in my main history? I have to click on the list name to see the individual BAM files. I think this was done automatically as I started processing dataset lists earlier in my workflow…?

Make sure that your bam data is in the same active history that you are executing the tool from.

If yes, expand the bam dataset and check the assigned datatype. It should be just “bam”. If it is some bam variation, the data is either out of specification (missing a header?) or is not coordinate sorted.
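If you want to double-check the sort order outside of Galaxy, the SAM/BAM header records it in the `SO:` field of the `@HD` line (you can print the header with `samtools view -H`). Below is a minimal Python sketch that reads that field from header text. The function name `sort_order` and the example header are made up for illustration, and this is not how Galaxy itself assigns the datatype.

```python
def sort_order(header_text):
    """Return the SO value from the @HD header line, or None if absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
            return None
    return None


# Example header text (made up for illustration).
header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"
print(sort_order(header))
```

A value of `coordinate` is what you want to see. A missing `@HD` line or a value such as `unsorted` or `queryname` would match the "out of specification or not coordinate sorted" cases above.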

I just tested this at Galaxy Main https://usegalaxy.org and the tool selection is picking up bam data (and others that fit the expected input types that I happened to have in this history):

Oh, missed this part. Yes, a collection will not show up unless the input select is for a collection (last icon in that set of three).

If you just want to input one of those bams, use the toggle for “show hidden” at the top of the history panel. This reveals all of the component datasets in collections. You can drag and drop individual datasets over into the input select box. This would be one at a time since you are using collections in a novel way now.