I am working on the tutorial Hands-on: Pathogen detection from (direct Nanopore) sequencing data using Galaxy - Foodborne Edition (Microbiome). In the Quality Control and Preprocessing section, Galaxy doesn’t let me select the file I created from a single fastq.gz sample with the “Build dataset list” option as the input for quality control. I’m not sure whether Galaxy’s features have changed since this tutorial was written. Can anyone help with this, please?
Hi @Ahmadr215
This should all work as explained in the tutorial. Would you like to post back a screenshot of what you are seeing when stuck? A list collection can contain just one dataset.
Thanks and we can follow up until this is working for you!
My guess is that you are either not clicking on the dataset to select it, or not using the Select all button to choose the datasets. Even with just one sample, you’ll need to do this.
That said, I think the collection builder should not allow the Create list button to be used when no datasets are selected; it should block with a warning instead. I’ll report that! Good catch! We are working to improve this function right now, and it seems this case was missed.
Please give this another try, and be sure to select your dataset before clicking that final button to create the collection.
More soon, I’m making a ticket.
Update: this is now working for me. The builder creates the list even without selecting anything, using whatever datasets were originally included. However, you might need to click the history refresh icon for the collection to appear in your history. The server is probably just busy.
Please try again and let us know if this helps. If you are still having problems, screenshots are still the best way for us to look into your issue.
Dear Jenna J
Thanks for your response, much appreciated!
I did select both fastq files (Barcodes 10 and 11), as you suggested in your post. But the collection (named “list” here) is not listed when I open the FastQC tool. I can select the individual files, but not the combined collection (it does not appear in the FastQC dropdown). I also noticed that as soon as I click on the “list” file, it disappears from the active files, and I have to click “show active” to see the combined file again.
I can’t even create a tag name for the “list” file. I deleted the uploaded files and recreated the collection several times (see the bin), starting fresh each time, but it made no difference.
I closed my Galaxy session and then started again, but it made no difference either. I also clicked , but I still had no luck.
I assume I can proceed with FastQC on the individual files, but I would like to learn how to work with combined files.
I am still not sure what I am doing wrong here. I would appreciate any help.
BTW, how can I empty my bin of past files that I don’t want to keep anymore?
Hi @Ahmadr215, thanks for the screenshots. I think I see what is going on. I’ll explain, and then we can follow up more.
To choose a collection folder as an input, you need to toggle the input area so that it looks for a collection folder in the active history. Do that by clicking on the folder icon.
Here is a screenshot showing where to click.
Notice that this screenshot also has the accepted formats listed for this input area. When the shape of the data (individual dataset, or a collection: list or paired) and the format datatype all match up, and the data is “active”, that data becomes available to the tool form. These select toggles are a type of filter against the history to make sure the tool can interpret the input data chosen.
Your collection in dataset 40 has a datatype accepted by this tool, so you should be able to select it from the history shown in your screenshot.
Now, if you want to work with the datasets inside the collection individually again, you can unhide them, and then tools will “see” them as potential inputs. The nested datasets are hidden to keep the history organized. That is one good reason to use collections: when you use a collection folder to start a batch of jobs, the inputs and outputs are sorted into folders. With two datasets this is not that important, but when there are hundreds or thousands of datasets per collection folder, this nesting becomes really important.
Tags can be applied to individual datasets before adding them to a collection, or you can add them with the Apply Rules tool. You can also add tags to a collection using the pencil icon on the folder. And there are even more manipulations!
Some tutorials to explain more about tags:
- All → GTN Materials Search
- Start here → Hands-on: Name tags for following complex histories (Using Galaxy and Managing your Data)
- Or here → Hands-on: Group tags for complex experimental designs (Using Galaxy and Managing your Data)
Hope this helps! Please give it a try, or maybe you have already figured out how to do this?
Hi Jenna
Bingo! It worked. Thanks for the tips.
I am pretty new to genomics and have found Galaxy more user-friendly than other packages for learning pathogen genomics. But there are lots of tools that I need to learn before I can work independently on my own data. I am an epidemiologist trying to learn more about pathogen genomics.
You have been very helpful with your comprehensive responses! I can’t thank you enough for being so patient with my questions. I will look at the links that you provided in your post.
Great, glad this is working now! Any questions, you can always come back to this forum.