[Bug] Duplication of Datasets After Collection Zip

Upon zipping two datasets into a collection, the datasets are becoming duplicated in my history.

E.g.:

History before zip operation:

1: Data_A
2: Data_B

History after zip operation:

1: Data_A
1: Data_A
2: Data_B
2: Data_B
3: data 1 and data 2 (zipped)

Refreshing the history or the entire webpage does not make the duplicates go away. The duplicates also appear in all of the data selection menus when trying to do further analysis.

Thanks for reporting the problem. I’m running some tests to see if I can reproduce.

For some datatypes, hidden datasets are expected and are given the same dataset number. Did you unhide any of your data? It can be hidden again if so – most hidden data is not intended to be exposed or the data select menu will get cluttered.

Meanwhile, what Galaxy server are you working at? Will help with scope/troubleshooting.

  • Galaxy Main https://usegalaxy.org?
  • Some other public server (which? URL)?
  • Or in your own Galaxy (what version)?

What kind of data is in the collection?

  • Fastq - paired-end (collection pair), single end (collection list). Compressed or not?
  • BAM results from mapping?
  • Other?

I am working in a private Galaxy instance on version 4.4.0.

I did not hide any data before calling the zip method.

The data are paired-end fastq files from the tutorial here. I imported each file separately, and then used the zip command to combine the forward and reverse read files into a collection (i.e., Type1_rep1_forward with Type1_rep1_reverse).

I realized after posting this that zip might not be intended to run on individual data files, which could be the source of this bug.

Ok, this helps. That’s an interesting usage and, as far as I know, not the intended input. Create the collections first before using collection operations tools.
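To illustrate which datasets belong together in a paired collection, here is a minimal Python sketch that groups dataset names into forward/reverse pairs. It assumes the naming pattern from the post above (Type1_rep1_forward with Type1_rep1_reverse). The function name `pair_reads` is hypothetical, and the code does not call Galaxy at all; it only shows the grouping you would then reproduce when building the collection in the history.

```python
import re
from collections import defaultdict

# Assumed naming convention (taken from the example names in this thread):
# "<sample>_forward" and "<sample>_reverse".
NAME_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<direction>forward|reverse)$")


def pair_reads(names):
    """Group dataset names into {sample: {"forward": name, "reverse": name}}.

    Raises ValueError if a name does not match the assumed convention,
    if a read direction appears twice for one sample, or if a sample is
    missing one of its two reads.
    """
    pairs = defaultdict(dict)
    for name in names:
        match = NAME_PATTERN.match(name)
        if match is None:
            raise ValueError(f"cannot tell read direction from name: {name!r}")
        sample = match["sample"]
        direction = match["direction"]
        if direction in pairs[sample]:
            raise ValueError(f"duplicate {direction} read for sample {sample!r}")
        pairs[sample][direction] = name

    incomplete = sorted(s for s, reads in pairs.items() if len(reads) != 2)
    if incomplete:
        raise ValueError(f"samples missing a forward or reverse read: {incomplete}")
    return dict(pairs)


if __name__ == "__main__":
    history = ["Type1_rep1_forward", "Type1_rep1_reverse"]
    for sample, reads in pair_reads(history).items():
        print(sample, "->", reads["forward"], "+", reads["reverse"])
```

Each resulting pair corresponds to one element of a paired collection; a sample with only one read is reported as an error instead of being silently dropped.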

Collection tools may need a tune-up to prevent this from happening. I’ll run through your use-case and run it by the developers. I’ll link back a ticket if one is created. Thanks for reporting this and sending details to reproduce!

More Collection help

Beginning tutorial for creating a collection: https://galaxyproject.org/learn/

  • Dataset collections - modern studies usually include many samples. Collections are designed to simplify complex, multi-sample analyses as shown in this tutorial.

With advanced functions covered in these tutorials: https://galaxyproject.github.io/training-material/

I sent you the link to the alternative RNA-seq tutorial that uses collections in this post, which hopefully helped:

@jennaj Sorry to post this here, but something weird has happened and all of my posts and messages have been hidden. I can’t figure out how to message any of the staff/admins for help, could you point me in the right direction?

I noticed that and we discussed internally, not sure why. Plus wasn’t sure if you did it.

We’ll fix it. Apologies, we are all still learning the new site’s functions!

@CWunder fixed!

Thanks for the tutorial link. I am a little confused, however. It seems that both use the same dataset and perform basically the same analysis, but the one from my earlier reply always selects the forward strand when available, whereas the one you linked always selects the reverse strand.

Given that the data is the same, it seems like they cannot both be correct, or am I missing some important nuance here?

@jennaj Thanks! So is there no direct message functionality?

I think it may have been due to something weird going on with embedding URLs in replies. I tried linking the site from my earlier reply along with the one you linked and I got a "You cannot post a link from that host" message. I hit submit again thinking it was a bug, and that’s when my account got locked and all of my posts were hidden. I also received a message that all my posts had been marked by the community as spam.

Seems like some kind of automated content filtering response from the site. This is especially weird to me as 1) I was trying to link to a Galaxy training page, so it shouldn’t have been blocked and 2) I was able to use the link in my original post. Maybe this has something to do with not allowing re-posting of the same link?

Edit: I originally thought I had accidentally pasted in a Unix path instead of a URL. However, when I tried to recreate the post, making sure I used the appropriate links, I still got the same error. So I do not believe that was the case, and I have deleted my earlier post mentioning this possibility.