Datasets visible in "History size" but not in history

Hey,

While generating new datasets within a history, I noticed that earlier datasets were disappearing. They are still visible through storage management and History Size.

Why are they disappearing, and how can I make them appear in the history again? They don’t seem to be deleted or hidden.

Thank you

Welcome @tl1

Let’s find your data!

I’ll list all the places to look in a history so you can check.

  1. Check your Active, Deleted, and Hidden tabs
  2. Review to see if any of the datasets had an implicit (automatic) format conversion.

The second one is invoked for very simple, direct data format changes. Examples include uncompressing the data (fasta.gz → fasta) or converting to a simpler format required by a tool (bam → bed).
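If you prefer to check the first item with a script instead of clicking through the tabs, here is a minimal sketch. It assumes each history item is a dictionary with `name`, `deleted`, and `visible` fields; that is my assumption about what the Galaxy API returns (for example through BioBlend's `show_history(history_id, contents=True)`), so please check it against your own server's output. The function name `group_by_state` and the example records are made up for illustration.

```python
from collections import defaultdict


def group_by_state(contents):
    """Sort history items into 'active', 'hidden', and 'deleted' buckets.

    Assumes each item is a dict with a 'name' key and optional boolean
    'deleted' and 'visible' keys (assumed field names, not verified here).
    """
    groups = defaultdict(list)
    for item in contents:
        if item.get("deleted", False):
            groups["deleted"].append(item["name"])
        elif not item.get("visible", True):
            groups["hidden"].append(item["name"])
        else:
            groups["active"].append(item["name"])
    return dict(groups)


# Made-up records, only to show the expected input shape.
example = [
    {"name": "genome.fasta.gz", "deleted": False, "visible": True},
    {"name": "old_alignment.bam", "deleted": True, "visible": True},
    {"name": "collection_element.bed", "deleted": False, "visible": False},
]

for state, names in group_by_state(example).items():
    print(state, names)
```

This only covers the Active, Deleted, and Hidden states. Whether implicitly converted datasets appear in such a listing is something I have not confirmed, so the nested view described below is still the place to look for those.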

Screenshots

Inputting a compressed fasta.gz dataset to a tool that processes uncompressed fasta will show an extra annotation in the input select area. This example shows that the new datatype used will be fasta, even though the original file is in the fasta.gz format.

As the tool processes, a special type of hidden dataset is created – the converted format – and this data is nested within the original dataset to distinguish it from hidden datasets that represent unique data, such as datasets inside of a collection.

To see what this data is, there will be a new arrow icon added to the dataset, along with a number (the count of how many converted items there are, since there could be several!).

To activate this full view of the data, click on the converted items icon. If the icon is not showing up (stale view), try toggling on the hidden tab first, or refresh your browser window. The icons will then display on the datasets, but you’ll still need to open the nested view by clicking on the icon.


Why do we do this? Minor changes like compressing/uncompressing data are something we can do for you! Then, to fully document what happened in the history, we capture all intermediate versions of the data involved. Even a slightly different format may matter, and we want you to have access to the actual inputs. This is part of our mission of complete reproducibility tracking for analysis projects.

Please give this a try in your own history to see if this finds the data! And if you need more help, we can help more here. Would you like to generate a share link to the history and post it back for review?

I’m an admin at UseGalaxy.org, so you could also let me know your public username (NOT your email address, since this is a public forum!) and the name of the history, and I can help to review that way as well.

Thanks! :slight_smile:

Update:

I found your history @tl1, and you do have these converted datasets inside of it. However, there are 10 instead of one for the alternative tabix format. I’d like to discuss this privately with you.

You may have uncovered a bug! As far as I know, there should only be one converted version of the tabix format.

Would you please write into the UseGalaxy.org administrative email at galaxy-bugs@lists.galaxyproject.org so we can discuss? I’m concerned about how so many converted versions were created and would like to share this with our developers privately for their technical review – with your permission.

I’m glad you wrote in and will watch for your email. Be sure to write in from the email address used for the account. I only reviewed one history so far but can help to review all of these and to get your account corrected. :hammer_and_wrench:

Thank you!

Just followed up by email.

Thank you!

Great, thank you @tl1 – I have your email and wanted to let you know I am still reviewing. More soon! :scientist:

Hi!

I am wondering if it’s best to start fresh with a new history and reuploaded data (after purging my server).

Will probably do it in a few hours, but waiting for your input since you were taking a look at it.

Thank you