I copied data files from one history to a new history for subsequent export of the latter history. I used drag and drop in the Side-by-Side view for copying 213 selected files from the original to the new history.
It appears that the order of data files in the original history gets reversed when copied to the new history. For instance, the latest data file in the original history (“614: Bowtie2 on data 594, data 460, and data 459_xxxx: mapping stats”) becomes the first file in the new history (“1: Bowtie2 on data 594, data 460, and data 459_xxxx: mapping stats”), the second-to-last becomes the second file, and so on. Basically, the complete history of files created during the analyses gets reversed. Potentially important data numbers in the data files in the new history thus become meaningless.
I wonder whether and how the original order of data files and the file numbers can be maintained when copying multiple files to a new history?
Dataset numbers cannot be preserved when moving data between histories, and have never worked that way. The number is a meta-index specific to that history. It is exposed as a convenience … and there are much better ways to track data (see below for more).
How it works:
The numbering starts over in a different history
When starting up a workflow with the options set to Send output to New History, any datasets that are inputs to that workflow are copied over into the new output history. The ordering of the datasets is roughly in the order consumed by the workflow.
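To make the numbering behavior concrete, here is a minimal illustrative model (this is not Galaxy's actual implementation, just a sketch of the idea that the number is a per-history index assigned in arrival order):

```python
# Illustrative model only: Galaxy's real code differs, but the numbering
# behavior is the same idea -- numbers are assigned per history, in the
# order datasets arrive, and copies get fresh numbers in the destination.

class History:
    """A toy history that assigns each incoming dataset the next free number."""
    def __init__(self):
        self.datasets = []  # list of (number, name)

    def add(self, name):
        number = len(self.datasets) + 1  # per-history counter
        self.datasets.append((number, name))
        return number

source = History()
for name in ["genome.fa", "reads.fq", "mapping stats"]:
    source.add(name)

# Copying into a new history restarts the numbering at 1,
# in whatever order the datasets are copied over.
dest = History()
for _number, name in source.datasets:
    dest.add(name)

print(source.datasets)
print(dest.datasets)  # same names, numbers reassigned from 1 in copy order
```

The source numbers are never carried along; only the names, contents, and tags travel with the datasets.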
I just tested the drag-n-drop to copy datasets between histories on the History Multiview form, and it preserved the order for me at UseGalaxy.org.
How to:
click on the Operations on Multiple Datasets checkbox in the “from” history
then check the box for each dataset to move
drag the batch over to the “to” history
Maybe you can also try it this way and let us know if that is a solution for you?
Another way to get rid of the deleted/purged datasets is the Copy History function.
There will be a pop-up where you can choose to only copy the active datasets.
If you are currently tracking files based on the dataset numbers, consider organizing your data in these ways instead.
Use Collection folders and Element Identifiers
The persistent “name” of a dataset file inside a collection folder is the Element Identifier
That identifier will flow to any downstream manipulations
You can extract/modify these with batch Collection Operations functions
Use Tags
These can be used with dataset files inside a collection folder, or with individual files.
For individual datasets not in a collection, click on the tag icon inside the expanded dataset. A tag starting with # will propagate to downstream results; leave off the # to annotate a dataset file in a way that will not propagate.
There are also two special types of tags to handle data in more sophisticated ways: name tags and group tags
Tags can be added/removed/modified in batches two ways:
New data not yet in Galaxy: Upload → Rule-Based
Data already in Galaxy: Collection Operations → Apply Rules
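To illustrate the propagation idea (a simplified model, not Galaxy code): a tag beginning with # is inherited by datasets derived from the tagged one, while plain tags stay on the original dataset.

```python
# Simplified sketch of name-tag propagation; the derive() helper here is
# hypothetical and only models the behavior described above.

def derive(parent_tags, new_name):
    """Only '#' name tags propagate to a derived dataset."""
    inherited = {t for t in parent_tags if t.startswith("#")}
    return {"name": new_name, "tags": inherited}

parent = {"name": "reads.fq", "tags": {"#sample1", "raw-data"}}
child = derive(parent["tags"], "reads_trimmed.fq")

print(child["tags"])  # the plain 'raw-data' tag did not propagate
```

This is why name tags are a more reliable sample label than dataset numbers: the label follows the data through every downstream step.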
With both Element Identifiers and Tags, you can use a Dataset Search to find all datasets associated with a particular sample quickly.
This type of search can be done on the regular history view (top of the panel)
or, you can search your entire account under Data → Datasets (let this finish loading the first time you use it!).
See the GTN tutorials for example searches. If anything is not clear, please ask more questions here.
I hope this helps, but please ask followup questions if you think more is going on.
Screenshots might help for this one to make sure we are both looking at the same function. Try to capture the full screen if possible.
Your question was categorized as UseGalaxy.org, so you could clarify the server too … maybe another server is running an older release and needs to be updated (?).
I had used the drag-and-drop method you suggested yesterday and again today.
I had selected all 237 active files in the from-history, then unselected 24 files, and dragged the resulting 213 files to the new, empty history displayed in the multi-view form.
The order of files is not preserved; it gets reversed. The most recent results file (number 780 in the from-history) is now file number 1 in the new history. The reference genome sequence and annotation files are now the top-most files, even though they were the first/oldest files in the from-history.
The Copy History method did preserve the order of files correctly. The most recent results file is still the most recent one in the copied history.
I think I will then have to delete unnecessary files from the copied history before exporting that history.
Would it be possible to add an option to the Copy History function so that only dataset files selected in the from-history get copied?
I probably should add even more tags and element identifiers to my datasets, and replace the numbers in dataset file names with more meaningful expressions.
For this, you can use Copy Datasets instead. Find the option in the gear icon inside the History panel, right above where the datasets are listed. This will present you with a listing of all datasets (select all, or just some), and you can select the target history or send the datasets to a new history you can name/create.