[Solved] Get datasets from Docker mapping directory

Hello everyone.

A few months ago, I created a Galaxy instance for a colleague of mine using Docker on a virtual machine in our cloud.

After she was done with her analyses and I had explained to her how to download her results, I deleted the Docker container and the virtual machine. I kept the mapping directory “just in case”.

She recently asked me if I could restart the Galaxy instance since she had a problem when saving her datasets.

Now I am having trouble starting a new container using the same mapping directory.
I had to go inside the Docker container and manually launch the startup script to get more information.

I had the following error:

Exception: Your database has version ‘134’ but this code expects version ‘141’. Please backup your database and then migrate the database schema by running ‘sh manage_db.sh upgrade’.

So I backed up my database, upgraded PostgreSQL, re-imported the backed-up database and launched the startup script again.
Now I can access the Galaxy instance, but I cannot log in to the admin account (which my colleague used for her analyses; she didn’t bother creating a new account). There is no error message, but the “admin” tag doesn’t appear, and when I click on “user” it says “connected as” with nothing following.

I even tried to import the database backup into a “virgin” Docker container created from the same Galaxy Docker image, but there were no datasets in this new instance.

I know the datasets are there: there are a lot of files in mapping/galaxy-central/database/files/000/, but they are named dataset_XXX.dat. I don’t know where to find the real dataset names, the history names…

Is there a way to get back the datasets with their metadata?

Thank you for your help.


All the information you are seeking should be in Galaxy’s database. What database did you use in your configuration? Do you have it backed up from the time after your colleague’s analysis?


Hi Marten. Thank you for your message.

I don’t know how to get the information I need in the database. I would like to give my colleague either an archive with all her datasets (with the correct names and extensions) or (best) create a new Galaxy instance with all her work.
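Since the names only exist in the database, one way to build such an archive is to export, for each non-deleted dataset, its `dataset_id`, history name, `hid`, name and extension, and then copy each `dataset_<id>.dat` to a readable path. A minimal sketch, assuming the Galaxy 17.05 PostgreSQL schema (tables `history` and `history_dataset_association`), a role and database both named `galaxy`, and that it runs inside the container; `dump_mapping`, `safe_name` and `export_datasets` are hypothetical helpers, not part of Galaxy:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: give Galaxy dataset files readable names again.
# Assumes the Galaxy 17.05 schema and files named dataset_<dataset_id>.dat.

# Print one tab-separated line per non-deleted dataset:
# dataset_id, history name, hid, dataset name, extension
dump_mapping() {
  psql -U galaxy -d galaxy -At -F $'\t' -c "
    SELECT hda.dataset_id, h.name, hda.hid, hda.name, hda.extension
    FROM history_dataset_association hda
    JOIN history h ON h.id = hda.history_id
    WHERE NOT hda.deleted
    ORDER BY h.id, hda.hid;"
}

# Replace slashes and spaces, which are awkward in file names, with underscores.
safe_name() {
  printf '%s' "$1" | tr '/ ' '__'
}

# Copy each listed file to out_dir/<history>/<hid>_<name>.<ext>
export_datasets() {
  local files_dir=$1 mapping=$2 out_dir=$3
  local id history hid name ext src dest
  while IFS=$'\t' read -r id history hid name ext; do
    # Search the whole files/ tree instead of guessing the 000/ subdirectory.
    src=$(find "$files_dir" -type f -name "dataset_${id}.dat" | head -n 1)
    if [ -z "$src" ]; then
      echo "missing: dataset_${id}.dat" >&2
      continue
    fi
    dest="$out_dir/$(safe_name "$history")"
    mkdir -p "$dest"
    cp -p "$src" "$dest/${hid}_$(safe_name "$name").${ext}"
  done < "$mapping"
}
```

Inside the container this might be used as `dump_mapping > mapping.tsv` followed by `export_datasets /export/galaxy-central/database/files mapping.tsv /export/datasets_export` (the output path is only an example). Because `read` with a tab separator merges consecutive empty fields, a dataset without a name would shift columns, so it is worth glancing at the TSV first.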

I didn’t change anything in the database configuration.
My Galaxy instance is based on quay.io/shiltemann/galaxy-metagenomics, which is based on bgruening/galaxy-ngs-preprocessing:17.05, and this one is based on bgruening/galaxy-stable:17.05.

This is what I did to backup the database:
su - postgres
pg_dumpall > pg_backup.bak

Then I created a new Docker container based on the same image, copied the pg_backup.bak file into the appropriate directory and imported it (psql -f pg_backup.bak postgres), but I couldn’t see my datasets in this new instance.

UPDATE

I realized that I didn’t use the right Docker image…

So I just created a new container based on the right Docker image and linked to my mapping directory.
I connected to the container interactively to see what happens.

First, the PostgreSQL service was down. Starting it failed with the following error:

  • Error: The cluster is owned by user id 1000 which does not exist any more

So I ran the following commands and then restarted the PostgreSQL service (successfully):

apt-get -y --purge remove postgresql*
sudo rm -Rf /etc/postgresql/
sudo rm -Rf /etc/postgresql-common
sudo rm -Rf /var/lib/postgresql
userdel -r postgres
groupdel postgres
apt-get -y install postgresql-common postgresql-9.3 postgresql-contrib-9.3 postgresql-doc-9.3
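The error means that the numeric uid owning the cluster's data directory on the mapped volume has no account in the new container. Before purging PostgreSQL, it may be enough to check that uid and give the directory back to the container's `postgres` user. A minimal sketch; `owner_has_account` is a hypothetical helper, and `/export/postgresql/9.3/main` is an assumed location for the cluster in these images, so adjust it to wherever your mapping keeps it:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the uid owning a path has a passwd entry.
# Returns 2 if the path cannot be stat'ed.
owner_has_account() {
  local uid
  uid=$(stat -c '%u' "$1") || return 2
  getent passwd "$uid" > /dev/null
}

# Assumed location of the cluster on the mapped volume; adjust as needed.
PGDATA_DIR=/export/postgresql/9.3/main

# Only report when that directory actually exists.
if [ -d "$PGDATA_DIR" ]; then
  if owner_has_account "$PGDATA_DIR"; then
    echo "the owner of $PGDATA_DIR has an account"
  else
    echo "uid $(stat -c '%u' "$PGDATA_DIR") has no account in this container"
    echo "possible fix: chown -R postgres:postgres $PGDATA_DIR"
  fi
fi
```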

Then I got this error:

sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL: password authentication failed for user “galaxy”
FATAL: password authentication failed for user “galaxy”

I ran the following commands to recreate the galaxy user for PostgreSQL:

su - postgres
createuser -P -s galaxy

But I still get the same error.

This looks like your database password/connection string is incorrect. Check Galaxy’s configuration file (galaxy.yml).
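`createuser -P` asks for the new role's password interactively, and Galaxy can only log in if that password matches the one in its connection string. A minimal sketch for checking both sides; `parse_db_url` and `check_db_url` are hypothetical helpers, `psql` is given the same `postgresql://` URI Galaxy uses, and the `GALAXY_CONFIG_DATABASE_CONNECTION` variable name is an assumption about how these Docker images pass the string (verify with `printenv` in the container):

```shell
#!/usr/bin/env bash
# Hypothetical helper: split postgresql://user:password@host[:port]/db into fields.
parse_db_url() {
  local re='^postgresql://([^:/@]+)(:([^@]*))?@([^/:]+)(:([0-9]+))?/([^?]+)'
  if [[ $1 =~ $re ]]; then
    printf 'user=%s password=%s host=%s port=%s db=%s\n' \
      "${BASH_REMATCH[1]}" "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}" \
      "${BASH_REMATCH[6]:-5432}" "${BASH_REMATCH[7]}"
  else
    return 1
  fi
}

# Hypothetical helper: try to log in with exactly the URI Galaxy would use.
check_db_url() {
  psql -At -c 'SELECT 1;' "$1"
}

# Example usage inside the container (not run automatically):
#   printenv GALAXY_CONFIG_DATABASE_CONNECTION
#   parse_db_url 'postgresql://galaxy:galaxy@localhost/galaxy'
#   check_db_url 'postgresql://galaxy:galaxy@localhost/galaxy'
# If the login fails, align the role's password with the one in the URI:
#   su - postgres -c "psql -c \"ALTER USER galaxy WITH PASSWORD 'galaxy';\""
```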

I don’t have a galaxy.yml file (my instance is a bit old…), so I guess I should check the galaxy.ini file instead.

The database_connection line looks like this:

#database_connection = sqlite:///./database/universe.sqlite?isolation_level=IMMEDIATE

But this is exactly the same line in the config file of another (working) instance of Galaxy Docker. I’m a bit surprised to see sqlite here…

I’ve replaced it with

database_connection = postgresql://galaxy:galaxy@localhost/galaxy

But I still can’t start Galaxy. I got this:

postgresql: ERROR (abnormal termination)
[…]
==> /home/galaxy/logs/uwsgi.log <==
return self.dbapi.connect(*cargs, **cparams)
File "/galaxy_venv/local/lib/python2.7/site-packages/psycopg2/__init__.py", line 164, in connect
conn = _connect(dsn, connection_factory=connection_factory, async=async)
OperationalError: (psycopg2.OperationalError) could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
could not connect to server: Cannot assign requested address
Is the server running on host "localhost" (::1) and accepting
TCP/IP connections on port 5432?
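The traceback asks the right question: is anything accepting TCP connections on localhost:5432? `pg_isready`, which ships with PostgreSQL 9.3 and later, checks exactly that. A minimal sketch that polls it until the server answers; `wait_for` is a hypothetical retry helper and the 30-attempt limit is arbitrary:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to N times, one second apart,
# and succeed as soon as one attempt succeeds.
wait_for() {
  local tries=$1 i
  shift
  for ((i = 1; i <= tries; i++)); do
    if "$@"; then
      return 0
    fi
    # No need to sleep after the last failed attempt.
    if (( i < tries )); then
      sleep 1
    fi
  done
  return 1
}

# Example usage inside the container (not run automatically):
#   wait_for 30 pg_isready -h localhost -p 5432 && echo "PostgreSQL accepts TCP connections"
# If it never succeeds, check which addresses and port the server uses:
#   su - postgres -c "psql -At -c 'SHOW listen_addresses;' -c 'SHOW port;'"
```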

When running this

tail /var/log/postgresql/postgresql-9.3-main.log

I got the following line:

2019-07-16 08:56:15 UTC LOG: could not receive data from client: Connection reset by peer

The PostgreSQL service is running.

correct

This does not seem like a Postgres problem; it probably means your client application (Galaxy) shut down unexpectedly.

Given you are using galaxy.ini, you should probably be using the Paste server, not uWSGI. I would try that next.

From the docs:

if a Galaxy has a galaxy.ini file configured, it will continue to use Paste by default unless additional steps are taken by the administrator

What version of Galaxy are you at?

I’m not familiar with Paste or uWSGI. I didn’t change anything in Galaxy’s internal architecture or configuration.
My instance is based on Docker image bgruening/galaxy-stable:17.05.

But I just found a way to solve my problems. Here is what I did:

  1. Create a Docker container based on the same Docker image I used previously,
    mapped to the same mapping directory I used previously, which contains
    my datasets.

sudo docker run -d -p XXXX:80 -v /my/old/mapping/dir/:/export/ --name galaxy myGalaxyImage
sudo docker exec -it galaxy /bin/bash

  1. Back up the PostgreSQL database (inside the Docker container).

su - galaxy
pg_dump -U galaxy -W -F t galaxy > dump_galaxy.tar
exit # exit from galaxy user
cp /home/galaxy/dump_galaxy.tar /export/galaxy-central/
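Before relying on the dump, it may be worth checking that it really contains the tables holding users, histories and dataset metadata. `pg_restore --list` prints a tar-format archive's table of contents without touching any database. A minimal sketch; `dump_has_tables` is a hypothetical helper, and the table names assume the Galaxy 17.05 schema:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if every named table has a
# "TABLE DATA public <table>" entry in the archive's table of contents.
dump_has_tables() {
  local dump=$1 toc t
  shift
  toc=$(pg_restore --list "$dump") || return 2
  for t in "$@"; do
    # Trailing space avoids matching e.g. "history" inside "history_dataset_association".
    if ! grep -q " TABLE DATA public $t " <<< "$toc"; then
      echo "no data for table $t in $dump" >&2
      return 1
    fi
  done
}

# Example usage (not run automatically):
#   dump_has_tables /home/galaxy/dump_galaxy.tar galaxy_user history dataset history_dataset_association
```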

  1. Create a new Docker container, again based on the same Docker image,
    but mapped to a new, empty mapping directory.

sudo docker run -d -p XXXX:80 -v /my/new/mapping/dir/:/export/ --name galaxy_clean myGalaxyImage

  1. Copy the backed-up database from the “old” mapping directory to the “new” mapping directory.

sudo cp /my/old/mapping/dir/galaxy-central/dump_galaxy.tar /my/new/mapping/dir/galaxy-central/

  1. Connect to the “new” Docker container.

sudo docker exec -it galaxy_clean /bin/bash

  1. Restore the backed-up PostgreSQL database (inside the “new” Docker container).

cp /export/galaxy-central/dump_galaxy.tar /home/galaxy
su - galaxy
pg_restore -d galaxy dump_galaxy.tar -c -U galaxy
exit # exit from galaxy user
exit # exit from Docker container

  1. Copy the dataset files (stored in /export/galaxy-central/database/) from the “old”
    mapping directory to the “new” one.

sudo tar czf /my/old/mapping/dir/galaxy-central/database.tar.gz -C /my/old/mapping/dir/galaxy-central database # store paths relative to galaxy-central
sudo cp /my/old/mapping/dir/galaxy-central/database.tar.gz /my/new/mapping/dir/galaxy-central/database.tar.gz
sudo mv /my/new/mapping/dir/galaxy-central/database /my/new/mapping/dir/galaxy-central/database_bak
sudo tar xzf /my/new/mapping/dir/galaxy-central/database.tar.gz -C /my/new/mapping/dir/galaxy-central # extract into the new galaxy-central, not the current directory

  1. Restart the “new” Docker container.

sudo docker restart galaxy_clean
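Once the new container is running, a cheap sanity check is to compare how many `dataset_*.dat` files each tree holds. A minimal sketch; `count_datasets` is a hypothetical helper, and `sudo` may be needed to read the mapped directories:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count Galaxy dataset files under a galaxy-central directory.
count_datasets() {
  # tr strips the padding some wc implementations add.
  find "$1/database/files" -type f -name 'dataset_*.dat' | wc -l | tr -d ' '
}

# Example usage on the host (not run automatically):
#   old=$(count_datasets /my/old/mapping/dir/galaxy-central)
#   new=$(count_datasets /my/new/mapping/dir/galaxy-central)
#   [ "$old" -eq "$new" ] && echo "all $old dataset files are in the new mapping directory"
```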

I don’t know if what I did is really clean, but at least it allowed me to get my datasets back from the mapping directory alone.
Maybe this can help other people as well.