Unable to unzip file after downloading reference genome from NCBI

Hi all,

I have downloaded some zip files from NCBI for my reference genome, but I was not able to unzip them on my MacBook. Do you have any tips on how to address this issue? Thanks

Hi Santatra,

I’d be happy to help you troubleshoot this issue. Could you provide a little more information to help narrow down the problem? For example, do you get an error message when you try to unzip the file? Also, how are you trying to unzip it?

In the meantime, here are some things you can try:

Double-check that the file is the correct size and that the download from NCBI wasn’t interrupted.

Try to gunzip the file in the built-in macOS Terminal.
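
If you would rather script those two checks, here is a minimal Python sketch. It assumes the download is a gzip-compressed file (a name ending in .gz, like the _genomic.fna.gz files on the NCBI FTP site); a .zip archive would need Python's zipfile module instead. The function names and the expected_bytes parameter are made up for illustration, they are not part of any NCBI or Galaxy tool.

```python
import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_gzip(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def gunzip(path, expected_bytes=None):
    """Decompress a .gz file next to itself and return the new path.

    expected_bytes is optional (hypothetical parameter): the compressed
    size shown on the download page, used to spot an incomplete download.
    """
    src = Path(path)
    actual = src.stat().st_size
    if expected_bytes is not None and actual != expected_bytes:
        raise ValueError(
            f"{src.name}: {actual} bytes on disk, expected {expected_bytes}"
        )
    if not looks_like_gzip(src):
        raise ValueError(f"{src.name} does not look like a gzip file")
    # Drop the trailing .gz; fall back to a new name if there is none.
    if src.suffix == ".gz":
        dest = src.with_suffix("")
    else:
        dest = src.with_name(src.name + ".out")
    with gzip.open(src, "rb") as compressed, open(dest, "wb") as out:
        shutil.copyfileobj(compressed, out)  # copies in chunks, not all at once
    return dest
```

Calling gunzip on a file named like GCA_004027855.1_ChoDid_v1_BIUU_genomic.fna.gz would write the uncompressed .fna file into the same folder. If the download was cut short, the size check fails first, or Python's gzip module raises an error partway through decompression.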

Galaxy
Additionally, you could use the Galaxy Upload Data button to pull this data into Galaxy, which will allow you to unzip it and then download it locally. To pull it into Galaxy, select the “Paste/Fetch data” tab in the Upload Data interface.

Then paste the FTP link for the data you want to download from NCBI into the text box. The link will most likely look something like this: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/004/027/855/GCA_004027855.1_ChoDid_v1_BIUU/GCA_004027855.1_ChoDid_v1_BIUU_genomic.fna.gz

After you paste your link, Galaxy should download the file to your history.

To uncompress it you can follow these steps:

  • Click on the pencil icon for the dataset to edit its attributes
  • In the central panel, click on the Convert tab at the top
  • In the upper part, under Convert, select “Convert compressed to uncompressed”
  • Click the Create dataset button to start the conversion.

From there you can continue to work with the uncompressed file in Galaxy or download it locally to your computer.

Hi,
Thank you so much for your help.

When I try to unzip the file with Archive Utility on my Mac, it says “Unable to expand this file”. After that, I tried another unzip program, Keka, but it ended with “failed”. I think something is missing from the file I downloaded: the NCBI website lists the file size as 2 GB, but the zip file I downloaded is only 33 MB.
I also used my terminal to unzip the file, but that did not work either.

But now I have just tried pulling the data into Galaxy, and it works perfectly. I followed all the steps you mentioned, and now I have my reference genome.

Thank you very much.
Santatra

Great, thanks for letting us know what worked, @santatra!