Download genome into Galaxy

Dear All,

Briefly, I would like to download a species genome (https://www.ncbi.nlm.nih.gov/Traces/wgs/CBVK01?display=contigs) into Galaxy.

However, the fasta of the full genome of the species is in many pieces (32). I have tried Getdata/Collection/Paste-Fetch data. There I pasted all 32 URLs (links) to the files.
Afterwards, there was an option to build a list, so I created the list. Now I have no idea what to do. How can I reconstruct the full file, or one file with all the sequences? I mean, how do I get the full genome of the species?

I’m new to Galaxy and I’m not a programmer.

Please advise.

Best, Thend

This is where the genome fasta and reference annotation are located: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/067/695/GCA_900067695.1_Pabies01

Dear Jennaj,

I really appreciate the help, it is much easier this way. I will try to download from that link. May I kindly ask two more questions?

How did you find this link? Where did you hear about it?

Also, is Galaxy able to handle .gz compressed files? Do I need to uncompress them?

And one more: does fna always mean fasta format? How do you know which file to choose?

Best, Thend

The links are on the genome assembly page at NCBI.

The Upload tool supports .gz compressed data. However, some data will be uncompressed upon upload so that it is available to tools. Use “autodetect” with the Upload tool to avoid problems such as mismatched datatypes.

Yes, fna is a variant of fasta: it designates a nucleotide fasta sequence. Galaxy will assign the datatype fasta. The reference annotation is named with gff (but is actually in gff3 format), and Galaxy will assign the datatype gff3. The README docs at this level and higher up explain the content at this site.
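If you ever want to check a downloaded file outside of Galaxy, here is a minimal Python sketch (not part of Galaxy, and not required for the Upload tool). It assumes only two general facts: gzip files begin with the bytes 0x1f 0x8b, and every fasta record begins with a ">" header line. The function names `open_text` and `count_fasta_records` are made up for this illustration.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = Path(path)
    # Gzip files start with the two magic bytes 0x1f 0x8b.
    with path.open("rb") as handle:
        is_gzip = handle.read(2) == b"\x1f\x8b"
    if is_gzip:
        return gzip.open(path, "rt")
    return path.open("rt")


def count_fasta_records(path):
    """Count sequences in a fasta file: each record starts with a '>' line."""
    with open_text(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

Calling `count_fasta_records("genome.fna.gz")` (a placeholder file name) on the single genome file should report how many sequences it contains, which is a quick way to confirm that one file holds the whole assembly rather than one piece of it.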

Format help FAQs are here: https://galaxyproject.org/support/#getting-inputs-right

Thanks!

Dear Jennaj,

Thank you very much for the detailed help with all my questions.

Best, Thend
