Download genome into Galaxy

Dear All,

Briefly, I would like to download a species genome (CBVK0000000000.1 Picea abies :: NCBI) into Galaxy.

However, the full genome FASTA for this species is split into many pieces (32). I tried Get Data > Collection > Paste/Fetch data and pasted in the URLs (links) to all 32 files.
Afterward, there was an option to build a list, so I created the list. Now I have no idea what to do next. How can I reconstruct the full file, or one file with all the sequences? In other words, how do I get the full genome of the species?
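Reconstructing "one file with all sequences" from the pieces amounts to concatenating the FASTA parts in order. The sketch below shows that idea outside Galaxy, assuming the pieces are plain or gzip-compressed FASTA files already on disk; the function names and the `piece_NN.fna.gz` file names in the usage line are hypothetical, not something NCBI or Galaxy provides.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed (.gz) FASTA file for reading as text."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "r")


def concatenate_fasta(parts, out_path):
    """Write every record from `parts`, in the given order, into one FASTA file.

    Returns the number of records (header lines starting with '>') written.
    """
    n_records = 0
    with open(out_path, "w") as out:
        for part in parts:
            with open_text(part) as handle:
                for line in handle:
                    if not line.strip():
                        continue  # drop blank lines between pieces
                    if not line.endswith("\n"):
                        line += "\n"  # guard against a piece with no final newline
                    if line.startswith(">"):
                        n_records += 1
                    out.write(line)
    return n_records
```

Example (hypothetical file names): `concatenate_fasta([f"piece_{i:02d}.fna.gz" for i in range(1, 33)], "Picea_abies_genome.fasta")`. Keeping the pieces in their original order matters only for readability; each FASTA record stays self-contained either way.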

I’m new to Galaxy and I’m not a programmer.

Please advise.

Best, Thend

This is where the genome fasta and reference annotation is located: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/067/695/GCA_900067695.1_Pabies01
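The directory above follows NCBI's usual layout for assemblies: the accession digits split into groups of three, then a folder named `<accession>_<assembly name>`. The sketch below rebuilds that path and the file names inside it, assuming NCBI's common `_genomic.fna.gz` / `_genomic.gff.gz` naming; check the actual directory listing (and its README) to confirm which files exist for a given assembly. The function name is hypothetical.

```python
def ncbi_assembly_urls(accession, assembly_name):
    """Build the NCBI genomes/all directory and (assumed) file URLs for an assembly.

    accession: e.g. "GCA_900067695.1"; assembly_name: e.g. "Pabies01".
    """
    prefix, versioned = accession.split("_", 1)  # "GCA", "900067695.1"
    number = versioned.split(".")[0]             # "900067695"
    if len(number) != 9 or not number.isdigit():
        raise ValueError(f"unexpected accession format: {accession}")
    groups = [number[i:i + 3] for i in range(0, 9, 3)]  # ["900", "067", "695"]
    dir_name = f"{accession}_{assembly_name}"
    base = "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/" + "/".join([prefix] + groups) + "/" + dir_name
    return {
        "directory": base,
        # Assumed file names, following NCBI's usual <dir_name>_genomic.* convention.
        "genome_fasta": f"{base}/{dir_name}_genomic.fna.gz",
        "annotation_gff": f"{base}/{dir_name}_genomic.gff.gz",
    }
```

A single URL such as the `genome_fasta` one can then be pasted into the Upload tool's Paste/Fetch data box as one dataset, instead of 32 separate pieces.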

Dear Jennaj,

I really appreciate the help; it is much easier this way. I will try to download from that link. May I kindly ask two more questions?

How did you find this link? Where did you hear about it?

Also, is Galaxy able to handle .gz compressed files, or do I need to uncompress them first?

And one more: does .fna always mean FASTA format? How do you know which file to choose?

Best, Thend

The links are on the genome assembly page at NCBI.

The Upload tool supports .gz compressed data. However, some data will be uncompressed upon upload so that it is available to tools. Use “autodetect” as the datatype in the Upload tool to prevent problems and mismatched datatypes.
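To illustrate why detecting the format from content is more reliable than trusting a file name, here is a small sketch that recognizes gzip data by its two magic bytes (`1f 8b`) and decompresses transparently. This is only the general idea behind format sniffing, not Galaxy's actual autodetect code; the function names are hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def is_gzipped(path):
    """Return True if the file's content is gzip data, whatever its extension."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def read_text(path):
    """Return the file's text, decompressing it first if it is gzip data."""
    opener = gzip.open if is_gzipped(path) else open
    with opener(path, "rt") as handle:
        return handle.read()
```

Because the check reads the first bytes rather than the extension, a compressed file named without `.gz` is still recognized, which is the kind of mismatch that choosing autodetect helps avoid.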

Yes, fna is a sub-version of fasta data: it designates a nucleotide FASTA sequence, and Galaxy will assign the datatype fasta. The reference annotation file is named with gff (but is actually in gff3 format), and Galaxy will assign the datatype gff3. The README docs at this level and higher up explain the content at this site.
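The naming rules described here can be written down as a small lookup. The table below is a hypothetical helper built only from this answer (.fna and other FASTA extensions map to fasta; NCBI's .gff annotation maps to gff3), plus a rough check that a sequence uses only IUPAC nucleotide letters. It is not how Galaxy assigns datatypes.

```python
import os

# Hypothetical lookup based on the naming described above; not Galaxy's sniffer.
EXTENSION_TO_DATATYPE = {
    ".fna": "fasta",    # nucleotide FASTA
    ".fa": "fasta",
    ".fasta": "fasta",
    ".gff": "gff3",     # NCBI annotation files are GFF3 despite the .gff name
}

# IUPAC nucleotide codes, plus '-' for gaps.
NUCLEOTIDE_CHARS = set("ACGTUNRYKMSWBDHV-")


def guess_datatype(filename):
    """Guess a Galaxy-style datatype from a file name, ignoring a trailing .gz."""
    name = filename.lower()
    if name.endswith(".gz"):
        name = name[:-3]
    return EXTENSION_TO_DATATYPE.get(os.path.splitext(name)[1])


def looks_like_nucleotide(sequence):
    """Rough check that a sequence contains only nucleotide (IUPAC) letters."""
    return set(sequence.upper()) <= NUCLEOTIDE_CHARS
```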

Format help FAQs are here: Galaxy Support - Galaxy Community Hub

Thanks!

Dear Jennaj,

Thank you very much for the detailed help for all questions.

Best, Thend

@jennaj, I am analyzing maize RNA-seq data in Galaxy, but there is no built-in maize reference genome, and I can't download it from NCBI because of its huge size. Can the Galaxy team add the Zea mays reference genome to Galaxy?

@Ngmmahi_Singh on usegalaxy.eu we have those two genomes installed:

  • Zea_mays_B73_AGP_v4.0
  • Zea_mays_PH207_v1.0
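To check which genome builds a public server knows about, a sketch like the one below could query the server's genome list and filter it by name. It assumes the `/api/genomes` endpoint returns JSON as a list of `[display_name, dbkey]` pairs; the helper names are hypothetical. Note that a build appearing in this list does not guarantee that sequence or indexes are installed for every tool, so the tool form itself remains the final check.

```python
import json
from urllib.request import urlopen


def fetch_genomes(server="https://usegalaxy.eu"):
    """Download the server's genome list (assumed: JSON list of [display_name, dbkey])."""
    with urlopen(f"{server}/api/genomes") as response:
        return json.load(response)


def find_genomes(entries, term):
    """Return the dbkeys whose display name or key contains `term` (case-insensitive)."""
    term = term.lower()
    return [key for name, key in entries if term in name.lower() or term in key.lower()]
```

Usage (requires network access): `find_genomes(fetch_genomes(), "zea_mays")`.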

Ciao,
Bjoern

Cool, thanks for the information.

Can you download the mm10 genome? It isn’t showing up for me.

Hi @Gianna_Falco

The reference genome mm10 is already installed at all of the usegalaxy.* servers, and is indexed for most tools.

Some tools require that the input dataset has a database (genome build) assigned, and this can filter the options on the tool form. In your case, that means assigning the database/build metadata mm10 to the input. Try doing that and see if it resolves the problem. How-to: Metadata
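Assigning the database/build is normally done with the dataset's Edit attributes form. For anyone scripting it instead, the sketch below builds (but does not send) an HTTP request that sets a dataset's build. It assumes the Galaxy API accepts a PUT to `/api/histories/{history_id}/contents/{dataset_id}` with a `genome_build` field and an `x-api-key` header; check your server's API documentation before relying on it. The helper name and the example IDs are hypothetical.

```python
import json
from urllib.request import Request


def build_set_dbkey_request(server, api_key, history_id, dataset_id, dbkey="mm10"):
    """Build (but do not send) a PUT request setting a dataset's database/build.

    Assumed endpoint and payload; verify against your Galaxy server's API docs.
    """
    url = f"{server}/api/histories/{history_id}/contents/{dataset_id}"
    body = json.dumps({"genome_build": dbkey}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", "x-api-key": api_key},
    )
```

Sending it would be `urllib.request.urlopen(request)`; building the request separately makes it easy to inspect before anything changes on the server.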

If you are running your own Galaxy server or using a private Galaxy server, genomes are not added or indexed for tools by default. The genome will usually show up in the list of “databases” on the Upload or Edit attributes forms, but those entries are just references. The actual data needs to be added, and that can include genomes that are not already in the list of databases (when you add the data, the genome will be added to that list). These prior Q&A topics explain how to do this, along with some troubleshooting help. You need to be an administrator of the server to do this. If you are not, ask the administrator to do it and perhaps point them to the help.
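For administrators, one piece of "adding the data" is registering the genome FASTA in a reference data table. Data managers are usually the preferred route, but as an illustration, the sketch below formats one tab-separated entry for an `all_fasta.loc` file, assuming the column order `<unique_build_id> <dbkey> <display_name> <file_path>` shown in Galaxy's sample loc file. The helper name and example path are hypothetical, and Galaxy typically needs to reload its tool data tables (or be restarted) before new entries appear.

```python
def all_fasta_loc_line(build_id, dbkey, display_name, fasta_path):
    """Format one all_fasta.loc entry (assumed columns: id, dbkey, name, path)."""
    fields = [build_id, dbkey, display_name, fasta_path]
    if any("\t" in field or "\n" in field for field in fields):
        raise ValueError("fields must not contain tabs or newlines")
    return "\t".join(fields) + "\n"
```

Example (hypothetical path): `all_fasta_loc_line("mm10", "mm10", "Mouse (mm10)", "/data/genomes/mm10/mm10.fa")`.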

If that doesn’t help, please explain with more details:

  1. Where are you working? Give the URL if it is a public Galaxy server, or describe the setup if not.
  2. What tool are you using? Capture the full name and version from the top of the tool form and paste that back in your reply.

Let’s start there 🙂