Do I need to convert a genomic.fna.gz file to FASTA for custom genomes? If so, how?

Hello,

I’ve tried to align RNA-seq reads to a custom genome, but I don’t see any alignments. Could it be because I uploaded genomic.fna.gz directly, without decompressing it and converting it to FASTA format?

https://galaxyproject.org/learn/custom-genomes/#:~:text=The%20data%20should%20be%20formatted%20as%20FASTA%20prior%20to%20upload%20into%20Galaxy

If so, how can I do that? When I extract the genomic.fna.gz file, I get an FNA file. Does it need to be converted further to FASTA format, and if so, how?

Are any other steps needed with Galaxy tools on the FASTA genome file before alignment, such as indexing?

Thank you!


Hi @ysrbrs

A custom genome fasta should be in uncompressed format. If the data is already loaded in compressed format, it can be uncompressed within Galaxy. Click on the “pencil” icon for the dataset to reach the Edit attributes forms. The second tab (convert) will list the option to uncompress. The result should be a new dataset with the datatype “fasta” assigned. Once done, you can purge the original compressed dataset to recover working space (quota).
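If you would rather decompress the file on your own computer before uploading, here is a minimal Python sketch of that alternative (it is not a Galaxy feature). It assumes the download is a gzip-compressed NCBI-style `genomic.fna.gz`; the `.fna` extension just marks a FASTA file of nucleotide sequences, so decompression alone should give you FASTA content. The function names and output filename are illustrative.

```python
import gzip
import shutil


def decompress_fna_gz(src_path: str, dst_path: str) -> None:
    """Stream-decompress a gzip-compressed FASTA (.fna.gz) into an uncompressed file."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # copyfileobj streams in chunks, so a large genome is never fully loaded into memory
        shutil.copyfileobj(src, dst)


def looks_like_fasta(path: str) -> bool:
    """Cheap sanity check: the first non-empty line of a FASTA file starts with '>'."""
    with open(path, "r") as fh:
        for line in fh:
            if line.strip():
                return line.startswith(">")
    return False
```

For example, `decompress_fna_gz("genomic.fna.gz", "genomic.fasta")` followed by `looks_like_fasta("genomic.fasta")` should return `True` for a normal genome download. Uploading the uncompressed file this way should give the same result as the "convert" tab described above.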

A fasta file in your history can be selected on most tool forms as a target or reference genome. If needed, the custom genome can be promoted to a custom build, and that new database added as an attribute to datasets (some intermediate analysis tools interpret the “database” metadata). If you are incorporating reference annotation in your analysis, make sure the genome and annotation are a match.

Before mapping, please review and apply the additional formatting requirements in the FAQs below to avoid problems.
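The FAQs themselves are not reproduced in this thread. As an assumption, the checks that often come up for custom genome FASTA files are header lines that carry only a simple identifier, no blank lines, and consistent line wrapping. The Python sketch below applies those three assumed rules. `normalize_fasta_lines` and the 60-character default width are illustrative choices, not documented Galaxy requirements, so compare the output against the FAQs before relying on it.

```python
from typing import Iterable, Iterator


def normalize_fasta_lines(lines: Iterable[str], width: int = 60) -> Iterator[str]:
    """Yield normalized FASTA lines (without trailing newlines).

    Assumed conventions (not taken from the linked FAQs):
    - a header keeps only the identifier, i.e. the text before the first whitespace
    - blank lines are dropped
    - sequence lines are rewrapped to a fixed width
    """
    seq_buffer = []

    def flush() -> Iterator[str]:
        # Join the buffered sequence for the current record and emit it in fixed-width chunks
        seq = "".join(seq_buffer)
        seq_buffer.clear()
        for i in range(0, len(seq), width):
            yield seq[i:i + width]

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # drop blank lines
        if line.startswith(">"):
            yield from flush()  # finish the previous record before starting a new one
            parts = line[1:].split()
            yield ">" + (parts[0] if parts else "")
        else:
            seq_buffer.append(line)
    yield from flush()  # emit the last record
```

Usage, with hypothetical filenames: open `genomic.fna` for reading and `genome_clean.fasta` for writing, then write `line + "\n"` for each `line` in `normalize_fasta_lines(src)`. Note that trimming headers changes the identifiers, so any annotation you use later has to match the trimmed names.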

Related Q&A here at Galaxy Help:

Best!


Thank you, I was able to change it to FASTA format!

I have another concern, though. Could you clarify what this line on the Custom Genomes page means: “Make sure the chromosome identifiers are a match for other inputs”?

What are the other inputs? I have just one FASTA file containing the entire genome, plus the RNA-seq data that will be aligned to that custom genome (the FASTA file).

I’m not really sure what I should compare against the identifiers obtained with Methods 1, 2, and 3 (https://galaxyproject.org/support/chrom-identifiers/). Does the FASTA file also contain chromosome identifiers?

The link above mentions BAM files, which are produced after mapping, but I think I’m supposed to do something before mapping?


Apparently, indexing is done automatically by the alignment tool: Upload Genome Index To Galaxy For Bowtie Alignment?

The idea is to upload all the inputs you plan to use in the same analysis and check that they match before starting; otherwise, you might need to start over from the mapping step.

Other data are sometimes incorporated in steps downstream of mapping. Mapping may go fine, but later steps can then fail. Mismatched chromosome identifiers between the genome (fasta) and the annotation (GTF) are one of the more common problems people run into. If you don’t plan to add other reference data, you just need to make sure the fasta is formatted correctly.
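To answer the "are there identifiers in the FASTA" part: yes, each header line starts with `>` followed by the sequence identifier. A quick way to check a genome against an annotation before mapping is to compare those identifiers with the first column (seqname) of the GTF. Below is a minimal Python sketch, assuming a plain-text FASTA and a tab-separated GTF. The function names and the sample identifiers in it are hypothetical.

```python
from typing import Iterable, List, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect sequence identifiers: the text after '>' up to the first whitespace."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def gtf_seqnames(lines: Iterable[str]) -> Set[str]:
    """Collect the first (seqname) column of GTF feature lines, skipping '#' comments."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def report_mismatches(fasta_lines: Iterable[str], gtf_lines: Iterable[str]) -> List[str]:
    """Return GTF seqnames that have no matching sequence in the FASTA, sorted."""
    return sorted(gtf_seqnames(gtf_lines) - fasta_ids(fasta_lines))
```

With hypothetical files, open `genome.fasta` and `annotation.gtf` and call `report_mismatches(fa, gtf)`. An empty list means every annotated sequence name exists in the genome. Entries such as `chr1` against accession-style FASTA headers point to the kind of mismatch described above.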