Do I need to convert genomic.fna.gz file to fasta for custom genomes, if so, how?

ysrbrs · June 28, 2021, 4:57pm

Hello,

I’ve tried to align RNA-seq reads to a custom genome, but I cannot see any alignments. Could the problem be that I uploaded genomic.fna.gz directly, without decompressing it and converting it to FASTA format?

https://galaxyproject.org/learn/custom-genomes/#:~:text=The%20data%20should%20be%20formatted%20as%20FASTA%20prior%20to%20upload%20into%20Galaxy

If so, how can I do that? When I extract genomic.fna.gz, I get an FNA file. Does that need to be converted to FASTA format as well, and if so, how?

Are any other steps, such as indexing, needed with Galaxy tools on the FASTA genome file before alignment?

Thank you!

jennaj · June 28, 2021, 5:20pm

Hi @ysrbrs

A custom genome fasta should be in uncompressed format. If the data is already loaded in compressed format, it can be uncompressed within Galaxy. Click on the “pencil” icon for the dataset to reach the Edit attributes forms. The second tab (convert) will list the option to uncompress. The result should be a new dataset with the datatype “fasta” assigned. Once done, you can purge the original compressed dataset to recover working space (quota).
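
If you would rather decompress before uploading, note that a file ending in .fna is already FASTA-formatted nucleotide sequence, so only the gzip layer needs to be removed. Below is a minimal Python sketch of that step. The function name and file paths are illustrative only and are not part of Galaxy.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_to_fasta(src: str, dest: str) -> Path:
    """Decompress a gzipped FASTA file (e.g. genomic.fna.gz) to plain text.

    `src` and `dest` are hypothetical paths chosen by the caller.
    """
    src_path, dest_path = Path(src), Path(dest)

    # Stream the data so large genomes are not held in memory.
    with gzip.open(src_path, "rb") as fin, open(dest_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)

    # Sanity check: a FASTA file begins with a '>' title line.
    with open(dest_path, "rb") as fh:
        if fh.read(1) != b">":
            raise ValueError(f"{dest_path} does not look like FASTA")

    return dest_path
```

Calling `gunzip_to_fasta("genomic.fna.gz", "genome.fasta")` would write an uncompressed copy that can be uploaded with the datatype set to fasta.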

A fasta file in your history can be selected on most tool forms as a target or reference genome. If needed, the custom genome can be promoted to a custom build, and that new database added as an attribute to datasets (some intermediate analysis tools interpret the “database” metadata). If you are incorporating reference annotation in your analysis, make sure the genome and annotation are a match.

Before mapping, please review and apply the additional formatting requirements in the FAQs below to avoid problems.

Preparing and using a Custom Reference Genome or Build
Mismatched Chromosome identifiers (and how to avoid them)
Common datatypes explained – see fasta and gtf
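
As an illustration of the kind of cleanup involved, here is a hedged Python sketch that keeps only the identifier on each title line and drops blank lines. It assumes those are among the formatting requirements in the FAQs above, so please check the FAQs themselves for the authoritative list. The function name is hypothetical.

```python
def simplify_headers(lines):
    """Yield FASTA lines with each '>' title line cut at the first whitespace.

    Assumption: downstream tools expect identifier-only title lines
    and no blank lines in the file.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # drop blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            # Keep the identifier only; leave a bare '>' line untouched.
            yield (">" + parts[0]) if parts else line
        else:
            yield line


example = [">NC_000001.11 Homo sapiens chromosome 1\n", "ACGT\n", "\n"]
print(list(simplify_headers(example)))  # ['>NC_000001.11', 'ACGT']
```

The generator works on any iterable of lines, so it can be fed an open file handle and written back out line by line without loading the whole genome into memory.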

Best!

ysrbrs · June 28, 2021, 7:09pm

Thank you, I was able to change it to FASTA format!

I have another concern, though. Could you clarify what this means: “Make sure the chromosome identifiers are a match for other inputs” (from the Custom Genomes page)?

What are the other inputs? I have a single FASTA file containing the entire genome, plus RNA-Seq data that will be aligned to that custom genome.

I’m not sure what to compare against the identifiers obtained with Methods 1, 2, and 3 (https://galaxyproject.org/support/chrom-identifiers/). Are there also chromosome identifiers in the FASTA file?

That page mentions BAM files, which are produced after mapping, but I think I’m supposed to check something before mapping?

ysrbrs · June 28, 2021, 9:09pm

Apparently, the alignment tool indexes the genome by default; see “Upload Genome Index To Galaxy For Bowtie Alignment?”.

jennaj · June 29, 2021, 12:08am

The idea is to upload all the inputs you plan to use in the same analysis and check that they match before starting. Otherwise, you might need to start over from mapping.

Other data are sometimes incorporated in steps downstream of mapping. In that case mapping can go fine, but later steps then fail. Mismatched chromosome identifiers between the genome (fasta) and annotation (GTF) are one of the more common problems people run into. If you don’t plan to add in other reference data, then you just need to make sure the fasta is formatted correctly.
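
To make the identifier check concrete, here is a small Python sketch that compares the sequence identifiers in a FASTA file with the first column of a GTF file. The function names are hypothetical. It assumes the identifier is the first whitespace-delimited token on each “>” line and that GTF comment lines start with “#”.

```python
def fasta_ids(lines):
    """Collect the identifier (first token after '>') from each title line."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def gtf_ids(lines):
    """Collect the sequence name (column 1) from each GTF feature line."""
    ids = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        ids.add(line.split("\t", 1)[0])
    return ids


def missing_from_genome(fasta_lines, gtf_lines):
    """Return GTF sequence names that have no matching FASTA identifier."""
    return sorted(gtf_ids(gtf_lines) - fasta_ids(fasta_lines))


genome = [">chr1 example description", "ACGT", ">chr2", "TTGA"]
annotation = [
    "#comment line",
    'chr1\tsrc\tgene\t1\t4\t.\t+\t.\tgene_id "g1";',
    '1\tsrc\tgene\t1\t4\t.\t+\t.\tgene_id "g2";',
]
print(missing_from_genome(genome, annotation))  # ['1']
```

An empty result means every sequence name used in the annotation also appears in the genome. Anything listed (here “1” versus “chr1”) is the kind of mismatch that tends to surface only in steps after mapping.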
