I’ve tried to align RNA-seq reads to a custom genome, but I don’t get any alignments. Could it be because I uploaded genomic.fna.gz directly, without decompressing it and converting it to FASTA format?
If so, how can I fix this? When I extract the genomic.fna.gz file, I get an FNA file. Does that need to be converted to FASTA format as well, and if so, how?
Are any other steps, such as indexing, needed on the FASTA genome file in Galaxy before alignment?
A custom genome fasta file should be uncompressed. An .fna file is already FASTA-formatted nucleotide sequence, so once it is uncompressed it only needs the “fasta” datatype, not a separate conversion. If the data was already loaded in compressed format, it can be uncompressed within Galaxy: click the “pencil” icon for the dataset to open the Edit Attributes form. The second tab (Convert) lists the option to uncompress. The result should be a new dataset with the datatype “fasta” assigned. Once done, you can purge the original compressed dataset to recover working space (quota).
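If you would rather decompress the file on your own computer before uploading, here is a minimal Python sketch using only the standard library. The Galaxy Convert tab described above does the same job; the function name decompress_fasta is hypothetical, and the only check it performs is that the first line looks like a FASTA header.

```python
import gzip
import shutil


def decompress_fasta(gz_path: str, out_path: str) -> str:
    """Decompress a gzipped FASTA (e.g. genomic.fna.gz) to a plain-text file.

    Returns the first line of the output so you can confirm it looks like FASTA.
    Hypothetical helper for illustration, not a Galaxy tool.
    """
    # Stream the data so a large genome does not have to fit in memory.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)

    # A FASTA file should begin with a '>' header line.
    with open(out_path, "r") as fh:
        first = fh.readline().rstrip("\r\n")
    if not first.startswith(">"):
        raise ValueError(f"{out_path} does not start with a FASTA header line")
    return first
```

For example, decompress_fasta("genomic.fna.gz", "genomic.fa") writes the uncompressed genome and returns its first header line; the .fa extension is just a convenient choice, since the content is unchanged FASTA.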
A fasta file in your history can be selected on most tool forms as a target or reference genome. If needed, the custom genome can be promoted to a custom build, and that new database can then be assigned as an attribute of datasets (some intermediate analysis tools read the “database” metadata). If you are incorporating reference annotation in your analysis, make sure the genome and annotation match.
Before mapping, please review and apply the additional formatting requirements in the FAQs below to avoid problems.
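The FAQs referred to above are not reproduced in this thread, so the checks below are an assumption based on commonly reported fasta problems (blank lines, description text after the identifier, duplicate identifiers, Windows line endings), not Galaxy's official list. As a minimal Python sketch with a hypothetical function name:

```python
from collections import Counter


def check_fasta_format(lines):
    """Return a list of human-readable problems found in FASTA text lines.

    Assumption: these checks approximate common formatting requirements;
    they are not Galaxy's official rules.
    """
    problems = []
    ids = []
    for n, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if raw.endswith("\r\n"):
            problems.append(f"line {n}: Windows (CRLF) line ending")
        if not line.strip():
            problems.append(f"line {n}: blank line")
        elif line.startswith(">"):
            header = line[1:]
            words = header.split()
            # Assumption: the identifier is the first whitespace-delimited word.
            ident = words[0] if words else ""
            if not ident:
                problems.append(f"line {n}: empty identifier")
            elif header != ident:
                problems.append(f"line {n}: extra text after identifier '{ident}'")
            ids.append(ident)
    # Each sequence identifier should be unique within the genome.
    for ident, count in Counter(ids).items():
        if ident and count > 1:
            problems.append(f"identifier '{ident}' appears {count} times")
    return problems
```

To run it on a local copy, open the file with newline="" so Windows line endings stay visible, for example: with open("genomic.fa", newline="") as fh: print(check_fasta_format(fh)). The file name is only an example.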
Thank you, I was able to change it to FASTA format!
I have another question, though. Could you clarify what this line in the Custom Genomes FAQ means: “Make sure the chromosome identifiers are a match for other inputs”?
What are the other inputs? I have just a single FASTA file containing the entire genome, plus the RNA-seq data that will be aligned to it.
The idea is to upload all the inputs you plan to use in the same analysis and check that they match before starting; otherwise, you might need to start over from mapping.
Other reference data, such as annotation, are sometimes incorporated in steps downstream of mapping. In that case mapping can succeed, but later steps fail. Mismatched chromosome identifiers between the genome (fasta) and annotation (GTF) are one of the most common problems people run into. If you don’t plan to add other reference data, you just need to make sure the fasta is formatted correctly.
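To check a genome and annotation against each other before mapping, you can compare the sequence identifiers in the fasta with the first (seqname) column of the GTF. A typical mismatch is “chr1” in one file and “1” in the other (UCSC versus Ensembl naming). Here is a minimal Python sketch; the function names and file names are hypothetical.

```python
def fasta_ids(fasta_lines):
    """Sequence identifiers: the first word after '>' on each header line."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.add(words[0])
    return ids


def gtf_seqnames(gtf_lines):
    """Values in column 1 (seqname) of a tab-separated GTF, skipping comments."""
    names = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_identifiers(fasta_lines, gtf_lines):
    """Report which chromosome identifiers are shared or missing between files."""
    fa = fasta_ids(fasta_lines)
    gtf = gtf_seqnames(gtf_lines)
    return {
        # Annotation refers to sequences the genome does not contain: a real problem.
        "only_in_gtf": sorted(gtf - fa),
        # Often harmless, e.g. unannotated scaffolds or the mitochondrial genome.
        "only_in_fasta": sorted(fa - gtf),
        "shared": sorted(fa & gtf),
    }
```

For example, with open("genome.fa") as fa, open("annotation.gtf") as gtf: print(compare_identifiers(fa, gtf)). An empty “shared” list, or a long “only_in_gtf” list, suggests the two files use different naming conventions.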