I need to map my results to the reference genome of Candida albicans SC5314 but I couldn’t find it. I’m trying to use RNA STAR. How can I upload the genome?
Upload the genome in FASTA or FASTA.GZ format to a history in your Galaxy account. Then, when setting up the RNA STAR job, change the reference genome source from built-in to “from history”.
You might also consider HISAT2. It is less memory intensive than RNA STAR.
Make sure you get both the genome sequence and gene annotation from the same site, to avoid potential mismatch in contig/chromosome names.
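One way to check for that kind of mismatch before uploading is to compare the sequence names in the two files locally. This is only a rough sketch: the function names are made up for illustration, and it assumes a FASTA genome plus a tab-delimited GTF/GFF annotation whose first column holds the contig/chromosome name.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def fasta_names(path):
    """Collect sequence identifiers (title line text up to the first whitespace)."""
    names = set()
    with open_text(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def annotation_names(path):
    """Collect the first column of a GTF/GFF file, skipping comment lines."""
    names = set()
    with open_text(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith("##FASTA"):
                break  # some GFF3 files embed sequences after this marker
            if not line.strip() or line.startswith("#"):
                continue
            names.add(line.split("\t", 1)[0])
    return names


def compare_names(fasta_path, annotation_path):
    """Report names shared by both files and names found in only one of them."""
    genome = fasta_names(fasta_path)
    annotation = annotation_names(annotation_path)
    return {
        "shared": sorted(genome & annotation),
        "annotation_only": sorted(annotation - genome),
        "genome_only": sorted(genome - annotation),
    }
```

Calling `compare_names("genome.fasta", "annotation.gtf")` (placeholder file names) returns the three lists. Anything under `annotation_only` is a name the mapper's index will not contain, so features on those sequences are likely to be skipped or to trigger an error downstream.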
Regarding the reply above, could you please explain which data format should be used for each of the following?
- the input/query sequence file [fastq, fastqsanger, or fastqsanger.gz]
- the reference genome [fasta, fasta.gz, faa.gz, etc.]
- Do we need to normalize the reference genome FASTA?
Please advise, as I am getting an error message with both HISAT2 and RNA STAR.
- fastqsanger or fastqsanger.gz for most tools. If you upload reads with default settings, that is the datatype they will be assigned.
- All of those are the same (nucleotide sequence) and are datatype fasta (uncompressed) in Galaxy. Uploading them with default settings will also result in that datatype.
- Usually. See the help below for how to tell. And yes, any target reference genome needs to be formatted in a way that tools can understand.
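To make "formatted in a way that tools can understand" a bit more concrete, here is a rough sketch of what normalizing a FASTA file can involve. It assumes the usual expectations (identifier-only title lines, no blank lines, consistently wrapped sequence lines, unique identifiers). The function name and the default width of 80 are choices made for this illustration, not the behavior of any specific Galaxy tool, so treat the linked FAQs as the authority.

```python
def normalize_fasta(lines, width=80):
    """Yield cleaned FASTA lines from an iterable of raw lines.

    Title lines are cut at the first whitespace, blank lines are dropped,
    and each sequence is re-wrapped to `width` characters per line.
    """
    seen = set()   # identifiers already emitted, to catch duplicates
    chunks = []    # sequence pieces of the record currently being read

    def flush():
        # Join the buffered pieces and emit them as fixed-width lines.
        sequence = "".join(chunks)
        chunks.clear()
        for start in range(0, len(sequence), width):
            yield sequence[start:start + width]

    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines are not allowed inside FASTA records
        if line.startswith(">"):
            yield from flush()
            fields = line[1:].split()
            if not fields:
                raise ValueError("title line without an identifier")
            name = fields[0]
            if name in seen:
                raise ValueError(f"duplicate identifier: {name}")
            seen.add(name)
            yield ">" + name
        else:
            if not seen:
                raise ValueError("sequence data found before the first title line")
            chunks.append(line)
    yield from flush()
```

To use it on a file, open the raw FASTA for reading, pass the file handle to `normalize_fasta`, and write each yielded line plus a newline to a new file. Each sequence is held in memory while it is re-wrapped, which is fine for a small genome but worth keeping in mind for very large chromosomes.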
We have many resources. I would suggest starting with these.
- Tutorial about basic mapping and related steps → Sequence analysis / Tutorial List
- A quick guide I just wrote about using custom genomes. It has a few more details about human, but the basic format rules and the FAQs linked from it apply to any genome, and I also just updated those FAQs so everything is current! See → Reference genomes at public Galaxy servers: GRCh38/hg38 example
- Upload guide with link outs → Getting Data into Galaxy
Hope that helps!