I am analysing RNA-seq data from the mosquito Aedes aegypti. I need to map my reads to the reference genome AaegL5.0_GCA_002204515.1.
In the RNA STAR tab, there is no A. aegypti reference genome, and the note below the genome list says “If your genome of interest is not listed, contact the Galaxy team (--genomeDir)”. I couldn’t find a way to do that.
Any advice? How can I solve this? Is there a way to add the reference genome here?
UseGalaxy.eu may already have this genome indexed, so check there first.
Otherwise, the fastest way to use this reference genome at the AU or ORG servers is as a custom reference genome. It can then be promoted to a custom build, which creates a custom database “metadata” key that can be assigned to datasets as needed. The input is a simple FASTA file. Make sure to get the matching reference annotation at the same time to avoid problems later on.
This prior Q&A has links to the “how-to”. You can click on the tags to find more examples in different contexts. Custom genome + custom build: How to use a genome that is not natively indexed at the server you are working at - #2 by jennaj
The AU server admins can also comment, but as far as I know, new reference genomes will be indexed sometime later this year for all three UseGalaxy.* servers. ping @igor
As @jennaj suggested, use a custom genome. Upload the genome sequence in fasta or fasta.gz format, and at the alignment step change the source of the reference genome to “from history”. Mapping tools such as HISAT2 and RNA STAR will index the reference genome in the background and then map the reads.
Thank you @jennaj and @igor. I checked the AU and ORG servers, but they don’t have the Aedes aegypti reference genome.
So, first I prepare the downloaded genome following the “How to use Custom Reference Genomes?” tutorial, and then I use the normalized genome as the reference, right?
The tutorial is somewhat old. I believe FTP is not supported anymore. Very often reference genomes come in “ready to use” form. I have never used NormalizeFasta, but I guess it would not hurt. Make sure NormalizeFasta does not change the sequence names: the gene annotation file must use the same sequence names as the reference genome.

Try the mapping and read counting steps on one sample first, to make sure it works for you. Upload the reference genome and gene annotation using URLs (paste the URLs into the Galaxy Upload menu > Paste/Fetch Data tab), examine the files using the preview (eye icon), and try mapping with, say, HISAT2 followed by read counting with featureCounts.
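If you prefer to check the sequence names outside Galaxy before uploading, here is a minimal sketch in Python. It assumes the name is the first word after “>” in each FASTA header and that the annotation is a tab-separated GTF/GFF file with the sequence name in column 1. The function names, the example file names, and the tiny made-up records are my own for illustration only; they are not part of Galaxy or of any tool mentioned above.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading (hypothetical helper)."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fasta_names(lines):
    """Collect sequence names: the first word after '>' on each header line."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def annotation_names(lines):
    """Collect the names found in column 1 of a GTF/GFF file."""
    names = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # GFF3 files may embed sequences after this marker
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        names.add(line.split("\t", 1)[0].strip())
    return names


def compare_names(fasta_lines, annotation_lines):
    """Return (names only in the annotation, names only in the genome)."""
    genome = fasta_names(fasta_lines)
    annot = annotation_names(annotation_lines)
    return sorted(annot - genome), sorted(genome - annot)


if __name__ == "__main__":
    # Made-up records, not real Aedes aegypti names.
    demo_fasta = [">1 some description\n", "ACGT\n", ">2\n", "ACGT\n"]
    demo_gtf = [
        "#comment\n",
        "1\tsrc\tgene\t1\t4\t.\t+\t.\tgene_id \"g1\";\n",
        "chr2\tsrc\tgene\t1\t4\t.\t+\t.\tgene_id \"g2\";\n",
    ]
    missing, unused = compare_names(demo_fasta, demo_gtf)
    print("In annotation but not in genome:", missing)
    print("In genome but not in annotation:", unused)
```

For real files you could call something like `compare_names(open_text("genome.fa.gz"), open_text("genes.gtf"))`, where both file names are placeholders. Any name reported as present in the annotation but missing from the genome is worth resolving before mapping; names present only in the genome are often harmless (sequences without annotated genes).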