RNA-STAR, hg38 GTF reference annotation, Cloudman/AWS options plus local Galaxy "Cloud Bursting" for memory-intensive mapping

Wonderful! Very happy that solved the memory/space issues :rocket:

This was the “built-in index” already available on your cloud server, not a FASTA from the history (custom genome), correct?

When using an existing built-in, tool-specific genome index, any additional indexing a tool does at runtime is based on the parameters AND the input content. As far as I know, this cannot be saved back for reuse, since the input content will be different for each mapping run.

For mapping tools, the original output BAM is also sorted and indexed (by Samtools) when creating the final coordinate-sorted BAM result. This index definitely cannot be saved back, as it is based on the output content.
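
To make that post-mapping step concrete, here is a minimal Python sketch of the two Samtools calls involved (`samtools sort`, then `samtools index`). It only builds the command lines and does not run them. The function name and the BAM file names are hypothetical, and this is not necessarily how Galaxy invokes Samtools internally.

```python
import shlex


def sort_and_index_commands(unsorted_bam, sorted_bam, threads=1):
    """Build (but do not run) the samtools commands for sort + index.

    File names are placeholders supplied by the caller; nothing here
    is taken from Galaxy's own code.
    """
    # Coordinate-sort the mapper's BAM output into a new file.
    sort_cmd = ["samtools", "sort", "-@", str(threads), "-o", sorted_bam, unsorted_bam]
    # Index the sorted BAM; the index is derived from this file's content,
    # so it is specific to this one mapping result.
    index_cmd = ["samtools", "index", sorted_bam]
    return [sort_cmd, index_cmd]


def as_shell_lines(commands):
    """Render argv lists as copy-pasteable shell lines."""
    return [shlex.join(cmd) for cmd in commands]
```

To actually run the commands, each list could be passed to `subprocess.run(cmd, check=True)` on a machine with Samtools installed. Because the index is built from the sorted output itself, a new one is produced for every mapping job.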

Data Manager (DM) created indexes for hg38 are already available on Cloudman and on most public Galaxy servers. If a genome you want to use is not already pre-indexed, the indexes can be created using DMs on a local/cloud Galaxy. That covers not just the baseline genome index but other important indexes too, including Samtools, Picard, 2bit, plus tool-specific indexes. If you ever need to run DMs, this prior Q&A has much information: the best order to run DMs, links to resources, troubleshooting various issues that can come up, et cetera. See: Indexing reference genomes with Data Managers: Resources, tutorials, troubleshooting