RNA Star: Can I generate a temporary index with files from previous assemblies?

Coral · April 26, 2024, 2:38pm

Hi there!

I am working with human RNA-seq data from previous publications and am trying to replicate, in Galaxy, the results those publications obtained in R.
Specifically, I am having problems with the alignment process.

I have previously used RNA STAR with the GRCh38.87 genome annotation to align my reads, and so far this has worked well with the built-in human GRCh38 index in Galaxy.

However, when I try to replicate the analyses with an index built from the GRCh38.79 annotation (GTF), uploaded from “ftp://ftp.ensembl.org/pub/release-79/gtf/homo_sapiens/”, and the “Homo_sapiens.GRCh38.dna.primary_assembly.fa” file downloaded from the Ensembl directory “/pub/release-111/fasta/homo_sapiens/dna”, RNA STAR runs the job, but the output BAM file contains no data.

In fact, the file states: “Could not display BAM file, error was: file does not contain alignment data”

Has anyone else run into this? Am I using the wrong GTF and FASTA files to obtain alignments, or is the problem in generating the new index?

Thank you very much in advance,

Coral

jennaj · April 26, 2024, 10:16pm

Hi @Coral

You can create a custom genome with any FASTA you want. To avoid problems, make sure that the data are correctly formatted, that the annotation and FASTA are based on the same genomic backbone, and that you don't use a native database key unless your version of the reference data is an exact match.
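One quick way to check whether an annotation and a FASTA share the same genomic backbone is to compare the sequence names in the FASTA headers with column 1 of the GTF. The Python sketch below does only that; the helper names are hypothetical, and it compares names, not coordinates or assembly versions.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for line-by-line reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fasta_seq_names(lines):
    """Collect sequence IDs: the first word after '>' on each header line."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def gtf_seq_names(lines):
    """Collect the distinct seqnames (column 1) of a GTF, skipping comments."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_backbones(fasta_lines, gtf_lines):
    """Report which sequence names the FASTA and GTF share and which differ."""
    fa = fasta_seq_names(fasta_lines)
    gtf = gtf_seq_names(gtf_lines)
    return {
        "shared": sorted(fa & gtf),
        "only_in_gtf": sorted(gtf - fa),
        "only_in_fasta": sorted(fa - gtf),
    }
```

Usage, with hypothetical local file names: open both files with `open_text(...)` inside a `with` statement and pass the handles to `compare_backbones`. An empty or very short "shared" list suggests the two files use different naming conventions or come from different builds.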

These FAQs add context when you are troubleshooting why the BAM seems to be empty:

FAQ: How to use Custom Reference Genomes?
Reference genomes at public Galaxy servers: GRCh38/hg38 example
Troubleshooting resources for errors or unexpected results (how to check your data files)
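Before assuming the index is at fault, it can help to check how many reads STAR actually mapped. STAR writes a summary file named Log.final.out made of "label | value" lines; the sketch below parses that layout and sums the unique and multi-mapping percentages. The exact labels are assumed from typical STAR output, so compare them with your own log; whether your Galaxy RNA STAR version exposes this log as a dataset is also an assumption.

```python
def parse_star_final_log(lines):
    """Parse STAR's Log.final.out 'label | value' lines into a dict.

    Section headings such as 'UNIQUE READS:' contain no '|' and are skipped.
    """
    stats = {}
    for line in lines:
        if "|" not in line:
            continue
        label, value = line.split("|", 1)
        stats[label.strip()] = value.strip()
    return stats


def mapped_fraction(stats):
    """Fraction of input reads mapped uniquely or to multiple loci (0.0-1.0).

    The two labels below are assumed from typical STAR output;
    a missing label counts as 0%.
    """
    labels = ["Uniquely mapped reads %", "% of reads mapped to multiple loci"]
    total = sum(float(stats.get(label, "0%").rstrip("%")) for label in labels)
    return total / 100.0
```

A fraction near 0 would suggest that the reads did not match the reference as indexed, rather than a problem with displaying the BAM.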

From a quick review of what you describe, it sounds like you are mixing data from Ensembl and UCSC genome releases, and that can lead to problems. The second FAQ above explains how to fix this in detail. Feel free to ask if you get stuck.
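To illustrate the Ensembl/UCSC naming difference: Ensembl names the primary human chromosomes 1 to 22, X, Y and MT, while UCSC hg38 uses chr1 to chr22, chrX, chrY and chrM. Below is a minimal sketch of renaming GTF seqnames from the Ensembl to the UCSC style, assuming only the primary chromosomes matter; it drops lines on unplaced or alternate scaffolds, whose names differ in more complex ways. Using annotation and FASTA from the same source and release, as the FAQ recommends, is usually the safer fix.

```python
# Ensembl names for the primary human chromosomes and their UCSC hg38 equivalents.
PRIMARY = [str(i) for i in range(1, 23)] + ["X", "Y"]
ENSEMBL_TO_UCSC = {name: "chr" + name for name in PRIMARY}
ENSEMBL_TO_UCSC["MT"] = "chrM"


def rename_gtf_seqnames(lines, mapping=ENSEMBL_TO_UCSC):
    """Yield GTF lines with column 1 renamed via `mapping`.

    Comment lines pass through unchanged. Lines whose seqname has no entry
    in `mapping` (e.g. unplaced scaffolds) are dropped, as are blank lines.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.split("\t", 1)
        new_name = mapping.get(fields[0])
        if new_name is None or len(fields) < 2:
            continue
        yield new_name + "\t" + fields[1]
```

The generator can be fed an open GTF handle and its output written line by line to a new file, which could then be uploaded alongside a UCSC-style FASTA.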

Coral · May 13, 2024, 8:59am

Thank you very much.
The data format was one of the main problems that prevented RNA STAR from running successfully.
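The thread doesn't say which formatting issue was the culprit, so the following is only a generic sketch of FASTA problems a short script can flag before building an index: blank lines, Windows line endings, duplicate sequence IDs, sequence lines before the first header, and characters outside the IUPAC nucleotide codes. These checks are assumptions about what commonly causes trouble, not a description of what Galaxy or STAR actually validates.

```python
import re

# IUPAC nucleotide codes, either case (assumed acceptable sequence alphabet).
VALID_SEQ = re.compile(r"^[ACGTURYKMSWBDHVNacgturykmswbdhvn]+$")


def fasta_format_problems(lines):
    """Return (line_number, message) pairs for common FASTA formatting issues."""
    problems = []
    seen = set()
    in_record = False
    for n, raw in enumerate(lines, start=1):
        if raw.endswith("\r\n"):
            problems.append((n, "Windows (CRLF) line ending"))
        line = raw.rstrip("\r\n")
        if not line.strip():
            problems.append((n, "blank line"))
            continue
        if line.startswith(">"):
            in_record = True
            parts = line[1:].split()
            if not parts:
                problems.append((n, "empty header"))
                continue
            if parts[0] in seen:
                problems.append((n, f"duplicate ID {parts[0]}"))
            seen.add(parts[0])
        elif not in_record:
            problems.append((n, "sequence before first header"))
        elif not VALID_SEQ.match(line):
            problems.append((n, "unexpected characters in sequence"))
    return problems
```

An empty list means none of these particular checks fired; it does not guarantee the file is otherwise valid for Galaxy.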
