RNA Star: Can I generate a temporary index with files from previous assemblies?

Hi there!

I am working with human RNA-seq data from previous publications, and I am trying to replicate in Galaxy the results that those publications obtained with R.
Specifically, I am having problems with the alignment process.

I have previously been working with the GRCh38.87 genome annotation through RNA STAR to align my reads, and so far it has worked well with the human GRCh38 built-in index in Galaxy.

However, when I try to replicate the analyses with the GRCh38.79 annotation uploaded from this URL: “ftp://ftp.ensembl.org/pub/release-79/gtf/homo_sapiens/”, together with the “Homo_sapiens.GRCh38.dna.primary_assembly.fa” file downloaded from this URL: “Index of /pub/release-111/fasta/homo_sapiens/dna”, RNA STAR runs the job but the output “.bam” file does not contain any data.

In fact, the file states: “Could not display BAM file, error was: file does not contain alignment data”

Has anyone else found themselves in this situation? Am I using the wrong gtf and fasta files for acquiring alignment files, or is it a problem with the generation of a new index?

Thank you very much in advance,

Coral

Hi @Coral

You can create a custom genome with any fasta you want. To avoid problems, make sure that the data are formatted correctly, that the annotation and fasta are based on the same genomic backbone, and don’t use a native database key unless your version of the reference data is an exact match.

These FAQs will add some context when you are troubleshooting why the BAM seems to be empty.

From a quick review of what you describe, it sounds like you are mixing data from an Ensembl release and a UCSC genome release, and that can lead to problems. The second FAQ above covers how to fix this in detail. Feel free to ask if you get stuck.
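
If it helps, here is a rough way to check whether an annotation and a fasta share the same sequence names before building an index. This is only a sketch, not part of any Galaxy tool: the function names (`fasta_names`, `gtf_names`, `compare_names`) and the demo lines are made up for illustration. It assumes sequence names are the first word after ">" in the fasta and the first tab-separated column in the GTF.

```python
def fasta_names(lines):
    """Collect sequence names: the first word after '>' on each FASTA header line."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_names(lines):
    """Collect sequence names from column 1 of a GTF, skipping comments and blank lines."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_names(fasta_lines, gtf_lines):
    """Return (shared, only_in_fasta, only_in_gtf) sets of sequence names."""
    fa = fasta_names(fasta_lines)
    gtf = gtf_names(gtf_lines)
    return fa & gtf, fa - gtf, gtf - fa


# Hypothetical demo data: Ensembl-style names in the fasta, UCSC-style names in the GTF.
fasta_demo = [">1 dna:chromosome", "ACGT", ">MT dna:chromosome", "ACGT"]
gtf_demo = [
    "#!genome-build GRCh38",
    "chr1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id \"X\";",
    "chrM\tinsdc\tgene\t3307\t4262\t.\t+\t.\tgene_id \"Y\";",
]

shared, only_fasta, only_gtf = compare_names(fasta_demo, gtf_demo)
print("shared:", sorted(shared))
print("only in fasta:", sorted(only_fasta))
print("only in GTF:", sorted(only_gtf))
```

For real data you could pass open file handles instead of lists, for example `compare_names(open("genome.fa"), open("annotation.gtf"))` (file names are placeholders). An empty "shared" set means no annotated feature can be matched to any sequence in the fasta, which is the kind of mismatch to fix before mapping.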

Thank you very much.
The data format was one of the main problems that prevented RNA STAR from running successfully.
