How to map a collection of individual samples against a custom reference genome with RNA STAR

Kai_Buechner · March 26, 2025, 9:39pm

Hi all,

I am a complete newbie to RNA-seq and, so far, I have followed along with the “Reference-based RNA-Seq data analysis” tutorial using my own data.
In my project, I am trying to find differentially expressed genes in five different mutants (sampled as biological triplicates). The organism is an unconventional yeast, so I imported the genome FASTA and the GTF file from NCBI. When I run the tool, I get the error message pictured.
My two questions: 1) Is it right that the tool “wants” to map only one query sequence because I only feed it one reference genome?
2) If so, how do I set up a workflow to map all paired-end reads of my different samples against the same reference genome?
Thanks for your help and cheers,
Kai

igor · March 28, 2025, 3:54am

Hi @Kai_Buechner
Maybe try HISAT2. It is similar to RNA STAR but requires less memory. Test the approach on one sample first. I assume you have gzipped FASTQ files in your history. You also need a genome assembly of the species you work with.
During HISAT2 job setup, change “Source of the reference genome” to “From history” and select the FASTA file containing the reference genome. Change “Is this single end or paired end” to “Paired-end” and select the files with forward and reverse reads from one sample. Activate both options in the “Summary options” section: the output is useful for visualisation with MultiQC. Click “Run Tool”. HISAT2 will index the genome and map the reads. Wait for the job to complete, then check the results, including the summary file. You can inspect the alignment in IGV.

If you are happy with the results, use the re-run option and replace the FASTQ files with the files from another sample. You don’t have many samples, so it should not take too long.
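
For anyone keeping track of which files belong together across those re-runs, here is a minimal Python sketch that pairs forward and reverse FASTQ file names by sample. The `_R1`/`_R2` naming pattern and the function name `pair_fastq_files` are assumptions for illustration only; neither Galaxy nor HISAT2 requires them, so adjust the pattern to your own file names.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) naming convention: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
PAIR_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.fastq(\.gz)?$")


def pair_fastq_files(filenames):
    """Group FASTQ file names into {sample: (forward, reverse)}."""
    grouped = defaultdict(dict)
    for name in filenames:
        match = PAIR_PATTERN.match(name)
        if match is None:
            raise ValueError(f"Unrecognised file name: {name}")
        grouped[match["sample"]][match["read"]] = name

    pairs = {}
    for sample, reads in sorted(grouped.items()):
        # Every sample needs exactly one forward (1) and one reverse (2) file.
        if set(reads) != {"1", "2"}:
            raise ValueError(f"Sample {sample} is missing a mate file")
        pairs[sample] = (reads["1"], reads["2"])
    return pairs
```

Calling `pair_fastq_files` on the list of file names gives one entry per sample, which can serve as a checklist while re-running the job sample by sample.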

Kind regards,
Igor

Kai_Buechner · March 28, 2025, 7:47am

Hi @igor
Thank you for your help! I’ll get to work on it.

Have a great day,
Kai

jennaj · March 31, 2025, 7:55pm

Hi @Kai_Buechner

As a test to make sure everything is working as expected, I started up a simple paired-end collection mapping in this history.

https://usegalaxy.org/u/jen-galaxyproject/h/pair-end-collection-mapped

The other option is to map without the multi-sample collection as @igor is describing.

Your original error message is odd! Mapping multiple samples against the same single reference genome should definitely be possible, all together, in the same run. If you would like to share back your history, we can troubleshoot any problems that my example and Igor’s help did not resolve.

Remember that RNA STAR is very picky about reference data formats! So if you are supplying your own reference data, content issues can lead to all sorts of odd error messages! We can usually sort those out here if we can see the example. How to share is in the banner of this forum. The entire context usually matters, and the shared history link is the best way to communicate those details.
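
One content issue that is easy to check before mapping is whether the sequence names in the annotation also appear in the genome FASTA. The following is a minimal Python sketch of such a check, not part of any Galaxy tool; the function names are made up for illustration, and it only assumes the usual layouts (FASTA headers starting with `>`, and a tab-separated annotation whose first column is the sequence name).

```python
def fasta_sequence_names(fasta_lines):
    """Collect sequence IDs: the first word after '>' on each header line."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def annotation_sequence_names(annotation_lines):
    """Collect the first tab-separated column of every non-comment line."""
    names = set()
    for line in annotation_lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.rstrip("\n").split("\t", 1)[0])
    return names


def unmatched_annotation_names(fasta_lines, annotation_lines):
    """Return annotation sequence names that have no matching FASTA header."""
    return annotation_sequence_names(annotation_lines) - fasta_sequence_names(fasta_lines)
```

Both arguments can be open file handles or lists of lines. An empty result means every sequence name used in the annotation was found in the FASTA; anything returned is worth a closer look before starting the mapping job.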

Please let us know if you solve this!


Kai_Buechner · April 6, 2025, 10:04am

Hi all,
First of all, thank you @igor and @jennaj for taking the time to tackle my problem. I found the solution and will try to write it up as comprehensively as I can:
I used the NCBI Datasets Genomes tool from the “Get Data” menu to import the dataset. I did not realise that the genome was imported as a folder nested within a folder; this is what threw RNA STAR off. When I downloaded the fasta.gz and uploaded it again, it worked very well.
The next issue I had was the .gff3 annotation file. It was not in the annotation format the tool expected, so I first asked an LLM to write a short Python program to unify the gene_id for all entries, and then replaced the several different delimiters (“\t”, “,” and “;”) uniformly with tab delimiters. That solved the problems I had; hopefully these solutions can help other newcomers in the same situation.
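
The script itself is not shown in this thread. As a rough illustration of the kind of check that can reveal such problems, here is a minimal Python sketch that flags annotation lines which do not have nine tab-separated columns or lack a gene_id attribute. The function name `find_problem_lines` is hypothetical, and this only reports problems; it does not repair the file.

```python
def find_problem_lines(gtf_lines):
    """Return [(line_number, reason), ...] for lines that break basic GTF layout."""
    problems = []
    for number, line in enumerate(gtf_lines, start=1):
        text = line.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not text or text.startswith("#"):
            continue
        columns = text.split("\t")
        if len(columns) != 9:
            problems.append(
                (number, f"expected 9 tab-separated columns, found {len(columns)}")
            )
            continue
        # The ninth column holds the attributes, including gene_id.
        if "gene_id" not in columns[8]:
            problems.append((number, "no gene_id attribute"))
    return problems
```

Running it over the lines of an annotation file gives the line numbers to inspect first, which can help decide whether a clean-up like the one described above is needed.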
Have a great upcoming week, everyone!
Kai
