Hello, I would like to build my genome index for STAR, but I want to use a FASTA file of cDNAs such as "Mus_musculus.GRCm38.cdna.all.fa", so the option would be "--genomeFastaFiles /Mus_musculus.GRCm38.cdna.all.fa". I guess I do not need the option "--sjdbGTFfile". I want to do this because I want the reads that match exons/genes to be aligned, and then recover the unmapped reads for further analysis. Is this possible? What should I do? Thank you very much.
Welcome, @CARTAS_ESPINEL_IRENE
Are you running the tool in Galaxy?
If yes, and working at a public server, see this topic from earlier today for how to use the Custom Genome and Custom Build functions (any tool). Please see Bowtie2 brucella reference - #2 by jennaj.
If yes, and working on your own local server, we happened to have another topic today, and that answer will also apply to this use case. Please see STAR --genomeDir option - #2 by jennaj.
With that context, for your question here:
Wherever you are running RNA STAR, if you want to filter reads based on matches (or non-matches) to exon genomic footprints:
- When aligning to a reference genome, you will need to inform the tool about where those exons are located (the coordinates). The reference annotation is how you do that.
- When aligning to a reference transcriptome, you have already filtered out the non-transcript regions, so no annotation is needed for that purpose.
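If you are running STAR on the command line rather than through the Galaxy tool form, the two steps might look roughly like the sketch below. It only builds and prints the command lines, it does not run STAR. The directory name, read file names, and thread count are placeholders I made up for illustration; the flags are the ones from the STAR manual, and `--outReadsUnmapped Fastx` is the option that writes unmapped reads to separate files for later analysis.

```python
import shlex


def star_index_cmd(fasta, genome_dir, gtf=None, threads=4):
    """Build the argument list for STAR's index step (genomeGenerate)."""
    cmd = [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
    ]
    # The annotation is only added for a genome index; a cDNA FASTA has no
    # introns to annotate, so gtf stays None in that case.
    if gtf is not None:
        cmd += ["--sjdbGTFfile", gtf]
    return cmd


def star_align_cmd(genome_dir, reads, prefix, threads=4):
    """Build the argument list for alignment, keeping the unmapped reads."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--outFileNamePrefix", prefix,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        # Write unmapped reads to Unmapped.out.mate1 (and mate2 if paired).
        "--outReadsUnmapped", "Fastx",
    ]


if __name__ == "__main__":
    # Hypothetical paths: adjust to your own data.
    index = star_index_cmd("Mus_musculus.GRCm38.cdna.all.fa", "star_cdna_index")
    align = star_align_cmd(
        "star_cdna_index", ["reads_1.fastq", "reads_2.fastq"], "cdna_"
    )
    print(shlex.join(index))
    print(shlex.join(align))
```

Passing a GTF path as `gtf` gives the genome-plus-annotation variant of the index command, so the same helper covers both cases in the comparison.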
You could test this out: align with the annotation and without, to the transcriptome versus genome. Then examine the BAMs and reference annotation in a genome browser like UCSC or IGV. Then review: where are the reads aligning? Are you including or excluding mapped reads as expected? Then tune parameters, and rerun, until you get the desired result.
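For a quick numeric check alongside the genome browser, a small script along these lines could count mapped versus unmapped records in each run. It is only a sketch: it assumes you have SAM text (for example from `samtools view -h`), and the function name and example records are made up for illustration. Note that STAR only keeps unmapped reads inside the SAM/BAM output when run with `--outSAMunmapped Within`; otherwise the unmapped count here will be zero and you would compare against the input read count instead.

```python
def count_mapped(sam_lines):
    """Count primary mapped and unmapped records in SAM-formatted lines."""
    counts = {"mapped": 0, "unmapped": 0}
    for line in sam_lines:
        # Skip header lines and blank lines.
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        # Skip secondary (0x100) and supplementary (0x800) alignments so
        # multi-mapping reads are not counted more than once per mate.
        if flag & 0x100 or flag & 0x800:
            continue
        if flag & 0x4:  # 0x4 = segment unmapped
            counts["unmapped"] += 1
        else:
            counts["mapped"] += 1
    return counts


if __name__ == "__main__":
    # Tiny invented example: one mapped read, one secondary hit, one unmapped.
    example = [
        "@HD\tVN:1.4",
        "read1\t0\ttranscriptA\t10\t255\t5M\t*\t0\t0\tACGTA\tIIIII",
        "read1\t256\ttranscriptB\t42\t0\t5M\t*\t0\t0\tACGTA\tIIIII",
        "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTT\tIIIII",
    ]
    print(count_mapped(example))
```

Running this on the transcriptome run and the genome run (with and without annotation) gives comparable totals to set against what you see in the browser.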
Hope this helps!