I’ve been trying to follow the available Galaxy tutorials for scRNA-seq analysis (https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/scrna-preprocessing/tutorial.html) and adapt them to my dataset, but I have gotten stuck, so I am hoping for a suggestion or two. I think one of my main issues is working out which parts of the tutorial apply to my datasets (plate-based, SMART-Seq). My collaborator advised taking the demultiplexed samples from him and running RNA STAR, then StringTie, then Seurat. So I am fairly sure these tools should work, and I am trying to reconcile that workflow with the Galaxy-specific advice in the tutorial.
Thus far I have taken one of my eight lanes of samples (49 paired reads from 49 cells from a single biological sample, i.e. 98 raw fq.gz files in total) through Import, Flatten Collection, RNA STAR, Filter BAM datasets, and StringTie to get a collection of 98 gene abundance estimates. The subsequent Column Join to Seurat step worked once, but StringTie ran without a GTF file in that instance, so it didn’t give me any gene names. I have since added the GTF reference file [gencode.v34.annotation.gtf.gz] to the StringTie inputs, and now the gene abundance estimates fail to merge using Column Join. Have I missed something in the tutorial directions? My understanding is that this is exactly the type of dataset Column Join should work with, giving me a single table of gene abundance estimates for all samples in the lane. Or have I gone off in the wrong direction because this all depends on using the UMI tools earlier in the tutorial, which I didn’t think would work with my datasets? I thought the gene abundance estimate outputs from StringTie were equivalent to what those steps produce.
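To make concrete what I expect the merge to produce, here is a minimal Python sketch of the join, outside Galaxy. It is only an illustration under assumptions: it is not how Column Join is implemented, the function names `read_tpm` and `merge_tables` are hypothetical, and it assumes each abundance table is tab-separated with a header containing `Gene ID` and `TPM` columns.

```python
import csv
import io


def read_tpm(handle, id_col="Gene ID", value_col="TPM"):
    """Return {gene_id: value} from one tab-separated abundance table.

    The column names are assumptions about the table header.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    values = {}
    for row in reader:
        # Sum the values if the same gene ID appears on more than one row.
        values[row[id_col]] = values.get(row[id_col], 0.0) + float(row[value_col])
    return values


def merge_tables(samples):
    """Join per-cell tables on gene ID.

    samples: {sample_name: open text handle}
    Returns (header, rows), one row per gene and one column per sample.
    Genes missing from a sample are filled with 0.0.
    """
    per_sample = {name: read_tpm(handle) for name, handle in samples.items()}
    genes = sorted(set().union(*per_sample.values()))
    header = ["Gene ID"] + list(per_sample)
    rows = [[g] + [per_sample[n].get(g, 0.0) for n in per_sample] for g in genes]
    return header, rows


# Two tiny made-up tables standing in for two cells.
cell_a = "Gene ID\tGene Name\tTPM\nENSG1\tA\t1.5\nENSG2\tB\t0\n"
cell_b = "Gene ID\tGene Name\tTPM\nENSG1\tA\t2.0\nENSG3\tC\t4.0\n"

header, rows = merge_tables(
    {"cell_a": io.StringIO(cell_a), "cell_b": io.StringIO(cell_b)}
)
```

The result I am after is a single genes-by-cells table like `rows`, which is what I would then pass on to Seurat.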
Any guidance would be much appreciated.