Alignment and counting of high number SMARTseq samples

leonvgurp · May 18, 2020, 1:33pm

Hi all,

I’m running into a problem and I’m not sure what’s going wrong.

I downloaded single-cell transcriptomics data into Galaxy directly from the SRA (project SRP150746; only the SMARTseq data, so excluding the 10x Genomics multiplexed paired-end data). This is a collection of 2100+ runs of single cells that were processed using the SMARTseq2 protocol and deposited as single-end reads (each run = 1 cell). In Galaxy, I converted the SRA files to single-end FASTQ, which resulted in a single collection of 2100+ FASTQ files. This collection was then mapped against the Ensembl GRCm38.1 reference genome, resulting in a collection of 2100+ BAM files.

I am now trying to count these files using HTSeq, which should result in a collection of 2100+ count tables that I could then download in one go, but this is not working very well. Each time I try, one of the following happens: it takes forever to get all the jobs started; the run ends with a non-descriptive error message (job failed); the process hangs (my history only shows a grey entry with jobs pending); or a high number of tasks fail.
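To make the intended end result concrete, here is a minimal sketch of how the per-cell count tables could be combined into one gene-by-cell matrix after download. It assumes each table is the two-column, tab-separated text that htseq-count writes (feature ID, count); the function name `merge_count_tables`, the cell names, and the example values are hypothetical and not part of the Galaxy job.

```python
import csv
import io


def merge_count_tables(tables):
    """Merge per-cell count tables into one gene-by-cell matrix.

    `tables` maps a cell name to the text of a two-column, tab-separated
    count table (feature ID, count). A gene absent from a table gets 0.
    """
    counts = {}  # gene -> {cell: count}
    cells = list(tables)
    for cell, text in tables.items():
        for row in csv.reader(io.StringIO(text), delimiter="\t"):
            # Skip blank/malformed lines and htseq-count summary rows,
            # which start with "__" (e.g. "__no_feature").
            if len(row) != 2 or row[0].startswith("__"):
                continue
            gene, value = row
            counts.setdefault(gene, {})[cell] = int(value)
    genes = sorted(counts)
    matrix = [[counts[g].get(c, 0) for c in cells] for g in genes]
    return genes, cells, matrix


# Hypothetical two-cell example (made-up counts, for illustration only).
example = {
    "cell_A": "Actb\t10\nGapdh\t3\n",
    "cell_B": "Actb\t7\nIns2\t42\n",
}
genes, cells, matrix = merge_count_tables(example)
```

Each row of `matrix` corresponds to one entry of `genes` and each column to one entry of `cells`, so the 2100+ tables would become a single table with 2100+ columns.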

Either way, I am unsure how to interpret this. Am I simply asking too much of Galaxy by counting so many files? That seems unlikely. Have I set this up wrong (details below)? Is there a better tool to use?

Any help would be greatly appreciated!

Job Information

Galaxy Tool ID: toolshed.g2.bx.psu.edu/repos/lparsons/htseq_count/htseq_count/0.9.1
Galaxy Tool Version: 0.9.1
Tool Version: None
Mode: Union
Stranded: Yes
Minimum alignment quality: 10
Feature type: exon
ID Attribute: gene_name
Set advanced options: simple

Script provided by Galaxy:

samtools sort -n --output-fmt=SAM -o 'name_sorted_alignment.sam' '/data/dnb02/galaxy_db/files/019/719/dataset_19719780.dat' && htseq-count --mode=union --stranded=yes --minaqual=10 --type='exon' --idattr='gene_name' --order=name --format=sam 'name_sorted_alignment.sam' "/data/dnb02/galaxy_db/files/018/240/dataset_18240765.dat" | awk '{if ($1 ~ "no_feature|ambiguous|too_low_aQual|not_aligned|alignment_not_unique") print $0 | "cat 1>&2"; else print $0}' > '/data/dnb02/galaxy_db/job_working_directory/008/296/8296638/galaxy_dataset_21542278.dat' 2> '/data/dnb02/galaxy_db/job_working_directory/008/296/8296638/galaxy_dataset_21542280.dat'
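For anyone reading the command: the awk step at the end appears only to route htseq-count's summary counters (no_feature, ambiguous, and so on) to a second output, leaving the per-gene counts in the first. The sketch below mirrors that routing in Python to make the behaviour explicit. It is an illustration under that reading, not the tool's own code; the function name `split_htseq_output` and the sample lines are hypothetical.

```python
import re

# Same unanchored pattern the awk filter applies to the first column.
SUMMARY = re.compile(
    r"no_feature|ambiguous|too_low_aQual|not_aligned|alignment_not_unique"
)


def split_htseq_output(lines):
    """Return (gene_rows, summary_rows) from htseq-count output lines."""
    gene_rows, summary_rows = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # this sketch drops blank lines
        first_field = line.split()[0]  # like awk's $1: split on whitespace
        if SUMMARY.search(first_field):
            summary_rows.append(line)  # awk sends these to stderr
        else:
            gene_rows.append(line)     # awk sends these to stdout
    return gene_rows, summary_rows


# Made-up sample output, for illustration only.
sample = ["Actb\t10", "Gapdh\t3", "__no_feature\t120", "__ambiguous\t4"]
gene_rows, summary_rows = split_htseq_output(sample)
```

Because the pattern is unanchored, any feature ID that happens to contain one of those words would also be routed to the summary output.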
