I have been trying to run a scRNA-seq analysis by following the Galaxy tutorial "Generating a single cell matrix using Alevin", and I keep getting stuck at the SalmonKallistoMtxTo10x step. The job fails both with the tutorial data set and with a smaller data set downloaded directly from 10x Genomics. Details on the data set and run parameters are below. Any suggestions on how to get past this would be appreciated. The error output files are empty.
1k PBMCs from a Healthy Donor (v3 chemistry)
Single Cell Gene Expression Dataset by Cell Ranger 3.0.0
Peripheral blood mononuclear cells (PBMCs) from a healthy donor (the same cells were used to generate pbmc_1k_v2, pbmc_10k_v3). PBMCs are primary cells with relatively small amounts of RNA (~1pg RNA/cell).
• 1,222 cells detected
• Sequenced on Illumina NovaSeq with approximately 54,000 reads per cell
• 28bp read1 (16bp Chromium barcode and 12bp UMI), 91bp read2 (transcript), and 8bp I7 sample barcode
• run with --expect-cells=1000
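The read 1 layout listed above (a 16 bp Chromium barcode followed by a 12 bp UMI) can be illustrated with a small sketch. The function name `split_read1` and the example sequence are made up for illustration and are not part of any tool mentioned here.

```python
BARCODE_LEN = 16  # cell barcode length, per the dataset description above
UMI_LEN = 12      # UMI length, per the dataset description above


def split_read1(seq):
    """Split a read 1 sequence into (cell barcode, UMI)."""
    expected = BARCODE_LEN + UMI_LEN
    if len(seq) < expected:
        raise ValueError(f"read 1 is {len(seq)} bp, expected at least {expected} bp")
    barcode = seq[:BARCODE_LEN]
    umi = seq[BARCODE_LEN:expected]
    return barcode, umi


# Made-up 28 bp sequence, for illustration only
barcode, umi = split_read1("AAACCCAAGGAGAGTA" + "CGTTAGCATGCA")
print(barcode, umi)
```

Read 2 carries the transcript sequence, so only read 1 is needed to assign a read to a cell and a molecule.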
GTF2GeneList extracts a complete annotation table or subsets thereof from an Ensembl GTF using rtracklayer (Galaxy Version 1.42.1+galaxy6)
Ensembl GTF file
Feature type for which to derive annotation: transcript
Field to place first in output table: transcript_id
Suppress header line in output? Yes
transcript_id,gene_id
Append version to transcript identifiers? Yes
Flag mitochondrial features? No
Filter a FASTA-format cDNA file to match annotations? Yes
Annotation field to match with sequences: transcript_id
Rename the annotation table to Map
Rename the uncompressed filtered FASTA file to Filtered FASTA
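For readers who want to see what the resulting Map table contains, here is a minimal Python sketch of the same idea: a headerless two-column table of versioned transcript IDs and gene IDs built from `transcript` features. It is not the GTF2GeneList implementation (which uses rtracklayer in R). It assumes the usual Ensembl GTF attribute layout (`transcript_id "..."; transcript_version "...";`), and the function name and example IDs are hypothetical.

```python
import re

# Matches GTF attributes of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def transcript_to_gene_rows(gtf_lines):
    """Yield (versioned transcript_id, gene_id) for each transcript feature."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip GTF header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # keep only the 'transcript' feature type
        attrs = dict(ATTR_RE.findall(fields[8]))
        tx_id = attrs.get("transcript_id")
        gene_id = attrs.get("gene_id")
        if tx_id is None or gene_id is None:
            continue
        version = attrs.get("transcript_version")
        if version:
            tx_id = f"{tx_id}.{version}"  # append the version to the transcript ID
        yield tx_id, gene_id


# Made-up GTF lines, for illustration only
example = [
    "#!genome-build EXAMPLE\n",
    '1\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "GENE1"; gene_version "5";\n',
    '1\tsrc\ttranscript\t100\t900\t.\t+\t.\tgene_id "GENE1"; gene_version "5"; '
    'transcript_id "TX1"; transcript_version "2";\n',
]
for tx_id, gene_id in transcript_to_gene_rows(example):
    print(f"{tx_id}\t{gene_id}")
```

The transcript IDs in this table need to match the sequence names in the filtered FASTA, which is why the version suffix and the FASTA filtering option matter.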
Alevin Quantification and analysis of 3’ tagged-end single-cell sequencing data (Galaxy Version 1.3.0+galaxy2)
Tool Parameters
Input Parameter Value
Select a reference transcriptome from your history or use a built-in index? history
Transcripts fasta file 10 Filtered FASTA uncompressed (Hidden)
Kmer length 31
Perfect Hash False
Single or paired-end reads? paired
4 pbmc_1k_v3_S1_L001_R1_001.fastq.gz
5 pbmc_1k_v3_S1_L001_R2_001.fastq.gz
Relative orientation of reads within a pair Mates are oriented toward each other (I = inward)
Specify the strandedness of the reads read comes from the reverse strand (SR)
protocol 10x chromium v3 Single Cell protocol
Transcript to gene map file 9 Map
Retrieve all output files True
optional
Whitelist file
noDedup False
dumpBfh False
dumpFeatures True
dumpUmiGraph False
dumpMtx True
forceCells Not available.
expectCells Not available.
numCellBootstraps Not available.
minScoreFraction Not available.
keepCBFraction 1.0
lowRegionMinNumBarcodes Not available.
maxNumBarcodes Not available.
freqThreshold 3
SalmonKallistoMtxTo10x Transforms .mtx matrix and associated labels into a format compatible with tools expecting old-style 10X data (Galaxy Version 0.0.1+galaxy5)
Tool Parameters
Input Parameter Value
.mtx-format matrix 11 quants_mat.mtx
Tab-delimited genes file 13 quants_mat_cols.txt
Tab-delimited barcodes file 14 quants_mat_rows.txt
Prefix to prepend to cell names / barcodes Empty
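Since mixed-up inputs are an easy mistake at this step, a quick consistency check on downloaded copies of the three files can help rule that out. The sketch below is only an illustration and not part of the Galaxy tool. It assumes, as the parameter mapping above suggests, that the matrix rows correspond to barcodes (`quants_mat_rows.txt`) and the columns to genes (`quants_mat_cols.txt`). All function names are hypothetical.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file, transparently handling gzip compression."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")


def mtx_shape(path):
    """Return (n_rows, n_cols, n_entries) from a MatrixMarket coordinate file."""
    with open_maybe_gzip(path) as handle:
        for line in handle:
            if line.startswith("%") or not line.strip():
                continue  # skip the banner, comments, and blank lines
            n_rows, n_cols, n_entries = (int(x) for x in line.split()[:3])
            return n_rows, n_cols, n_entries
    raise ValueError(f"no size line found in {path}")


def count_labels(path):
    """Count the non-empty lines in a label file."""
    with open_maybe_gzip(path) as handle:
        return sum(1 for line in handle if line.strip())


def check_inputs(mtx_path, genes_path, barcodes_path):
    """Return a list of mismatch messages; an empty list means the files agree."""
    n_rows, n_cols, _ = mtx_shape(mtx_path)
    n_genes = count_labels(genes_path)
    n_barcodes = count_labels(barcodes_path)
    problems = []
    if n_rows != n_barcodes:
        problems.append(f"matrix has {n_rows} rows but {n_barcodes} barcodes were given")
    if n_cols != n_genes:
        problems.append(f"matrix has {n_cols} columns but {n_genes} genes were given")
    return problems


# Example call (paths are placeholders for locally downloaded datasets):
# print(check_inputs("quants_mat.mtx", "quants_mat_cols.txt", "quants_mat_rows.txt"))
```

An empty list means the three files are at least dimensionally consistent. It does not by itself show that the tool will run on a given server.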
The SalmonKallistoMtxTo10x tool needs changes from the tool developers, followed by administrative changes, before it will work at UseGalaxy.org. I suspect Alevin has the same problem (not confirmed yet).
Please try running this tutorial, or any workflows based on it, at the UseGalaxy.eu server instead for now.
If you need to move existing data between the servers, the how-to is in this FAQ.
P.S. @rituu-vermaa, thanks for submitting the bug reports. I have already replied to some of them regarding mixed-up inputs. That is fairly common with this tutorial, since it repeats groups of steps a few times with slightly different criteria. This is arguably confusing, but it is unavoidable when working step by step rather than with a workflow or collections. Consider adding dataset #tags to help keep track of the different runs (the tutorial includes an example screenshot with tags). Later on, when you use a workflow and collections, mix-ups are less likely with or without tags, which is one reason both features are popular and worth learning about (search the GTN tutorials for the keywords “collection” and/or “workflow”).
Hope that helps, and apologies for the confusing tool trouble on top of a complicated tutorial!