STAR-Fusion validation failure

I am trying to validate STAR-Fusion for detecting gene fusions in RNA-Seq data, and I seem to be having problems running the package on the usegalaxy.org server. On very clean reads with fusions validated by TopHat2, the results were nothing like what they should have been. I posted a query on a STAR-Fusion bulletin board and got a response from the developers at the Broad Institute. They looked at what I did and got the correct answers. Something must be different in how the program is implemented here, and I would like to know how to run it correctly.

Here is what I did:
Downloaded the CTAT file GRCh37_v19_CTAT_lib_Feb092018.source_data.tar.gz from https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/ and unbundled the individual files.
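The unbundling step can be sketched with Python's standard `tarfile` module. This is a minimal sketch, assuming only the three files uploaded to Galaxy are needed and that they can be recognized by file name alone wherever they sit inside the archive (their internal paths are not stated here); `extract_ctat_files` is a hypothetical helper name.

```python
import os
import tarfile

# The three files uploaded to Galaxy in this post. Their location inside
# the CTAT source_data tarball is an assumption: they are matched by basename.
WANTED = {
    "ref_annot.cdna.allvsall.outfmt6.toGenes.sorted",
    "ref_annot.gtf",
    "ref_genome.fa",
}


def extract_ctat_files(tar_path, dest_dir, wanted=WANTED):
    """Extract only the wanted regular files (by basename) into dest_dir.

    Returns a dict mapping each basename to its extracted path and raises
    FileNotFoundError if any wanted file is absent from the archive.
    """
    os.makedirs(dest_dir, exist_ok=True)
    found = {}
    # "r:*" lets tarfile detect gzip (or no) compression automatically.
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar.getmembers():
            name = os.path.basename(member.name)
            if member.isfile() and name in wanted:
                # Flatten the stored path so the file lands directly in dest_dir.
                member.name = name
                tar.extract(member, path=dest_dir)
                found[name] = os.path.join(dest_dir, name)
    missing = set(wanted) - set(found)
    if missing:
        raise FileNotFoundError(f"not found in archive: {sorted(missing)}")
    return found
```

Matching by basename keeps the sketch independent of whatever top-level directory the tarball happens to use.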

Uploaded to the Galaxy server (http://usegalaxy.org):
ref_annot.cdna.allvsall.outfmt6.toGenes.sorted
ref_annot.gtf
ref_genome.fa

Imported from NCBI with fastq-dump:
SRR6796340 Anaplastic Large T-Cell Lymphoma, expect NPM1-ALK chr5->chr2
SRR6796341 Anaplastic Large T-Cell Lymphoma, technical replicate
SRR6796358 Diffuse Histiocytic Lymphoma, expect NPM1-ALK chr5->chr2
SRR6796374 Large Cell Immunoblastic Lymphoma, expect NPM1-ALK chr5->chr2
SRR6796384 Bladder Transitional Cell Carcinoma, expect FGFR3-TACC3 chr4->chr4 (high counts)
SRR6796352 Chronic Myeloid Leukemia, expect BCR-ABL1 chr9->chr22
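Outside Galaxy, the same import can be done with the SRA Toolkit's fastq-dump. This is a minimal sketch, assuming the runs are paired-end (the RNA STAR job is set to paired mode); it only builds and prints the commands rather than running them. `--split-files` writes mate 1 and mate 2 to separate `_1`/`_2` FASTQ files instead of concatenating both mates into one record per spot. `fastq_dump_cmd` and `mate_files` are hypothetical helper names.

```python
import os
import shlex

# SRA runs listed in this post (assumed here to be paired-end).
ACCESSIONS = [
    "SRR6796340", "SRR6796341", "SRR6796358",
    "SRR6796374", "SRR6796384", "SRR6796352",
]


def fastq_dump_cmd(accession, outdir="."):
    """Build (but do not run) a fastq-dump command for one run.

    --split-files writes each mate to its own FASTQ file.
    """
    return ["fastq-dump", "--split-files", "--outdir", outdir, accession]


def mate_files(accession, outdir="."):
    """Paths that fastq-dump --split-files writes for a paired-end run."""
    return (
        os.path.join(outdir, f"{accession}_1.fastq"),
        os.path.join(outdir, f"{accession}_2.fastq"),
    )


if __name__ == "__main__":
    for acc in ACCESSIONS:
        print(shlex.join(fastq_dump_cmd(acc, "reads")))
```

Under this assumption, the two files from `mate_files()` would go in as separate datasets for the forward and reverse read inputs of RNA STAR.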

I ran it two ways: RNA STAR first and then STAR-Fusion on its output, or STAR-Fusion controlling the RNA STAR aligner itself. The run parameters are below.
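For comparison with the command-line instructions on GitHub, the two routes can be written as stand-alone STAR-Fusion command lines. This is a sketch only, assuming the `--genome_lib_dir`, `-J`, `--left_fq`/`--right_fq` and `--output_dir` options described in the STAR-Fusion documentation and a CTAT genome library directory already built from the source_data bundle. The paths and the `star_fusion_cmd` helper are hypothetical, and option names may differ in older releases such as v0.5.4, so `STAR-Fusion --help` for the installed version is the authority.

```python
import shlex


def star_fusion_cmd(genome_lib_dir, output_dir, chimeric_junction=None,
                    left_fq=None, right_fq=None):
    """Build (but do not run) a STAR-Fusion command line.

    Route 1: pass chimeric_junction to reuse the Chimeric.out.junction
             file from an earlier STAR run ("RNA STAR first").
    Route 2: pass left_fq and right_fq to let STAR-Fusion run STAR itself.
    """
    cmd = ["STAR-Fusion", "--genome_lib_dir", genome_lib_dir]
    if chimeric_junction is not None:
        cmd += ["-J", chimeric_junction]
    elif left_fq is not None and right_fq is not None:
        cmd += ["--left_fq", left_fq, "--right_fq", right_fq]
    else:
        raise ValueError("give chimeric_junction, or both left_fq and right_fq")
    cmd += ["--output_dir", output_dir]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(shlex.join(star_fusion_cmd("ctat_genome_lib", "sf_out",
                                     chimeric_junction="Chimeric.out.junction")))
    print(shlex.join(star_fusion_cmd("ctat_genome_lib", "sf_out",
                                     left_fq="SRR6796352_1.fastq",
                                     right_fq="SRR6796352_2.fastq")))
```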

RNA STAR
Dataset Information
Number: 87
Name: RNA STAR on data 51, data 53, and data 73: chimeric junctions
Created: Sat Dec 29 19:01:27 2018 (UTC)
Filesize: 4.9 GB
Dbkey: ?
Format: interval
Job Information
Galaxy Tool ID: toolshed.g2.bx.psu.edu/repos/iuc/rgrnastar/rna_star/2.6.0b-1
Galaxy Tool Version: 2.6.0b-1
Tool Version:
Tool Standard Output: stdout
Tool Standard Error: stderr
Tool Exit Code: 0
History Content API ID: bbd44e69cb8906b5b3d043534aebc71b
Job API ID: bbd44e69cb8906b5f73bf7040b268a06
History API ID: de36650321118931
UUID: 8fe466e2-9dee-48ca-91b4-b395c6757088
Tool Parameters
Input Parameter Value
Single-end or paired-end reads paired
RNA-Seq FASTQ/FASTA file, forward reads 73: SRR6796352 (fastq-dump)
RNA-Seq FASTQ/FASTA file, reverse reads 73: SRR6796352 (fastq-dump)
Custom or built-in reference genome history
Select a reference genome 53: ref_genome.fa
Gene model (gff3,gtf) file for splice junctions 51: ref_annot.gtf
Length of the genomic sequence around annotated junctions 100
Count number of reads per gene TRUE
Would you like to set output parameters (formatting and filtering)? no
Other parameters (seed, alignment, limits and chimeric alignment) star_fusion
Job Resource Parameters no
STAR-Fusion
Dataset Information
Number: 91
Name: STAR-Fusion on data 48, data 51, and others: fusion_candidates.final
Created: Sat Dec 29 21:47:32 2018 (UTC)
Filesize: 17.9 KB
Dbkey: ?
Format: tabular
Job Information
Galaxy Tool ID: toolshed.g2.bx.psu.edu/repos/iuc/star_fusion/star_fusion/0.5.4-3
Galaxy Tool Version: 0.5.4-3
Tool Version: software version: STAR-Fusion_v0.5.4
Tool Standard Output: stdout
Tool Standard Error: stderr
Tool Exit Code: 0
History Content API ID: bbd44e69cb8906b569580af0b3b0f6c0
Job API ID: bbd44e69cb8906b539bb05ac571e27ea
History API ID: de36650321118931
UUID: 8426b56c-6591-4871-94eb-1e7f258e39ad
Tool Parameters
Input Parameter Value
Use output from earlier STAR run or let STAR Fusion control running STAR use_chimeric
Chimeric junction file from STAR (with STAR-Fusion settings) 87: RNA STAR on data 51, data 53, and data 73: chimeric junctions
Source for sequence to search history
Select the reference genome (FASTA file) 53: ref_genome.fa
Gene model (gff3,gtf) file for splice junctions and fusion gene detection 51: ref_annot.gtf
Result of BLAST+ blastn of the reference fasta sequence with itself 48: ref_annot.cdna.allvsall.outfmt6.toGenes.sorted
Settings to use default
Job Resource Parameters no

Either way, the results were the same. None of the reported fusions were the correct ones; at most, one fusion partner was on the correct chromosome, and it was far from the expected breakpoint. What mystifies me is that the people at the Broad Institute got the expected results.

When RNA STAR is run with the STAR-Fusion settings on the Galaxy server, it generates four output files, including one with the chimeric junctions, which I used as input to the STAR-Fusion run. Is this the correct file to use? I also noticed that the STAR-Fusion tutorial instructions on GitHub say to download a mutation resource and then use something from that zip folder. It sounds like the STAR-Fusion versions are set up differently, and that might be close to where the error is being generated.
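To re-check each rerun against the expected fusions in the sample list, the fusion_candidates.final output can be scanned for the expected name. This is a minimal sketch, assuming the file is tab-delimited with a '#'-prefixed header and the fusion name in the first column in STAR-Fusion's GENE1--GENE2 form; `EXPECTED`, `fusion_names` and `has_expected_fusion` are hypothetical names.

```python
# Expected fusions from the sample list, written in the GENE1--GENE2 form
# assumed for STAR-Fusion's fusion-name column. SRR6796341 (technical
# replicate) is left out because its expected fusion is not stated.
EXPECTED = {
    "SRR6796340": "NPM1--ALK",
    "SRR6796358": "NPM1--ALK",
    "SRR6796374": "NPM1--ALK",
    "SRR6796384": "FGFR3--TACC3",
    "SRR6796352": "BCR--ABL1",
}


def fusion_names(path):
    """Return fusion names from the first column of a STAR-Fusion tabular file."""
    names = []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            # Skip blank lines and the '#'-prefixed header line.
            if not line or line.startswith("#"):
                continue
            names.append(line.split("\t", 1)[0])
    return names


def has_expected_fusion(path, expected):
    """True if the expected fusion name appears in the output file."""
    return expected in set(fusion_names(path))
```

For example, `has_expected_fusion("fusion_candidates.final", EXPECTED["SRR6796352"])` would report whether BCR--ABL1 was called for the CML sample.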

If you can figure out what I did wrong in setting this up, I would greatly appreciate suggestions on how to fix it.