I am trying to validate STAR-Fusion for detecting gene fusions in RNA-Seq data and seem to have some problems running the package through the usegalaxy.org server. The results with very clean, TopHat2 fusion-validated reads were nothing like what they were supposed to be. I posted a query on a STAR-Fusion bulletin board and got a response from the folks at the Broad Institute who developed the package. They looked at what I did and got the correct answers. Something is different in how the program is implemented, and I would like to know how to run it correctly.
Here is what I did:
Downloaded the CTAT file GRCh37_v19_CTAT_lib_Feb092018.source_data.tar.gz from the "Index of /Trinity/CTAT_RESOURCE_LIB" page and unbundled the individual files.
Uploaded the following files to the Galaxy server http://usegalaxy.org:
ref_annot.cdna.allvsall.outfmt6.toGenes.sorted
ref_annot.gtf
ref_genome.fa
Uploaded the following runs from NCBI with fastq-dump:
SRR6796340 Anaplastic Large T-Cell Lymphoma Expect NPM1-ALK chr5->chr2
SRR6796341 Anaplastic Large T-Cell Lymphoma, technical replicate
SRR6796358 Diffuse Histiocytic Lymphoma Expect NPM1-ALK chr5->chr2
SRR6796374 Large Cell Immunoblastic Lymphoma Expect NPM1-ALK chr5->chr2
SRR6796384 Bladder Transitional Cell Carcinoma Expect FGFR3-TACC3 chr4->chr4 high counts
SRR6796352 Chronic Myeloid Leukemia Expect BCR-ABL1 chr22->chr9
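For reference, a minimal sketch of how these runs could be fetched on the command line so that each mate of a paired-end run lands in its own file. This is an assumption about an equivalent local workflow, not a record of what the Galaxy fastq-dump tool ran; the `fastq` output directory and the helper names are hypothetical, while `--split-files` and `--outdir` are standard fastq-dump options.

```python
import shlex

# Run accessions copied from the list above.
RUNS = [
    "SRR6796340",
    "SRR6796341",
    "SRR6796358",
    "SRR6796374",
    "SRR6796384",
    "SRR6796352",
]


def fastq_dump_command(run_id, outdir="fastq"):
    """Build a fastq-dump call that writes one FASTQ file per mate."""
    # --split-files writes <run>_1.fastq and <run>_2.fastq for paired-end runs.
    return ["fastq-dump", "--split-files", "--outdir", outdir, run_id]


def expected_mate_files(run_id, outdir="fastq"):
    """Return the forward and reverse file paths that --split-files should produce."""
    return (f"{outdir}/{run_id}_1.fastq", f"{outdir}/{run_id}_2.fastq")


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for run in RUNS:
        print(shlex.join(fastq_dump_command(run)))
```

With output like this, the forward-reads input of a paired-end alignment would be the `_1` file and the reverse-reads input the `_2` file.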
I either ran RNA STAR first and then STAR-Fusion, or ran STAR-Fusion and let it control the RNA STAR aligner. The run parameters are below.
RNA STAR | |
---|---|
Dataset Information | |
Number: | 87 |
Name: | RNA STAR on data 51, data 53, and data 73: chimeric junctions |
Created: | Sat Dec 29 19:01:27 2018 (UTC) |
Filesize: | 4.9 GB |
Dbkey: | ? |
Format: | interval |
Job Information | |
Galaxy Tool ID: | toolshed.g2.bx.psu.edu/repos/iuc/rgrnastar/rna_star/2.6.0b-1 |
Galaxy Tool Version: | 2.6.0b-1 |
Tool Version: | |
Tool Standard Output: | stdout |
Tool Standard Error: | stderr |
Tool Exit Code: | 0 |
History Content API ID: | bbd44e69cb8906b5b3d043534aebc71b |
Job API ID: | bbd44e69cb8906b5f73bf7040b268a06 |
History API ID: | de36650321118931 |
UUID: | 8fe466e2-9dee-48ca-91b4-b395c6757088 |
Tool Parameters | |
Input Parameter | Value |
Single-end or paired-end reads | paired |
RNA-Seq FASTQ/FASTA file, forward reads | 73: SRR6796352 (fastq-dump) |
RNA-Seq FASTQ/FASTA file, reverse reads | 73: SRR6796352 (fastq-dump) |
Custom or built-in reference genome | history |
Select a reference genome | 53: ref_genome.fa |
Gene model (gff3,gtf) file for splice junctions | 51: ref_annot.gtf |
Length of the genomic sequence around annotated junctions | 100 |
Count number of reads per gene | TRUE |
Would you like to set output parameters (formatting and filtering)? | no |
Other parameters (seed, alignment, limits and chimeric alignment) | star_fusion |
Job Resource Parameters | no |
Inheritance Chain | |

STAR-Fusion | |
---|---|
Dataset Information | |
Number: | 91 |
Name: | STAR-Fusion on data 48, data 51, and others: fusion_candidates.final |
Created: | Sat Dec 29 21:47:32 2018 (UTC) |
Filesize: | 17.9 KB |
Dbkey: | ? |
Format: | tabular |
Job Information | |
Galaxy Tool ID: | toolshed.g2.bx.psu.edu/repos/iuc/star_fusion/star_fusion/0.5.4-3 |
Galaxy Tool Version: | 0.5.4-3 |
Tool Version: | software version: STAR-Fusion_v0.5.4 |
Tool Standard Output: | stdout |
Tool Standard Error: | stderr |
Tool Exit Code: | 0 |
History Content API ID: | bbd44e69cb8906b569580af0b3b0f6c0 |
Job API ID: | bbd44e69cb8906b539bb05ac571e27ea |
History API ID: | de36650321118931 |
UUID: | 8426b56c-6591-4871-94eb-1e7f258e39ad |
Tool Parameters | |
Input Parameter | Value |
Use output from earlier STAR run or let STAR Fusion control running STAR | use_chimeric |
Chimeric junction file from STAR (with STAR-Fusion settings) | 87: RNA STAR on data 51, data 53, and data 73: chimeric junctions |
Source for sequence to search | history |
Select the reference genome (FASTA file) | 53: ref_genome.fa |
Gene model (gff3,gtf) file for splice junctions and fusion gene detection | 51: ref_annot.gtf |
Result of BLAST+ blastn of the reference fasta sequence with itself | 48: ref_annot.cdna.allvsall.outfmt6.toGenes.sorted |
Settings to use | default |
Job Resource Parameters | no |
Either way the results were the same. None of the fusions listed were the correct ones; at most, one fusion partner was on the correct chromosome, but distant from the expected breakpoint. What mystifies me is that the people at the Broad Institute got the expected results. When RNA STAR is run on the Galaxy server with the settings for linking to STAR-Fusion, it generates 4 output files, including one with the chimeric junctions, which I used as input to the STAR-Fusion run. Is this the correct one to use? I noticed that the GitHub STAR-Fusion tutorial instructions indicate downloading a mutation resource and then using something out of this zip folder. It sounds like the two versions of STAR-Fusion are set up differently. This might be close to where the error is being generated.
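A minimal sketch of how a fusion_candidates.final table could be checked against the expected fusions listed earlier. It assumes the table is tab-separated, that header lines start with `#`, and that the first column holds the fusion name in the `LEFT--RIGHT` form; the column layout of v0.5.4 has not been verified here, and the function names are hypothetical. The entry for SRR6796341 is also an assumption, based only on it being a technical replicate of SRR6796340.

```python
import csv
import io

# Expected fusion per run, from the sample list earlier in this post.
EXPECTED = {
    "SRR6796340": "NPM1--ALK",
    "SRR6796341": "NPM1--ALK",  # assumed: technical replicate of SRR6796340
    "SRR6796358": "NPM1--ALK",
    "SRR6796374": "NPM1--ALK",
    "SRR6796384": "FGFR3--TACC3",
    "SRR6796352": "BCR--ABL1",
}


def fusion_names(table_text):
    """Collect the first-column values of a tab-separated fusion table."""
    names = set()
    for row in csv.reader(io.StringIO(table_text), delimiter="\t"):
        # Skip blank lines and header/comment lines (assumed to start with '#').
        if not row or row[0].startswith("#"):
            continue
        names.add(row[0])
    return names


def has_expected(run_id, table_text):
    """Return True if the expected fusion for this run appears in the table."""
    return EXPECTED[run_id] in fusion_names(table_text)
```

To use it, the text of a downloaded result file would be passed in, for example `has_expected("SRR6796352", open(path).read())`, where `path` points at the table for that run.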
If you can figure out what I did wrong in setting this up, I would greatly appreciate suggestions on how to fix it.