I was trying to detect some fusions on Galaxy, using FASTQ files from a public GEO dataset to train on. However, every time I run Arriba, the job fails with the error “This job was terminated because it used more memory than it was allocated”.
I used RNA STAR as my aligner, with the GENCODE FASTA file as my reference genome and the GENCODE GTF as my annotation file. I used the same two files for Arriba.
STAR command on Galaxy:
gunzip -c '/jetstream2/scratch/main/jobs/68966065/inputs/dataset_2eb47a98-5fc8-4320-89a9-75bdae642e92.dat' > refgenome.fa && mkdir -p tempstargenomedir && STAR --runMode genomeGenerate --genomeDir 'tempstargenomedir' --genomeFastaFiles refgenome.fa --sjdbOverhang '100' --sjdbGTFfile '/jetstream2/scratch/main/jobs/68966065/inputs/dataset_6af21bbd-87c1-4ca3-8940-08daff76b9eb.dat' --sjdbGTFfeatureExon 'exon' --genomeSAindexNbases 12 --runThreadN ${GALAXY_SLOTS:-4} --limitGenomeGenerateRAM $((${GALAXY_MEMORY_MB:-31000} * 1000000)) && STAR --runThreadN ${GALAXY_SLOTS:-4} --genomeLoad NoSharedMemory --genomeDir tempstargenomedir --readFilesIn '/jetstream2/scratch/main/jobs/68966065/inputs/dataset_7bbd3e7b-773a-4c2b-8de6-d4a2c13c3322.dat' '/jetstream2/scratch/main/jobs/68966065/inputs/dataset_52a6a41c-8d00-47b4-948f-4977ce20c733.dat' --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --twopassMode None --quantMode - --outSAMattrIHstart 1 --outSAMattributes NH HI AS nM ch --outSAMprimaryFlag OneBestScore --outSAMmapqUnique 50 --outSAMunmapped Within --outFilterType Normal --outFilterMultimapScoreRange 1 --outFilterMultimapNmax 50 --outFilterMismatchNmax 10 --outFilterMismatchNoverLmax 0.3 --outFilterMismatchNoverReadLmax 1.0 --outFilterScoreMin 0 --outFilterScoreMinOverLread 0.66 --outFilterMatchNmin 0 --outFilterMatchNminOverLread 0.66 --outSAMmultNmax -1 --outSAMtlen 1 --seedSearchStartLmax 50 --seedSearchStartLmaxOverLread 1.0 --seedSearchLmax 0 --seedMultimapNmax 10000 --seedPerReadNmax 1000 --seedPerWindowNmax 50 --seedNoneLociPerWindow 10 --alignIntronMin 21 --alignIntronMax 0 --alignMatesGapMax 0 --alignSJoverhangMin 5 --alignSJstitchMismatchNmax 0 -1 0 0 --alignSJDBoverhangMin 5 --alignSplicedMateMapLmin 0 --alignSplicedMateMapLminOverLmate 0.66 --alignWindowsPerReadNmax 10000 --alignTranscriptsPerWindowNmax 100 --alignTranscriptsPerReadNmax 10000 --alignEndsType Local --peOverlapNbasesMin 0 --peOverlapMMp 0.01 --chimSegmentMin 5 --chimScoreMin 0 --chimScoreDropMax 200 --chimScoreSeparation 5 --chimScoreJunctionNonGTAG -1 --chimSegmentReadGapMax 0 --chimFilter banGenomicN --chimJunctionOverhangMin 5 --chimMainSegmentMultNmax 10 --chimMultimapNmax 0 --chimMultimapScoreRange 1 --limitOutSJoneRead 1000 --limitOutSJcollapsed 1000000 --limitSjdbInsertNsj 1000000 --outBAMsortingThreadN ${GALAXY_SLOTS:-4} --outBAMsortingBinsN 50 --winAnchorMultimapNmax 50 --limitBAMsortRAM $((${GALAXY_MEMORY_MB:-0}*1000000)) --chimOutType WithinBAM && samtools view -b -o '/jetstream2/scratch/main/jobs/68966065/outputs/dataset_33b1e4ef-3de9-4352-87f3-981a3da0bb8b.dat' Aligned.sortedByCoord.out.bam
Arriba command on Galaxy:
ln -sf '/corral4/main/objects/9/6/b/dataset_96b1bdc6-dedc-47a3-9529-9ec40f5fc78f.dat' genome.fa && ln -sf '/corral4/main/objects/6/a/f/dataset_6af21bbd-87c1-4ca3-8940-08daff76b9eb.dat' genome.gtf && arriba -x '/corral4/main/objects/3/3/b/dataset_33b1e4ef-3de9-4352-87f3-981a3da0bb8b.dat' -a 'genome.fa' -g 'genome.gtf' -f 'blacklist' -o fusions.tsv -O fusions.discarded.tsv && samtools sort -@ ${GALAXY_SLOTS:-1} -m 4G -T tmp -O bam '/corral4/main/objects/3/3/b/dataset_33b1e4ef-3de9-4352-87f3-981a3da0bb8b.dat' > Aligned.sortedByCoord.out.bam && samtools index Aligned.sortedByCoord.out.bam && convert_fusions_to_vcf.sh 'genome.fa' fusions.tsv fusions.vcf && mkdir fusion_bams && extract_fusion-supporting_alignments.sh fusions.tsv Aligned.sortedByCoord.out.bam 'fusion_bams/fusion' && draw_fusions.R --fusions='fusions.tsv' --alignments='Aligned.sortedByCoord.out.bam' --annotation='/corral4/main/objects/6/a/f/dataset_6af21bbd-87c1-4ca3-8940-08daff76b9eb.dat' --output=fusions.pdf --transcriptSelection=provided
Is there any way to get Arriba to finish and detect fusions within the memory limit? And would trimming my annotation file and FASTA file, keeping only what is needed to detect fusions, be an efficient way to use less memory and less time?
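To make the second question concrete, here is a minimal sketch of the kind of trimming I have in mind: keeping only a chosen set of contigs in both the FASTA and the GTF so the two files stay consistent. The function names and the contig names (`chr1`, `chr2`) are hypothetical, chosen only for illustration. I do not know whether this would actually reduce Arriba's memory use on Galaxy, and fusions involving any dropped contig would no longer be detectable.

```python
from typing import Iterable, Iterator, Set


def filter_fasta(lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield only the FASTA records whose sequence name is in `keep`."""
    keeping = False
    for line in lines:
        if line.startswith(">"):
            # The sequence name is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            keeping = bool(fields) and fields[0] in keep
        if keeping:
            yield line


def filter_gtf(lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield GTF comment lines plus the features located on contigs in `keep`."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep header/comment lines unchanged
        elif line.split("\t", 1)[0] in keep:
            # Column 1 of a GTF record is the contig (seqname).
            yield line


if __name__ == "__main__":
    # Hypothetical toy input; real files would be opened and streamed line by line.
    keep = {"chr1", "chr2"}
    fasta = [">chr1 primary\n", "ACGT\n", ">chrUn_example\n", "TTTT\n"]
    gtf = [
        "##description: toy annotation\n",
        'chr1\tHAVANA\texon\t1\t4\t.\t+\t.\tgene_id "g1";\n',
        'chrUn_example\tHAVANA\texon\t1\t4\t.\t+\t.\tgene_id "g2";\n',
    ]
    print("".join(filter_fasta(fasta, keep)), end="")
    print("".join(filter_gtf(gtf, keep)), end="")
```

Both functions stream line by line, so the filtering step itself should not need much memory even for a full genome. The STAR index would have to be rebuilt from the trimmed FASTA so that the BAM, FASTA and GTF all use the same contigs.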