Tophat error for paired data analysis (deprecated tool)

I am a new user and I am running Tophat on paired data, but it gives an error when I execute it. Please help me.

error
An error occurred with this dataset:
Settings:
Output files: “genome.*.bt2”
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Max bucket size: default
Max bucket size, sqrt multiplier: default

Welcome, @Bhanu!

Error message:

Dataset Error

An error occured while running the tool toolshed.g2.bx.psu.edu/repos/devteam/tophat2/tophat2/2.1.1 .

Tool execution generated the following messages:

Fatal error: Tool execution failed
Building a SMALL index
[2019-04-19 00:07:39] Beginning TopHat run (v2.1.1)
-----------------------------------------------
[2019-04-19 00:07:39] Checking for Bowtie
Bowtie version: 2.2.8.0
[2019-04-19 00:07:39] Checking for Bowtie index files (genome)…
[2019-04-19 00:07:39] Checking for reference FASTA file
[2019-04-19 00:07:39] Generating SAM header for genome
Error: cannot determine record type in input file /galaxy-repl/main/files/030/978/dataset_30978310.dat

Problems:

  • Tophat is a deprecated tool that does not accept compressed fastq inputs (datatype “fastqsanger.gz”). This is the root cause of the error. The successful Tophat jobs in your history used uncompressed fastq input(s) (datatype “fastqsanger”).
  • All Cuff* tools are also deprecated and may present with problems (buggy usage or server-side issues). This includes Cufflinks, Cuffmerge, and Cuffdiff.
  • Deprecated tools will not be fixed. It is your choice to:
    • Modify your analysis/data to try to get successful jobs, understanding that problems may come up. These can include scientific content issues in the results that may not be obvious, even in putatively “successful” (green) jobs.
    • Switch to using current tools/methods. This tutorial is a good place to start: RNA-seq: Discovering and quantifying new transcripts - an in-depth transcriptome analysis example.

Uncompress the fastq reads

This needs to be done before using the deprecated tools. It is not needed for other tools.

Click on the pencil icon for each dataset to reach the “Edit Attributes” functions. Under the “Convert” tab there is a pull-down option to uncompress the data. Each convert run adds a new uncompressed dataset to your history. These uncompressed datasets are what should be added to any dataset collections you want to use with tools that do not correctly interpret compressed fastq inputs (including Tophat). I see that you used the FastqGroomer tool to do this before, and that is an alternative, although “grooming” is not really needed for your data: it already has correctly scaled quality scores, as reported by FastQC.
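
If you would rather uncompress the reads on your own computer before uploading them, the same step can be sketched in Python. This is only an illustration using the standard library, not what the Galaxy “Convert” function runs internally. The function names and the file names in the comments are made up for the example.

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def gunzip_fastq(src, dst):
    """Write an uncompressed copy of src to dst; src is left untouched."""
    with gzip.open(src, "rb") as compressed, open(dst, "wb") as plain:
        shutil.copyfileobj(compressed, plain)


# Hypothetical file names, for illustration only:
# if is_gzip("reads_R1.fastq.gz"):
#     gunzip_fastq("reads_R1.fastq.gz", "reads_R1.fastq")
```

The uncompressed file would then be uploaded with the datatype “fastqsanger” rather than “fastqsanger.gz”.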

Fix the custom genome format

I would also recommend fixing the formatting of your custom genome (dataset 1) before running more jobs with any tool. Specifically, the description content on the fasta title lines (the “>” lines) should be removed with the tool NormalizeFasta. Be sure to use the updated fasta dataset for all steps in an analysis, or expect errors/content issues.
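
To make the intended change concrete, here is a rough Python sketch of what “removing the description” from a title line means. It is not the NormalizeFasta implementation, only an illustration. It assumes the identifier is the first whitespace-delimited word after “>”, and the function name is invented for the example.

```python
def strip_fasta_descriptions(lines):
    """Yield FASTA lines with title lines reduced to '>identifier'.

    Assumption: the identifier is the first whitespace-delimited word
    after '>'; everything after it is treated as description and dropped.
    Sequence lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            identifier = fields[0] if fields else ""
            yield ">" + identifier + "\n"
        else:
            yield line


# Example with made-up records:
records = [">chr1 some description text\n", "ACGT\n", ">chr2\n", "GGCC\n"]
cleaned = list(strip_fasta_descriptions(records))
# cleaned == [">chr1\n", "ACGT\n", ">chr2\n", "GGCC\n"]
```

Within Galaxy itself, running NormalizeFasta on the dataset remains the simpler route.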

Note

I was able to find your problematic job (as an administrator), but next time you’ll want to include the error message from the tool error in your post to allow for community feedback/support. It can usually be found by clicking on the “bug” icon within the dataset.

You do not need to actually submit the bug report to capture the error message, and there is no need to submit it at all if you can figure out the problem using our support/help FAQ resources (also linked from the bug reporting form).

FAQs:

All Support FAQs: Galaxy Support - Galaxy Community Hub
All Tutorials: Learn & Teach Galaxy - Galaxy Community Hub

Hope that gives you some options!