Generate gene to transcript map: WARNING output

The warning means the tool is having trouble parsing the “>” title lines in your fasta assembly.

This is an example of the type of “>” title lines this particular tool is designed to parse: https://github.com/galaxyproject/tools-iuc/blob/master/tools/trinity/test-data/raw/Trinity.fasta

If you ran Trinity successfully in Galaxy, the format will be OK by default. If you ran Trinity somewhere else and uploaded the result, compare your title lines (transcript IDs) with the example.
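For reference, here is a minimal Python sketch of the kind of split involved. It assumes the modern Trinity ID style (e.g. TRINITY_DN1000_c115_g5_i1), where dropping the trailing _i<N> isoform suffix leaves the gene ID. The function name and the sample title lines are made up for illustration, and this is not the Galaxy tool's actual code.

```python
import re

# Assumption: modern Trinity IDs such as "TRINITY_DN1000_c115_g5_i1", where the
# trailing "_i<N>" marks the isoform (transcript) and the rest is the gene ID.
ISOFORM_SUFFIX = re.compile(r"^(?P<gene>.+)_i\d+$")


def gene_to_trans_map(title_lines):
    """Return (gene_id, transcript_id) pairs from fasta '>' title lines."""
    pairs = []
    for line in title_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines if any slipped through
        fields = line[1:].split()
        if not fields:
            continue  # empty title line, nothing to parse
        transcript_id = fields[0]  # the ID is the first whitespace-delimited token
        match = ISOFORM_SUFFIX.match(transcript_id)
        if match is None:
            # Stop on IDs this sketch cannot split into gene + isoform
            raise ValueError(f"unrecognized Trinity ID: {transcript_id}")
        pairs.append((match.group("gene"), transcript_id))
    return pairs


# Illustrative title lines in the modern Trinity style (not real data)
titles = [
    ">TRINITY_DN1000_c115_g5_i1 len=247 path=[31015:0-246]",
    ">TRINITY_DN1000_c115_g5_i2 len=302 path=[31015:0-246 1:247-301]",
]
for gene, transcript in gene_to_trans_map(titles):
    print(f"{gene}\t{transcript}")  # gene ID, tab, transcript ID
```

Each printed row is a gene ID, a tab, then a transcript ID. Title lines without an _i<N> suffix stop the sketch with an error, which is roughly the kind of mismatch to look for when comparing your own IDs with the example.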

Assemblies generated by other methods are not appropriate inputs for this tool. That said, it may still be possible to parse out transcript and gene identifiers with other tools (for example, the general Text Manipulation tools), depending on the content of your data.

Note: This tool wasn’t always available in Galaxy. Here is a prior Q&A from about a year ago, when this tool’s functionality was recreated with a simple workflow. That workflow can now serve as an example for custom parsing (IF your title lines use transcript nomenclature that encodes some type of transcript-to-gene relationship or grouping):
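If your title lines are not from Trinity but do encode a grouping, the same idea generalizes to any regular expression with a gene part and a transcript part. The sketch below is hypothetical: the parse_titles name, the geneA.t1-style nomenclature, and the pattern are invented for illustration and would need to be adapted to your own IDs.

```python
import re


def parse_titles(title_lines, pattern):
    """Map '>' title lines to (gene, transcript) pairs using a custom regex.

    `pattern` must define the named groups 'gene' and 'transcript'.
    Lines it does not match are returned separately for inspection.
    """
    regex = re.compile(pattern)
    mapped, unmatched = [], []
    for line in title_lines:
        match = regex.match(line)
        if match:
            mapped.append((match.group("gene"), match.group("transcript")))
        else:
            unmatched.append(line)
    return mapped, unmatched


# Hypothetical nomenclature: ">geneA.t1 ..." where ".t<N>" numbers the transcripts
PATTERN = r">(?P<transcript>(?P<gene>\S+)\.t\d+)(?=\s|$)"
titles = [">geneA.t1 kinase", ">geneA.t2 kinase", ">geneB.t1", ">orphan_contig_7"]
mapped, unmatched = parse_titles(titles, PATTERN)
print(mapped)
print(unmatched)
```

Lines that do not match (here >orphan_contig_7) come back separately, which makes it easy to see how much of the data a given pattern actually covers.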

If you still need help after reviewing:

Post back 10-20 of your assembly “>” title lines for help figuring out how to parse the data, or whether this type of parsing is even possible with your data’s content. Please include only the “>” title lines, not the sequences, and enough of them that they are representative of the whole dataset.

The Select tool can be used to isolate title lines from large fasta datasets. Use the option “Matching” with the regular expression: ^>

The regular expression means:

  • ^ the start of a line
  • > a “greater than” symbol (every fasta title line starts with one)
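Outside Galaxy, the same selection can be sketched in a few lines of Python, together with an evenly spaced pick of about 20 titles so that the lines you post come from across the whole file rather than only its start. The function names are placeholders, not part of any Galaxy tool.

```python
import re

TITLE_LINE = re.compile(r"^>")  # start of line, then ">" (same expression as above)


def select_titles(lines):
    """Keep only fasta title lines, like Select with the "Matching" option."""
    return [line.rstrip("\n") for line in lines if TITLE_LINE.search(line)]


def spread_sample(titles, k=20):
    """Pick up to k titles evenly spaced across the whole list."""
    if len(titles) <= k:
        return list(titles)
    step = len(titles) / k
    return [titles[int(i * step)] for i in range(k)]
```

With a local file (assembly.fasta is a placeholder name), spread_sample(select_titles(open("assembly.fasta"))) returns up to 20 title lines drawn from across the assembly, ready to paste into a reply.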

Fasta format FAQ: Datatypes - Galaxy Community Hub

Small disclaimer: How Trinity formats title lines changed a bit between releases, so that may be where your problem is. The help here should still apply in that case, since the workflow could be tuned for any Trinity-based format.
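As a rough illustration of how a parser could tolerate those release-to-release differences, here is a Python sketch that tries one rule per ID style. The styles listed in the comments are assumptions about formats Trinity has used over time, not a complete or authoritative list, so compare them with your own title lines before relying on this.

```python
import re

# Assumed Trinity ID styles (check against your own title lines):
#   older:  comp123_c0_seq1           -> gene comp123_c0
#   newer:  c115_g5_i1                -> gene c115_g5
#   newer:  TRINITY_DN1000_c115_g5_i1 -> gene TRINITY_DN1000_c115_g5
RULES = [
    re.compile(r"^(?P<gene>\S*comp\d+_c\d+)_seq\d+$"),
    re.compile(r"^(?P<gene>\S*c\d+_g\d+)_i\d+$"),
]


def trinity_gene_id(transcript_id):
    """Return the gene part of a Trinity-style transcript ID, or None if unknown."""
    for rule in RULES:
        match = rule.match(transcript_id)
        if match:
            return match.group("gene")
    return None  # no rule matched; the caller decides what to do


for tid in ["comp123_c0_seq1", "c115_g5_i1", "TRINITY_DN1000_c115_g5_i1", "contig_7"]:
    print(tid, "->", trinity_gene_id(tid))
```

Returning None instead of raising leaves it up to you whether unrecognized IDs are skipped, reported, or mapped to themselves as their own gene.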