GTF2GeneList finds features but stores 0 bytes in annotation.txt

I downloaded the rat gtf file from here: ftp://ftp.ensembl.org/pub/release-84/gtf/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.84.gtf.gz

When I attempt to use GTF2GeneList to extract the information and create an annotation table, the program appears to work (the job turns green when done). A .tsv annotation table is generated as expected, but the annotation-matched sequences output is not saved properly (0 bytes of data, see below). As a result, Tximport will not work on this data (it stays paused forever). Do you have suggestions on what I am doing wrong? Thanks!
[Screenshot of the history showing the 0-byte output]

@jennaj @admins any ideas?

Hi @Laura_Harris

Would you please create a history with just that failure, then share it back? Maybe there is a problem with some setting/option or an actual bug with the tool itself.

And one troubleshooting tip: when uploading GTF data, leave all the options at their defaults so that the file is uncompressed and the datatype gtf is detected and assigned when the data is added to a history. Setting the datatype manually can introduce a few problems. Maybe try loading the original file that way when creating the history you are sharing. Also, running the tool one more time on a freshly loaded file, in isolation, would rule out any temporary cause of the failure.
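If you want to confirm locally that a file really is tab-delimited GTF before uploading it, a rough check like the one below can help. This is a hypothetical sketch (the function name `looks_like_gtf` and the 1000-line sample size are illustrative choices), not Galaxy's own datatype sniffer: it only verifies that non-comment lines have 9 tab-separated columns with integer start/end coordinates.

```python
import gzip


def looks_like_gtf(path, max_lines=1000):
    """Rough sanity check (hypothetical helper, not Galaxy's sniffer).

    Returns True if every non-comment line among the first `max_lines`
    feature lines has 9 tab-separated columns and integer start/end.
    Handles both plain and .gz-compressed files.
    """
    opener = gzip.open if path.endswith(".gz") else open
    checked = 0
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip header/comment lines (e.g. '#!genome-build ...') and blanks.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 9:
                return False
            # Columns 4 and 5 (start, end) must be positive integers.
            if not (fields[3].isdigit() and fields[4].isdigit()):
                return False
            checked += 1
            if checked >= max_lines:
                break
    # An empty or comment-only file is not treated as valid GTF.
    return checked > 0
```

A file that fails this check is likely to be misdetected no matter how the datatype is set in Galaxy.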

I hope this link works for what you needed, @jennaj. Thanks!

This also shows that using the compressed version produces the same result: Galaxy


Thanks for sharing @Laura_Harris

Try removing the header lines. They confuse some tools because, strictly speaking, the GTF format specification does not include any header lines. Some data providers include them anyway as a way to record provenance information. Some tools are smart enough to ignore them, but not all are. Much depends on what the original tool author decided, and on whether the Galaxy tool wrapper author could handle the headers outside of the core program (which is not always possible).
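Outside Galaxy, removing the headers amounts to dropping every line that starts with `#` (Ensembl GTFs begin with provenance lines such as `#!genome-build ...`); on the command line, `grep -v '^#'` does the same. A minimal Python sketch, where the function name `strip_gtf_headers` is hypothetical:

```python
import gzip


def strip_gtf_headers(src, dst):
    """Copy a GTF from `src` to `dst`, dropping '#'-prefixed header lines.

    Hypothetical helper. Reads plain or .gz input, always writes
    uncompressed output, and returns the number of lines removed.
    """
    opener = gzip.open if src.endswith(".gz") else open
    removed = 0
    with opener(src, "rt") as fin, open(dst, "w") as fout:
        for line in fin:
            # Only lines that *start* with '#' are headers/comments;
            # a '#' inside an attribute value is left untouched.
            if line.startswith("#"):
                removed += 1
                continue
            fout.write(line)
    return removed
```

The cleaned file can then be uploaded with default options so Galaxy detects it as gtf.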

How-to: Working with GFF GFT GTF2 GFF3 reference annotation

Hopefully that works, but let us know and we can take a closer look.

Hi @Laura_Harris

Update: Nope, that was wrong too. The problem is that the GTF annotation alone is not enough for this specific output: the tool also needs a second input, a FASTA file of sequences. This is noted in the tool form's help section (I missed it too!).

Help

What it does

Given an Ensembl GTF file, it will extract all information on chromosomes, coordinates, and attributes provided at the specified feature level. Mitochondrial features can also be flagged. See GitHub - ebi-gene-expression-group/atlas-gene-annotation-manipulation: Scripts for processing gene (or exon/ transcript) annotation.

You can also supply a fasta-format file of sequences, which can be filtered by identifier to match annotation and/or used as a source of information for transcripts un-annotated in the GTF. This can be useful for tools such as Alevin which need a transcript-to-gene mapping and a transcriptome file without any missing entries (with respect to annotation).
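To illustrate why the sequences output depends on that second input, here is a hypothetical Python sketch of the idea described in the help: extract records at one feature level (flagging mitochondrial ones), build a transcript-to-gene map, and filter a FASTA to the annotated transcript IDs. This is not the code from the linked GitHub repository; the function names, the `"MT"` mitochondrial chromosome name, and the version-stripping of FASTA IDs (`T1.3` → `T1`) are all assumptions. Note that with no FASTA lines, `filter_fasta` simply has nothing to write.

```python
import re

# Matches 'key "value"' pairs in the GTF attribute column.
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_attributes(field):
    """Parse 'gene_id "X"; transcript_id "Y";' into a dict.
    Repeated keys (e.g. several 'tag' entries) keep only the last value."""
    return dict(ATTR_RE.findall(field))


def extract_features(gtf_lines, feature="transcript", mito_names=("MT",)):
    """Return one dict per GTF line whose type (column 3) equals `feature`.
    `mito_names` is an assumed set of mitochondrial chromosome names."""
    records = []
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if f[2] != feature:
            continue
        rec = {"chromosome": f[0], "start": int(f[3]), "end": int(f[4]),
               "strand": f[6], "mito": f[0] in mito_names}
        rec.update(parse_attributes(f[8]))
        records.append(rec)
    return records


def filter_fasta(fasta_lines, keep_ids, strip_version=True):
    """Yield only FASTA records whose identifier (first header token,
    optionally without a '.N' version suffix) is in `keep_ids`."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            ident = tokens[0] if tokens else ""
            if strip_version:
                ident = ident.split(".")[0]
            keep = ident in keep_ids
        if keep:
            yield line
```

Usage: `records = extract_features(gtf_lines)`, then `tx2gene = {r["transcript_id"]: r["gene_id"] for r in records}` gives the mapping, and `filter_fasta(fasta_lines, set(tx2gene))` yields the matching transcriptome.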

Let us know if that addresses the problem. Sorry for my confusion: we have SO many tools wrapped in Galaxy, and each one is ever so slightly different.

No luck with your suggestions, but I figured out DESeq2 uses Tximport and outputs normalized counts, so I used that to get around my problem. Not sure why DESeq2 works but Tximport itself does not, but I’ll take whatever help I can get. Thanks for trying!
