Error message when using GTF2GeneList

Hi, I’m new to Galaxy. I was trying to follow the tutorial using my own data, but I ran into a problem at the very beginning. I downloaded these two files from Ensembl: Mus_musculus.GRCm39.dna.toplevel.fa and Mus_musculus.GRCm39.111.chr.gff3. But when I ran the GTF2GeneList tool following the guide in the tutorial, I received this error message:

Fatal error: Exit code 1 ()
Warning message:
In .local(con, format, text, …) :
gff-version directive indicates version is 3, not 2
Error in eval(quote(list(…)), env) : object ‘first_field’ not found
Calls: die … cat → paste → standardGeneric → eval → eval → eval
Execution halted

Could anyone help me with this? Thank you

Hi @nahiznan
The error message indicates the file might be in GFF3 format. What datatype (format) does Galaxy show for this dataset? Click on the dataset name and check the value next to ‘format’. Does it say GFF, GFF3 or GTF? If it says GTF (=GFF2), change it to GFF3 via Edit attributes (pencil icon).
Alternatively, upload a GTF file to Galaxy rather than a GFF file.
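
For readers who want to check an annotation file outside Galaxy, here is a minimal Python sketch of the same check. The function name `guess_annotation_format` is hypothetical, and the heuristics are assumptions: a `##gff-version` directive if present, otherwise GTF-style `gene_id "..."` versus GFF3-style `ID=`/`Parent=` attributes in the ninth column. It is not what Galaxy itself does to assign a datatype.

```python
import io


def guess_annotation_format(handle, max_lines=50):
    """Guess whether an annotation file is GFF3 or GTF/GFF2.

    Heuristic only: checks the '##gff-version' directive first, then the
    attribute style of the first feature lines.
    """
    for i, line in enumerate(handle):
        if i >= max_lines:
            break
        if line.startswith("##gff-version"):
            parts = line.split()
            version = parts[1] if len(parts) > 1 else ""
            return "GFF3" if version.startswith("3") else "GTF/GFF2"
        if line.startswith("#") or not line.strip():
            continue  # skip other comments, pragmas and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:
            attributes = fields[8]
            if 'gene_id "' in attributes:
                return "GTF/GFF2"  # GTF uses: key "value";
            if "ID=" in attributes or "Parent=" in attributes:
                return "GFF3"      # GFF3 uses: key=value;
    return "unknown"


# Tiny in-memory example; for a real file use open("annotation.gff3").
example = io.StringIO(
    "##gff-version 3\n"
    "1\tensembl\tgene\t1\t100\t.\t+\t.\tID=gene:X\n"
)
print(guess_annotation_format(example))
```

If this reports GFF3 for a file that Galaxy labels as GTF, the datatype label and the file content disagree, which matches the warning in the error message above.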
Kind regards,
Igor

Hello, @igor,
Thank you for your prompt reply. I checked the format, and it was GFF3. However, I noticed that in the tutorial the FASTA file was a cDNA file, and mine is a DNA FASTA. Would that be a problem?
Should I use a different tool to generate the gene map?

Thank you!
Best regards,
nahiznan

I also downloaded the GTF file and ran GTF2GeneList with the same fasta file. It failed again but with a different warning message:

This job was terminated because it used more memory than it was allocated.

Hi @nahiznan
It seems the issue is the data format. The tool expects GFF2 (=GTF). You could convert the GFF3 file to GTF using gffread or another appropriate tool. However, the tutorial requires a file with transcripts, not genomic DNA.

Out of memory error: the amount of memory requested by the job exceeded the allocation. Admins can increase the memory allocation. Which Galaxy server do you use? However, I suspect the out-of-memory error was caused by using a genome assembly instead of a transcriptome (the FASTA file you mentioned in your reply). The best source of mouse data is the UCSC Genome Browser (downloads section). Try a transcriptome file before talking to the server admins.
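
One rough way to tell a genome FASTA from a transcriptome FASTA before submitting a job is to count the sequences and look at the longest one. The sketch below is only a heuristic under stated assumptions: the helper names `summarize_fasta` and `looks_like_genome` are hypothetical, and the 1,000,000-base threshold is an arbitrary cut-off chosen on the assumption that individual transcripts are far shorter than chromosomes.

```python
import io


def summarize_fasta(handle):
    """Count sequences and bases in a FASTA file, tracking the longest record."""
    count = 0
    total = 0
    longest = 0
    current = 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            longest = max(longest, current)  # close the previous record
            count += 1
            current = 0
        elif line:
            current += len(line)
            total += len(line)
    longest = max(longest, current)  # close the final record
    return {"sequences": count, "total_bases": total, "longest": longest}


def looks_like_genome(summary, long_threshold=1_000_000):
    """Heuristic: any very long sequence suggests chromosomes, not transcripts.

    The threshold is an assumption for illustration, not a standard value.
    """
    return summary["longest"] >= long_threshold


# Tiny in-memory example; for a real file use open("reference.fa").
demo = io.StringIO(">seq1\nACGTACGT\nACGT\n>seq2\nAC\n")
summary = summarize_fasta(demo)
print(summary, looks_like_genome(summary))
```

A file with a few dozen very long records is most likely a genome assembly, while a transcriptome typically has many thousands of much shorter records.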

Kind regards,
Igor

Hi @nahiznan

I just saw your bug report at UseGalaxy.org, and @igor's guess is correct. You need to supply a reference transcriptome, not a reference genome.

This FAQ helps to explain the difference, and how tools match up common identifiers between input reference files (not just for DE analysis) → FAQ: Extended Help for Differential Expression Analysis Tools
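
As an illustration of what "matching identifiers" means in practice, the following Python sketch compares the sequence names in a transcript FASTA with the `transcript_id` values in a GTF. All function names are hypothetical. The `ignore_version` option rests on an assumption that should be checked against your own files: that one file may carry a version suffix such as `.2` on its identifiers while the other does not.

```python
import io
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def fasta_ids(handle):
    """Return the first whitespace-delimited token of every FASTA header."""
    ids = set()
    for line in handle:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def gtf_transcript_ids(handle):
    """Return all transcript_id values found in the GTF attribute column."""
    ids = set()
    for line in handle:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def strip_version(identifier):
    """Drop a trailing '.<digits>' version suffix, if there is one."""
    return re.sub(r"\.\d+$", "", identifier)


def compare_ids(fasta_handle, gtf_handle, ignore_version=False):
    """Count identifiers shared between the FASTA and the GTF."""
    in_fasta = fasta_ids(fasta_handle)
    in_gtf = gtf_transcript_ids(gtf_handle)
    if ignore_version:
        in_fasta = {strip_version(i) for i in in_fasta}
        in_gtf = {strip_version(i) for i in in_gtf}
    return {
        "shared": len(in_fasta & in_gtf),
        "fasta_only": len(in_fasta - in_gtf),
        "gtf_only": len(in_gtf - in_fasta),
    }


# Made-up identifiers for illustration only.
FASTA_TEXT = ">T1.2 cdna\nACGT\n>T2.1 cdna\nAC\n"
GTF_TEXT = (
    '1\tsrc\ttranscript\t1\t10\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
    '1\tsrc\ttranscript\t20\t30\t.\t+\t.\tgene_id "G3"; transcript_id "T3";\n'
)
print(compare_ids(io.StringIO(FASTA_TEXT), io.StringIO(GTF_TEXT), ignore_version=True))
```

A `shared` count of zero means the two files do not describe the same set of sequences, which is what happens when a genome FASTA (chromosome names) is paired with a transcript annotation.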

You can get all the data from UCSC for this one.

Oh, I see. Thank you, @igor and @jennaj!
And thanks for sharing the link to FAQ. It’s helpful!
