GFFCompare issue?

Sandip_De · December 13, 2020, 4:42pm

Hi, I am comparing an assembled transcript file (GTF) with a reference annotation file (GFF) using the GFFCompare tool in Galaxy. In the parameters, the only option I am selecting is “discard ‘duplicate’ query transfrags within a single sample (-D)”. The transcript accuracy report looks like this:

gffcompare v0.11.2 | Command line was:
#gffcompare -r ref_annotation -D -e 100 -d 100 -p TCONS gffread_on_data_171__gtf

#= Summary for dataset: gffread_on_data_171__gtf
    Query mRNAs :   74051 in   18336 loci  (60300 multi-exon transcripts)
           (9152 multi-transcript loci, ~4.0 transcripts per locus)
Reference mRNAs :   46683 in   36960 loci  (37011 multi-exon)
Super-loci w/ reference transcripts:        0

#-----------------| Sensitivity | Precision |
        Base level:     0.0     |    0.0    |
        Exon level:     0.0     |    0.0    |
      Intron level:     0.0     |    0.0    |
Intron chain level:     0.0     |    0.0    |
  Transcript level:     0.0     |    0.0    |
       Locus level:     0.0     |    0.0    |

 Matching intron chains:       0
   Matching transcripts:       0
          Matching loci:       0

      Missed exons:  206513/206513	(100.0%)
       Novel exons:  204613/204613	(100.0%)
    Missed introns:  168155/168155	(100.0%)
     Novel introns:  115436/115436	(100.0%)
       Missed loci:   36960/36960	(100.0%)
        Novel loci:   18336/18336	(100.0%)

Total union super-loci across all input datasets: 18336
74051 out of 74051 consensus transcripts written in gffcmp.combined.gtf (0 discarded as redundant)

This does not make any sense to me. Why are the sensitivity and precision values 0? Thank you so much for your input.
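To see why every row reads 0.0, it helps to recall how the two metrics are defined: sensitivity is the fraction of reference features that the query reproduces, TP / (TP + FN), and precision is the fraction of query features that match the reference, TP / (TP + FP). The sketch below plugs in the intron counts from the report. It is a simplified illustration that assumes a plain one-to-one match count, not gffcompare's actual matching logic, and the function name `sensitivity_precision` is hypothetical.

```python
def sensitivity_precision(n_ref: int, n_query: int, n_matched: int) -> tuple[float, float]:
    """Return (sensitivity, precision) as percentages.

    Simplified sketch (assumption): every matched feature counts once on
    both the reference side and the query side.
    """
    tp = n_matched
    fn = n_ref - n_matched    # reference features with no query match ("missed")
    fp = n_query - n_matched  # query features with no reference match ("novel")
    sn = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    pr = 100.0 * tp / (tp + fp) if tp + fp else 0.0
    return sn, pr


# Intron counts from the report: 168155 reference, 115436 query, 0 matched
print(sensitivity_precision(168155, 115436, 0))       # (0.0, 0.0)
# Comparing a file against itself: every feature matches
print(sensitivity_precision(168155, 168155, 168155))  # (100.0, 100.0)
```

With zero matches both ratios are 0 no matter how many features each file contains, and a self-comparison makes every feature a match, which is consistent with getting 100% when the NCBI file is compared with itself.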

Sandip_De · December 13, 2020, 9:58pm

Hi all, can anyone help me with this? I am just taking an assembled GTF transcript file and comparing it to an annotated NCBI GFF file (the reference file), and I am getting the transcript report above. What am I doing wrong?

When I compare the same NCBI GFF file as input against the annotated NCBI GFF file (the reference file), the sensitivity and precision are 100%, as expected.

Please help. Thanks…

gallardoalba · December 15, 2020, 11:01am

Hi @Sandip_De,
one possible cause is that the reference genome version used to generate the assembly differs from the one the reference annotation file is based on, in which case the query and reference coordinates would not correspond. Could you please give me detailed information about how the gffread_on_data_171__gtf file was generated?

Regards
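One quick way to test this hypothesis is to compare the sequence names (column 1) used in the two files: if the query GTF and the reference GFF share few or no sequence names, gffcompare has nothing to overlap. Below is a minimal Python sketch; the script name `check_seqids.py` and its helper functions are hypothetical. Matching names do not prove the assemblies are the same, since a new assembly version can keep the old chromosome names while the coordinates shift.

```python
import sys
from typing import Dict, Iterable, List, Set


def seqids(lines: Iterable[str]) -> Set[str]:
    """Collect sequence names (column 1) from GFF/GTF feature lines."""
    names: Set[str] = set()
    for line in lines:
        # Skip blank lines, comments and directives such as "##gff-version 3"
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:  # feature lines have 9 tab-separated columns
            names.add(fields[0])
    return names


def compare_seqids(query: Set[str], ref: Set[str]) -> Dict[str, List[str]]:
    """Split sequence names into shared, query-only and reference-only."""
    return {
        "shared": sorted(query & ref),
        "query_only": sorted(query - ref),
        "ref_only": sorted(ref - query),
    }


def main(query_path: str, ref_path: str) -> None:
    with open(query_path) as q, open(ref_path) as r:
        result = compare_seqids(seqids(q), seqids(r))
    for key, names in result.items():
        # Print the count and up to 10 example names per category
        print(f"{key}: {len(names)}", ", ".join(names[:10]))


# Usage (hypothetical file names): python check_seqids.py query.gtf reference.gff
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

A "shared" count of zero points straight at a naming or assembly mismatch; a healthy shared count only rules out the most obvious problem.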

Sandip_De · December 15, 2020, 3:41pm

Hi gallardoalba, you are spot on. I generated the GFF/GTF file from a new genome assembly, while the reference annotation is from the previous genome assembly. My aim is to identify new isoforms/transcripts and genes in this new assembly.
Just wondering, is there a way to compare the new GTF from the new genome assembly with the old GTF from the old genome assembly? Thanks for your help.

gallardoalba · December 16, 2020, 8:04pm

Hi @Sandip_De, I don't think there is a tool for that, but it should be possible to write a script. What is your goal in comparing the two files?

Sandip_De · December 16, 2020, 8:28pm

Comparing the two GFF files should identify new genes and transcripts, alternative transcription start and end sites, etc. I found software developed for this purpose that is not available in Galaxy: ParsEval, parallel comparison and analysis of gene structure annotations (https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-13-187). Is there a way to add it to the Galaxy Tool Shed?

gallardoalba · December 16, 2020, 8:51pm

Yes, there is a tutorial. Let me know if you think you can integrate it yourself; otherwise, I can do it for you.

Regards.

Sandip_De · December 16, 2020, 9:46pm

Can you please integrate the software into Galaxy? Honestly, I browsed through the tutorial, but it did not make much sense to me.

Thanks for your help.

Sandip_De · December 16, 2020, 9:51pm

I really appreciate your help.

gallardoalba · December 16, 2020, 9:57pm

Sure, I will notify you when the tool is available, probably in a couple of weeks.

Regards.