Quality of Computationally Inferred Gene Ontology Annotations
Gene Ontology (GO) has established itself as the undisputed standard for protein function annotation. Most annotations are inferred electronically, i.e. without individual curator supervision, but they are widely considered unreliable. At the same time, we crucially depend on those automated annotations, as most newly sequenced genomes are from non-model organisms. Here, we introduce a methodology to systematically and quantitatively evaluate electronic annotations. By exploiting changes in successive releases of the UniProt Gene Ontology Annotation database, we assessed the quality of electronic annotations in terms of specificity, reliability, and coverage. Overall, we found not only that electronic annotations have significantly improved in recent years, but also that their reliability now rivals that of annotations inferred by curators when they use evidence other than experiments from primary literature. This work provides the means to identify the subset of electronic annotations that can be relied upon, an important outcome given that over 98% of all annotations are inferred without direct curation.
In the UniProt Gene Ontology Annotation database, the largest repository of functional annotations, over 98% of all function annotations are inferred in silico, without curator oversight. Yet these "electronic GO annotations" are generally perceived as unreliable, and they are disregarded in many studies. In this article, we introduce a novel methodology to systematically evaluate the quality of electronic annotations. We then provide the first comprehensive assessment of the reliability of electronic GO annotations. Overall, we found that electronic annotations are more reliable than generally believed, to the extent that they are competitive with annotations inferred by curators when they use evidence other than experiments from primary literature. But we also report significant variations among inference methods, types of annotations, and organisms. This work provides guidance for Gene Ontology users and lays the foundations for improving computational approaches to GO function inference.