Consistent over-estimation of gene number in complex plant genomes.
The first comprehensive comparison of gene content between higher plant species led to two unexpected conclusions: that rice contained about twice as many genes as Arabidopsis, and that about half of the rice genes had no obvious homologs in any other organism. Our subsequent analyses indicate that most of these "extra, novel" rice genes are mis-annotated segments of transposable elements, especially retrotransposons. Aggressive annotation of a randomly selected subset of the rice genome suggests that the gene number is fewer than 40,000. The five fantasies of automated plant gene discovery are described, and a protocol is provided to minimize (or at least predict) the inaccuracy of future plant genome annotations.