A Penalized Nonparametric Maximum Likelihood Approach to Species Richness Estimation
We propose a class of penalized nonparametric maximum likelihood estimators (NPMLEs) for the species richness problem. We use a penalty term on the likelihood because likelihood estimators that lack it have an extreme instability problem. The estimators are constructed using a conditional likelihood that is simpler than the full likelihood. We show that the full-likelihood NPMLE solution given by Norris and Pollock can be found (with great accuracy) by using an appropriate penalty term on the conditional likelihood, so it is an element of our class of estimators. A simple and fast algorithm for the penalized NPMLE is developed; it can be used to greatly speed up computation of the unconditional NPMLE. It can also be used to find profile mixture likelihoods. Based on our goal of attaining high stability while retaining sensitivity, we propose an adaptive quadratic penalty function. A systematic simulation study, using a wide range of scenarios, establishes the success of this method relative to its competitors. Finally, we discuss an application in the gene number estimation using expressed sequence tag (EST) data from genomics.