An Improved Implementation of Effective Number of Codons (Nc)
The effective number of codons (Nc) is a widely used index for characterizing codon usage bias because, unlike the codon adaptation index (CAI), it does not require a set of reference genes, and because freely available computational tools such as CodonW compute it. However, Nc as originally formulated has several problems. It can take values far greater than the number of sense codons. It treats a 6-fold compound codon family as a single codon family, although such a family is made of a 2-fold and a 4-fold codon family that can be under dramatically different selection for codon usage bias. Existing implementations do not handle all genetic codes. Finally, it is often biased by codon families with a small number of codons. We developed a new Nc that has several advantages over the original. Its maximum value equals the number of sense codons when all synonymous codons are used equally, and its minimum value equals the number of codon families when exactly one codon is used in each synonymous codon family. It handles all known genetic codes. It breaks compound codon families (e.g., those for amino acids coded by six synonymous codons) into 2-fold and 4-fold codon families. It reduces the effect of codon families with few codons by introducing pseudocounts and weighted averages. The new Nc correlates significantly better with CAI than does the original Nc from CodonW, based on protein-coding genes from Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Escherichia coli, Bacillus subtilis, Micrococcus luteus, and Mycoplasma genitalium. It also correlates better with yeast protein abundance data than the original Nc does.
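The two bounds stated above, and the splitting of 6-fold families, can be illustrated with a minimal Python sketch. It is not the implementation described here: it assumes the standard genetic code only, defines a codon family as the codons of one amino acid that share their first two nucleotides (an assumption that happens to split Leu, Ser, and Arg into 2-fold and 4-fold families under the standard code), and simply sums 1/F over families, where F is the sum of squared codon frequencies within a family. The weighted averaging across families is omitted, and the names `count_codons`, `codon_families`, and `simplified_nc` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codons(cds):
    """Count in-frame codons of a coding sequence (RNA input is converted to DNA)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def codon_families(code=STANDARD_CODE):
    """Group sense codons by (amino acid, first two nucleotides).

    Assumption for this sketch: under the standard code this grouping splits
    each 6-fold amino acid into one 2-fold and one 4-fold family.
    """
    families = defaultdict(list)
    for codon, aa in code.items():
        if aa != "*":
            families[(aa, codon[:2])].append(codon)
    return dict(families)


def simplified_nc(codon_counts, pseudocount=0.0, code=STANDARD_CODE):
    """Sum of 1/F over codon families, F being the sum of squared frequencies.

    Simplified illustration: no weighted averaging across families.
    """
    total = 0.0
    for codons in codon_families(code).values():
        counts = [codon_counts.get(c, 0) + pseudocount for c in codons]
        n = sum(counts)
        if n == 0:
            # Assumption: a family with no observations counts as equally used.
            total += len(codons)
            continue
        f = sum((x / n) ** 2 for x in counts)
        total += 1.0 / f
    return total


if __name__ == "__main__":
    sense = [c for c, aa in STANDARD_CODE.items() if aa != "*"]
    equal_usage = {c: 10 for c in sense}
    one_per_family = {codons[0]: 25 for codons in codon_families().values()}
    print(round(simplified_nc(equal_usage), 6))     # number of sense codons
    print(round(simplified_nc(one_per_family), 6))  # number of codon families
```

With equal usage of all synonymous codons the sketch returns the number of sense codons (61 under the standard code), and with exactly one codon used per family it returns the number of families (23 under this grouping). A nonzero `pseudocount` pulls the lower bound slightly above the number of families, because no codon frequency is then exactly zero.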