Quantitative Metagenomic Analyses Based on Average Genome Size Normalization
Over the past quarter-century, microbiologists have used DNA sequence information to aid in the characterization of microbial communities. During the last decade, this has expanded from single genes to microbial community genomics, or metagenomics, in which the gene content of an environment can provide not just a census of the community members but direct information on metabolic capabilities and potential interactions among community members. Here we introduce a method for the quantitative characterization and comparison of microbial communities based on the normalization of metagenomic data by estimating average genome sizes. This normalization can relieve comparative biases introduced by differences in community structure, number of sequencing reads, and sequencing read lengths between different metagenomes. We demonstrate the utility of this approach by comparing metagenomes from two different marine sources using both conventional small-subunit (SSU) rRNA gene analyses and our quantitative method to calculate the proportion of genomes in each sample that are capable of a particular metabolic trait. With both environments, to determine what proportion of each community they make up and how differences in environment affect their abundances, we characterize three different types of autotrophic organisms: aerobic, photosynthetic carbon fixers (the Cyanobacteria); anaerobic, photosynthetic carbon fixers (the Chlorobi); and anaerobic, nonphotosynthetic carbon fixers (the Desulfobacteraceae). These analyses demonstrate how genome proportionality compares to SSU rRNA gene relative abundance and how factors such as average genome size and SSU rRNA gene copy number affect sampling probability and therefore both types of community analysis.