Sequence-based source tracking of Escherichia coli based on genetic diversity of beta-glucuronidase.
High levels of fecal bacteria are a concern for recreational waters; however, the source of contamination is often unknown. This study investigated whether direct sequencing of a bacterial gene could be used to detect genetic differences between bacterial strains for microbial source tracking. A 525-nucleotide segment of the gene for beta-glucuronidase (uidA) was sequenced in 941 Escherichia coli isolates from the Clinton River-Lake St. Clair watershed, 182 E. coli isolates from human and animal feces, and 34 E. coli isolates from a combined sewer. Environmental isolates exhibited 114 alleles in 11 groups on a genetic tree. The frequency of strains from different genetic groups differed significantly (p < 0.03) between upstream reaches (Bear Creek-Red Run), downstream reaches, and Lake St. Clair beaches. Fecal E. coli uidA sequences exhibited 81 alleles that overlapped with the environmental set. An algorithm for assigning alleles to host sources achieved, on average, approximately 75% correct classification on the fecal data set. Using the same algorithm, the percentage of environmental isolates assignable to humans decreased significantly between Bear Creek-Red Run (30 ± 3%) and the beaches (17 ± 2%) (p < 0.05). Birds accounted for approximately 50% of assignable environmental isolates. For combined sewer isolates, the same algorithm assigned 51% to humans. These experiments demonstrate that the frequency of different E. coli strains varies between locations in a watershed, and provide a "proof in principle" that sequence-based data can be used for microbial source tracking.
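The abstract does not describe the assignment algorithm itself. As an illustration only, the sketch below assumes a simple majority rule: each allele is assigned to the host that contributes most of the reference (fecal) isolates carrying it, and alleles that are unseen or have no clear majority are left unassigned. The function names, the `min_share` threshold, and the toy allele labels are hypothetical and are not taken from the study.

```python
from collections import Counter, defaultdict


def build_allele_table(reference):
    """Count, per allele, how many reference isolates came from each host.

    reference: iterable of (allele_id, host) pairs from isolates of known origin.
    """
    table = defaultdict(Counter)
    for allele, host in reference:
        table[allele][host] += 1
    return table


def assign_host(allele, table, min_share=0.5):
    """Return the majority host for an allele, or None if it is not assignable.

    Assumption: an allele is assignable only if a single host accounts for
    more than min_share of the reference isolates carrying it.
    """
    counts = table.get(allele)
    if not counts:
        return None  # allele never observed in the reference set
    host, n = counts.most_common(1)[0]
    return host if n / sum(counts.values()) > min_share else None


def source_profile(isolates, table, min_share=0.5):
    """Fraction of assignable isolates attributed to each host."""
    hosts = (assign_host(a, table, min_share) for a in isolates)
    assigned = [h for h in hosts if h is not None]
    if not assigned:
        return {}
    return {h: c / len(assigned) for h, c in Counter(assigned).items()}


# Toy data (hypothetical allele labels, not from the study).
reference = [
    ("A1", "human"), ("A1", "human"), ("A1", "bird"),
    ("A2", "bird"), ("A2", "bird"),
    ("A3", "human"), ("A3", "bird"),  # tie: left unassigned
]
table = build_allele_table(reference)

environmental = ["A1", "A2", "A2", "A3", "A9"]  # A9 is absent from the reference
profile = source_profile(environmental, table)
print(profile)
```

Percentages such as "assignable to humans" would then correspond to entries of `profile` computed separately for each sampling location, under the assumptions stated above.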