Switchgrass Genomic Diversity, Ploidy, and Evolution: Novel Insights from a Network-Based SNP Discovery Protocol
Switchgrass (Panicum virgatum L.) is a perennial grass that has been designated as an herbaceous model biofuel crop for the United States of America. To accelerate switchgrass breeding programs, we developed both an association panel and linkage populations for genome-wide association studies (GWAS) and genomic selection (GS). All 840 individuals were then genotyped using genotyping by sequencing (GBS), generating 350 GB of sequence in total. As a highly heterozygous polyploid (tetraploid and octoploid) species lacking a reference genome, switchgrass is intractable with earlier methods of single nucleotide polymorphism (SNP) discovery. To access the genetic diversity of species like switchgrass, we developed a network-based SNP discovery pipeline called the Universal Network-Enabled Analysis Kit (UNEAK). UNEAK readily resolves complexities that hinder SNP discovery, such as repeats, paralogs, and sequencing errors. Here, 1.2 million putative SNPs were discovered in a diverse collection of primarily upland, northern-adapted switchgrass populations. Further analysis of this data set revealed the fundamentally diploid nature of tetraploid switchgrass. Taking advantage of the high conservation of genome structure between switchgrass and foxtail millet (Setaria italica (L.) P. Beauv.), we constructed two parent-specific, synteny-based, ultra-high-density linkage maps containing a total of 88,217 SNPs. Our results also showed clear patterns of isolation-by-distance and isolation-by-ploidy in natural populations of switchgrass. Phylogenetic analysis supported a general south-to-north migration path of switchgrass and suggested that upland tetraploids arose from upland octoploids. Altogether, this study provides unparalleled insights into the diversity, genomic complexity, population structure, phylogeny, phylogeography, ploidy, and evolutionary dynamics of switchgrass.
Recent advances in sequencing technologies have enabled large-scale surveys of genetic diversity in model species with a fully or partially sequenced reference genome. However, thousands of key species that are essential for food, health, energy, and ecology lack reference genomes. Accelerating their breeding cycles through marker-assisted selection requires high-throughput genotyping of these valuable species even without a reference genome. Based on genotyping by sequencing (GBS) technology, we developed a new single nucleotide polymorphism (SNP) discovery protocol, the Universal Network-Enabled Analysis Kit (UNEAK), which can be used in any species, regardless of genome complexity or the availability of a reference genome. Here we test this protocol on switchgrass, currently the prime energy crop species in the United States of America. In addition to discovering over a million SNPs and constructing high-density linkage maps, we provide novel insights into the genome complexity, ploidy, phylogeny, and evolution of switchgrass. This is only the beginning: we believe UNEAK offers a key to exploring and exploiting the genetic diversity of thousands of non-model species.
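The summary above states that UNEAK uses a network approach to separate true SNPs from repeats, paralogs, and sequencing errors without a reference genome, but it does not spell out the steps. The Python sketch below illustrates that general idea under several assumptions that are not stated in this section. Reads are trimmed to a fixed-length tag (64 bp by default here), and identical tags are collapsed and counted. Tags that differ at exactly one base are linked in a network. An edge is dropped when the rarer tag holds less than a hypothetical error tolerance rate `etr` of the pair's reads, on the assumption that such tags are sequencing errors. Only isolated two-tag networks are kept as candidate SNPs, while larger clusters, which can arise from repeats or paralogs, are discarded. All function names are hypothetical, and the all-pairs comparison is quadratic for clarity rather than speed.

```python
from collections import Counter, defaultdict
from itertools import combinations


def collapse_tags(reads, tag_len=64):
    """Trim reads to a fixed length and count identical sequences ("tags").

    Reads shorter than tag_len are dropped.
    """
    return Counter(r[:tag_len] for r in reads if len(r) >= tag_len)


def one_mismatch(a, b):
    """True if two equal-length tags differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def build_network(tag_counts):
    """Link every pair of tags that differ by a single base.

    All-pairs comparison is O(n^2); acceptable for illustration only.
    """
    graph = defaultdict(set)
    for a, b in combinations(tag_counts, 2):
        if one_mismatch(a, b):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def prune_error_edges(graph, tag_counts, etr=0.03):
    """Drop edges whose rarer tag holds less than `etr` of the pair's reads.

    Assumption: such low-count tags are treated as likely sequencing errors.
    """
    pruned = defaultdict(set)
    for a, neighbours in graph.items():
        for b in neighbours:
            minor = min(tag_counts[a], tag_counts[b])
            if minor / (tag_counts[a] + tag_counts[b]) >= etr:
                pruned[a].add(b)
    return pruned


def reciprocal_pairs(graph):
    """Keep only isolated two-tag networks; larger clusters are discarded."""
    pairs = set()
    for a, neighbours in graph.items():
        if len(neighbours) == 1:
            (b,) = neighbours
            if graph.get(b) == {a}:
                pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


def call_snps(reads, tag_len=64, etr=0.03):
    """Return (tag1, tag2, snp_index) for each candidate SNP tag pair."""
    counts = collapse_tags(reads, tag_len)
    graph = prune_error_edges(build_network(counts), counts, etr)
    snps = []
    for a, b in reciprocal_pairs(graph):
        # Position of the single differing base within the tag
        pos = next(i for i, (x, y) in enumerate(zip(a, b)) if x != y)
        snps.append((a, b, pos))
    return snps
```

For example, with `tag_len=8`, two tags seen 10 and 8 times that differ at one base are reported as a candidate SNP. A tag seen once next to a 100-read tag is pruned as a likely error, and a three-tag cluster in which every tag differs from the others by one base is dropped as a possible repeat or paralog.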