MolBioLib: A C++11 Framework for Rapid Development and Deployment of Bioinformatics Tasks
Motivation: We developed MolBioLib to address the need for adaptable next-generation sequencing analysis tools. The result is a compact, portable, and extensively tested C++11 software framework and set of applications tailored to the demands of next-generation sequencing data. MolBioLib is designed to work with the common file formats and data types used in both genomic analysis and general data analysis. Its central, relational-database-like Table class provides a flexible and intuitive way to represent and work with a wide variety of tabular datasets, ranging from alignment data to annotations. MolBioLib has been used to identify causative SNPs in whole-genome sequencing data, detect balanced chromosomal rearrangements, and compute enrichment of mRNAs on microtubules, with each application typically requiring fewer than 200 lines of code. MolBioLib includes programs that perform a wide variety of analysis tasks, such as computing read coverage, annotating genomic intervals, and novel peak calling with a wavelet algorithm. Although MolBioLib was designed primarily for bioinformatics, much of its functionality is applicable to a wide range of other problems. Complete documentation and an extensive automated test suite are provided.