A breakdown of quality flaws in Wikipedia
The online encyclopedia Wikipedia is a successful example of the increasing popularity of user generated content on the Web. Despite its success, Wikipedia is often criticized for containing low-quality information, which is mainly attributed to its core policy of being open for editing by everyone. The identification of low-quality information is an important task since Wikipedia has become the primary source of knowledge for a huge number of people around the world. Previous research on quality assessment in Wikipedia either investigates only small samples of articles, or else focuses on single quality aspects, like accuracy or formality. This paper targets the investigation of quality flaws, and presents the first complete breakdown of Wikipedia's quality flaw structure. We conduct an extensive exploratory analysis, which reveals (1) the quality flaws that actually exist, (2) the distribution of flaws in Wikipedia, and (3) the extent of flawed content. An important finding is that more than one in four English Wikipedia articles contains at least one quality flaw, 70% of which concern article verifiability.