There is a huge amount of textual data on the web. In this paper we show how a general-purpose NLP tool can be used to grade the linguistic quality of texts gathered from the web. The described approach is of interest for “small languages”, such as Latvian, for which very few NLP tools are available. Massively parallel grid computing was used to parse a fairly complete Latvian web archive.