Streaming-Data Algorithms for High-Quality Clustering
As data gathering grows easier, and as researchers discover new ways to interpret data, streaming-data algorithms have become essential in many fields. Data stream computation precludes algorithms that require random access or large memory. In this paper, we consider the problem of clustering data streams, which is important in analyzing many kinds of data streams, such as routing data, telephone records, web documents, and clickstreams. We present a new clustering algorithm with theoretical guarantees on its performance and give empirical evidence of its superiority over the commonly used k-Means algorithm. We then adapt our algorithm to operate on data streams and experimentally demonstrate its superior performance in this setting.
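The abstract does not describe the algorithm itself, so the following is not the paper's method. It is a minimal sketch, under our own assumptions, of one generic way any clustering routine can be made to respect the streaming constraints named above: it makes a single pass with no random access and uses bounded memory. The stream is read in fixed-size chunks, and each chunk is clustered with weighted k-Means (the baseline mentioned above, seeded deterministically with farthest-first traversal). Only the resulting weighted centers are kept, and they are compressed whenever they exceed a cap. The names `stream_cluster`, `weighted_kmeans`, `chunk_size`, and `max_retained` are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centers: Sequence[Point]) -> int:
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def weighted_kmeans(points: List[Point], weights: List[float], k: int,
                    iters: int = 20) -> Tuple[List[Point], List[float]]:
    """Lloyd-style k-Means on weighted points (hypothetical helper).

    Returns k centers and the total weight assigned to each one.
    """
    if len(points) <= k:
        return list(points), list(weights)
    # Farthest-first seeding: deterministic, one seed per well-separated group.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    dim = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            j = nearest(p, centers)
            totals[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # Move each center to the weighted mean of its points; keep empty ones.
        new = [tuple(s / totals[j] for s in sums[j]) if totals[j] > 0 else centers[j]
               for j in range(k)]
        if new == centers:
            break
        centers = new
    final_w = [0.0] * k
    for p, w in zip(points, weights):
        final_w[nearest(p, centers)] += w
    return centers, final_w


def stream_cluster(stream: Iterable[Point], k: int, chunk_size: int,
                   max_retained: int) -> Tuple[List[Point], List[float]]:
    """One pass over the stream (hypothetical sketch).

    Holds at most chunk_size raw points plus about max_retained weighted centers.
    """
    kept_pts: List[Point] = []
    kept_w: List[float] = []
    chunk: List[Point] = []

    def flush() -> None:
        nonlocal kept_pts, kept_w
        # Summarize the chunk by k centers, each weighted by its point count.
        c, w = weighted_kmeans(chunk, [1.0] * len(chunk), k)
        kept_pts += c
        kept_w += w
        chunk.clear()
        if len(kept_pts) > max_retained:
            # Compress retained centers so memory stays bounded.
            kept_pts, kept_w = weighted_kmeans(kept_pts, kept_w, k)

    for p in stream:
        chunk.append(p)
        if len(chunk) == chunk_size:
            flush()
    if chunk:
        flush()
    # Final clustering of the weighted summaries only.
    return weighted_kmeans(kept_pts, kept_w, k)
```

For example, `stream_cluster(iter(points), k=2, chunk_size=4, max_retained=20)` reads the points once and never stores more than four raw points at a time. The trade-off in this sketch is that each compression step can lose accuracy, which is exactly why the quality of the per-chunk clustering routine matters.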