A Comparison of Approaches to Large-Scale Data Analysis

In SIGMOD’09 (29 June 2009)

Abstract

There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis [17]. Although the basic control flow of this framework has existed in parallel SQL database management systems (DBMS) for over 20 years, some have called MR a dramatically new computing model [8, 17]. In this paper, we describe and compare both paradigms. Furthermore, we evaluate both kinds of systems in terms of performance and development complexity. To this end, we define a benchmark consisting of a collection of tasks that we have run on an open source version of MR as well as on two parallel DBMSs. For each task, we measure each system’s performance for various degrees of parallelism on a cluster of 100 nodes. Our results reveal some interesting trade-offs. Although the process to load data into and tune the execution of parallel DBMSs took much longer than the MR system, the observed performance of these DBMSs was strikingly better. We speculate about the causes of the dramatic performance difference and consider implementation concepts that future systems should take from both kinds of architectures.
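The abstract's claim that MR's basic control flow already existed in parallel SQL DBMSs can be illustrated with a minimal single-process sketch. It is not the paper's benchmark code and not Hadoop's API: the functions `map_reduce` and `sql_group_by_sum`, the `(key, value)` record layout, and the hash partitioning across `num_nodes` simulated nodes are all hypothetical choices made for illustration. The sketch shows that a map, shuffle (hash-partition and group by key), and reduce pipeline computes the same result as a `SELECT key, SUM(value) ... GROUP BY key` aggregation.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, Iterator

# Hypothetical record layout: (group_key, numeric_value), e.g. (source_ip, revenue).
Pair = tuple[Hashable, float]


def map_phase(records: Iterable, mapper: Callable[[object], Iterable[Pair]]) -> Iterator[Pair]:
    """Apply the user-defined map function to every input record."""
    for record in records:
        yield from mapper(record)


def shuffle_phase(pairs: Iterable[Pair], num_nodes: int) -> list[dict]:
    """Hash-partition intermediate pairs across simulated nodes, grouping values by key."""
    partitions = [defaultdict(list) for _ in range(num_nodes)]
    for key, value in pairs:
        partitions[hash(key) % num_nodes][key].append(value)
    return partitions


def reduce_phase(partitions: list[dict], reducer: Callable[[Hashable, list], float]) -> dict:
    """Run the user-defined reduce function once per key on each partition."""
    results = {}
    for partition in partitions:
        for key, values in partition.items():
            results[key] = reducer(key, values)
    return results


def map_reduce(records, mapper, reducer, num_nodes: int = 4) -> dict:
    """Chain map, shuffle, and reduce; the output does not depend on num_nodes."""
    return reduce_phase(shuffle_phase(map_phase(records, mapper), num_nodes), reducer)


def sql_group_by_sum(records: Iterable[Pair]) -> dict:
    """Equivalent of SELECT key, SUM(value) FROM records GROUP BY key."""
    totals = defaultdict(float)
    for key, value in records:
        totals[key] += value
    return dict(totals)


records = [("10.0.0.1", 2.5), ("10.0.0.2", 1.0), ("10.0.0.1", 0.5)]
mr_result = map_reduce(records, mapper=lambda r: [(r[0], r[1])], reducer=lambda k, vs: sum(vs))
print(mr_result == sql_group_by_sum(records))  # True
```

In a real parallel DBMS or MR deployment each partition would live on a different node and the shuffle would move data over the network; the sketch keeps everything in one process and ignores fault tolerance and materialization of intermediate results, which are among the implementation differences the paper examines.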
