DICODE_EU's library 13 articles

 
 

Detecting the origin of text segments efficiently

In Proceedings of the 18th international conference on World wide web (2009), pp. 61-70, doi:10.1145/1526709.1526719
posted to crawler google by DICODE_EU on 2010-08-05 12:55:11, along with 5 people and 1 group

Abstract

In the origin detection problem an algorithm is given a set S of documents, ordered by creation time, and a query document D. It needs to output for every consecutive sequence of k alphanumeric terms in D the earliest document in S in which the sequence appeared (if such a document exists). Algorithms for the origin detection problem can, for example, be used to detect the "origin" of text segments in D and thus to detect novel content in D. They ...
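The problem statement in this abstract can be illustrated with a naive baseline. The sketch below is not the paper's algorithm (the abstract is truncated before describing it); it simply keeps every k-term sequence in an in-memory dictionary. The function names, the lowercasing of terms, and the use of list positions as document identifiers are all assumptions made for illustration.

```python
import re


def k_grams(text, k):
    """Return every consecutive sequence of k alphanumeric terms in text."""
    # Assumption: terms are case-insensitive runs of ASCII letters and digits.
    terms = re.findall(r"[A-Za-z0-9]+", text.lower())
    return [tuple(terms[i:i + k]) for i in range(len(terms) - k + 1)]


def build_index(documents, k):
    """Map each k-gram to the position of the earliest document containing it.

    documents must be ordered by creation time, oldest first.
    """
    earliest = {}
    for doc_id, text in enumerate(documents):
        for gram in k_grams(text, k):
            # setdefault keeps the first (oldest) document seen for this gram.
            earliest.setdefault(gram, doc_id)
    return earliest


def detect_origins(index, query, k):
    """For each k-gram of the query, report its earliest document, or None."""
    return [(gram, index.get(gram)) for gram in k_grams(query, k)]


S = ["the quick brown fox", "a quick brown fox jumps", "lazy dog sleeps"]
index = build_index(S, 3)
origins = detect_origins(index, "quick brown fox jumps high", 3)
```

A k-gram whose origin is None appears in no document of S, which is how such a baseline would flag novel content in D. Storing every k-gram explicitly is memory-hungry, so this is only a reference point for the problem definition.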

 

Dremel: Interactive Analysis of Web-Scale Datasets

In The 36th International Conference on Very Large Data Bases, Vol. 3 (September 2010)
posted to dremel google mapreduce by DICODE_EU on 2010-08-05 12:52:59, along with 1 person
 

IRLbot: scaling to 6 billion pages and beyond

In Proceedings of the 17th international conference on World Wide Web (2008), pp. 427-436, doi:10.1145/1367497.1367556

Abstract

This paper shares our experience in designing a web crawler that can download billions of pages using a single-server implementation and models its performance. We show that with the quadratically increasing complexity of verifying URL uniqueness, BFS crawl order, and fixed per-host rate-limiting, current crawling algorithms cannot effectively cope with the sheer volume of URLs generated in large crawls, highly-branching spam, legitimate multi-million-page blog sites, and infinite loops created by server-side scripts. We offer a set of techniques for dealing with ...

 

Dynamo: amazon's highly available key-value store

SIGOPS Oper. Syst. Rev. In Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles, Vol. 41, No. 6. (2007), pp. 205-220, doi:10.1145/1294261.1294281
posted to cassandra dynamo by DICODE_EU on 2010-08-05 12:48:51, along with 53 people and 3 groups

Abstract

Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust. The Amazon.com platform, which provides services for many web sites worldwide, is implemented on top of an infrastructure of tens of thousands of servers and network components located in many datacenters around the world. At this scale, small and large components fail continuously and the way ...

 

Pregel: a system for large-scale graph processing

In Proceedings of the 28th ACM symposium on Principles of distributed computing (2009), pp. 6-6, doi:10.1145/1582716.1582723
posted to google graphs pregel by DICODE_EU on 2010-08-05 12:47:29, along with 7 people and 1 group

Abstract

An abstract is not available. ...

 

Interpreting the data: Parallel analysis with Sawzall

Sci. Program., Vol. 13, No. 4. (October 2005), pp. 277-298
posted to google hive pig sawzall by DICODE_EU on 2010-08-05 12:46:59 (average rating 3.0), along with 17 people and 1 group

Abstract

Very large data sets often have a flat but regular structure and span multiple disks and machines. Examples include telephone call records, network logs, and web document repositories. These large data sets are not amenable to study using traditional database techniques, if only because they can be too large to fit in a single relational database. On the other hand, many of the analyses done on them can be expressed using simple, easily distributed computations: filtering, aggregation, extraction of statistics, and ...

 

The Google file system

SIGOPS Oper. Syst. Rev. In Proceedings of the nineteenth ACM symposium on Operating systems principles, Vol. 37, No. 5. (October 2003), pp. 29-43, doi:10.1145/945445.945450
posted to gfs google hadoop hdfs by DICODE_EU on 2010-08-05 12:46:31 (average rating 4.5), along with 87 people and 6 groups

Abstract

An abstract is not available. ...

 

Bigtable: A Distributed Storage System for Structured Data

ACM Trans. Comput. Syst., Vol. 26, No. 2. (June 2008), pp. 1-26, doi:10.1145/1365815.1365816
posted to bigtable cassandra google hbase hypertable by DICODE_EU on 2010-08-05 12:45:39, along with 36 people and 3 groups

Abstract

Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, ...

 

The Chubby lock service for loosely-coupled distributed systems

In Proceedings of the 7th symposium on Operating systems design and implementation (2006), pp. 335-350
posted to chubby google zookeeper by DICODE_EU on 2010-08-05 12:45:02, along with 6 people

Abstract

We describe our experiences with the Chubby lock service, which is intended to provide coarse-grained locking as well as reliable (though low-volume) storage for a loosely-coupled distributed system. Chubby provides an interface much like a distributed file system with advisory locks, but the design emphasis is on availability and reliability, as opposed to high performance. Many instances of the service have been used for over a year, with several of them each handling a few tens of thousands of clients concurrently. ...

 

Pig latin: a not-so-foreign language for data processing

In Proceedings of the 2008 ACM SIGMOD international conference on Management of data (2008), pp. 1099-1110, doi:10.1145/1376616.1376726
posted to hadoop pig sawzall by DICODE_EU on 2010-08-05 12:43:12 (average rating 3.0), along with 26 people and 1 group

Abstract

There is a growing need for ad-hoc analysis of extremely large data sets, especially at internet companies where innovation critically depends on being able to analyze terabytes of data collected every day. Parallel database products, e.g., Teradata, offer a solution, but are usually prohibitively expensive at this scale. Besides, many of the people who analyze this data are entrenched procedural programmers, who find the declarative, SQL style to be unnatural. The success of the more procedural map-reduce programming model, and its ...

 

MapReduce: simplified data processing on large clusters

In Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6 (2004), pp. 10-10
posted to google hadoop mapreduce by DICODE_EU on 2010-08-05 12:42:48 (average rating 5.0), along with 19 people and 2 groups

Abstract

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. ...
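The map and reduce roles described in this abstract can be sketched with a word count. This is a single-process illustration only, not the paper's distributed implementation; `run_mapreduce`, `map_fn`, and `reduce_fn` are hypothetical names chosen for this example.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: take a (document name, text) pair and emit (word, 1) pairs."""
    for word in value.split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce: merge all intermediate values that share the same key."""
    return key, sum(values)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Run map, group intermediate values by key, then reduce each group.

    Assumption: everything fits in memory on one machine; a real system
    would partition this work across a cluster.
    """
    groups = defaultdict(list)
    for key, value in inputs:
        for inter_key, inter_value in map_fn(key, value):
            groups[inter_key].append(inter_value)
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))


counts = run_mapreduce([("d1", "a b a"), ("d2", "b c")], map_fn, reduce_fn)
```

Because the user supplies only `map_fn` and `reduce_fn`, the grouping step in the middle is the part a framework is free to parallelize.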

 

Map-Reduce for Machine Learning on Multicore

In NIPS (2006), pp. 281-288
posted to machinelearning mapreduce by DICODE_EU on 2010-08-05 12:42:35 (average rating 3.0), along with 50 people and 4 groups
 

Hive: a warehousing solution over a map-reduce framework

Proc. VLDB Endow., Vol. 2, No. 2. (August 2009), pp. 1626-1629
posted to hive pig sawzall by DICODE_EU on 2010-08-05 12:42:21 (average rating 3.0), along with 15 people and 1 group

Abstract

The size of data sets being collected and analyzed in the industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive. Hadoop [3] is a popular open-source map-reduce implementation which is being used as an alternative to store and process extremely large data sets on commodity hardware. However, the map-reduce programming model is very low level and requires developers to write custom programs which are hard to maintain and reuse. ...

Note: You may cite this page as: http://www.citeulike.org/user/DICODE_EU
