<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
   xmlns="http://purl.org/rss/1.0/"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"
   xmlns:dcterms="http://purl.org/dc/terms/"

>
<channel rdf:about="http://www.citeulike.org/about">
<pubDate>Sun, 27 Jul 2008 07:32:09 BST</pubDate>


	<title>CiteULike: pdlug's google</title>
	<description>CiteULike: pdlug's google</description>


	<link>http://www.citeulike.org/user/pdlug/tag/google</link>
	<dc:publisher>CiteULike.org</dc:publisher>
	<dc:language>en-gb</dc:language>
	<dc:rights>Copyright &#169; 2004-2008 citeulike.org</dc:rights>
	<items>
    <rdf:Seq>
        <rdf:li rdf:resource="http://www.citeulike.org/user/pdlug/article/2152671"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/pdlug/article/2719467"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/pdlug/article/936194"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/pdlug/article/983570"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/pdlug/article/430834"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/pdlug/article/591891"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/pdlug/article/409469"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/pdlug/article/416473"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/pdlug/article/90472"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/pdlug/article/251211"/>
        <rdf:li rdf:resource="http://www.citeulike.org/user/pdlug/article/227597"/>

	</rdf:Seq>
	</items>
	</channel>


<item rdf:about="http://www.citeulike.org/user/pdlug/article/2152671">
    <title>Google's MapReduce programming model -- Revisited</title>
    <link>http://www.citeulike.org/user/pdlug/article/2152671</link>
    <description>&lt;i&gt;Science of Computer Programming, Vol. 70, No. 1. (1 January 2008), pp. 1-30.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Google's MapReduce programming model serves for processing large data sets in a massively parallel manner. We deliver the first rigorous description of the model including its advancement as Google's domain-specific language Sawzall. To this end, we reverse-engineer the seminal papers on MapReduce and Sawzall, and we capture our findings as an executable specification. We also identify and resolve some obscurities in the informal presentation given in the seminal papers. We use typed functional programming (specifically Haskell) as a tool for design recovery and executable specification. Our development comprises three components: (i) the basic program skeleton that underlies MapReduce computations; (ii) the opportunities for parallelism in executing MapReduce computations; (iii) the fundamental characteristics of Sawzall's aggregators as an advancement of the MapReduce approach. Our development does not formalize the more implementational aspects of an actual, distributed execution of MapReduce computations.</description>
    <dc:title>Google's MapReduce programming model -- Revisited</dc:title>

    <dc:creator>Ralf Lämmel</dc:creator>
    <dc:identifier>doi:10.1016/j.scico.2007.07.001</dc:identifier>
    <dc:source>Science of Computer Programming, Vol. 70, No. 1. (1 January 2008), pp. 1-30.</dc:source>
    <dc:date>2007-12-20T18:37:33-00:00</dc:date>
    <prism:publicationYear>2008</prism:publicationYear>
    <prism:publicationName>Science of Computer Programming</prism:publicationName>
    <prism:volume>70</prism:volume>
    <prism:number>1</prism:number>
    <prism:startingPage>1</prism:startingPage>
    <prism:endingPage>30</prism:endingPage>
    <prism:category>distributed</prism:category>
    <prism:category>google</prism:category>
    <prism:category>language</prism:category>
    <prism:category>languages</prism:category>
    <prism:category>mapreduce</prism:category>
    <prism:category>parallel</prism:category>
    <prism:category>programming</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/pdlug/article/2719467">
    <title>Google News Personalization: Scalable Online Collaborative Filtering</title>
    <link>http://www.citeulike.org/user/pdlug/article/2719467</link>
    <description>&lt;i&gt;(8 May 2007)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Several approaches to collaborative filtering have been studied but seldom have the studies been reported for large (several millions of users and items) and dynamic (the underlying item set is continually changing) settings. In this paper we describe our approach to collaborative filtering for generating personalized recommendations for users of Google News. We generate recommendations using three approaches: collaborative filtering using MinHash clustering, Probabilistic Latent Semantic Indexing (PLSI), and covisitation counts. We combine recommendations from different algorithms using a linear model. Our approach is content agnostic and consequently domain independent, making it easily adaptable for other applications and languages with minimal effort. This paper will describe our algorithms and system setup in detail, and report results of running the recommendations engine on Google News.</description>
    <dc:title>Google News Personalization: Scalable Online Collaborative Filtering</dc:title>

    <dc:creator>Abhinandan Das</dc:creator>
    <dc:creator>Mayur Datar</dc:creator>
    <dc:creator>Ashutosh Garg</dc:creator>
    <dc:creator>Shyam Rajaram</dc:creator>
    <dc:source>(8 May 2007)</dc:source>
    <dc:date>2008-04-25T21:00:08-00:00</dc:date>
    <prism:publicationYear>2007</prism:publicationYear>
    <prism:category>collaborative-filtering</prism:category>
    <prism:category>filtering</prism:category>
    <prism:category>google</prism:category>
    <prism:category>machine-learning</prism:category>
    <prism:category>mapreduce</prism:category>
    <prism:category>recommendation</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/pdlug/article/936194">
    <title>Bigtable: A Distributed Storage System for Structured Data</title>
    <link>http://www.citeulike.org/user/pdlug/article/936194</link>
    <description>&lt;i&gt;OSDI '06, pp. 205-218.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this paper we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.</description>
    <dc:title>Bigtable: A Distributed Storage System for Structured Data</dc:title>

    <dc:source>OSDI '06, pp. 205-218.</dc:source>
    <dc:date>2006-11-08T12:02:38-00:00</dc:date>
    <prism:publicationName>OSDI '06</prism:publicationName>
    <prism:startingPage>205</prism:startingPage>
    <prism:endingPage>218</prism:endingPage>
    <prism:category>data</prism:category>
    <prism:category>database</prism:category>
    <prism:category>db</prism:category>
    <prism:category>distributed</prism:category>
    <prism:category>google</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/pdlug/article/983570">
    <title>Physics the google way</title>
    <link>http://www.citeulike.org/user/pdlug/article/983570</link>
    <description>&lt;i&gt;(21 Nov 2004)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Are we smarter now than Socrates was in his time? Society as a whole certainly enjoys a higher degree of education, but humans as a species probably don't get intrinsically smarter with time. Our knowledge base, however, continues to grow at an unprecedented rate, so how then do we keep up? The printing press was one of the earliest technological advances that expanded our memory and made possible our present intellectual capacity. We are now faced with a new technological advance of the same magnitude--the internet--but how do we use it effectively? A new tool is available on Google (&lt;a href=&#34;http://www.google.com&#34;&gt;http://www.google.com&lt;/a&gt;) that allows a user not only to numerically evaluate equations, but to automatically perform unit analysis and conversion as well, with most of the fundamental physical constants built in.</description>
    <dc:title>Physics the google way</dc:title>

    <dc:creator>David Ward</dc:creator>
    <dc:source>(21 Nov 2004)</dc:source>
    <dc:date>2006-12-07T16:58:47-00:00</dc:date>
    <prism:publicationYear>2004</prism:publicationYear>
    <prism:category>education</prism:category>
    <prism:category>fun</prism:category>
    <prism:category>google</prism:category>
    <prism:category>physics</prism:category>
    <prism:category>search</prism:category>
    <prism:category>web</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/pdlug/article/430834">
    <title>MapReduce: Simplified Data Processing on Large Clusters</title>
    <link>http://www.citeulike.org/user/pdlug/article/430834</link>
    <description>&lt;i&gt;OSDI '04, pp. 137-150.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a &lt;i&gt;map&lt;/i&gt; function that processes a key/value pair to generate a set of intermediate key/value pairs, and a &lt;i&gt;reduce&lt;/i&gt; function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper.&lt;br /&gt;&lt;br /&gt;Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.&lt;br /&gt;&lt;br /&gt;Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.</description>
    <dc:title>MapReduce: Simplified Data Processing on Large Clusters</dc:title>

    <dc:creator>Jeffrey Dean</dc:creator>
    <dc:creator>Sanjay Ghemawat</dc:creator>
    <dc:source>OSDI '04, pp. 137-150.</dc:source>
    <dc:date>2005-12-08T17:08:27-00:00</dc:date>
    <prism:publicationName>OSDI '04</prism:publicationName>
    <prism:startingPage>137</prism:startingPage>
    <prism:endingPage>150</prism:endingPage>
    <prism:category>algorithm</prism:category>
    <prism:category>algorithms</prism:category>
    <prism:category>cmpsci</prism:category>
    <prism:category>cs</prism:category>
    <prism:category>distributed</prism:category>
    <prism:category>google</prism:category>
    <prism:category>mapreduce</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/pdlug/article/591891">
    <title>Finding Scientific Gems with Google</title>
    <link>http://www.citeulike.org/user/pdlug/article/591891</link>
    <description>&lt;i&gt;(18 Apr 2006)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;We apply the Google PageRank algorithm to assess the relative importance of all publications in the Physical Review family of journals from 1893--2003. While the Google number and the number of citations for each publication are positively correlated, outliers from this linear relation identify some exceptional papers or &#34;gems&#34; that are universally familiar to physicists.</description>
    <dc:title>Finding Scientific Gems with Google</dc:title>

    <dc:creator>P Chen</dc:creator>
    <dc:creator>H Xie</dc:creator>
    <dc:creator>S Maslov</dc:creator>
    <dc:creator>S Redner</dc:creator>
    <dc:source>(18 Apr 2006)</dc:source>
    <dc:date>2006-04-20T13:17:44-00:00</dc:date>
    <prism:publicationYear>2006</prism:publicationYear>
    <prism:category>academic</prism:category>
    <prism:category>citation</prism:category>
    <prism:category>citations</prism:category>
    <prism:category>google</prism:category>
    <prism:category>informationretrieval</prism:category>
    <prism:category>ir</prism:category>
    <prism:category>pagerank</prism:category>
    <prism:category>search</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/pdlug/article/409469">
    <title>The egalitarian effect of search engines</title>
    <link>http://www.citeulike.org/user/pdlug/article/409469</link>
    <description>&lt;i&gt;(1 Nov 2005)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Search engines have become key media for our scientific, economic, and social activities by enabling people to access information on the Web in spite of its size and complexity. On the down side, search engines bias the traffic of users according to their page-ranking strategies, and some have argued that they create a vicious cycle that amplifies the dominance of established and already popular sites. We show that, contrary to these prior claims and our own intuition, the use of search engines actually has an egalitarian effect. We reconcile theoretical arguments with empirical evidence showing that the combination of retrieval by search engines and search behavior by users mitigates the attraction of popular pages, directing more traffic toward less popular sites, even in comparison to what would be expected from users randomly surfing the Web.</description>
    <dc:title>The egalitarian effect of search engines</dc:title>

    <dc:creator>Santo Fortunato</dc:creator>
    <dc:creator>Alessandro Flammini</dc:creator>
    <dc:creator>Filippo Menczer</dc:creator>
    <dc:creator>Alessandro Vespignani</dc:creator>
    <dc:source>(1 Nov 2005)</dc:source>
    <dc:date>2005-11-27T04:40:53-00:00</dc:date>
    <prism:publicationYear>2005</prism:publicationYear>
    <prism:category>academic</prism:category>
    <prism:category>google</prism:category>
    <prism:category>network</prism:category>
    <prism:category>networks</prism:category>
    <prism:category>paper</prism:category>
    <prism:category>search</prism:category>
    <prism:category>social</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/pdlug/article/416473">
    <title>Science in the web age: Start your engines</title>
    <link>http://www.citeulike.org/user/pdlug/article/416473</link>
    <description>&lt;i&gt;Nature, Vol. 438, No. 7068. (30 November 2005), pp. 554-555.&lt;/i&gt;</description>
    <dc:title>Science in the web age: Start your engines</dc:title>

    <dc:creator>Jim Giles</dc:creator>
    <dc:identifier>doi:10.1038/438554a</dc:identifier>
    <dc:source>Nature, Vol. 438, No. 7068. (30 November 2005), pp. 554-555.</dc:source>
    <dc:date>2005-11-30T19:43:01-00:00</dc:date>
    <prism:publicationYear>2005</prism:publicationYear>
    <prism:publicationName>Nature</prism:publicationName>
    <prism:issn>0028-0836</prism:issn>
    <prism:volume>438</prism:volume>
    <prism:number>7068</prism:number>
    <prism:startingPage>554</prism:startingPage>
    <prism:endingPage>555</prism:endingPage>
    <prism:publisher>Nature Publishing Group</prism:publisher>
    <prism:category>academic</prism:category>
    <prism:category>google</prism:category>
    <prism:category>nature</prism:category>
    <prism:category>publishing</prism:category>
    <prism:category>search</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/pdlug/article/90472">
    <title>MapReduce: Simplified Data Processing on Large Clusters</title>
    <link>http://www.citeulike.org/user/pdlug/article/90472</link>
    <description>&lt;i&gt;OSDI (2004)&lt;/i&gt;</description>
    <dc:title>MapReduce: Simplified Data Processing on Large Clusters</dc:title>

    <dc:creator>Jeffrey Dean</dc:creator>
    <dc:creator>Sanjay Ghemawat</dc:creator>
    <dc:source>OSDI (2004)</dc:source>
    <dc:date>2005-02-09T03:00:03-00:00</dc:date>
    <prism:publicationYear>2004</prism:publicationYear>
    <prism:publicationName>OSDI</prism:publicationName>
    <prism:category>distributed</prism:category>
    <prism:category>distributedcomputing</prism:category>
    <prism:category>google</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/pdlug/article/251211">
    <title>The Google File System</title>
    <link>http://www.citeulike.org/user/pdlug/article/251211</link>
    <description>&lt;i&gt;(October 2003)&lt;/i&gt;</description>
    <dc:title>The Google File System</dc:title>

    <dc:creator>Sanjay Ghemawat</dc:creator>
    <dc:creator>Howard Gobioff</dc:creator>
    <dc:creator>Shun-Tak Leung</dc:creator>
    <dc:source>(October 2003)</dc:source>
    <dc:date>2005-07-10T05:32:09-00:00</dc:date>
    <prism:publicationYear>2003</prism:publicationYear>
    <prism:category>distributed</prism:category>
    <prism:category>filesystem</prism:category>
    <prism:category>google</prism:category>
</item>



<item rdf:about="http://www.citeulike.org/user/pdlug/article/227597">
    <title>Interpreting the Data: Parallel Analysis with Sawzall (Draft)</title>
    <link>http://www.citeulike.org/user/pdlug/article/227597</link>
    <description>&lt;i&gt;Scientific Programming Journal&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Very large data sets often have a flat but regular structure and span multiple disks and machines. Examples include telephone call records, network logs, and web document repositories. These large data sets are not amenable to study using traditional database techniques, if only because they can be too large to fit in a single relational database. On the other hand, many of the analyses done on them can be expressed using simple, easily distributed computations: filtering, aggregation, extraction of statistics, and so on. We present a system for automating such analyses. A filtering phase, in which a query is expressed using a new programming language, emits data to an aggregation phase. Both phases are distributed over hundreds or even thousands of computers. The results are then collated and saved to a file. The design -- including the separation into two phases, the form of the programming language, and the properties of the aggregators -- exploits the parallelism inherent in having data and computation distributed across many machines.</description>
    <dc:title>Interpreting the Data: Parallel Analysis with Sawzall (Draft)</dc:title>

    <dc:creator>Rob Pike</dc:creator>
    <dc:creator>Sean Dorward</dc:creator>
    <dc:creator>Robert Griesemer</dc:creator>
    <dc:creator>Sean Quinlan</dc:creator>
    <dc:source>Scientific Programming Journal</dc:source>
    <dc:date>2005-06-14T12:59:31-00:00</dc:date>
    <prism:publicationName>Scientific Programming Journal</prism:publicationName>
    <prism:category>distributed</prism:category>
    <prism:category>google</prism:category>
    <prism:category>grid</prism:category>
    <prism:category>language</prism:category>
    <prism:category>mapreduce</prism:category>
</item>



</rdf:RDF>

