| |
Vol. 1, No. 57. (October 2010), pp. 69-74
Abstract
In just a few years, Wikipedia has become a leading tool for disseminating scientific information, used by students, teachers, and researchers (one in ten of the latter admits to contributing to it). Despite a relatively complex editing system, the "wiki of knowledge" has gained in popularity (15 million articles, 270 languages, 100,000 volunteer contributors). More is now known about how it works: its factual error rate is roughly similar to that of the paid encyclopedia Britannica; the articles are ...
|
| |
Vol. 14, No. 3. (2011), pp. 57-79
Abstract
The collaborative nature of the online encyclopedia Wikipedia naturally leads its contributors to work with others and to confront their ideas and points of view. Yet neither the five founding principles nor the wiki software that underlies the encyclopedia defines a framework for this collaboration. In this article, we study the spontaneous initiatives of the contributor community to foster collaboration, social exchange, and conflict resolution. To analyze these efforts, we draw on the notion of ...
|
| |
Abstract
This paper presents a simple approach to the Wikipedia Question Answering pilot task in CLEF 2006. The approach ranks the snippets, retrieved using the Lucene search engine, by means of a similarity measure based on bags of words extracted from both the snippets and the articles in Wikipedia. Our participation was in the monolingual English and Spanish tasks. We obtained the best results in the Spanish one. ...
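The ranking step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure was used, so cosine similarity over raw term counts is an assumption here, and the function names (`bag_of_words`, `cosine`, `rank_snippets`) are hypothetical. Snippet retrieval with Lucene is not shown; the snippets are simply passed in as strings.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count bags (assumed measure)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_snippets(snippets, article):
    """Return the snippets sorted by descending similarity to the article."""
    article_bag = bag_of_words(article)
    scored = [(cosine(bag_of_words(s), article_bag), s) for s in snippets]
    return [s for _, s in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```

A snippet that shares many terms with the article text rises to the top of the list, while a snippet with no shared terms scores zero and sinks to the bottom.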
|
| |
Abstract
We investigate the integration of Wiki systems with automated natural language processing (NLP) techniques. The vision is that of a "self-aware" Wiki system reading, understanding, transforming, and writing its own content, as well as supporting its users in information analysis and content development. We provide a number of practical application examples, including index generation, question answering, and automatic summarization, which demonstrate the practicability and usefulness of this idea. A system architecture providing the integration is presented, as well as first results ...
|
| |
In EMNLP-CoNLL (2007), pp. 708-716
|
| |
In Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (2006), pp. 192-199, doi:10.3115/1220835.1220860
Abstract
In this paper we present an extension of a machine learning based coreference resolution system which uses features induced from different semantic knowledge sources. These features represent knowledge mined from WordNet and Wikipedia, as well as information about semantic role labels. We show that semantic features indeed improve the performance on different referring expression types such as pronouns and common nouns. ...
|
| |
Abstract
This article is a reflection on the case of Wikipedia, the largest online reference site with 23 million articles, with 365 million readers, and without a page called Indigenous knowledge. A Postcolonial Computing lens, extended with the notion of decentering, is used to find out what happened with Indigenous knowledge in Wikipedia. Wikipedia's ordering technologies, such as policies and templates, play a central role in producing knowledge. Two designs, developed with and for Indigenous communities, are introduced to explore if another ...
|
| |
Abstract
As an increasing number of archival repositories, libraries, and cultural institutions build significant freely accessible digital collections, archivists and digital librarians must continue to develop digital outreach strategies that reflect the nature of searching and discovery in today's information economy. This case study examines the use of Wikipedia by the Ball State University Libraries as an opportunity to raise the visibility of digitized historic sheet music assets made available in the university's Digital Media Repository. By adding links to specific items ...
|
| |
Abstract
We review some recent endeavors and add some new results to characterize and understand underlying mechanisms in Wikipedia (WP), the paradigmatic example of collaborative value production. We analyzed the statistics of editorial activity in different languages and observed typical circadian and weekly patterns, which enabled us to estimate the geographical origins of contributions to WPs in languages spoken in several time zones. Using a recently introduced measure we showed that the editorial activities have intrinsic dependencies in the burstiness of events. ...
|
| |
BID, No. 28. (2012)
Abstract
The content of heritage repositories has grown in recent years in Spain, and Wikipedia has become a fundamental channel for dissemination; this study therefore describes and evaluates the use in Wikipedia of links to the digitized collections of libraries, archives, and other cultural institutions. To this end, institutions related to cultural heritage were selected, 81 in total. Starting from this selection, a search was made for all the links ...
|
| |
Information Research, Vol. 15, No. 3. (2010), pp. 28-28
Abstract
Wikipedia is a free, open, collaborative online encyclopedia based on a collaborative writing tool called a wiki, and the three fundamental policies that guide Wikipedia's development are "neutral point of view", "no original research", and "verifiability". In general, Wikipedia articles are of a quality comparable to that of a printed encyclopedia, although the quality of individual articles varies from low to high, depending largely on the individual contributors. ...
|
| |
First Monday, Vol. 16, No. 8. (2011)
Abstract
The ISI and Scopus databases were used to collect data on citations and on the development of research on Wikipedia. This study reviewed all the factors that affect publication on Wikipedia. The results clarify the impact and influence of this medium, and the study also identified the main authors, affiliated institutions, countries, academic fields, and publications most cited in Wikipedia. ...
|
| |
First Monday, Vol. 16, No. 4-4. (2011)
Abstract
This article examines the credibility of Wikipedia articles, seeking to find out and understand how this information is verified. The study was conducted at a large university in the midwestern United States in the spring of 2010. The results show some interesting patterns, notably large differences by gender and in the degree of satisfaction depending on whether or not users tried to verify the information. In any case, the authors of the article believe that ...
|
| |
First Monday, Vol. 15, No. 3. (2010), pp. 27-27
Abstract
Presentation of a study carried out by the University of Washington Information School as part of Project Information Literacy (PIL), which presents and analyzes the results of a survey of groups of university students in the social sciences and humanities at six U.S. universities, groups that had been defined beforehand on the basis of a preliminary study. The aim was to find out for what reasons, how often, at which stages of research, what type ...
|
| |
BID : textos universitaris de biblioteconomia i documentació, No. 28. (2012), pp. 1-18
Abstract
Wikipedia is a virtual space with the character of a general encyclopedia, which is well suited to the digital documents of digital libraries and has a large following on the web. The citation of a digital document in Wikipedia is, for Europeana, a good indicator of the use of a digital library's contents. The impact of Spanish digital libraries in Wikipedia is rather disappointing, both in Spanish and in Catalan. The exception is the Biblioteca Virtual Miguel de Cervantes, a project ...
|
| |
(27 Feb 2006)
Abstract
We present an analysis of the statistical properties and growth of the free on-line encyclopedia Wikipedia. By describing topics by vertices and hyperlinks between them as edges, we can represent this encyclopedia as a directed graph. The topological properties of this graph are in close analogy with those of the World Wide Web, despite the very different growth mechanism. In particular we measure a scale-invariant distribution of the in- and out-degree and we are able to reproduce these features by means of a simple statistical model. ...
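As a rough illustration of the graph representation this abstract describes, the sketch below treats articles as vertices and hyperlinks as directed edges, then tabulates how many vertices have each in-degree and out-degree. It is only a toy under stated assumptions: the function name `degree_distributions` and the edge-list input format are hypothetical, and nothing here reproduces the paper's statistical model or its scale-invariance analysis.

```python
from collections import Counter


def degree_distributions(edges):
    """Map each degree k to the number of vertices with that in-/out-degree.

    `edges` is an iterable of (source, target) pairs, one per hyperlink.
    """
    in_deg = Counter()
    out_deg = Counter()
    vertices = set()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
        vertices.update((src, dst))
    # Vertices with no incoming (or outgoing) links count as degree 0.
    in_dist = Counter(in_deg[v] for v in vertices)
    out_dist = Counter(out_deg[v] for v in vertices)
    return in_dist, out_dist
```

On a real link dump, plotting these counts on log-log axes is the usual first check for a heavy-tailed degree distribution.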
|
| |
(24 Dec 2012)
Abstract
The simplicity of producing and consuming online content makes it difficult to estimate how much attention Internet users will devote to any given content. This work presents a general overview of temporal patterns in the access to content on a huge collaborative platform. We propose a model for predicting the popularity of promoted content, inspired by the analysis of the page-view dynamics on Wikipedia. Compared to previous studies, the observed popularity patterns are more complex; however, our model uses just a few parameters to fully describe them. ...
|
| |
In Biannual Conference of the Society for Computational Linguistics and Language Technology (2007)
Abstract
We analyze Wikipedia as a lexical semantic resource and compare it with conventional resources, such as dictionaries, thesauri, semantic wordnets, etc. Different parts of Wikipedia reflect different aspects of these resources. We show that Wikipedia contains a vast amount of knowledge about, e.g., named entities, domain specific terms, and rare word senses. If Wikipedia is to be used as a lexical semantic resource in large-scale NLP tasks, efficient programmatic access to the knowledge therein is required. We review existing access mechanisms ...
|
| |
(15 May 2007)
Abstract
The Internet-based encyclopaedia Wikipedia has grown to become one of the most visited web-sites on the Internet. However, critics have questioned the quality of entries, and an empirical study has shown Wikipedia to contain errors in a 2005 sample of science entries. Biased coverage and lack of sources are among the "Wikipedia risks". The present work describes a simple assessment of these aspects by examining the outbound links from Wikipedia articles to articles in scientific journals with a comparison against journal statistics from Journal Citation Reports such as ...
|
| |
(16 February 2012)
Abstract
In this work we study the dynamical features of editorial wars in Wikipedia (WP). Based on our previously established algorithm, we build up samples of controversial and peaceful articles and analyze the temporal characteristics of the activity in these samples. On short time scales, we show that there is a clear correspondence between conflict and burstiness of activity patterns, and that memory effects play an important role in controversies. On long time scales, we identify three distinct developmental patterns for the ...
|
| |
Abstract
BACKGROUND: With the advent of Web 2.0 technologies, user-edited online resources such as Wikipedia are increasingly tapped for information. However, there is little research on the quality of health information found in Wikipedia. OBJECTIVE: To compare the scope, completeness, and accuracy of drug information in Wikipedia with that of a free, online, traditionally edited database (Medscape Drug Reference [MDR]). METHODS: Wikipedia and MDR were assessed on 8 categories of drug information. Questions were constructed and answers were verified with authoritative resources. ...
|
| |
Abstract
DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data. ...
|
| |
Abstract
The Wikipedia webservices give access to georeferenced Wikipedia articles in 240 languages. For the largest languages (English, German, French, Spanish, Italian and Polish) full text and a summary are also available. ...
|
| |
In Proc. 2007 Joint Conference on EMNLP and CoNLL (2007), pp. 708-716
Abstract
This paper presents a large-scale system for the recognition and semantic disambiguation of named entities based on information extracted from a large encyclopedic collection and Web search results. It describes in detail the disambiguation paradigm employed and the information extraction process from Wikipedia. Through a process of maximizing the agreement between the contextual information extracted from Wikipedia and the context of a document, as well as the agreement among the category tags associated with the candidate entities, the implemented system shows ...
|
| |
Abstract
With the rise of user-generated content, evaluating the credibility of information has become increasingly important. It is already known that various user characteristics influence the way credibility evaluation is performed. Domain experts on the topic at hand primarily focus on semantic features of information (e.g., factual accuracy), whereas novices focus more on surface features (e.g., length of a text). In this study, we further explore two key influences on credibility evaluation: topic familiarity and information skills. Participants with varying expected levels ...
|
| |
Abstract
Open collaboration systems, such as Wikipedia, need to maintain a pool of volunteer contributors to remain relevant. Wikipedia was created through a tremendous number of contributions by millions of contributors. However, recent research has shown that the number of active contributors in Wikipedia has been declining steadily for years and suggests that a sharp decline in the retention of newcomers is the cause. This article presents data that show how several changes the Wikipedia community made to manage quality and consistency ...
|
| |
In Proceedings of the 6th EACL Workshop on Language Technology for Cultural Heritage, Social Sciences and Humanities (2012), pp. 101-106
Abstract
Large numbers of cultural heritage items are now archived digitally along with accompanying metadata and are available to anyone with internet access. This information could be enriched by adding links to resources that provide background information about the items. Techniques have been developed for automatically adding links to Wikipedia to text but the methods are general and not designed for use with cultural heritage data. This paper explores a range of methods for adapting a system for adding links to Wikipedia ...
|
| |
In Proceedings of the 6th EACL Workshop on Language Technology for Cultural Heritage, Social Sciences and Humanities (2012), pp. 94-100
Abstract
Over the past years large digital cultural heritage collections have become increasingly available. While these provide adequate search functionality for the expert user, this may not offer the best support for non-expert or novice users. In this paper we propose a novel mechanism for introducing new users to the items in a collection by allowing them to browse Wikipedia articles, which are augmented with items from the cultural heritage collection. Using Europeana as a case-study we demonstrate the effectiveness of our ...
|
| |
Abstract
The purposes of this study were to explore college students' perceptions, uses of, and motivations for using Wikipedia, and to understand their information behavior concerning Wikipedia based on social cognitive theory (SCT). A Web survey was used to collect data in the spring of 2008. The study sample consisted of students from an introductory undergraduate course at a large public university in the midwestern United States. A total of 134 students participated in the study, resulting in a 32.8% response rate. ...
|
| |
In Proceedings of the 21st ACM international conference on Information and knowledge management (2012), pp. 734-743, doi:10.1145/2396761.2396855
Abstract
Context surrounding hyperlinked semi-structured documents, externally in the form of citations and internally in the form of hierarchical structure, contains a wealth of useful but implicit evidence about a document's relevance. These rich sources of information should be exploited as contextual evidence. This paper proposes various methods of accumulating evidence from the context, and measures the effect of contextual evidence on retrieval effectiveness for document and focused retrieval of hyperlinked semi-structured documents. We propose a re-weighting model to contextualize (a) evidence ...
|
| |
In Proceedings of the Eighth International Conference on Computational Semantics (IWCS-8) (2009)
|
| |
In Proceedings of the EACL 2009 Workshop on Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education (LaTeCH-SHELT&R 2009) (2009)
|
| |
Abstract
Wikipedia (the “free online encyclopedia that anyone can edit”) is having a huge impact on how a great many people gather information about the world. So, it is important for epistemologists and information scientists to ask whether people are likely to acquire knowledge as a result of having access to this information source. In other words, is Wikipedia having good epistemic consequences? After surveying the various concerns that have been raised about the reliability of Wikipedia, this article argues that the ...
|
| |
Abstract
This study presents findings from one-on-one interviews with 21 undergraduate students at a large public research university in the southeastern United States. While the preliminary focus of the study was to be students' opinions about and use of Wikipedia as a resource for course-related research, many of the interviews evolved into discussion about the relative merits of freely-available web-based resources as compared with subscription databases. In addition to providing illuminating information about respondents' relationships with Wikipedia and Google, these interviews offered ...
|
| |
Abstract
Wikis are established means for the collaborative authoring, versioning and publishing of textual articles. The Wikipedia project, for example, succeeded in creating by far the largest encyclopedia just on the basis of a wiki. Recently, several approaches have been proposed on how to extend wikis to allow the creation of structured and semantically enriched content. However, the means for creating semantically enriched structured content are already available and are, although unconsciously, even used by Wikipedia authors. In this article, we present ...
|
| |
(5 Nov 2012)
Abstract
The use of socially generated "big data" to access information about the collective states of mind in human societies has become a new paradigm in the emerging field of computational social science. One natural application of this would be predicting a society's reaction to a new product, in terms of popularity and adoption rate. However, bridging the gap between "real time monitoring" and "early predicting" remains a big challenge. Here, we report on an endeavor to build a minimalistic predictive model for the financial success of movies ...
|
| |
edited by Karl Aberer, Key-Sun Choi, Natasha Noy, Dean Allemang, Kyung-Il Lee, Lyndon Nixon, Jennifer Golbeck, Peter Mika, Diana Maynard, Riichiro Mizoguchi, Guus Schreiber, Philippe Cudré-Mauroux
Abstract
DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against datasets derived from Wikipedia and to link other datasets on the Web to Wikipedia data. We describe the extraction of the DBpedia datasets, and how the resulting information is published on the Web for human- and machine-consumption. We describe some emerging applications from the DBpedia community and show how website authors can facilitate ...
|
| |
In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval (2012), pp. 981-990, doi:10.1145/2348283.2348413
Abstract
The detection and improvement of low-quality information is a key concern in Web applications that are based on user-generated content; a popular example is the online encyclopedia Wikipedia. Existing research on quality assessment of user-generated content deals with the classification as to whether the content is high-quality or low-quality. This paper goes one step further: it targets the prediction of quality flaws, this way providing specific indications in which respects low-quality content needs improvement. The prediction is based on user-defined cleanup ...
|
| |
Abstract
In this paper we present a statistical analysis of English texts from Wikipedia. We try to address the issue of language complexity empirically by comparing the simple English Wikipedia (Simple) to comparable samples of the main English Wikipedia (Main). Simple is supposed to use a more simplified language with a limited vocabulary, and editors are explicitly requested to follow this guideline, yet in practice the vocabulary richness of both samples is at the same level. Detailed analysis of longer units (n-grams of ...
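To make "vocabulary richness" concrete, here is a minimal sketch using the type-token ratio (distinct words divided by total words). The abstract does not name the measure the authors used, so the type-token ratio is an assumption, and `tokens` and `type_token_ratio` are hypothetical helper names. Because this ratio falls as a text gets longer, the optional `sample_size` argument truncates both texts to the same number of tokens, in the spirit of the "comparable samples" the abstract mentions.

```python
import re


def tokens(text):
    """Split text into lowercase word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text, sample_size=None):
    """Distinct words divided by total words, optionally on a fixed-size prefix."""
    words = tokens(text)
    if sample_size is not None:
        # Compare texts on equal footing: the ratio depends on sample length.
        words = words[:sample_size]
    if not words:
        return 0.0
    return len(set(words)) / len(words)
```

Calling `type_token_ratio(simple_text, 10000)` and `type_token_ratio(main_text, 10000)` on two hypothetical text samples would then give directly comparable values.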
|
| |
Abstract
Admittedly this is a presumptuous title that should never be used when reporting on individual research advances. Wisdom is just not a scientific concept. In this case, though, we are reporting on recent developments on the web that lead us to believe that the web is on the way to providing a platform for not only information acquisition and business transactions but also for large scale knowledge development and decision support. It is likely that by now every web user has ...
Note (first note only)
Best first sentence of an abstract. Ever.
|
| |
Abstract
The rise of the Internet has enabled collaboration and cooperation on an unprecedentedly large scale. The online encyclopedia Wikipedia, which presently comprises 7.2 million articles created by 7.04 million distinct editors, provides a consummate example. We examined all 50 million edits made to the 1.5 million English-language Wikipedia articles and found that the high-quality articles are distinguished by a marked increase in number of edits, number of editors, and intensity of cooperative behavior, as compared to other articles of similar visibility and age. ...
|
| |
J. Mach. Learn. Res., Vol. 13 (June 2012), pp. 2063-2067
Abstract
Pattern is a package for Python 2.4+ with functionality for web mining (Google + Twitter + Wikipedia, web spider, HTML DOM parser), natural language processing (tagger/chunker, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, k-means clustering, Naive Bayes + k-NN + SVM classifiers) and network analysis (graph centrality and visualization). It is well documented and bundled with 30+ examples and 350+ unit tests. The source code is licensed under BSD and available from http://www.clips.ua.ac.be/pages/pattern. ...
|
| |
Abstract
We present Sig.ma, both a service and an end user application to access the Web of Data as an integrated information space. Sig.ma uses a holistic approach in which large scale semantic Web indexing, logic reasoning, data aggregation heuristics, ad-hoc ontology consolidation, external services and responsive user interaction all play together to create rich entity descriptions. These consolidated entity descriptions then form the base for embeddable data mashups, machine oriented services as well as data browsing services. Finally, we discuss Sig.ma's ...
|
| |
Abstract
Named Entity Recognition (NER) is a subtask of information extraction and aims to identify atomic entities in text that fall into predefined categories such as person, location, organization, etc. Recent efforts in NER try to extract entities and link them to linked data entities. Linked data is a term used for data resources that are created using semantic web standards such as DBpedia. There are a number of online tools that try to identify named entities in text and link them ...
|