
mehrbod's library 109 articles

 
 

Modeling Hidden Topics on Document Manifold

In Proceeding of the 17th ACM conference on Information and knowledge management (CIKM'08) (2008), pp. 911-920
posted to no-tag by mehrbod on 2011-05-09 06:12:02 **
 

Latent semantic mapping [information retrieval]

In Signal Processing Magazine, IEEE, Vol. 22, No. 5. (2005), pp. 70-80
posted to no-tag by mehrbod on 2011-05-09 06:06:40 **

Abstract

This article has described LSM, a data-driven framework for modeling globally meaningful relationships implicit in large volumes of data. LSM generalizes a paradigm originally developed to capture hidden word patterns in a text document corpus. Over the past decade, this paradigm has proven effective in an increasing variety of fields, gradually spreading from query-based information retrieval to word clustering, document/topic clustering, large-vocabulary speech recognition language modeling, automated call routing, semantic inference for spoken interface control, and several other speech processing applications. ...

 

Proactive learning: cost-sensitive active learning with multiple imperfect oracles

In Proceedings of the 17th ACM conference on Information and knowledge management (2008), pp. 619-628, doi:10.1145/1458082.1458165
posted to no-tag by mehrbod on 2011-03-25 13:52:22 ** along with 4 people downer jdu mbilgic saeedabdullah

Abstract

Proactive learning is a generalization of active learning designed to relax unrealistic assumptions and thereby reach practical applications. Active learning seeks to select the most informative unlabeled instances and ask an omniscient oracle for their labels, so as to retrain the learning algorithm maximizing accuracy. However, the oracle is assumed to be infallible (never wrong), indefatigable (always answers), individual (only one oracle), and insensitive to costs (always free or always charges the same). Proactive learning relaxes all four of these assumptions, ...

 

Spamscatter: characterizing internet scam hosting infrastructure

In SS'07: Proceedings of the 16th USENIX Security Symposium (2007), pp. 1-14
posted to no-tag by mehrbod on 2011-03-25 13:36:31 ** along with 2 people ckanich fxsanchez
 

TREC 2007 Spam Track Overview

In TREC, Vol. Special Publication 500-274 (2007)
posted to no-tag by mehrbod on 2011-03-25 13:34:16 **
 

Learning of personalized security settings

(October 2010), pp. 3428-3432, doi:10.1109/icsmc.2010.5642461
posted to no-tag by mehrbod on 2011-03-24 17:20:12 **

Abstract

While many cybersecurity tools are available to computer users, their default configurations often do not match needs of specific users. Since most modern users are not computer experts, they are often unable to customize these tools, thus getting either insufficient or excessive security. To address this problem, we are developing an automated assistant that learns security needs of the user and helps customize available tools. ...

 

Robust defenses for cross-site request forgery

In the 15th ACM Conference on Computer and Communications Security (CCS) (2008)
posted to no-tag by mehrbod on 2011-03-24 17:15:35 ** along with 1 person Mutjake

Abstract

Cross-Site Request Forgery (CSRF) is a widely exploited web site vulnerability. In this paper, we present a new variation on CSRF attacks, login CSRF, in which the attacker forges a cross-site request to the login form, logging the victim into the honest web site as the attacker. The severity of a login CSRF vulnerability varies by site, but it can be as severe as a cross-site scripting vulnerability. We detail three major CSRF defense techniques and find shortcomings with each technique. ...

 

Exploiting known taxonomies in learning overlapping concepts

In Proceedings of the 20th international joint conference on Artificial intelligence (2007), pp. 714-719
posted to no-tag by mehrbod on 2011-03-24 17:08:43 **

Abstract

Many real-world classification problems involve large numbers of overlapping categories that are arranged in a hierarchy or taxonomy. We propose to incorporate prior knowledge on category taxonomy directly into the learning architecture. We present two concrete multi-label classification methods, a generalized version of Perceptron and a hierarchical multi-label SVM learning. Our method works with arbitrary, not necessarily singly connected taxonomies, and can be applied more generally in settings where categories are characterized by attributes and relations that are not necessarily induced ...

 

Linear prediction models with graph regularization for web-page categorization

In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining (2006), pp. 821-826, doi:10.1145/1150402.1150510
posted to no-tag by mehrbod on 2011-03-24 17:06:26 ** along with 2 people ajbattle ChaTo

Abstract

We present a risk minimization formulation for learning from both text and graph structures which is motivated by the problem of collective inference for hypertext document categorization. The method is based on graph regularization formulated as a well-formed convex optimization problem. We present numerical algorithms for our formulation, and show that such combination of local text features and link information can lead to improved predictive accuracy. ...

 

Experts' retrieval with multiword-enhanced author topic model

In Proceedings of the NAACL HLT 2010 Workshop on Semantic Search (2010), pp. 10-18
posted to no-tag by mehrbod on 2011-03-24 17:01:24 **

Abstract

In this paper, we propose a multiword-enhanced author topic model that clusters authors with similar interests and expertise, and apply it to an information retrieval system that returns a ranked list of authors related to a keyword. For example, we can retrieve Eugene Charniak via search for statistical parsing. The existing works on author topic modeling assume a "bag-of-words" representation. However, many semantic atomic concepts are represented by multiwords in text documents. This paper presents a pre-computation step as a way ...

 

Relational Topic Models for Document Networks

In AIStats (2009)
posted to no-tag by mehrbod on 2011-03-24 16:06:20 **
 

Dynamic topic models

In Proceedings of the 23rd international conference on Machine learning (2006), pp. 113-120, doi:10.1145/1143844.1143859
posted to no-tag by mehrbod  on 2011-03-24 15:51:49 ** along with 19 people and 2 groups adamsi asterix77 briordan cbosuna fiacobelli gunurra jhe julianpan kohei-h ldietz markymaypo maximzhao mshafiei newpoo Richmonp uvriss weiwu wiizane zhoujianying ARTFL pim

Abstract

A family of probabilistic time series models is developed to analyze the time evolution of topics in large document collections. The approach is to use state space models on the natural parameters of the multinomial distributions that represent the topics. Variational approximations based on Kalman filters and nonparametric wavelet regression are developed to carry out approximate posterior inference over the latent topics. In addition to giving quantitative, predictive models of a sequential corpus, dynamic topic models provide a qualitative window into ...

 

Modeling hidden topics on document manifold

In CIKM '08: Proceeding of the 17th ACM conference on Information and knowledge management (2008), pp. 911-920, doi:10.1145/1458082.1458202
posted to no-tag by mehrbod  on 2011-03-24 15:33:17 ** along with 2 people and 1 group hazen JamesChien Adaptive-Web

Abstract

Topic modeling has been a key problem for document analysis. One of the canonical approaches for topic modeling is Probabilistic Latent Semantic Indexing, which maximizes the joint probability of documents and terms in the corpus. The major disadvantage of PLSI is that it estimates the probability distribution of each document on the hidden topics independently and the number of parameters in the model grows linearly with the size of the corpus, which leads to serious problems with overfitting. Latent Dirichlet Allocation ...

 

Evaluation of utility of LSA for word sense discrimination

In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers (2006), pp. 77-80
posted to no-tag by mehrbod on 2011-03-24 14:42:24 **

Abstract

The goal of the on-going project described in this paper is evaluation of the utility of Latent Semantic Analysis (LSA) for unsupervised word sense discrimination. The hypothesis is that LSA can be used to compute context vectors for ambiguous words that can be clustered together – with each cluster corresponding to a different sense of the word. In this paper we report first experimental result on tightness, separation and purity of sense-based clusters as a function of vector space dimensionality and ...

 

Indexing by Latent Semantic Analysis

Journal of the American Society of Information Science, Vol. 41, No. 6. (1990), pp. 391-407
posted to no-tag by mehrbod  on 2011-03-24 14:35:58 ** along with 43 people and 5 groups abellogin adamsi aliku almadana ankzaman avulanov baaic bundschu camster Cerzi cybermax derek_farn dmnapolitano dormieus ehohman fbihack gonenc haroldfigueroa hernandezl irenas jelsas jliegl k12u ldietz mafwood maripsa matjajuri mortimer mthomure navil pcchang85 pdlug perceptron Phanix pprett RafG rspeer rumig ubi unnonouno vegchang xxc yeminjiao AI Blog_and_Wiki_Research dbk-lab UoY-CS-AIG Wikipedia

Abstract

A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents ("semantic structure") in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term by document matrix is decomposed into a set of ca 100 orthogonal factors from which the original matrix can be ...

 

Combining labeled and unlabeled data with co-training

In Proceedings of the 11th Annual Conference on Computational Learning Theory (1998), pp. 92-100
posted to no-tag by mehrbod  on 2011-03-24 14:22:26 ** along with 12 people and 2 groups adamsi andi_urra bpacker davidr gangli gkvas jdu markymaypo mbilgic mukundn sdvillal sugarexpletive ARTFL tulip

Abstract

We consider the problem of using a large unlabeled sample to boost performance of a learning algorithm when only a small set of labeled examples is available. In particular, we consider a problem setting motivated by the task of learning to classify web pages, in which the description of each example can be partitioned into two distinct views. For example, the description of a web page can be partitioned into the words occurring on that page, and the words occurring in ...

 

An empirical study of smoothing techniques for language modeling

In Proceedings of the 34th annual meeting on Association for Computational Linguistics (1996), pp. 310-318, doi:10.3115/981863.981904
posted to no-tag by mehrbod  on 2011-03-24 14:20:23 ** along with 11 people and 1 group codex hehrig johnkork justinbetteridge mapio mtkachenko pickw pprett pschulam walsanie zzb3886 searchingspeech2010

Abstract

We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e.g., Brown versus Wall Street Journal), and n-gram order (bigram versus trigram) affect the relative performance of these methods, which we measure through the cross-entropy of test data. In addition, we introduce two novel smoothing techniques, ...

 

Using and combining predictors that specialize

In STOC '97: Proceedings of the twenty-ninth annual ACM symposium on Theory of computing (1997), pp. 334-343, doi:10.1145/258533.258616
posted to no-tag by mehrbod on 2011-03-24 14:18:16 ** along with 3 people davidr education03 holopoj


 

Statistical language modeling for information retrieval

Annual Review of Information Science and Technology, Vol. 39, No. 1. (2005), pp. 1-31, doi:10.1002/aris.1440390108


 

Locality-sensitive hashing scheme based on p-stable distributions

In Proceedings of the twentieth annual symposium on Computational geometry (2004), pp. 253-262, doi:10.1145/997817.997857
posted to no-tag by mehrbod  on 2011-03-21 07:50:57 ** along with 10 people and 1 group asterix77 beshining clickstone eddymier hayko jingqizu longtop myui rueycheng spacedragon RecommenderSystem

Abstract

We present a novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under lp norm, based on p-stable distributions. Our scheme improves the running time of the earlier algorithm for the case of the lp norm. It also yields the first known provably efficient approximate NN algorithm for the case p<1. We also show that the algorithm finds the exact near neighbor in O(log n) time for data satisfying certain "bounded growth" condition. Unlike earlier schemes, our LSH scheme works directly on ...

 

Locality-Sensitive Hashing for Finding Nearest Neighbors

In Signal Processing Magazine, IEEE, Vol. 25, No. 2. (March 2008), pp. 128-131, doi:10.1109/msp.2007.914237
posted to no-tag by mehrbod  on 2011-03-21 07:49:29 ** along with 3 people and 1 group ianturton maropu pick600 Geomatics

Abstract

This lecture note describes a technique known as locality-sensitive hashing (LSH) that allows one to quickly find similar entries in large databases. This approach belongs to a novel and interesting class of algorithms that are known as randomized algorithms. A randomized algorithm does not guarantee an exact answer but instead provides a high probability guarantee that it will return the correct answer or one close to it. By investing additional computational effort, the probability can be pushed as high as desired. ...

 

Semantic hashing

Int. J. Approx. Reasoning, Vol. 50, No. 7. (10 July 2009), pp. 969-978, doi:10.1016/j.ijar.2008.11.006

Abstract

We show how to learn a deep graphical model of the word-count vectors obtained from a large set of documents. The values of the latent variables in the deepest layer are easy to infer and give a much better representation of each document than Latent Semantic Analysis. When the deepest layer is forced to use a small number of binary variables (e.g. 32), the graphical model performs "semantic hashing": Documents are mapped to memory addresses in such a way that semantically ...

 

Hierarchical document categorization with support vector machines

In CIKM '04: Proceedings of the thirteenth ACM international conference on Information and knowledge management (2004), pp. 78-87, doi:10.1145/1031171.1031186
posted to no-tag by mehrbod on 2011-03-21 07:40:38 ** along with 1 person agaelebe

Abstract

Automatically categorizing documents into pre-defined topic hierarchies or taxonomies is a crucial step in knowledge and content management. Standard machine learning techniques like Support Vector Machines and related large margin methods have been successfully applied for this task, albeit the fact that they ignore the inter-class relationships. In this paper, we propose a novel hierarchical classification method that generalizes Support Vector Machine learning and that is based on discriminant functions that are structured in a way that mirrors the class hierarchy. ...

 

Support vector machines classification with a very large-scale taxonomy

SIGKDD Explor. Newsl., Vol. 7, No. 1. (June 2005), pp. 36-43, doi:10.1145/1089815.1089821
posted to no-tag by mehrbod on 2011-03-21 07:38:22 ** along with 2 people avulanov vnata

Abstract

Very large-scale classification taxonomies typically have hundreds of thousands of categories, deep hierarchies, and skewed category distribution over documents. However, it is still an open question whether the state-of-the-art technologies in automated text categorization can scale to (and perform well on) such large taxonomies. In this paper, we report the first evaluation of Support Vector Machines (SVMs) in web-page classification over the full taxonomy of the Yahoo! categories. Our accomplishments include: 1) a data analysis on the Yahoo! taxonomy; 2) the ...

 

Supervised Aggregation of Classifiers using Artificial Prediction Markets

In Proceedings of the 27th International Conference on Machine Learning (ICML-10) (June 2010), pp. 591-598
posted to no-tag by mehrbod on 2011-03-21 07:17:05 **
 

A study of thresholding strategies for text categorization

In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval (2001), pp. 137-145, doi:10.1145/383952.383975
posted to no-tag by mehrbod on 2011-03-21 06:54:47 ** along with 2 people akastrin philiplei

Abstract

Thresholding strategies in automated text categorization are an underexplored area of research. This paper presents an examination of the effect of thresholding strategies on the performance of a classifier under various conditions. Using k-Nearest Neighbor (kNN) as the classifier and five evaluation benchmark collections as the testbeds, three common thresholding methods were investigated, including rank-based thresholding (RCut), proportion-based assignments (PCut) and score-based local optimization (SCut); in addition, new variants of these methods are proposed to overcome significant problems in ...

 

On the algorithmic implementation of multiclass kernel-based vector machines

Journal of Machine Learning Research, Vol. 2 (December 2001), pp. 265-292
posted to no-tag by mehrbod on 2011-03-21 06:52:37 ** along with 1 person kira

Abstract

In this paper we describe the algorithmic implementation of multiclass kernel-based vector machines. Our starting point is a generalized notion of the margin to multiclass problems. Using this notion we cast multiclass categorization problems as a constrained optimization problem with a quadratic objective function. Unlike most of previous approaches which typically decompose a multiclass problem into multiple independent binary classification tasks, our notion of margin yields a direct method for training multiclass predictors. By using the dual of the optimization problem ...

 

Using error-correcting codes for text classification

In Proceedings of ICML-00, 17th International Conference on Machine Learning (2000), pp. 303-310
edited by Pat Langley
posted to no-tag by mehrbod  on 2011-03-21 06:48:51 ** along with 2 people and 1 group jread82 zielaj inference-group

Abstract

This paper explores in detail the use of Error Correcting Output Coding (ECOC) for learning text classifiers. We show that the accuracy of a Naive Bayes Classifier over text classification tasks can be significantly improved by taking advantage of the error-correcting properties of the code. We also explore the use of different kinds of codes, namely Error-Correcting Codes, Random Codes, and Domain and Data-specific codes and give experimental results for each of them. The ECOC method ...

 

Random Forests

In Machine Learning, Vol. 45 (2001), pp. 5-32
posted to no-tag by mehrbod on 2011-03-21 06:40:20 **
 

A k-Nearest Neighbor Based Algorithm for Multi-label Classification

IEEE International Conference on Granular Computing, Vol. 2 (2005), pp. 718-721
posted to no-tag by mehrbod  on 2011-03-21 06:26:51 ** along with 5 people and 4 groups bhurley egibaja jread82 markymaypo mlinarev ARTFL COLSWE mlkd Multilabel Classification

Abstract

In multi-label learning, each instance in the training set is associated with a set of labels, and the task is to output a label set whose size is unknown a priori for each unseen instance. In this paper, a multi-label lazy learning approach named ML-kNN is presented, which is derived from the traditional k-nearest neighbor (kNN) algorithm. In detail, for each new instance, its k-nearest neighbors are firstly identified. After that, according to the label sets of these neighboring instances, maximum ...

 

Multi-label Output Codes using Canonical Correlation Analysis

In AISTATS (2011)
posted to no-tag by mehrbod  on 2011-03-21 05:52:40 ** along with 1 person and 1 group egibaja Multilabel Classification
 

Coupled semi-supervised learning for information extraction

In Proceedings of the third ACM international conference on Web search and data mining (2010), pp. 101-110
 

Thumbs up? Sentiment Classification using Machine Learning Techniques

In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP) (2002)
posted to no-tag by mehrbod  on 2011-03-21 05:30:17 ** along with 5 people kurumo pkrzyzaniak praisegod43v3r sachina Scis0000002

Abstract

We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based ...

 

Semi-supervised Extraction of Entity Aspects Using Topic Models

(2009)
posted to no-tag by mehrbod on 2011-03-21 05:25:13 **
 

Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars

In UAI (2005), pp. 658-666
posted to no-tag by mehrbod on 2011-03-21 05:10:42 **
 

A survey of named entity recognition and classification

Lingvisticae Investigationes, Vol. 30, No. 1. (January 2007), pp. 3-26, doi:10.1075/li.30.1.03nad
posted to no-tag by mehrbod  on 2011-03-21 05:06:03 ** along with 16 people and 3 groups AlisonBabeu anthropomorphism arafalov asimanovsky dabril dmitry_dontsov ejmeij johnkork lisah2u lsamper mainka markymaypo pcalado wb yusmi zareensyed1 ARTFL ilps NLP

Abstract

This survey covers fifteen years of research in the Named Entity Recognition and Classification (NERC) field, from 1991 to 2006. We report observations about languages, named entity types, domains and textual genres studied in the literature. From the start, NERC systems have been developed using hand-made rules, but now machine learning techniques are widely used. These techniques are surveyed along with other critical aspects of NERC such as features and evaluation methods. Features are word-level, dictionary-level and corpus-level representations of words ...

 

School of phish: a real-world evaluation of anti-phishing training

In Proceedings of the 5th Symposium on Usable Privacy and Security (2009), pp. 1-12, doi:10.1145/1572532.1572536

Abstract

PhishGuru is an embedded training system that teaches users to avoid falling for phishing attacks by delivering a training message when the user clicks on the URL in a simulated phishing email. In previous lab and real-world experiments, we validated the effectiveness of this approach. Here, we extend our previous work with a 515-participant, real-world study in which we focus on long-term retention and the effect of two training messages. We also investigate demographic factors that influence training and general phishing ...

 

Web spam taxonomy

In First International Workshop on Adversarial Information Retrieval on the Web (2005)
posted to no-tag by mehrbod on 2011-01-13 18:03:27 ** along with 4 people agulli elsantosneto thienanh yniu

Abstract

Web spamming refers to actions intended to mislead search engines into ranking some pages higher than they deserve. Recently, the amount of web spam has increased dramatically, leading to a degradation of search results. This paper presents a comprehensive taxonomy of current spamming techniques, which we believe can help in developing appropriate countermeasures. ...

 

Efficient projections onto the $\ell_1$-ball for learning in high dimensions

In Proceedings of the 25th international conference on Machine learning (2008), pp. 272-279, doi:10.1145/1390156.1390191

Abstract

We describe efficient algorithms for projecting a vector onto the l1-ball. We present two methods for projection. The first performs exact projection in O(n) expected time, where n is the dimension of the space. The second works on vectors k of whose elements are perturbed outside the l1-ball, projecting in O(k log(n)) time. This setting is especially useful for online learning in sparse feature spaces such as text categorization applications. We demonstrate the merits and effectiveness of our algorithms in numerous ...

 

Opinion spam and analysis

In Proceedings of the international conference on Web search and web data mining (2008), pp. 219-230, doi:10.1145/1341531.1341560
posted to no-tag by mehrbod  on 2010-12-25 20:36:14 ** along with 7 people and 1 group ChaTo exp_chris JihyePark jliegl kjedrzejewski nliu82 textmining09_dbs_fu opinion mining

Abstract

Evaluative texts on the Web have become a valuable source of opinions on products, services, events, individuals, etc. Recently, many researchers have studied such opinion sources as product reviews, forum posts, and blogs. However, existing research has been focused on classification and summarization of opinions using natural language processing and data mining techniques. An important issue that has been neglected so far is opinion spam or trustworthiness of online opinions. In this paper, we study this issue in the context of ...

 

Reading between the lines: learning to map high-level instructions to commands

In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (2010), pp. 1268-1277
posted to no-tag by mehrbod on 2010-12-12 20:11:28 **

Abstract

In this paper, we address the task of mapping high-level instructions to sequences of commands in an external environment. Processing these instructions is challenging---they posit goals to be achieved without specifying the steps required to complete them. We describe a method that fills in missing information using an automatically derived environment model that encodes states, transitions, and commands that cause these transitions to happen. We present an efficient approximate approach for learning this environment model as part of a policy-gradient reinforcement ...

 

WordsEye: An Automatic Text-to-Scene Conversion System

(2001)
posted to no-tag by mehrbod on 2010-12-12 20:08:20 ** along with 1 person AlexFortitude

Abstract

Natural language is an easy and effective medium for describing visual ideas and mental images. Thus, we foresee the emergence of language-based 3D scene generation systems to let ordinary users quickly create 3D scenes without having to learn special software, acquire artistic skills, or even touch a desktop window-oriented interface. WordsEye is such a system for automatically converting text into representative 3D scenes. WordsEye relies on a large database of 3D models and poses to depict entities and actions. Every 3D ...

 

Content-based Web Spam Detection

In Proceedings of the 3rd International Workshop on Adversarial Information Retrieval on the Web (AIRWeb) (2007)
posted to no-tag by mehrbod on 2010-11-15 15:54:04 **

Abstract

of ten content-based classifiers stacked using logistic regression. Each classifier used one of two state-of-the art email filters – DMC [2] or OSBF-Lua [1] – applied to simple text files, with each text file acting as a proxy for a host to be classified. All text files were derived from the home page (including ...

 

Less is More: Sparse Graph Mining with Compact Matrix Decomposition

Stat. Anal. Data Min., Vol. 1, No. 1. (February 2008), pp. 6-22, doi:10.1002/sam.v1:1
posted to no-tag by mehrbod on 2010-10-08 20:18:27 ** along with 3 people dragonrez hazen nliu82

Abstract

Given a large sparse graph, how can we find patterns and anomalies? Several important applications can be modeled as large sparse graphs, e.g., network traffic monitoring, research citation network analysis, social network analysis, and financial transactions. Low-rank decompositions, such as singular value decomposition (SVD) and CUR, are powerful techniques for revealing latent-hidden variables and associated patterns from high dimensional data. However, those methods often ignore the sparsity property of the graph, and hence usually incur too high memory and computational cost ...

 

Less is More: Compact Matrix Decomposition for Large Sparse Graphs

In SDM (2007)
posted to no-tag by mehrbod on 2010-10-08 20:18:11 ** along with 2 people duce sdvillal
 

GraphScope: parameter-free mining of large time-evolving graphs

In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining (2007), pp. 687-696, doi:10.1145/1281192.1281266
posted to no-tag by mehrbod  on 2010-10-08 20:17:16 ** along with 8 people compbio Demiurgo DynamicNetworkAnalysis elsantosneto hazen praesepegreendragon salmanjamali tmmurali

Abstract

How can we find communities in dynamic networks of social interactions, such as who calls whom, who emails whom, or who sells to whom? How can we spot discontinuity time-points in such streams of graphs, in an on-line, any-time fashion? We propose GraphScope, that addresses both problems, using information theoretic principles. Contrary to the majority of earlier methods, it needs no user-defined parameters. Moreover, it is designed to operate on large graphs, in a streaming fashion. We demonstrate the efficiency and effectiveness ...

 

Combating web spam with trustrank

In VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases (2004), pp. 576-587
posted to trust by mehrbod on 2010-10-05 21:04:40 ** along with 4 people julenka macle rlichten yniu

Abstract

Web spam pages use various techniques to achieve higher-than-deserved rankings in a search engine's results. While human experts can identify spam, it is too expensive to manually evaluate a large number of pages. Instead, we propose techniques to semi-automatically separate reputable, good pages from spam. We first select a small set of seed pages to be evaluated by an expert. Once we manually identify the reputable seed pages, we use the link structure of the web to discover other pages that ...

 

Who falls for phish?: a demographic analysis of phishing susceptibility and effectiveness of interventions

In Proceedings of the 28th international conference on Human factors in computing systems (2010), pp. 373-382, doi:10.1145/1753326.1753383
posted to no-tag by mehrbod on 2010-10-05 19:45:53 ** along with 2 people sadia499 tamabravolillo

Abstract

In this paper we present the results of a roleplay survey instrument administered to 1001 online survey respondents to study both the relationship between demographics and phishing susceptibility and the effectiveness of several anti-phishing educational materials. Our results suggest that women are more susceptible than men to phishing and participants between the ages of 18 and 25 are more susceptible to phishing than other age groups. We explain these demographic factors through a mediation analysis. Educational materials reduced users' tendency to ...

 

Star Quality: Aggregating Reviews to Rank Products and Merchants

In International Conference on Weblogs and Social Media (May 2010)
posted to no-tag by mehrbod on 2010-10-05 19:43:52 ** along with 2 people chihchun_chen mmcgloho
 

Characterizing Microblogs with Topic Models

In ICWSM (2010)
posted to no-tag by mehrbod on 2010-10-05 19:42:38 ** along with 2 people jeonhyuk mtay
Note: You may cite this page as: http://www.citeulike.org/user/mehrbod
