A Latent Dirichlet Allocation Model for Entity Resolution
Notes for this article
Uses an LDA-like generative model, where
- a reference (mention) is generated by a noise model that perturbs the author's correct name: p(reference | author names, author entity, noise model)
- for each reference, an author entity a is generated from the author mixture of a group z of frequently co-authoring authors: p(a | group z, author mixture phi, Dirichlet parameter beta)
- for each reference, a latent group z is generated from the document's group mixture: p(z | d, group mixture theta)
- for each document, the group mixture theta is sampled from a Dirichlet with prior alpha
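The sampling steps above can be sketched in a few lines of Python. This is only an illustration of the generative story as summarized in these notes, not the paper's implementation: the function names, the symmetric Dirichlet priors, and the toy sizes and hyperparameter values are all assumptions, and the noise step that turns an author entity into a surface reference is left out.

```python
import random

def sample_dirichlet(rng, concentration, size):
    """Draw from a symmetric Dirichlet by normalizing Gamma(concentration, 1) draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def sample_index(rng, probs):
    """Draw one index from a categorical distribution given as a list of probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(rng, alpha, phi, n_refs):
    """Generate (group, author) pairs for one document.

    phi[z] is the author mixture of group z (hypothetical layout).
    """
    # Per-document group mixture theta ~ Dirichlet(alpha)
    theta = sample_dirichlet(rng, alpha, len(phi))
    pairs = []
    for _ in range(n_refs):
        z = sample_index(rng, theta)   # latent group for this reference
        a = sample_index(rng, phi[z])  # author entity drawn from that group
        pairs.append((z, a))
    return theta, pairs

# Toy sizes and hyperparameters, chosen only for illustration.
rng = random.Random(0)
n_groups, n_authors = 3, 5
alpha, beta = 1.0, 0.5

# One author mixture per group, phi[z] ~ Dirichlet(beta)
phi = [sample_dirichlet(rng, beta, n_authors) for _ in range(n_groups)]
theta, pairs = generate_document(rng, alpha, phi, n_refs=4)
```

Each pair holds the group and author entity behind one reference in the document; inference would run this story in reverse to recover them from the observed names.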
For each token in a name reference, the noise model chooses between
- dropping the token
- retaining the token (and perturbing it with string edits: insert, delete, replace)
- initialing the token (keeping only its first letter)
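A rough sketch of such a per-token noise process, run forward as a sampler, might look as follows. The probabilities `p_drop`, `p_initial`, and `p_edit`, the function names, and the uniform choice among edit operations are hypothetical; the notes do not say how the paper parameterizes these choices. Tokens are assumed to be non-empty strings.

```python
import random
import string

def perturb(rng, token, p_edit):
    """Apply random character edits (insert, delete, replace) to a retained token."""
    out = []
    for ch in token:
        if rng.random() < p_edit:
            op = rng.choice(["insert", "delete", "replace"])
            if op == "insert":
                # keep the character and add a random letter after it
                out.append(ch)
                out.append(rng.choice(string.ascii_lowercase))
            elif op == "replace":
                out.append(rng.choice(string.ascii_lowercase))
            # "delete": emit nothing for this character
        else:
            out.append(ch)
    return "".join(out)

def noisy_reference(rng, name_tokens, p_drop, p_initial, p_edit):
    """Turn an author's correct name tokens into a noisy reference."""
    reference = []
    for token in name_tokens:
        u = rng.random()
        if u < p_drop:
            continue                      # drop the token
        elif u < p_drop + p_initial:
            reference.append(token[0])    # initial the token
        else:
            reference.append(perturb(rng, token, p_edit))  # retain, possibly edited
    return reference

rng = random.Random(1)
example = noisy_reference(rng, ["john", "robert", "smith"],
                          p_drop=0.2, p_initial=0.3, p_edit=0.05)
```

Setting `p_edit=0` with `p_drop=0` and `p_initial=0` reproduces the name unchanged, which makes the three choices easy to check in isolation.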
The parameters for the noise model are updated every X iterations.
The conference version is here: http://www.citeulike.org/user/ldietz/article/1467629