Document indexing in text categorization
Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on, Vol. 6 (2005), pp. 3792-3796.
Abstract
Addressing the characteristics of text categorization, this paper proposes an improved method of computing term weights, tfidfie, based on the traditional tfidf function used in most classifiers. Compared with tfidf, the tfidfie function adds an information entropy factor, H, which represents the distribution of the training-set documents in which the term occurs. The experiments show that tfidfie outperforms tfidf. In addition, the paper analyses how the use of the information entropy factor H differs between document categorization and feature selection, and finds that both phases are necessary for text categorization; the best classification performance is reached with up to 70% of the unique terms removed.
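The abstract does not give the tfidfie formula, so the following is only a rough sketch of the general idea under stated assumptions, not the paper's method. It assumes that H is the Shannon entropy of the category distribution of the training documents containing a term, normalized by the logarithm of the number of categories, and that the weight is tf × idf × (1 − H). The function names `entropy_factor` and `tfidfie` and the way H is combined with tfidf are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def entropy_factor(term, docs, labels):
    """Normalized entropy (0..1) of the categories of documents containing `term`.

    Assumption: this is one plausible reading of the factor H in the abstract;
    the paper's exact definition may differ.
    """
    counts = Counter(label for doc, label in zip(docs, labels) if term in doc)
    total = sum(counts.values())
    n_categories = len(set(labels))
    if total == 0 or n_categories < 2:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(n_categories)


def tfidfie(term, doc, docs, labels):
    """Hypothetical weight: tf * idf * (1 - H), where docs are lists of tokens."""
    tf = doc.count(term)
    df = sum(1 for d in docs if term in d)
    if tf == 0 or df == 0:
        return 0.0
    idf = math.log(len(docs) / df)
    # A term spread evenly over all categories (H near 1) is down-weighted.
    return tf * idf * (1.0 - entropy_factor(term, docs, labels))


docs = [
    ["goal", "match"],
    ["goal", "team"],
    ["vote", "match"],
    ["vote", "law"],
]
labels = ["sport", "sport", "politics", "politics"]

weight_goal = tfidfie("goal", docs[0], docs, labels)
weight_match = tfidfie("match", docs[0], docs, labels)
```

In this toy corpus "goal" and "match" have the same tf and idf in the first document, but "goal" occurs only in one category while "match" is split evenly across both, so under these assumptions the entropy factor keeps the weight of "goal" and suppresses that of "match".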