Fakultät für Mathematik
Costantea, Ioana; Bot, Radu Ioan; Wanka, Gert: Patent Document Classification Based on Mutual Information Feature Selection


Author(s):
Costantea, Ioana
Bot, Radu Ioan
Wanka, Gert
Title:
Patent Document Classification Based on Mutual Information Feature Selection
Electronic source:
application/pdf
Preprint series:
Technische Universität Chemnitz, Fakultät für Mathematik (Germany). Preprint 11, 2004
Mathematics Subject Classification:
62H30 [ Classification and discrimination; cluster analysis ]
68T50 [ Natural language processing ]
90C46 [ Optimality conditions, duality ]
Abstract:
We describe a supervised text classification approach that combines a greedy feature selection method with a support vector machine (SVM) classifier. Features are selected by mutual information, which measures how much information the words carry about the categories. To train and test the algorithm, we used patent documents from the US Patent Classification System. In conclusion, we report the average break-even point (BEP) for several US classes.
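As an illustration of the feature selection idea named in the abstract, the sketch below scores each word by the mutual information between its presence in a document and the document's category, then keeps the highest-scoring words. It is a minimal sketch under stated assumptions, not the authors' method: the abstract does not give the exact estimator or the greedy procedure, so a binary presence/absence model and a plain top-k ranking are assumed here. The function names `mutual_information` and `select_top_k` and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """Mutual information (in bits) between presence of `term` and the label.

    Assumption: each document is a set of words, so only presence or
    absence of the term is modelled, not its frequency.
    """
    n = len(docs)
    # Joint counts over (term present?, category).
    joint = Counter((term in doc, label) for doc, label in zip(docs, labels))
    term_counts = Counter(term in doc for doc in docs)
    label_counts = Counter(labels)

    mi = 0.0
    for (present, label), count in joint.items():
        p_joint = count / n
        p_term = term_counts[present] / n
        p_label = label_counts[label] / n
        # Cells with zero joint count never appear in `joint`,
        # which matches the convention 0 * log(0) = 0.
        mi += p_joint * math.log2(p_joint / (p_term * p_label))
    return mi


def select_top_k(docs, labels, k):
    """Return the k words with the highest mutual information.

    Assumption: a simple ranking stands in for the greedy selection
    mentioned in the abstract. Ties are broken alphabetically.
    """
    vocabulary = set().union(*docs)
    ranked = sorted(
        vocabulary,
        key=lambda t: (-mutual_information(docs, labels, t), t),
    )
    return ranked[:k]


# Hypothetical toy corpus: two categories, two documents each.
docs = [
    {"engine", "piston"},
    {"engine", "valve"},
    {"protein", "cell"},
    {"protein", "gene"},
]
labels = ["mech", "mech", "bio", "bio"]

print(select_top_k(docs, labels, 2))
```

The selected words would then define the feature vectors passed to an SVM classifier. Words that occur in every document of one category and in none of the other receive the highest score, while words spread evenly across categories score near zero.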
Keywords:
Supervised Classification; Support Vector Machines; Mutual Information; Patent Classification
Language:
English
Publication time:
8 / 2004