PhD Thesis

Constructing understandable and accurate classifiers using data mining algorithms

Rok Piltaver, supervisor: Matjaž Gams
The accuracy of a classifier is often the most important aspect of its quality and is measured using various well-known metrics. Understandability (i.e. comprehensibility or simplicity), on the other hand, is often treated as less important or even neglected. Nevertheless, it matters to the users of a classifier: they trust it more if they can understand how it works, and studying the classifier can reveal additional knowledge about the relations in the observed data. Therefore, some existing methods focus on the understandability of learned classifiers or on transforming non-understandable classifiers into a human-understandable structure.
There is a lack of algorithms that treat accuracy and understandability of classifiers as equally important, transforming the problem of constructing a classifier into a multi-objective optimization problem. Such algorithms are especially important in domains where some parts of the attribute space can be classified with high accuracy using understandable classifiers (e.g. decision trees, decision rules), while other parts require non-understandable classifiers (e.g. ensemble methods, artificial neural networks) to achieve the required classification accuracy. Furthermore, producing a set of classifiers ranging from the most accurate to the most understandable (the Pareto front of solutions in multi-objective optimization terminology) gives the user additional information about how challenging a certain domain is for classification and enables an informed decision about how much accuracy should be sacrificed to achieve the desired understandability, or vice versa.
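As an illustration of the Pareto front idea, the following minimal Python sketch filters a set of candidate classifiers down to the non-dominated ones. It is not the thesis's algorithm. The candidate names, the accuracy values, and the use of a size count (e.g. number of tree nodes or rules) as a proxy for understandability are all assumptions made up for this example.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    accuracy: float   # higher is better
    complexity: int   # lower is better; assumed proxy for understandability


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.complexity <= b.complexity
    strictly_better = a.accuracy > b.accuracy or a.complexity < b.complexity
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates, most accurate first."""
    front = [c for c in candidates
             if not any(dominates(other, c) for other in candidates)]
    return sorted(front, key=lambda c: c.accuracy, reverse=True)


# Hypothetical values chosen only to illustrate the trade-off.
candidates = [
    Candidate("random forest", 0.95, 500),
    Candidate("deep tree", 0.91, 60),
    Candidate("bloated tree", 0.90, 80),   # dominated by "deep tree"
    Candidate("shallow tree", 0.85, 7),
    Candidate("single rule", 0.70, 1),
    Candidate("poor rule set", 0.68, 5),   # dominated by "single rule"
]

front = pareto_front(candidates)
for c in front:
    print(f"{c.name}: accuracy={c.accuracy:.2f}, complexity={c.complexity}")
```

Each classifier that remains in the front represents a different compromise: moving down the list trades accuracy for a simpler model, which is the choice the user is meant to make with full information.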
The subject of the doctoral thesis is the development and evaluation of data mining algorithms that treat accuracy and understandability of classifiers as equally important and, as a result, output a set of hybrid classifiers ranging from the most accurate to the most understandable.
