Machine Learning and Data Mining
The overall goal of data mining (DM) and machine learning (ML) is to extract knowledge, in a human-understandable structure, from large quantities of data, i.e. examples that illustrate relations between observed variables. The automatic or semi-automatic DM/ML process analyzes data to discover previously unknown, interesting patterns: generalizations that predict the value of a certain variable in previously unseen examples from the known values of other variables, interesting groups of data records, unusual records, and dependencies in the data.
We focus mainly on developing algorithms that produce human-readable and interpretable classifiers, developing algorithms and tools that enable a human expert to participate actively in the knowledge extraction process, and adapting existing algorithms to specific applications.
Contact: R. Piltaver (firstname.lastname@example.org)
Doctoral Research Projects
- Constructing understandable and accurate classifiers using data mining algorithms
Rok Piltaver, supervisor: Matjaž Gams
The accuracy of a classifier is often the most important aspect of its quality and is measured using various well-known metrics. Understandability (i.e. comprehensibility or simplicity) of a classifier, on the other hand, is often treated as less important or even neglected. Nevertheless, it matters to the users of the classifier: they trust a classifier more if they can understand how it works, and additional knowledge about the relations in the observed data can be extracted by studying it. Some existing methods therefore focus on the understandability of learned classifiers or on transforming non-understandable classifiers into a human-understandable structure.
There is a lack of algorithms that treat accuracy and understandability of classifiers as equally important, transforming the problem of constructing a classifier into a multi-objective optimization problem. Such algorithms are especially important in domains where some parts of the attribute space can be classified with high accuracy using understandable classifiers (e.g. decision trees, decision rules), while other parts require non-understandable classifiers (e.g. ensemble methods, artificial neural networks) to achieve the required classification accuracy. Furthermore, producing a set of classifiers ranging from the most accurate to the most understandable (the Pareto front of solutions, in multi-objective optimization terminology) gives the user additional information about how challenging a certain domain is for classification and enables an informed decision about how much accuracy should be sacrificed to achieve the desired understandability, or vice versa.
The subject of the doctoral thesis is the development and evaluation of data mining algorithms that treat accuracy and understandability of classifiers as equally important and, as a result, output a set of hybrid classifiers ranging from the most accurate to the most understandable.
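The idea of a Pareto front over accuracy and understandability can be illustrated with a minimal sketch. It is not the algorithm developed in the thesis. It assumes that understandability is approximated by model size (smaller is more understandable), and the `Candidate` type, the classifier names and all numbers are hypothetical values chosen only for illustration.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """A trained classifier summarized by two objectives (hypothetical values)."""
    name: str
    accuracy: float  # higher is better
    size: int        # assumed proxy for understandability: lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.size <= b.size
    strictly_better = a.accuracy > b.accuracy or a.size < b.size
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only candidates that no other candidate dominates, smallest first."""
    front = [c for c in candidates
             if not any(dominates(other, c) for other in candidates)]
    return sorted(front, key=lambda c: c.size)


# Illustrative, made-up candidates; not measured results.
candidates = [
    Candidate("tree_depth_2", 0.81, 4),
    Candidate("tree_depth_4", 0.86, 16),
    Candidate("tree_depth_6", 0.85, 40),      # larger and less accurate than tree_depth_4
    Candidate("random_forest", 0.93, 5000),
]

front = pareto_front(candidates)
for c in front:
    print(f"{c.name}: accuracy={c.accuracy:.2f}, size={c.size}")
```

Reading the resulting list from the smallest to the largest model shows how much accuracy each step in complexity buys, which is the kind of information the text above says helps a user choose a trade-off.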
- Behavior modeling by combining domain knowledge and machine learning
Violeta Mirčevska, mentors: Matjaž Gams, Mitja Luštrek
The aim of this dissertation is to develop a novel method for behavior modeling that leverages both existing domain knowledge (DK) and machine learning (ML). ML models and DK are complementary sources of information. ML algorithms can discover characteristic domain patterns that may be too subtle for humans to detect, but they discover only patterns that are present in the training dataset. DK refers to the general and specific knowledge humans have about a particular task, as well as common sense, and may cover examples not present in the available domain dataset. Combining the two in a proper way is expected to improve the reliability and robustness of the developed models. Three domains that benefit from behavior modeling are addressed in this dissertation: (i) adaptation of software applications to user needs: adapting the reporting level of business intelligence applications to better suit users' information needs; (ii) modeling users for the purpose of detecting unusual behavior: learning the everyday behavior of an elderly user in order to detect deviations related to health problems; and (iii) understanding and studying the behavior of agents in a multi-agent system: analyzing interactions of opposing groups of agents.