Partner System Group: Data Mining Activity

The Partner System group of the Institute for Problems of Information Transmission, Russian Academy of Sciences, concentrates on the Partner System R&D project. Partner System (PS) is a kind of information and knowledge processing technique that:

PS R&D involves techniques from cognitive psychology, knowledge processing, data analysis and data mining. The PS group's data mining activity builds on its 30 years of experience in pattern recognition. Many of the methods developed have been successfully applied to pen recognition, to a set of complex problems in clinical medicine, and to engineering. Since 1993 the PS group has dealt with problems of market analysis, including direct mail optimization; revenue modeling; classification of clients; and prediction of client behavior for banking and insurance. Results have always been competitive with conventional methods, and in many cases the best.

PS group R&D efforts in Data Mining concentrate on the following directions:

The PS group's R&D efforts address the following challenges:
  1. non-numeric and mixed-type data
  2. massively incomplete data: many missing values, irregularly distributed
  3. automatic scalability: the same method applies to data sets whose sizes differ by factors of hundreds or thousands
  4. high dimensionality: hundreds of features involved
  5. sharply hyperbolic distributions of important econometric variables (which make conventional statistical tools inadequate)
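
The last challenge can be illustrated with a minimal sketch. It assumes that "hyperbolic" refers to a rank-size law in which the value of the k-th largest item is proportional to 1/k; the revenue figures and variable names below are hypothetical and are not taken from PS group data.

```python
from statistics import mean, median

# Hypothetical revenue per client: the k-th largest client brings 10000 / k.
# This 1/rank shape is an assumption used only for illustration.
revenues = [10_000 / rank for rank in range(1, 1001)]

avg = mean(revenues)    # pulled upward by a handful of very large clients
mid = median(revenues)  # what a "typical" client actually brings

# Share of total revenue produced by the 10 largest clients (top 1%).
top_share = sum(revenues[:10]) / sum(revenues)

# Fraction of clients whose revenue is below the arithmetic mean.
below_mean = sum(r < avg for r in revenues) / len(revenues)

print(f"mean={avg:.1f} median={mid:.1f}")
print(f"top 1% share={top_share:.2f} below mean={below_mean:.2f}")
```

With such data the mean describes almost no actual client, so tools that summarize a variable by its mean and variance can be misleading.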

Knowledge Guided Data Analysis

Data analysis and knowledge discovery results may be trivial or of low interest if the process is not guided by existing problem-oriented knowledge. There are two ways to provide knowledge to guide data analysis:
  1. organize data analysis as an interactive process with the participation of an application-field expert, who applies implicit knowledge while guiding the process
  2. develop data analysis tools that take advantage of existing problem-oriented knowledge bases

The PS group develops both approaches.

Interpretable Data Analysis

Good interpretation of data analysis results means that the user can incorporate these results into his or her problem-oriented knowledge. If the knowledge base is maintained, this can lead to knowledge accumulation. Most conventional methods do not allow straightforward and easy interpretation by the user, either because the mathematical form of the results is foreign to problem-oriented knowledge (regression, discriminant analysis) or because the model description is too large (decision trees, neural networks).

Based on experience in different application fields and on research in cognitive psychology, the Partner System group proposed the so-called syndromes framework. A syndrome is a network of simple threshold decision elements. It has proved to be easy for users to grasp and flexible enough to represent sophisticated models. Graphic visualization tools help the user browse through syndrome networks.
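
As a rough illustration of the idea, a network of threshold elements might be sketched as follows. This is not the PS group's implementation: the class names, the "at least k of n inputs" combination rule, the treatment of missing values, and the client features are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple, Union

Record = Mapping[str, Optional[float]]


@dataclass(frozen=True)
class FeatureTest:
    """Leaf element: fires when one feature crosses a threshold."""
    feature: str
    threshold: float
    above: bool = True  # True: fire on value >= threshold; False: on value < threshold

    def fires(self, record: Record) -> bool:
        value = record.get(self.feature)
        if value is None:  # assumption: a missing value never fires
            return False
        return value >= self.threshold if self.above else value < self.threshold


@dataclass(frozen=True)
class Syndrome:
    """Inner element: fires when at least `min_count` of its inputs fire."""
    name: str
    inputs: Tuple[Union[FeatureTest, "Syndrome"], ...]
    min_count: int

    def fires(self, record: Record) -> bool:
        fired = sum(element.fires(record) for element in self.inputs)
        return fired >= self.min_count


# Hypothetical two-level network for a banking client.
strain = Syndrome(
    "financial_strain",
    (
        FeatureTest("debt_ratio", 0.4),
        FeatureTest("late_payments", 2),
        FeatureTest("savings", 1000.0, above=False),
    ),
    min_count=2,
)
likely_churn = Syndrome(
    "likely_churn",
    (strain, FeatureTest("months_inactive", 3)),
    min_count=2,
)

client = {"debt_ratio": 0.55, "late_payments": 3, "savings": None, "months_inactive": 4}
print(likely_churn.fires(client))
```

Each element reads as a plain statement ("at least two of these three signs are present"), which is the kind of description a domain expert can check directly.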