Gennady L. Andrienko

GMD - German National Research Center for Information Technology
Schloss Birlinghoven, Sankt-Augustin, D-53754 Germany
http://allanon.gmd.de/and/and.html
E-mail: gennady.andrienko@gmd.de
Tel: +49-2241-142329
Fax: +49-2241-142072


1. My goals for this meeting

My (G. Andrienko) possible participation in the meeting has 3 goals:

I would like to demonstrate (using the WWW) the latest progress in the development of Descartes, our system for intelligent support of visual data analysis, and to discuss possible directions for its further development.


2. Position statement

1. Overview of the Descartes system

Our system Descartes (see the Internet version at http://allanon.gmd.de/and/java/iris/) is intended to help users comprehensively analyze various spatially referenced attribute data. To meet this objective, we are developing the following capabilities in the system:

  1. Automated knowledge-based generation of maps that correctly (i.e. in compliance with the principles of graphical and cartographic design) represent user-selected data variables.
  2. Automated generation of non-cartographic data displays (dot plots, scatter plots, box-and-whiskers plots) supplementing the maps. All displays, cartographic and non-cartographic, produced by the system are linked with each other.
  3. Interactive facilities for manipulating the map displays, which dynamically change their appearance in response to the user's actions. These dynamic techniques are designed to help reveal interesting patterns in data distributions.
  4. Necessary data manipulation functions: database querying, calculations, formation of derived attributes.

We plan to enhance the system with the ability to intelligently assist the user in using the data analysis instruments it offers. This will be based on knowledge about the conceptual structure of the database under analysis and about the underlying problem domain. Descartes already uses this kind of knowledge, since taking it into account is indispensable for adequate map design. Adding the new capability therefore essentially means extending the use of the system's knowledge about data.

2. Related research

Pioneering work in automated knowledge-based visualization design was done by J.Mackinlay. His software system APT can encode data variables by J.Bertin's visual variables according to the variables' types and cardinality, and construct graphical displays combining these visual variables. This approach was adapted by F.Zhan and B.Buttenfield for selecting an appropriate cartographic presentation method for a single spatially referenced data variable. Later V.Jung developed the system Vizard, capable of automatically mapping several independent variables. Vizard accounts not only for data characteristics but also for the user's objectives, though the latter are specified in terms of predefined generic tasks that are either rather primitive or rather abstract: lookup, locate, compare, see distribution.

Descartes takes into account both data characteristics and conceptual relationships among data variables. For example, the system can "understand" that the database fields with the numbers of female population aged 0 to 14 years, female population aged 15 to 64 years, male population aged 0 to 14, and so on essentially refer to one and the same variable, "population number", measured for different age and sex groups, and that these groups are parts of the whole population. This kind of knowledge allows a grounded selection of particular presentation techniques, such as maps with pie charts or segmented bars. The same knowledge can be effectively used to guide the user in data analysis by communicating with her/him about her/his objectives in domain-specific terms rather than on an abstract level. We intend to base the further development of Descartes on this potential.
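As an illustration of this kind of reasoning, here is a minimal Python sketch of how a set of fields might be recognized as parts of one whole. The `FieldMeta` record, the column names and the rule "same notion, distinct groups" are assumptions made for this example, not the Descartes data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMeta:
    """Hypothetical description of one database field in domain terms."""
    column: str     # database column name (illustrative)
    notion: str     # domain notion the values measure
    group: tuple    # population group the values refer to, e.g. ("female", "0-14")


def forms_whole(fields: list[FieldMeta]) -> bool:
    """True if the fields measure one notion for distinct groups.

    Such fields are treated as PART-OF one whole. This sketch does not check
    that the groups really cover the whole; that fact would presumably come
    from the formalized domain description.
    """
    notions = {f.notion for f in fields}
    groups = {f.group for f in fields}
    return len(fields) > 1 and len(notions) == 1 and len(groups) == len(fields)


def proportion_methods(fields: list[FieldMeta]) -> list[str]:
    """Presentation methods for showing parts of a whole (empty if not applicable)."""
    return ["pie charts", "segmented bars"] if forms_whole(fields) else []


FIELDS = [
    FieldMeta("F0_14", "population number", ("female", "0-14")),
    FieldMeta("F15_64", "population number", ("female", "15-64")),
    FieldMeta("M0_14", "population number", ("male", "0-14")),
    FieldMeta("M15_64", "population number", ("male", "15-64")),
]
```

Here `proportion_methods(FIELDS)` returns `["pie charts", "segmented bars"]`; adding a field that measures a different notion makes the list empty.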

The use of direct manipulation techniques for visual data exploration was originally proposed in statistics by W.Cleveland. His best-known idea is the visual linking of several graphical displays by means of brushing. M.Monmonier suggested applying this technique to maps linked with non-cartographic displays. Later the idea of linking different maps and other graphics was implemented by J.Dykes in his CDV system. CDV also offers facilities for interactively changing map symbolism, investigating contiguity relationships, and more. It is worth noting that interactive tools for changing presentation parameters to make maps more expressive were proposed by T.Yamahira et al. long before the notion of dynamic displays emerged. These researchers developed a histogram interface for selecting intervals for a classed choropleth map. Later S.Egbert and T.Slocum considered interactive classification as an exploratory task.
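The core of brushing can be sketched as a shared selection that every linked display observes. This is a hypothetical minimal model of the general idea, not the implementation of any of the systems cited; `Brush` and `Display` are illustrative names, and a display here merely records which objects it would highlight.

```python
from typing import Callable, Iterable


class Brush:
    """Shared selection; every linked display is notified when it changes."""

    def __init__(self) -> None:
        self.selected: set[str] = set()
        self._listeners: list[Callable[[set[str]], None]] = []

    def link(self, on_change: Callable[[set[str]], None]) -> None:
        self._listeners.append(on_change)

    def select(self, object_ids: Iterable[str]) -> None:
        """Brushing in any display calls this; all linked displays update."""
        self.selected = set(object_ids)
        for notify in self._listeners:
            notify(set(self.selected))  # each display gets its own copy


class Display:
    """Stand-in for a map, dot plot or scatter plot; it only records highlights."""

    def __init__(self, name: str, brush: Brush) -> None:
        self.name = name
        self.highlighted: set[str] = set()
        brush.link(self._on_change)

    def _on_change(self, ids: set[str]) -> None:
        self.highlighted = ids
```

For example, after `brush.select(["DE", "FR"])` a map and a scatter plot linked to the same `Brush` both highlight the same two objects.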

A well-known group of dynamic manipulation techniques is devoted to database querying: the user is given convenient graphical widgets to alter query conditions and can immediately observe corresponding changes in graphical presentation of search results ("Dynamic Query" proposed by B.Shneiderman and S.Ahlberg, "Attribute Explorer" and "Influence Explorer" by H.Dawkes et al.).
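The filtering behind such dynamic queries reduces, in its simplest form, to re-evaluating range conditions whenever a slider moves. The sketch below assumes numeric attributes and inclusive ranges; the function name and the sample records are hypothetical and do not reproduce the cited systems.

```python
def dynamic_query(records, conditions):
    """Ids of records whose attributes all lie within the slider ranges.

    records: object id -> {attribute: value}
    conditions: attribute -> (low, high), inclusive bounds set by range sliders
    """
    return {
        rid
        for rid, attrs in records.items()
        if all(low <= attrs[attr] <= high for attr, (low, high) in conditions.items())
    }


# Hypothetical records; re-run dynamic_query after every slider change.
RECORDS = {
    "A": {"population": 10, "gdp": 5},
    "B": {"population": 50, "gdp": 20},
    "C": {"population": 30, "gdp": 8},
}
```

With a population range of (20, 60) the result is `{"B", "C"}`; narrowing the gdp range to (0, 10) leaves only `{"C"}`, which the display would redraw immediately.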

Descartes offers a number of interactive exploratory techniques:

We are particularly interested in designing dynamic manipulation techniques that are inherently connected with map symbolism and that enhance the expressive capabilities of the presentation methods they address.

Regarding the user guidance we intend to implement, the Vizard system mentioned earlier is a relevant reference. This system not only designs maps but also explains why a particular solution is proposed and which opportunities for analysis it offers. However, the explanations of analytical opportunities are merely general descriptions of cartographic visualization methods, with no regard to the user's specific data and goals. Our plan is to guide the user by proposing a number of analysis scenarios that are specifically enabled by the data at hand. Such scenarios are constructed automatically on the basis of the system's knowledge about the data and the underlying problem domain, about the potential capabilities of different presentation methods, and about the available dynamic manipulation techniques and other system functions.

3. User guidance: why and how.

Comprehensive data analysis usually requires many operations on data and their displays. Accordingly, Descartes offers numerous functions and facilities, which the user has to learn and keep in mind. Furthermore, a rather long sequence of operations is often needed to proceed from source data to a useful presentation. For example, it may be necessary to transform absolute values to percentages, calculate differences or ratios, filter database records, etc. We intend to "wrap" such operation sequences into analysis scripts that are presented to the user as analytical tasks formulated in terms of the analyzed data and domain notions. These scripts will, first, ease the user's acquaintance with the system and relieve users of memorizing its capabilities and, second, save time and effort even for experienced users.
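A possible representation of such an analysis script, sketched under the assumption that a dataset is a mapping from object ids to field values: a script pairs the task wording shown to the user with the operations run when the task is chosen. `AnalysisScript`, `add_ratio`, `keep_if` and the field names are hypothetical, not Descartes functions.

```python
from dataclasses import dataclass
from typing import Callable

Table = dict[str, dict[str, float]]       # object id -> field -> value
Operation = Callable[[Table], Table]


def add_ratio(num: str, den: str, out: str) -> Operation:
    """Operation that adds a derived field out = num / den to every record."""
    def op(table: Table) -> Table:
        return {oid: {**row, out: row[num] / row[den]} for oid, row in table.items()}
    return op


def keep_if(column: str, predicate: Callable[[float], bool]) -> Operation:
    """Operation that keeps only records whose `column` satisfies `predicate`."""
    def op(table: Table) -> Table:
        return {oid: row for oid, row in table.items() if predicate(row[column])}
    return op


@dataclass
class AnalysisScript:
    task: str                  # wording shown to the user, in domain terms
    steps: list[Operation]     # operations run in order when the task is chosen

    def run(self, table: Table) -> Table:
        for step in self.steps:
            table = step(table)
        return table


SCRIPT = AnalysisScript(
    task="Where does urban population exceed rural population?",
    steps=[add_ratio("urban", "rural", "urban_to_rural"),
           keep_if("urban_to_rural", lambda v: v > 1)],
)
```

Running `SCRIPT` on `{"X": {"urban": 30.0, "rural": 10.0}, "Y": {"urban": 5.0, "rural": 10.0}}` keeps only `X`, now carrying the derived field `urban_to_rural = 3.0`. Keeping operations as plain functions makes it easy to assemble new scripts from the same building blocks.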

The following example illustrates our idea. Suppose that the dataset under analysis contains the fields cited earlier, with absolute population numbers in sex-age groups for different countries. The system can foresee several analytical tasks that can be performed with these data: "study how sex structure varies depending on age", "study how age structure varies depending on sex", "study sex (or age) structure across countries irrespective of age (or sex)", "examine a particular age group", etc. These or similar formulations are offered to the user as alternatives to choose from. Behind each task stands a sequence of operations resulting in a potentially useful presentation (or several presentations) and, possibly, some recommendations on how to use them and how to proceed further.

Suppose that the user has selected the first "task", the study of how sex structure depends on age. In response the system automatically calculates the percentages of males and females in each age group and creates a map with segmented bars: bars correspond to age groups, and segments show the proportions of males and females. Note that automating the calculation of percentages and the selection of this type of presentation really requires knowledge of the conceptual relationships among fields.
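The percentage calculation for one country might look as follows. The input structure and the numbers are made up for illustration, and the sketch assumes that every age group has a non-zero total.

```python
def sex_shares_by_age(counts: dict[tuple[str, str], float]) -> dict[str, dict[str, float]]:
    """Percentages of each sex within each age group, for one country.

    counts maps (sex, age_group) to an absolute population number. The result
    maps age_group -> {sex: percentage}: one segmented bar per age group, whose
    segments add up to 100. Assumes every age group has a non-zero total.
    """
    totals: dict[str, float] = {}
    for (_sex, age), n in counts.items():
        totals[age] = totals.get(age, 0) + n
    shares: dict[str, dict[str, float]] = {}
    for (sex, age), n in counts.items():
        shares.setdefault(age, {})[sex] = 100.0 * n / totals[age]
    return shares


# Made-up absolute numbers for one hypothetical country.
COUNTS = {
    ("female", "0-14"): 480, ("male", "0-14"): 520,
    ("female", "15-64"): 1500, ("male", "15-64"): 1500,
    ("female", "65+"): 300, ("male", "65+"): 200,
}
```

For these numbers, `sex_shares_by_age(COUNTS)["65+"]` is `{"female": 60.0, "male": 40.0}`.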

When displaying the map, the system supplies a brief comment explaining that this map is suitable for seeing local differences in sex structure depending on age in each country, or for pairwise comparison of countries, but that it does not help in seeking spatial patterns and trends. The system therefore suggests, as a direction for further investigation, taking the male or the female percentages separately and considering their spatial distributions for different ages. Alternatively, the user may be offered to concentrate on the differences between the percentages of male and female population depending on age. For the first task a series of choropleth maps would be suitable. In the second case the system would automatically calculate the differences and represent them by a bar chart map. At the next step the system may suggest that the user study the spatial distributions of the differences for the individual age groups.
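For the second direction, the differences can be computed from the percentages and regrouped per age group, giving one derived attribute per age group whose spatial distribution can then be mapped. The input layout and values are hypothetical, and the difference is taken as male minus female by assumption.

```python
def male_female_difference(shares_by_country):
    """Regroup percentage differences by age group.

    shares_by_country: country -> age_group -> {"male": %, "female": %}
    Returns age_group -> {country: male% - female%}, i.e. one derived
    attribute per age group.
    """
    result = {}
    for country, by_age in shares_by_country.items():
        for age, share in by_age.items():
            result.setdefault(age, {})[country] = share["male"] - share["female"]
    return result


# Hypothetical percentages for two countries.
SHARES = {
    "A": {"0-14": {"male": 52.0, "female": 48.0}, "65+": {"male": 40.0, "female": 60.0}},
    "B": {"0-14": {"male": 49.0, "female": 51.0}, "65+": {"male": 35.0, "female": 65.0}},
}
```

Here the result for `"0-14"` is `{"A": 4.0, "B": -2.0}`: positive values mark a male majority in that age group.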

User guidance also applies to the use of dynamic manipulation facilities for data analysis. Again, the system can help the user not only with a general description of a tool ("static" on-line help) but also with recommendations specific to the data and the analysis context. For instance, if a ratio of two numeric fields has been calculated and presented in the course of analysis, the system can propose visual comparison with the value 1; for a difference of two fields, visual comparison with 0 is reasonable. In both cases the map changes so that the geographical objects are visually classified into 3 groups: 1) field1<field2; 2) field1=field2; 3) field1>field2. The system can also automatically detect cases where dynamic outlier removal is necessary and propose that the user do so.
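Visual comparison with a reference value amounts to a three-way split of the objects. This sketch assumes that ratio values come from positive denominators (otherwise "ratio < 1" does not mean field1 < field2) and adds an optional tolerance `tol` for treating values as equal; the tolerance and the function names are assumptions, not part of the described system.

```python
def compare_with(values, reference, tol=0.0):
    """Split objects into three groups relative to a reference value.

    values: object -> derived value (e.g. field1/field2 or field1 - field2).
    Values within `tol` of the reference count as equal (tol is an assumption
    added here to cope with floating-point noise).
    """
    groups = {"below": set(), "equal": set(), "above": set()}
    for obj, v in values.items():
        if abs(v - reference) <= tol:
            groups["equal"].add(obj)      # field1 = field2
        elif v < reference:
            groups["below"].add(obj)      # field1 < field2
        else:
            groups["above"].add(obj)      # field1 > field2
    return groups


def default_reference(derived_by):
    """Heuristic from the text: compare ratios with 1 and differences with 0."""
    return {"ratio": 1.0, "difference": 0.0}[derived_by]
```

For ratios `{"A": 0.8, "B": 1.0, "C": 1.25}` compared with `default_reference("ratio")`, A falls below, B is equal and C is above; the map would then colour the three groups differently.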

It should be noted that the guidance is optional: the user does not have to analyze data according to the proposed scenarios. S/he can always apply any of the available functions in any order. This is important, as we cannot guarantee that all conceivable analysis tasks can be foreseen. Yet, since the guidance is offered stepwise, the scripts may prove useful for partially automating rather sophisticated investigations.

In guiding the user the system utilizes the following kinds of knowledge:

A) Generic analysis tasks, such as "Local comparisons of values of attributes", "Looking at spatial distribution of values of an attribute", "Local consideration of proportions", etc. The tasks may have applicability conditions. For example, the last of these tasks is meaningful only for a set of data fields that together constitute a whole. Unlike the generic tasks in the Vizard system, our tasks are patterns rather than simply abstract statements: the patterns have slots that are filled with appropriate domain notions when the system proposes analysis scenarios to the user.

B) Knowledge about the methods of cartographic and graphical presentation available in the system, i.e. which generic analysis tasks each method enables. For example: "Parallel bars" → "Local comparisons of values of attributes"; "Choropleth map" → "Study spatial distribution of values of an attribute"; "Scatter plot" → "Look for relationships between two attributes". Some presentation methods offer different opportunities depending on the data they are applied to. For example: "Pie charts"/absolute quantities → "Local consideration of proportions", "Comparison of totals"; "Pie charts"/percentages → "Local consideration of proportions", "Comparison of proportions for pairs of geographical objects".

C) Knowledge about potentially useful operations on data: the generic tasks they can be applied to and how each operation is performed with the available functions. An example of such an operation is converting absolute values to percentages. This operation is helpful, in particular, in studying proportions (other applications are also possible). It is performed with the calculation function of the system.

D) Knowledge about the dynamic manipulation facilities available in the system: their possible uses depending on the analysis context. This includes the heuristics mentioned earlier about visual comparison with 1 for calculated ratios and with 0 for calculated differences. Another example concerns the use of the dynamic classification tool to investigate relationships between one attribute, selected as the basis of classification, and other attributes for which class statistics are calculated and displayed. A reasonable strategy is to increase the number of classes and move the class boundaries to probe the robustness of the observed relationship, if any.

E) Knowledge about the data and the underlying problem domain. Besides supporting the selection of proper visualization methods, this knowledge makes it possible to formulate analysis tasks in a way the user easily understands. Thus, the generic task "Local consideration of proportions" may be formulated as "Consider proportions of age groups 0-14 years, 15-64 years, 65 and more years in population of each country of Europe" or as "Consider proportions of classes of industry X, Y, ..., Z in overall industrial product of main cities of Germany", depending on the application domain. The knowledge about data is also used in automatically applying system functions such as calculations, querying and classification according to the pursued analysis scenario.
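Knowledge kinds A, B and E could be encoded along the following lines: a generic task as a template with slots plus an applicability condition, a table of which presentation methods enable which tasks, and a data description whose domain notions fill the slots. All class, function and dictionary names are hypothetical; the task and method names are copied from the items above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GenericTask:
    """Kind A: a generic task as a pattern with slots and an applicability condition."""
    name: str
    template: str                          # wording with {slots} for domain notions
    applicable: Callable[[dict], bool]     # condition on the data description

    def formulate(self, data: dict) -> Optional[str]:
        """Fill the slots with domain notions (kind E), if the task applies."""
        if not self.applicable(data):
            return None
        return self.template.format(**data)


# Kind B: (presentation method, kind of data) -> generic tasks it enables.
# None as the kind of data means "any data".
ENABLES = {
    ("Parallel bars", None): {"Local comparisons of values of attributes"},
    ("Choropleth map", None): {"Study spatial distribution of values of an attribute"},
    ("Scatter plot", None): {"Look for relationships between two attributes"},
    ("Pie charts", "absolute quantities"): {
        "Local consideration of proportions", "Comparison of totals"},
    ("Pie charts", "percentages"): {
        "Local consideration of proportions",
        "Comparison of proportions for pairs of geographical objects"},
}


def methods_for(task: str, data_kind: str) -> list[str]:
    """Presentation methods that enable `task` for data of the given kind."""
    return sorted({
        method
        for (method, kind), tasks in ENABLES.items()
        if task in tasks and kind in (None, data_kind)
    })


PROPORTIONS = GenericTask(
    name="Local consideration of proportions",
    template="Consider proportions of {parts} in {whole} of {objects}",
    applicable=lambda d: d.get("forms_whole", False),
)

# Hypothetical data description derived from the domain knowledge (kind E).
AGE_GROUPS = {
    "parts": "age groups 0-14 years, 15-64 years, 65 and more years",
    "whole": "population",
    "objects": "each country of Europe",
    "forms_whole": True,
}
```

`PROPORTIONS.formulate(AGE_GROUPS)` yields the first formulation quoted in item E, and `methods_for(PROPORTIONS.name, "percentages")` returns `["Pie charts"]`. Keeping the wording in templates separates the domain-independent task from its domain-specific phrasing.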

The use of these kinds of knowledge for generating guidance proposals at different steps of the user's work may be governed by rules with the following structure:
  IF [applicability conditions] THEN [recommendation],
where
[applicability conditions] may include one or more of the following:
    a) required data characteristics and relationships;
    b) characteristics of currently considered presentation;
    c) currently pursued generic task;
[recommendation] may be either one or more generic tasks to proceed to or a hint concerning the use of dynamic map manipulation facilities.
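A minimal Python sketch of this rule structure, assuming that data characteristics are given as a set of tags and that an unset condition matches anything. The tags, the `Context` and `Rule` classes and the three sample rules are hypothetical, loosely based on the examples in this section.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Context:
    data_traits: set[str]         # a) characteristics of the data, as tags
    presentation: Optional[str]   # b) currently considered presentation
    task: Optional[str]           # c) currently pursued generic task


@dataclass
class Rule:
    recommendation: str
    needs_traits: set[str] = field(default_factory=set)  # condition a)
    presentation: Optional[str] = None                    # condition b), None = any
    task: Optional[str] = None                            # condition c), None = any

    def applies(self, ctx: Context) -> bool:
        """IF part: every condition that is set must hold in the context."""
        return (
            self.needs_traits <= ctx.data_traits
            and self.presentation in (None, ctx.presentation)
            and self.task in (None, ctx.task)
        )


def recommend(rules: list[Rule], ctx: Context) -> list[str]:
    """THEN parts of all rules whose IF part holds."""
    return [r.recommendation for r in rules if r.applies(ctx)]


RULES = [
    Rule("Hint: apply visual comparison with 1", needs_traits={"ratio"}),
    Rule("Hint: apply visual comparison with 0", needs_traits={"difference"}),
    Rule(
        "Proceed to: Study spatial distribution of values of an attribute",
        needs_traits={"parts of whole"},
        presentation="Segmented bars",
        task="Local consideration of proportions",
    ),
]
```

With a context of a calculated ratio shown on a choropleth map, `recommend(RULES, ctx)` returns only the hint about comparison with 1; a forward-chaining engine would simply re-run `recommend` after each user step.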

4. Conclusions

Presenting data on maps and then investigating them visually plays a very important role in the analysis of spatially referenced data. We offer an environment that supports such analysis by automating map generation. Furthermore, the generated maps are not mere static pictures: they can be manipulated and can change dynamically, which can make interesting features of data distributions more prominent.

In map design the system relies upon conceptual knowledge about the data under analysis. Such knowledge need not be very extensive, but for each application of Descartes a formalized description of the application domain (relevant notions and the IS-A and PART-OF relationships among them) and of the database structure (correspondence of database fields to domain notions) must be provided. The use of domain knowledge can be substantially extended. We have shown that on the basis of this knowledge the system can offer intelligent guidance to the user in the course of data analysis.

The dynamic map manipulation facilities available in the system are rather innovative, so even people experienced in using maps (or GIS) for data analysis may not try to use them actively. We therefore consider it necessary to also give the user apt hints about employing the dynamic facilities in analysis.

Although interesting findings cannot be guaranteed for every dataset, we believe that further development of the intelligent capabilities of the system will make it more helpful as an environment for visual data exploration.


3. Short biography and selected publications

Dr. Gennady Andrienko received a Diploma in Computer Science from the Faculty of Cybernetics of Kiev State University in 1986, and a Ph.D. equivalent in Computer Science from Moscow State University in 1992. He worked on knowledge-based systems at the Mathematics Institute of the Moldavian Academy of Sciences (Kishinev, Moldova) and then at the Institute of Mathematical Problems in Biology of the Russian Academy of Sciences (Pushchino Research Center, Russia). He also worked as an assistant professor at Pushchino State University, teaching a course on GIS. In 1995 and 1996 he visited GMD as a guest researcher. Since July 1997 he has held a research position at GMD. He will act as technical manager of the CommonGIS project (Common Access to Geographically Referenced Data), accepted for 30 months of funding in the Esprit Programme "Information Access and Interfaces". His research interests and experience include automated knowledge-based cartographic visualization, visual geo-data exploration, and knowledge-based systems.

Selected publications:

  1. Knowledge-Based Support for Visual Exploration of Spatial Data. Extended Abstracts of Int. Conf. CHI’97 (Atlanta GA), ACM Press, pp.16-17
  2. Intelligent Cartographic Visualization for Supporting Data Exploration in the IRIS System. Programming and Computer Software, 1997. v.23(5), pp.268-282
  3. IRIS: a Tool to Support Data Analysis with Maps. Proceedings of Interop’97: International Conference on Interoperating Geographical Information Systems (Santa Barbara, CA, December 3-4, 1997), NCGIA, pp.215-226 (to be published in M.Goodchild, M.Egenhofer, R.Fegeas, and C.Kottman (eds.) Interoperating Geographic Information Systems, Kluwer, 1998)
  4. Intelligent Visualization and Dynamic Manipulation: Two Complementary Instruments to Support Data Exploration with GIS. Proceedings of AVI'98: Advanced Visual Interfaces Int. Working Conference (L'Aquila – Italy, May 24-27, 1998), ACM Press, pp.66-75
  5. Interactive Maps for Visual Data Exploration. Paper presented at ICA Visualization Commission Meeting, May 21-24, Warsaw, Poland. URL http://allanon.gmd.de/and/icavis/ (to be submitted to Int.J.GIS)
  6. Dynamic Categorization for Visual Study of Spatial Information. Programming and Computer Software, 1998, v.24(3), pp.108-115
  7. AFORIZM approach: creating situations to facilitate expertise transfer. In L.Steels, G.Schreiber, W.Van de Velde (Eds.) EKAW’94: A Future for Knowledge Acquisition. Lecture Notes in Artificial Intelligence. Springer-Verlag, 1994. v.867, pp.244-261
  8. Information retrieval and presentation in multimedia systems: Knowledge-based approach. Programming and Computer Software, 1996. v.22(1), pp.45-52
  9. Knowledge base framework for retrieval of multimedia ecological information. Proc. of the Int. Conf. INTERNET Applications and Electronic Information Resources in Forestry and Environmental Sciences. European Forest Institute. Joensuu, Finland. 1-5 August 1995, EFI Proceedings No. 10, 1996, pp.109-119
Papers 1-3 describe the overall architecture of the Descartes (formerly IRIS) system and focus on knowledge-based visualization design. Papers 4-5 focus on interactive manipulation of maps (mostly visual comparison and classification techniques). Paper 6 describes our approach to interactive dynamic classification in detail. Papers 7-9 reflect our previous work in knowledge engineering and knowledge-based information retrieval, which has substantially influenced our current research agenda.


This page is currently maintained by Varenius Workshop Webmaster
Last Updated: Sep. 30, 1998