Involved scientific members: Brauer, Hernández, Eldracher, Kinder, Brychcy

Partners: Siemens AG, München; DLR, Oberpfaffenhofen; Universität Dortmund

Project number: FKZ ITN9102B

Duration: 07/91 - 06/95

The objective of NERES was to create intelligent systems that
explore their natural environment and act autonomously within it
using multimodal sensory input.
The intention was to combine artificial
neural networks with methods of symbolic AI.

Various neural approaches (such as causality detection, adaptive
subgoal generation, system identification with combined networks
and trajectory generation with reinforcement learning)
could successfully be applied to simple environments.
It turned out that these approaches could hardly be scaled to
more complex environments.
However, the experience gained in these studies has been used to
develop new, more powerful methods that
combine neural (subsymbolic) elements with symbolic AI techniques.
In the area of symbolic AI, approaches for the qualitative
representation of multidimensional spatial knowledge were developed
that preserve essential properties of the spatial domain. Cognitive
aspects were also included in order to address the problem of translating
user-supplied knowledge into the representation used by the
machine and vice versa.

Keywords: NERES, industrial robots, neural networks, artificial intelligence, qualitative reasoning, spatial reasoning, control, path planning.

Involved scientific members: Brauer, Weiß

Project number: Br609/5-1 and extension Br609/5-2

Duration: since 10/93

This project is concerned with the specification, implementation and analysis of learning algorithms for computational distributed agent systems. The central question is how multiple agents can learn to coordinate their activities so that they become able to collectively solve tasks that are too complex to be solved by a single agent. Issues of main interest are the following:

- multi-agent learning and organizational structures;
- multi-agent learning and problem solving;
- multi-agent learning and planning; and
- application of multi-agent learning algorithms in complex domains like load balancing in computer networks and automated manufacturing.

Involved scientific members: Schmidhuber, Hochreiter, Brauer

Project number: Schm 942/3-1

Duration: since 08/95

It is intended to apply formal definitions of the terms
``simplicity'', ``complexity'', ``low redundancy'' and
``information'' to artificial neural networks in order to improve
existing learning algorithms (to speed them up and to obtain
better generalization performance).

Main topics are:

- adaptive simplification of the net inputs (redundancy reduction of the input data to avoid overfitting, i.e., to improve generalization performance).
- adaptive simplification of the net itself (finding nets of lower complexity that approximate the training data equally well).
- speeding up reinforcement learning algorithms by maximizing information gain (using classical information theory to develop experimental strategies for an ``active learning'' system in a Markovian environment).
- analysis of the derived algorithms (computation time, storage requirements), tests on large data sets, comparisons with alternative methods.
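The information-gain idea in the third topic can be illustrated with a small sketch. It assumes a finite set of candidate environment models whose outcome probabilities for each action are known, and scores an action by the mutual information between the unknown model and the next observation. All names are hypothetical, and this is not the project's algorithm.

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution given as a list."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def expected_information_gain(prior, likelihoods):
    """Expected reduction in uncertainty about the model after one observation.

    prior[m]          : current belief that candidate model m is the true one
    likelihoods[m][o] : P(observation o | model m) for the action being scored
    """
    gain = entropy(prior)
    for o in range(len(likelihoods[0])):
        p_o = sum(prior[m] * likelihoods[m][o] for m in range(len(prior)))
        if p_o == 0:
            continue  # this outcome cannot occur under the current belief
        posterior = [prior[m] * likelihoods[m][o] / p_o for m in range(len(prior))]
        gain -= p_o * entropy(posterior)
    return gain

def most_informative_action(prior, likelihoods_per_action):
    """Choose the experiment whose outcome is expected to reveal the most."""
    return max(likelihoods_per_action,
               key=lambda a: expected_information_gain(prior, likelihoods_per_action[a]))

# Two equally likely models: "left" behaves the same under both, "right" tells them apart.
prior = [0.5, 0.5]
actions = {"left": [[0.5, 0.5], [0.5, 0.5]], "right": [[1.0, 0.0], [0.0, 1.0]]}
print(most_informative_action(prior, actions))  # right
```

In a Markovian environment the same score would be computed per state, with the likelihoods taken from each candidate transition model.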

The cooperation agreements aim at the
interdisciplinary investigation of theoretical foundations as
well as various applications of artificial neural nets,
fuzzy logic, fuzzy control and genetic algorithms.

(Siemens ZFE, München)

Gerhard Weiß: 04/89 - 09/91

Sepp Hochreiter: 10/94 - 07/95

Dieter Butz: 10/94 - 06/96

Dirk Ormoneit: since 01/95

This section describes the specific projects and fields of research the group has dealt with within the last five years, as well as the current research. Of course, not all areas are investigated with the same intensity.

Involved scientific members: Hernández, Freksa

We are exploring the qualitative approach to the representation of spatial knowledge. Our main topics are:

- reason maintenance: determining, through an analysis of typical inferences and an actual implementation, whether it is worth maintaining justifications or cheaper to recompute the constraints;
- qualitative shape descriptions: 3-D shapes originate from the spanning surface between a bottom and a top primitive 2-D shape; complex shapes are described by stating the relative position and size of their component shapes, the object consistency, and the contact information; the similarity of shapes is determined with a matching algorithm;
- dynamic aspects: qualitative description of motion, building cognitive maps from dynamic exploration of the environment, and how qualitative spatial knowledge may improve path planning; and
- qualitative representation of positional information: extending previous work to handle distances, which together with orientation provide the means for reasoning about truly qualitative positional information.
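A toy version of qualitative positional information (orientation plus distance) can be sketched as follows. The four 90-degree orientation sectors and the three distance classes with thresholds 1 and 5 are assumptions made for illustration; they are not the representation developed in the project.

```python
import math

ORIENTATIONS = ("front", "left", "back", "right")  # assumed four-sector scheme

def qualitative_position(ref_xy, heading_deg, obj_xy, bounds=(1.0, 5.0)):
    """Map a metric position to (orientation, distance) relations w.r.t. a
    reference object at ref_xy facing heading_deg (degrees, counterclockwise)."""
    dx, dy = obj_xy[0] - ref_xy[0], obj_xy[1] - ref_xy[1]
    # bearing of the object relative to the reference heading, in [0, 360)
    rel = (math.degrees(math.atan2(dy, dx)) - heading_deg) % 360.0
    # 90-degree sectors centred on front (0), left (90), back (180) and right (270)
    orientation = ORIENTATIONS[int(((rel + 45.0) % 360.0) // 90.0)]
    dist = math.hypot(dx, dy)
    if dist <= bounds[0]:
        distance = "close"
    elif dist <= bounds[1]:
        distance = "medium"
    else:
        distance = "far"
    return orientation, distance

print(qualitative_position((0.0, 0.0), 0.0, (0.0, 0.5)))  # ('left', 'close')
```

Reasoning then operates on the symbolic pairs only, e.g. composing "A is left of B" with "B is front of C" without recovering coordinates.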

Involved scientific members: Hernández, Freksa

In the field of temporal reasoning we deal in particular with spatio-temporal models of iconic memory and with context-sensitive temporal reasoning. We also developed an interactive, knowledge-based system for examining temporal reasoning processes. For this system we also worked on the representation of theories of uncertainty in order to provide decision support in knowledge-based systems. Furthermore, we worked on serial and parallel aspects of visual perception. Further topics were the control of attention in cognitive real-time processes and strategies for processing and learning knowledge in perception and classification tasks. We also investigated how to build cognitive models of concepts such as colours.

Involved scientific members: Eldracher, Kinder

In principle, we see four different ways to combine symbolic and sub-symbolic techniques: 1) symbolic (e.g. rule-based) approaches choose between alternative sub-symbolic approaches, choose parameters for them, or supervise, e.g., the learning procedure; 2) sub-symbolic methods pre-process the input data (e.g. by classification) and thus select symbolic techniques for further processing; 3) sub-symbolic techniques provide the unknown qualitative structure of a complex domain; 4) a known qualitative structure of a domain is used to pre-structure sub-symbolic neural networks. We use such combinations in our application domain, robotics. For example, our path planning techniques combine adaptive sub-symbolic pre-processing with the construction of the underlying symbolic, graph-based planners. In another example, after automatically decomposing a robot's configuration space into cells, we label these cells with qualitative attributes, e.g. that a cell is left of a certain obstacle. This information can later be used to plan trajectories that satisfy certain qualitative constraints, e.g. staying left of an obstacle.
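The cell-labelling example can be sketched on a small grid. The labelling rule (a free cell is "left of the obstacle" if its column index is smaller than the obstacle's leftmost column), the grid and all names are hypothetical; a real configuration-space decomposition would supply the cells.

```python
from collections import deque

def label_cells(free_cells, obstacle_cells):
    """Attach a qualitative attribute to every free cell (hypothetical rule)."""
    leftmost = min(col for _, col in obstacle_cells)
    return {cell: ("left_of_obstacle" if cell[1] < leftmost else "not_left")
            for cell in free_cells}

def plan(start, goal, labels, required=None):
    """Breadth-first search over 4-connected cells, optionally restricted
    to cells carrying the qualitative label `required`."""
    allowed = {c for c, lab in labels.items() if required is None or lab == required}
    if start not in allowed or goal not in allowed:
        return None
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nxt in allowed and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None

# 5 x 5 grid (row, column) with an obstacle occupying column 2, rows 1 to 3.
obstacle = {(r, 2) for r in range(1, 4)}
free = {(r, c) for r in range(5) for c in range(5)} - obstacle
labels = label_cells(free, obstacle)
path = plan((0, 1), (4, 1), labels, required="left_of_obstacle")
print(path)
```

The symbolic constraint only shrinks the set of admissible cells; the search itself is unchanged, so a goal that violates the constraint simply yields no plan.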

Involved scientific members: Eldracher, Kinder

Within robotics we examine the planning of kinematic trajectories for higher-dimensional, real-world manipulators. For this research we developed a flexible and easy-to-use manipulator simulation tool. In parallel, we developed several adaptive approaches to kinematic trajectory generation that are based on neural network and graph techniques.

For planning single complex trajectories we developed different techniques using reinforcement learning, Q-learning, adaptive stochastic learning automata and CMAC storage. However, it turned out that these techniques are too slow for real-world applications. Better results are achieved by combining simpler trajectory segments instead of planning complex trajectories directly. Therefore, our two major fields of investigation are (1) the automatic generation of subgoals using neural networks and (2) the generalizing storage of already planned trajectories in a neural memory.
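For illustration only, the following sketch shows tabular Q-learning, one of the techniques named above, on a toy one-dimensional corridor that stands in for a manipulator's configuration space. The corridor, the unit step cost and all parameter values are assumptions.

```python
import random

ACTIONS = (-1, +1)  # move one cell down or up the corridor

def step(s, a, n_states):
    """Deterministic corridor dynamics; positions are clipped to the corridor."""
    return min(max(s + a, 0), n_states - 1)

def q_learning(n_states=6, episodes=300, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning; every episode starts in state 0, the goal is n_states - 1."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(n_states) for a in ACTIONS}
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])  # greedy, random ties
            s_next = step(s, a, n_states)
            reward = -1.0  # unit cost per step favours short trajectories
            future = 0.0 if s_next == goal else max(q[(s_next, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (reward + gamma * future - q[(s, a)])
            s = s_next
    return q

def greedy_path(q, n_states=6):
    """Follow the learned greedy policy from state 0 (with a step limit)."""
    s, path = 0, [0]
    while s != n_states - 1 and len(path) <= 2 * n_states:
        s = step(s, max(ACTIONS, key=lambda b: q[(s, b)]), n_states)
        path.append(s)
    return path

print(greedy_path(q_learning()))
```

Even in this tiny example every state-action pair must be visited repeatedly; in a ten-dimensional configuration space the table, and the training time, grow accordingly, which is consistent with the observation that such direct approaches are too slow.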

In the first field we developed several algorithms that combine simple sub-trajectories based on neural world models in order to generate a trajectory for arbitrary start and goal configurations. These algorithms also include techniques for combining the subgoals into a subgoal graph that can be used to plan arbitrary trajectories.
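Once a subgoal graph exists, planning reduces to a standard shortest-path search. The sketch assumes that a world model has already decided which pairs of subgoals can be joined by a simple, collision-free sub-trajectory and at what cost; the graph below is invented for illustration.

```python
import heapq

def shortest_subgoal_path(graph, start, goal):
    """Dijkstra search on a directed subgoal graph.

    graph[u] maps a subgoal u to {v: cost} for every subgoal v that can be
    reached from u by one simple sub-trajectory (assumed given by a world model).
    Returns (total cost, list of subgoals) or (inf, None) if goal is unreachable.
    """
    dist = {start: 0.0}
    prev = {start: None}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return d, path[::-1]
        if d > dist[u]:
            continue  # outdated queue entry
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return float("inf"), None

# Hypothetical subgoal graph; costs could be, e.g., joint-space path lengths.
GRAPH = {
    "start": {"A": 1.0, "B": 4.0},
    "A": {"B": 1.0, "goal": 5.0},
    "B": {"goal": 1.0},
}
print(shortest_subgoal_path(GRAPH, "start", "goal"))  # (3.0, ['start', 'A', 'B', 'goal'])
```

Arbitrary start and goal configurations would first be connected to nearby subgoals, after which the same search applies.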

In the second field we construct a model of the world by using and combining the knowledge from already known example trajectories. This model can then be used to plan arbitrary trajectories.
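A deliberately simple, non-neural sketch of reusing example trajectories: configurations visited by stored trajectories become nodes, consecutive configurations become edges, and trajectories that share a configuration are thereby joined. The discretized joint-angle tuples and the assumption that every stored segment is reversible are illustrative choices, not the group's neural model.

```python
from collections import defaultdict, deque

def build_roadmap(trajectories):
    """Merge example trajectories into one graph: consecutive configurations are
    linked, and configurations shared by several trajectories join them."""
    graph = defaultdict(set)
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            graph[a].add(b)
            graph[b].add(a)  # assume every stored segment can be driven both ways
    return graph

def plan_on_roadmap(graph, start, goal):
    """Breadth-first search for a configuration sequence using only stored segments."""
    parent, queue = {start: None}, deque([start])
    while queue:
        q = queue.popleft()
        if q == goal:
            path = []
            while q is not None:
                path.append(q)
                q = parent[q]
            return path[::-1]
        for nxt in sorted(graph[q]):
            if nxt not in parent:
                parent[nxt] = q
                queue.append(nxt)
    return None

# Hypothetical stored trajectories in a discretized 2-joint configuration space.
TRAJECTORIES = [
    [(0, 0), (1, 0), (2, 0)],
    [(2, 0), (2, 1), (2, 2)],
    [(0, 0), (0, 1)],
]
route = plan_on_roadmap(build_roadmap(TRAJECTORIES), (0, 1), (2, 2))
print(route)  # [(0, 1), (0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
```

The resulting route was never demonstrated as a whole; it is assembled from pieces of three different example trajectories.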

In both fields, exploration techniques can be incorporated in order to improve the results in difficult regions. We are now able to plan collision-free trajectories for arbitrary start-goal combinations within a few tenths of a second, with less than ten minutes of preprocessing for each new environment. With these techniques we can also plan trajectories for ten-dimensional manipulators. Moreover, the techniques are designed to adapt to slowly changing environments without requiring a complete recomputation of the world models (i.e. the underlying graph structures). First results in dynamic environments are also very promising.

Overall, the two new approaches perform about equally well and clearly outperform most well-known trajectory planning approaches.

Furthermore, we examine the learning of inverse kinematics for environments with obstacles using improved growing cell structures and growing neural gas models. Our current interest in this field includes new models for generalizing, sequential trajectory storage.

Involved scientific members: Eldracher, Butz

In function approximation we study neural network approaches and compare them with multivariate spline methods.

One salient property of the networks we use is localized encoding, familiar, e.g., from RBF encoding (encoding with radial basis functions). In particular, we investigate various modifications of combined one-dimensional encodings, such as distributed, continuous-valued, adaptive and growing input encodings. When, for example, these encodings are applied as a preprocessing step to the well-known local function approximator CMAC, they improve both the robustness of the system (lower sensitivity to parameter changes) and the final results, while consuming less training time and memory.
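For reference, a plain one-dimensional CMAC (tile coding with several offset tilings and LMS training) might look like the sketch below. It is the baseline approximator, not the modified distributed or adaptive input encodings described above, and the tile and tiling counts and the learning rate are assumed values.

```python
import math
import random

class CMAC1D:
    """Minimal one-dimensional CMAC (tile coding) trained with the LMS rule.

    n_tilings overlapping tilings, each shifted by a fraction of a tile width,
    map an input x in [lo, hi] to exactly one active cell per tiling; the
    output is the sum of the weights of the active cells.
    """

    def __init__(self, lo, hi, n_tiles=10, n_tilings=8, lr=0.1):
        self.lo, self.n_tilings, self.lr = lo, n_tilings, lr
        self.width = (hi - lo) / n_tiles
        # one spare cell per tiling absorbs the shift at the upper end
        self.w = [[0.0] * (n_tiles + 1) for _ in range(n_tilings)]

    def _active(self, x):
        for t in range(self.n_tilings):
            offset = t * self.width / self.n_tilings
            yield t, int((x - self.lo + offset) // self.width)

    def predict(self, x):
        return sum(self.w[t][i] for t, i in self._active(x))

    def update(self, x, target):
        err = target - self.predict(x)
        # share the correction equally among the active cells
        for t, i in self._active(x):
            self.w[t][i] += self.lr * err / self.n_tilings

rng = random.Random(0)
net = CMAC1D(0.0, math.pi)
for _ in range(5000):
    x = rng.uniform(0.0, math.pi)
    net.update(x, math.sin(x))
```

The binary, all-or-nothing cell activation here is exactly what distributed, continuous-valued input encodings would replace.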

Furthermore, we investigate function approximation with extended radial basis function networks. These also give good results, with drastically reduced memory consumption compared to standard radial basis function networks with fixed variance.
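A small sketch of an RBF network whose units each have their own width, in contrast to a single fixed variance, is given below. The specific extension used in the project is not described here, so the per-unit widths (set to the distance to the nearest other centre), the centre positions and the least-squares fit of the output weights are all assumptions.

```python
import math

def nearest_neighbour_widths(centres):
    """Give every unit its own width: the distance to the closest other centre."""
    return [min(abs(c - d) for d in centres if d != c) for c in centres]

def features(x, centres, widths):
    return [math.exp(-((x - c) ** 2) / (2.0 * s ** 2)) for c, s in zip(centres, widths)]

def solve(a, b):
    """Solve the small dense system a w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][k] * w[k] for k in range(r + 1, n))) / m[r][r]
    return w

def fit_output_weights(xs, ys, centres, widths, ridge=1e-6):
    """Least-squares output weights via slightly regularised normal equations."""
    phi = [features(x, centres, widths) for x in xs]
    n = len(centres)
    a = [[sum(p[i] * p[j] for p in phi) + (ridge if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    b = [sum(p[i] * y for p, y in zip(phi, ys)) for i in range(n)]
    return solve(a, b)

def predict(x, centres, widths, weights):
    return sum(w * f for w, f in zip(weights, features(x, centres, widths)))

# Irregularly placed centres (assumed) get individual widths.
RBF_CENTRES = [0.0, 0.1, 0.25, 0.45, 0.7, 1.0]
RBF_WIDTHS = nearest_neighbour_widths(RBF_CENTRES)
xs = [i / 40 for i in range(41)]
weights = fit_output_weights(xs, [x * x for x in xs], RBF_CENTRES, RBF_WIDTHS)
```

Individual widths let a few units cover smooth regions broadly while others resolve detail, which is one way a net can get by with fewer units.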

Another focus lies in the development and exploration of so-called ``distributed input and output encoding'' and distributed memory. Here the RBF encoding is regarded as a distributed number representation scheme; in addition, the output of the net is also construed as a distributed number representation. As an immediate consequence, the number of degrees of freedom of the approximation network increases drastically. Using these extended nets as ``rapid function approximators'' together with an intelligent, data-based initialization procedure gives encouraging results.
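The idea of a distributed number representation can be sketched as a population code: a scalar is encoded by the activations of Gaussian units with different preferred values and decoded as the activation-weighted mean. The unit layout and width are assumptions; a network trained on such targets would have one output per unit instead of a single scalar output.

```python
import math

def encode(x, centres, sigma):
    """Distributed representation of a scalar: Gaussian activation of each unit."""
    return [math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2)) for c in centres]

def decode(activations, centres):
    """Read the scalar back as the activation-weighted mean of the preferred values."""
    total = sum(activations)
    return sum(a * c for a, c in zip(activations, centres)) / total

UNIT_CENTRES = [i / 20 for i in range(21)]  # 21 units covering [0, 1] (assumed layout)
SIGMA = 0.05

code = encode(0.37, UNIT_CENTRES, SIGMA)
print(round(decode(code, UNIT_CENTRES), 4))  # 0.37
```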

Our common aim is to develop ``neural network approximators'' with a view to their industrial applicability, which means that we primarily consider problems from the industrial domain. So far we have been involved, together with industrial partners, in several large research projects with a strong industrial focus; a new project (BMBF project ACON) has been accepted and will start in 03/96.

Involved scientific members: Schmidhuber, Hochreiter

Input encoding techniques other than the standard ones yield considerable improvements. By contrast, the speed-up from a parallel implementation of backpropagation is only marginal on redundant real-world data. Similarly, adaptive, higher-order transfer functions in the weights of a neural network yield only minor improvements. Furthermore, we studied the favorable application domains of different numerical optimization techniques for neural learning, e.g. the Levenberg-Marquardt algorithm, quasi-Newton methods, or conjugate gradients. We also investigated the incorporation of prior knowledge into neural network architectures, e.g. using rolling mills as an example.

In addition, we developed an algorithm that searches for a ``flat'' minimum of the error function: a large connected region in weight space where the error remains approximately constant. A minimum-description-length-based Bayesian argument suggests that flat minima correspond to low expected over-fitting. Although our algorithm requires the computation of second-order derivatives, it has back-propagation's order of complexity, and it automatically and effectively prunes units, weights, and input lines. From a more theoretical point of view, we want to find useful formalizations of the terms ``simplicity'', ``complexity'', ``information'' and ``redundancy'' in order to apply them to neural networks and improve them (e.g. better generalization capability, faster learning).
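The notion of a flat minimum can be made concrete with a crude probe: sample random weight perturbations within a box and count how often the error stays within a tolerance of its value at the minimum. This only illustrates the definition; it is not flat minimum search itself (which uses second-order information and prunes the net), and the two toy error functions are hypothetical.

```python
import random

def flatness(loss, weights, radius, tolerance, n_samples=1000, seed=0):
    """Fraction of random perturbations (uniform in a box of half-width `radius`)
    for which the loss stays within `tolerance` of its value at `weights`."""
    rng = random.Random(seed)
    base = loss(weights)
    inside = 0
    for _ in range(n_samples):
        perturbed = [w + rng.uniform(-radius, radius) for w in weights]
        if loss(perturbed) - base <= tolerance:
            inside += 1
    return inside / n_samples

def sharp_loss(w):
    return 50.0 * sum(x * x for x in w)  # narrow valley around w = 0

def flat_loss(w):
    return 0.5 * sum(x * x for x in w)  # wide valley around w = 0

print(flatness(flat_loss, [0.0, 0.0], 0.5, 0.1), flatness(sharp_loss, [0.0, 0.0], 0.5, 0.1))
```

Both toy minima have the same error (zero); only the flat one keeps it within the tolerance for most perturbations, i.e. its weights need to be specified with less precision.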

Involved scientific members: Schmidhuber, Hochreiter

Basic research deals with unsupervised techniques for training neural networks to produce low-redundancy, (nearly) factorial codes. These networks and the codes they produce can serve as preprocessing units for statistical classifiers, which reach their theoretically optimal results only if statistically independent codes are used. In real-world applications, however, the codes are usually highly redundant, which shows the importance of this work. To generate factorial codes we use, e.g., a technique called predictability minimization. Interestingly, when predictability minimization is applied to encode real-world photographs, it automatically learns receptive fields similar to those found in the visual cortex of mammals.
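For binary codes, the redundancy that predictability minimization tries to remove can be quantified as the total correlation: the sum of the individual code-unit entropies minus the joint entropy. It is zero exactly when the (empirical) code distribution is factorial. The sketch below only measures this quantity from samples; it does not implement predictability minimization, and the example codes are made up.

```python
import math
from collections import Counter

def entropy(samples):
    """Empirical Shannon entropy (bits) of a list of hashable outcomes."""
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in Counter(samples).values())

def redundancy(codes):
    """Total correlation sum_i H(y_i) - H(y_1, ..., y_k) of a list of code vectors."""
    marginals = sum(entropy([code[i] for code in codes]) for i in range(len(codes[0])))
    return marginals - entropy([tuple(code) for code in codes])

factorial = [(0, 0), (0, 1), (1, 0), (1, 1)]  # independent code bits
redundant = [(0, 0), (1, 1), (0, 0), (1, 1)]  # second bit copies the first
print(redundancy(factorial), redundancy(redundant))  # 0.0 1.0
```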

Research on a ``generally useful'' code yielded three MDL-inspired
criteria for such a code: (1) It conveys information about the input
data. (2) It can be computed from the data by a low-complexity
mapping. (3) The data can be computed from the code by a
low-complexity mapping. To obtain such codes, we train an
auto-associator with our recent *flat minimum search* technique
(FMS) for finding low-complexity nets. Depending on data and
architecture, this sometimes leads to factorial codes, sometimes to
local codes, and sometimes to sparse codes, but never to random codes
such as those generated by standard back-propagation.

Another direction of research is information compression using neural networks. Using a technique called chunking, we developed methods for compressing newspaper articles that outperform classical general-purpose compression techniques. Furthermore, we used these compression techniques to classify spoken numbers in a speaker-independent way.
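The principle behind this kind of compression, keeping only the symbols a predictor fails to anticipate, can be sketched with a trivial lookup-table predictor in place of the neural predictors used in the project. The predictor ("a symbol is followed by whatever followed it last time") is an assumption; the decoder runs the same predictor, so only the unexpected symbols and their positions need to be stored.

```python
def compress(text):
    """Return (length, [(position, symbol), ...]) for the mispredicted symbols."""
    last_successor = {}
    kept = []
    prev = None
    for pos, ch in enumerate(text):
        if last_successor.get(prev) != ch:
            kept.append((pos, ch))  # surprise: must be stored explicitly
        last_successor[prev] = ch   # online update of the predictor
        prev = ch
    return len(text), kept

def decompress(length, kept):
    """Rebuild the text by running the same predictor and filling in surprises."""
    last_successor = {}
    surprises = dict(kept)
    out = []
    prev = None
    for pos in range(length):
        ch = surprises[pos] if pos in surprises else last_successor[prev]
        out.append(ch)
        last_successor[prev] = ch
        prev = ch
    return "".join(out)

n, kept = compress("abcabcabcabc")
print(kept)  # [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'a')]
print(decompress(n, kept))  # abcabcabcabc
```

The better the predictor, the fewer surprises remain; a hierarchy of predictors can then be trained on the surprise stream itself.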

Involved scientific member: Schmidhuber

We are interested in environment-independent reinforcement acceleration with self-incremental and self-referential algorithms in non-Markovian and non-deterministic environments. Here a system learns to change its own learning algorithm in order to obtain the maximum reinforcement within its single lifetime. This method goes beyond universal search by shifting its inductive bias and thereby influencing the search space in a sensible way. The learning system enforces a steady growth of the payoff-per-time ratio, which is an advance over previous learning algorithms.

Involved scientific member: Weiß

Within the multi-agent area we concentrate on multi-agent learning and adaptation. Our main focus is on useful overall system behavior resulting from the concerted interaction of several logically or physically distributed agents that can, in some sense, be called intelligent and autonomous. Our current work concerns the following issues:

- unique requirements for learning in multi-agent systems;
- transformation of single-agent learning approaches to multi-agent learning approaches;
- learning to improve coordination and cooperation in multi-agent systems;
- adaptive distributed planning by multiple agents; and
- organizational structuring and self-organization in multi-agent systems.

As domains of application we have chosen (simulated) distributed computer and manufacturing systems and the tasks of load balancing and scheduling.

Involved scientific member: Hofmann

Within the field of genetic algorithms we investigated the capability of purely genetic implementations for tasks such as pole balancing. Furthermore, we did research on the capability of genetic algorithms to create genotypes of neural networks that are afterwards trained using well-known algorithms (e.g. backpropagation). We also applied genetic algorithms to the generation of new rules in classifier systems. Finally, we examined the algebra of genetic algorithms.
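A minimal genetic algorithm with tournament selection, one-point crossover, bit-flip mutation and elitism is sketched below on the toy "one-max" problem (maximize the number of 1 bits). The problem and all parameter values are assumptions; pole balancing or network genotypes would only change the encoding and the fitness function.

```python
import random

def genetic_algorithm(fitness, n_bits=20, pop_size=30, generations=100,
                      p_crossover=0.9, p_mutation=None, seed=0):
    """Generational GA on bit strings; returns the best genotype found."""
    rng = random.Random(seed)
    if p_mutation is None:
        p_mutation = 1.0 / n_bits
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(pop, 2)  # binary tournament selection
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [max(pop, key=fitness)[:]]  # elitism: keep the best unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < p_crossover:
                cut = rng.randrange(1, n_bits)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # bit-flip mutation
            children.append([1 - g if rng.random() < p_mutation else g for g in child])
        pop = children
    return max(pop, key=fitness)

best = genetic_algorithm(sum)  # one-max: fitness is the number of 1 bits
print(sum(best), best)
```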

Involved scientific member: Deco

Feature extraction is one of the principal goals of unsupervised learning. In biological systems it is the first step of the cognitive mechanism that enables higher-order cognitive functions. We concentrate on an information-theoretic approach to the problem of unsupervised learning.

A classical method for linear feature extraction is the well-known statistical tool Principal Component Analysis (PCA). We address information-theoretic formulations of the PCA-related problems of output decorrelation and optimal reconstruction. The so-called Infomax principle is based on the fact that optimal reconstruction after dimension reduction corresponds to minimum loss of information. Consequently, optimal data compression is achieved by maximizing the transmission of information between the input and the output of the transformation.
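Classical PCA by covariance diagonalization can be sketched in two dimensions without any library. The example also shows that, when one dimension is discarded, the mean squared reconstruction error equals the discarded eigenvalue; the data points are made up for illustration.

```python
import math

def pca_2d(points):
    """Mean, eigenvalues (descending) and unit eigenvectors of the 2-D covariance."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]]
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    disc = math.sqrt(max(tr * tr / 4.0 - det, 0.0))
    l1, l2 = tr / 2.0 + disc, tr / 2.0 - disc
    # eigenvector for l1; fall back to an axis if the matrix is already diagonal
    if abs(sxy) > 1e-12:
        v = (l1 - syy, sxy)
    else:
        v = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(*v)
    v1 = (v[0] / norm, v[1] / norm)
    v2 = (-v1[1], v1[0])  # orthogonal second component
    return (mx, my), (l1, l2), (v1, v2)

def reconstruct_1d(point, mean, v1):
    """Project onto the first principal direction and map back (2 -> 1 -> 2)."""
    z = (point[0] - mean[0]) * v1[0] + (point[1] - mean[1]) * v1[1]
    return (mean[0] + z * v1[0], mean[1] + z * v1[1])

points = [(-2.0, -1.0), (-1.0, -2.0), (1.0, 2.0), (2.0, 1.0)]
mean, (l1, l2), (v1, v2) = pca_2d(points)
recon = [reconstruct_1d(p, mean, v1) for p in points]
mse = sum((p[0] - r[0]) ** 2 + (p[1] - r[1]) ** 2 for p, r in zip(points, recon)) / len(points)
print(round(l1, 6), round(l2, 6), round(mse, 6))  # 4.5 0.5 0.5
```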

The standard PCA method based on covariance matrix diagonalization can be generalized by formulating the feature extraction problem in the most general way: feature extraction is defined as Independent Component Analysis (ICA), where independence is understood in the statistical sense. An information-theoretic formulation of ICA is presented for the case of arbitrary input probability distributions and arbitrary, possibly nonlinear, statistical dependence. We introduced a parameterization of deterministic nonlinear maps whose architecture guarantees invertibility and volume preservation. These conditions are required for the factorial learning that minimizes this ICA criterion. The input-output maps have a triangular structure whose lower-triangular (off-diagonal) part is parameterized by an arbitrary neural network, while the direct input-output connections put ones on the diagonal of the Jacobian, so that its determinant equals one. Furthermore, it is shown that a composition of volume-preserving transformations is itself volume preserving. This is the first theoretical formulation and implementation of nonlinear ICA.

These methods are used to avoid over-fitting, e.g. with specialized pruning techniques, or to allow higher order statistical decorrelation without information loss.
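The volume-preserving triangular construction can be sketched in three dimensions: each output is its input plus a shift computed from the preceding inputs only. The shift functions below are hypothetical stand-ins for neural networks, and the finite-difference determinant is only a numerical sanity check, not part of the method.

```python
import math

def make_volume_preserving_map(shifts):
    """Triangular map y_1 = x_1, y_i = x_i + f_i(x_1, ..., x_{i-1}).

    shifts[i - 1] receives the first i inputs and returns the shift for
    coordinate i. The Jacobian is lower triangular with ones on the diagonal,
    so det J = 1 whatever the shift functions are.
    """
    def forward(x):
        return [x[0]] + [x[i] + shifts[i - 1](x[:i]) for i in range(1, len(x))]

    def inverse(y):
        x = [y[0]]
        for i in range(1, len(y)):
            # x[0:i] has already been recovered, so the shift can be recomputed
            x.append(y[i] - shifts[i - 1](x[:i]))
        return x

    return forward, inverse

def jacobian_det_3x3(f, x, h=1e-6):
    """Central-difference Jacobian of a map R^3 -> R^3 and its determinant."""
    j = [[0.0] * 3 for _ in range(3)]
    for col in range(3):
        xp, xm = list(x), list(x)
        xp[col] += h
        xm[col] -= h
        for row, (a, b) in enumerate(zip(f(xp), f(xm))):
            j[row][col] = (a - b) / (2.0 * h)
    return (j[0][0] * (j[1][1] * j[2][2] - j[1][2] * j[2][1])
            - j[0][1] * (j[1][0] * j[2][2] - j[1][2] * j[2][0])
            + j[0][2] * (j[1][0] * j[2][1] - j[1][1] * j[2][0]))

# Hypothetical stand-ins for the neural networks f_2 and f_3.
def f2(prefix):
    return math.tanh(2.0 * prefix[0])

def f3(prefix):
    return 0.5 * math.tanh(prefix[0] - prefix[1]) + prefix[0] * prefix[1]

forward, inverse = make_volume_preserving_map([f2, f3])
print(jacobian_det_3x3(forward, [0.3, -1.2, 0.7]))
```

Because each coordinate depends only on earlier ones, the inverse is computed coordinate by coordinate, and stacking several such maps (e.g. with permuted coordinate orders) keeps the determinant at one.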

Involved scientific member: Ormoneit

Within time series forecasting we use density-estimating neural networks and focus on (1) Bayesian methods to improve Gaussian mixture estimates and (2) properties of regression estimates derived from density-estimating neural networks. Furthermore, we investigate second-order methods for exploiting the generalization power of flat minima and for pruning networks. Another important topic is how to deal with long time lags; here we proposed algorithms that can cope with time lags too long for all previously known algorithms. Our main application domain in this field is financial forecasting.
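For point (2), a regression estimate can be read off a joint Gaussian mixture density p(x, y) as the conditional mean E[y | x] = sum_k w_k(x) (mu_y,k + (cov_xy,k / var_x,k)(x - mu_x,k)), where w_k(x) is proportional to pi_k N(x; mu_x,k, var_x,k). The sketch assumes the mixture parameters are already given (e.g. by a density-estimating network); the Bayesian improvements mentioned above are not shown.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def conditional_mean(x, components):
    """Regression estimate E[y | x] from a joint Gaussian mixture p(x, y).

    Each component is (weight, (mu_x, mu_y), (var_x, cov_xy)); the variance of y
    is not needed for the conditional mean. Assumes x is not so far from every
    component that all densities underflow to zero.
    """
    resp = [w * gaussian_pdf(x, mu[0], cov[0]) for w, mu, cov in components]
    total = sum(resp)
    return sum(r / total * (mu[1] + cov[1] / cov[0] * (x - mu[0]))
               for r, (_, mu, cov) in zip(resp, components))

# Two hypothetical regimes with opposite mean levels.
MIXTURE = [(0.5, (-5.0, -1.0), (1.0, 0.0)), (0.5, (5.0, 1.0), (1.0, 0.0))]
print(conditional_mean(-5.0, MIXTURE), conditional_mean(0.0, MIXTURE))
```

The same mixture also gives the full conditional density, so uncertainty estimates come at no extra cost compared with a plain regression network.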

Involved scientific members: Geiger, Kinder

We worked on converting fuzzy rule bases into neural networks and vice versa. Since it is often relatively easy to formalize knowledge using fuzzy sets, this allows us, on the one hand, to fine-tune these rules. On the other hand, we can train neural networks on knowledge that so far cannot be formulated as rules. After extracting the fuzzy rule base from the neural network, we can interpret such knowledge as fuzzy rules. In this way we can extract rules that may be hidden in the data. We applied these techniques to steering (simple models of) airplanes, i.e. steering a flight simulator with a fuzzy rule set, and cars, as well as to mining applications. Other applications, using rule extraction from a special radial basis function network, include optical character recognition (OCR) and a legal consulting system.
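One reason such conversions are possible: with Gaussian membership functions and constant rule conclusions (a zero-order Sugeno rule base), fuzzy inference computes exactly the function of a normalized RBF network whose centres and widths are the membership functions and whose output weights are the rule conclusions. The small rule base below is hypothetical, and the sketch does not reproduce the group's conversion or extraction procedure.

```python
import math

# Hypothetical rule base: (membership centre, membership width, conclusion), i.e.
# "IF x is about <centre> THEN y = <conclusion>".
RULES = [(-1.0, 0.5, 1.0), (0.0, 0.5, 0.0), (1.0, 0.5, -1.0)]

def infer(x, rules=RULES):
    """Zero-order Sugeno inference with Gaussian membership functions.

    This is exactly a normalised RBF network: the memberships are the hidden
    units and the rule conclusions are the output weights.
    """
    mu = [math.exp(-((x - c) ** 2) / (2.0 * s ** 2)) for c, s, _ in rules]
    return sum(m * out for m, (_, _, out) in zip(mu, rules)) / sum(mu)

def rules_from_rbf(centres, widths, output_weights):
    """Reverse direction: read each unit of a normalised RBF net as one rule."""
    return list(zip(centres, widths, output_weights))

print(infer(-1.0), infer(0.0), infer(1.0))
```

Under this correspondence, fine-tuning the rules would amount to training the centres, widths and output weights of the network and reading them back with the reverse mapping.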

Involved scientific members: Geiger, Waschulzik

We work on (size-, rotation- and translation-) invariant recognition of objects and on the segmentation of scenes. More theoretical work concentrates on new algorithms that apply so-called large receptive fields to theoretical problems such as distinguishing rectangles from ellipsoids. More practical work addresses applications such as the segmentation of computed tomography (CT) images using simulated annealing. Both theoretical and practical work is needed for segmenting CT images using synchronized activity in single-spike networks.

Involved scientific members: Sturm, Eder, Geiger

Fundamental questions about the recognition of patterns and the way neurons in the cortex communicate are studied by extending the theory of synfire chains (Abeles, Bienenstock). In addition to detailed simulation experiments (in particular concerning the synchronization of chains), an application to translation-invariant pattern recognition is being developed.

Involved scientific member: Eder

Within this area we developed a model that produces synchronization in single-spike models (with and without oscillating neurons). Furthermore, we concentrated on the representation of temporal information as well as on learning algorithms in single-spike models. In the application area of language processing, namely the understanding of spoken sentences, we dealt with generating several hypotheses and deciding between them.

Involved scientific member: Scheler

We investigated the application of artificial neural network algorithms to linguistic problems. Mainly we concentrated on the relation of morphological categories to semantics and cognition in the area of aspect and tense, and spatial prepositions/spatial relations. Our main goal was to define learning approaches to both generation and interpretation of linguistic categories. We succeeded with an approach that uses sets of semantic features, which can be linked to cognition via logical definitions for the atomic features, and which can be used to learn functional mappings to output categories. Accordingly, interpretation and generation of linguistic categories can be recast as pattern classification and feature extraction problems.

Involved scientific member: Ruge

Methods are developed to automatically extract semantic features of words from large linguistic corpora. An important distinction from other methods of corpus analysis is that our methods are not based on statistics alone but on a mathematical theory of word semantics that combines linguistic theory (in particular model-theoretic semantics and dependency grammars), statistical considerations (in particular from the field of information retrieval) and ideas from cognitive psychology. The main result so far is that head/modifier relations are relevant linguistic entities, i.e. words whose heads and modifiers are almost the same are semantically closely related. This result was successfully applied to problems of information retrieval. A more general theory is being developed.
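As an illustration of the head/modifier idea, words can be compared by the overlap of their dependency contexts. The Jaccard overlap and the toy (head, modifier) pairs are assumptions; the group's measure rests on its own theory of word semantics, which is not reproduced here.

```python
def contexts(pairs, word):
    """Dependency contexts of `word` from (head, modifier) pairs: its modifiers
    when it acts as a head, and its heads when it acts as a modifier."""
    return ({("modifier", m) for h, m in pairs if h == word}
            | {("head", h) for h, m in pairs if m == word})

def similarity(pairs, w1, w2):
    """Jaccard overlap of the two context sets (0 = disjoint, 1 = identical)."""
    a, b = contexts(pairs, w1), contexts(pairs, w2)
    return len(a & b) / len(a | b) if a | b else 0.0

# Toy (head, modifier) pairs, invented for illustration.
PAIRS = [("car", "fast"), ("car", "red"), ("car", "new"),
         ("automobile", "fast"), ("automobile", "red"), ("automobile", "old"),
         ("idea", "new"), ("idea", "good")]
print(similarity(PAIRS, "car", "automobile"), similarity(PAIRS, "car", "idea"))  # 0.5 0.25
```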

Involved scientific members: Various members of our group and of the associated groups

We concentrate on medical applications (analysis of CTs, cancer diagnosis, data analysis, and the development of an integrated fuzzy expert system), speech processing (e.g. speaker-independent speech recognition, text compression), quality control (prediction of the stability of bridges, classification of paving stones), remote sensing (analysis and prediction of geophysical parameters), control (robotics, automatic car drivers, steering a car), prediction of share prices, load balancing in parallel computers, production program generation in manufacturing, and others. A systematic method for constructing feedforward neural networks with supervised learning for (in particular medical) applications, the SENN method, has been developed and tested in several applications; it pays particular attention to data preparation, to ways of monitoring and influencing the training process, and to ways of reinterpreting the knowledge contained in a trained net.

Wed Nov 20 12:16:55 MET 1996