An extension of Allen's approach to interval-based temporal reasoning is presented. The new method allows for temporal and spatial reasoning on the basis of incomplete or imprecise knowledge of the kind that is available from inference and perception processes. The central idea of the representation method is the structuring of knowledge according to the conceptual neighborhood of temporal and spatial relations. This representation allows for the integration of coarse and fine knowledge, so logical reasoning on the basis of such knowledge takes place within a unified scheme. The method presented is not only more efficient than Allen's method, it is also more `cognitively adequate' than previous approaches.
Previous algorithms for supervised sequence learning are based on dynamic recurrent networks. This paper describes alternative gradient-based systems consisting of two feed-forward nets which learn to deal with temporal sequences by using fast weights: The first net learns to produce context-dependent weight changes for the second net, whose weights may vary very quickly. One advantage of the method over the more conventional recurrent net algorithms is the following: it does not necessarily occupy full-fledged units (experiencing some sort of feedback) for storing information over time; a simple weight may be sufficient for storing temporal information. Since most networks have many more weights than units, this property represents a potential for storage efficiency. Various learning methods are derived. Two experiments with unknown time delays illustrate the approach. One experiment shows how the system can be used for adaptive temporary variable binding.
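The two-net idea can be sketched numerically (the sizes, the linear form of the slow net, and the random "sequence" below are illustrative assumptions, not the paper's architecture or learning rule):

```python
import numpy as np

rng = np.random.default_rng(0)

# Slow net: maps a context vector to a weight-change matrix for the fast net.
# (Sizes and the linear form are illustrative assumptions.)
n_ctx, n_in, n_out = 3, 4, 2
W_slow = rng.standard_normal((n_in * n_out, n_ctx)) * 0.1

def fast_weight_update(W_fast, context):
    """The slow net produces context-dependent additive changes
    for the fast net's weight matrix."""
    delta = (W_slow @ context).reshape(n_out, n_in)
    return W_fast + delta

def fast_net(W_fast, x):
    return np.tanh(W_fast @ x)

W_fast = np.zeros((n_out, n_in))
for t in range(5):                    # process a short input sequence
    context = rng.standard_normal(n_ctx)
    x = rng.standard_normal(n_in)
    W_fast = fast_weight_update(W_fast, context)   # weights vary quickly
    y = fast_net(W_fast, x)

# Temporal information is carried by W_fast itself, not by unit activations.
print(y.shape)
```

The point of the sketch is the storage mechanism: the state that persists across time steps is a weight matrix, not a set of feedback units.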
This paper addresses the problem of meaningful hierarchical adaptive decomposition of temporal sequences. This problem is relevant for time-series analysis as well as for goal-directed learning. The first neural systems for recursively chunking sequences are described. These systems are based on a principle called the `principle of history compression'. This principle essentially says: as long as a predictor is able to predict future environmental inputs from previous ones, no additional knowledge can be obtained by observing these inputs in reality; only unpredicted inputs deserve attention. The focus is on a 2-network system which tries to collapse a self-organizing multi-level predictor hierarchy into a single recurrent network (the automatizer). The basic idea is to feed everything that was not expected by the automatizer into a `higher-level' recurrent net (the chunker). Since the expected things can be derived from the unexpected things by the automatizer, the chunker is fed with a reduced description of the input history. The chunker has a comparatively easy job in finding possibilities for additional reductions, since it works on a slower time scale and receives fewer inputs than the automatizer. Useful internal representations of the chunker are in turn taught to the automatizer. This leads to even more reduced input descriptions for the chunker, and so on. Experimentally it is shown that the system can be superior to conventional training algorithms for recurrent nets: it may require fewer computations per time step, and in addition it may require fewer training sequences. A possible extension for reinforcement learning and adaptive control is mentioned. An analogy is drawn between the behavior of the chunking system and the apparent behavior of humans.
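The principle of history compression can be illustrated with a deliberately trivial, hard-coded predictor (in the paper both the predictor and the chunker are learning recurrent nets; this toy merely shows which inputs get forwarded):

```python
# Illustrative sketch: a predictor consumes a symbol stream; only the
# symbols it fails to predict are forwarded to the higher-level "chunker".
def predict(prev):
    # Toy predictor: assumes each symbol repeats. This is an assumption
    # for illustration only; in the paper the predictor is learned.
    return prev

def compress(stream):
    reduced, prev = [], None
    for s in stream:
        if predict(prev) != s:      # unpredicted -> deserves attention
            reduced.append(s)
        prev = s
    return reduced

stream = list("aaabbbbcaaa")
print(compress(stream))   # -> ['a', 'b', 'c', 'a']
```

The reduced sequence is a compressed description of the input history: everything dropped from it is exactly what the predictor could have reconstructed.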
Much of the recent research on adaptive neuro-control and reinforcement learning focuses on systems with adaptive `world models'. Previous approaches, however, do not address the problem of modeling the reliability of the world model's predictions in uncertain environments. Furthermore, with previous approaches usually some ad-hoc method (like random search) is used to train the world model to predict future environmental inputs from previous inputs and control outputs of the system. This paper introduces ways for modeling the reliability of the outputs of adaptive world models, and it describes more sophisticated and sometimes much more efficient methods for their adaptive construction by on-line state space exploration: For instance, a 4-network reinforcement learning system is described which tries to maximize the future expectation of the temporal derivative of the adaptive assumed reliability of future predictions. The system is `curious' in the sense that it actively tries to provoke situations for which it learned to expect to learn something about the environment. In a very limited sense the system learns how to learn. An experiment with a simple non-deterministic environment demonstrates that the method can be clearly faster than the conventional model-building strategy.
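The core of the curiosity idea can be sketched numerically; the reliability estimate below (one minus a running average of prediction error per state) and all names are illustrative assumptions, not the paper's 4-network architecture:

```python
import numpy as np

class CuriousModel:
    """Tracks the world model's reliability per state; the intrinsic
    ("curiosity") reward is the temporal derivative of that reliability."""
    def __init__(self, n_states, lr=0.5):
        self.error_avg = np.ones(n_states)   # start pessimistic
        self.lr = lr

    def update(self, state, prediction_error):
        before = 1.0 - self.error_avg[state]             # reliability before
        self.error_avg[state] += self.lr * (prediction_error - self.error_avg[state])
        after = 1.0 - self.error_avg[state]              # reliability after
        return after - before   # positive exactly when the model improved

m = CuriousModel(n_states=2)
r1 = m.update(0, prediction_error=0.2)   # model improves -> positive reward
r2 = m.update(0, prediction_error=0.2)   # less left to learn -> smaller reward
print(r1 > r2 > 0)
```

Maximizing the expectation of this reward drives the system toward states where it expects to learn something, which is the sense in which it is `curious'.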
The fixed-size storage learning algorithm for fully recurrent continually running networks (e.g. (Robinson and Fallside, 1987), (Williams and Zipser, 1988)) requires O(n^4) computations per time step, where n is the number of non-input units. We describe a method which computes exactly the same gradient and requires fixed-size storage of the same order as the previous algorithm, but whose average time complexity per time step is O(n^3).
This introduction to Artificial Life (written in German) deals with the central question of how the main structures and processes of living systems differ from those of non-living ones. The central strategy consists in modeling exactly these structures and processes out of pre-specified primitives of form and function. The central working hypothesis of Artificial Life claims that it is not the elementary parts by themselves, but rather their interaction and cooperation in a coherent system, that is essential for life. In this view, the exact analysis of particular biological life forms ("life as it is") is of minor importance compared to the variation of the basic primitives toward potentially possible living structures and processes ("life as it could be").
A generalization of Allen's interval-based approach to temporal reasoning is presented. The notion of 'conceptual neighbourhood' of qualitative relations between events is central to the presented approach. Relations between semi-intervals rather than intervals are used as the basic units of knowledge. Semi-intervals correspond to the temporal beginnings or endings of events. We demonstrate the advantages of reasoning on the basis of semi-intervals: 1) semi-intervals are rather natural entities from both a cognitive and a computational point of view; 2) coarse knowledge can be processed directly, saving computational effort; 3) incomplete knowledge about events can be fully exploited; 4) incomplete inferences made on the basis of complete knowledge can be used directly for further inference steps; 5) there is no trade-off in computational strength for the added flexibility and efficiency; 6) for a natural subset of Allen's algebra, global consistency can be guaranteed in polynomial time; 7) knowledge about relations between events can be represented much more compactly.
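The composition step underlying such qualitative inference can be sketched as a lookup in a transitivity table, with coarse knowledge handled as sets of possible relations (only a tiny, standard fragment of Allen's composition table is included here; the selection is illustrative):

```python
# Fragment of Allen's transitivity table: COMPOSE[(r(A,B), r(B,C))]
# gives the possible relations r(A,C).
COMPOSE = {
    ("before", "before"): {"before"},
    ("before", "meets"):  {"before"},
    ("meets",  "before"): {"before"},
    ("during", "before"): {"before"},
}

def compose(r_ab, r_bc):
    # Coarse knowledge is a set of possible relations; composing two
    # such sets is the memberwise union of the table entries.
    return set().union(*(COMPOSE[(r1, r2)] for r1 in r_ab for r2 in r_bc))

print(compose({"before", "meets"}, {"before"}))   # -> {'before'}
```

Coarse (set-valued) knowledge is processed by exactly the same mechanism as fine knowledge, which is the point made in advantage 2) above.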
In the field of Artificial Intelligence, different mathematical formalisms have so far been adopted to allow for the modeling of natural inference mechanisms. Standard logic has been extended into fuzzy logic; relation theory is used in constraint systems.
All these systems share the property of being directly based on the set of real numbers. For example, fuzzy logic allows the representation of fuzzy values, but only by describing them through a function F: R -> R that returns for every value x its fuzzy grade f(x). Thus, paradoxically, one ends up representing fuzzy knowledge using real numbers with their potentially infinite precision and resolution.
In most symbolic constraint propagation systems, the underlying variables are taken to be quantitative, i.e., their value set is the set of real numbers, even if one does not care about the exact value. The use of qualitative values for the variables is typically not supported. Some widely used systems even translate qualitative constraints into numeric ones.
This report (written in German) describes a method to represent such values qualitatively. It is based on cognitive requirements and constraints that lead to a system using positive absolute values together with sign attributes and an extended half-order representation. This system can be used in a stand-alone fashion or combined with existing ones.
Carving up an input space into hyperquadrics (typically hyper-ellipsoids or just hyper-spheres), as is done by localized receptive fields (Moody) or Hyper Basis Functions (Poggio et al.) in current approaches to computationally more efficient and mathematically better-founded network architectures, suffers in practice from a severe drawback: as the dimensionality of the input grows, it becomes increasingly infeasible to cover the whole space with units of only local relevance. We argue for a network architecture that augments the familiar frameworks with an essentially space-filling partitioning-to-one while preserving most of their locality properties, so that efficient ball-tree data structures remain usable. Specifically, we develop a theoretical foundation for this normalization and, by recasting the problem of approximation in terms of Bayes decision theory, show (a) that it is guaranteed to be the optimal output function and (b) how a priori knowledge that goes beyond mere smoothness can be readily incorporated. Experimental results produced as part of a project on neural control for hot line rolling mills (a high-dimensional and non-linear problem) indicate that our method achieves one to two orders of magnitude higher accuracy than optimally tuned standard algorithms such as sigmoidal backpropagation, and improves on a state-of-the-art analytic model.
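The partitioning-to-one idea can be sketched as a normalization of localized basis functions (the centers, widths, and weights below are illustrative toy values, not the paper's architecture):

```python
import numpy as np

def rbf(x, centers, width=1.0):
    """Local Gaussian activations of a query point x."""
    d2 = ((x - centers) ** 2).sum(axis=1)
    return np.exp(-d2 / (2 * width ** 2))

def normalized_rbf_output(x, centers, weights):
    a = rbf(x, centers)
    a = a / a.sum()        # partitioning-to-one: activations sum to 1
    return a @ weights     # output is a convex combination of weights

centers = np.array([[0.0, 0.0], [1.0, 1.0]])
weights = np.array([0.0, 1.0])

# Far outside both receptive fields, unnormalized activations are ~0,
# yet the normalized output is still well defined (dominated by the
# nearest unit) -- the partition fills the whole space.
far_point = np.array([10.0, 10.0])
print(normalized_rbf_output(far_point, centers, weights))
```

Without the normalization, the output at `far_point` would collapse toward zero regardless of the weights; with it, every region of the input space is covered.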
Learning in intelligent-agent systems typically works on the principle of action-orientation. In this paper we show that this kind of action-oriented learning can also be realized in classifier systems. In particular, we introduce the action-oriented bucket brigade (ABB), a modified and extended version of the standard bucket brigade (BB). Starting from a general overview of current approaches to learning in multi-component systems, we first describe the principle of action-orientation and motivate its transfer to learning in classifier systems. Next, we introduce the ABB and present its action-oriented mechanisms of bidding, firing, and strength adjustment. Finally, we focus on the differences between the ABB and the BB.
StoL provides simple means for literate programming in SCHEME. It helps you to improve the external presentation of your ideas by producing high-quality typeset output - containing both ``as is'' code and formatted comments - from unmodified (or only slightly modified) sources.
This paper explores an action-oriented perspective of learning in classifier systems. Three variants of the bucket brigade (BB) and the profit-sharing plan (PSP) are presented that operate on the action level in the sense that their bidding, firing and strength-modification processes are guided by the actions of the classifiers: the action-oriented bucket brigade 1 (ABB1), the action-oriented bucket brigade 2 (ABB2), and the action-oriented profit-sharing plan (APSP). The results reported in this paper indicate that the action-oriented perspective leads to improved learning in classifier systems. Section 1 gives a brief introduction to classifier systems and motivates an action-oriented perspective of learning. Sections 2, 3 and 4 describe the ABB1, the ABB2 and the APSP, respectively, and compare them with the BB and the PSP. Section 5 shows an empirical analysis of the action-oriented algorithms. Section 6 concludes with a brief summary of the reported results and some general considerations on learning in adaptive multi-component systems.
This report (written in German) deals with the difficult problem of representing space given a perception of it. The starting point is an approach by W. K. Yeap, which is essentially founded on the metric properties of the perceptions and a classification of corners. By extending the provided methods and introducing new concepts, it is possible to generate a description of space that is largely independent of the observer's placement. Compared to a purely metric approach, an approach using qualitative knowledge proves to be very helpful.
There have been various approaches, both quantitative and qualitative, to extending Allen's interval-based logic to spatial dimensions. We opted for a qualitative solution. Our relative representation of spatial knowledge is based on the distinction between projection and orientation, as proposed by Hernández, as well as on Allen's interval-based representation of time. We examine the components projection and orientation, describe a hierarchical structure of orientation represented as intervals, convey all possible compositions in tables, and finally extract the composition rules among three objects with two known relations and the third to be "calculated". Three regularities are used to form a rule basis within a system for developing cognitive maps.
In this paper we demonstrate the usefulness of the diagrammatical aspects of a qualitative representation of positions in 2-D space (Hernández:1991). Qualitative representations make only as many distinctions as necessary to identify objects, events, situations, etc. in a given context (identification task), as opposed to those needed to fully reconstruct a situation (reconstruction task). While the distinctions made are expressed propositionally in the form of relations, we use data structures that analogically reflect the structure of the relational domain on a higher level of abstraction. This representation allows us to perform operations such as a change of point of view or the composition of relations efficiently.
This paper (written in German) shows the close relationships that hold between qualitative representations (especially representations of spatial knowledge) and the field of default reasoning. These relationships are manifold: On the one hand, default reasoning and qualitative representations can be seen as alternatives for handling uncertain or incomplete knowledge. On the other hand, qualitative approaches also need default assumptions and revision algorithms working on them in order to cope with their inherent under-determinism. This connection exists in general for all constraint-reasoning approaches, but we demonstrate it here using examples from spatial reasoning. Furthermore, in the case of spatial representations, the structural similarity between the representing and the represented worlds prevents us from violating constraints corresponding to basic properties of the represented world, which would otherwise have to be restored through revision mechanisms at great cost.
Learning in multi-agent systems is a young research field within distributed artificial intelligence. This paper investigates an action-oriented approach to delayed reinforcement learning in reactive multi-agent systems and focuses on the question of how the agents can learn to coordinate their actions. Two basic algorithms for the collective learning of appropriate action sequences are introduced, called the ACE algorithm and the AGE algorithm (ACE and AGE stand for "Action Estimation" and "Action Group Estimation", respectively). Both algorithms explicitly take into consideration that the agents may know different aspects of the environment and that actions may be incompatible. The experiments described in this paper illustrate these algorithms and their learning capacities.
We describe how to solve linearly non-separable problems using simple feed-forward perceptrons without hidden layers and a biologically motivated, topologically distributed encoding of the input data. We point out why neural networks have advantages over classic mathematical algorithms without losing performance. The well-known Iris dataset is analyzed as a practical example.
A neural classifier of planar trajectories is presented. There already exists a large variety of classifiers specialized in particular invariants of a trajectory classification task, such as position invariance, rotation invariance, or size invariance; that is, there exist classifiers specialized in recognizing trajectories, e.g., independently of their position. The neural classifier presented in this paper is not restricted to certain invariants: the neural network itself extracts the invariants contained in a classification task by assessing only the trajectories, which need to be given as a set of points. No additional information needs to be available for training, which saves the designer from determining the required invariants himself. Besides its applicability to real-world problems, such a more general classifier is also cognitively plausible: in assessing trajectories for classification, human beings are able to find class-specific features no matter what kinds of invariants they are confronted with; invariants are easily handled by ignoring unspecific features.
This paper focuses on reactive multi-agent systems in which (i) each agent only knows a specific part of the environment, (ii) each agent is specialized in a specific action, and (iii) actions of different agents can be incompatible. The central question addressed is how several agents can collectively adapt to their environment by learning to generate a sequence of action sets that solves an environmental task. The contents and organization of the paper are as follows. Section 1 briefly motivates the topic of action selection and learning in multi-agent systems. Section 2 introduces a new algorithm called DFG (for ``Dissolution and Formation of Groups'') for the reinforcement learning of appropriate sequences of groups of concurrently active agents. Section 3 provides theoretical and experimental results on the DFG algorithm. Section 4 concludes with a brief summary and an outlook on future work.
We aim at building a joint space trajectory generation system. Connected to a fixed manipulator with sensory feedback, neural networks are expected to move the end-effector from any start to any goal configuration without colliding with obstacles. The output of our system is a series of consecutive configurations yielding a joint-space trajectory. Sensory information serves as the only source of knowledge about the manipulator's kinematics and the trajectory features. This knowledge will gradually be incorporated into the trajectory retrieval system.
This paper shows the close relationships between qualitative representations and default reasoning. These relationships are manifold: On the one hand, qualitative representations and defaults can be considered as alternate ways to cope with fuzzy or incomplete knowledge. On the other hand, defaults and their associated revision mechanisms are also needed in the qualitative approach to handle the fact that they are inherently under-determined. Furthermore, in the case of spatial representations the structural similarity between the representing and the represented worlds prevents us from violating constraints corresponding to basic properties of the represented world, which would otherwise have to be restored through revision mechanisms at great cost.
We present a variety of mechanisms to reason with qualitative representations in general, and qualitative representations of 2-D positional information in particular. One of the simplest is transforming between explicit reference frames (intrinsic, extrinsic, deictic) and a canonical implicit one. Another is computing the composition of spatial relations. Constraint propagation and constraint relaxation form the core of the qualitative inference system. All of them exploit the rich structure of space to reduce the complexity of the algorithms involved. In some cases we even use analogical data structures (abstract maps) that allow us to reason diagrammatically.
We present mechanisms used to maintain the consistency of a knowledge base of spatial information based on a qualitative representation of 2-D positions. These include the propagation heuristics used when inserting new relations as well as the reason maintenance mechanisms necessary to undo the effects of propagation when deleting a relation. Both take advantage of the rich structure of the spatial domain.
The goal of this paper is to show how the classification of patterns of phonetic features (phones) into phonemes can be acquired. This classification process is modeled by a supervised feature selection method based on a weighted Hamming distance, augmented by Boolean functions describing exceptions. An important aspect is the differentiation between rules and exceptions during learning.
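A minimal sketch of classification by weighted Hamming distance with Boolean exception rules (the feature vectors, weights, and the exception below are illustrative toy data, not the paper's phone/phoneme material):

```python
def weighted_hamming(x, prototype, weights):
    """Sum of feature weights over positions where x differs from the prototype."""
    return sum(w for xi, pi, w in zip(x, prototype, weights) if xi != pi)

def classify(x, prototypes, weights, exceptions=()):
    for condition, label in exceptions:        # exceptions override the rule
        if condition(x):
            return label
    return min(prototypes,
               key=lambda lbl: weighted_hamming(x, prototypes[lbl], weights))

prototypes = {"A": (1, 0, 0), "B": (0, 1, 1)}
weights = (2.0, 1.0, 1.0)                      # learned feature relevances
exceptions = [(lambda x: x == (1, 1, 1), "B")] # a learned Boolean exception

print(classify((1, 0, 1), prototypes, weights))             # -> 'A'
print(classify((1, 1, 1), prototypes, weights, exceptions)) # -> 'B'
```

The rule/exception differentiation appears here as two separate stores: the weighted distance captures the regular cases, while exceptions are kept as explicit Boolean conditions checked first.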
This paper presents a collection of problems for natural language analysis derived mainly from theoretical linguistics. Most of these problems present major obstacles for computational systems of language interpretation. The set of given sentences can easily be scaled up by introducing more examples per problem. The construction of computational systems could benefit from such a collection, either using it directly for training and testing or as a set of benchmarks to qualify the performance of an NLP system.
We present a neural storage for manipulator trajectories. Trajectories are neurally stored as a sequence of discrete joint positions; every time a trajectory needs to be generated, the storage is consulted. Besides being a ``classical'' retrieval system, the neural network is able to generalize from stored trajectories to new trajectories and thus serves as a trajectory planner. Its generalization abilities arise from an adequate representation of free space. We recommend the use of Kohonen's self-organizing feature maps to represent input and output data, both describing the same free space. Unlike others, we do not compute the winner unit but instead feed the activities of all units of the Kohonen map into the next layer.
There exist all kinds of problems in which both the input and output data for neural networks are continuous and vector-valued. From our previous work we know that distributed representations of the input data are extremely useful for neural networks, both for good generalization and for modeling forbidden regions in data space: units are only located in those regions where data occurred during learning. In this paper, we analyze Kohonen's self-organizing feature maps with respect to distributed representations and compare them in this respect to Radial Basis Functions. Concerning the output data, we give two interpretations of distributed representations: first, the center-of-gravity interpretation, for which we point out some severe problems; second, the pseudo-inverse of the matrix defined by the positions of the representation units.
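The center-of-gravity interpretation, and one of the problems it runs into, can be sketched in a few lines (the unit positions and activity patterns are illustrative):

```python
import numpy as np

def center_of_gravity(positions, activities):
    """Decode a distributed output as the activity-weighted mean position."""
    activities = np.asarray(activities, dtype=float)
    return (activities @ positions) / activities.sum()

positions = np.array([0.0, 1.0, 2.0, 3.0])   # 1-D unit positions

# Unimodal activity decodes sensibly:
print(center_of_gravity(positions, [0.0, 1.0, 0.0, 0.0]))   # -> 1.0

# Bimodal activity (modes at 0.0 and 3.0) decodes to 1.5 -- a point
# between the modes where no active unit, and possibly no valid datum,
# exists. This illustrates one weakness of the interpretation.
print(center_of_gravity(positions, [1.0, 0.0, 0.0, 1.0]))   # -> 1.5
```

The pseudo-inverse interpretation mentioned in the abstract avoids this averaging-between-modes behavior at the cost of a global linear decoding step.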
The aim of this workshop, which took place within the "17. Fachtagung für Künstliche Intelligenz" from September 13-16, 1993 at Humboldt-Universität Berlin, was to provide a forum for discussing hybrid and integrated approaches to the representation of space. The proceedings contain an introductory paper as well as ten accepted papers on ``Qualitative Approaches'', ``Picture-Like/Propositional'', ``Applications in Design Processes'' and ``Path Description and Navigation''.
Two standard schemes for learning in classifier systems have been proposed in the literature: the bucket brigade algorithm (BBA) and the profit sharing plan (PSP). The BBA is a local learning scheme which requires less memory and lower peak computation than the PSP, whereas the PSP is a global learning scheme which typically achieves a clearly better performance than the BBA. This ``requirement versus achievement'' difference, known as the locality/globality dilemma, is addressed in this paper. A new algorithm is described which aims at synthesizing the local and the global learning schemes. This algorithm is based on an experience-based learning mechanism called hierarchical chunking, and offers a solution to the locality/globality dilemma for reactive classifier systems.
In this paper, we explore the notion of learning the classification of patterns from examples by synthesizing distance functions. A working implementation of a distance classifier is presented. Its operation is illustrated with the problem of classification according to parity (highly non-linear) and a classification of feature vectors which involves dimension reduction (a linear problem). A solution to these problems is sought in two steps: (a) a parameterized distance function (called a `distance function scheme') is chosen; (b) setting the parameters to values according to the classification of the training patterns yields a specific distance function, which induces a classification on all remaining patterns. The general idea of this approach is to find restricted functional shapes in order to model certain cognitive classification functions exactly, i.e., performing classifications that naturally occur while excluding classifications that do not occur and may even be experimentally proven to be excluded from learnability by a living organism. Using restricted function shapes and simple learning rules also has technical advantages: reduced learning time, the ability to generate training sets and individual patterns that set certain parameters, the possibility of determining the learnability of a specific problem with a given function scheme, and the option of adding functions for individual exceptions while retaining the general shape for generalization.
This article focuses on the intersection of distributed machine learning and organizational design in the context of multi-agent systems. A computational approach to distributed reinforcement learning from experience and interaction is described and analyzed. According to this approach, multiple agents coordinate their actions by collectively learning appropriate instantiations of one of the most fundamental organizational structures, namely, hierarchically arranged groups of compatible agents. Distributed learning encompasses two interrelated processes: credit assignment (i.e., the process of approximating the goal relevance of the agents' and the groups' activities) and group development (i.e., the process of creating new and dissolving old groups). The approach is formalized in an algorithm called DFG. Theoretical and experimental results are presented which demonstrate the learning abilities of this approach. The approach is discussed and research directions are suggested.
We present an inter-lingual semantic representation for the synthesis of morphological aspect of English and Russian by standard back-propagation. Grammatical meanings are represented symbolically and translated into a binary representation. Generalization is assessed by test sentences and by a translation of the training sentences of the other language. The results are relevant to machine translation in a hybrid systems approach and to the study of linguistic category formation.
The main point of this paper is to show how we can extract semantic features, describing aspectual meanings, from a syntactic representation. The goal is to translate English to Russian aspectual categories. This is realized by a specialized language processing module, which is based on the concept of vertical modularity. The results of supervised learning of syntactic-semantic correspondences using standard back-propagation show that both learning and generalization to new patterns is successful. Furthermore, the correct generation of Russian aspect from the automatically created semantic representations is demonstrated.
This paper presents a novel kind of generalizing neural storage tailored to the problem of global motion planning for six-joint manipulators in complex, changing environments. Paths are stored in a growing map of neurons with adaptive ellipsoidal receptive fields, called the Ellipsoidal Map. Path planning is based on finding the best-matching neurons for start and goal and on a graph search in the neurons' connectivity graph, which is adapted to the topology of free space. The Ellipsoidal Map remains adaptive at all times: paths can always be stored, and information about collisions in planned paths, which occur in particular due to changes in the environment, always leads to corrections of the map. An application to a simulation of the Rotex robot with six degrees of freedom demonstrates highly satisfactory planning skills.
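The planning step can be sketched on a toy map (the neuron centers, the connectivity graph, plain nearest-center matching, and breadth-first search are illustrative simplifications of the Ellipsoidal Map machinery):

```python
import numpy as np
from collections import deque

# Toy "map": neuron centers in a 2-D configuration space, plus a
# connectivity graph adapted to free space.
centers = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [2.0, 1.0]])
edges = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

def best_match(q):
    """Index of the neuron whose center is nearest to configuration q."""
    return int(np.argmin(((centers - q) ** 2).sum(axis=1)))

def plan(start, goal):
    """Breadth-first search from the start's neuron to the goal's neuron."""
    s, g = best_match(start), best_match(goal)
    prev, queue = {s: None}, deque([s])
    while queue:
        n = queue.popleft()
        if n == g:
            path = []
            while n is not None:
                path.append(n)
                n = prev[n]
            return path[::-1]
        for m in edges[n]:
            if m not in prev:
                prev[m] = n
                queue.append(m)
    return None

print(plan(np.array([0.1, 0.1]), np.array([1.9, 0.9])))   # -> [0, 1, 2, 3]
```

In the paper the matching uses ellipsoidal receptive fields and the graph itself is grown and corrected online; the sketch only shows the match-then-search skeleton.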
Good user interfaces cannot be built on top of representations that ignore cognitive issues. This is particularly true for the representation of spatial knowledge, since space plays a preponderant role in human cognition. In this paper, we lay down the theoretical framework for a qualitative representation of space that takes cognitive aspects into consideration. A Knowledge Representation Model, characterized by making the role of the observer explicit, allows us to distinguish various modalities of representation (declarative vs. procedural, propositional vs. analogical, qualitative vs. quantitative), which are often at the heart of system (including GIS) design. We relate this approach to insights from linguistic metaphors, information theory, and semiotics.
Many machine learning algorithms aim at finding ``simple'' rules to explain training data. The expectation is: the ``simpler'' the rules, the better the generalization on test data (-> Occam's razor). Most practical implementations, however, use measures for ``simplicity'' that lack the power, universality and elegance of those based on Kolmogorov complexity and Solomonoff's algorithmic probability. Likewise, most previous approaches (especially those of the ``Bayesian'' kind) suffer from the problem of choosing appropriate priors. This paper addresses both issues. It first reviews some basic concepts of algorithmic complexity theory relevant to machine learning, and how the Solomonoff-Levin distribution (or universal prior) deals with the prior problem. The universal prior leads to a method for finding ``algorithmically simple'' problem solutions with high generalization capability. The method is based on Levin complexity (a time-bounded generalization of Kolmogorov complexity) and Levin's optimal universal search algorithm. A probabilistic variant of universal search, designed for conventional digital machines, is introduced. With a given problem, solution candidates are computed by efficient ``self-sizing'' programs that influence their own runtime and storage size. Universal search finds the ``good'' programs (the ones computing algorithmically probable solutions fitting the training data). Simulations focus on the task of discovering ``algorithmically simple'' neural networks with high generalization capability. It is demonstrated that the method, at least in certain simple cases where it is computationally feasible, can lead to generalization results unmatchable by previous neural net algorithms. The final part of the paper concerns extensions of universal search designed for incremental learning.
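The flavor of universal search can be sketched with a toy program space (the two-operation DSL is an illustrative assumption, and the per-program time budgeting of Levin search is omitted for brevity; programs are simply tried in order of increasing length):

```python
import itertools

# Tiny illustrative "programming language": programs are sequences of ops.
OPS = {"+1": lambda x: x + 1, "*2": lambda x: x * 2}

def run(program, x):
    for op in program:
        x = OPS[op](x)
    return x

def universal_search(train_pairs, max_len=4):
    """Return the shortest program consistent with all training pairs."""
    for length in range(1, max_len + 1):        # shorter programs first
        for program in itertools.product(OPS, repeat=length):
            if all(run(program, x) == y for x, y in train_pairs):
                return program
    return None

# Find an "algorithmically simple" rule fitting the data y = 2*(x+1):
print(universal_search([(1, 4), (3, 8)]))   # -> ('+1', '*2')
```

Because shorter programs are tried first, the first consistent program found is also an algorithmically simple one, which is the bias toward generalization the abstract describes.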
Building a model of an environment for a specific manipulator incurs computational costs exponential in the dimension of the manipulator's configuration space. Furthermore, complexity increases with the number of obstacles, which is usually high in real-world applications. Therefore, many classical trajectory planning algorithms based on world models cannot cope with a changing environment. For planning complex trajectories, a system that plans hierarchically shows many advantages: the individual sub-trajectories may be simple and can often be recombined for new tasks without further low-level planning. In this article we report on results with different neural-network-based (and therefore inherently adaptive) hierarchical trajectory planning systems. Trajectories are built by combining known sub-trajectories through the choice of subgoals. The neural systems are trained with the (supervised) backpropagation learning rule; nevertheless, the subgoals are produced by the system itself, without any pre-specification. We show that useful subgoals can be produced for manipulators in different (but still static) environments with obstacles. In contrast to many classical approaches, our approach, once trained, works fast but remains adaptive. To show the capability of our system, we compare the results to a recently introduced model-free stochastic search technique.
Recent experimental results suggest that neurons in the visual cortex synchronize their action potentials on the millisecond time scale. More importantly, this binding expresses functional relationships between the neurons. In several neural network models, oscillatory units have been used to simulate the behavior of the recorded neurons. It is still under debate, however, whether neurons in the cortex show oscillatory discharges per se. The model proposed here exhibits the same synchronizing effects but uses physiologically more plausible neurons. In particular, even a high level of noise cannot destroy synchronization in the neural network simulations.
Many artists try to depict ``the essence'' of the objects to be represented. In an attempt to formalize certain aspects of ``the essence'', I propose an art form called algorithmic art. It is based on concepts from algorithmic information theory. Suppose the task is to draw a given object. Usually there are many ways of doing so. The goal of algorithmic art is to draw the object such that the drawing can be specified by an algorithm and two properties hold: (1) The drawing should ``look right''. (2) The Kolmogorov complexity of the drawing should be small (the algorithm should be short), and a typical observer should be able to see this. Examples of algorithmic art are given in the form of ``algorithmically simple'' cartoons of various objects, including a pin-up girl and a weight lifter. Relations to previous work are established. Finally, attempts are made to relate the formalism of the theory of minimum description length to informal notions like ``good artistic style'' and ``beauty''.
This paper introduces the "incremental self-improvement paradigm". Unlike previous methods, incremental self-improvement encourages a reinforcement learning system to improve the way it learns, and to improve the way it improves the way it learns ..., without significant theoretical limitations -- the system is able to "shift its inductive bias" in a universal way. Its major features are: (1) There is no explicit difference between "learning", "meta-learning", and other kinds of information processing. Using a Turing machine equivalent programming language, the system itself occasionally executes self-delimiting, initially highly random "self-modification programs" which modify the context-dependent probabilities of future action sequences (including future self-modification programs). (2) The system keeps only those probability modifications computed by "useful" self-modification programs. (3) The computation of payoff per time takes into account all the computation time required for learning -- the entire system life is considered: boundaries between learning trials are ignored (if there are any). A particular implementation based on the novel paradigm is presented. It is designed to exploit what conventional digital machines are good at: fast storage addressing, arithmetic operations, etc. Experiments illustrate the system's mode of operation.
CMAC is well known as a good function approximator with local generalization abilities. Depending on the smoothness of the function to be approximated, the resolution -- the smallest distinguishable part of the input domain -- plays a crucial role. If the usually used binary quantizing functions are dropped in favor of more general, continuous-valued functions, this limitation can be remedied. We introduce such a model, using Gaussians instead of binary functions, and show the far better results obtained when learning two-valued functions with this continuous-valued activation function.
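The idea can be sketched for a 1-D input domain: the binary CMAC cell activations are replaced by normalized Gaussian receptive fields over several shifted tilings, and the weights are trained with a normalized LMS rule. All class, parameter, and variable names below are illustrative, not taken from the paper.

```python
import numpy as np

# A minimal 1-D CMAC sketch with Gaussian receptive fields replacing the
# usual 0/1 cell activations.
class GaussianCMAC:
    def __init__(self, n_tilings=8, n_cells=16, sigma=0.05, lr=0.2):
        # Several tilings of cell centers, each shifted within one cell width
        self.offsets = np.linspace(0.0, 1.0 / n_cells, n_tilings, endpoint=False)
        self.centers = np.array([np.linspace(0, 1, n_cells) + o for o in self.offsets])
        self.weights = np.zeros_like(self.centers)
        self.sigma, self.lr = sigma, lr

    def _activations(self, x):
        # Continuous-valued (Gaussian) activation instead of binary quantization
        a = np.exp(-((x - self.centers) ** 2) / (2 * self.sigma ** 2))
        return a / a.sum()

    def predict(self, x):
        return float((self._activations(x) * self.weights).sum())

    def train(self, x, target):
        a = self._activations(x)
        error = target - (a * self.weights).sum()
        # Normalized LMS update: moves the prediction a fraction lr toward target
        self.weights += self.lr * error * a / (a * a).sum()

cmac = GaussianCMAC()
for _ in range(500):
    for x in np.linspace(0, 1, 50):
        cmac.train(x, np.sin(2 * np.pi * x))
```

With Gaussian activations the approximator interpolates smoothly between cell centers instead of producing the staircase output of a binary CMAC.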
We present a new algorithm for finding low-complexity neural networks with high generalization capability. The algorithm searches for a ``flat'' minimum of the error function: a large connected region in weight space where the error remains approximately constant. An MDL-based argument shows that flat minima correspond to low expected over-fitting. Although our algorithm requires the computation of second-order derivatives, it has back-propagation's order of complexity. It automatically and effectively prunes units, weights, and input lines. Various experiments with feed-forward and recurrent nets are described. In an application to stock market prediction, flat minimum search outperforms (1) conventional back-propagation, (2) weight decay, and (3) ``optimal brain surgeon'' / ``optimal brain damage''.
Static real-world images are processed by a computationally simple and biologically plausible version of the recent predictability minimization algorithm for unsupervised redundancy reduction. Without a teacher and without any significant pre-processing, the system automatically learns to generate orientation-sensitive edge detectors in the first (semi-linear) layer.
This report describes an experimentation environment called SCHED for distributed machine learning scheduling. SCHED provides an easy-to-use environment for the simulation, testing and comparison of different learning algorithms for scheduling problems in distributed environments. Besides presenting the different tasks and learning algorithms provided by SCHED, the report includes detailed information on the user interface, on implementation details, and on possible extensions of SCHED.
A framework for the representation of qualitative distances is developed, inspired by previous work on qualitative orientation. It is based on the concept of ``distance systems'' consisting of a list of distance relations and a set of structure relations that describe how the distance relations in turn relate to each other. The framework is characterized by making the role of the ``frame of reference'' explicit in the context of distances. The composition of distance relations, the main inference mechanism for reasoning about distances within a given frame of reference, is explained, in particular under ``homogeneous structural restrictions''. Finally, we introduce articulation rules as a way to mediate between different frames of reference.
For several years there has been strong interest in computational models that exploit the adaptivity, learning ability and fault tolerance of artificial neural networks. Such models are increasingly being applied to language functions such as lexical disambiguation, syntax-semantics mapping and syntactic parsing. They have reached a level where they are applicable to real-world problems and allow textual corpora to be used for evaluation. In this article, we give an overview of the state of the art in applying neural learning procedures to linguistic problems, specifically problems of syntax-semantics mapping, including a motivation for using neural nets or similar learning techniques instead of the still prevalent rule-based systems. Finally, a detailed model for learning a syntax-semantics mapping function is developed, aimed at the interpretation and generation of spatial expressions, and possible applications in the areas of grammar checking and machine translation are discussed. The paper is written in German.
We compare two regularization methods which can be used to improve the generalization capabilities of Gaussian mixture density estimates. The first method consists of defining a Bayesian prior distribution on the parameter space. We derive EM (Expectation Maximization) update rules which maximize the a posteriori parameter probability, in contrast to the usual EM rules for Gaussian mixtures, which maximize the likelihood function. In the second approach we apply ensemble averaging to density estimation. This includes Breiman's "bagging", which has recently been found to produce impressive results for classification networks. To our knowledge this is the first time that ensemble averaging has been applied to improve density estimation.
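The first method can be sketched for a 1-D mixture: a Dirichlet prior on the mixing weights and an inverse-gamma prior on the variances turn the usual maximum-likelihood M-step into a MAP update, pulling the estimates away from degenerate solutions. The hyperparameter names and values below are illustrative assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: two 1-D Gaussian clusters
data = np.concatenate([rng.normal(-2, 0.5, 100), rng.normal(2, 0.5, 100)])

K, alpha, a, b = 2, 2.0, 2.0, 0.5   # components; Dirichlet and inverse-gamma hyperparameters
pi = np.full(K, 1.0 / K)
mu = np.array([-1.0, 1.0])
var = np.ones(K)

for _ in range(50):
    # E-step: responsibilities of each component for each point
    dens = np.exp(-(data[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    resp = pi * dens
    resp /= resp.sum(axis=1, keepdims=True)
    nk = resp.sum(axis=0)
    # M-step: MAP updates instead of plain maximum-likelihood estimates
    pi = (nk + alpha - 1) / (len(data) + K * (alpha - 1))       # Dirichlet prior
    mu = (resp * data[:, None]).sum(axis=0) / nk
    var = ((resp * (data[:, None] - mu) ** 2).sum(axis=0) + 2 * b) / (nk + 2 * (a + 1))
```

The variance update is the posterior mode under an inverse-gamma(a, b) prior; with a = b = 0 and alpha = 1 the rules reduce to the standard maximum-likelihood EM updates.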
This report (written in German) describes a simulation environment named SIMULOAD, programmed in order to examine distributed machine learning algorithms for load-balancing problems in computer networks. SIMULOAD provides several new learning algorithms as well as different elementary approaches for comparing those learning algorithms. With SIMULOAD it is easy to specify and use realistic load scenarios. The learning algorithms are based on Q-learning. Using simple examples, we illustrate the use of SIMULOAD and the capabilities of the learning algorithms.
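Since the environment's algorithms are based on Q-learning, the underlying idea can be sketched with a toy dispatcher that learns to route jobs to the less loaded of two servers. All task details (state bucketing, service rates, rewards) are illustrative assumptions, not SIMULOAD's actual load scenarios.

```python
import random

random.seed(1)
# State  = (bucketed load of server 0, bucketed load of server 1)
# Action = index of the server that receives the next job
# Reward = negative queue length of the chosen server (long queues hurt)
Q = {}
alpha, gamma, eps = 0.3, 0.9, 0.1
loads = [0, 0]

def bucket(load):
    return min(load, 4)

for step in range(5000):
    s = (bucket(loads[0]), bucket(loads[1]))
    Q.setdefault(s, [0.0, 0.0])
    if random.random() < eps:
        a = random.randrange(2)                       # explore
    else:
        a = max((0, 1), key=lambda i: Q[s][i])        # exploit
    loads[a] += 1                                     # assign the job
    r = -loads[a]
    for i in range(2):                                # each server may finish a job
        if loads[i] and random.random() < 0.6:
            loads[i] -= 1
    s2 = (bucket(loads[0]), bucket(loads[1]))
    Q.setdefault(s2, [0.0, 0.0])
    # Standard one-step Q-learning update
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
```

After training, the greedy policy prefers the shorter queue, which is the load-balancing behavior the learning algorithms are meant to discover.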
``Recurrent back-propagation'' for learning to store information over extended time periods takes too long. The main reason is insufficient, decaying error back flow. We describe a novel, efficient ``Long Short Term Memory'' (LSTM) that overcomes this and related problems. Unlike previous approaches, LSTM can learn to bridge arbitrary time lags by enforcing constant error flow. Using gradient descent, LSTM explicitly learns when to store information and when to access it. In experimental comparisons with ``Real-Time Recurrent Learning'', ``Recurrent Cascade-Correlation'', ``Elman nets'', and ``Neural Sequence Chunking'', LSTM leads to many more successful runs, and learns much faster. Unlike its competitors, LSTM can solve tasks involving minimal time lags of more than 1000 time steps, even in noisy environments.
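The constant-error-flow idea can be illustrated with a hand-wired memory cell: because the cell state is updated additively and gates control when it is written and read, a value stored once survives an arbitrary time lag undiminished. The gates below are set by hand purely for illustration; in LSTM proper they are learned by gradient descent.

```python
import numpy as np

# Minimal sketch of one LSTM-style memory cell with hand-set gates.
def lstm_cell(x_seq, write_mask, read_mask):
    c, outputs = 0.0, []
    for x, w, r in zip(x_seq, write_mask, read_mask):
        i = 1.0 if w else 0.0        # input gate: open only when storing
        o = 1.0 if r else 0.0        # output gate: open only when recalling
        c = c + i * np.tanh(x)       # additive state update -> constant error flow
        outputs.append(o * np.tanh(c))
    return outputs

T = 1000
x = [2.0] + [0.0] * (T - 1)          # the relevant signal occurs only at t = 0
write = [True] + [False] * (T - 1)   # store it once
read = [False] * (T - 1) + [True]    # recall it 999 steps later
out = lstm_cell(x, write, read)
```

With the gates closed in between, the stored value is untouched for the entire lag, which is why gradients through the cell state neither vanish nor explode.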
A framework for the qualitative representation of positional information in a two-dimensional space is presented. Qualitative representations make only as many distinctions as necessary in a given context, providing a flexible framework that accommodates various levels of granularity and scales of reasoning. The model takes into account primarily orientation and distance. However, since the formalization of orientation has been developed elsewhere (Hernández, 1994), we concentrate on the representation of distance and the combination of orientation and distance to perform spatial reasoning. We develop homogeneous and heterogeneous distance systems and investigate the role of frames of reference as a way to capture the inherent context dependency of qualitative distances. Finally, basic inference mechanisms including the composition of homogeneous and heterogeneous distance relations and of distance/orientation tuples, as well as articulation rules to mediate between different frames of reference are discussed.
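One way to see how the composition of homogeneous distance relations works is to ground each qualitative distance in an interval of possible metric values and apply the triangle inequality; composition then reduces to interval arithmetic. The three-level system and its ranges below are illustrative assumptions, not the distance systems developed in the paper.

```python
# Each qualitative distance relation is grounded in an interval of metric values.
RANGES = {'close': (0, 1), 'medium': (1, 3), 'far': (3, 9)}

def compose(r1, r2):
    """Given d(a,b) in r1 and d(b,c) in r2, return the possible relations for d(a,c)."""
    (a1, b1), (a2, b2) = RANGES[r1], RANGES[r2]
    lo = max(0, a1 - b2, a2 - b1)    # smallest possible d(a,c) (triangle inequality)
    hi = b1 + b2                     # largest possible d(a,c)
    # All qualitative labels whose ranges overlap the derived interval
    return {name for name, (a, b) in RANGES.items() if a < hi and b > lo}
```

The result is typically a disjunction of relations, reflecting the coarseness of qualitative distance knowledge: for instance, composing two ``far'' relations leaves all three distances possible.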
This tutorial is aimed at AI researchers and engineers interested in representing and reasoning with spatial knowledge, i.e., shape, size, relative position, connectivity, etc. The need to represent spatial knowledge explicitly arises in applications as diverse as Geographical Information Systems, Image Analysis, Robot Navigation, Natural Language Understanding, and Visual Modeling. In this tutorial, we highlight the progress that has been made in representing space at different levels of abstraction, with particular emphasis on applications. We first compare traditional quantitative approaches with recent qualitative and hybrid approaches. We then cover interval algebras and present a 2D application for block-based image structures such as documents. Next we give an overview of extant approaches to the representation of arrangement, topology, orientation, size, distance, and shape, together with the corresponding reasoning mechanisms. Along the way we discuss general representational aspects (frames of reference, points vs. extension, granularity, vagueness) and illustrate them with particular applications.
Trajectory generation for manipulators can be performed most efficiently if a model of the environment is available. Classical approaches usually build such a model in a preprocessing step, but constructing the model is computationally very expensive. A further disadvantage of classical approaches is that a complete re-computation of the whole model is necessary after each small change in the environment. Therefore an adaptive model that can be changed online is preferable, even though small changes in the obstacles' positions may cause fundamental changes in configuration space. With their inherent adaptability, neural networks seem to be an ideal tool for constructing such adaptive models. We introduce an approach showing that adaptive modeling with neural networks can be performed very efficiently. Furthermore, we use adaptive neural network world models to plan subgoals in configuration space, allowing hierarchical trajectory construction for manipulators. Experimental results in a basic example environment are presented.
This report (written in German) summarizes the results of the four-year project Neres. Our focus in that project was on qualitative spatial reasoning and neural networks for planning kinematic trajectories for industrial manipulators. Besides demonstrating why several approaches do not scale well to complicated manipulator environments, we developed two planning approaches that outperform most other planning algorithms in terms of processing time and scalability.