Chronology

Introduction

This page is extracted with no modification from a curriculum vitae that I created several years ago (probably around 1997, judging by the last entry in the list below). This was a quick way of creating a chronology of the work that I have done in various areas. I have added links to papers that are representative of each area. I will refine this page when I have time.

List of Research Areas (oldest first)

  1. Bayesian super-resolution theory. Super-resolution describes the enhancement of resolution that occurs when you take a limited-resolution image and fuse it with prior knowledge of what the image might actually be. For instance, if you claim that a blob in an image derives from an object of limited spatial extent, then you may process the image to have a better resolution than it had in the first place. Generally, the better the prior knowledge, the greater the increase in image resolution. This is useful, so it has been patented.
  2. Information theory. Mutual information is an information theoretic measure of the amount of information that two or more related events have about each other. For instance, the pixel values of an image contain information about the electromagnetic field that was scattered off the object being imaged, which in turn contains information about the actual underlying object itself. This relationship works both ways; hence the name mutual information. One possible application of mutual information is to use it as a criterion for optimising a data collection system, i.e. you seek to collect the data that contains the greatest amount of information about the object of interest. Ralph Linsker has shown how this principle can be applied to neural network optimisation.
  3. Markov random field clutter modelling. Radar clutter has complicated spatial correlations which may be modelled by one of two extreme approaches: (1) physical modelling of the process by which the data is generated, or (2) phenomenological modelling of the properties of the data itself. Clearly, (1) is the more desirable approach, but it has limited success because of the complexity of the physical models. On the other hand, (2) lends itself well to clutter modelling, even though it does not necessarily give us insight into the underlying physics. Markov random field (MRF) models are the most general approach to modelling spatial correlations.
  4. Clutter modelling. MRF models can be used for clutter modelling. However, there remains the problem of deciding how to parameterise these models. This can be shown to be equivalent to designing a "Boltzmann Machine" for characterising the interdependencies amongst a set of "visible variables" (i.e. the observed data) in terms of some "hidden variables" (i.e. the underlying causes). An alternative approach is also possible, in which the interdependencies are broken down into a nested structure, with the most strongly coupled variables occupying the innermost level. Typically, for images, each level corresponds to a different spatial scale. This is called the "cluster expansion".
  5. Image compression. The spatial correlations that exist in images make it possible to compress those images, because where there is a correlation there is a redundancy (i.e. in effect, the same information is recorded more than once). One possible way of compressing images is the cluster expansion, which provides an adaptive approach to multiscale image compression. This is useful, so it has been patented under the guise of anomaly detection.
  6. Satellite communications terminal. Essentially, this is a hardware demodulator. A finite state machine is carefully designed, so that if it is fed the raw signal (from an antenna) as its input, then it moves around in its state space in a way that makes it easy to read off the symbols that were encoded in the raw signal. This is useful, so it has been patented.
  7. Bayesian autofocus/super-resolution theory. This is an extension to the original work on Bayesian super-resolution theory, in which it is not necessary to know the exact form of the imaging system which is collecting the raw data. This approach unifies all earlier work by me on this subject. It is also a nice example of the expectation-maximisation (EM) method in action.
  8. Adaptive cluster expansion principle. The cluster expansion method may be reformulated in terms of a mutual information maximisation principle. This is different from my earlier mutual information approach. The mutual information is now measured horizontally between different pixels of each compressed image, rather than vertically between different compressed images. The InfoMax approach of Suzanna Becker and Geoffrey Hinton is a special case of this.
  9. Self-supervised multilayer networks. A neural network can be optimised in two essentially different ways: (1) supervised, in which each input is paired with a corresponding target output, or (2) unsupervised, in which only the inputs themselves are provided. However, in a multi-layer unsupervised neural network, the higher layers can act as supervisors of the lower layers, even though there is no external supervisor. This effect arises because the overall network objective function has contributions from all of the layers, so making adjustments in one layer will have side-effects on what happens in other layers (i.e. in higher layers, if the network is feed-forward). In order that the layer in which you are making adjustments is aware of these side-effects, it must receive feedback signals originating in those other layers in which there were side-effects. This is called "self-supervision".
  10. Partitioned mixture distribution vision models. MRF models are the most general way of modelling the joint statistical structure of the pixels of an image. However, they are notoriously computationally expensive. A much cheaper approach is to use many MRF models, each of which has its attention restricted to only a very small part of the image. In effect, each of these small MRFs models a marginal PDF of the image, rather than the full joint PDF that was attempted in the original MRF approach. Nothing is lost if only these marginal PDFs are required anyway. Each of these small MRF models is equivalent to a mixture model, i.e. its PDF is a superposition of a finite number of elementary constituent PDFs. The set of all of these small MRF models is equivalent to a new type of PDF model, called a partitioned mixture distribution (PMD).
  11. Adaptive Bayesian network theory. My use of the Bayesian approach to adaptive data modelling needs to be summarised. Here is that summary.
  12. Bayesian self-organising maps. My use of the Bayesian approach to self-organising maps (which are a type of unsupervised neural network) needs to be summarised. Here is that summary.
  13. Discretely firing neural networks. Conventionally, neural networks consist of a number of layers of neurons, and the layers are connected together in a feed-forward fashion. The state of each layer is characterised by a vector which specifies the "activity" of each neuron in that layer. Thus, the overall effect is for the neural network to compute as its output vector a non-linear mapping of its input vector, via a number of intermediate vectors (i.e. the hidden layers). In real life this is almost certainly overkill. Does the brain actually wait until a whole vector of activity is available in each layer before computing the state of the next layer (assuming feed-forward connectivity, that is)? Equivalently, do we need to know all of the pixel values of an image before we commence our analysis of it? Of course not! Thus, a diametrically opposite approach to analysing neural networks is to consider extremely incomplete layer vectors, in which the activity of only a few of the neurons is known. A simple model of this type is one in which the neurons fire discretely (i.e. they are on or off), rather than have analogue activities, and the only information available to the next layer is which neuron fired first (say). Not only is this model more realistic than the usual type of neural network approach (which is a special case of the discrete approach in which an infinite number of firing events have occurred), but it also proves to be remarkably versatile.
  14. Self-organising modular neural networks. One of the emergent properties of discretely firing neural networks is their ability to self-organise into modular structures, in which different modules process different aspects of the input. It is possible to demonstrate analytically how this behaviour occurs. One very useful application of this would be in fusion of multiple data sources (data fusion), where a network to do this job could be designed by a process of self-organisation. Typically, lower layers of the network would split into modules which would each process a different sensor, whereas higher layers of the network would have modules which processed groups of correlated sensors (in effect, doing data compression), and finally the top layer of the network might have a single module which processed all of the sensors.
  15. Unified theory of self-organising neural networks. My use of the discrete firing approach to self-organising neural networks needs to be summarised. Here is that summary.
  16. Self-organising visual cortex network (VICON). Another of the emergent properties of discretely firing neural networks is their ability to self-organise into the types of structure that are observed in the mammalian visual cortex. In a sense, this is a special case of the behaviour observed in modular networks. The types of structure that emerge are dominance stripes (alternate bands of neurons process left eye and right eye data), and orientation maps (each neuron has a characteristic preferred orientation for its input data, and the neurons cluster together in patches which contain all orientations).
  17. Unified Density Modelling. The optimisation of discretely firing neural networks can be expressed as a density modelling problem, in which a model of the joint probability density of the states of all layers of a multilayer network is optimised. This is not the same as conventional density modelling, in which only the probability density of the input layer is modelled. This unified theory generalises my earlier Bayesian self-organising maps theory, and contains the adaptive cluster expansion principle and partitioned mixture distribution vision models as special cases.
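
As an illustration of the mutual information described in item 2 above, the quantity can be computed directly from a joint probability table of two discrete variables. The following is only a minimal sketch and is not taken from the original work; the function name `mutual_information` and the two example tables are hypothetical, chosen to show the two extremes (independent variables, and one variable being an exact copy of the other).

```python
import math


def mutual_information(joint):
    """Mutual information I(X;Y), in bits, from a joint probability table.

    joint[i][j] is assumed to be P(X=i, Y=j), with all entries summing to 1.
    (Illustrative helper only; the name and interface are hypothetical.)
    """
    px = [sum(row) for row in joint]          # marginal P(X)
    py = [sum(col) for col in zip(*joint)]    # marginal P(Y)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0.0:                     # 0 * log 0 is taken to be 0
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi


# Hypothetical example tables.
independent = [[0.25, 0.25], [0.25, 0.25]]    # X tells us nothing about Y
exact_copy = [[0.5, 0.0], [0.0, 0.5]]         # Y is always equal to X

print(mutual_information(independent))
print(mutual_information(exact_copy))
```

Swapping the roles of the two variables (transposing the table) leaves the result unchanged, which is the sense in which the information is mutual.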

    This page was last updated on Thursday, 05 February 2004.