Introduction
This page is extracted without modification from a curriculum vitae that I
created several years ago (probably around 1997, judging by the last entry in the
list below). It was a quick way of creating a chronology of the work that I have
done in different areas. I have added links to papers that are
representative of each area. I will refine this page when I have time.
List of Research Areas (oldest first)
- Bayesian super-resolution theory.
Super-resolution describes the enhancement of resolution that occurs when
you take a limited-resolution image and fuse it with prior knowledge of what
the image might actually be. For instance, if you claim that a blob in an
image derives from an object of limited spatial extent, then you may process
the image to have a better resolution than it had in the first place.
Generally, the better the prior knowledge, the more the increase in image
resolution. This is useful, so it has been patented.
- Information theory. Mutual information
is an information theoretic measure of the amount of information that two or
more related events have about each other. For instance, the pixel values of
an image contain information about the electromagnetic field that was
scattered off the object being imaged, which in turn contains information
about the actual underlying object itself. This relationship works both
ways; hence the name mutual information. One possible application of mutual
information is to use it as a criterion for optimising a data collection
system, i.e. you seek to collect the data that contains the greatest amount
of information about the object of interest. Ralph Linsker has shown how
this principle can be applied to neural network optimisation.
- Markov random field clutter modelling. Radar
clutter has complicated spatial correlations which may be modelled by one of
two extreme approaches: (1) physical modelling of the process by which the
data is generated, or (2) phenomenological modelling of the properties of
the data itself. Clearly, (1) is the more desirable approach, but has
limited success because of the complexity of the physical models. On the
other hand, (2) lends itself well to clutter modelling, even though it does
not necessarily give us insight into the underlying physics. Markov
random field (MRF) models are the most general approach to modelling spatial
correlations.
- Clutter modelling. MRF models can be
used for clutter modelling. However, there remains the problem of deciding
how to parameterise these models. This can be shown to be equivalent to
designing a "Boltzmann Machine" for characterising the
interdependencies amongst a set of "visible variables" (i.e. the
observed data) in terms of some "hidden variables" (i.e. the
underlying causes). An alternative approach is also possible, in which the
interdependencies are broken down into a nested structure, with the most
strongly coupled variables occupying the innermost level. Typically, for
images, each level corresponds to a different spatial scale. This is called
the "cluster expansion".
- Image compression. The spatial
correlations that exist in images make it possible to compress those images,
because where there is a correlation there is a redundancy (i.e. in effect,
the same information is recorded more than once). One possible way of
compressing images is the cluster expansion, which provides an adaptive
approach to multiscale image compression. This is useful, so it has been
patented under the guise of anomaly detection.
- Satellite communications terminal.
Essentially, this is a hardware demodulator. A finite state machine is
carefully designed, so that if it is fed the raw signal (from an antenna) as
its input, then it moves around in its state space in a way that makes it
easy to read off the symbols that were encoded in the raw signal. This is
useful, so it has been patented.
- Bayesian autofocus/super-resolution theory.
This is an extension to the original work on Bayesian super-resolution
theory, in which it is not necessary to know the exact form of the imaging
system which is collecting the raw data. This approach unifies all earlier
work by me on this subject. It is also a nice example of the
expectation-maximisation (EM) method in action.
- Adaptive cluster expansion principle. The
cluster expansion method may be reformulated in terms of a mutual
information maximisation principle. This is different from my earlier mutual
information approach. The mutual information is now measured horizontally
between different pixels of each compressed image, rather than vertically
between different compressed images. The InfoMax approach of Suzanna Becker
and Geoffrey Hinton is a special case of this.
- Self-supervised multilayer networks. A
neural network can be optimised in two essentially different ways: (1)
supervised, in which each input is paired with a corresponding target
output, or (2) unsupervised, in which only the inputs themselves are
provided. However, in a multi-layer unsupervised neural network, the higher
layers can act as supervisors of the lower layers, even though there is no
external supervisor. This effect arises because the overall network
objective function has contributions from all of the layers, so making
adjustments in one layer will have side-effects on what happens in other
layers (i.e. in higher layers, if the network is feed-forward). In order
that the layer in which you are making adjustments is aware of these
side-effects, it must receive feedback signals originating in those other
layers in which there were side-effects. This is called
"self-supervision".
- Partitioned mixture distribution vision models.
MRF models are the most general way of modelling the joint statistical
structure of the pixels of an image. However, they are notoriously
computationally expensive. A much cheaper approach is to use many MRF
models, each of which has its attention restricted to only a very small part
of the image. In effect, each of these small MRFs models a marginal PDF of
the image, rather than the full joint PDF that was attempted in the original
MRF approach. Nothing is lost if only these marginal PDFs were required
anyway. Each of these small MRF models is equivalent to a mixture model,
i.e. its PDF is a superposition of a finite number of elementary constituent
PDFs. The set of all of these small MRF models is equivalent to a new type
of PDF model, called a partitioned mixture distribution (PMD).
- Adaptive Bayesian network theory. My use of
the Bayesian approach to adaptive data modelling needs to be summarised.
Here is that summary.
- Bayesian self-organising maps. My use
of the Bayesian approach to self-organising maps (which are a type of
unsupervised neural network) needs to be summarised. Here is that summary.
- Discretely firing neural networks.
Conventionally, a neural network consists of a number of layers of neurons,
and the layers are connected together in a feed-forward fashion. The state
of each layer is characterised by a vector which specifies the
"activity" of each neuron in that layer. Thus, the overall effect
is for the neural network to compute as its output vector a non-linear
mapping of its input vector, via a number of intermediate vectors (i.e. the
hidden layers). In real life this is almost certainly overkill. Does the
brain actually wait until a whole vector of activity is available in each
layer before computing the state of the next layer (assuming feed-forward
connectivity, that is)? Equivalently, do we need to know all of the pixel
values of an image before we commence our analysis of it? Of course not!
Thus, a diametrically opposite approach to analysing neural networks is to
consider extremely incomplete layer vectors, in which the activities of only a
few of the neurons are known. A simple model of this type is one in which
the neurons fire discretely (i.e. they are on or off), rather than have
analogue activities, and the only information available to the next layer is
which neuron fired first (say). Not only is this model more realistic than
the usual type of neural network approach (which is a special case of the
discrete approach in which an infinite number of firing events have
occurred), but it also proves to be remarkably versatile.
- Self-organising modular neural networks.
One of the emergent properties of discretely firing neural networks is their
ability to self-organise into modular structures, in which different modules
process different aspects of the input. It is possible to demonstrate
analytically how this behaviour occurs. One very useful application of this
would be in fusion of multiple data sources (data fusion), where a network
to do this job could be designed by a process of self-organisation.
Typically, lower layers of the network would split into modules which would
each process a different sensor, whereas higher layers of the network would
have modules which processed groups of correlated sensors (in effect, doing
data compression), and finally the top layer of the network might have a
single module which processed all of the sensors.
- Unified theory of self-organising neural
networks. My use of the discrete firing approach to self-organising
neural networks needs to be summarised. Here is that summary.
- Self-organising visual cortex network
(VICON). Another of the emergent properties of discretely firing neural
networks is their ability to self-organise into the types of structure that
are observed in the mammalian visual cortex. In a sense, this is a special
case of the behaviour observed in modular networks. The types of structure
that emerge are dominance stripes (alternate bands of neurons process left
eye and right eye data), and orientation maps (each neuron has a
characteristic preferred orientation for its input data, and the neurons
cluster together in patches which contain all orientations).
- Unified Density Modelling. The
optimisation of discretely firing neural networks can be expressed as a
density modelling problem, in which a model of the joint probability density
of the states of all layers of a multilayer network is optimised. This is
not the same as conventional density modelling, in which only the
probability density of the input layer is modelled. This unified theory
generalises my earlier Bayesian self-organising maps theory, and contains
the adaptive cluster expansion principle and partitioned mixture
distribution vision models as special cases.
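As an illustration of the mutual information described under "Information theory" above, here is a minimal sketch that computes it for two discrete variables from a table of joint probabilities. The function name and the three example tables are hypothetical and are not taken from any of the papers; the formula is the standard definition, I(X;Y) = sum over x,y of p(x,y) log2[p(x,y) / (p(x) p(y))].

```python
import math


def mutual_information(joint):
    """Return I(X;Y) in bits for a joint probability table joint[x][y]."""
    # Marginal distributions: row sums give p(x), column sums give p(y).
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0.0:  # zero-probability cells contribute nothing
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi


# Hypothetical example tables (illustrative only).
independent = [[0.25, 0.25], [0.25, 0.25]]  # X tells us nothing about Y
identical = [[0.5, 0.0], [0.0, 0.5]]        # X determines Y exactly
noisy = [[0.4, 0.1], [0.1, 0.4]]            # X and Y are partly correlated

mi_independent = mutual_information(independent)
mi_identical = mutual_information(identical)
mi_noisy = mutual_information(noisy)

# Swapping the roles of X and Y (transposing the table) leaves the value
# unchanged, which is the sense in which the information is "mutual".
mi_noisy_swapped = mutual_information([list(col) for col in zip(*noisy)])
```

The independent table gives zero, the identical table gives one bit, and the noisy table falls in between. Using this quantity as a data collection criterion would amount to choosing, among candidate measurements, the one whose joint table with the object of interest gives the largest value.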
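The statement under "Partitioned mixture distribution vision models" above, that a mixture model's PDF is a superposition of a finite number of elementary constituent PDFs, can be sketched as follows. This is only an illustration of a single mixture model, not of the full PMD. The choice of one-dimensional Gaussian components, and all names and parameter values, are assumptions made for the example.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian (assumed component type, for illustration)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, means, stds):
    """Mixture density: a weighted sum of the component densities."""
    return sum(w * gaussian_pdf(x, m, s)
               for w, m, s in zip(weights, means, stds))


def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule numerical integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))


# Hypothetical two-component mixture; the weights must sum to 1.
weights = [0.3, 0.7]
means = [-2.0, 1.0]
stds = [0.5, 1.5]

# Because the weights sum to 1 and each component integrates to 1,
# the mixture is itself a properly normalised PDF.
total = integrate(lambda x: mixture_pdf(x, weights, means, stds), -20.0, 20.0)
```

The weights play the role of prior probabilities for the components, so they must be non-negative and sum to one for the superposition to remain a valid density.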
This page was last updated on Thursday, 05 February 2004.