Abstracts

Copyright

All publications are retypeset versions of Crown copyright publications with the following copyright statement:

© Crown copyright. Reproduced with the permission of the Controller of HMSO and Queen's Printer for Scotland.

Journal Papers

OPEN A discrete firing event analysis of the adaptive cluster expansion network (1997): This paper describes how a hierarchical network for encoding sensor data (the adaptive cluster expansion network) can be constructed by linking together a number of elementary modules, each of which is a simple two-layer encoder/decoder network. To achieve this goal, a Bayesian analysis is applied to the discrete neural firing events that occur within each layer of the network.

OPEN Self-organisation of multiple winner-take-all neural networks (1997): In this paper, analysis of the information content of discretely firing neurons in unsupervised neural networks is presented, where information is measured according to the network's ability to reconstruct its input from its output with minimum mean square Euclidean error. It is shown how this type of network can self-organise into multiple winner-take-all subnetworks, each of which tackles only a low-dimensional subspace of the input vector. This is a rudimentary example of a neural network that effectively subdivides a task into manageable subtasks.

OPEN A Bayesian analysis of self-organising maps (1994): In this paper Bayesian methods are used to analyse some of the properties of a special type of Markov chain. The forward transitions through the chain are followed by inverse transitions (using Bayes' theorem) backwards through a copy of the same chain; this will be called a folded Markov chain. If an appropriately defined Euclidean error (between the original input and its 'reconstruction' via Bayes' theorem) is minimised with respect to the choice of Markov chain transition probabilities, then the familiar theories of both vector quantisers and self-organising maps emerge. This approach is also used to derive the theory of self-supervision, in which the higher layers of a multi-layer network supervise the lower layers, even though overall there is no external teacher.

OPEN Partitioned mixture distribution: an adaptive Bayesian network for low-level image processing (1994): Bayesian methods are used to analyse the problem of training a model to make predictions about the probability distribution of data that has yet to be received. Mixture distributions emerge naturally from this framework, but are not ideally matched to the density estimation problems that arise in image processing. An extension, called a partitioned mixture distribution, is presented, which is essentially a set of overlapping mixture distributions. An expectation-maximisation training algorithm is derived for optimising partitioned mixture distributions according to the maximum likelihood prescription. Finally, the results of some numerical simulations are presented, which demonstrate that lateral inhibition arises naturally in partitioned mixture distributions, and that the nodes in a partitioned mixture distribution network co-operate in such a way that each mixture distribution in the partitioned mixture distribution receives its necessary complement of computing machinery.

OPEN Self-supervised adaptive networks (1992): A scheme for training multilayer unsupervised networks is presented, in which control signals propagate downwards from the higher layers to influence the optimisation of the lower layers. Because there is no external teacher involved, this is called self-supervised training. The author demonstrates both theoretically and numerically how self-supervision emerges when a simple network built out of vector quantisers is optimised.

OPEN Code vector density in topographic mappings: scalar case (1991): In coding theory one transforms signals from a source representation into an encoded representation that is suitable for transmission through a (possibly noisy) medium, and upon reception one decodes to reconstruct an approximation to the original signal. In autoassociative network theory one reconstructs a signal given incomplete (and possibly noisy) information about it. We derive some new results by combining these two approaches in the form of vector quantisation (VQ) theory and topographic mapping (TM) theory. We use a VQ model (with a noisy transmission medium) to model the processes that occur in TMs, which leads to the standard TM training algorithm, albeit with a slight modification to the encoding process (minimum distortion rather than nearest neighbour encoding). To emphasise this difference we call our model a topographic vector quantiser (TVQ). In the continuum limit of the one-dimensional (scalar) TVQ we find that the density of code vectors is proportional to P(x)^α (α=1/3) (which is the same as the result obtained from a standard scalar quantiser), assuming that the transmission medium introduces additive noise with a zero-mean, symmetric, monotonically decreasing probability density (which is equivalent to using a symmetrically tapered neighbourhood in a TM). Our α=1/3 result is dramatically different from the α=(2n+1)^2/(3((n+1)^2+n^2)) result that is predicted when the standard TM training algorithm is used with a uniform symmetric neighbourhood [-n,+n], and we note that this difference arises entirely from using minimum distortion rather than nearest neighbour encoding. We verify our new result by performing a numerical experiment using P(x) ∝ x.
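
As a quick numerical check on the two exponents quoted in the abstract above, the following sketch evaluates the uniform-neighbourhood expression, read here as α(n) = (2n+1)^2 / (3((n+1)^2 + n^2)). That reading of the formula is an assumption, and the function name alpha_uniform is hypothetical; nothing here is taken from the paper itself.

```python
from fractions import Fraction


def alpha_uniform(n: int) -> Fraction:
    """Exponent alpha for a uniform neighbourhood [-n, +n] (hypothetical helper).

    Assumes the reading alpha(n) = (2n+1)^2 / (3 * ((n+1)^2 + n^2)).
    """
    return Fraction((2 * n + 1) ** 2, 3 * ((n + 1) ** 2 + n ** 2))


# Tabulate the exponent exactly and as a float for a few neighbourhood sizes.
for n in (0, 1, 2, 10, 100):
    a = alpha_uniform(n)
    print(n, a, float(a))
```

Under this reading the expression reduces to 1/3 at n = 0 (no neighbourhood) and approaches 2/3 as n grows, so the gap from the α=1/3 TVQ result widens with the neighbourhood size.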

OPEN The theory of Bayesian super-resolution of coherent images: a review (1991): We review some theoretical work on super-resolution of coherent images from a Bayesian point of view. The well known singular value decomposition super-resolution method emerges as a special case, and it is extended in order to derive a practical iterative super-resolution algorithm.

OPEN A Bayesian derivation of an iterative autofocus/super-resolution algorithm (1990): We derive an estimate-maximise formulation of a Bayesian super-resolution algorithm for reconstructing scattering cross sections from coherent images. We generalise this result to obtain an 'autofocus/super-resolution' method, which simultaneously autofocuses an imaging system and super-resolves its image data. We present an explanatory numerical example to illustrate the implementation of our method on images of single and double point targets that are defocused by O(depth of focus). These are successfully super-resolved by autofocus/super-resolution, but not by pure super-resolution. We conjecture that autofocus/super-resolution might usefully be applied to the interpretation of airborne synthetic aperture radar images that are subject to defocusing effects.

OPEN Derivation of a class of training algorithms (1990): This paper presents a novel derivation of Kohonen's topographic mapping training algorithm, based upon an extension of the Linde-Buzo-Gray (LBG) algorithm for vector quantiser design. Thus a vector quantiser is designed by minimising an L2 reconstruction distortion measure, including an additional contribution from the effect of code noise which corrupts the output of the vector quantiser. The neighbourhood updating scheme of Kohonen's topographic mapping training algorithm emerges as a special case of this code noise model. This formulation of Kohonen's algorithm is a specific instance of the 'robust hidden layer principle', which stabilises the internal representations chosen by a network against anticipated noise or distortion processes.
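
To illustrate the neighbourhood updating scheme mentioned in the abstract above, here is a minimal sketch of a textbook Kohonen-style update for a one-dimensional codebook of scalars. It is not the paper's derivation: the Gaussian neighbourhood shape, the learning rate, and the names neighbourhood and train_step are all assumptions made for illustration. The neighbourhood weight plays the role that the code noise model plays in the abstract.

```python
import math
import random


def neighbourhood(i: int, j: int, width: float) -> float:
    """Hypothetical Gaussian weight linking code index i to code index j."""
    return math.exp(-((i - j) ** 2) / (2.0 * width ** 2))


def train_step(codebook, x, rate=0.1, width=1.0):
    """Return an updated copy of the codebook after presenting one input x."""
    # Nearest neighbour encoding: the winner is the code vector closest to x.
    winner = min(range(len(codebook)), key=lambda k: (codebook[k] - x) ** 2)
    # Move every code vector towards x, weighted by its neighbourhood
    # relation to the winner, so that neighbours of the winner also adapt.
    return [c + rate * neighbourhood(winner, k, width) * (x - c)
            for k, c in enumerate(codebook)]


random.seed(0)
codebook = [random.random() for _ in range(8)]
for _ in range(5000):
    codebook = train_step(codebook, random.random())
print([round(c, 2) for c in codebook])
```

Shrinking width towards zero leaves only the winner's update, which corresponds to an ordinary vector quantiser update without neighbourhood effects.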

OPEN Rapid acquisition of low signal-to-noise carriers (1989): A parallel bank of 1-bit digital filters is proposed as a solution to the rapid carrier acquisition problem in satellite communications. The hardware is compact and cheap, and using a crude threshold detection criterion, it can localise a 30 dBHz carrier to within 100 Hz in a total bandwidth of 12 kHz in about 60 ms. A theoretical analysis of the system performance is also presented, together with predictions of its statistical behaviour, which will assist in the design of more sophisticated signal detection algorithms.

OPEN Hierarchical vector quantisation (1989): We present a method of vector quantisation which trades off accuracy for speed of encoding. We achieve this by hierarchically structuring a multistage encoder so that each stage encodes low dimensional input vectors. Such hierarchical encoders may easily be realised as a set of fast table look-up operations. We demonstrate how the Euclidean distortion in such a multistage encoder is approximately minimised by using Kohonen's topographic mapping learning algorithm from neural network theory. We also demonstrate the performance of the technique on various stochastic time series. We find that there is little loss in encoding accuracy, when compared with the exact nearest neighbour encoding using an equivalent single stage encoder.
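
The idea of realising a multistage encoder as a set of table look-ups can be sketched as follows, assuming four input samples that are already quantised to 16 levels. The codebooks here are random placeholders rather than the result of the topographic mapping training the abstract describes, and all names (LEVELS, nearest, table1, table2, encode) are hypothetical.

```python
import itertools
import random

LEVELS = 16  # assumed number of values per input sample and per code index


def nearest(codebook, v):
    """Index of the codebook entry closest to v in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], v)))


random.seed(1)
# Placeholder codebooks; in practice these would come from training.
stage1_codebook = [(random.uniform(0, 15), random.uniform(0, 15))
                   for _ in range(LEVELS)]
stage2_codebook = [tuple(random.uniform(0, 15) for _ in range(4))
                   for _ in range(LEVELS)]

# Precompute both look-up tables once, so encoding needs no distance searches.
table1 = {(a, b): nearest(stage1_codebook, (a, b))
          for a, b in itertools.product(range(LEVELS), repeat=2)}
table2 = {(c0, c1): nearest(stage2_codebook,
                            stage1_codebook[c0] + stage1_codebook[c1])
          for c0, c1 in itertools.product(range(LEVELS), repeat=2)}


def encode(x):
    """Encode four quantised samples using three table look-ups."""
    c0 = table1[(x[0], x[1])]   # stage 1, first pair of samples
    c1 = table1[(x[2], x[3])]   # stage 1, second pair of samples
    return table2[(c0, c1)]     # stage 2, pair of stage-1 codes


print(encode((3, 4, 12, 11)))
```

Each table has only 256 entries because every stage sees a two-component input, whereas a single-stage table over all four samples would need 16^4 entries.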

OPEN Image compression using a multilayer neural network (1989): We demonstrate that a topographic neural network model (Kohonen, 1984) may be used to data compress synthetic aperture radar (SAR) images by up to a factor of 8.

OPEN The inverse cross section problem for complex data (1989): Using a Gaussian scattering model we present a Bayesian analysis of the recovery of a scattering cross section from complex data samples (the inverse cross section problem). We derive the likelihood function of the cross section given complex data, and its functional derivatives with respect to cross section. The entire analysis is expressed using the same weighted singular value decomposition (SVD) of the imaging system that is used when solving the related inverse object field problem in a weighted space.

OPEN A maximum entropy approach to sampling function design (1988): We use the maximum entropy principle to infer the information content of sampling functions of data sets. This may be used as a quantitative measure of the suitability of sampling schemes for use in inverse problems. We present a practical numerical means of evaluating the integrals which appear in the analysis, and we demonstrate how the method is applied to the special case of homogeneous texture analysis.

OPEN The role of prior knowledge in coherent image processing (1988): Models that encode prior knowledge about a scene provide a means for interpreting image data from that scene in more detail than would otherwise be so. Information about both background clutter and target characteristics should be included in this prior knowledge. We demonstrate the use of a generalised noise model to represent a variety of naturally occurring random terrain clutter textures observed in high-resolution synthetic aperture radar (SAR) images. In addition a similar approach is adopted for the simulation of such textures. Having established the background properties we next introduce prior knowledge about any target within the scene and exploit this in achieving a cross-section reconstruction having improved resolution compared with the original image. Examples of such a super-resolution method based on singular value decomposition are demonstrated and the limits of the technique are indicated.

OPEN The use of Markov random field models to derive sampling schemes for inverse texture problems (1987): We advocate the use of Markov random field (MRF) models to describe texture properties generally. For homogeneous textures we derive a sampling scheme that preserves the information content of the data whilst reducing their dimensionality considerably. We derive a refinement of this sampling scheme where residual redundancy is removed by a more careful selection of what is sampled. We relate our results to the grey level co-occurrence method of texture classification and to the pattern recognition device that is known as WISARD.

OPEN An optimisation of the Metropolis algorithm for multibit Markov random fields (1986): We present a version of the Metropolis algorithm which is suitable for rapidly updating multibit Markov random fields. The fundamental objects on which the algorithm operates are the constituent binary digits of the field variables. A speed enhancement factor of O(2^N/N) (for an N-bit representation) is obtained in comparison with a conventional Metropolis algorithm.

OPEN Prior knowledge in synthetic-aperture radar processing (1986): We briefly review the role of models as a means of encoding prior knowledge with which to interpret data. We then examine the specific case of synthetic-aperture radar (SAR) images. We review the current state of SAR terrain clutter models, and their role in target detection. We present numerical results which demonstrate the consistency of a correlated gamma-distributed surface cross section model with SAR terrain data. We then review the theory of target super-resolution by the use of the singular-value decomposition (SVD). We emphasise the need to generalise the basic SVD technique in order to achieve success with SAR target data. Furthermore we demonstrate that the general SVD technique is a special case of a Bayesian reconstruction scheme which we interpret in terms of Shannon information theory. Numerical super-resolution results from simulated SAR data are presented.

OPEN The use of transinformation in the design of data sampling schemes for inverse problems (1985): We analyse the average useful information content of data samples by using the transinformation entropy (rate of transmission) of Shannon's information theory. We derive a simple expression for the transinformation in linear experiments with Gaussian a priori distributions. We use this expression to examine various schemes for sampling the image spaces of a translation invariant (sine) and a conformally invariant (Laplace) mapping. The optimum sampling scheme is found to be considerably better than the naive sampling scheme (e.g. Nyquist) when the number of samples is small and the a priori knowledge is non-trivial.

OPEN A new method of sample optimisation (1985): The transinformation measure is used to define the average information supplied by data samples. This is maximised by suitably positioning the samples, which optimises the data collection process. The application of the scheme to a linear imaging system is demonstrated.

OPEN Prior knowledge and object reconstruction using the best linear estimate technique (1985): Prior knowledge and image data are combined to produce an object reconstruction, using the best-linear-estimate technique. It is shown how this technique is related to the minimum norm method and the singular function method. How prior knowledge influences the object reconstruction is examined in detail. The concentration of image degrees of freedom to produce a resolution enhancement in the object reconstruction is demonstrated for suitable types of prior knowledge.

OPEN A super-resolution algorithm for SAR images (1988): We describe an algorithm designed to 'super-resolve' a given feature in a SAR image. The algorithm makes use of qualitative prior knowledge and is designed to be fully automatic. Although a straightforward implementation is expensive, we show that indirect implementation techniques reduce its cost considerably. A comparison with the best available alternative approach shows the effectiveness of the method on a set of trial data.

Conference Papers

OPEN A self-organising approach to multiple classifier fusion (2001): In this paper the theory of unsupervised multi-layer stochastic vector quantiser (SVQ) networks is reviewed, and then extended to the supervised case where the network is to be used as a classifier. This leads to a hybrid approach, in which training is governed both by unsupervised and supervised pieces in the network objective function. The unsupervised piece aims to preserve enough information in the network to be able to accurately reconstruct the input (i.e. the network serves as an encoder), whereas the supervised piece aims to reproduce the classification output supplied by an external teacher (i.e. the network serves as a classifier). The tension between these two pieces of the objective function leads to an optimal network, in which typically the lower layers (near to the input) act as faithful encoders of the input, whereas the higher layers (near to the output) act as faithful classifiers. The results of some simulations are presented to illustrate these properties.

OPEN Using stochastic encoders to discover structure in data (2000): In this paper a stochastic generalisation of the standard Linde-Buzo-Gray (LBG) approach to vector quantiser (VQ) design is presented, in which the encoder is implemented as the sampling of a vector of code indices from a probability distribution derived from the input vector, and the decoder is implemented as a superposition of reconstruction vectors. This stochastic VQ (SVQ) is optimised using a minimum mean Euclidean reconstruction distortion criterion, as in the LBG case. Numerical simulations are used to demonstrate how this leads to self-organisation of the SVQ, where different stochastically sampled code indices become associated with different input subspaces.

OPEN Invariant stochastic encoders (2000): The theory of stochastic vector quantisers (SVQ) has been extended to allow the quantiser to develop invariances, so that only "large" degrees of freedom in the input vector are represented in the code. This has been applied to the problem of encoding data vectors which are a superposition of a "large" jammer and a "small" signal, so that only the jammer is represented in the code. This allows the jammer to be subtracted from the total input vector (i.e. the jammer is nulled), leaving a residual that contains only the underlying signal. The main advantage of this approach to jammer nulling is that little prior knowledge of the jammer is assumed, because these properties are automatically discovered by the SVQ as it is trained on examples of input vectors.

OPEN Self-organised discovery of structure in signal manifolds (2000): The theory of stochastic vector quantisers (SVQ) has been extended to allow the quantiser to develop invariances, so that only 'large' degrees of freedom in the input vector are represented in the code. This has been applied to the problem of encoding data vectors which are a superposition of a 'large' jammer and a 'small' signal, so that only the jammer is represented in the code. This allows the jammer to be subtracted from the total input vector (i.e. the jammer is nulled), leaving a residual that contains only the underlying signal. The main advantage of this approach to jammer nulling is that little prior knowledge of the jammer is assumed, because these properties are automatically discovered by the SVQ as it is trained on examples of input vectors.

OPEN An adaptive network for encoding data using piecewise linear functions (1999): An objective function that encourages an encoder to have the minimum overall Euclidean reconstruction error is shown to lead to encoders that can be implemented using functions that depend only in a piecewise linear fashion on the input vector. From the neural network viewpoint, the optimal form of the probability that each neuron is the next one to fire is a piecewise linear function of the input vector.

OPEN Optimal response functions in a network of discretely firing neurons (1997): An objective function is defined which models an unsupervised neural network of discretely firing neurons. The objective function can be explicitly optimised with respect to the choice of probabilistic model that describes the neuron properties; this optimisation is done explicitly (i.e. algebraically) in some special cases.

OPEN Partitioned mixture distributions: a first order perturbation analysis (1997): A perturbation analysis is used to linearise the theory of partitioned mixture distribution (PMD) encoder/decoder networks, which allows many connections to be made with previous results. For instance, the difference of Gaussians (DOG) filtering operation commonly used in image processing emerges very naturally, and the dependence of the wavelength of dominance stripes in a visual cortex network (VICON) may then be derived.

OPEN Partitioned mixture distributions: the dynamical case (1997): The theory of static partitioned mixture distribution networks is generalised to the dynamical case. A simulation is presented to demonstrate how this type of network may be used to track moving objects in a cluttered background.

OPEN The cluster expansion: a hierarchical density model (1994): Density modelling in high-dimensional spaces is a difficult problem. In this paper a new model, called the cluster expansion, is proposed and discussed. The cluster expansion scales well to high-dimensional spaces, and it allows the integrals over model parameters that arise in Bayesian predictive distributions to be evaluated explicitly.

OPEN The partitioned mixture distribution: multiple overlapping density models (1994): In image processing problems density models are often used to characterise the local image statistics. In this paper a layered network structure is proposed, which consists of a large number of overlapping mixture distributions. This type of network is called a partitioned mixture distribution (PMD), and it may be used to apply mixture distribution models simultaneously to many different patches of an image.

OPEN Using self-organising maps to classify radar range profiles (1995): A model-based approach to radar range profile classification is presented, and it is shown to be equivalent to training a topographic mapping neural network (see Kohonen) on each of the range profile categories to be classified. The topographic mapping method is basically a Euclidean distance method of classifying range profiles. However, because it is model-based, it offers much more flexibility, and will perform better in situations where there is little training data.

OPEN An adaptive Bayesian network for texture modelling (1993): Bayesian methods are used to analyse the problem of training a model to make predictions about the distribution of data that has yet to be received. Mixture distributions emerge naturally from this framework, but are not well-matched to high-dimensional problems such as arise in image processing applications. An extension to partitioned mixture distributions (PMD) is presented, which is essentially a set of overlapping mixture distributions, and an expectation-maximisation training algorithm is derived. Finally, the results of some numerical simulations are presented, which demonstrate that lateral inhibition arises naturally in PMDs, and that the nodes in a PMD co-operate in such a way that each mixture distribution receives a full complement of what is needed for it to compute a mixture distribution.

OPEN An adaptive Bayesian network for low-level image processing (1993): Probability calculus, based on the axioms of inference (Cox), is the only consistent scheme for performing inference; this is also known as Bayesian inference. The objects which this approach manipulates, namely probability density functions (PDFs), may be created in a variety of ways, but the focus of this paper is on the use of adaptive PDF networks. Adaptive mixture distribution (MD) networks are already widely used (Luttrell). In this paper an extension of the standard MD approach is presented; it is called a partitioned mixture distribution (PMD). PMD networks are designed specifically to scale sensibly to high-dimensional problems, such as image processing. Several numerical simulations are performed which demonstrate that the emergent properties of PMD networks are similar to those of biological low-level vision processing systems.

OPEN Adaptive Bayesian networks (1992): The theory of adaptive Bayesian networks is summarised. A detailed discussion of the Adaptive Cluster Expansion (ACE) network is presented. ACE is a scalable Bayesian network designed specifically for high-dimensional applications, such as image processing.

OPEN Gibbs distribution theory of adaptive n-tuple networks (1992): In this paper it is demonstrated that the theory of optimising Gibbs distributions leads to training schemes for n-tuple networks. Both unsupervised and supervised networks emerge naturally from this analysis.

OPEN Self-supervised training of hierarchical vector quantisers (1991): In (Luttrell) we developed a hierarchical vector quantisation (VQ) model, and in (Luttrell) we successfully applied it to time series and image compression respectively. The goal of this paper is to derive an extension to this model, in which we backpropagate signals from higher to lower layers of the hierarchy to self-supervise the training of the VQ. We review the basic properties of our VQ model and its relationship to neural network methods. We extend the model to an ensemble of VQs, and we derive its properties in the limit of a large codebook size (i.e. the continuum limit). Finally, we demonstrate how self-supervision emerges naturally in this type of model.

OPEN A hierarchical network for clutter and texture modelling (1991): The presence of clutter complicates the location of targets in time series and images. Various types of adaptive clutter model have been proposed to deal with this problem. In this paper we treat clutter as a type of texture, and we propose a novel type of hierarchical Gibbs distribution texture model. To optimise this type of model, we define a relative entropy cost function which we decompose into a sum over a number of terms, each of which can be interpreted as the mutual information between clusters of samples of the data. Furthermore we show how the various terms of this cost function can be used to construct an image-like representation of the relative entropy. Finally, using a Brodatz texture image, we present an example of this type of decomposition, and demonstrate that a statistical anomaly in the Brodatz texture image can be easily located.

OPEN Hierarchical self-organising networks (1989): Most neural networks are parametric models, the parameter values of which are chosen (by some training scheme) to optimise some appropriately chosen cost function. We shall derive a training scheme for a non-parametric neural network, which leads to the vector quantiser. Then we shall introduce a new principle - the robust hidden layer principle - in order to relate the vector quantiser to self-organising neural networks. Finally we shall demonstrate how hierarchical self-organising neural networks may be constructed by further application of the robust hidden layer principle.

OPEN Self-organisation: a derivation from first principles of a class of learning algorithms (1989): We present a novel derivation of Kohonen's topographic mapping learning algorithm. Thus we prescribe a vector quantiser by minimising an L2 reconstruction distortion measure. We include in this distortion a contribution from the code noise which corrupts the output of the vector quantiser. Such code noise models the expected distorting effect of later stages of processing, and thus provides a convenient way of ensuring that the vector quantiser acquires a useful coding scheme. The neighbourhood updating scheme of Kohonen's self-organising neural network emerges as a special case of this code noise model. This reformulation of Kohonen's algorithm provides a simple interpretation of the role of the neighbourhood update scheme which is used.

OPEN The use of Bayesian and entropic methods in neural network theory (1988): There has been much interest recently in the use of neural networks to solve complicated information processing problems such as those which arise in signal and image processing. In this paper we review Markov random field (MRF) neural network techniques for representing joint probability density functions (PDF). The 'Boltzmann machine' serves as the paradigm, and we present a generalised version of its learning algorithm. We also present a technique for designing MRF potentials with low information redundancy for modelling image texture. To improve further the computational efficiency of such neural networks we introduce a novel method of cluster decomposing a PDF by using topographic mappings. The outcome of this programme is a means of designing sampling functions for extracting information from datasets (typically images).

OPEN Self-organising multilayer topographic mappings (1988): We use a minimum distortion measure argument to derive the need for topographic mappings in unsupervised multilayer networks, and we perform some numerical experiments to demonstrate the power of multilayer topographic mappings.

OPEN Image compression using a neural network (1988): Data compression of speckled images poses a non-trivial model identification problem. We train an unsupervised neural network on a set of archetype images in order to form an internal representation (or model) of the image features. We find that a multi-layer topographic mapping network has the necessary properties successfully to compress and reconstruct imagery. We show how to extend and improve upon existing learning algorithms for this type of network, and we express the network learning dynamics as a diffusion equation. We then present some examples of the application of this technique to synthetic aperture radar images.

OPEN Markov random fields: a strategy for clutter modelling (1987): We briefly review the need for models of image texture (or clutter). The use of detailed physical models is prohibited by the difficulty of characterising the scatterer distribution and the scattering process which cause the texture, and so we resort to phenomenological modelling. We propose that certain statistics of the image be measured, and that synthetic images which are consistent with these measurements be generated by a Monte Carlo method - this allows us to test the need for further statistics. We point out the relationship of this scheme to the well-known maximum entropy method for constructing least committal probability distributions. Furthermore we show how a Markov random field model is generated by this process. Finally we suggest some methods of selecting useful statistics, which we demonstrate by analysing a synthetic aperture radar image of a wood and a sonar image of the sea bed.

OPEN The use of Markov random field models in sampling scheme design (1987): I use information theoretic techniques to derive schemes for the Bayesian analysis of images with spatially homogeneous statistical properties. In any particular case the scheme is equivalent to deducing the structure of the Markov random field which models the data. This scheme may also be viewed as a generalised sampling technique where the data is reduced by a set of sampling functions to a more compact set of data, which nevertheless retains all the information content of the original data.

OPEN Designing Markov random field structures for clutter modelling (1987): A thorough understanding of clutter statistics is a prerequisite for the successful analysis of radar images. Usually very simple statistics such as moments and correlation properties are used, perhaps based on an underlying physical model of the scattering and imaging process. In this paper we use the maximum entropy method to reconstruct clutter probability density functions (PDF) from observed statistical properties; this leads to representations of clutter in terms of Markov random fields (MRF). Furthermore we show how the set of statistics which is used for each clutter type may be optimised in order to yield a more compact probabilistic model. The principal advantage of our results is that MRF clutter models may be mapped directly onto parallel image processing hardware, and they provide a rigorous framework for Bayesian decision making concerning the presence of objects embedded in clutter. Image segmentation is another very useful application of these MRF models.

OPEN Markov random field image models of targets and clutter (1987): We review Markov random field theories of radar clutter, and we explain their structure in terms of sampling functions and information theory. We explain some recent advances in Markov random field theories of target imaging, and explain their relationship in information theoretic terms to super-resolution.

OPEN The performance of a parallel super-resolution algorithm for synthetic aperture radar images (1989): In a previous paper we have described an effective super-resolution algorithm for the detailed analysis of portions of a SAR image. That paper also discussed indirect implementation techniques for the algorithm, with the aim of reducing the very high cost of the algorithm when applied to realistic size images. Despite the large speedups made, even the indirect algorithm remains expensive. We therefore describe here some initial experiments with parallel versions of the original (direct) algorithm, on both the AMT DAP and on transputer arrays. The results show that the algorithm can be very effectively implemented on either architecture.

OPEN A comparative study of the AMT DAP and of transputer array architectures for the super-resolution of synthetic aperture radar images (1988): The processing of SAR images is made more difficult by the high signal:noise ratio inherent in the technique used to form the image. As a result of this high ratio, the mathematical techniques used tend to differ from the standard techniques used in other image processing applications, and tend also to be computationally intensive. There is then obvious interest in the use of parallel architectures to speed up the processing. We consider in this paper the advantages and disadvantages of a SIMD machine (the AMT DAPS10) and a MIMD machine (a transputer array such as the Meiko Computing Surface) for SAR image processing. To provide a concrete test problem, we look at a specific problem, that of Image Super-Resolution.

Book Chapters

OPEN Using stochastic vector quantisers to characterise signal and noise subspaces (2002): In this paper a stochastic generalisation of the standard Linde-Buzo-Gray (LBG) approach to vector quantiser (VQ) design is presented, in which the encoder is implemented as the sampling of a vector of code indices from a probability distribution derived from the input vector, and the decoder is implemented as a superposition of reconstruction vectors. This stochastic VQ (SVQ) is optimised using a minimum mean Euclidean reconstruction distortion criterion, as in the LBG case. Numerical simulations are used to demonstrate how this leads to self-organisation of the SVQ, where different stochastically sampled code indices become associated with different input subspaces.
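
The encode/decode cycle described in this abstract can be sketched as follows. This is only an illustrative sketch: the softmax form of the posterior, the `sharpness` parameter, the use of an average as the "superposition", and all function names are assumptions made here, not the scheme used in the paper.

```python
import math
import random


def posterior(x, code_vectors, sharpness=1.0):
    """Probability over code indices derived from the input vector x.

    Assumption: a softmax of negative squared Euclidean distance.
    """
    scores = [-sharpness * sum((a - b) ** 2 for a, b in zip(x, c))
              for c in code_vectors]
    top = max(scores)  # subtract the maximum for numerical stability
    unnormalised = [math.exp(s - top) for s in scores]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def encode(x, code_vectors, n_samples, rng, sharpness=1.0):
    """Stochastic encoder: sample a vector of code indices."""
    probs = posterior(x, code_vectors, sharpness)
    return rng.choices(range(len(code_vectors)), weights=probs, k=n_samples)


def decode(indices, reconstruction_vectors):
    """Decoder: superpose (here, average) the selected reconstruction vectors."""
    dim = len(reconstruction_vectors[0])
    return [sum(reconstruction_vectors[i][d] for i in indices) / len(indices)
            for d in range(dim)]


rng = random.Random(0)
code_vectors = [[0.0, 0.0], [2.0, 4.0], [5.0, 1.0]]
x = [1.8, 3.5]
indices = encode(x, code_vectors, n_samples=5, rng=rng)
reconstruction = decode(indices, code_vectors)
```

In a trained SVQ the code vectors and reconstruction vectors would be optimised to minimise the mean Euclidean reconstruction distortion; here they are fixed by hand purely to show the data flow.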

OPEN Self-organised modular neural networks for encoding data (1999): It is shown how a neural network can be optimised so that multiple interlinked network modules emerge by self-organisation. The processing task chosen to illustrate this is encoding high-dimensional data, such as images, where multiple network modules implement a factorial encoder, in which the high-dimensional data space is broken up into a number of low-dimensional subspaces, each of which is separately encoded. This type of factorial encoder emerges through a process of self-organisation, provided that the input data lies on a curved manifold, as is indeed the case in image processing applications.

OPEN A theory of self-organising neural networks (1997): The purpose of this paper is to present a probabilistic theory of self-organising networks based on the results published in (Luttrell). This approach allows vector quantisers and topographic mappings to be treated as different limiting cases of the same theoretical framework. The full theoretical machinery allows a visual cortex-like network to be built.

OPEN The emergence of dominance stripes and orientation maps in a network of firing neurons (2000): This chapter addresses the problem of training a self-organising neural network on images derived from multiple sources; this type of network potentially may be used to model the behaviour of the mammalian visual cortex (for a review of neural network models of the visual cortex see Swindale). The network that will be considered is a soft encoder which transforms its input vector into a posterior probability over various possible classes (i.e. alternative possible interpretations of the input vector). This encoder will be optimised so that its posterior probability is able to retain as much information as possible about its input vector, as measured in the minimum mean square reconstruction error (i.e. L2 error) sense (Luttrell).

OPEN Designing analysable networks (1995): In this section a unified theoretical model of unsupervised neural networks based on the theoretical ideas in (Luttrell) is presented. The analysis starts with a probabilistic model of the discrete neuron firing events that occur when a set of neurons is exposed to an input vector, and then uses Bayes' theorem to build a probabilistic description of the input vector from knowledge of the firing events. This sets the scene for unsupervised training of the network, by minimisation of the expected value of a distortion measure between the true input vector and the input vector inferred from the firing events. Various models of this type are investigated. For instance, if the model of the neurons permits firing to occur only within a defined cluster of neurons, and further, that only one firing event is observed, then the theory approximates the well-known topographic mapping network (Kohonen).

OPEN Inference Theory (1989): This transcript reviews the need for low-level stochastic image models, especially for coherent images which are corrupted by speckle noise. A unified approach to inference in such models using Bayes' rule and Shannon's information theory is presented. In the gaussian PDF approximation exact results for sampling scheme optimisation and for inference from sample values are derived (super-resolution in particular is concentrated on). For non-gaussian PDFs the method of graphs for constructive PDF generation is outlined, and several research trends are indicated.

OPEN Prior knowledge in synthetic aperture radar (SAR) processing (1985): All that we know of the world about us has been deduced from data collected by our senses (eyes, ears, etc), and those of our ancestors. Usually we employ sophisticated transducers (microscopes, telescopes, radars, etc) to transform signals into a form which our senses can 'see', therefore we require an accurate model of the transducer(s) if we are to know how our sensory data is acquired. Furthermore we require a model of the source of the signals being transduced in order to 'make sense of' this data. We shall call this model 'prior knowledge', because it exists in advance of acquiring the data. Such prior knowledge manifests itself with various degrees of complication. In some cases the model may leave very little undetermined, and it remains to determine the values of a finite number of parameters from the data. This type of situation arises when we take measurements of a well known and controlled phenomenon. In other cases the model may provide only a weak constraint on the source of the signal. This may be for two basic reasons: we are ignorant of what we are observing, or we have a detailed model but it contains a large number of parameters. It could be argued that these are the same reason! In all cases however the model is essential if the data is to be interpreted at all.

RSRE/DRA/DERA Reports

OPEN Invariance discovery by vector quantisation of noisy data (2000): This report demonstrates how a vector quantiser for encoding noisy or distorted data, with the intention of eventually recovering the undistorted data, may be trained to produce results that are invariant with respect to the unwanted noise degrees of freedom. This is an example of self-organised invariance (or symmetry) discovery.

OPEN Invariance discovery by vector quantisation of noisy data (technical appendix) (2000): This report demonstrates how a vector quantiser for encoding noisy or distorted data, with the intention of eventually recovering the undistorted data, may be trained to produce results that are invariant with respect to the unwanted noise degrees of freedom. This is an example of self-organised invariance (or symmetry) discovery.

OPEN A user's guide to stochastic encoder/decoders (1999): The overall goal of this research is to develop the theory and practice of self-organising networks that can discover objects and correlations in data, and the application of this to the fusion of data derived from multiple sensors. The purpose of this report is to give a practical introduction to self-organising stochastic encoder/decoders, in which each input vector is encoded as a stochastic sequence of code indices, and then decoded as a superposition of the corresponding sequence of code vectors. Mathematica software for implementing this type of encoder/decoder is presented, and numerical simulations are run to illustrate a variety of emergent properties.

OPEN Encoding data from a curved manifold (1998): In this report it is shown how a neural network can be optimised so that a factorial encoder (which splits the input into its constituents) emerges as the preferred scheme for encoding data derived from a curved input manifold. This is a useful general result, because input manifolds are usually curved, and also factorial encoders require far fewer coding resources than brute force quantisation of the whole input manifold.

OPEN A Bayesian analysis of complementary approaches to training self-organising stochastic networks (1998): It is shown that two apparently different approaches to the optimisation of unsupervised neural networks both emerge from the same theory of network optimisation. The objective function which achieves this unification measures the average Euclidean reconstruction distortion that occurs when the network encodes its input, and then subsequently attempts to reconstruct its input. The two approaches then emerge as different ways of encoding the input using the same network: either a fixed number of neural firing events is used as the coded version of the input, or all the firing events that occur during a fixed time interval are used.

OPEN A unified theory of density models and auto-encoders (1997): This report introduces an objective function for simultaneously optimising the density model and transition matrices of a Markov source. The chosen objective function seeks to minimise the average total number of bits that is required to encode the joint state of the Markov source. This may be applied to the problem of optimising the bottom-up (recognition model) and top-down (generative model) connections in a multilayer neural network. This approach unifies many previous results on the optimisation of multilayer unsupervised neural networks.

OPEN Spatio-temporal inference theory: final report (1 April 1994-31 March 1997) (1997): This report summarises all of the research that has been done in the 'Spatio-Temporal Inference Theory' research programme during the period 1 April 1994-31 March 1997, and explains the relationship between the various strands of research.

OPEN A self-organising visual cortex network (VICON) for processing data from two imaging sensors (1996): A self-organising neural network is presented that is based on a rigorous Bayesian analysis of the information contained in individual neural firing events. This leads to a visual cortex network (VICON) that has many of the properties that emerge when a mammalian visual cortex is exposed to data arriving from two imaging sensors (i.e. the two retinae), such as dominance stripes and orientation maps. Eventually, these techniques might be extended to an arbitrary number of sensors to yield a self-organising data fusion network.

OPEN Optimal posterior probabilities for self-organised neural networks (1996): This report presents some results for the explicit algebraic optimisation of encoder/decoder neural networks. The optimal forms of the neuron response functions are derived in many cases. If the network is allowed to make only one attempt at encoding each input pattern then the optimal network is winner-take-all, whereas if more than one attempt is allowed then the optimal network has more than one winning neuron. These results on multiple attempts at encoding may be used to improve the performance of a vector quantiser without greatly increasing the size of the codebook, as would normally be the case. This is a probabilistic theory of vector quantisers.

OPEN Dynamical partitioned mixture distributions: an introduction (1995): This report describes how to apply a partitioned mixture distribution (PMD) to the problem of tracking a target in clutter. The basic theoretical machinery of PMDs is first extended to allow it to be applied to data which consists of a time series of input vectors: this is called a dynamical PMD. Various approximations to the theory are then made in order to obtain a dynamical system that closely approximates the required dynamical PMD, whilst having a much simpler theoretical form. Computer simulations are then presented which demonstrate the basic capabilities of a dynamical PMD in the context of target tracking problems.

OPEN A componential self-organising neural network (1995): The Bayesian analysis of self-organising maps described in (Luttrell) and the high-dimensional mixture distribution model described in (Luttrell) are used to design a componential self-organising neural network. Numerical simulations demonstrate the emergence of centre-on/surround-off and centre-off/surround-on type neuron responses, and the ability of the network to 'explain' data in terms of multiple 'causes'.

OPEN Range profile classification using topographic mappings (1995): In this report a model-based approach to range profile classification is presented, and it is shown to be equivalent to training a topographic mapping neural network on each of the range profile categories to be classified. The topographic mapping method is very similar to the more empirical cross-correlation method of classifying range profiles. However, because it is model-based, it offers much more flexibility, and will perform better in situations where there is little training data.

OPEN A self-organising network for processing data from multiple sensors (1995): In this report a novel self-organising network is derived from first principles, and numerical simulations are performed to demonstrate some of its emergent properties. The network's most important property is its ability to configure itself to process data from more than one source. The network is a combination of a folded Markov chain and a partitioned mixture distribution, and it exhibits properties that are analogous to those observed in the mammalian visual cortex, such as dominance stripes. This report focusses on a 2-layer 1-dimensional array of network nodes, but the approach may be generalised to multi-layer networks and higher dimensional arrays of nodes.

OPEN The use of mixture distributions to model the density of speckled data (1994): The aim of this work is to produce a theoretically rigorous and computationally tractable means of adaptively modelling the statistical properties of speckled data. The longer-term aim of this work is to lay the groundwork for more sophisticated adaptive models that will be introduced in the future. For the purpose of demonstrating principles the assumption is made that the data is samples of the modulus-squared of complex numbers (i.e. intensities), and that the effect of speckle can be modelled as a multiplicative negative exponential noise process. The method used is maximum likelihood optimisation of a mixture distribution model, in which the components of the mixture are negative exponentials of various decay rates. The programming language Mathematica is used to implement this approach. The results obtained show that negative exponential mixture distributions can very closely model the statistical properties of speckled data. The number of mixture distribution components need not be excessively high to obtain good results.
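
As an illustration of maximum likelihood fitting of a mixture of negative exponentials, the following is a minimal sketch using the standard EM updates for the density p(x) = sum_k w_k r_k exp(-r_k x). It is written in Python rather than the Mathematica used in the report, and the function name, starting values and synthetic two-level data are assumptions made here for illustration only.

```python
import math
import random


def fit_exponential_mixture(data, rates, weights, n_iter=50):
    """EM for a mixture of negative exponentials with decay rates r_k.

    Returns the fitted weights, the fitted rates, and the log-likelihood
    evaluated at the start of each iteration.
    """
    weights = list(weights)
    rates = list(rates)
    history = []
    for _ in range(n_iter):
        resp_sum = [0.0] * len(rates)    # sum of responsibilities per component
        resp_x_sum = [0.0] * len(rates)  # responsibility-weighted sum of x
        log_lik = 0.0
        # E-step: posterior probability of each component for each sample.
        for x in data:
            joint = [w * r * math.exp(-r * x) for w, r in zip(weights, rates)]
            total = sum(joint)
            log_lik += math.log(total)
            for k, j in enumerate(joint):
                g = j / total
                resp_sum[k] += g
                resp_x_sum[k] += g * x
        history.append(log_lik)
        # M-step: closed-form maximum likelihood updates.
        weights = [s / len(data) for s in resp_sum]
        rates = [s / sx for s, sx in zip(resp_sum, resp_x_sum)]
    return weights, rates, history


# Synthetic speckled intensities: an underlying cross section (hypothetical
# values 1 and 10) multiplied by unit-mean negative exponential noise.
rng = random.Random(0)
cross_sections = [1.0, 10.0]
data = [rng.choice(cross_sections) * rng.expovariate(1.0) for _ in range(2000)]
weights, rates, history = fit_exponential_mixture(
    data, rates=[0.5, 2.0], weights=[0.5, 0.5])
```

The log-likelihood values in `history` should never decrease from one iteration to the next, which is a convenient check that the updates have been implemented correctly.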

OPEN The application of mixture distributions to the detection of linear anomalies in cluttered images: a preliminary survey (1993): This memorandum outlines a new method of detecting anomalies (specifically, linear anomalies) in cluttered images. The novelty of the proposed technique lies in its use of so-called probability images (derived from mixture distributions) as an intermediate representation, to which standard Hough transform techniques are applied in order to locate linear features. The proposed technique is simple enough to be readily implemented in any standard image processing package.
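
The second stage of this technique, applying a Hough transform to a probability image, can be sketched as below. The sketch assumes the probability image is already available as a 2-D list of values in [0, 1] (how it is derived from a mixture distribution is not shown), and it lets each pixel vote with a weight equal to its probability. The weighting rule, the function names and the toy image are assumptions made here.

```python
import math


def hough_lines(prob_image, n_theta=180):
    """Accumulate weighted votes for lines rho = x*cos(theta) + y*sin(theta)."""
    height = len(prob_image)
    width = len(prob_image[0])
    rho_max = int(math.ceil(math.hypot(width, height)))
    thetas = [math.pi * t / n_theta for t in range(n_theta)]
    # Rows index rho (offset by rho_max so negative rho fits); columns index theta.
    accumulator = [[0.0] * n_theta for _ in range(2 * rho_max + 1)]
    for y, row in enumerate(prob_image):
        for x, p in enumerate(row):
            if p <= 0.0:
                continue
            for t, theta in enumerate(thetas):
                rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
                accumulator[rho + rho_max][t] += p
    return accumulator, thetas, rho_max


def strongest_line(prob_image, n_theta=180):
    """Return (rho, theta) of the accumulator cell with the most votes."""
    accumulator, thetas, rho_max = hough_lines(prob_image, n_theta)
    best = max(
        ((votes, r - rho_max, thetas[t])
         for r, row in enumerate(accumulator)
         for t, votes in enumerate(row)),
        key=lambda item: item[0],
    )
    return best[1], best[2]


# Toy probability image: weak background with a horizontal linear anomaly at y = 7.
image = [[0.05] * 20 for _ in range(20)]
image[7] = [0.9] * 20
rho, theta = strongest_line(image)
```

For this toy image the strongest cell lies at rho = 7 with theta within one or two angular bins of pi/2, since adjacent angle bins can collect the same votes at this resolution.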

OPEN The Markov chain theory of vector quantisers (1993): In this paper a rigorous approach will be taken, in which Bayesian methods are used to analyse some of the properties of a special type of Markov chain. The forward transitions through the chain are followed by inverse transitions (using Bayes' theorem) backwards through a copy of the same chain; this is called a folded Markov chain. If an appropriately defined Euclidean distortion (between the original input and its 'reconstruction' via Bayes' theorem) is minimised in the space of Markov chain transition probabilities, then the theory of vector quantisers and topographic vector quantisers emerges, and the theory of self-supervision in multi-layer unsupervised networks also emerges. This approach is much more compelling than one in which these models are proposed as if they were logically independent constructs. Only the 2 and 3-layer cases are studied in this paper.

OPEN Partitioned mixture distributions: an introduction (1992): This memorandum contains an introduction to the use of 'probability images' as a standard format for the data stored in the layers of multilayer image processing networks. It also contains an introduction to a new type of network model, called the 'partitioned mixture distribution', which is a generalisation of the standard mixture distribution model. Unlike the standard model, this generalisation scales well and is suitable for applying to image-sized datasets. A simple derivation shows that probability images emerge naturally from partitioned mixture distribution models. Furthermore, a simple numerical simulation shows how the nodes in this type of network adapt so as to form a type of 'orientation map', in which each local patch of nodes contains in a spatially ordered fashion all of the machinery that is needed to process a local patch of the image.

OPEN Code vector density in topographic mappings (1992): In this memorandum we present an informally argued derivation of the properties of topographic vector quantisers in the limit of a large codebook size. In particular, we prove that the code vector density does not depend on one's choice of neighbourhood function, provided that we use the minimum distortion (rather than the nearest neighbour) encoding prescription. This result suggests that widespread use of the nearest neighbour prescription in topographic mapping networks is fundamentally misguided. It would be advisable to remember that the nearest neighbour prescription is assumed, not derived, so its adherents must accept defeat gracefully.
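
The difference between the two encoding prescriptions can be illustrated with a small sketch. It assumes the noise-robust picture in which the neighbourhood function is a matrix whose entry `neighbourhood[y][y2]` is the probability that code index `y` is corrupted into `y2`; the minimum distortion prescription then chooses the index with the smallest expected distortion after corruption. The function names and the three-entry codebook are assumptions made here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest_neighbour_index(x, code_vectors):
    """Choose the code vector closest to x, ignoring the neighbourhood."""
    return min(range(len(code_vectors)),
               key=lambda y: sq_dist(x, code_vectors[y]))


def minimum_distortion_index(x, code_vectors, neighbourhood):
    """Choose the index whose expected distortion, averaged over the
    neighbourhood function, is smallest."""
    def expected_distortion(y):
        return sum(p * sq_dist(x, code_vectors[y2])
                   for y2, p in enumerate(neighbourhood[y]))
    return min(range(len(code_vectors)), key=expected_distortion)


code_vectors = [[0.0], [1.0], [4.0]]
neighbourhood = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
x = [0.6]
nn = nearest_neighbour_index(x, code_vectors)                  # index 1
md = minimum_distortion_index(x, code_vectors, neighbourhood)  # index 0
```

Here the nearest code vector is index 1, but index 1 can leak to the distant code vector at 4.0, so the minimum distortion prescription prefers index 0. With an identity neighbourhood matrix the two prescriptions coincide.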

OPEN Self-supervision in multilayer adaptive networks (1991): We theoretically derive and numerically simulate a new phenomenon called self-supervision, in which the higher layers of a multilayer unsupervised network control the optimisation of the lower layers, even when there is no external supervising teacher present. Self-supervision is a very convenient hybrid, which combines the best properties of unsupervised and supervised network training algorithms.

OPEN A trainable texture anomaly detector using the Adaptive Cluster Expansion (ACE) method (1990): We derive an adaptive hierarchical maximum entropy estimate of probability density functions, whose mathematical structure suggests the name 'adaptive cluster expansion' (ACE). We apply ACE to the problem of locating statistically anomalous regions in otherwise homogeneous textured images, which we demonstrate using several images of Brodatz textures.

OPEN Asymptotic code vector density in topographic vector quantisers (1990): In this memorandum we use a noise-robust vector quantiser model to derive expressions for the asymptotic code vector density ρ in various types of topographic vector quantisers. A topographic vector quantiser is not identical to a standard (i.e. Kohonen) topographic mapping, but the differences are minimal. In all the cases that we study (scalar and vector quantisation with various symmetric topographic neighbourhoods) we obtain the asymptotic result ρ ∝ P^(N/(N+2)), where N is the input dimensionality and P is the input probability density. Thus the asymptotic code vector densities of a topographic vector quantiser and a standard vector quantiser are the same.

OPEN A Bayesian derivation of an iterative autofocus/super-resolution algorithm (1989): We derive an estimate-maximise (EM) formulation of a Bayesian super-resolution algorithm for reconstructing scattering cross sections from coherent images - this renders obsolete all past attempts to derive such a scheme. We extend the analysis to the case of simultaneous super-resolution and autofocussing, which corrects the damage caused by an uncertain point spread function. Finally, we demonstrate our method by presenting the results of some simple numerical experiments.

OPEN The complex point spread function of the RSRE SAR (1987): Super-resolution of SAR data requires that the complex point spread function (PSF) of the SAR be accurately known. We propose and implement a method of measuring the PSF which uses existing SAR images of targets as raw data. We find that in addition to the expected distortions of the azimuth response which arise from swing of the antenna and from artefacts of motion compensation processing, there are extensive and unexpected distortions of the range response. Most importantly the range response does not obey the principle of superposition (i.e. it is non-linear), and so SAR PSF calibration must be conducted with care. We estimate the complex SAR PSF when a bright target is present.

OPEN A compact digital communications system part 1: rapid carrier acquisition (1986): The problem of reducing the carrier centre frequency ambiguity in low data rate satellite communications is well known. Prior solutions to this problem have been slow, bulky or expensive. Our proposed solution does not have these drawbacks, and it is easily implemented in readily available digital hardware. This is achieved by using a 1 bit digital emulation of a bank of forced LCR oscillators to achieve an approximate Fourier decomposition.

OPEN The implications of Boltzmann-type machines for SAR data processing: a preliminary survey (1985): We propose that Markov random field models (MRFs) be used as a framework within which to construct models of synthetic aperture radar (SAR) images. We clarify the relationship between this class of models and the Boltzmann machine (BM) of artificial intelligence. We then generalise the BM training procedure and use it to train MRF models. Using this technique we investigate the ability of a simple MRF texture model to learn a texture by maximising a relative entropy objective function. We find that the marriage of MRF models with the BM training procedure is fruitful.

DRA/RSRE Research Notes

OPEN The detection of linear anomalies in images (1992): This report outlines a new method of detecting anomalies (specifically, linear anomalies) in images. The novelty of the proposed technique lies in its use of so-called probability images (derived from mixture distributions) as an intermediate representation, to which standard Hough transform techniques are applied in order to locate linear features. The proposed technique is simple enough to be readily implemented in any standard image processing package.

OPEN A maximum entropy interpretation of random access memory network computations (1990): We briefly summarise the maximum entropy method and its application to the problem of assigning a probability measure in a state space. We show how one can impose constraints on the marginal probabilities to guarantee that the maximum entropy solution can be interpreted in terms of random access memory network (specifically WISARD) computations. We thus achieve a unification of the theoretical analyses of this type of network and Gibbs distribution (or Markov random field) networks.

OPEN Error measures in adaptive networks (1990): The main purpose of this note is to remove the need to assume that the output of a network has to be trained using an arbitrary choice of error measure, such as L2. The solution to this problem is subtle and indirect, and it requires one to embed the network model in a Bayesian framework. Thus we develop a Bayesian framework for constructing data generation models which may be inverted to yield the posterior probability over classes, and show how maximising relative entropy (of posterior probabilities) leads to adaptive networks that are similar (or identical) to standard network models (such as the multilayer perceptron network, radial basis function network, and hidden Markov model network).

OPEN Vector quantisation of K-distributed data (1990): I apply an extension of the standard LBG vector quantisation algorithm to the problem of encoding K-distributed data, with the intention of eventually recovering the underlying cross section. I also derive some useful approximations for the distortion that this coding scheme introduces.

OPEN Bayesian inference on a tree (1990): Bayesian inference processes information by manipulating probabilities, which, in turn, need to be represented in a form that is amenable to both adaptive training and manipulation. The Boltzmann machine (and its generalisations) is a flexible, but computationally costly, solution to this problem. We propose a computationally cheap replacement in the form of a cluster decomposition of the state space whose probability needs to be represented.

OPEN An EM approach to iterative Bayesian super-resolution with applications to K-distributed data (1989): We derive an expectation-maximisation formulation of an iterative super-resolution algorithm, which complements that presented in. We extend the technique by incorporating prior knowledge of the properties of K-distributed data, and obtain a simple modification to the basic super-resolution algorithm.

OPEN The Gibbs Machine applied to hidden Markov model problems. Part 1: Basic theory (1989): I show how a hidden Markov model can be expressed as a Gibbs distribution. I review my Gibbs distribution training algorithm (a 'Gibbs Machine' rather than a 'Boltzmann Machine'), which I use to perform gradient ascent on the relative entropy between the Gibbs distribution and the data distribution. I demonstrate how this reduces to elementary matrix computations of exactly the same form as encountered in the Baum-Welch re-estimation method. Although this toy problem is amenable to Baum-Welch re-estimation, the same cannot be said of non-tree-like Markov models. In such cases I propose that a hybrid Baum-Welch/Gibbs Machine optimisation scheme should be used.

OPEN The relationship between super-resolution and phase imaging of SAR data (1987): I review the real zero conversion (RZC) method of producing a phase image from complex SAR data, and I find that the phase image so produced is trivially related to the output voltage of the SAR receiver. I argue that the only useful type of image processing must be related to specific questions which one asks of the original SAR data, and so phase imaging must be related to the specific demands of the user. Super-resolution imaging is a technique for processing an image to enhance its bandwidth by the introduction of prior knowledge. I shall present super-resolution in such a way as to make clear its relationship to phase imaging, and also make clear that complex images of targets are likely to contain useful phase information. The principal practical consequence of this work is that phase information in SAR images must be retained in the vicinity of targets, but it may be discarded elsewhere.

OPEN A proposal for a new method of calibrating radar sensitivity (1987): We derive an invariant form for the degree of overlap of a pair of probability density functions. The invariance property is such that the functional form of the overlap expression is the same for all underlying coordinate systems which are related by nonsingular transformations. A particular application of these results is in the definition of a new calibration procedure for quantifying the ability of a radar system to discriminate between signal plus noise and noise alone. The invariance property of our definition ensures that the underlying receiver law (e.g. linear or logarithmic) does not need to be known in order to conduct the calibration. We present a practical means of implementing our new calibration procedure, and for ensuring that it is consistent with the old 'tangential method'.

This page was last updated on Thursday, 05 February 2004.