(************** Content-type: application/mathematica ************** CreatedBy='Mathematica 5.0' Mathematica-Compatible Notebook This notebook can be used with any Mathematica-compatible application, such as Mathematica, MathReader or Publicon. The data for the notebook starts with the line containing stars above. To get the notebook into a Mathematica-compatible application, do one of the following: * Save the data starting with the line of stars above into a file with a name ending in .nb, then open the file inside the application; * Copy the data starting with the line of stars above to the clipboard, then use the Paste menu command inside the application. Data for notebooks contains only printable 7-bit ASCII and can be sent directly in email or through ftp in text mode. Newlines can be CR, LF or CRLF (Unix, Macintosh or MS-DOS style). NOTE: If you modify the data for this notebook not in a Mathematica- compatible application, you must delete the line below containing the word CacheID, otherwise Mathematica-compatible applications may try to use invalid cache data. For more information on notebooks and Mathematica-compatible applications, contact Wolfram Research: web: http://www.wolfram.com email: info@wolfram.com phone: +1-217-398-0700 (U.S.) Notebook reader applications are available free of charge from Wolfram Research. *******************************************************************) (*CacheID: 232*) (*NotebookFileLineBreakTest NotebookFileLineBreakTest*) (*NotebookOptionsPosition[ 208434, 6935]*) (*NotebookOutlinePosition[ 220454, 7298]*) (* CellTagsIndexPosition[ 217713, 7209]*) (*WindowFrame->Normal*) Notebook[{ Cell[CellGroupData[{ Cell["Notes", "Section 1"], Cell[CellGroupData[{ Cell["Editorial Changes", "Subsection"], Cell["\<\ \"Footnotes\" section added to end of the paper to list the single footnote \ that originally appeared at the foot of one page. 
A hyperlink to this \ footnote has also been added to the body of the paper.\ \>", "Text"], Cell["\<\ \"eg\" changed to \"e.g.\", and \"ie\" changed to \"i.e.\" throughout the \ paper.\ \>", "Text"], Cell[TextData[{ "Notation \"", Cell[BoxData[ \(TraditionalForm\`\[SelectionPlaceholder]\_\(\[SelectionPlaceholder]\ \ \[SelectionPlaceholder]\)\)]], "\" changed to \"", Cell[BoxData[ \(TraditionalForm\`\[SelectionPlaceholder]\_\(\[SelectionPlaceholder], \ \[SelectionPlaceholder]\)\)]], "\" throughout the paper." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change1", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change2", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change3", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "Figure moved to a more appropriate point in the text." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["PROBLEM", ButtonData:>"Ed:Problem1", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], Cell[BoxData[ FormBox[ ButtonBox["PROBLEM", ButtonData:>"Ed:Problem2", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "Notation \"", Cell[BoxData[ \(TraditionalForm\`\[SelectionPlaceholder]\_\(\[SelectionPlaceholder]\ \ \[SelectionPlaceholder]\)\)]], "\" used in figure has not yet been changed to \"", Cell[BoxData[ \(TraditionalForm\`\[SelectionPlaceholder]\_\(\[SelectionPlaceholder], \ \[SelectionPlaceholder]\)\)]], "\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change4", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Q", "(", StyleBox["c", FontWeight->"Bold"], ")"}], "/", \(Z(c)\)}], TraditionalForm]]], "\" changed to \"", Cell[BoxData[ FormBox[ FractionBox[ RowBox[{"Q", "(", StyleBox["c", FontWeight->"Bold"], ")"}], \(Z(c)\)], TraditionalForm]]], "\"." 
}], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change5", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"are\" changed to \"are:\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change6", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ FormBox[ RowBox[{ SubscriptBox[ StyleBox["Q", FontWeight->"Plain"], "c"], "/", \(Z\_c\)}], TraditionalForm]]], "\" changed to \"", Cell[BoxData[ FormBox[ FractionBox[ SubscriptBox[ StyleBox["Q", FontWeight->"Plain"], "c"], \(Z\_c\)], TraditionalForm]]], "\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change7", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"essentailly\" changed to \"essentially\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change8", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ FormBox[ RowBox[{"log", "[", RowBox[{ RowBox[{"P", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], "|", StyleBox["c", FontWeight->"Bold"]}], ")"}], "/", RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], "|", StyleBox["c", FontWeight->"Bold"]}], ")"}]}], "]"}], TraditionalForm]]], "\" changed to \"", Cell[BoxData[ FormBox[ RowBox[{"log", "[", FractionBox[ RowBox[{"P", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], "|", StyleBox["c", FontWeight->"Bold"]}], ")"}], RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], "|", StyleBox["c", FontWeight->"Bold"]}], ")"}]], "]"}], TraditionalForm]]], "\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change9", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ \(TraditionalForm\`1/\@N\_0\)]], "\" changed to \"", Cell[BoxData[ \(TraditionalForm\`1\/\@N\_0\)]], "\"." 
}], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change11", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"Whence\" changed to \"whence\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change10", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change12", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ \(TraditionalForm\`log\ 2 \[Pi]\ N\)]], "\" changed to \"", Cell[BoxData[ \(TraditionalForm\`log\ \((2 \[Pi]\ N)\)\)]], "\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change13", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ \(TraditionalForm\`\[Epsilon]\_i/q\_i\)]], "\" changed to \"", Cell[BoxData[ \(TraditionalForm\`\[Epsilon]\_i\/q\_i\)]], "\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change14", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ \(TraditionalForm\`q\_i/N\)]], "\" changed to \"", Cell[BoxData[ \(TraditionalForm\`q\_i\/N\)]], "\", and \"", Cell[BoxData[ \(TraditionalForm\`1/\@N\)]], "\" changed to \"", Cell[BoxData[ \(TraditionalForm\`1\/\@N\)]], "\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change15", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ \(TraditionalForm\`\(\(\[CenterEllipsis]\)\(-\)\)\)]], "\" changed to \"", Cell[BoxData[ \(TraditionalForm\`\(\(-\[CenterEllipsis]\)\(-\)\)\)]], "\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change16", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ \(TraditionalForm\`1/\@N\)]], "\" changed to \"", Cell[BoxData[ \(TraditionalForm\`1\/\@N\)]], "\"." 
}], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change17", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"posterior probability ", Cell[BoxData[ FormBox[ RowBox[{"P", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], "\" changed to \"", Cell[BoxData[ FormBox[ RowBox[{"P", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], "|", StyleBox["c", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], "\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change18", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"cases\" changed to \"cases:\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change19", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"filter\" changed to \"", "filter.", "\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change20", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "Equation reformatted to make it more readable." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change21", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ \(TraditionalForm\`C\^\(1/2\)\)]], "\" changed to \"", Cell[BoxData[ \(TraditionalForm\`C\^\(1\/2\)\)]], "\"." }], "Text"] }, Closed]] }, Closed]], Cell[CellGroupData[{ Cell["\<\ Error measures in adaptive networks \ \>", "Title"], Cell["\<\ Stephen P Luttrell Pattern Processing Principles SP4 division, RSRE Malvern, WORCS, WR14 3PS\ \>", "Author"], Cell["\<\ This paper appeared as SP4 Research Note, No. 
111, 9th March 1990.\ \>", "Text"], Cell["Copyright \[Copyright] Controller HMSO, London, 1990.", "Text"], Cell[TextData[{ StyleBox["Abstract", FontWeight->"Bold"], "\n\nThe main purpose of this note is to remove the need to assume that the \ output of a network has to be trained using an arbitrary choice of error \ measure, such as ", Cell[BoxData[ \(TraditionalForm\`L\_2\)]], ". The solution to this problem is subtle and indirect, and it requires one \ to embed the network model in a Bayesian framework. Thus we develop a \ Bayesian framework for constructing data generation models which may be \ inverted to yield the posterior probability over classes, and show how \ maximising relative entropy (of posterior probabilities) leads to adaptive \ networks that are similar (or identical) to standard network models (such as \ the multilayer perceptron network, radial basis function network, and hidden \ Markov model network)", Cell[BoxData[ FormBox[ ButtonBox["FOOTNOTE", ButtonData:>"Footnote:*", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "." }], "Abstract"], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], " Introduction" }], "Section 1"], Cell["Let us begin with a quotation", "Text"], Cell[TextData[{ "\"", StyleBox["It's the collision of disparate ideas that alters one's \ perspective. This is the key to unlocking our tendency towards traditional or \ superficial thinking that leads to stereotypical solutions.", FontSlant->"Italic"], "\"\nJohn Sculley (1987)" }], "Text", CellFrame->True], Cell["\<\ This summarises exactly what we felt when faced with the torrent of \ alternative network designs for data processing. Conventionally, the attitude \ is to regard a network as a \"black box\" that is somehow going to do the job \ of classifying your data, provided that you can structure and train it the \ right way. This is the attitude of mind of a \"signal processor\". 
On the \ other hand, those who wish to understand what processing is actually going \ on, by deriving their data processing scheme from first principles, have the \ attitude of mind of a \"physicist\" (for want of a better term). A \ theoretical physicist's first reaction when confronted with alternative \ network designs (the \"disparate ideas\" in the quotation) is to try to find \ a viewpoint from which the various alternatives are simply special cases of \ some grander scheme. It is the discovery of this grander scheme that we call \ \"unification\". It represents progress, it simplifies one's way of thinking \ (in the long term), and thus it consolidates separate bodies of work in \ preparation for the next stage of unification (whatever that might be).\ \>", "Text"], Cell[TextData[{ "In ", ButtonBox["\[Section]", ButtonData:>"Sect:2", ButtonStyle->"Hyperlink"], CounterBox["Section", "Sect:2"], " we shall introduce our scheme as a \"recipe\" which can be applied to \ each situation as appropriate. In ", ButtonBox["\[Section]", ButtonData:>"Sect:3", ButtonStyle->"Hyperlink"], CounterBox["Section", "Sect:3"], " we shall present the simplest application of our scheme - the adaptive \ linear filter, and in ", ButtonBox["\[Section]", ButtonData:>"Sect:4", ButtonStyle->"Hyperlink"], CounterBox["Section", "Sect:4"], " we present a Gibbs distribution probability model which extends these \ results to adaptive non-linear filters, which we shall exemplify using the \ two-layer perceptron and radial basis function networks. In ", ButtonBox["\[Section]", ButtonData:>"Sect:5", ButtonStyle->"Hyperlink"], CounterBox["Section", "Sect:5"], " we briefly repeat the results of ", ButtonBox["\[Section]", ButtonData:>"Sect:4", ButtonStyle->"Hyperlink"], CounterBox["Section", "Sect:4"], ", but this time using a mixture distribution probability model. 
Finally, \ in ", ButtonBox["\[Section]", ButtonData:>"Sect:6", ButtonStyle->"Hyperlink"], CounterBox["Section", "Sect:6"], " we shall apply our scheme to discriminative hidden Markov modelling." }], "Text"] }, Closed]], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], " ", "Unified model" }], "Section", CellTags->"Sect:2"], Cell["\<\ In this section we shall present our unification of adaptive network \ algorithms. Strictly speaking, we should derive this unification from first \ principles, but we prefer to supply the reader with a concrete framework \"up \ front\" to act as a motivation for the somewhat mysterious derivations that \ we shall perform in later sections of this note.\ \>", "Text"], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], ".", CounterBox["Subsection"], " ", "Notation" }], "Subsection"], Cell[TextData[Cell[TextData[{ " ", ButtonBox["OPEN", ButtonData:>{ URL[ "http://www.luttrell.org.uk/papers/sp4_111/fig1.gif"], None}, Active->True, ButtonStyle->"Hyperlink"], " " }]]], "NumberedFigure", TextAlignment->Center, CellTags->{"Fig:1", "Ed:Change1"}], Cell["Forward and inverse probabilities", "Caption"], Cell[TextData[{ "First of all we shall define some notation in ", ButtonBox["figure", ButtonData:>"Fig:1", ButtonStyle->"Hyperlink"], " ", CounterBox["NumberedFigure", "Fig:1"], ". ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], " is a vector that represents the data, and ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], " is a vector class label - ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], " and ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], " can be discrete or continuous valued, or some combination of these two. \ We shall use a continuous-valued notation in this note with the proviso that \ the appropriate discrete-valued notation should be used where appropriate \ (e.g. 
replacing integrations by summations", ")", Cell[BoxData[ FormBox[ ButtonBox["FOOTNOTE", ButtonData:>"Footnote:1", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], ".", "\n" }], "Text"], Cell[TextData[{ ButtonBox["Figure", ButtonData:>"Fig:1", ButtonStyle->"Hyperlink"], " ", CounterBox["NumberedFigure", "Fig:1"], " can be interpreted in two ways:" }], "Text"], Cell[TextData[{ "\t\[FilledSmallCircle] Right to left: ", Cell[BoxData[ FormBox[ RowBox[{ StyleBox["c", FontWeight->"Bold"], "\[LongRightArrow]", StyleBox["x", FontWeight->"Bold"]}], TraditionalForm]]], " represents ", StyleBox["physical causation", FontSlant->"Italic"], ". A dataset ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], " is generated causally from a class label ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], ". The properties of this physical process are described by the probability \ ", Cell[BoxData[ FormBox[ RowBox[{"P", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], "|", StyleBox["c", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], "." }], "Text"], Cell[TextData[{ "\t\[FilledSmallCircle] Left to right: ", Cell[BoxData[ FormBox[ RowBox[{ StyleBox["x", FontWeight->"Bold"], "\[LongRightArrow]", StyleBox["c", FontWeight->"Bold"]}], TraditionalForm]]], " represents ", StyleBox["logical inference", FontSlant->"Italic"], ". The knowledge that we can extract from a dataset ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], " about the class ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], " that might have caused it is described by the probability ", Cell[BoxData[ FormBox[ RowBox[{"P", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], "." 
}], "Text"], Cell[TextData[{ "Note the asymmetry between these two interpretations - one is ", StyleBox["physical", FontSlant->"Italic"], " whereas the other is ", StyleBox["logical", FontSlant->"Italic"], "." }], "Text"], Cell[TextData[{ "The two interpretations of ", ButtonBox["figure", ButtonData:>"Fig:1", ButtonStyle->"Hyperlink"], " ", CounterBox["NumberedFigure", "Fig:1"], " are related by Bayes' theorem which we may represent symmetrically as" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"P", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["c", FontWeight->"Bold"]}], ")"}], "=", RowBox[{ RowBox[{ RowBox[{"P", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}]}], "=", RowBox[{ RowBox[{"P", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], "|", StyleBox["c", FontWeight->"Bold"]}], ")"}], RowBox[{"P", "(", StyleBox["c", FontWeight->"Bold"], ")"}]}]}]}], TraditionalForm]], "NumberedEquation", CellTags->"Eq:1"], Cell[TextData[{ "In ", ButtonBox["equation", ButtonData:>"Eq:1", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:1"], ") no distinction is made between the r\[OHat]les of ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], " and ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], " - both ", Cell[BoxData[ FormBox[ RowBox[{ StyleBox["c", FontWeight->"Bold"], "\[LongRightArrow]", StyleBox["x", FontWeight->"Bold"]}], TraditionalForm]]], " and ", Cell[BoxData[ FormBox[ RowBox[{ StyleBox["x", FontWeight->"Bold"], "\[LongRightArrow]", StyleBox["c", FontWeight->"Bold"]}], TraditionalForm]]], " are regarded as logical inferences. 
The asymmetry in ", ButtonBox["figure", ButtonData:>"Fig:1", ButtonStyle->"Hyperlink"], " ", CounterBox["NumberedFigure", "Fig:1"], " arises because of an effect that Bayes' theorem itself knows nothing \ about, namely physical causation", Cell[BoxData[ FormBox[ ButtonBox["FOOTNOTE", ButtonData:>"Footnote:2", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "." }], "Text"] }, Closed]], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], ".", CounterBox["Subsection"], " ", "Designing a network" }], "Subsection", CellTags->"Sect:2.2"], Cell[TextData[{ "We shall now enumerate the various stages in the design of a network that \ computes an approximation to ", Cell[BoxData[ FormBox[ RowBox[{"P", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " - this contains all the information that is needed to classify ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], "." }], "Text"], Cell[TextData[{ "\t1. ", StyleBox["Data generation model", FontSlant->"Italic"], StyleBox[".", FontSlant->"Italic"], " Construct a model ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], "|", StyleBox["c", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " of the data generation process ", Cell[BoxData[ FormBox[ RowBox[{"P", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], "|", StyleBox["c", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], ". 
In general, we do not know, and cannot anticipate, all of the structure \ that ", Cell[BoxData[ FormBox[ RowBox[{"P", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], "|", StyleBox["c", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " might possess, so ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], "|", StyleBox["c", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " can at best be a somewhat degraded representation of the ideal ", Cell[BoxData[ FormBox[ RowBox[{"P", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], "|", StyleBox["c", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], ". However, we should use all means at our disposal (laws of physics, \ symmetries, dimensional analysis, etc) to construct an appropriate model." }], "Text"], Cell[TextData[{ "\t2. ", StyleBox["Posterior probability.", FontSlant->"Italic"], " In practice we need to solve the \"inverse problem\" of deducing ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], " from ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], ". This is related to the above data generation model by Bayes' theorem \ (see ", ButtonBox["equation", ButtonData:>"Eq:1", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:1"], ")). 
Thus the inverse probability ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " is given by" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], "=", RowBox[{ FractionBox[ RowBox[{ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], "|", StyleBox["c", FontWeight->"Bold"]}], ")"}], RowBox[{"Q", "(", StyleBox["c", FontWeight->"Bold"], ")"}]}], RowBox[{"Q", "(", StyleBox["x", FontWeight->"Bold"], ")"}]], "\[LongRightArrow]", FractionBox[ RowBox[{ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], "|", StyleBox["c", FontWeight->"Bold"]}], ")"}], RowBox[{"P", "(", StyleBox["c", FontWeight->"Bold"], ")"}]}], RowBox[{"Q", "(", StyleBox["x", FontWeight->"Bold"], ")"}]]}]}], TraditionalForm]], "NumberedEquation"], Cell[TextData[{ "The replacement ", Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Q", "(", StyleBox["c", FontWeight->"Bold"], ")"}], "\[LongRightArrow]", RowBox[{"P", "(", StyleBox["c", FontWeight->"Bold"], ")"}]}], TraditionalForm]]], " is not obligatory - it should be made if you happen to know the true \ class prior probabilities ", Cell[BoxData[ FormBox[ RowBox[{"P", "(", StyleBox["c", FontWeight->"Bold"], ")"}], TraditionalForm]]], ". ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " is the function that we should ideally compute when presented with a \ dataset ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], " - it contains all the information that is available about ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], "." }], "Text"], Cell[TextData[{ "\t3. 
", StyleBox["Simplified posterior probability.", FontSlant->"Italic"], " Computing ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " raises a number of problems." }], "Text"], Cell[TextData[{ "\t\t\[FilledSmallCircle] ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " is usually a parametric model, whose parameters need to be optimised by \ feeding an adaptive algorithm with examples of ", Cell[BoxData[ FormBox[ RowBox[{"(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["c", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " taken from a \"training set\"", Cell[BoxData[ FormBox[ ButtonBox["FOOTNOTE", ButtonData:>"Footnote:3", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "." }], "Text"], Cell[TextData[{ "\t\t\[FilledSmallCircle] The problem of adaptively training ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " may be too computationally expensive. In this note we shall insist that \ only \"network\" style algorithms should be allowed when computing ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], ". This is clearly a subjective requirement, so no fundamental significance \ should be placed upon the particular choice of \"networks\" that we shall \ use", Cell[BoxData[ FormBox[ ButtonBox["FOOTNOTE", ButtonData:>"Footnote:4", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "." 
}], "Text"], Cell[TextData[{ "The solution to these problems is to introduce an approximation to ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " in the form of a simpler parametric model ", Cell[BoxData[ FormBox[ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]] }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], "\[TildeTilde]", RowBox[{"Q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}]}], TraditionalForm]], "NumberedEquation"], Cell[TextData[{ "whose parameters must now be adapted according to an appropriate training \ algorithm. The meaning of \"simpler\" is evidently subjective, so the \ particular choice of ", Cell[BoxData[ FormBox[ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " will reflect the computing resources that are available. In some cases ", Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], "=", RowBox[{"Q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}]}], TraditionalForm]]], " might be used." }], "Text"], Cell[TextData[{ "\t4. ", Cell[BoxData[ \(TraditionalForm\`G\)]], "-", StyleBox["maximisation", FontSlant->"Italic"], ". 
Optimise the choice of ", Cell[BoxData[ FormBox[ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " by maximising the relative entropy ", Cell[BoxData[ \(TraditionalForm\`G\)]] }], "Text"], Cell[BoxData[ FormBox[ RowBox[{"G", "\[Congruent]", RowBox[{"-", RowBox[{"\[Integral]", RowBox[{ StyleBox[ RowBox[{"d", StyleBox["x", FontWeight->"Bold"]}]], " ", RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}], RowBox[{"\[Integral]", RowBox[{ StyleBox[ RowBox[{"d", StyleBox["c", FontWeight->"Bold"]}]], " ", RowBox[{"P", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], RowBox[{"log", "[", FractionBox[ RowBox[{"P", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}]], "]"}]}]}]}]}]}], "\[LessEqual]", "0"}], TraditionalForm]], "NumberedEquation", SpanMaxSize->Infinity], Cell[TextData[{ "with respect to the parameters contained in ", Cell[BoxData[ FormBox[ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], ". In ", ButtonBox["appendix A", ButtonData:>"Appendix:A", ButtonStyle->"Hyperlink"], " we provide an operational justification for the use of ", Cell[BoxData[ \(TraditionalForm\`G\)]], " as a measure for comparing probabilities." }], "Text"], Cell[TextData[{ "\t5. 
", Cell[BoxData[ \(TraditionalForm\`E\)]], "-", StyleBox["minimisation.", FontSlant->"Italic"], " Relate ", Cell[BoxData[ \(TraditionalForm\`G\)]], " to an \"error measure\" ", Cell[BoxData[ \(TraditionalForm\`E\)]], " by defining" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{"E", "\[Congruent]", RowBox[{"-", RowBox[{"\[Integral]", RowBox[{ StyleBox[ RowBox[{"d", StyleBox["x", FontWeight->"Bold"]}]], " ", RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}], RowBox[{"\[Integral]", RowBox[{ StyleBox[ RowBox[{"d", StyleBox["c", FontWeight->"Bold"]}]], " ", RowBox[{"P", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], "log", " ", RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}]}]}]}]}]}]}], TraditionalForm]], "NumberedEquation", CellTags->"Eq:5"], Cell[TextData[{ "Minimising ", Cell[BoxData[ \(TraditionalForm\`E\)]], " is equivalent to maximising ", Cell[BoxData[ \(TraditionalForm\`G\)]], " (with respect to the choice of ", Cell[BoxData[ FormBox[ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], "). In certain cases ", Cell[BoxData[ \(TraditionalForm\`E\)]], " can take on a familiar form such as average squared error, which is why \ we call ", Cell[BoxData[ \(TraditionalForm\`E\)]], " an \"error measure\" in hindsight." 
}], "Text"], Cell[TextData[{ "Note how we proceed from an explicit (possibly physical) model of data \ generation ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], "|", StyleBox["c", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], ", via Bayes' theorem to obtain the posterior probability ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " (together with a simplified model ", Cell[BoxData[ FormBox[ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], "), and use ", Cell[BoxData[ \(TraditionalForm\`G\)]], "-maximisation to optimise ", Cell[BoxData[ FormBox[ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], ". Up to this point everything is expressed entirely in terms of \ probabilities." }], "Text"], Cell[TextData[{ "We introduce the final stage in which we define an equivalent error \ measure ", Cell[BoxData[ \(TraditionalForm\`E\)]], " in ", ButtonBox["equation", ButtonData:>"Eq:5", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:5"], ") in order to make contact with the conventional way of training adaptive \ models. Note that we do not regard ", Cell[BoxData[ \(TraditionalForm\`E\)]], "-minimisation as the fundamental issue at all, rather we insist on \ developing our model in terms of probabilities (using ", Cell[BoxData[ \(TraditionalForm\`G\)]], "-maximisation) with the expectation that an equivalent error measure will \ emerge from the analysis. 
On the other hand, those who regard ", Cell[BoxData[ \(TraditionalForm\`E\)]], "-minimisation as being fundamental might argue that one could always \ deduce a posterior probability ", Cell[BoxData[ FormBox[ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " that would give rise to their chosen ", Cell[BoxData[ \(TraditionalForm\`E\)]], Cell[BoxData[ FormBox[ ButtonBox["FOOTNOTE", ButtonData:>"Footnote:5", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], ". This is a peculiar way to work - one normally invokes one's model first \ of all, and then deduces its consequences." }], "Text"], Cell[TextData[{ "Note that we shall not make a distinction between computing ", Cell[BoxData[ FormBox[ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " itself and computing the set of functions (statistics) on which ", Cell[BoxData[ FormBox[ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " depends. In some cases it is appropriate to compute ", Cell[BoxData[ FormBox[ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " itself (e.g. a hidden Markov model), and in others the statistics will \ suffice (e.g. a Gaussian probability is determined uniquely by its mean and \ covariance). Most supervised network models output statistics rather than \ probabilities, which we must relate to various statistics of ", Cell[BoxData[ FormBox[ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " in our Bayesian model. 
In ", ButtonBox["appendix B", ButtonData:>"Appendix:B", ButtonStyle->"Hyperlink"], " we summarise the Pitman-Koopman theorem which states the class of \ probability laws that can be completely characterised by \"sufficient \ statistics\"." }], "Text"] }, Closed]] }, Closed]], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], " One-layer network: adaptive linear filter" }], "Section", CellTags->"Sect:3"], Cell["\<\ In this section we shall introduce a Bayesian model that is exactly \ equivalent to an adaptive linear filter.\ \>", "Text"], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], ".", CounterBox["Subsection"], " ", "Bayesian model" }], "Subsection"], Cell["The stages in building this model are:", "Text"], Cell[TextData[{ "\t1. ", StyleBox["Data generation model.", FontSlant->"Italic"], " Let ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], " be a linearly filtered version of ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], " with zero mean additive Gaussian noise superimposed after the filter \ operation. Thus" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], "|", StyleBox["c", FontWeight->"Bold"]}], ")"}], "=", RowBox[{\(1\/\@\(det(2 \[Pi]\ N)\)\), RowBox[{"exp", "[", RowBox[{\(-\(1\/2\)\), SuperscriptBox[ RowBox[{"(", RowBox[{ StyleBox["x", FontWeight->"Bold"], "-", RowBox[{"S", " ", StyleBox["c", FontWeight->"Bold"]}]}], ")"}], "T"], RowBox[{\(N\^\(-1\)\), "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], "-", RowBox[{"S", " ", StyleBox["c", FontWeight->"Bold"]}]}], ")"}]}], "]"}]}]}], TraditionalForm]], "NumberedEquation", SpanMaxSize->Automatic], Cell[TextData[{ "where ", Cell[BoxData[ \(TraditionalForm\`S\)]], " is a linear operator that represents the filter, and ", Cell[BoxData[ \(TraditionalForm\`N\)]], " is a positive definite noise covariance matrix. 
This models ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], " as a \"coloured noise\" process whose mean is determined by ", Cell[BoxData[ FormBox[ RowBox[{"S", " ", StyleBox["c", FontWeight->"Bold"]}], TraditionalForm]]], "." }], "Text"], Cell[TextData[{ "\t2. ", StyleBox["Posterior probability.", FontSlant->"Italic"], " Let us model the prior probability ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", StyleBox["c", FontWeight->"Bold"], ")"}], TraditionalForm]]], " as a zero mean Gaussian with covariance ", Cell[BoxData[ \(TraditionalForm\`A\)]] }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Q", "(", StyleBox["c", FontWeight->"Bold"], ")"}], "=", RowBox[{\(1\/\@\(det(2 \[Pi]\ A)\)\), RowBox[{"exp", "[", RowBox[{\(-\(1\/2\)\), SuperscriptBox[ StyleBox["c", FontWeight->"Bold"], "T"], \(A\^\(-1\)\), StyleBox["c", FontWeight->"Bold"]}], "]"}]}]}], TraditionalForm]], "NumberedEquation"], Cell[TextData[{ "This models ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], " as a \"coloured noise\" process. 
Using Bayes' theorem (see ", ButtonBox["equation", ButtonData:>"Eq:1", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:1"], ")) and performing some simple algebra (see appendix A of [", ButtonBox["2", ButtonData:>"Ref:Luttrell1989a", ButtonStyle->"Hyperlink"], "] where we present an analogous derivation for a coherent imaging model) \ leads eventually to" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], "=", RowBox[{\(1\/\@\(det(2 \[Pi]\ C)\)\), RowBox[{"exp", "[", RowBox[{\(-\(1\/2\)\), SuperscriptBox[ RowBox[{"(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "-", RowBox[{\(L\_0\), StyleBox["x", FontWeight->"Bold"]}]}], ")"}], "T"], RowBox[{\(C\^\(-1\)\), "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "-", RowBox[{\(L\_0\), StyleBox["x", FontWeight->"Bold"]}]}], ")"}]}], "]"}]}]}], TraditionalForm]], "NumberedEquation", SpanMaxSize->Automatic], Cell[TextData[{ "where ", Cell[BoxData[ \(TraditionalForm\`L\_0\)]], " is expressed in terms of the posterior covariance ", Cell[BoxData[ \(TraditionalForm\`C\)]], " of ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], " as" }], "Text"], Cell[BoxData[{ \(TraditionalForm\`C = \[AlignmentMarker]\((\(S\^T\) \(N\^\(-1\)\) S + \ A\^\(-1\))\)\^\(-1\)\), "\n", \(TraditionalForm\`L\_0 = \[AlignmentMarker]C\ \(S\^T\) N\^\(-1\)\)}], "NumberedEquation", TextAlignment->AlignmentMarker, CellTags->"Eq:9"], Cell[TextData[{ "It is also convenient to write ", Cell[BoxData[ \(TraditionalForm\`L\_0\)]], " using the covariance ", Cell[BoxData[ \(TraditionalForm\`M\)]], " of ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], " as" }], "Text"], Cell[BoxData[{ \(TraditionalForm\`M = \[AlignmentMarker]S\ A\ S\^T + N\), "\n", \(TraditionalForm\`L\_0 = \[AlignmentMarker]A\ \(S\^T\) M\^\(-1\)\)}], "NumberedEquation", TextAlignment->AlignmentMarker, 
CellTags->"Eq:10"], Cell[TextData[{ "\t3. ", StyleBox["Simplified posterior probability.", FontSlant->"Italic"], " In this case the computations are simple enough that we may set ", Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], "=", RowBox[{"Q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}]}], TraditionalForm]]], ". However, let us leave the matrix ", Cell[BoxData[ \(TraditionalForm\`L\_0\)]], " to be determined by adaptive training. Thus we shall parameterise a class \ of models using a matrix ", Cell[BoxData[ \(TraditionalForm\`L\)]], " of coefficients" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], "=", RowBox[{\(1\/\@\(det(2 \[Pi]\ C)\)\), RowBox[{"exp", "[", RowBox[{\(-\(1\/2\)\), SuperscriptBox[ RowBox[{"(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "-", RowBox[{"L", " ", StyleBox["x", FontWeight->"Bold"]}]}], ")"}], "T"], RowBox[{\(C\^\(-1\)\), "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "-", RowBox[{"L", " ", StyleBox["x", FontWeight->"Bold"]}]}], ")"}]}], "]"}]}]}], TraditionalForm]], "NumberedEquation"], Cell[TextData[{ "Clearly ", Cell[BoxData[ \(TraditionalForm\`L = L\_0\)]], " will recover ", Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], "=", RowBox[{"Q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}]}], TraditionalForm]]], ", so we have not made any compromises in our simplified model." }], "Text"], Cell[TextData[{ "\t4. ", Cell[BoxData[ \(TraditionalForm\`G\)]], "-maximisation and ", Cell[BoxData[ \(TraditionalForm\`E\)]], "-minimisation. 
We may proceed directly to construct ", Cell[BoxData[ \(TraditionalForm\`E\)]], " (because ", Cell[BoxData[ \(TraditionalForm\`G\)]], " is only trivially different from ", Cell[BoxData[ \(TraditionalForm\`E\)]], "), to obtain" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{"E", "=", RowBox[{ RowBox[{\(1\/2\), RowBox[{"\[Integral]", RowBox[{ StyleBox[ RowBox[{"d", StyleBox["x", FontWeight->"Bold"]}]], " ", StyleBox[ RowBox[{"d", StyleBox["c", FontWeight->"Bold"]}]], " ", RowBox[{"P", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["c", FontWeight->"Bold"]}], ")"}], SuperscriptBox[ RowBox[{"(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "-", RowBox[{"L", " ", StyleBox["x", FontWeight->"Bold"]}]}], ")"}], "T"], RowBox[{\(C\^\(-1\)\), "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "-", RowBox[{"L", " ", StyleBox["x", FontWeight->"Bold"]}]}], ")"}]}]}]}], "+", "constant"}]}], TraditionalForm]], "NumberedEquation"], Cell[TextData[{ "This is an ", Cell[BoxData[ \(TraditionalForm\`L\_2\)]], " error measure, and we may solve in closed form for the ", Cell[BoxData[ \(TraditionalForm\`L\)]], " that minimises ", Cell[BoxData[ \(TraditionalForm\`E\)]], ". 
We obtain (after some algebra that we present in ", ButtonBox["appendix C", ButtonData:>"Appendix:C", ButtonStyle->"Hyperlink"], ") the solution ", Cell[BoxData[ \(TraditionalForm\`L = L\_0\)]], " that minimises ", Cell[BoxData[ \(TraditionalForm\`E\)]], " as" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{\(L\_0\), "=", RowBox[{ RowBox[{"\[LeftAngleBracket]", RowBox[{ StyleBox["c", FontWeight->"Bold"], " ", SuperscriptBox[ StyleBox["x", FontWeight->"Bold"], "T"]}], "\[RightAngleBracket]"}], SuperscriptBox[ RowBox[{"\[LeftAngleBracket]", RowBox[{ StyleBox["x", FontWeight->"Bold"], " ", SuperscriptBox[ StyleBox["x", FontWeight->"Bold"], "T"]}], "\[RightAngleBracket]"}], \(-1\)]}]}], TraditionalForm]], "NumberedEquation", CellTags->"Eq:13"], Cell[TextData[{ "This result does not depend on ", Cell[BoxData[ \(TraditionalForm\`C\)]], ", provided that it is positive definite. The inverse covariance matrix ", Cell[BoxData[ FormBox[ SuperscriptBox[ RowBox[{"\[LeftAngleBracket]", RowBox[{ StyleBox["x", FontWeight->"Bold"], " ", SuperscriptBox[ StyleBox["x", FontWeight->"Bold"], "T"]}], "\[RightAngleBracket]"}], \(-1\)], TraditionalForm]]], " exists because we assumed that the data noise covariance matrix ", Cell[BoxData[ \(TraditionalForm\`N\)]], " was positive definite." }], "Text"] }, Closed]], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], ".", CounterBox["Subsection"], " ", "Relationship between covariances" }], "Subsection"], Cell[TextData[{ "We shall now verify that the ", Cell[BoxData[ \(TraditionalForm\`L\_0\)]], " in ", ButtonBox["equation", ButtonData:>"Eq:9", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:9"], ") and ", ButtonBox["equation", ButtonData:>"Eq:10", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:10"], ") are the same as the ", Cell[BoxData[ \(TraditionalForm\`L\_0\)]], " in ", ButtonBox["equation", ButtonData:>"Eq:13", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:13"], "). 
In the Gaussian model we obtain" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"\[LeftAngleBracket]", RowBox[{ StyleBox["x", FontWeight->"Bold"], " ", SuperscriptBox[ StyleBox["x", FontWeight->"Bold"], "T"]}], "\[RightAngleBracket]"}], "=", "\[AlignmentMarker]", RowBox[{ RowBox[{"\[LeftAngleBracket]", RowBox[{ RowBox[{"(", RowBox[{ RowBox[{"S", " ", StyleBox["c", FontWeight->"Bold"]}], "+", StyleBox["n", FontWeight->"Bold"]}], ")"}], SuperscriptBox[ RowBox[{"(", RowBox[{ RowBox[{"S", " ", StyleBox["c", FontWeight->"Bold"]}], "+", StyleBox["n", FontWeight->"Bold"]}], ")"}], "T"]}], "\[RightAngleBracket]"}], "\[IndentingNewLine]", "=", "\[AlignmentMarker]", RowBox[{ RowBox[{ RowBox[{"S", RowBox[{"\[LeftAngleBracket]", RowBox[{ StyleBox["c", FontWeight->"Bold"], " ", SuperscriptBox[ StyleBox["c", FontWeight->"Bold"], "T"]}], "\[RightAngleBracket]"}], \(S\^T\)}], "+", RowBox[{"\[LeftAngleBracket]", RowBox[{ StyleBox["n", FontWeight->"Bold"], " ", SuperscriptBox[ StyleBox["n", FontWeight->"Bold"], "T"]}], "\[RightAngleBracket]"}], "+", RowBox[{"S", RowBox[{"\[LeftAngleBracket]", RowBox[{ StyleBox["c", FontWeight->"Bold"], " ", SuperscriptBox[ StyleBox["n", FontWeight->"Bold"], "T"]}], "\[RightAngleBracket]"}]}], "+", RowBox[{ RowBox[{"\[LeftAngleBracket]", RowBox[{ StyleBox["n", FontWeight->"Bold"], " ", SuperscriptBox[ StyleBox["c", FontWeight->"Bold"], "T"]}], "\[RightAngleBracket]"}], \(S\^T\)}]}], "\[IndentingNewLine]", "=", "\[AlignmentMarker]", \(\(S\ A\ S\^T + N\)\[IndentingNewLine]\(\(=\)\(\[AlignmentMarker]\)\(M\)\)\)}]\ }]}], TraditionalForm]], "NumberedEquation", TextAlignment->AlignmentMarker], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"\[LeftAngleBracket]", RowBox[{ StyleBox["c", FontWeight->"Bold"], " ", SuperscriptBox[ StyleBox["x", FontWeight->"Bold"], "T"]}], "\[RightAngleBracket]"}], "=", "\[AlignmentMarker]", RowBox[{ RowBox[{"\[LeftAngleBracket]", RowBox[{ StyleBox["c", FontWeight->"Bold"], " ", SuperscriptBox[ RowBox[{"(", RowBox[{ RowBox[{"S", 
" ", StyleBox["c", FontWeight->"Bold"]}], "+", StyleBox["n", FontWeight->"Bold"]}], ")"}], "T"]}], "\[RightAngleBracket]"}], "\[IndentingNewLine]", "=", "\[AlignmentMarker]", RowBox[{ RowBox[{ RowBox[{ RowBox[{"\[LeftAngleBracket]", RowBox[{ StyleBox["c", FontWeight->"Bold"], " ", SuperscriptBox[ StyleBox["c", FontWeight->"Bold"], "T"]}], "\[RightAngleBracket]"}], \(S\^T\)}], "+", RowBox[{"\[LeftAngleBracket]", RowBox[{ StyleBox["c", FontWeight->"Bold"], " ", SuperscriptBox[ StyleBox["n", FontWeight->"Bold"], "T"]}], "\[RightAngleBracket]"}]}], "\[IndentingNewLine]", "=", "\[AlignmentMarker]", \(A\ S\^T\)}]}]}], TraditionalForm]], "NumberedEquation", TextAlignment->AlignmentMarker], Cell[TextData[{ "In order to simplify these expressions we have used ", Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"\[LeftAngleBracket]", RowBox[{ StyleBox["c", FontWeight->"Bold"], " ", SuperscriptBox[ StyleBox["n", FontWeight->"Bold"], "T"]}], "\[RightAngleBracket]"}], "=", "0"}], TraditionalForm]]], ". Thus ", ButtonBox["equation", ButtonData:>"Eq:10", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:10"], ") reduces to ", ButtonBox["equation", ButtonData:>"Eq:13", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:13"], "), as required." 
}], "Text"] }, Closed]], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], ".", CounterBox["Subsection"], " ", "Discussion" }], "Subsection"], Cell[TextData[{ "In summary, we have shown how an adaptive linear network for minimising \ the mean squared output error ", Cell[BoxData[ \(TraditionalForm\`E\)]], " emerges from maximising the relative entropy ", Cell[BoxData[ \(TraditionalForm\`G\)]], " between the true posterior probability ", Cell[BoxData[ FormBox[ RowBox[{"P", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], StyleBox["|", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"]}], StyleBox[")", FontWeight->"Plain"]}], TraditionalForm]]], " and an appropriately chosen model ", Cell[BoxData[ FormBox[ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], StyleBox["|", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"]}], StyleBox[")", FontWeight->"Plain"]}], TraditionalForm]]], ". With hindsight this equivalence seems to be rather trivial, but we have \ presented it to show the various steps that are needed to establish a maximum \ relative entropy probability model as being equivalent to a minimum mean \ squared error network model." }], "Text"], Cell["\<\ There are several features of the model that are essential in order to derive \ the minimum mean squared error result.\ \>", "Text"], Cell[TextData[{ "\t\[FilledSmallCircle] The \"class label\" ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], " is a continuous vector-valued variable." }], "Text"], Cell[TextData[{ "\t\[FilledSmallCircle] The \"class label\" ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], " has a Gaussian prior probability." }], "Text"], Cell["\<\ \t\[FilledSmallCircle] The data noise has a Gaussian probability \ distribution.\ \>", "Text"], Cell[TextData[{ "Clearly, this is a highly idealised model, and so the adaptive linear \ filter (i.e. the inverse model) should be used with care. 
Note how our \ Bayesian approach makes it absolutely clear what the adaptive linear filter \ is computing: namely, the sufficient statistics ", Cell[BoxData[ FormBox[ RowBox[{\(L\_0\), StyleBox["x", FontWeight->"Bold"]}], TraditionalForm]]], " of the posterior probability ", Cell[BoxData[ FormBox[ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], StyleBox["|", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"]}], StyleBox[")", FontWeight->"Plain"]}], TraditionalForm]]], " (under the above model assumptions)", Cell[BoxData[ FormBox[ ButtonBox["FOOTNOTE", ButtonData:>"Footnote:6", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "." }], "Text"] }, Closed]] }, Closed]], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], " Two-layer network: Gibbs distribution model" }], "Section", CellTags->"Sect:4"], Cell["\<\ In this section we shall introduce a model that leads to a two layer network, \ which includes the two-layer perceptron and the radial basis function \ networks as special cases.\ \>", "Text"], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], ".", CounterBox["Subsection"], " ", "Bayesian model" }], "Subsection"], Cell["The stages in building this model are:", "Text"], Cell[TextData[{ "\t1. ", StyleBox["Data generation model.", FontSlant->"Italic"], " In this model we generate ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], " according to a Gibbs distribution whose potentials ", Cell[BoxData[ FormBox[ RowBox[{ StyleBox["U", FontWeight->"Bold"], "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], " are modulated by the class label ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], ". 
Thus" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], "|", StyleBox["c", FontWeight->"Bold"]}], ")"}], "=", RowBox[{ FractionBox["1", RowBox[{"Z", "(", StyleBox["c", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}]], RowBox[{"exp", "[", RowBox[{"-", RowBox[{ SuperscriptBox[ RowBox[{"(", RowBox[{"S", " ", StyleBox["c", FontWeight->"Bold"]}], ")"}], "T"], ".", RowBox[{ StyleBox["U", FontWeight->"Bold"], "(", StyleBox["x", FontWeight->"Bold"], ")"}]}]}], "]"}]}]}], TraditionalForm]], "NumberedEquation"], Cell[TextData[{ "where ", Cell[BoxData[ FormBox[ RowBox[{"Z", "(", StyleBox["c", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}], TraditionalForm]]], " is a \"partition function\" given by" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Z", "(", StyleBox["c", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}], StyleBox["=", FontWeight->"Plain"], RowBox[{ StyleBox["\[Integral]", FontWeight->"Plain"], RowBox[{ StyleBox[ RowBox[{ StyleBox["d", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"]}]], StyleBox[" ", FontWeight->"Plain"], RowBox[{"exp", "[", RowBox[{"-", RowBox[{ SuperscriptBox[ RowBox[{"(", RowBox[{"S", " ", StyleBox["c", FontWeight->"Bold"]}], ")"}], "T"], ".", RowBox[{ StyleBox["U", FontWeight->"Bold"], "(", StyleBox["x", FontWeight->"Bold"], ")"}]}]}], "]"}]}]}]}], TraditionalForm]], "NumberedEquation"], Cell[TextData[{ "This is a two layer model because ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], " is first of all linearly mapped to ", Cell[BoxData[ FormBox[ RowBox[{"S", " ", StyleBox["c", FontWeight->"Bold"]}], TraditionalForm]]], ", which is then used as input to a second stage involving Gibbs \ potentials." }], "Text"], Cell[TextData[{ "\t2. 
", StyleBox["Posterior probability.", FontSlant->"Italic"], " Introduce a zero mean Gaussian prior probability with covariance matrix \ ", Cell[BoxData[ \(TraditionalForm\`A\)]] }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Q", "(", StyleBox["c", FontWeight->"Bold"], ")"}], "=", RowBox[{\(1\/\@\(det(2 \[Pi]\ A)\)\), RowBox[{"exp", "[", RowBox[{\(-\(1\/2\)\), SuperscriptBox[ StyleBox["c", FontWeight->"Bold"], "T"], \(A\^\(-1\)\), StyleBox["c", FontWeight->"Bold"]}], "]"}]}]}], TraditionalForm]], "NumberedEquation", CellTags->"Eq:18"], Cell["whence", "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], "\[Proportional]", "\[AlignmentMarker]", RowBox[{ FractionBox["1", RowBox[{"Z", "(", StyleBox["c", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}]], RowBox[{"exp", "[", RowBox[{ RowBox[{ RowBox[{"-", SuperscriptBox[ StyleBox["c", FontWeight->"Bold"], "T"]}], \(S\^T\), RowBox[{ StyleBox["U", FontWeight->"Bold"], "(", StyleBox["x", FontWeight->"Bold"], ")"}]}], "-", RowBox[{\(1\/2\), SuperscriptBox[ StyleBox["c", FontWeight->"Bold"], "T"], \(A\^\(-1\)\), StyleBox["c", FontWeight->"Bold"]}]}], "]"}]}], "\[IndentingNewLine]", "\[Proportional]", "\[AlignmentMarker]", RowBox[{ FractionBox["1", RowBox[{"Z", "(", StyleBox["c", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}]], RowBox[{"exp", "[", RowBox[{\(-\(1\/2\)\), SuperscriptBox[ RowBox[{"(", RowBox[{ StyleBox["c", FontWeight->"Bold"], StyleBox["+", FontWeight->"Plain"], RowBox[{ StyleBox["A", FontWeight->"Plain"], StyleBox[" ", FontWeight->"Plain"], SuperscriptBox[ StyleBox["S", FontWeight->"Plain"], "T"], RowBox[{ StyleBox["U", FontWeight->"Bold"], "(", StyleBox["x", FontWeight->"Bold"], ")"}]}]}], ")"}], "T"], RowBox[{\(A\^\(-1\)\), "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], StyleBox["+", FontWeight->"Plain"], RowBox[{ StyleBox["A", FontWeight->"Plain"], StyleBox[" ", FontWeight->"Plain"], 
SuperscriptBox[ StyleBox["S", FontWeight->"Plain"], "T"], RowBox[{ StyleBox["U", FontWeight->"Bold"], "(", StyleBox["x", FontWeight->"Bold"], ")"}]}]}], ")"}]}], "]"}]}]}], TraditionalForm]], "NumberedEquation", TextAlignment->AlignmentMarker, SpanMaxSize->Automatic, CellTags->"Eq:19"], Cell[TextData[{ Cell[BoxData[ FormBox[ RowBox[{"Z", "(", StyleBox["c", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}], TraditionalForm]]], " determines how much \"weight\" is assigned to each ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], " by the interaction of the Gibbs potentials ", Cell[BoxData[ FormBox[ RowBox[{ StyleBox["U", FontWeight->"Bold"], "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], ". Usually this is a non-trivial factor, but in special cases ", Cell[BoxData[ FormBox[ RowBox[{"Z", "(", StyleBox["c", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}], TraditionalForm]]], " can be calculated in closed form." }], "Text"], Cell[TextData[{ "\t3. 
", StyleBox["Simplified posterior probability.", FontSlant->"Italic"], " Let us ignore the ", Cell[BoxData[ FormBox[ RowBox[{"Z", "(", StyleBox["c", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}], TraditionalForm]]], " factor in ", ButtonBox["equation", ButtonData:>"Eq:19", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:19"], ") and introduce a simplified model ", Cell[BoxData[ FormBox[ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " parameterised by a matrix ", Cell[BoxData[ \(TraditionalForm\`L\)]], " of coefficients" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], "=", RowBox[{\(1\/\@\(det(2 \[Pi]\ A)\)\), RowBox[{"exp", "[", RowBox[{\(-\(1\/2\)\), SuperscriptBox[ RowBox[{"(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "-", RowBox[{"L", " ", RowBox[{ StyleBox["U", FontWeight->"Bold"], StyleBox["(", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"], ")"}]}]}], ")"}], "T"], \(A\^\(-1\)\), RowBox[{"(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "-", RowBox[{"L", " ", RowBox[{ StyleBox["U", FontWeight->"Bold"], StyleBox["(", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"], ")"}]}]}], ")"}]}], "]"}]}]}], TraditionalForm]], "NumberedEquation", CellTags->"Eq:20"], Cell[TextData[{ "This approximation is only good to the extent that the ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], " dependence of ", Cell[BoxData[ FormBox[ RowBox[{"Z", "(", StyleBox["c", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}], TraditionalForm]]], " is much weaker than that of the Gaussian term in ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], ". 
The relative importance of these two factors can only be determined by a \ detailed examination of ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], "." }], "Text"], Cell[TextData[{ "\t4. ", Cell[BoxData[ \(TraditionalForm\`G\)]], "-maximisation and ", Cell[BoxData[ \(TraditionalForm\`E\)]], "-minimisation. We shall now progress straight to the expression for ", Cell[BoxData[ \(TraditionalForm\`E\)]] }], "Text"], Cell[BoxData[ FormBox[ RowBox[{"E", "=", RowBox[{ RowBox[{\(1\/2\), RowBox[{"\[Integral]", RowBox[{ StyleBox[ RowBox[{"d", StyleBox["x", FontWeight->"Bold"]}]], " ", StyleBox[ RowBox[{"d", StyleBox["c", FontWeight->"Bold"]}]], " ", RowBox[{"P", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["c", FontWeight->"Bold"]}], ")"}], SuperscriptBox[ RowBox[{"(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "-", RowBox[{"L", " ", RowBox[{ StyleBox["U", FontWeight->"Bold"], StyleBox["(", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"], ")"}]}]}], ")"}], "T"], \(A\^\(-1\)\), RowBox[{"(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "-", RowBox[{"L", " ", RowBox[{ StyleBox["U", FontWeight->"Bold"], StyleBox["(", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"], ")"}]}]}], ")"}]}]}]}], "+", "constant"}]}], TraditionalForm]], "NumberedEquation", CellTags->"Eq:21"], Cell["which is the mean of a quadratic form.", "Text"], Cell[TextData[{ "For a given non-linear mapping ", Cell[BoxData[ FormBox[ RowBox[{ StyleBox["U", FontWeight->"Bold"], StyleBox["(", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], ", the solution ", Cell[BoxData[ \(TraditionalForm\`L = L\_0\)]], " that minimises ", Cell[BoxData[ \(TraditionalForm\`E\)]], " may be derived as (see ", ButtonBox["appendix C", ButtonData:>"Appendix:C", ButtonStyle->"Hyperlink"], ")" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{\(L\_0\), "=", RowBox[{ 
RowBox[{"\[LeftAngleBracket]", RowBox[{ StyleBox["c", FontWeight->"Bold"], " ", SuperscriptBox[ RowBox[{ StyleBox["U", FontWeight->"Bold"], StyleBox["(", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"], ")"}], "T"]}], "\[RightAngleBracket]"}], SuperscriptBox[ RowBox[{"\[LeftAngleBracket]", RowBox[{ RowBox[{ StyleBox["U", FontWeight->"Bold"], StyleBox["(", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"], ")"}], " ", SuperscriptBox[ RowBox[{ StyleBox["U", FontWeight->"Bold"], StyleBox["(", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"], ")"}], "T"]}], "\[RightAngleBracket]"}], \(-1\)]}]}], TraditionalForm]], "NumberedEquation", CellTags->"Eq:22"], Cell[TextData[{ "which does not explicitly depend on the choice of ", Cell[BoxData[ \(TraditionalForm\`A\)]], ". ", ButtonBox["Equation", ButtonData:>"Eq:22", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:22"], ") is equivalent to equation (16) of [", ButtonBox["3", ButtonData:>"Ref:WebbLowe1988a", ButtonStyle->"Hyperlink"], "]. We have been somewhat cavalier in inverting ", Cell[BoxData[ FormBox[ RowBox[{"\[LeftAngleBracket]", RowBox[{ RowBox[{ StyleBox["U", FontWeight->"Bold"], StyleBox["(", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"], ")"}], " ", SuperscriptBox[ RowBox[{ StyleBox["U", FontWeight->"Bold"], StyleBox["(", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"], ")"}], "T"]}], "\[RightAngleBracket]"}], TraditionalForm]]], " because we have no assurance that this covariance matrix is positive \ definite - a rigorous derivation of ", Cell[BoxData[ \(TraditionalForm\`L\_0\)]], " would take into account potential rank deficiency", Cell[BoxData[ FormBox[ ButtonBox["FOOTNOTE", ButtonData:>"Footnote:7", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "." 
}], "Text"], Cell[TextData[{ "Assuming that ", Cell[BoxData[ \(TraditionalForm\`L\)]], " is optimised as above, we may optimise ", Cell[BoxData[ FormBox[ RowBox[{ StyleBox["U", FontWeight->"Bold"], StyleBox["(", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], " by maximising (see ", ButtonBox["appendix C", ButtonData:>"Appendix:C", ButtonStyle->"Hyperlink"], ")" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{"D", "=", RowBox[{"tr", "[", RowBox[{ RowBox[{"(", SuperscriptBox[ RowBox[{"\[LeftAngleBracket]", RowBox[{ RowBox[{ StyleBox["U", FontWeight->"Bold"], StyleBox["(", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"], ")"}], " ", SuperscriptBox[ RowBox[{ StyleBox["U", FontWeight->"Bold"], StyleBox["(", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"], ")"}], "T"]}], "\[RightAngleBracket]"}], \(-1\)], ")"}], RowBox[{"(", RowBox[{ RowBox[{"\[LeftAngleBracket]", RowBox[{ RowBox[{ StyleBox["U", FontWeight->"Bold"], StyleBox["(", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"], ")"}], " ", SuperscriptBox[ StyleBox["c", FontWeight->"Bold"], "T"]}], "\[RightAngleBracket]"}], \(A\^\(-1\)\), RowBox[{"\[LeftAngleBracket]", " ", RowBox[{ StyleBox["c", FontWeight->"Bold"], StyleBox[" ", FontWeight->"Plain"], SuperscriptBox[ RowBox[{ StyleBox["U", FontWeight->"Bold"], StyleBox["(", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"], ")"}], "T"]}], "\[RightAngleBracket]"}]}], ")"}]}], "]"}]}], TraditionalForm]], "NumberedEquation", SpanMaxSize->Automatic, CellTags->"Eq:23"], Cell[TextData[{ "The \"network cost functions\" in ", ButtonBox["equation", ButtonData:>"Eq:23", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:23"], ") and equation (19) of [", ButtonBox["3", ButtonData:>"Ref:WebbLowe1988a", ButtonStyle->"Hyperlink"], "] are equivalent. 
We can combine this with the optimisation of ", Cell[BoxData[ \(TraditionalForm\`L\)]], " to obtain an algorithm for ", Cell[BoxData[ \(TraditionalForm\`E\)]], "-minimisation (or, equivalently ", Cell[BoxData[ \(TraditionalForm\`G\)]], "-maximisation)." }], "Text"] }, Closed]], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], ".", CounterBox["Subsection"], " Two-layer perceptron network" }], "Subsection", CellTags->"Sect:4.2"], Cell[TextData[Cell[TextData[{ " ", ButtonBox["OPEN", ButtonData:>{ URL[ "http://www.luttrell.org.uk/papers/sp4_111/fig2.gif"], None}, Active->True, ButtonStyle->"Hyperlink"], " " }]]], "NumberedFigure", TextAlignment->Center, CellTags->{"Ed:Change2", "Fig:2", "Ed:Problem1"}], Cell["Two-layer perceptron network", "Caption"], Cell[TextData[{ "Consider a multilayer perceptron with a single hidden layer. It is a two \ stage network, consisting of a non-linear mapping followed by a linear \ mapping. We show this network in ", ButtonBox["figure", ButtonData:>"Fig:2", ButtonStyle->"Hyperlink"], " ", CounterBox["NumberedFigure", "Fig:2"], " where we also define its mappings using the notation that we have \ developed. The ", Cell[BoxData[ \(TraditionalForm\`w\_\(j, k\)\)]], " are the input weights, the scalar function ", Cell[BoxData[ \(TraditionalForm\`\[Sigma]( . )\)]], " represents the usual \"sigmoidal non-linearity\", and the ", Cell[BoxData[ \(TraditionalForm\`L\_\(i, j\)\)]], " are the output weights." 
}], "Text"], Cell[TextData[{ "In ", ButtonBox["equation", ButtonData:>"Eq:22", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:22"], ") and ", ButtonBox["equation", ButtonData:>"Eq:23", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:23"], ") we showed how to optimise both ", Cell[BoxData[ \(TraditionalForm\`L\)]], " and ", Cell[BoxData[ FormBox[ RowBox[{ StyleBox["U", FontWeight->"Bold"], StyleBox["(", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], " in general. These procedures map directly onto the conventional schemes \ for training a two-layer perceptron with a linear output stage [", ButtonBox["3", ButtonData:>"Ref:WebbLowe1988a", ButtonStyle->"Hyperlink"], ", ", ButtonBox["4", ButtonData:>"Ref:WebbLowe1988b", ButtonStyle->"Hyperlink"], "]." }], "Text"], Cell[TextData[{ "In this case the two-layer perceptron computes an output ", Cell[BoxData[ FormBox[ StyleBox["o", FontWeight->"Bold"], TraditionalForm]]], " given by ", Cell[BoxData[ FormBox[ RowBox[{ StyleBox["o", FontWeight->"Bold"], "=", RowBox[{"L", " ", RowBox[{ StyleBox["U", FontWeight->"Bold"], StyleBox["(", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"], ")"}]}]}], TraditionalForm]]], ", which contains all the ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], " dependence of ", Cell[BoxData[ FormBox[ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " in ", ButtonBox["equation", ButtonData:>"Eq:20", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:20"], "). 
The network thus computes a sufficient statistic for ", Cell[BoxData[ FormBox[ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], ", so it effectively computes ", Cell[BoxData[ FormBox[ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " itself." }], "Text"] }, Closed]], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], ".", CounterBox["Subsection"], " ", "Radial basis function network" }], "Subsection", CellTags->"Sect:4.3"], Cell[TextData[Cell[TextData[{ " ", ButtonBox["OPEN", ButtonData:>{ URL[ "http://www.luttrell.org.uk/papers/sp4_111/fig3.gif"], None}, Active->True, ButtonStyle->"Hyperlink"], " " }]]], "NumberedFigure", TextAlignment->Center, CellTags->{"Ed:Problem2", "Fig:3", "Ed:Change3"}], Cell["Radial basis function network", "Caption"], Cell[TextData[{ "This is another type of two-layer perceptron network which we show in ", ButtonBox["figure", ButtonData:>"Fig:3", ButtonStyle->"Hyperlink"], " ", CounterBox["NumberedFigure", "Fig:3"], ". The theory of radial basis function networks can be found in [", ButtonBox["5", ButtonData:>"Ref:BroomheadLowe1988a", ButtonStyle->"Hyperlink"], ", ", ButtonBox["6", ButtonData:>"Ref:BroomheadLowe1988b", ButtonStyle->"Hyperlink"], "]. The only difference between ", ButtonBox["figure", ButtonData:>"Fig:3", ButtonStyle->"Hyperlink"], " ", CounterBox["NumberedFigure", "Fig:3"], " and ", ButtonBox["figure", ButtonData:>"Fig:2", ButtonStyle->"Hyperlink"], " ", CounterBox["NumberedFigure", "Fig:2"], " is in the choice of non-linear mapping. The radial basis function network \ uses non-linear functions that are each a scalar function of a quadratic form \ of ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], ". 
Each quadratic form is specified by two types of parameter: the \"centre\ \" ", Cell[BoxData[ FormBox[ SuperscriptBox[ StyleBox["x", FontWeight->"Bold"], StyleBox[\((j)\), FontWeight->"Plain"]], TraditionalForm]]], ", and the \"anisotropy matrix\" ", Cell[BoxData[ \(TraditionalForm\`B\^\((j)\)\)]], ". Note that we have absorbed the \"sigmoidal non-linearity\" into the \ definition of the function in ", ButtonBox["figure", ButtonData:>"Fig:3", ButtonStyle->"Hyperlink"], " ", CounterBox["NumberedFigure", "Fig:3"], "." }], "Text"], Cell[TextData[{ "The simplest type of radial basis function network uses fixed non-linear \ mappings ", Cell[BoxData[ FormBox[ RowBox[{ StyleBox["U", FontWeight->"Bold"], StyleBox["(", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], ", and seeks only to optimise the linear output stage ", Cell[BoxData[ \(TraditionalForm\`L\)]], " - in this case our derivation leads to precisely the same training \ algorithm. More sophisticated networks also seek to optimise ", Cell[BoxData[ FormBox[ RowBox[{ StyleBox["U", FontWeight->"Bold"], StyleBox["(", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], " - again, our derivation leads to standard results." 
}], "Text"], Cell["\<\ Otherwise, the interpretation of radial basis function networks is identical \ to that of two-layer perceptrons.\ \>", "Text"] }, Closed]], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], ".", CounterBox["Subsection"], " Discussion" }], "Subsection"], Cell[TextData[{ "Note how we assumed a Gaussian prior probability ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", StyleBox["c", FontWeight->"Bold"], ")"}], TraditionalForm]]], " in ", ButtonBox["equation", ButtonData:>"Eq:18", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:18"], "), and we chose to ignore the ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], " dependence of the partition function ", Cell[BoxData[ FormBox[ RowBox[{"Z", "(", StyleBox["c", FontWeight->"Bold"], ")"}], TraditionalForm]]], " when constructing the approximate (network) model ", Cell[BoxData[ FormBox[ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], ". This has the fortunate side effect of producing a purely Gaussian model \ for ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], ", so that ", Cell[BoxData[ \(TraditionalForm\`G\)]], "-maximisation reduces to a minimum mean square error problem. Many \ adaptive networks use minimum mean square error criteria that are variants of \ ", ButtonBox["equation", ButtonData:>"Eq:21", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:21"], ")." }], "Text"], Cell[TextData[{ "We have shown how to \"derive\" a two-layer network from a data generation \ model ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], "|", StyleBox["c", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], ". 
This procedure has two stages: formulating the forward problem \ (generating the data from the class), and approximating the inverse problem \ (inferring the class from the data). The inverse problem is completely \ described by ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " (or its approximation ", Cell[BoxData[ FormBox[ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], "), and we have shown how ", Cell[BoxData[ FormBox[ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " depends on statistics that are computed by standard networks, such as the \ two-layer perceptron or radial basis function networks." }], "Text"], Cell["There are two disquieting features of our approach:", "Text"], Cell[TextData[{ "\t\[FilledSmallCircle] The network models ignore the partition function ", Cell[BoxData[ FormBox[ RowBox[{"Z", "(", StyleBox["c", FontWeight->"Bold"], ")"}], TraditionalForm]]], ". Provided that ", Cell[BoxData[ FormBox[ RowBox[{"Z", "(", StyleBox["c", FontWeight->"Bold"], ")"}], TraditionalForm]]], " is a sufficiently slowly varying function of ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], ", this approximation is acceptable. It should not be too difficult to \ incorporate ", Cell[BoxData[ FormBox[ RowBox[{"Z", "(", StyleBox["c", FontWeight->"Bold"], ")"}], TraditionalForm]]], " into a network model. Note that because ", Cell[BoxData[ FormBox[ RowBox[{"Z", "(", StyleBox["c", FontWeight->"Bold"], ")"}], TraditionalForm]]], " is a function of ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], " (and not ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], "), it plays the r\[OHat]le of a prior knowledge factor. 
Omitting ", Cell[BoxData[ FormBox[ RowBox[{"Z", "(", StyleBox["c", FontWeight->"Bold"], ")"}], TraditionalForm]]], " is equivalent to rescaling ", Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Q", "(", StyleBox["c", FontWeight->"Bold"], ")"}], "\[LongRightArrow]", FractionBox[ RowBox[{"Q", "(", StyleBox["c", FontWeight->"Bold"], ")"}], \(Z(c)\)]}], TraditionalForm]]], "." }], "Text", CellTags->"Ed:Change4"], Cell[TextData[{ "\t\[FilledSmallCircle] We had to introduce a Gaussian prior probability ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", StyleBox["c", FontWeight->"Bold"], ")"}], TraditionalForm]]], " in order to obtain an ", Cell[BoxData[ \(TraditionalForm\`L\_2\)]], " error measure. In most applications a Gaussian ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", StyleBox["c", FontWeight->"Bold"], ")"}], TraditionalForm]]], " would not be appropriate, so the networks that we have derived would not \ accurately solve the corresponding inverse problem. In effect, we claim that \ network models (as currently used) are usually not correctly formulated - \ they are at best ad hoc in their implicit choice of prior knowledge." }], "Text"], Cell["\<\ However, modulo these relatively minor problems, we believe that we have \ successfully derived two-layer perceptron and radial basis function networks \ entirely within a Bayesian modelling framework.\ \>", "Text"] }, Closed]] }, Closed]], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], " Two-layer network: mixture distribution model" }], "Section", CellTags->"Sect:5"], Cell[TextData[{ "In this section we shall introduce a model in which the data generation is \ described by a mixture distribution, rather than a Gibbs distribution as in \ ", ButtonBox["\[Section]", ButtonData:>"Sect:4", ButtonStyle->"Hyperlink"], CounterBox["Section", "Sect:4"], "." 
}], "Text"], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], ".", CounterBox["Subsection"], " ", "Bayesian model" }], "Subsection"], Cell["The stages in building this model are:", "Text", CellTags->"Ed:Change5"], Cell[TextData[{ "\t1. ", StyleBox["Data generation model.", FontSlant->"Italic"], " Introduce a set of ", Cell[BoxData[ \(TraditionalForm\`M\)]], " data generation probabilities ", Cell[BoxData[ FormBox[ RowBox[{\(Q\_i\), "(", StyleBox["x", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}], TraditionalForm]]], ", and combine these in a linear combination weighted by ", Cell[BoxData[ FormBox[ RowBox[{\(Q\_i\), "(", StyleBox["c", FontWeight->"Bold"], ")"}], TraditionalForm]]] }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], StyleBox["|", FontWeight->"Plain"], StyleBox["c", FontWeight->"Bold"]}], ")"}], "=", RowBox[{ FractionBox["1", RowBox[{"Z", "(", StyleBox["c", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}]], RowBox[{\(\[Sum]\+\(i = 1\)\%M\), RowBox[{ RowBox[{\(Q\_i\), "(", StyleBox["c", FontWeight->"Bold"], ")"}], RowBox[{\(Q\_i\), "(", StyleBox["x", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}]}]}]}]}], TraditionalForm]], "NumberedEquation"], Cell[TextData[{ "where ", Cell[BoxData[ FormBox[ RowBox[{"Z", "(", StyleBox["c", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}], TraditionalForm]]], " is a \"partition function\" given by" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Z", "(", StyleBox["c", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}], StyleBox["=", FontWeight->"Plain"], RowBox[{ StyleBox["\[Integral]", FontWeight->"Plain"], RowBox[{ StyleBox[ RowBox[{ StyleBox["d", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"]}]], RowBox[{\(\[Sum]\+\(i = 1\)\%M\), RowBox[{ RowBox[{\(Q\_i\), "(", StyleBox["c", FontWeight->"Bold"], ")"}], RowBox[{\(Q\_i\), "(", StyleBox["x", FontWeight->"Bold"], StyleBox[")", 
FontWeight->"Plain"]}]}]}]}]}]}], TraditionalForm]], "NumberedEquation"], Cell[TextData[{ "\t2. ", StyleBox["Posterior probability.", FontSlant->"Italic"], " We shall not assume any particular form for the prior probability ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", StyleBox["c", FontWeight->"Bold"], ")"}], TraditionalForm]]], ", because (unlike ", ButtonBox["\[Section]", ButtonData:>"Sect:4", ButtonStyle->"Hyperlink"], CounterBox["Section", "Sect:4"], ") there is no simplification that can be obtained by assuming a Gaussian \ ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", StyleBox["c", FontWeight->"Bold"], ")"}], TraditionalForm]]], "." }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], StyleBox["|", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"]}], StyleBox[")", FontWeight->"Plain"]}], StyleBox["=", FontWeight->"Plain"], FractionBox[ RowBox[{ FractionBox[ RowBox[{ StyleBox["Q", FontWeight->"Plain"], StyleBox["(", FontWeight->"Plain"], StyleBox["c", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}], RowBox[{"Z", "(", StyleBox["c", FontWeight->"Bold"], StyleBox[")", FontWeight->"Bold"]}]], RowBox[{"(", RowBox[{\(\[Sum]\+\(i = 1\)\%M\), RowBox[{ RowBox[{\(Q\_i\), "(", StyleBox["c", FontWeight->"Bold"], ")"}], RowBox[{\(Q\_i\), "(", StyleBox["x", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}]}]}], ")"}]}], RowBox[{"Q", "(", StyleBox["x", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}]]}], TraditionalForm]], "NumberedEquation"], Cell[TextData[{ "\t3. 
", StyleBox["Simplified posterior probability.", FontSlant->"Italic"], " We shall now recover a \"network-like\" posterior probability ", Cell[BoxData[ FormBox[ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], StyleBox["|", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"]}], StyleBox[")", FontWeight->"Plain"]}], TraditionalForm]]], " that we can use instead of the ideal ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], StyleBox["|", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"]}], StyleBox[")", FontWeight->"Plain"]}], TraditionalForm]]], ". We therefore introduce \"1-of-", Cell[BoxData[ \(TraditionalForm\`N\)]], "\" encoding of ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], ", which replaces ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], " by an index ", Cell[BoxData[ \(TraditionalForm\`c\)]], " (", Cell[BoxData[ \(TraditionalForm\`c = 1, 2, \[CenterEllipsis], N\)]], "), where each index value corresponds to a single class (out of ", Cell[BoxData[ \(TraditionalForm\`N\)]], " classes). 
Thus ", Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Q", "(", StyleBox["c", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}], StyleBox["\[LongRightArrow]", FontWeight->"Plain"], StyleBox[\(Q(c)\), FontWeight->"Plain"]}], TraditionalForm]]], ", ", Cell[BoxData[ FormBox[ RowBox[{ RowBox[{\(Q\_i\), "(", StyleBox["c", FontWeight->"Bold"], ")"}], "\[LongRightArrow]", \(Q\_\(c, i\)\)}], TraditionalForm]]], " and ", Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Z", "(", StyleBox["c", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}], StyleBox["\[LongRightArrow]", FontWeight->"Plain"], SubscriptBox[ StyleBox["Z", FontWeight->"Plain"], "c"]}], TraditionalForm]]], ", to yield" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{ RowBox[{"q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], StyleBox["|", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"]}], StyleBox[")", FontWeight->"Plain"]}], StyleBox["\[LongRightArrow]", FontWeight->"Plain"], RowBox[{"q", "(", RowBox[{"c", StyleBox["|", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"]}], StyleBox[")", FontWeight->"Plain"]}]}], StyleBox["=", FontWeight->"Plain"], FractionBox[ RowBox[{ FractionBox[ SubscriptBox[ StyleBox["Q", FontWeight->"Plain"], "c"], \(Z\_c\)], RowBox[{"(", RowBox[{\(\[Sum]\+\(i = 1\)\%M\), RowBox[{\(Q\_\(c, i\)\), RowBox[{\(Q\_i\), "(", StyleBox["x", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}]}]}], ")"}]}], RowBox[{"Q", "(", StyleBox["x", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}]]}], TraditionalForm]], "NumberedEquation",\ CellTags->"Eq:27"], Cell[TextData[{ "The ", Cell[BoxData[ FormBox[ RowBox[{\(Q\_i\), "(", StyleBox["x", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}], TraditionalForm]]], " and the ", Cell[BoxData[ \(TraditionalForm\`Q\_\(c, i\)\)]], " may both be adaptively optimised from training set data. 
The ", Cell[BoxData[ FormBox[ RowBox[{\(Q\_i\), "(", StyleBox["x", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}], TraditionalForm]]], " correspond to a non-linear preprocessing layer of a two-layer network, \ and the ", Cell[BoxData[ \(TraditionalForm\`Q\_\(c, i\)\)]], " provide a linear second stage of processing. The ", Cell[BoxData[ FormBox[ FractionBox[ SubscriptBox[ StyleBox["Q", FontWeight->"Plain"], "c"], \(Z\_c\)], TraditionalForm]]], " and ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", StyleBox["x", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}], TraditionalForm]]], " factors may easily be introduced afterwards." }], "Text", CellTags->"Ed:Change6"], Cell[TextData[{ "\t4. ", Cell[BoxData[ \(TraditionalForm\`G\)]], "-maximisation and ", Cell[BoxData[ \(TraditionalForm\`E\)]], "-minimisation. Because ", Cell[BoxData[ FormBox[ RowBox[{"q", "(", RowBox[{"c", StyleBox["|", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"]}], StyleBox[")", FontWeight->"Plain"]}], TraditionalForm]]], " in ", ButtonBox["equation", ButtonData:>"Eq:27", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:27"], ") does not have the convenient Gaussian form of ", ButtonBox["equation", ButtonData:>"Eq:20", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:20"], "), the expression for ", Cell[BoxData[ \(TraditionalForm\`G\)]], " does not reduce to an ", Cell[BoxData[ \(TraditionalForm\`L\_2\)]], " error measure. ", Cell[BoxData[ \(TraditionalForm\`G\)]], "-maximisation reduces to essentially the same problem that one \ encounters in the discriminative hidden Markov model (see ", ButtonBox["\[Section]", ButtonData:>"Sect:6", ButtonStyle->"Hyperlink"], CounterBox["Section", "Sect:6"], ")." 
}], "Text", CellTags->"Ed:Change7"] }, Closed]], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], ".", CounterBox["Subsection"], " ", "Comparison of Gibbs distribution and mixture distribution models" }], "Subsection"], Cell[TextData[{ "It was the peculiar exponential form of the Gibbs distribution that led to \ the convenient ", ButtonBox["equation", ButtonData:>"Eq:20", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:20"], "). The basic difference between Gibbs and mixture distributions is that \ the Gibbs distribution \"mixes potentials\" (and then exponentiates the sum \ to obtain a probability), whereas a mixture distribution \"mixes \ probabilities\" directly." }], "Text"], Cell[TextData[{ "Gibbs distribution models are admirably suited to ", Cell[BoxData[ \(TraditionalForm\`G\)]], "-maximisation, because the logarithm in the definition of ", Cell[BoxData[ \(TraditionalForm\`G\)]], " can be used to invert the exponential function in the definition of a \ Gibbs distribution. However, note that the partition function remains an \ annoying term whose logarithm does not simplify. There can also be very good \ physical and/or maximum entropy reasons for choosing to use Gibbs \ distribution models." }], "Text"], Cell[TextData[{ "On the other hand, mixture distribution models provide a simple means of \ flexibly constructing data generation probabilities. This type of model is \ basically the same as the discriminative hidden Markov modelling approach \ that we shall describe in ", ButtonBox["\[Section]", ButtonData:>"Sect:6", ButtonStyle->"Hyperlink"], CounterBox["Section", "Sect:6"], "." }], "Text"] }, Closed]] }, Closed]], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], " ", "Discriminative hidden Markov model network" }], "Section", CellTags->"Sect:6"], Cell[TextData[{ "In this section we shall show how hidden Markov models fit into our \ Bayesian modelling framework. 
This type of model is analogous to the mixture \ distribution model in ", ButtonBox["\[Section]", ButtonData:>"Sect:5", ButtonStyle->"Hyperlink"], CounterBox["Section", "Sect:5"], ", except that we now mix together Gibbs distributions rather than simple \ probabilities." }], "Text"], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], ".", CounterBox["Subsection"], " ", "Bayesian model" }], "Subsection"], Cell["The stages in building this model are:", "Text"], Cell[TextData[{ "\t1. ", StyleBox["Data generation model.", FontSlant->"Italic"], " In this model we augment the data vector ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], " by introducing hidden variables ", Cell[BoxData[ FormBox[ StyleBox["h", FontWeight->"Bold"], TraditionalForm]]], ", and generate ", Cell[BoxData[ FormBox[ RowBox[{"(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " according to a Gibbs distribution whose potentials ", Cell[BoxData[ FormBox[ RowBox[{ StyleBox["U", FontWeight->"Bold"], "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " are modulated by the class label ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], ". 
Thus" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], StyleBox[",", FontWeight->"Bold"], RowBox[{ StyleBox["h", FontWeight->"Bold"], "|", StyleBox["c", FontWeight->"Bold"]}]}], ")"}], "=", RowBox[{ FractionBox["1", RowBox[{"Z", "(", StyleBox["c", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}]], RowBox[{"exp", "[", RowBox[{"-", RowBox[{ SuperscriptBox[ RowBox[{"(", RowBox[{"S", " ", StyleBox["c", FontWeight->"Bold"]}], ")"}], "T"], ".", RowBox[{ StyleBox["U", FontWeight->"Bold"], "(", StyleBox[\(x, h\), FontWeight->"Bold"], ")"}]}]}], "]"}]}]}], TraditionalForm]], "NumberedEquation"], Cell[TextData[{ "where ", Cell[BoxData[ FormBox[ RowBox[{"Z", "(", StyleBox["c", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}], TraditionalForm]]], " is a \"partition function\" given by" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Z", "(", StyleBox["c", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}], StyleBox["=", FontWeight->"Plain"], RowBox[{ StyleBox["\[Integral]", FontWeight->"Plain"], RowBox[{ StyleBox[ RowBox[{ StyleBox["d", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"]}]], StyleBox[" ", FontWeight->"Plain"], StyleBox[ RowBox[{ StyleBox["d", FontWeight->"Plain"], StyleBox["h", FontWeight->"Bold"]}]], StyleBox[" ", FontWeight->"Plain"], RowBox[{"exp", "[", RowBox[{"-", RowBox[{ SuperscriptBox[ RowBox[{"(", RowBox[{"S", " ", StyleBox["c", FontWeight->"Bold"]}], ")"}], "T"], ".", RowBox[{ StyleBox["U", FontWeight->"Bold"], "(", StyleBox[\(x, h\), FontWeight->"Bold"], ")"}]}]}], "]"}]}]}]}], TraditionalForm]], "NumberedEquation"], Cell[TextData[{ "In this case ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], " uses a \"1-of-", Cell[BoxData[ \(TraditionalForm\`N\)]], "\" coding to represent the true class, so ", Cell[BoxData[ FormBox[ RowBox[{"S", " ", StyleBox["c", FontWeight->"Bold"]}], TraditionalForm]]], " is a vector of coefficients 
comprised of one column of the ", Cell[BoxData[ \(TraditionalForm\`S\)]], "-matrix. The overall Gibbs potential ", Cell[BoxData[ FormBox[ RowBox[{ SuperscriptBox[ RowBox[{"(", RowBox[{"S", " ", StyleBox["c", FontWeight->"Bold"]}], ")"}], "T"], ".", RowBox[{ StyleBox["U", FontWeight->"Bold"], "(", StyleBox[\(x, h\), FontWeight->"Bold"], ")"}]}], TraditionalForm]]], " is a different linear combination of the \"basis potentials\" ", Cell[BoxData[ FormBox[ RowBox[{\(U\_i\), "(", StyleBox[\(x, h\), FontWeight->"Bold"], ")"}], TraditionalForm]]], " for each class (in the \"1-of-", Cell[BoxData[ \(TraditionalForm\`N\)]], "\" coding scheme). Thus ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], StyleBox[",", FontWeight->"Bold"], RowBox[{ StyleBox["h", FontWeight->"Bold"], "|", StyleBox["c", FontWeight->"Bold"]}]}], ")"}], TraditionalForm]]], " contains ", Cell[BoxData[ \(TraditionalForm\`N\)]], " hidden Markov models for generating both visible (", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], ") and hidden (", Cell[BoxData[ FormBox[ StyleBox["h", FontWeight->"Bold"], TraditionalForm]]], ") data." }], "Text"], Cell["\<\ Hidden Markov models are normally expressed in terms of their transition \ matrices, but we find it much easier to simply write them as Gibbs \ distributions in terms of potentials. Transition matrices can then be \ extracted from this formalism when necessary.\ \>", "Text"], Cell[TextData[{ "\t2. 
", StyleBox["Posterior probability.", FontSlant->"Italic"], " Without being specific about the exact form of ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", StyleBox["c", FontWeight->"Bold"], ")"}], TraditionalForm]]], " we can write" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Q", "(", RowBox[{ StyleBox["c", FontWeight->"Bold"], StyleBox["|", FontWeight->"Plain"], StyleBox["x", FontWeight->"Bold"]}], ")"}], "=", "\[AlignmentMarker]", RowBox[{ FractionBox[ RowBox[{ RowBox[{"Q", "(", StyleBox["c", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}], RowBox[{ StyleBox["Q", FontWeight->"Plain"], StyleBox["(", FontWeight->"Plain"], RowBox[{ StyleBox["x", FontWeight->"Bold"], StyleBox["|", FontWeight->"Plain"], StyleBox["c", FontWeight->"Bold"]}], StyleBox[")", FontWeight->"Plain"]}]}], RowBox[{"Q", "(", StyleBox["x", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}]], "\[IndentingNewLine]", "=", "\[AlignmentMarker]", RowBox[{ FractionBox[ RowBox[{ RowBox[{"Q", "(", StyleBox["c", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}], RowBox[{"\[Integral]", RowBox[{ StyleBox[ RowBox[{"d", StyleBox["h", FontWeight->"Bold"]}]], " ", RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", RowBox[{ StyleBox["h", FontWeight->"Bold"], StyleBox["|", FontWeight->"Plain"], StyleBox["c", FontWeight->"Bold"]}]}], StyleBox[")", FontWeight->"Plain"]}]}]}]}], RowBox[{"Q", "(", StyleBox["x", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}]], "\[IndentingNewLine]", "=", "\[AlignmentMarker]", FractionBox[ RowBox[{ FractionBox[ RowBox[{"Q", "(", StyleBox["c", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}], RowBox[{"Z", "(", StyleBox["c", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}]], RowBox[{"\[Integral]", RowBox[{ StyleBox[ RowBox[{"d", StyleBox["h", FontWeight->"Bold"]}]], " ", RowBox[{"exp", "[", RowBox[{"-", RowBox[{ SuperscriptBox[ RowBox[{"(", RowBox[{"S", " ", StyleBox["c", FontWeight->"Bold"]}], ")"}], "T"], ".", 
RowBox[{ StyleBox["U", FontWeight->"Bold"], "(", StyleBox[\(x, h\), FontWeight->"Bold"], ")"}]}]}], "]"}]}]}]}], RowBox[{"Q", "(", StyleBox["x", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}]]}]}]}], TraditionalForm]], "NumberedEquation", TextAlignment->AlignmentMarker], Cell[TextData[{ "The denominator term ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", StyleBox["x", FontWeight->"Bold"], StyleBox[")", FontWeight->"Plain"]}], TraditionalForm]]], " is a normalisation factor that is obtained as the sum of the numerator \ over all possible ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], " - in the \"1-of-", Cell[BoxData[ \(TraditionalForm\`N\)]], "\" coding scheme ", Cell[BoxData[ FormBox[ StyleBox["c", FontWeight->"Bold"], TraditionalForm]]], " has exactly ", Cell[BoxData[ \(TraditionalForm\`N\)]], " permitted states. For simple hidden Markov models, the integration over \ ", Cell[BoxData[ FormBox[ StyleBox["h", FontWeight->"Bold"], TraditionalForm]]], " is normally computed as a product of transition matrices, and ", Cell[BoxData[ FormBox[ RowBox[{"Z", "(", StyleBox["c", FontWei