(************** Content-type: application/mathematica ************** CreatedBy='Mathematica 5.0' Mathematica-Compatible Notebook This notebook can be used with any Mathematica-compatible application, such as Mathematica, MathReader or Publicon. The data for the notebook starts with the line containing stars above. To get the notebook into a Mathematica-compatible application, do one of the following: * Save the data starting with the line of stars above into a file with a name ending in .nb, then open the file inside the application; * Copy the data starting with the line of stars above to the clipboard, then use the Paste menu command inside the application. Data for notebooks contains only printable 7-bit ASCII and can be sent directly in email or through ftp in text mode. Newlines can be CR, LF or CRLF (Unix, Macintosh or MS-DOS style). NOTE: If you modify the data for this notebook not in a Mathematica- compatible application, you must delete the line below containing the word CacheID, otherwise Mathematica-compatible applications may try to use invalid cache data. For more information on notebooks and Mathematica-compatible applications, contact Wolfram Research: web: http://www.wolfram.com email: info@wolfram.com phone: +1-217-398-0700 (U.S.) Notebook reader applications are available free of charge from Wolfram Research. *******************************************************************) (*CacheID: 232*) (*NotebookFileLineBreakTest NotebookFileLineBreakTest*) (*NotebookOptionsPosition[ 267746, 8327]*) (*NotebookOutlinePosition[ 282340, 8760]*) (* CellTagsIndexPosition[ 279045, 8654]*) (*WindowFrame->Normal*) Notebook[{ Cell[CellGroupData[{ Cell["Notes", "Section 1"], Cell[CellGroupData[{ Cell["Editorial Changes", "Subsection"], Cell["\<\ \"Footnotes\" section added to end of the paper to collect together all of \ the footnotes that originally appeared at the foot of various pages. \ Hyperlinks to these footnotes have also been added to the body of the \ paper.\ \>", "Text"], Cell[TextData[{ "Notation \"", Cell[BoxData[ \(TraditionalForm\`\[SelectionPlaceholder]\_\(\[SelectionPlaceholder]\ \ \[SelectionPlaceholder]\)\)]], "\" changed to \"", Cell[BoxData[ \(TraditionalForm\`\[SelectionPlaceholder]\_\(\[SelectionPlaceholder], \ \[SelectionPlaceholder]\)\)]], "\" (and, analogously, higher rank notations) throughout the paper." }], "Text"], Cell["\"ie\" changed to \"i.e.\" throughout the paper.", "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change1", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"Metropolis N, Rosenbluth A W, Rosenbluth M N and Teller A H\" changed to \ \"Metropolis N, Rosenbluth A W, Rosenbluth M N, Teller A H and Teller E\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change2", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change3", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change8", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change11", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change12", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change14", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change20", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change22", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change24", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change30", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change32", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "Equation reformatted to make it more readable." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change4", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"..\" changed to \".\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change5", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"higher order interactions\" changed to \"higher order interactions.\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change6", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ \(TraditionalForm\`\[PartialD]G/\[PartialD]\[Lambda]\_i\)]], "\" changed to \"", Cell[BoxData[ \(TraditionalForm\`\[PartialD]G\/\[PartialD]\[Lambda]\_i\)]], "\" (twice)." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change7", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ \(TraditionalForm\`\[PartialD]G/\[PartialD]\[Lambda]\_i\)]], "\" changed to \"", Cell[BoxData[ \(TraditionalForm\`\[PartialD]G\/\[PartialD]\[Lambda]\_i\)]], "\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change9", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change10", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "Figure moved to a more appropriate place in the text." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change13", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ \(TraditionalForm\`1 \[LessEqual] t \[LessEqual] \[Tau]\)]], "\" changed to \"", Cell[BoxData[ \(TraditionalForm\`1 \[LessEqual] t < \[Tau]\)]], "\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change15", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ FormBox[ UnderoverscriptBox["\[Sum]", RowBox[{ StyleBox["h", FontWeight->"Bold"], "=", "0"}], \(N\_h - 1\)], TraditionalForm]]], "\" changed to \"", Cell[BoxData[ \(TraditionalForm\`\[Sum]\+\(h\_1, h\_2, \[CenterEllipsis], h\_T = \ 0\)\%\(N\_h - 1\)\)]], "\" (twice)." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change16", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ FormBox[ UnderoverscriptBox["\[Sum]", RowBox[{ StyleBox["x", FontWeight->"Bold"], "=", "0"}], \(N\_x - 1\)], TraditionalForm]]], "\" changed to \"", Cell[BoxData[ \(TraditionalForm\`\[Sum]\+\(x\_1, x\_2, \[CenterEllipsis], x\_T = \ 0\)\%\(N\_x - 1\)\)]], "\", and \"", Cell[BoxData[ FormBox[ UnderoverscriptBox["\[Sum]", RowBox[{ StyleBox["h", FontWeight->"Bold"], "=", "0"}], \(N\_h - 1\)], TraditionalForm]]], "\" changed to \"", Cell[BoxData[ \(TraditionalForm\`\[Sum]\+\(h\_1, h\_2, \[CenterEllipsis], h\_T = \ 0\)\%\(N\_h - 1\)\)]], "\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change17", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ \(TraditionalForm\`\[PartialD]G/\[PartialD]\[Lambda]\_i\)]], "\" changed to \"", Cell[BoxData[ \(TraditionalForm\`\[PartialD]G\/\[PartialD]\[Lambda]\_i\)]], "\", and \"equation (12))\" changed to \"equation (12)\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change18", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change20", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ FormBox[ UnderoverscriptBox["\[Sum]", RowBox[{ StyleBox["x", FontWeight->"Bold"], "=", "0"}], \(N\_x - 1\)], TraditionalForm]]], "\" changed to \"", Cell[BoxData[ \(TraditionalForm\`\[Sum]\+\(x\_1, x\_2, \[CenterEllipsis], x\_T = \ 0\)\%\(N\_x - 1\)\)]], "\", and \"", Cell[BoxData[ FormBox[ UnderoverscriptBox["\[Sum]", RowBox[{ StyleBox["h", FontWeight->"Bold"], "=", "0"}], \(N\_h - 1\)], TraditionalForm]]], "\" changed to \"", Cell[BoxData[ \(TraditionalForm\`\[Sum]\+\(h\_1, h\_2, \[CenterEllipsis], h\_T = \ 0\)\%\(N\_h - 1\)\)]], "\" (twice)." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change19", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ FormBox[ RowBox[{\(\[PartialD]G\), "/", RowBox[{"\[PartialD]", SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]]}]}], TraditionalForm]]], "\" changed to \"", Cell[BoxData[ FormBox[ FractionBox[\(\[PartialD]G\), RowBox[{"\[PartialD]", SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]]}]], TraditionalForm]]], "\", and \"", Cell[BoxData[ FormBox[ RowBox[{\(\[PartialD]G\), "/", RowBox[{"\[PartialD]", SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]]}]}], TraditionalForm]]], "\" changed to \"", Cell[BoxData[ FormBox[ FractionBox[\(\[PartialD]G\), RowBox[{"\[PartialD]", SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]]}]], TraditionalForm]]], "\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change20", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ FormBox[ RowBox[{ RowBox[{ SubsuperscriptBox["D", \(i, j\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]], "(", "\[Tau]", ")"}], \({\[Product]\+\(t = 1\)\%\(\[Tau] - 1\)S\_\(h\_t, x\_t, h\ \_\(t + 1\)\)\%\((+)\)}\)}], TraditionalForm]], TextAlignment->AlignmentMarker, SpanMaxSize->Infinity], "\" changed to \"", Cell[BoxData[ FormBox[ RowBox[{ RowBox[{ SubsuperscriptBox["D", \(i, j\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]], "(", "\[Tau]", ")"}], \({\[Product]\+\(t = 1\)\%\(\[Tau] - 1\)S\_\(h\_t, x\_t, h\ \_\(t + 1\)\)\%\((+)\)}\)}], TraditionalForm]], TextAlignment->AlignmentMarker, SpanMaxSize->Infinity], "\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change21", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ FormBox[ RowBox[{\(\[PartialD]G\), "/", RowBox[{"\[PartialD]", SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]]}]}], TraditionalForm]]], "\" changed to \"", Cell[BoxData[ FormBox[ FractionBox[\(\[PartialD]G\), RowBox[{"\[PartialD]", SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]]}]], TraditionalForm]]], "\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change22", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change24", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ FormBox[ RowBox[{ FractionBox["1", RowBox[{"Z", "(", StyleBox["x", FontWeight->"Bold"], ")"}]], UnderscriptBox["\[Sum]", RowBox[{ StyleBox["x", FontWeight->"Bold"], "\[Element]", \(training\ set\)}]]}], TraditionalForm]], TextAlignment->Left, SpanMaxSize->Automatic], "\" changed to \"", Cell[BoxData[ FormBox[ RowBox[{ UnderscriptBox["\[Sum]", RowBox[{ StyleBox["x", FontWeight->"Bold"], "\[Element]", \(training\ set\)}]], FractionBox["1", RowBox[{"Z", "(", StyleBox["x", FontWeight->"Bold"], ")"}]]}], TraditionalForm]], TextAlignment->Left, SpanMaxSize->Automatic], "\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change23", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ FormBox[ RowBox[{\(\[PartialD]G\), "/", RowBox[{"\[PartialD]", SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]]}]}], TraditionalForm]]], "\" changed to \"", Cell[BoxData[ FormBox[ FractionBox[\(\[PartialD]G\), RowBox[{"\[PartialD]", SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]]}]], TraditionalForm]]], "\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change25", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ FormBox[ RowBox[{"1", "/", SubsuperscriptBox["S", \(i, j\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]]}], TraditionalForm]]], "\" changed to \"", Cell[BoxData[ FormBox[ FractionBox["1", SubsuperscriptBox["S", \(i, j\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]]], TraditionalForm]]], "\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change26", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ FormBox[ RowBox[{\(\[PartialD]G\), "/", RowBox[{"\[PartialD]", SubsuperscriptBox["\[Lambda]", \(i, 0\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]]}]}], TraditionalForm]]], "\" changed to \"", Cell[BoxData[ FormBox[ FractionBox[\(\[PartialD]G\), RowBox[{"\[PartialD]", SubsuperscriptBox["\[Lambda]", \(i, 0\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]]}]], TraditionalForm]]], "\", and \"", Cell[BoxData[ FormBox[ RowBox[{\(\[PartialD]G\), "/", RowBox[{"\[PartialD]", SubsuperscriptBox["\[Lambda]", \(i, 0\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]]}]}], TraditionalForm]]], "\" changed to \"", Cell[BoxData[ FormBox[ FractionBox[\(\[PartialD]G\), RowBox[{"\[PartialD]", SubsuperscriptBox["\[Lambda]", \(i, 0\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]]}]], TraditionalForm]]], "\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change27", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"an tandem\" changed to \"in tandem\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change28", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ \(TraditionalForm\`e\^\(-\[Lambda]\_\(i, j\)\%b\)\)]], "\" changed to \"", Cell[BoxData[ \(TraditionalForm\`e\^\(-\(\[Lambda]\^\[Prime]\)\_\(i, j\)\%b\)\)]], "\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change29", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"[2]\" changed to \"[2].\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change31", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ FormBox[ RowBox[{ StyleBox["log", FontWeight->"Plain"], StyleBox[" ", FontWeight->"Plain"], RowBox[{ StyleBox["det", FontWeight->"Plain"], StyleBox["(", FontWeight->"Plain"], RowBox[{ StyleBox["2", FontWeight->"Plain"], StyleBox["\[Pi]", FontWeight->"Plain"], StyleBox[" ", FontWeight->"Plain"], RowBox[{ SuperscriptBox[ StyleBox[ SuperscriptBox["C", StyleBox["\[Prime]", FontWeight->"Plain"]], FontWeight->"Bold"], \(-1\)], "(", "s", ")"}]}], StyleBox[")", FontWeight->"Plain"]}]}], TraditionalForm]], GridBoxOptions->{ColumnAlignments->{Left}}], "\" changed to \"", Cell[BoxData[ FormBox[ RowBox[{ StyleBox["log", FontWeight->"Plain"], StyleBox[" ", FontWeight->"Plain"], RowBox[{ StyleBox["det", FontWeight->"Plain"], StyleBox["(", FontWeight->"Plain"], RowBox[{ StyleBox[ FractionBox[ StyleBox["1", FontWeight->"Plain"], StyleBox[\(2 \[Pi]\), FontWeight->"Plain"]], FontWeight->"Plain"], StyleBox[" ", FontWeight->"Plain"], RowBox[{ SuperscriptBox[ StyleBox[ SuperscriptBox["C", StyleBox["\[Prime]", FontWeight->"Plain"]], FontWeight->"Bold"], \(-1\)], "(", "s", ")"}]}], StyleBox[")", FontWeight->"Plain"]}]}], TraditionalForm]], GridBoxOptions->{ColumnAlignments->{Left}}], "\"." }], "Text"], Cell[TextData[{ Cell[BoxData[ FormBox[ ButtonBox["TYPO", ButtonData:>"Ed:Change33", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "\"", Cell[BoxData[ \(TraditionalForm\`\(+constant\)\)]], "\" appended to the right hand side of the inequality." }], "Text"] }, Closed]] }, Closed]], Cell[CellGroupData[{ Cell["\<\ The Gibbs Machine applied to hidden Markov model problems. Part 1: Basic \ theory \ \>", "Title"], Cell["\<\ Stephen P Luttrell Pattern Processing Principles SP4 division, RSRE Malvern, WORCS, WR14 3PS\ \>", "Author"], Cell["\<\ This paper appeared as SP4 Research Note, No. 99, 10 October 1989.\ \>", "Text"], Cell["Copyright \[Copyright] Controller HMSO, London, 1989.", "Text"], Cell[TextData[{ StyleBox["Abstract", FontWeight->"Bold"], "\n\nI show how a hidden Markov model can be expressed as a Gibbs \ distribution. I review my Gibbs distribution training algorithm (a \"Gibbs \ Machine [", ButtonBox["9", ButtonData:>"Ref:Luttrell1985", ButtonStyle->"Hyperlink"], "] rather than a \"Boltzmann Machine\" [", ButtonBox["11", ButtonData:>"Ref:ZhaoAtlasZhuang1988", ButtonStyle->"Hyperlink"], "]), which I use to perform gradient ascent on the relative entropy between \ the Gibbs distribution and the data distribution. I demonstrate how this \ reduces to elementary matrix computations of exactly the same form as \ encountered in the Baum-Welch re-estimation method [", ButtonBox["2", ButtonData:>"Ref:Baum1972", ButtonStyle->"Hyperlink"], "]. Although this toy problem is amenable to Baum-Welch re-estimation, the \ same cannot be said of non-tree-like Markov models. In such cases I propose \ that a hybrid Baum-Welch/Gibbs Machine optimisation scheme should be used." }], "Abstract"], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], " Introduction" }], "Section 1"], Cell[TextData[{ "Bayesian information processing has recently enjoyed a belated [", ButtonBox["7", ButtonData:>"Ref:Jeffries1939", ButtonStyle->"Hyperlink"], "] increase in popularity in diverse fields. Gibbs distribution (or, \ equivalently, Markov random field) models are now used extensively in image \ modelling, where Monte Carlo techniques such as simulated annealing can be \ used to locate approximately the maxima of ", StyleBox["posterior", FontSlant->"Italic"], " probabilities. Hidden Markov models are used extensively in speech \ modelling, where dynamic programming can be used to discover the maxima of ", StyleBox["posterior", FontSlant->"Italic"], " probabilities. The Bayesian approach to inference is robust in the sense \ that it provides the only consistent means of information processing [", ButtonBox["5", ButtonData:>"Ref:Cox1946", ButtonStyle->"Hyperlink"], "]." }], "Text"], Cell[TextData[{ "Adaptive training of Markov-type models has also undergone a recent \ interest in popularity, in line with the resurgence of interest in \"neural \ networks\"", Cell[BoxData[ FormBox[ ButtonBox["FOOTNOTE", ButtonData:>"Footnote:1", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], ". Markov random field model training has received rather patchy attention \ in the literature. The best example is the celebrated Boltzmann machine \ training algorithm [", ButtonBox["1", ButtonData:>"Ref:AckleyHintonSejnowski1985", ButtonStyle->"Hyperlink"], "], which I developed into the Gibbs machine training algorithm [", ButtonBox["9", ButtonData:>"Ref:Luttrell1985", ButtonStyle->"Hyperlink"], "] - this can be used to train any type of Markov model with hidden \ variables, arbitrary interactions, and arbitrary state spaces, by hill \ climbing on a relative entropy criterion. On the other hand, hidden Markov \ model training has received much systematic attention over the years [", ButtonBox["2", ButtonData:>"Ref:Baum1972", ButtonStyle->"Hyperlink"], ", ", ButtonBox["8", ButtonData:>"Ref:Liporace1982", ButtonStyle->"Hyperlink"], "], especially in the field of speech modelling. In this case, a \ re-estimation method can be used to train the model to maximise the same \ relative entropy criterion that is used for Markov random field training." }], "Text"], Cell["\<\ Gibbs machine hill-climbing and Markov model re-estimation are two sides of \ the same coin - both maximise a relative entropy criterion. The problem is to \ unify these two approaches. Thus I will formulate a standard hidden Markov \ model as a Gibbs distribution (and vice-versa), then show that the Gibbs \ distribution training algorithm can be expressed in terms of the same \ forwards/backwards computations that arise during Baum-Welch re-estimation. \ This demonstrates the detailed connection between the two approaches for the \ simplest case of interest - the hidden Markov model. The two approaches can \ then be applied in a hybrid fashion to more complicated Markov model training \ problems.\ \>", "Text"], Cell[TextData[{ "A particularly desirable advantage of the Gibbs distribution approach is \ its ability to build up a Markov model systematically using ", Cell[BoxData[ \(TraditionalForm\`n\)]], "-point interactions between the model variables. This permits one to \ construct a \"cheap and cheerful\" representation of the mutual correlations \ between ", Cell[BoxData[ \(TraditionalForm\`n\)]], " variables by using functions of only 2 variables. The simplest model of \ interest where this trick can be used to advantage is the standard hidden \ Markov model with ", StyleBox["direct", FontSlant->"Italic"], " causal influence between successive emitted symbols. This type of model \ has a wide range of applications in fields as diverse as radar modelling \ (correlated clutter) and speech modelling (co-articulation)." }], "Text"] }, Closed]], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], " Gibbs distributions" }], "Section"], Cell[TextData[{ "In this section 1 give a resume\[EAcute] of Gibbs distributions and the \ Hammersley-Clifford expansion. I strongly recommend that the interested \ reader refers to [", ButtonBox["3", ButtonData:>"Ref:Besag1974", ButtonStyle->"Hyperlink"], "] for further information." }], "Text"], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], ".", CounterBox["Subsection"], " The Gibbs distribution" }], "Subsection"], Cell[TextData[{ "Denote the state of a system as ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], " and its probability of occurrence as ", Cell[BoxData[ FormBox[ RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], ". This probability can always be written in the form" }], "Text"], Cell[BoxData[{ FormBox[ RowBox[{ RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}], "\[Congruent]", "\[AlignmentMarker]", RowBox[{\(1\/Z\), SuperscriptBox["e", RowBox[{"-", RowBox[{"U", "(", StyleBox["x", FontWeight->"Bold"], ")"}]}]]}]}], TraditionalForm], "\n", FormBox[ RowBox[{"Z", "\[Congruent]", "\[AlignmentMarker]", RowBox[{"\[Integral]", RowBox[{ StyleBox[ RowBox[{"d", StyleBox["x", FontWeight->"Bold"]}]], " ", SuperscriptBox["e", RowBox[{"-", RowBox[{"U", "(", StyleBox["x", FontWeight->"Bold"], ")"}]}]]}]}]}], TraditionalForm]}], "NumberedEquation", TextAlignment->AlignmentMarker], Cell[TextData[{ "where ", Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"U", "(", StyleBox["x", FontWeight->"Bold"], ")"}], "=", RowBox[{ RowBox[{"-", RowBox[{"log", "(", RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}], ")"}]}], "+", "constant"}]}], TraditionalForm]]], ". This parameterisation of ", Cell[BoxData[ FormBox[ RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], " is called a Gibbs distribution", Cell[BoxData[ FormBox[ ButtonBox["FOOTNOTE", ButtonData:>"Footnote:2", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "." }], "Text"] }, Closed]], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], ".", CounterBox["Subsection"], " The Hammersley-Clifford expansion" }], "Subsection"], Cell[TextData[{ "A Gibbs distribution can always be written using a ", Cell[BoxData[ FormBox[ RowBox[{"U", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], " of the form ", "[", ButtonBox["3", ButtonData:>"Ref:Besag1974", ButtonStyle->"Hyperlink"], "]" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"U", "(", StyleBox["x", FontWeight->"Bold"], ")"}], "=", \(\[Sum]\+i\(\( G\_i\%\((1)\)\)(x\_i)\) x\_i + \[Sum]\+\(i < j\)\(\(G\_\(i, j\)\%\((2)\)\)(x\_i, x\_j)\) \(x\_i\) x\_j + \[Sum]\+\(i < j < k\)\(\(G\_\(i, j, k\)\%\((3)\)\)(x\_i, x\_j, x\_k)\) \(x\_i\) \(x\_j\) x\_k + \[CenterEllipsis]\)}], TraditionalForm]], "NumberedEquation", CellTags->"Eq:2"], Cell[TextData[{ "where I assume that the ", Cell[BoxData[ \(TraditionalForm\`x\_i\)]], " each have ", Cell[BoxData[ \(TraditionalForm\`N\_x\)]], " possible states, thus ", Cell[BoxData[ \(TraditionalForm\`x\_i \[Element] {0, 1, 2, \[CenterEllipsis], N\_x - 1}\)]], ". This is known as the Hammersely-Clifford expansion. In the \ Hammersley-Clifford expansion every ", Cell[BoxData[ \(TraditionalForm\`n\)]], "-tuple (", Cell[BoxData[ \(TraditionalForm\`n \[Element] {1, 2, 3, \[CenterEllipsis]}\)]], ") of components extracted from ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], " has its own unique piece of ", Cell[BoxData[ FormBox[ RowBox[{"U", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], ", except for those ", Cell[BoxData[ \(TraditionalForm\`n\)]], "-tuples that have one or more components equal to zero. The \ Harnmersley-Clifford expansion provides a systematic expansion of ", Cell[BoxData[ FormBox[ RowBox[{"U", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], " by order of statistical properties: 1-tuples, 2-tuples, etc", Cell[BoxData[ FormBox[ ButtonBox["FOOTNOTE", ButtonData:>"Footnote:3", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], ". Although ", ButtonBox["equation", ButtonData:>"Eq:2", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:2"], ") demonstrates a dislike for the zero state, the important point is that \ the parameterisation of ", Cell[BoxData[ FormBox[ RowBox[{"U", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], " has exactly the right number of parameters in the right places. This \ could be achieved in other less easily stated ways." }], "Text"], Cell[TextData[{ "A special case of the Hammersley-Clifford expansion is used to express ", Cell[BoxData[ FormBox[ RowBox[{"U", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], " for a Boltzmann machine [", ButtonBox["1", ButtonData:>"Ref:AckleyHintonSejnowski1985", ButtonStyle->"Hyperlink"], "]. Thus the components ", Cell[BoxData[ \(TraditionalForm\`x\_i\)]], " of the Boltzmann machine state ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], " are restricted to be binary variables (", Cell[BoxData[ \(TraditionalForm\`x\_i \[Element] {0, 1}\)]], "), and only the 1-tuple and 2-tuple terms of the Hammersley-Clifford \ expansion are used, whence" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{ SubscriptBox["U", StyleBox["BM", FontSlant->"Italic"]], "(", StyleBox["x", FontWeight->"Bold"], ")"}], "=", \(\[Sum]\+i\( a\_i\) x\_i + \[Sum]\+\(i < j\)\(w\_\(i, j\)\) \(x\_i\) x\_j\)}], TraditionalForm]], "NumberedEquation"], Cell[TextData[{ "Note that since ", Cell[BoxData[ \(TraditionalForm\`x\_i \[Element] {0, 1}\)]], " the coefficients ", Cell[BoxData[ \(TraditionalForm\`a\_i\)]], " and ", Cell[BoxData[ \(TraditionalForm\`w\_\(i, j\)\)]], " need not have an explicit dependence on ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], "." }], "Text"], Cell["\<\ I shall usually represent the Hammersley-Clifford expansion in the compact \ form\ \>", "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"U", "(", StyleBox["x", FontWeight->"Bold"], ")"}], "=", RowBox[{\(\[Sum]\+i\), RowBox[{\(\[Lambda]\_i\), RowBox[{\(U\_i\), "(", StyleBox["x", FontWeight->"Bold"], ")"}]}]}]}], TraditionalForm]], "NumberedEquation"], Cell[TextData[{ "where the ", Cell[BoxData[ FormBox[ RowBox[{\(U\_i\), "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], " are pieces taken from the right hand side of the Hammersley-Clifford \ expansion, and the ", Cell[BoxData[ \(TraditionalForm\`\[Lambda]\_i\)]], " are weighting coefficients. I shall also use the following representation \ of the Hammersley-Clifford expansion" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"U", "(", StyleBox["x", FontWeight->"Bold"], ")"}], "=", "\[AlignmentMarker]", \(\[Sum]\+i\(\[Sum]\+\(a = 1\)\%\(N\_x - 1\)\(\ \[Lambda]\_\(i, a\)\%\((1)\)\) \[Delta]\_\(x\_i, a\)\) + \[Sum]\+\(i < \ j\)\(\[Sum]\+\(a, b = 1\)\%\(N\_x - 1\)\(\[Lambda]\_\(i, j, a, b\)\%\((2)\)\) \(\[Delta]\_\(x\_i, a\)\) \[Delta]\_\(x\_j, b\)\) + \[Sum]\+\(i < j < k\)\(\ \[Sum]\+\(a, b, c = 1\)\%\(N\_x - 1\)\(\[Lambda]\_\(i, j, k, a, b, c\)\%\((3)\)\) \(\[Delta]\_\(x\_i, a\)\) \(\[Delta]\_\(x\_j, b\)\) \[Delta]\_\(x\_k, c\)\) + \[CenterEllipsis]\)}], TraditionalForm]], "NumberedEquation", TextAlignment->AlignmentMarker, CellTags->{"Ed:Change2", "Eq:5"}], Cell[TextData[{ "where ", Cell[BoxData[ \(TraditionalForm\`\[Delta]\_\(m, n\)\)]], " is a Kronecker delta. This expression provides a separate contribution to \ ", Cell[BoxData[ FormBox[ RowBox[{"U", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], " from each individual ", Cell[BoxData[ \(TraditionalForm\`n\)]], "-tuple state, and thus provides very fine control over the form of ", Cell[BoxData[ FormBox[ RowBox[{"U", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], "." }], "Text"] }, Closed]], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], ".", CounterBox["Subsection"], " ", "Hidden variables" }], "Subsection"], Cell[TextData[{ "The system state ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], " might be only part of the state of a larger system whose joint state is \ ", Cell[BoxData[ FormBox[ RowBox[{ StyleBox["X", FontWeight->"Bold"], "=", RowBox[{"(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}]}], TraditionalForm]]], ", where ", Cell[BoxData[ FormBox[ StyleBox["h", FontWeight->"Bold"], TraditionalForm]]], " is the state of a set of unobserved, or ", StyleBox["hidden", FontSlant->"Italic"], ", variables. In this case, I shall call ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], " the ", StyleBox["visible", FontSlant->"Italic"], " variables, to emphasise the distinction between ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], " and ", Cell[BoxData[ FormBox[ StyleBox["h", FontWeight->"Bold"], TraditionalForm]]], ". Write ", Cell[BoxData[ FormBox[ RowBox[{"P", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " as a Gibbs distribution ", Cell[BoxData[ FormBox[ RowBox[{"U", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], TraditionalForm]]] }], "Text"], Cell[BoxData[{ FormBox[ RowBox[{ RowBox[{"P", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], "\[Congruent]", "\[AlignmentMarker]", RowBox[{\(1\/Z\), SuperscriptBox["e", RowBox[{"-", RowBox[{"U", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}]}]]}]}], TraditionalForm], "\n", FormBox[ RowBox[{"Z", "\[Congruent]", "\[AlignmentMarker]", RowBox[{"\[Integral]", RowBox[{ StyleBox[ RowBox[{"d", StyleBox["x", FontWeight->"Bold"]}]], " ", StyleBox[ RowBox[{"d", StyleBox["h", FontWeight->"Bold"]}]], " ", SuperscriptBox["e", RowBox[{"-", RowBox[{"U", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}]}]]}]}]}], TraditionalForm]}], "NumberedEquation", TextAlignment->AlignmentMarker], Cell[TextData[{ "whence ", Cell[BoxData[ FormBox[ RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], " is a marginal probability given by" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}], "\[Congruent]", RowBox[{"\[Integral]", RowBox[{ StyleBox[ RowBox[{"d", StyleBox["h", FontWeight->"Bold"]}]], " ", RowBox[{"P", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}]}]}]}], TraditionalForm]], "NumberedEquation"], Cell[TextData[{ "Strictly speaking, the Boltzmann machine is a hidden variables model, so \ ", Cell[BoxData[ FormBox[ RowBox[{ SubscriptBox["U", StyleBox["BM", FontSlant->"Italic"]], "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], " should be replaced by ", Cell[BoxData[ FormBox[ RowBox[{ SubscriptBox["U", StyleBox["BM", FontSlant->"Italic"]], "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], ", where" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{ SubscriptBox["U", StyleBox["BM", FontSlant->"Italic"]], "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], "\[Congruent]", RowBox[{\(\[Sum]\+i\( a\_i\%\((x)\)\) x\_i\), "+", RowBox[{\(\[Sum]\+\(i < j\)\), RowBox[{ SubsuperscriptBox["w", \(i, j\), RowBox[{"(", StyleBox["xx", FontSlant->"Italic"], ")"}]], \(x\_i\), \(x\_j\)}]}], "+", \(\[Sum]\+i\( a\_i\%\((h)\)\) h\_i\), "+", RowBox[{\(\[Sum]\+\(i < j\)\), RowBox[{ SubsuperscriptBox["w", \(i, j\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]], \(h\_i\), \(h\_j\)}]}], "+", RowBox[{\(\[Sum]\+\(i < j\)\), RowBox[{ SubsuperscriptBox["w", \(i, j\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]], \(h\_i\), \(x\_j\)}]}]}]}], TraditionalForm]], "NumberedEquation", CellTags->"Ed:Change3"], Cell[TextData[{ "where the ", Cell[BoxData[ \(TraditionalForm\`h\_i\)]], " are both coupled to each other and to the ", Cell[BoxData[ \(TraditionalForm\`x\_i\)]], ". A particular advantage of models with hidden variables is the ability of \ hidden variables (together with their interactions with the visible \ variables) to capture high order correlations amongst the visible variables. \ In a sense, all models have hidden variables because one observes superficial \ quantities in experiments (call these the visible variables), and then one \ seeks an underlying explanation (call these the hidden variables) for those \ observations", Cell[BoxData[ FormBox[ ButtonBox["FOOTNOTE", ButtonData:>"Footnote:4", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "." }], "Text"] }, Closed]] }, Closed]], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], " The Gibbs machine training algorithm" }], "Section"], Cell[TextData[{ "In this section I shall present a Gibbs distribution form of the Boltzmann \ machine training algorithm [", ButtonBox["1", ButtonData:>"Ref:AckleyHintonSejnowski1985", ButtonStyle->"Hyperlink"], "]. 1 originally presented this derivation in [", ButtonBox["9", ButtonData:>"Ref:Luttrell1985", ButtonStyle->"Hyperlink"], "], and called it the Gibbs machine (training algorithm)", Cell[BoxData[ FormBox[ ButtonBox["FOOTNOTE", ButtonData:>"Footnote:5", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "." }], "Text", CellTags->"Ed:Change4"], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], ".", CounterBox["Subsection"], " ", "The relative entropy optimisation criterion" }], "Subsection"], Cell[TextData[{ "I wish to build a model ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], " of the training set ", Cell[BoxData[ FormBox[ RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], ", so the appropriate cost function to maximise is relative entropy ", Cell[BoxData[ \(TraditionalForm\`G\)]], " defined as" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{"G", "\[Congruent]", RowBox[{"-", RowBox[{"\[Integral]", RowBox[{ StyleBox[ RowBox[{"d", StyleBox["x", FontWeight->"Bold"]}]], " ", RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}], RowBox[{"log", "[", FractionBox[ RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}], RowBox[{"Q", "(", StyleBox["x", FontWeight->"Bold"], ")"}]], "]"}]}]}]}], "\[LessEqual]", "0"}], TraditionalForm]], "NumberedEquation", SpanMaxSize->Infinity], Cell[TextData[{ "with ", Cell[BoxData[ FormBox[ RowBox[{"G", "=", RowBox[{ RowBox[{"0", " ", "iff", " ", RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}]}], "=", RowBox[{ RowBox[{"Q", "(", StyleBox["x", FontWeight->"Bold"], ")"}], " ", RowBox[{"\[ForAll]", " ", StyleBox["x", FontWeight->"Bold"]}]}]}]}], TraditionalForm]]], ". ", Cell[BoxData[ \(TraditionalForm\`G\)]], " can be interpreted as follows. Suppose one draws a large number ", Cell[BoxData[ \(TraditionalForm\`M\)]], " of samples of ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], " at random from the training set ", Cell[BoxData[ FormBox[ RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], ", then ", Cell[BoxData[ \(TraditionalForm\`e\^\(M\ G\)\)]], " is the probability that these ", Cell[BoxData[ \(TraditionalForm\`M\)]], " samples could have been drawn at random from the model ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], ". Thus ", Cell[BoxData[ \(TraditionalForm\`e\^G\)]], " is the ", StyleBox["per-sample", FontSlant->"Italic"], " probability that ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], " produces samples that look as if they had come from ", Cell[BoxData[ FormBox[ RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], Cell[BoxData[ FormBox[ ButtonBox["FOOTNOTE", ButtonData:>"Footnote:6", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "." }], "Text"], Cell[TextData[{ "I shall construct the model ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], " as a Gibbs distribution with hidden variables. Thus" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Q", "(", StyleBox["x", FontWeight->"Bold"], ")"}], "\[Congruent]", RowBox[{\(1\/Z\), RowBox[{"\[Integral]", RowBox[{ StyleBox[ RowBox[{"d", StyleBox["h", FontWeight->"Bold"]}]], " ", SuperscriptBox["e", RowBox[{"-", RowBox[{ StyleBox["\[Lambda]", FontWeight->"Bold"], ".", RowBox[{ StyleBox["U", FontWeight->"Bold"], "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}]}]}]]}]}]}]}], TraditionalForm]], "NumberedEquation", CellTags->"Eq:10"], Cell["whence", "Text"], Cell[BoxData[ FormBox[ RowBox[{"G", "=", RowBox[{ RowBox[{"\[Integral]", RowBox[{ StyleBox[ RowBox[{"d", StyleBox["x", FontWeight->"Bold"]}]], " ", RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}], " ", RowBox[{"log", "[", RowBox[{\(1\/Z\), RowBox[{"\[Integral]", RowBox[{ StyleBox[ RowBox[{"d", StyleBox["h", FontWeight->"Bold"]}]], " ", SuperscriptBox["e", RowBox[{"-", RowBox[{ StyleBox["\[Lambda]", FontWeight->"Bold"], ".", RowBox[{ StyleBox["U", FontWeight->"Bold"], "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}]}]}]]}]}]}], "]"}]}]}], "+", "constant"}]}], TraditionalForm]], "NumberedEquation", SpanMaxSize->Infinity], Cell[TextData[{ "The constant term depends only on the true system probability ", Cell[BoxData[ FormBox[ RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], " over which one has no control." }], "Text"] }, Closed]], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], ".", CounterBox["Subsection"], " ", "Gradient of the relative entropy" }], "Subsection"], Cell[TextData[{ "I shall now derive a gradient ascent scheme for maximising ", Cell[BoxData[ \(TraditionalForm\`G\)]], " with respect to the coefficient vector ", Cell[BoxData[ FormBox[ StyleBox["\[Lambda]", FontWeight->"Bold"], TraditionalForm]]], ", which will lead to the Gibbs machine training algorithm [", ButtonBox["9", ButtonData:>"Ref:Luttrell1985", ButtonStyle->"Hyperlink"], "]. Thus ", Cell[BoxData[ FormBox[ RowBox[{ StyleBox["U", FontWeight->"Bold"], "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], TraditionalForm]]] }], "Text"], Cell[BoxData[ FormBox[ RowBox[{\(\[PartialD]G\/\[PartialD]\[Lambda]\_i\), "=", "\[AlignmentMarker]", RowBox[{ RowBox[{"\[Integral]", RowBox[{ StyleBox[ RowBox[{"d", StyleBox["x", FontWeight->"Bold"]}]], " ", RowBox[{ RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}], " ", "[", RowBox[{ RowBox[{\(1\/Z\), RowBox[{"\[Integral]", RowBox[{ SuperscriptBox[ StyleBox[ RowBox[{"d", StyleBox["x", FontWeight->"Bold"]}]], "\[Prime]"], StyleBox[ RowBox[{"d", StyleBox["h", FontWeight->"Bold"]}]], " ", RowBox[{\(U\_i\), "(", RowBox[{ SuperscriptBox[ StyleBox["x", FontWeight->"Bold"], "\[Prime]"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], SuperscriptBox["e", RowBox[{"-", RowBox[{ StyleBox["\[Lambda]", FontWeight->"Bold"], ".", RowBox[{ StyleBox["U", FontWeight->"Bold"], "(", RowBox[{ StyleBox[ SuperscriptBox["x", StyleBox["\[Prime]", FontWeight->"Plain"]], FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}]}]}]]}]}]}], "-", FractionBox[ RowBox[{"\[Integral]", RowBox[{ StyleBox[ RowBox[{"d", StyleBox["h", FontWeight->"Bold"]}]], " ", RowBox[{\(U\_i\), "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], SuperscriptBox["e", RowBox[{"-", RowBox[{ StyleBox["\[Lambda]", FontWeight->"Bold"], ".", RowBox[{ StyleBox["U", FontWeight->"Bold"], "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}]}]}]]}]}], RowBox[{"\[Integral]", RowBox[{ StyleBox[ RowBox[{"d", StyleBox["h", FontWeight->"Bold"]}]], " ", SuperscriptBox["e", RowBox[{"-", RowBox[{ StyleBox["\[Lambda]", FontWeight->"Bold"], ".", RowBox[{ StyleBox["U", FontWeight->"Bold"], "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}]}]}]]}]}]]}], "]"}]}]}], "\[IndentingNewLine]", "=", "\[AlignmentMarker]", RowBox[{ RowBox[{"\[Integral]", RowBox[{ StyleBox[ RowBox[{"d", StyleBox["x", FontWeight->"Bold"]}]], " ", RowBox[{ RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}], " ", "[", RowBox[{ RowBox[{"\[Integral]", RowBox[{ SuperscriptBox[ StyleBox[ RowBox[{"d", StyleBox["x", FontWeight->"Bold"]}]], "\[Prime]"], StyleBox[ RowBox[{"d", StyleBox["h", FontWeight->"Bold"]}]], " ", RowBox[{\(U\_i\), "(", RowBox[{ SuperscriptBox[ StyleBox["x", FontWeight->"Bold"], "\[Prime]"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], RowBox[{"Q", "(", RowBox[{ StyleBox[ SuperscriptBox["x", StyleBox["\[Prime]", FontWeight->"Plain"]], FontWeight->"Bold"], StyleBox[",", FontWeight->"Plain"], StyleBox["h", FontWeight->"Bold"]}], ")"}]}]}], "-", FractionBox[ RowBox[{"\[Integral]", RowBox[{ StyleBox[ RowBox[{"d", StyleBox["h", FontWeight->"Bold"]}]], " ", RowBox[{\(U\_i\), "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], StyleBox[",", FontWeight->"Plain"], StyleBox["h", FontWeight->"Bold"]}], ")"}]}]}], RowBox[{"Q", "(", StyleBox["x", FontWeight->"Bold"], ")"}]]}], "]"}]}]}], "\[IndentingNewLine]", "=", "\[AlignmentMarker]", RowBox[{ RowBox[{"\[Integral]", RowBox[{ StyleBox[ RowBox[{"d", StyleBox["x", FontWeight->"Bold"]}]], " ", RowBox[{ RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}], " ", "[", RowBox[{ SubscriptBox[\(\[LeftAngleBracket]U\_i\ \[RightAngleBracket]\), RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], StyleBox[",", FontWeight->"Plain"], StyleBox["h", FontWeight->"Bold"]}], ")"}]], "-", SubscriptBox[\(\[LeftAngleBracket]U\_i\ \[RightAngleBracket]\), RowBox[{"Q", "(", RowBox[{ StyleBox["h", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}]]}], "]"}]}]}], "\[IndentingNewLine]", "=", "\[AlignmentMarker]", RowBox[{ SubscriptBox[\(\[LeftAngleBracket]U\_i\[RightAngleBracket]\), RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], StyleBox[",", FontWeight->"Plain"], StyleBox["h", FontWeight->"Bold"]}], ")"}]], "-", SubscriptBox[\(\[LeftAngleBracket]U\_i\[RightAngleBracket]\), RowBox[{ RowBox[{"Q", "(", RowBox[{ StyleBox["h", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}]}]]}]}]}]}]}], TraditionalForm]], "NumberedEquation", TextAlignment->AlignmentMarker, SpanMaxSize->Infinity, CellTags->"Eq:12"], Cell[TextData[{ "The notation ", Cell[BoxData[ \(TraditionalForm\`\[LeftAngleBracket]\[CenterEllipsis]\ \[RightAngleBracket]\)]], " is defined as follows" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ SubscriptBox[\(\[LeftAngleBracket]\([\)\(\[CenterEllipsis]\)\(]\)\ \[RightAngleBracket]\), RowBox[{"\[Rho]", "(", StyleBox["X", FontWeight->"Bold"], ")"}]], "\[Congruent]", RowBox[{"\[Integral]", RowBox[{ StyleBox[ RowBox[{"d", StyleBox["X", FontWeight->"Bold"]}]], " ", RowBox[{ RowBox[{"\[Rho]", "(", StyleBox["X", FontWeight->"Bold"], ")"}], " ", "[", "\[CenterEllipsis]", "]"}]}]}]}], TraditionalForm]], "NumberedEquation", CellTags->"Eq:13"], Cell[TextData[{ "where ", Cell[BoxData[ FormBox[ RowBox[{"\[Rho]", "(", StyleBox["X", FontWeight->"Bold"], ")"}], TraditionalForm]]], " is any probability measure. In ", ButtonBox["equation", ButtonData:>"Eq:12", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:12"], ") ", Cell[BoxData[ \(TraditionalForm\`\[PartialD]G\/\[PartialD]\[Lambda]\_i\)]], " is the difference between the averages of ", Cell[BoxData[ \(TraditionalForm\`U\_i\)]], " obtained under two distinct conditions: the model Gibbs distribution ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], StyleBox[",", FontWeight->"Plain"], StyleBox["h", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], ", and a constrained Gibbs distribution ", Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Q", "(", RowBox[{ StyleBox["h", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}]}], TraditionalForm]]], ". In Boltzmann machine terminology the ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], StyleBox[",", FontWeight->"Plain"], StyleBox["h", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], "-average is called ", StyleBox["free", FontSlant->"Italic"], " because the system state is described by an unconstrained Gibbs \ distribution, whereas the ", Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Q", "(", RowBox[{ StyleBox["h", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}]}], TraditionalForm]]], "-average is called ", StyleBox["clamped", FontSlant->"Italic"], " because part of the system state ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], " is described in a way that is determined externally by ", Cell[BoxData[ FormBox[ RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], " whilst the rest of the state ", Cell[BoxData[ FormBox[ StyleBox["h", FontWeight->"Bold"], TraditionalForm]]], " is described by a conditional Gibbs distribution. The beauty of the \ expression for ", Cell[BoxData[ \(TraditionalForm\`\[PartialD]G\/\[PartialD]\[Lambda]\_i\)]], " is that it allows one to optimise all of the Gibbs distribution \ parameters ", Cell[BoxData[ \(TraditionalForm\`\[Lambda]\_i\)]], " - not just those concerned directly with the observed part ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], " of the system state. This powerful approach thus allows one to build \ hidden variable models ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], StyleBox[",", FontWeight->"Plain"], StyleBox["h", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " whose marginal distribution ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], " approximates (in the ", Cell[BoxData[ \(TraditionalForm\`G\)]], " measure sense) the training set distribution ", Cell[BoxData[ FormBox[ RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], "." }], "Text", CellTags->"Ed:Change6"], Cell[TextData[{ "Practical implementations of the ", Cell[BoxData[ \(TraditionalForm\`\[PartialD]G\/\[PartialD]\[Lambda]\_i\)]], " computation use a Monte Carlo technique (in the form of the Metropolis \ algorithm", Cell[BoxData[ FormBox[ ButtonBox["FOOTNOTE", ButtonData:>"Footnote:7", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], " [", ButtonBox["10", ButtonData:>"Ref:MetropolisRosenbluthRosenbluthTellerTeller1953", ButtonStyle->"Hyperlink"], "]) to provide samples from ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], StyleBox[",", FontWeight->"Plain"], StyleBox["h", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " and ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["h", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], ", whilst the training set provides samples from ", Cell[BoxData[ FormBox[ RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], ". There are many variations on a theme here. For instance, the Boltzmann \ machine is usually trained with some sort of simulated annealing to \ circumvent the tendency of the Metropolis algorithm to get trapped sampling \ in the vicinity of local maxima of ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], StyleBox[",", FontWeight->"Plain"], StyleBox["h", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " (or ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["h", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], "). It is important to recognise that the Metropolis algorithm and \ simulated annealing are nothing more than algorithmic tricks", Cell[BoxData[ FormBox[ ButtonBox["FOOTNOTE", ButtonData:>"Footnote:8", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "." }], "Text", CellTags->"Ed:Change7"] }, Closed]] }, Closed]], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], " Hidden Markov model and Gibbs distribution parameterisations" }], "Section", CellTags->"Sect:4"], Cell[TextData[{ "In this section I shall derive the Gibbs distribution formulation of \ hidden Markov model training. My derivation cures major problems that would \ arise if the approach discussed in [", ButtonBox["4", ButtonData:>"Ref:BridleMoore1984", ButtonStyle->"Hyperlink"], "] were to be attempted", Cell[BoxData[ FormBox[ ButtonBox["FOOTNOTE", ButtonData:>"Footnote:9", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], ". My approach is similar to, but more general than, that given in [", ButtonBox["11", ButtonData:>"Ref:ZhaoAtlasZhuang1988", ButtonStyle->"Hyperlink"], "]", Cell[BoxData[ FormBox[ ButtonBox["FOOTNOTE", ButtonData:>"Footnote:10", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "." }], "Text"], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], ".", CounterBox["Subsection"], " ", "Hidden Markov model as a Gibbs distribution" }], "Subsection", CellTags->"Sect:4.1"], Cell[TextData[{ "A simple (first order) Markov chain model has a joint state space defined \ as ", Cell[BoxData[ FormBox[ RowBox[{ StyleBox["h", FontWeight->"Bold"], "\[Congruent]", \((h\_1, h\_2, \[CenterEllipsis], h\_T)\)}], TraditionalForm]]], " with a set of conditional probabilities ", Cell[BoxData[ \(TraditionalForm\`Q(h\_\(t + 1\) | h\_t), t = 1, 2, \[CenterEllipsis], T\)]], ". Note that I use the hidden variable notation ", Cell[BoxData[ FormBox[ StyleBox["h", FontWeight->"Bold"], TraditionalForm]]], " for convenience, and I assume that the conditional probabilities are not \ dependent on the \"time\" index ", Cell[BoxData[ \(TraditionalForm\`t\)]], ". The probability that joint " }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Q", "(", StyleBox["h", FontWeight->"Bold"], ")"}], "=", \({\[Product]\+\(t = 1\)\%\(T - 1\)Q(h\_\(t + 1\) | h\_t)} \(Q( h\_1)\)\)}], TraditionalForm]], "NumberedEquation", SpanMaxSize->Infinity], Cell[TextData[{ "where ", Cell[BoxData[ \(TraditionalForm\`Q(h\_1)\)]], " is needed to prime the conditional probability chain. This may, or may \ not, be equal to the equilibrium state probability." }], "Text"], Cell[TextData[{ "A hidden Markov model may easily be obtained by forming a joint state \ space defined as ", Cell[BoxData[ FormBox[ RowBox[{"(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " and a further set of conditional probabilities ", Cell[BoxData[ \(TraditionalForm\`Q(x\_t | h\_t), t = 1, 2, \[CenterEllipsis], T\)]], ". Thus each ", Cell[BoxData[ \(TraditionalForm\`h\_t\)]], " can be thought of as producing ", Cell[BoxData[ \(TraditionalForm\`h\_\(t + 1\)\)]], " and ", Cell[BoxData[ \(TraditionalForm\`x\_t\)]], ", where each is produced independently of the other (given that ", Cell[BoxData[ \(TraditionalForm\`h\_t\)]], " is known). The probability that joint state ", Cell[BoxData[ FormBox[ RowBox[{"(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " occurs is given by" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], "=", RowBox[{\({\[Product]\+\(t = 1\)\%T Q(x\_t | h\_t)}\), RowBox[{"Q", "(", StyleBox["h", FontWeight->"Bold"], ")"}]}]}], TraditionalForm]], "NumberedEquation", SpanMaxSize->Infinity, CellTags->"Eq:15"], Cell[TextData[{ "The expression in ", ButtonBox["equation", ButtonData:>"Eq:15", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:15"], ") is the starting point for standard derivations of hidden Markov model \ re-estimation. Invariably, one chooses the ", Cell[BoxData[ \(TraditionalForm\`h\_t\)]], " to be discrete, but chooses the ", Cell[BoxData[ \(TraditionalForm\`x\_t\)]], " to be either discrete or continuous. Without loss of generality", Cell[BoxData[ FormBox[ ButtonBox["FOOTNOTE", ButtonData:>"Footnote:11", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], ", I choose both to be discrete with ", Cell[BoxData[ \(TraditionalForm\`x\_t \[Element] {0, 1, 2, \[CenterEllipsis], N\_x - 1}\)]], " and ", Cell[BoxData[ \(TraditionalForm\`h\_t \[Element] {0, 1, 2, \[CenterEllipsis], N\_h - 1}\)]], ", ", Cell[BoxData[ \(TraditionalForm\`1 < t < T\)]], "." }], "Text"], Cell[TextData[{ "A Gibbs distribution formulation of ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " is then" }], "Text"], Cell[BoxData[{ FormBox[ RowBox[{ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], "=", "\[AlignmentMarker]", SuperscriptBox["e", RowBox[{"-", RowBox[{"U", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}]}]]}], TraditionalForm], "\n", FormBox[ RowBox[{ RowBox[{"U", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], "=", "\[AlignmentMarker]", \(\(-log[ Q \((h\_1)\)]\) - \[Sum]\+\(t = 1\)\%\(T - 1\)log[ Q \((h\_\(t + 1\) | h\_t)\)] - \[Sum]\+\(t = 1\)\%T log[ Q \((x\_t | h\_t)\)]\)}], TraditionalForm]}], "NumberedEquation",\ TextAlignment->AlignmentMarker, CellTags->"Eq:16"], Cell["\<\ This is a tautology, because I have merely placed the conditional probability \ inside an exponential function. I have made this replacement to demonstrate \ that the standard hidden Markov model formulation maps directly onto the \ language of Gibbs distributions.\ \>", "Text"] }, Closed]], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], ".", CounterBox["Subsection"], " Gibbs distribution as a hidden Markov model" }], "Subsection"], Cell[TextData[{ "A general Gibbs distribution with hidden variables takes the form shown in \ ", ButtonBox["equation", ButtonData:>"Eq:10", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:10"], "). I shall parameterise this in the style of ", ButtonBox["equation", ButtonData:>"Eq:5", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:5"], ") to yield" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{ StyleBox["\[Lambda]", FontWeight->"Bold"], ".", RowBox[{ StyleBox["U", FontWeight->"Bold"], "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}]}], "=", "\[AlignmentMarker]", RowBox[{ RowBox[{\(\[Sum]\+\(t = 1\)\%T\), RowBox[{"[", RowBox[{\(\[Sum]\+\(i = 1\)\%\(N\_x - \ 1\)\(\[Lambda]\_i\%\((x)\)\) \[Delta]\_\(x\_t, i\)\), "+", \(\[Sum]\+\(i = 1\)\%\(N\_h - 1\)\(\[Lambda]\_i\%\((h)\)\ \) \[Delta]\_\(h\_t, i\)\), "+", RowBox[{\(\[Sum]\+\(i = 1\)\%\(N\_h - 1\)\), RowBox[{\(\[Sum]\+\(j = 1\)\%\(N\_x - 1\)\), RowBox[{ SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]], \(\[Delta]\_\(h\_t, i\)\), \(\[Delta]\_\(x\_t, j\)\)}]}]}]}], "]"}]}], "+", RowBox[{\(\[Sum]\+\(t = 1\)\%\(T - 1\)\), RowBox[{\(\[Sum]\+\(i, j = 1\)\%\(N\_h - 1\)\), RowBox[{ SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]], \(\[Delta]\_\(h\_t, i\)\), \(\[Delta]\_\(h\_\(t + 1\), j\)\)}]}]}]}]}], TraditionalForm]], "NumberedEquation", TextAlignment->AlignmentMarker, SpanMaxSize->Infinity, CellTags->{"Ed:Change8", "Eq:17"}], Cell[TextData[{ "where I retain only the 1-point and 2-point terms to correspond to the \ hidden Markov model in ", ButtonBox["\[Section]", ButtonData:>"Sect:4.1", ButtonStyle->"Hyperlink"], CounterBox["Section", "Sect:4"], ".", CounterBox["Subsection", "Sect:4.1"], ". Note that I have assumed time shift invariance, so the ", Cell[BoxData[ \(TraditionalForm\`\[Lambda]\)]], "-coefficients do not depend on ", Cell[BoxData[ \(TraditionalForm\`T\)]], ". This use of symmetry to simplify the model leads to a \"spread network\" \ [", ButtonBox["4", ButtonData:>"Ref:BridleMoore1984", ButtonStyle->"Hyperlink"], "] in which the model structure is repeated identically at each time slice. \ ", ButtonBox["Figure", ButtonData:>"Fig:1", ButtonStyle->"Hyperlink"], " ", CounterBox["NumberedFigure", "Fig:1"], " shows a piece of a hidden Markov model corresponding to ", ButtonBox["equation", ButtonData:>"Eq:17", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:17"], "), with its 1-point terms represented as small circles, and its 2-point \ terms represented as ovals each of which encloses a pair of 1-point terms." }], "Text"], Cell[TextData[Cell[TextData[{ " ", ButtonBox["OPEN", ButtonData:>{ URL[ "http://www.luttrell.org.uk/papers/sp4_99/fig1.gif"], None}, Active->True, ButtonStyle->"Hyperlink"], " " }]]], "NumberedFigure", TextAlignment->Center, CellTags->{"Fig:1", "Ed:Change9"}], Cell["\<\ A hidden Markov model with all 1-point and 2-point terms shown.\ \>", "Caption"], Cell[TextData[{ "It is both instructive and theoretically convenient to reparameterise ", ButtonBox["equation", ButtonData:>"Eq:17", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:17"], ") to make clear its relationship to ", ButtonBox["equation", ButtonData:>"Eq:16", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:16"], "). Diagrammatically we must extract pieces of ", ButtonBox["Figure", ButtonData:>"Fig:1", ButtonStyle->"Hyperlink"], " ", CounterBox["NumberedFigure", "Fig:1"], " that look like conditional probabilities. I show these in ", ButtonBox["Figure", ButtonData:>"Fig:2", ButtonStyle->"Hyperlink"], " ", CounterBox["NumberedFigure", "Fig:2"], ". ", ButtonBox["Figure", ButtonData:>"Fig:2", ButtonStyle->"Hyperlink"], " ", CounterBox["NumberedFigure", "Fig:2"], "(a) corresponds to the ", Cell[BoxData[ \(TraditionalForm\`log[Q(h\_\(t + 1\) | h\_t)]\)]], " piece of ", ButtonBox["equation", ButtonData:>"Eq:16", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:16"], "), whereas ", ButtonBox["Figure", ButtonData:>"Fig:2", ButtonStyle->"Hyperlink"], " ", CounterBox["NumberedFigure", "Fig:2"], "(b) corresponds to the ", Cell[BoxData[ \(TraditionalForm\`log[Q(x\_t | h\_t)]\)]], " piece." }], "Text"], Cell[TextData[Cell[TextData[{ " ", ButtonBox["OPEN", ButtonData:>{ URL[ "http://www.luttrell.org.uk/papers/sp4_99/fig2.gif"], None}, Active->True, ButtonStyle->"Hyperlink"], " " }]]], "NumberedFigure", TextAlignment->Center, CellTags->{"Fig:2", "Ed:Change10"}], Cell["The conditional probability pieces of a hidden Markov model.", "Caption"], Cell[TextData[{ "Now that it is clear from ", ButtonBox["Figure", ButtonData:>"Fig:2", ButtonStyle->"Hyperlink"], " ", CounterBox["NumberedFigure", "Fig:2"], " what I intend to do to ", ButtonBox["equation", ButtonData:>"Eq:17", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:17"], "), 1 can proceed with confidence to rearrange it as follows" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{ StyleBox["\[Lambda]", FontWeight->"Bold"], ".", RowBox[{ StyleBox["U", FontWeight->"Bold"], "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}]}], "=", "\[AlignmentMarker]", RowBox[{\(\[Sum]\+\(i = 1\)\%\(N\_h - 1\)\(\[Lambda]\_i\%\((h)\)\) \ \[Delta]\_\(h\_1, i\)\), "+", RowBox[{\(\[Sum]\+\(t = 1\)\%\(T - 1\)\), RowBox[{\(\[Sum]\+\(i = 0\)\%\(N\_h - 1\)\), RowBox[{\(\[Sum]\+\(j = 1\)\%\(N\_h - 1\)\), RowBox[{ RowBox[{"(", RowBox[{\(\[Lambda]\_j\%\((h)\)\), "+", RowBox[{ SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]], "(", \(1 - \[Delta]\_\(i, 0\)\), ")"}]}], ")"}], \(\[Delta]\_\(h\_t, i\)\), \(\[Delta]\_\(h\_\(t + 1\), j\)\)}]}]}]}], "+", RowBox[{\(\[Sum]\+\(t = 1\)\%T\), RowBox[{\(\[Sum]\+\(i = 0\)\%\(N\_h - 1\)\), RowBox[{\(\[Sum]\+\(j = 1\)\%\(N\_x - 1\)\), RowBox[{ RowBox[{"(", RowBox[{\(\[Lambda]\_j\%\((x)\)\), "+", RowBox[{ SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]], "(", \(1 - \[Delta]\_\(i, 0\)\), ")"}]}], ")"}], \(\[Delta]\_\(h\_t, i\)\), \(\[Delta]\_\(x\_t, j\)\)}]}]}]}]}]}], TraditionalForm]], "NumberedEquation",\ TextAlignment->AlignmentMarker, SpanMaxSize->Infinity, CellTags->"Ed:Change11"], Cell["Redefine the notation as follows", "Text"], Cell[BoxData[{ FormBox[ RowBox[{\(\[Lambda]\_j\%\((h)\)\), "+", RowBox[{ RowBox[{ SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]], "(", \(1 - \[Delta]\_\(i, 0\)\), ")"}], "\[LongRightArrow]", "\[AlignmentMarker]", SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]]}]}], TraditionalForm], "\n", FormBox[ RowBox[{\(\[Lambda]\_j\%\((x)\)\), "+", RowBox[{ RowBox[{ SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]], "(", \(1 - \[Delta]\_\(i, 0\)\), ")"}], "\[LongRightArrow]", "\[AlignmentMarker]", SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]]}]}], TraditionalForm], "\n", FormBox[ RowBox[{\(\[Lambda]\_i\%\((h)\)\), "\[LongRightArrow]", "\[AlignmentMarker]", SubsuperscriptBox["\[Lambda]", \(0, i\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]]}], TraditionalForm]}], "NumberedEquation", TextAlignment->AlignmentMarker], Cell["whence", "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{ StyleBox["\[Lambda]", FontWeight->"Bold"], ".", RowBox[{ StyleBox["U", FontWeight->"Bold"], "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}]}], "=", "\[AlignmentMarker]", RowBox[{ RowBox[{\(\[Sum]\+\(i = 1\)\%\(N\_h - 1\)\), RowBox[{ SubsuperscriptBox["\[Lambda]", \(0, i\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]], \(\[Delta]\_\(h\_1, i\)\)}]}], "+", RowBox[{\(\[Sum]\+\(t = 1\)\%\(T - 1\)\), RowBox[{\(\[Sum]\+\(i = 0\)\%\(N\_h - 1\)\), RowBox[{\(\[Sum]\+\(j = 1\)\%\(N\_h - 1\)\), RowBox[{ SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]], \(\[Delta]\_\(h\_t, i\)\), \(\[Delta]\_\(h\_\(t + 1\), j\)\)}]}]}]}], "+", RowBox[{\(\[Sum]\+\(t = 1\)\%T\), RowBox[{\(\[Sum]\+\(i = 0\)\%\(N\_h - 1\)\), RowBox[{\(\[Sum]\+\(j = 1\)\%\(N\_x - 1\)\), RowBox[{ SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]], \(\[Delta]\_\(h\_t, i\)\), \(\[Delta]\_\(x\_t, j\)\)}]}]}]}]}]}], TraditionalForm]], "NumberedEquation", TextAlignment->AlignmentMarker, SpanMaxSize->Infinity, CellTags->"Eq:20"], Cell[TextData[{ ButtonBox["Equation", ButtonData:>"Eq:20", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:20"], ") may now be compared directly with", " ", ButtonBox["equation", ButtonData:>"Eq:16", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:16"], ") ", "term for term." }], "Text"], Cell[TextData[{ "We can verify that the total number of parameters is the same in each \ case. In ", ButtonBox["equation", ButtonData:>"Eq:16", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:16"], ") there are two conditional probability matrices to parameterise. ", Cell[BoxData[ \(TraditionalForm\`Q(h\_\(t + 1\) | h\_t)\)]], " requires ", Cell[BoxData[ \(TraditionalForm\`N\_h\ \((N\_h - 1)\)\)]], " parameters (", Cell[BoxData[ \(TraditionalForm\`N\_h\)]], " separate conditional probability, each requiring ", Cell[BoxData[ \(TraditionalForm\`N\_h - 1\)]], " parameters). Similarly, ", Cell[BoxData[ \(TraditionalForm\`Q(x\_t | h\_t)\)]], " requires ", Cell[BoxData[ \(TraditionalForm\`N\_h\ \((N\_x - 1)\)\)]], " parameters. Thus the total number of parameters is ", Cell[BoxData[ \(TraditionalForm\`N\_h\%2 + \(N\_h\) N\_x - 2 N\_h\)]], ". On the other hand, in ", ButtonBox["equation", ButtonData:>"Eq:20", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:20"], ") there are two matrices of Gibbs distribution parameters ", Cell[BoxData[ FormBox[ RowBox[{ SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]], ",", " ", \(0 \[LessEqual] i \[LessEqual] N\_h - 1\), ",", " ", \(1 \[LessEqual] j \[LessEqual] N\_h - 1\)}], TraditionalForm]]], " and ", Cell[BoxData[ FormBox[ RowBox[{ RowBox[{ RowBox[{ SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]], " ", "0"}], "\[LessEqual]", "i", "\[LessEqual]", \(N\_h - 1\)}], ",", " ", \(1 \[LessEqual] j \[LessEqual] N\_x - 1\)}], TraditionalForm]]], ". The ", StyleBox["number of parameters", FontSlant->"Italic"], " in the Gibbs distribution and the conditional probability formalisms is \ the same. However, in general, the ", StyleBox["interpretation of the parameters", FontSlant->"Italic"], " is not in one to one correspondence", Cell[BoxData[ FormBox[ ButtonBox["FOOTNOTE", ButtonData:>"Footnote:12", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "." }], "Text"], Cell[TextData[{ "There remains the problem of reconciling the ", Cell[BoxData[ \(TraditionalForm\`\(-log[Q(h\_1)]\)\)]], " term in ", ButtonBox["equation", ButtonData:>"Eq:16", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:16"], ") with the ", Cell[BoxData[ FormBox[ RowBox[{\(\[Sum]\_i\), RowBox[{ SubsuperscriptBox["\[Lambda]", \(0, i\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]], \(\[Delta]\_\(h\_1, i\)\)}]}], TraditionalForm]]], " term in ", ButtonBox["equation", ButtonData:>"Eq:20", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:20"], "). We are free to choose ", Cell[BoxData[ \(TraditionalForm\`Q(h\_1)\)]], " at will as an initial condition in the conditional probability formalism, \ but ", Cell[BoxData[ FormBox[ SubsuperscriptBox["\[Lambda]", \(0, i\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]], TraditionalForm]]], " is already determined as a special case of the ", Cell[BoxData[ FormBox[ SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]], TraditionalForm]]], "-matrix by the form of the Hanimersley-Clifford expansion. This is not \ surprising since we assume time shift invariance in the Hammersley-Clifford \ expansion, but we explicitly violate this invariance in the conditional \ probability approach by treating ", Cell[BoxData[ \(TraditionalForm\`t = 1\)]], " as a special case (i.e. an initial condition)." }], "Text"], Cell[TextData[{ "For some applications (such as radar clutter modelling) one should \ preserve explicit time shift invariance, whilst in other applications (such \ as speech modelling) one might wish to impose explicit initial and final \ conditions. I shall restrict my attention to the initial condition ", Cell[BoxData[ \(TraditionalForm\`h\_1 = 0\)]], ", thus ", Cell[BoxData[ \(TraditionalForm\`Q(h\_1) = \[Delta]\_\(h\_1, 0\)\)]], ". The ", Cell[BoxData[ \(TraditionalForm\`\(-log[Q(h\_1)]\)\)]], " term in ", ButtonBox["equation", ButtonData:>"Eq:16", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:16"], ") and the ", Cell[BoxData[ FormBox[ RowBox[{\(\[Sum]\_i\), RowBox[{ SubsuperscriptBox["\[Lambda]", \(0, i\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]], \(\[Delta]\_\(h\_1, i\)\)}]}], TraditionalForm]]], " term in ", ButtonBox["equation", ButtonData:>"Eq:20", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:20"], ") may thus be ornitted ", StyleBox["provided that we clamp", FontSlant->"Italic"], " ", Cell[BoxData[ \(TraditionalForm\`h\_1 = 0\)]], " ", StyleBox["wherever else it appears", FontSlant->"Italic"], Cell[BoxData[ FormBox[ ButtonBox["FOOTNOTE", ButtonData:>"Footnote:13", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "." }], "Text"] }, Closed]] }, Closed]], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], " ", "Training a hidden Markov model as a Gibbs distribution" }], "Section"], Cell[TextData[{ "In this section 1 shall derive the detail of the Gibbs machine training \ algorithm as applied to a hidden Maxkov model. The training scheme turns out \ to have a similar structure to the Baum-Welch re-estimation scheme [", ButtonBox["2", ButtonData:>"Ref:Baum1972", ButtonStyle->"Hyperlink"], "]." }], "Text"], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], ".", CounterBox["Subsection"], " ", "Summing over histories" }], "Subsection"], Cell[TextData[{ "I shall henceforth use the expression for ", Cell[BoxData[ FormBox[ RowBox[{ StyleBox["\[Lambda]", FontWeight->"Bold"], ".", RowBox[{ StyleBox["U", FontWeight->"Bold"], "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}]}], TraditionalForm]]], " given in ", ButtonBox["equation", ButtonData:>"Eq:20", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:20"], "). Usually one needs to run a Metropolis algorithm in order to estimate \ the averages in ", ButtonBox["equation", ButtonData:>"Eq:27", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:27"], "), which leads to long training times for the ", Cell[BoxData[ \(TraditionalForm\`\[Lambda]\)]], "-coefficients. However, in the case of an hidden Markov model this may be \ circumvented by performing an explicit sum over states of the Markov chain \ [", ButtonBox["11", ButtonData:>"Ref:ZhaoAtlasZhuang1988", ButtonStyle->"Hyperlink"], "]. 1 shall call this a \"sum over histories\"", Cell[BoxData[ FormBox[ ButtonBox["FOOTNOTE", ButtonData:>"Footnote:14", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "." }], "Text"], Cell[TextData[{ "In order to develop the sum over histories I shall introduce some \ convenient notation. Define two matrices ", Cell[BoxData[ FormBox[ SubsuperscriptBox["S", \(h\_t, h\_\(t + 1\)\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]], TraditionalForm]]], " and ", Cell[BoxData[ FormBox[ SubsuperscriptBox["S", \(h\_t, x\_t\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]], TraditionalForm]]], " as follows" }], "Text"], Cell[BoxData[{ FormBox[ RowBox[{ SubsuperscriptBox["S", \(h\_t, h\_\(t + 1\)\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]], "\[Congruent]", "\[AlignmentMarker]", SuperscriptBox["e", RowBox[{"-", RowBox[{ SubsuperscriptBox["\[Lambda]", \(h\_t, h\_\(t + 1\)\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]], "(", \(1 - \[Delta]\_\(h\_\(t + 1\), 0\)\), ")"}]}]]}], TraditionalForm], "\n", FormBox[ RowBox[{ SubsuperscriptBox["S", \(h\_t, x\_t\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]], "\[Congruent]", "\[AlignmentMarker]", SuperscriptBox["e", RowBox[{"-", RowBox[{ SubsuperscriptBox["\[Lambda]", \(h\_t, x\_t\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]], "(", \(1 - \[Delta]\_\(x\_t, 0\)\), ")"}]}]]}], TraditionalForm]}], "NumberedEquation", TextAlignment->AlignmentMarker, CellTags->"Eq:21"], Cell[TextData[{ "Now define ", StyleBox["forwards", FontSlant->"Italic"], " and ", StyleBox["backwards", FontSlant->"Italic"], " matrices ", Cell[BoxData[ \(TraditionalForm\`S\^\((+)\)\)]], " and ", Cell[BoxData[ \(TraditionalForm\`S\^\((-)\)\)]], " respectively as follows" }], "Text"], Cell[BoxData[{ FormBox[ RowBox[{\(S\_\(h\_t, x\_t, h\_\(t + 1\)\)\%\((+)\)\), "\[Congruent]", "\[AlignmentMarker]", RowBox[{ SubsuperscriptBox["S", \(h\_t, x\_t\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]], SubsuperscriptBox["S", \(h\_t, h\_\(t + 1\)\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]]}]}], TraditionalForm], "\n", FormBox[ RowBox[{\(S\_\(h\_t, h\_\(t + 1\), x\_\(t + 1\)\)\%\((-)\)\), "\[Congruent]", "\[AlignmentMarker]", RowBox[{ SubsuperscriptBox["S", \(h\_\(t + 1\), x\_\(t + 1\)\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]], SubsuperscriptBox["S", \(h\_t, h\_\(t + 1\)\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]]}]}], TraditionalForm]}], "NumberedEquation", TextAlignment->AlignmentMarker], Cell[TextData[Cell[TextData[{ " ", ButtonBox["OPEN", ButtonData:>{ URL[ "http://www.luttrell.org.uk/papers/sp4_99/fig3.gif"], None}, Active->True, ButtonStyle->"Hyperlink"], " " }]]], "NumberedFigure", TextAlignment->Center, CellTags->{"Fig:3", "Ed:Change12"}], Cell["The forwards and backwards pieces of a hidden Markov model.", "Caption"], Cell[TextData[{ "In ", ButtonBox["Figure", ButtonData:>"Fig:3", ButtonStyle->"Hyperlink"], " ", CounterBox["NumberedFigure", "Fig:3"], "(a) and ", ButtonBox["Figure", ButtonData:>"Fig:3", ButtonStyle->"Hyperlink"], " ", CounterBox["NumberedFigure", "Fig:3"], "(b) I show diagrammatic representations of ", Cell[BoxData[ \(TraditionalForm\`S\^\((+)\)\)]], " and ", Cell[BoxData[ \(TraditionalForm\`S\^\((-)\)\)]], " respectively. Note how ", ButtonBox["Figure", ButtonData:>"Fig:3", ButtonStyle->"Hyperlink"], " ", CounterBox["NumberedFigure", "Fig:3"], "(a) and ", ButtonBox["Figure", ButtonData:>"Fig:3", ButtonStyle->"Hyperlink"], " ", CounterBox["NumberedFigure", "Fig:3"], "(b) are two different ways of uniting ", ButtonBox["Figure", ButtonData:>"Fig:2", ButtonStyle->"Hyperlink"], " ", CounterBox["NumberedFigure", "Fig:2"], "(a) with ", ButtonBox["Figure", ButtonData:>"Fig:2", ButtonStyle->"Hyperlink"], " ", CounterBox["NumberedFigure", "Fig:2"], "(b)." }], "Text"], Cell[TextData[{ "With these definitions I can now write ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " as" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], "=", RowBox[{\(1\/Z\), \({\[Product]\+\(t = 1\)\%\(\[Tau] - 1\)S\_\(h\_t, \ x\_t, h\_\(t + 1\)\)\%\((+)\)}\), SubsuperscriptBox["S", \(h\_\[Tau], x\_\[Tau]\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]], \({\[Product]\+\(t = \[Tau]\)\%\(T - 1\)S\_\(h\_t, \ h\_\(t + 1\), x\_\(t + 1\)\)\%\((-)\)}\)}]}], TraditionalForm]], "NumberedEquation", SpanMaxSize->Infinity, CellTags->"Eq:23"], Cell[TextData[{ "In ", ButtonBox["equation", ButtonData:>"Eq:23", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:23"], ") 1 have split the product of contributions at ", Cell[BoxData[ \(TraditionalForm\`t = \[Tau]\)]], ", and used forwards matrices for ", Cell[BoxData[ \(TraditionalForm\`1 \[LessEqual] t < \[Tau]\)]], " and backwards matrices for ", Cell[BoxData[ \(TraditionalForm\`\[Tau] \[LessEqual] t \[LessEqual] T - 1\)]], ". There is also an extra term at ", Cell[BoxData[ \(TraditionalForm\`t = \[Tau]\)]], ". which completes the product. In ", ButtonBox["Figure", ButtonData:>"Fig:4", ButtonStyle->"Hyperlink"], " ", CounterBox["NumberedFigure", "Fig:4"], " I show the decomposition of ", ButtonBox["equation", ButtonData:>"Eq:23", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:23"], ") in diagrammatic form. It is clear how the pieces fit together to build \ up the whole hidden Markov model chain of ", ButtonBox["Figure", ButtonData:>"Fig:1", ButtonStyle->"Hyperlink"], " ", CounterBox["NumberedFigure", "Fig:1"], ", less the 1-point term corresponding to the initial condition which I \ have discarded." }], "Text", CellTags->"Ed:Change13"], Cell[TextData[Cell[TextData[{ " ", ButtonBox["OPEN", ButtonData:>{ URL[ "http://www.luttrell.org.uk/papers/sp4_99/fig4.gif"], None}, Active->True, ButtonStyle->"Hyperlink"], " " }]]], "NumberedFigure", TextAlignment->Center, CellTags->{"Fig:4", "Ed:Change14"}], Cell["A forwards/backwards decomposed hidden Markov model.", "Caption"], Cell[TextData[{ "The marginal probability ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], " may now be expressed as" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{ RowBox[{ RowBox[{"Q", "(", StyleBox["x", FontWeight->"Bold"], ")"}], "\[Congruent]", RowBox[{\(\[Sum]\+\(h\_1, h\_2, \[CenterEllipsis], h\_T = 0\)\%\(N\_h - 1\)\), RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}]}]}], "=", RowBox[{\(1\/Z\), RowBox[{\(\[Sum]\+\(h\_1, h\_2, \[CenterEllipsis], h\_T = 0\)\%\(N\_h - 1\)\), RowBox[{\({\[Product]\+\(t = 1\)\%\(T - 1\)S\_\(h\_t, x\_t, \ h\_\(t + 1\)\)\%\((+)\)}\), SubsuperscriptBox["S", \(h\_T, x\_T\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]]}]}]}]}], TraditionalForm]], "NumberedEquation", SpanMaxSize->Infinity, CellTags->"Ed:Change15"], Cell[TextData[{ "An equivalent expression using ", Cell[BoxData[ \(TraditionalForm\`S\^\((-)\)\)]], " can be constructed. ", Cell[BoxData[ \(TraditionalForm\`Z\)]], " may be evaluated by summing the numerator over ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], " (note that ", Cell[BoxData[ FormBox[ RowBox[{ RowBox[{ SubscriptBox["\[Sum]", StyleBox["x", FontWeight->"Bold"]], RowBox[{"Q", "(", StyleBox["x", FontWeight->"Bold"], ")"}]}], "=", "1"}], TraditionalForm]]], " is guaranteed), thus" }], "Text"], Cell[BoxData[{ FormBox[ RowBox[{"Z", "\[Congruent]", "\[AlignmentMarker]", RowBox[{\(\[Sum]\+\(x\_1, x\_2, \[CenterEllipsis], x\_T = 0\)\%\(N\_x - 1\)\), RowBox[{"Z", "(", StyleBox["x", FontWeight->"Bold"], ")"}]}]}], TraditionalForm], "\n", FormBox[ RowBox[{ RowBox[{"Z", "(", StyleBox["x", FontWeight->"Bold"], ")"}], "\[Congruent]", "\[AlignmentMarker]", RowBox[{\(\[Sum]\+\(h\_1, h\_2, \[CenterEllipsis], h\_T = 0\)\%\(N\_h - 1\)\), RowBox[{\({\[Product]\+\(t = 1\)\%\(T - 1\)S\_\(h\_t, x\_t, h\_\(t \ + 1\)\)\%\((+)\)}\), SubsuperscriptBox["S", \(h\_T, x\_T\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]]}]}]}], TraditionalForm]}], "NumberedEquation", TextAlignment->AlignmentMarker, SpanMaxSize->Infinity, CellTags->{"Ed:Change16", "Eq:25"}] }, Closed]], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], ".", CounterBox["Subsection"], " ", "Maximising ", Cell[BoxData[ FormBox[ StyleBox["G", FontWeight->"Plain"], TraditionalForm]]] }], "Subsection"], Cell[TextData[{ "1 shall now use the ", Cell[BoxData[ FormBox[ RowBox[{ StyleBox["U", FontWeight->"Bold"], "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], " defined in ", ButtonBox["equation", ButtonData:>"Eq:20", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:20"], ") to evaluate ", Cell[BoxData[ \(TraditionalForm\`\[PartialD]G\/\[PartialD]\[Lambda]\_i\)]], ", using the Gibbs machine training algorithm in ", ButtonBox["equation", ButtonData:>"Eq:12", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:12"], "). The ", Cell[BoxData[ \(TraditionalForm\`\[Lambda]\_i\)]], " are the various ", Cell[BoxData[ \(TraditionalForm\`\[Lambda]\)]], "-coefficients that appear in ", ButtonBox["equation", ButtonData:>"Eq:20", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:20"], "). Firstly define two matrices ", Cell[BoxData[ FormBox[ SubsuperscriptBox["D", \(i, j\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]], TraditionalForm]]], " and ", Cell[BoxData[ FormBox[ SubsuperscriptBox["D", \(i, j\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]], TraditionalForm]]], " as" }], "Text", CellTags->"Ed:Change17"], Cell[BoxData[{ FormBox[ RowBox[{ RowBox[{ SubsuperscriptBox["D", \(i, j\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]], "(", "t", ")"}], "\[Congruent]", "\[AlignmentMarker]", \(\((1 - \[Delta]\_\(j, 0\))\) \(\[Delta]\_\(h\ \_t, i\)\) \[Delta]\_\(h\_\(t + 1\), j\)\)}], TraditionalForm], "\n", FormBox[ RowBox[{ RowBox[{ SubsuperscriptBox["D", \(i, j\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]], "(", "t", ")"}], "\[Congruent]", "\[AlignmentMarker]", \(\((1 - \[Delta]\_\(j, 0\))\) \(\[Delta]\_\(h\ \_t, i\)\) \[Delta]\_\(x\_t, j\)\)}], TraditionalForm]}], "NumberedEquation", TextAlignment->AlignmentMarker], Cell["Using these definitions I obtain two types of derivative", "Text"], Cell[BoxData[{ FormBox[ RowBox[{ FractionBox[\(\[PartialD]G\), RowBox[{"\[PartialD]", SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]]}]], "=", "\[AlignmentMarker]", RowBox[{\(\[Sum]\+\(t = 1\)\%\(T - 1\)\), RowBox[{"[", RowBox[{ SubscriptBox[ RowBox[{"\[LeftAngleBracket]", RowBox[{ SubsuperscriptBox["D", \(i, j\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]], "(", "t", ")"}], "\[RightAngleBracket]"}], StyleBox["free", FontSlant->"Italic"]], "-", SubscriptBox[ RowBox[{"\[LeftAngleBracket]", RowBox[{ SubsuperscriptBox["D", \(i, j\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]], "(", "t", ")"}], "\[RightAngleBracket]"}], StyleBox["clamped", FontSlant->"Italic"]]}], "]"}]}]}], TraditionalForm], "\n", FormBox[ RowBox[{ FractionBox[\(\[PartialD]G\), RowBox[{"\[PartialD]", SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]]}]], "=", "\[AlignmentMarker]", RowBox[{\(\[Sum]\+\(t = 1\)\%T\), RowBox[{"[", RowBox[{ SubscriptBox[ RowBox[{"\[LeftAngleBracket]", RowBox[{ SubsuperscriptBox["D", \(i, j\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]], "(", "t", ")"}], "\[RightAngleBracket]"}], StyleBox["free", FontSlant->"Italic"]], "-", SubscriptBox[ RowBox[{"\[LeftAngleBracket]", RowBox[{ SubsuperscriptBox["D", \(i, j\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]], "(", "t", ")"}], "\[RightAngleBracket]"}], StyleBox["clamped", FontSlant->"Italic"]]}], "]"}]}]}], TraditionalForm]}], "NumberedEquation", TextAlignment->AlignmentMarker, SpanMaxSize->Infinity, CellTags->"Eq:27"], Cell[TextData[{ "where ", StyleBox["free", FontSlant->"Italic"], " denotes averaging with respect to ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], ", and ", StyleBox["clamped", FontSlant->"Italic"], " denotes averaging with respect to ", Cell[BoxData[ FormBox[ RowBox[{ RowBox[{"Q", "(", RowBox[{ StyleBox["h", FontWeight->"Bold"], "|", StyleBox["x", FontWeight->"Bold"]}], ")"}], RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}]}], TraditionalForm]]], ", using the same notation as defined in ", ButtonBox["equation", ButtonData:>"Eq:13", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:13"], ")." }], "Text"], Cell[TextData[{ "Each of the averages in ", ButtonBox["equation", ButtonData:>"Eq:27", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:27"], ") is a linear combination of average state occupations. These are \ analogous to histograms or frequencies of occurrence. For instance, ", Cell[BoxData[ \(TraditionalForm\`\[LeftAngleBracket]\(\[Delta]\_\(h\_t, i\)\) \[Delta]\_\(x\_t, j\)\[RightAngleBracket]\)]], " averages (over ", Cell[BoxData[ FormBox[ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], TraditionalForm]]], ") a quantity that is 1 when ", Cell[BoxData[ \(TraditionalForm\`h\_t = i\)]], " and ", Cell[BoxData[ \(TraditionalForm\`x\_t = j\)]], ", and zero otherwise. This average is the answer to the question \"What \ proportion of the time is ", Cell[BoxData[ \(TraditionalForm\`\((h\_t, x\_t)\)\)]], " to be found in state ", Cell[BoxData[ \(TraditionalForm\`\((i, j)\)\)]], " when the Gibbs distribution is allowed to explore its states without \ external influence?\". There are analogous interpretations of each of the \ other averages in ", ButtonBox["equation", ButtonData:>"Eq:27", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:27"], ")." }], "Text"], Cell[TextData[{ "The ", StyleBox["free", FontSlant->"Italic"], " and ", StyleBox["clamped", FontSlant->"Italic"], " averages are explicitly" }], "Text"], Cell[BoxData[{ FormBox[ RowBox[{ SubscriptBox[\(\[LeftAngleBracket]D\_\(i, j\)\%\((\[CenterDot])\)\ \[RightAngleBracket]\), StyleBox["free", FontSlant->"Italic"]], "\[Congruent]", "\[AlignmentMarker]", RowBox[{\(\[Sum]\+\(x\_1, x\_2, \[CenterEllipsis], x\_T = 0\)\%\(N\_x - 1\)\), RowBox[{\(\[Sum]\+\(h\_1, h\_2, \[CenterEllipsis], h\_T = 0\)\%\(N\_h - 1\)\), RowBox[{ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], \(D\_\(i, j\)\%\((\[CenterDot])\)\)}]}]}]}], TraditionalForm], "\n", FormBox[ RowBox[{ SubscriptBox[\(\[LeftAngleBracket]D\_\(i, j\)\%\((\[CenterDot])\)\ \[RightAngleBracket]\), StyleBox["clamped", FontSlant->"Italic"]], "\[Congruent]", "\[AlignmentMarker]", RowBox[{ UnderscriptBox["\[Sum]", RowBox[{ StyleBox["x", FontWeight->"Bold"], "\[Element]", \(training\ set\)}]], FractionBox[ RowBox[{\(\[Sum]\+\(h\_1, h\_2, \[CenterEllipsis], h\_T = 0\)\%\(N\_h - 1\)\), RowBox[{ RowBox[{"Q", "(", RowBox[{ StyleBox["x", FontWeight->"Bold"], ",", StyleBox["h", FontWeight->"Bold"]}], ")"}], \(D\_\(i, j\)\%\((\[CenterDot])\)\)}]}], RowBox[{"Q", "(", StyleBox["x", FontWeight->"Bold"], ")"}]]}]}], TraditionalForm]}], "NumberedEquation", TextAlignment->AlignmentMarker, SpanMaxSize->Infinity, CellTags->"Ed:Change18"], Cell[TextData[{ "The gradients ", Cell[BoxData[ FormBox[ FractionBox[\(\[PartialD]G\), RowBox[{"\[PartialD]", SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]]}]], TraditionalForm]]], " and ", Cell[BoxData[ FormBox[ FractionBox[\(\[PartialD]G\), RowBox[{"\[PartialD]", SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]]}]], TraditionalForm]]], " can thus be evaluated by measuring the various state occupations that \ appear in ", ButtonBox["equation", ButtonData:>"Eq:27", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:27"], ") to yield" }], "Text", CellTags->"Ed:Change19"], Cell[BoxData[{ FormBox[ RowBox[{ FractionBox[\(\[PartialD]G\), RowBox[{"\[PartialD]", SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]]}]], "=", "\[AlignmentMarker]", RowBox[{ RowBox[{\(1\/Z\), RowBox[{\(\[Sum]\+\(\[Tau] = 1\)\%\(T - 1\)\), "\[AlignmentMarker]", RowBox[{\(\[Sum]\+\(x\_1, x\_2, \[CenterEllipsis], x\_T = 0\)\%\(N\_x - 1\)\), RowBox[{\(\[Sum]\+\(h\_1, h\_2, \[CenterEllipsis], h\_T = 0\)\%\(N\_h - 1\)\), RowBox[{ RowBox[{ SubsuperscriptBox["D", \(i, j\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]], "(", "\[Tau]", ")"}], \({\[Product]\+\(t = 1\)\%\(\[Tau] - \ 1\)S\_\(h\_t, x\_t, h\_\(t + 1\)\)\%\((+)\)}\), SubsuperscriptBox["S", \(h\_\[Tau], x\_\[Tau]\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]], \({\[Product]\+\(t = \[Tau]\)\%\(T - \ 1\)S\_\(h\_t, h\_\(t + 1\), x\_\(t + 1\)\)\%\((-)\)}\)}]}]}]}]}], "-", RowBox[{\(1\/N\), RowBox[{\(\[Sum]\+\(\[Tau] = 1\)\%\(T - 1\)\), RowBox[{ UnderscriptBox["\[Sum]", RowBox[{ StyleBox["x", FontWeight->"Bold"], "\[Element]", \(training\ set\)}]], RowBox[{ FractionBox["1", RowBox[{"Z", "(", StyleBox["x", FontWeight->"Bold"], ")"}]], RowBox[{\(\[Sum]\+\(h\_1, h\_2, \[CenterEllipsis], h\_T = 0\)\%\(N\_h - 1\)\), RowBox[{ RowBox[{ SubsuperscriptBox["D", \(i, j\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]], "(", "\[Tau]", ")"}], \({\[Product]\+\(t = 1\)\%\(\[Tau] - 1\)S\_\(h\ \_t, x\_t, h\_\(t + 1\)\)\%\((+)\)}\), SubsuperscriptBox["S", \(h\_\[Tau], x\_\[Tau]\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]], \({\[Product]\+\(t = \[Tau]\)\%\(T - 1\)S\_\ \(h\_t, h\_\(t + 1\), x\_\(t + 1\)\)\%\((-)\)}\)}]}]}]}]}]}]}]}], TraditionalForm], "\n", FormBox[ RowBox[{ FractionBox[\(\[PartialD]G\), RowBox[{"\[PartialD]", SubsuperscriptBox["\[Lambda]", \(i, j\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]]}]], "=", "\[AlignmentMarker]", RowBox[{ RowBox[{"ditto", " ", "with", " ", RowBox[{ RowBox[{ SubsuperscriptBox["D", \(i, j\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]], "(", "\[Tau]", ")"}], "\[LongRightArrow]", RowBox[{ SubsuperscriptBox["D", \(i, j\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]], "(", "\[Tau]", ")"}]}], " ", "and", " ", "T"}], "-", \(1\[LongRightArrow]T\)}]}], TraditionalForm]}], "NumberedEquation", TextAlignment->AlignmentMarker, SpanMaxSize->Infinity, CellTags->{"Ed:Change20", "Eq:29"}], Cell[TextData[{ "where ", Cell[BoxData[ \(TraditionalForm\`Z\)]], " and ", Cell[BoxData[ FormBox[ RowBox[{"Z", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], " are defined in ", ButtonBox["equation", ButtonData:>"Eq:25", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:25"], "). The ", StyleBox["free", FontSlant->"Italic"], " average is self explanatory. The ", StyleBox["clamped", FontSlant->"Italic"], " average depends on a sum over the training set, from which ", Cell[BoxData[ FormBox[ StyleBox["x", FontWeight->"Bold"], TraditionalForm]]], " should be selected at random in order to emulate the probability ", Cell[BoxData[ FormBox[ RowBox[{"P", "(", StyleBox["x", FontWeight->"Bold"], ")"}], TraditionalForm]]], "." }], "Text"], Cell[TextData[{ "The value of ", Cell[BoxData[ \(TraditionalForm\`G\)]], " itself can also be calculated up to an additive constant. Thus" }], "Text"], Cell[BoxData[ FormBox[ RowBox[{"G", "=", RowBox[{\(-log[Z]\), "+", RowBox[{\(1\/N\), RowBox[{ UnderscriptBox["\[Sum]", RowBox[{ StyleBox["x", FontWeight->"Bold"], "\[Element]", \(training\ set\)}]], RowBox[{"log", "[", RowBox[{"Z", "(", StyleBox["x", FontWeight->"Bold"], ")"}], "]"}]}]}], "+", "constant"}]}], TraditionalForm]], "NumberedEquation", CellTags->"Eq:30"], Cell[TextData[{ ButtonBox["Equation", ButtonData:>"Eq:30", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:30"], ") can be used to verify that a gradient ascent algorithm based on ", ButtonBox["equation", ButtonData:>"Eq:29", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:29"], ") is indeed increasing the value of ", Cell[BoxData[ \(TraditionalForm\`G\)]], Cell[BoxData[ FormBox[ ButtonBox["FOOTNOTE", ButtonData:>"Footnote:15", Active->True, ButtonStyle->"Hyperlink"], TextForm]]], "." }], "Text"] }, Closed]] }, Closed]], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], " Algorithmic details" }], "Section"], Cell["\<\ In this section I shall explain how a general purpose hidden Markov model \ training algorithm can be written, which allows both Gibbs machine and \ Baum-Welch training algorithms to be run.\ \>", "Text"], Cell[CellGroupData[{ Cell[TextData[{ CounterBox["Section"], ".", CounterBox["Subsection"], " ", "Clamped matrix products" }], "Subsection"], Cell[TextData[{ "The hidden Markov model training algorithms of interest all reduce to \ evaluating products of matrices. For instance, Baum-Welch re-estimation \ involves the recursive computation of so-called ", Cell[BoxData[ \(TraditionalForm\`\[Alpha]\)]], " and ", Cell[BoxData[ \(TraditionalForm\`\[Beta]\)]], " matrices [", ButtonBox["2", ButtonData:>"Ref:Baum1972", ButtonStyle->"Hyperlink"], "]. The Gibbs machine training algorithm reduces to ", ButtonBox["equation", ButtonData:>"Eq:29", ButtonStyle->"Hyperlink"], " (", CounterBox["NumberedEquation", "Eq:29"], "), where the \"clamped\" case is essentially the same as Baum-Welch, and \ the \"free\" case involves a complete sum over histories. It would be prudent \ to write an algorithm that allows products of ", Cell[BoxData[ \(TraditionalForm\`S\^\((\[PlusMinus])\)\)]], " matrices to be evaluated whilst clamping arbitrary subsets of indices \ that would normally be summed over. Everything that I require may be \ evaluated as one or another special case of this general algorithm." }], "Text"], Cell["The matrix products that I need to evaluate are", "Text"], Cell[BoxData[{ FormBox[ RowBox[{\(\(M\_\(h\_t, x\_t\)\%\((+)\)\) \((t)\)\), "=", "\[AlignmentMarker]", RowBox[{ RowBox[{ SubsuperscriptBox["C", \(h\_t, x\_t\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]], "(", "t", ")"}], SubsuperscriptBox["S", \(h\_t, x\_t\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]], RowBox[{\(\[Sum]\+\(h\_t = 0\)\%\(N\_h - 1\)\), RowBox[{\(\[Sum]\+\(x\_t = 0\)\%\(N\_x - 1\)\), RowBox[{ RowBox[{ SubsuperscriptBox["C", \(h\_\(t - 1\), h\_t\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]], "(", \(t - 1\), ")"}], SubsuperscriptBox["S", \(h\_\(t - 1\), h\_t\), RowBox[{"(", StyleBox["hh", FontSlant->"Italic"], ")"}]], \(\(M\_\(h\_\(t - 1\), x\_\(t - 1\)\)\%\((+)\)\)( t - 1)\)}]}]}]}]}], TraditionalForm], "\n", FormBox[ RowBox[{\(\(M\_\(h\_t, x\_t\)\%\((-)\)\) \((t)\)\), "=", "\[AlignmentMarker]", RowBox[{ RowBox[{ SubsuperscriptBox["C", \(h\_t, x\_t\), RowBox[{"(", StyleBox["hx", FontSlant->"Italic"], ")"}]], "(", "t", ")"}],