//$$ newmatnl.h           definition file for non-linear optimisation

// Copyright (C) 1993,4,5: R B Davies

#ifndef NEWMATNL_LIB
#define NEWMATNL_LIB 0

#include "newmat.h"

#ifdef use_namespace
namespace NEWMAT {
#endif



/*

This is the beginning of a series of classes for non-linear optimisation.

At present there are two classes. FindMaximum2 is the basic optimisation
strategy when one is doing an optimisation where one has first
derivatives and estimates of the second derivatives. Class
NonLinearLeastSquares is derived from FindMaximum2. It provides the
functions that calculate function values and derivatives.

A third class is now added. This is for doing maximum-likelihood when
you have first derivatives and something like the Fisher Information
matrix (eg the variance-covariance matrix of the first derivatives or
minus the matrix of second derivatives - this matrix is assumed to be
positive definite).



   class FindMaximum2

Suppose T is the ColumnVector of parameters, F(T) the function we want
to maximise, D(T) the ColumnVector of derivatives of F with respect to
T, and S(T) the matrix of second derivatives.

Then the basic iteration is: given a value of T, update it to

          T - S.i() * D

where .i() denotes inverse.

If F were quadratic this would give exactly the right answer (except
that it might find a minimum rather than a maximum). Since F is not
usually quadratic, the simple procedure would be to recalculate S and D
with the new value of T and keep iterating until the process converges.
This is the Newton-Raphson method.

In practice, this method may not converge. FindMaximum2 considers an
iteration of the form

          T - x * S.i() * D

where x is a number. It tries x = 1 and uses the values of F and its
slope with respect to x at x = 0 and x = 1 to fit a cubic in x. It then
chooses x to maximise the resulting cubic. This gives our new value of
T. The program checks that the value of F is getting better and carries
out a variety of strategies if it is not.

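To make the line search concrete, here is the standard cubic
construction (a sketch only - the code in newmatnl.cpp adds its own
safeguards, so do not read this as a transcription). Let f0 and g0 be
the value and slope of F with respect to x at x = 0, and f1 and g1 the
value and slope at x = 1. The unique cubic

          c(x) = f0 + g0 * x + a * x^2 + b * x^3

matching these four numbers has

          a = 3 * (f1 - f0) - 2 * g0 - g1
          b = g0 + g1 - 2 * (f1 - f0)

and its stationary points satisfy g0 + 2 * a * x + 3 * b * x^2 = 0.
When b is nonzero and the square root is real, the maximising root is

          x = ( -a - sqrt(a * a - 3 * b * g0) ) / (3 * b)

this being the root at which c''(x) = 2 * a + 6 * b * x is negative.
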
The program also has a second strategy. If the successive values of T
seem to be lying along a curve - eg we are following along a curved
ridge - the program will try to fit this ridge and project along it.
This does not work at present and is commented out.

FindMaximum2 has three virtual functions which need to be overridden by
a derived class.

   void Value(const ColumnVector& T, bool wg, Real& f, bool& oorg);

T is the column vector of parameters. The function returns the value of
the function in f, but may instead set oorg to true if the parameter
values are not valid. If wg is true it may also calculate and store the
second derivative information.

   bool NextPoint(ColumnVector& H, Real& d);

Using the value of T provided in the previous call of Value, find the
Newton-Raphson adjustment to T, that is - S.i() * D. Also return

   d = D.t() * S.i() * D.

NextPoint should return true if it considers that the process has
converged (d very small) and false otherwise. The previous call of Value
will have set wg to true, so that S will be available.

   Real LastDerivative(const ColumnVector& H);

Return the scalar product of H and the vector of derivatives at the last
value of T.

The function Fit is the function that runs the iteration.

   void Fit(ColumnVector&, int);

The arguments are the trial parameter values as a ColumnVector and the
maximum number of iterations. The program throws a DataException if the
initial parameters are not valid and a ConvergenceException if the
process fails to converge.


   class NonLinearLeastSquares

This class is derived from FindMaximum2 and carries out a non-linear
least-squares fit. It uses a QR decomposition to carry out the
operations required by FindMaximum2.

A prototype class R1_Col_I_D is provided. The user needs to derive a
class from this which includes functions giving the predicted value of
each observation and its derivatives. An object from this class has to
be provided to class NonLinearLeastSquares.

Suppose we observe n normal random variables with the same unknown
variance and such that the i-th one has expected value given by f(i,P),
where P is a column vector of unknown parameters and f is a known
function. We wish to estimate P.

First derive a class from R1_Col_I_D and override Real operator()(int i)
to give the value of the function f in terms of i and the ColumnVector
para defined in class R1_Col_I_D. Also override ReturnMatrix
Derivatives() to give the derivatives of f at para and the value of i
used in the preceding call to operator(). Return the result as a
RowVector. Construct an object from this class. Suppose in what follows
it is called pred.

Now construct a NonLinearLeastSquares object accessing pred and
optionally an iteration limit and an accuracy criterion.

   NonLinearLeastSquares NLLS(pred, 1000, 0.0001);

The accuracy criterion should be somewhat less than one and 0.0001 is
about the smallest sensible value.

Define a ColumnVector P containing a guess at the values of the unknown
parameters, and a ColumnVector Y containing the observed data. Call

   NLLS.Fit(Y,P);

If the process converges, P will contain the estimates of the unknown
parameters. If it does not converge, an exception will be thrown.

The following member functions can be called after you have done a fit.

   Real ResidualVariance() const;

The estimate of the variance of the observations.

   void GetResiduals(ColumnVector& Z) const;

The residuals of the individual observations.

   void GetStandardErrors(ColumnVector&);

The standard errors of the parameter estimates.

   void GetCorrelations(SymmetricMatrix&);

The correlations of the parameter estimates.

   void GetHatDiagonal(DiagonalMatrix&) const;

Forms a diagonal matrix of values between 0 and 1. If the i-th value is
larger than, say, 0.2, then the i-th data value could have an undue
influence on your estimates.

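As a worked illustration (a sketch under assumed names - the class
Exponential, its members x and last_i, and the model itself are invented
for this example, and exp comes from the standard maths library),
suppose the i-th observation has expected value P(1) * exp(P(2) * x(i))
for known covariate values x(i):

   class Exponential : public R1_Col_I_D
   {
      ColumnVector x;                  // known covariate values
      int last_i;                      // i from the last call of operator()
   public:
      Exponential(const ColumnVector& xv) : x(xv), last_i(1) {}
      Real operator()(int i)           // predicted value of observation i
         { last_i = i; return para(1) * exp(para(2) * x(i)); }
      ReturnMatrix Derivatives()       // row of df/dP(j) at para, last_i
      {
         RowVector D(2);
         Real e = exp(para(2) * x(last_i));
         D(1) = e;                     // derivative with respect to P(1)
         D(2) = para(1) * x(last_i) * e;
                                       // derivative with respect to P(2)
         D.Release(); return D.ForReturn();
      }
   };

Then, with xv holding the covariates and Y the observations,

   Exponential pred(xv);
   NonLinearLeastSquares NLLS(pred, 1000, 0.0001);
   ColumnVector P(2);
   P(1) = 1.0; P(2) = -0.5;            // starting values - a guess
   NLLS.Fit(Y,P);                      // on return P holds the estimates
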
*/

class FindMaximum2
{
   virtual void Value(const ColumnVector&, bool, Real&, bool&) = 0;
   virtual bool NextPoint(ColumnVector&, Real&) = 0;
   virtual Real LastDerivative(const ColumnVector&) = 0;
public:
   void Fit(ColumnVector&, int);
   virtual ~FindMaximum2() {}          // to keep gnu happy
};

class R1_Col_I_D
{
   // The prototype for a Real function of a ColumnVector and an
   // integer.
   // You need to derive your function from this one and put in your
   // function for operator() and Derivatives() at least.
   // You may also want to set up a constructor to enter in additional
   // parameter values (that will not vary during the solve).

protected:
   ColumnVector para;                  // current parameter values

public:
   virtual bool IsValid() { return true; }
                                       // is the current parameter value OK
   virtual Real operator()(int i) = 0; // i-th function value at current para
   virtual void Set(const ColumnVector& X) { para = X; }
                                       // set current para
   bool IsValid(const ColumnVector& X)
      { Set(X); return IsValid(); }
                                       // set para, check OK
   Real operator()(int i, const ColumnVector& X)
      { Set(X); return operator()(i); }
                                       // set para, return value
   virtual ReturnMatrix Derivatives() = 0;
                                       // return derivatives as RowVector
   virtual ~R1_Col_I_D() {}            // to keep gnu happy
};


class NonLinearLeastSquares : public FindMaximum2
{
   // these replace the corresponding functions in FindMaximum2
   void Value(const ColumnVector&, bool, Real&, bool&);
   bool NextPoint(ColumnVector&, Real&);
   Real LastDerivative(const ColumnVector&);

   Matrix X;                           // the things we need to do the
   ColumnVector Y;                     // QR triangularisation
   UpperTriangularMatrix U;            // see the write-up in newmata.txt
   ColumnVector M;
   Real errorvar, criterion;
   int n_obs, n_param;
   const ColumnVector* DataPointer;
   RowVector Derivs;
   SymmetricMatrix Covariance;
   DiagonalMatrix SE;
   R1_Col_I_D& Pred;                   // reference to predictor object
   int Lim;                            // maximum number of iterations

public:
   NonLinearLeastSquares(R1_Col_I_D& pred, int lim=1000, Real crit=0.0001)
      : criterion(crit), Pred(pred), Lim(lim) {}
   void Fit(const ColumnVector&, ColumnVector&);
   Real ResidualVariance() const { return errorvar; }
   void GetResiduals(ColumnVector& Z) const { Z = Y; }
   void GetStandardErrors(ColumnVector&);
   void GetCorrelations(SymmetricMatrix&);
   void GetHatDiagonal(DiagonalMatrix&) const;

private:
   void MakeCovariance();
};


// The next class is the prototype class for calculating the
// log-likelihood.
// I assume first derivatives are available and something like the
// Fisher Information or variance/covariance matrix of the first
// derivatives or minus the matrix of second derivatives is
// available. This matrix must be positive definite.

class LL_D_FI
{
protected:
   ColumnVector para;                  // current parameter values
   bool wg;                            // true if FI matrix wanted

public:
   virtual void Set(const ColumnVector& X) { para = X; }
                                       // set parameter values
   virtual void WG(bool wgx) { wg = wgx; }
                                       // set wg
   virtual bool IsValid() { return true; }
                                       // return true if para is OK
   bool IsValid(const ColumnVector& X, bool wgx=true)
      { Set(X); WG(wgx); return IsValid(); }

   virtual Real LogLikelihood() = 0;   // return the log-likelihood
   Real LogLikelihood(const ColumnVector& X, bool wgx=true)
      { Set(X); WG(wgx); return LogLikelihood(); }

   virtual ReturnMatrix Derivatives() = 0;
                                       // column vector of derivatives
   virtual ReturnMatrix FI() = 0;      // Fisher Information matrix
   virtual ~LL_D_FI() {}               // to keep gnu happy
};

// This is the class for doing the maximum likelihood estimation

class MLE_D_FI : public FindMaximum2
{
   // these replace the corresponding functions in FindMaximum2
   void Value(const ColumnVector&, bool, Real&, bool&);
   bool NextPoint(ColumnVector&, Real&);
   Real LastDerivative(const ColumnVector&);

   // the things we need for the analysis
   LL_D_FI& LL;                        // reference to log-likelihood
   int Lim;                            // maximum number of iterations
   Real Criterion;                     // convergence criterion
   ColumnVector Derivs;                // for the derivatives
   LowerTriangularMatrix LT;           // Cholesky decomposition of FI
   SymmetricMatrix Covariance;
   DiagonalMatrix SE;

public:
   MLE_D_FI(LL_D_FI& ll, int lim=1000, Real criterion=0.0001)
      : LL(ll), Lim(lim), Criterion(criterion) {}
   void Fit(ColumnVector& Parameters);
   void GetStandardErrors(ColumnVector&);
   void GetCorrelations(SymmetricMatrix&);

private:
   void MakeCovariance();
};

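// Example (an illustrative sketch, not part of the library): fitting a
// single Poisson rate lambda = para(1) to a vector of counts Y by
// maximum likelihood. The class name PoissonLL and its member Y are
// invented for this example; log is from the standard maths library.
// Dropping constants, log L = Sum(Y) * log(lambda) - n * lambda, the
// score is Sum(Y) / lambda - n, and the Fisher Information is
// n / lambda, which is positive definite for lambda > 0. This cheap
// example ignores the wg flag and always makes FI() available.
//
//    class PoissonLL : public LL_D_FI
//    {
//       ColumnVector Y;                        // observed counts
//    public:
//       PoissonLL(const ColumnVector& y) : Y(y) {}
//       bool IsValid() { return para(1) > 0.0; }
//       Real LogLikelihood()                   // log L, constants dropped
//          { return Y.Sum() * log(para(1)) - Y.Nrows() * para(1); }
//       ReturnMatrix Derivatives()             // the score as a 1-vector
//       {
//          ColumnVector D(1);
//          D(1) = Y.Sum() / para(1) - Y.Nrows();
//          D.Release(); return D.ForReturn();
//       }
//       ReturnMatrix FI()                      // expected information
//       {
//          SymmetricMatrix F(1);
//          F(1,1) = Y.Nrows() / para(1);
//          F.Release(); return F.ForReturn();
//       }
//    };
//
//    PoissonLL ll(Y);
//    MLE_D_FI mle(ll, 100, 0.0001);
//    ColumnVector Theta(1);
//    Theta(1) = Y.Sum() / Y.Nrows();           // start at the sample mean
//    mle.Fit(Theta);                           // Theta(1) is the MLE
//
// For this toy model the sample mean is already the maximum-likelihood
// estimate, so the iteration converges immediately.
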

#ifdef use_namespace
}
#endif



#endif

// body file: newmatnl.cpp