.. _trep_doptimizer: :class:`DOptimizer` - Discrete Trajectory Optimization ====================================================== .. currentmodule:: trep.discopt .. contents:: :local: Discrete trajectory optimization is performed with :class:`DOptimizer` objects. The optimizer finds a trajectory for a :class:`DSystem` that minimizes a :class:`DCost`. The optimizer should work for arbitrary systems, not just ones based on variational integrators. You would need to define a compatible :class:`DSystem` class that supports the right functionality. .. admonition:: Examples ``pend-on-cart-optimization.py`` DOptimizer Objects ------------------ .. class:: DOptimizer(dsys, cost, first_method_iterations=10, monitor=None) You should create a new optimizer for each new system or cost, but for a given combination you can optimize as many different trajectories as you want. The optimization is designed to mostly be used through the :meth:`optimize` method. You can use the :class:`DOptimizerMonitor` class to monitor progress during optimizations. If *monitor* is :data:`None`, the optimizer will use :class:`DOptimizerDefaultMonitor` which prints useful information to the console. .. attribute:: DOptimizer.dsys The :class:`DSystem` that is optimized by this instance. .. attribute:: DOptimizer.cost = cost The :class:`DCost` that is optimized by this instance. .. attribute:: DOptimizer.optimize_ic This is a Boolean indicating whether or not the optimizer should optimize the initial condition during an optimization. Defaults to :data:`False`. .. warning:: This is broken for constrained systems. .. attribute:: DOptimizer.monitor The :class:`DOptimizerMonitor` that this optimization reports progress to. The default is a new instance of :class:`DOptimizerDefaultMonitor`. .. attribute:: DOptimizer.Qproj DOptimizer.Rproj These are the weights for an LQR problem used to generate a linear feedback controller for each trajectory. Each should be a function of k that returns an appropriately sized 2D :class:`ndarray`. The default values are identity matrices. .. attribute:: DOptimizer.descent_tolerance A trajectory is considered a local minimizer if the norm of the cost derivative is less than this. The default value is 1e-6. Cost Functions ^^^^^^^^^^^^^^ .. method:: DOptimizer.calc_cost(X, U) Calculate the cost of a trajectory *X,U*. .. method:: DOptimizer.calc_dcost(X, U, dX, dU) Calculate the derivative of the cost function evaluated at *X,U* in the direction of a tangent trajectory *dX,dU*\ . It is important that *dX,dU* be an actual tangent trajectory of the system at *X,U* to get the correct cost. See :meth:`check_ddcost` for an example where this is important. .. method:: DOptimizer.calc_ddcost(X, U, dX, dU, Q, R, S) Calculate the second derivative of the cost function evaluated at *X,U* in the direction of a tangent trajectory *dX,dU*\ . The second order model parameters must be specified in *Q,R,S*\ . These can be obtained through :meth:`calc_newton_model` or by :meth:`calc_descent_direction` when *method*\ ="newton". It is important that *dX,dU* be an actual tangent trajectory of the system at *X,U* to get the correct cost. See :meth:`check_ddcost` for an example where this is important. Descent Directions ^^^^^^^^^^^^^^^^^^ .. method:: DOptimizer.calc_steepest_model() Calculate a quadratic model to find a steepest descent direction: .. math:: Q = \mathcal{I} \quad R = \mathcal{I} \quad S = 0 .. method:: DOptimizer.calc_quasi_model(X, U) Calculate a quadratic model to find a quasi-newton descent direction. This uses the second derivative of the un-projected cost function. .. math:: Q = \derivII[h]{x}{x} \quad R = \derivII[h]{u}{u} \quad S = \derivII[h]{x}{u} This method does not use the second derivative of the system dynamics, so it tends to be as fast :meth:`calc_steepest_model`, but usually converges much faster. .. method:: DOptimizer.calc_newton_model(X, U, A, B, K) Calculate a quadratic model to find a newton descent direction. This is the true second derivative of the projected cost function: .. math:: \begin{align*} Q(k_f) &= D^2m(x(k_f)) \\ Q(k) &= \derivII[\ell]{x}{x}(k) + z^T(k+1) \derivII[f]{x}{x}(k) \\ S(k) &= \derivII[\ell]{x}{u}(k) + z^T(k+1) \derivII[f]{x}{u}(k) \\ R(k) &= \derivII[\ell]{u}{u}(k) + z^T(k+1) \derivII[f]{u}{u}(k) \end{align*} where: .. math:: \begin{align*} z(k_f) &= Dm^T(x(k_f)) \\ z(k) &= \deriv[\ell]{x}^T(k) - \mathcal{K}^T(k) \deriv[\ell]{u}^T(k) + \left[ \deriv[f]{x}^T(k) - \mathcal{K}^T(k) \deriv[f]{u}^T(i) \right] z(k+1) \end{align*} This method depends on the second derivative of the system's dynamics, so it can be significantly slower than other step methods. However, it converges extremely quickly near the minimizer. .. method:: DOptimizer.calc_descent_direction(X, U, method='steepest') Calculate the descent direction from the trajectory *X,U* using the specified method. Valid methods are: * "steepest" - Use :meth:`calc_steepest_model` * "quasi" - Use :meth:`calc_quasi_model` * "newton" - Use :meth:`calc_newton_model` The method returns the named tuple ``(Kproj, dX, dU, Q, R, S)``. Armijo Line Search ^^^^^^^^^^^^^^^^^^ .. attribute:: DOptimizer.armijo_beta DOptimizer.armijo_alpha DOptimizer.armijo_max_iterations Parameters for the Armijo line search performed at each step along the calculated descent direction. *armijo_beta* should be between 0 and 1 (not inclusive). The default value is 0.7. *armijo_alpha* should be between 0 (inclusive) and 1 (not inclusive). The default value is 0.00001. *armijo_max_iterations* should be a positive integer. If the line search cannot satisfy the sufficient decrease criteria after this number of iterations, a :exc:`trep.ConvergenceError` is raised. The default value is 30. .. method:: DOptimizer.armijo_simulate(bX, bU, Kproj) This is a sub-function for armijo search. It projects the trajectory bX,bU to a real trajectory like DSystem.project, but it also returns a partial trajectory if the simulation fails. It is not intended to be used directly. .. method:: DOptimizer.armijo_search(X, U, Kproj, dX, dU) Perform an Armijo line search from the trajectory X,U along the tangent trajectory dX, dU. Returns the named tuple ``(nX, nU, nCost)`` or raises :exc:`trep.ConvergenceError` if the search doesn't terminate before taking the maximum number of iterations. This method is used by :meth:`step` once a descent direction has been found. Optimizing a Trajectory ^^^^^^^^^^^^^^^^^^^^^^^ .. method:: DOptimizer.step(iteration, X, U, method='steepest') Perform an optimization step using a particular method. This finds a new trajectory nX, nU that has a lower cost than the trajectory X,U. Valid methods are defined in DOptimizer.calc_descent_direction(). If the specified method fails to find an acceptable descent direction, :meth:`step` will try again with the method returned by :meth:`select_fallback_method`. *iteration* is an integer that is used by :meth:`select_fallback_method` and passed to the :class:`DOptimizerMonitor` when reporting the current step progress. Returns the named tuple ``(done, nX, nU, dcost0, cost1)`` where: * *done* is a Boolean that is :data:`True` if the trajectory *X,U* cannot be improved (i.e, *X,U* is a local minimizer of the cost). * *nX,nU* are the improved trajectory * *dcost0* is the derivative of the cost at *X,U*. * *cost1* is the cost of the improved trajectory. .. method:: DOptimizer.optimize(X, U, max_steps=50) Iteratively optimize the trajectory *X,U* until a local minimizer is found or *max_steps* are taken. The descent direction method used at each step is determined by :meth:`select_method`. Returns the named tuple ``(converged, X, U)`` where: * *converged* is a Boolean indicating if the optimization finished on a local minimizer. * *X,U* is the improved trajectory. .. attribute:: DOptimizer.first_method_iterations Number of steps to take using :attr:`first_method` before switching to :attr:`second_method` for the remaining steps. See :meth:`select_method` for more information on controlling the step method. *Default: 10* .. attribute:: DOptimizer.first_method Descent method to use for the first iterations of the optimization. *Default: "quasi"* .. attribute:: DOptimizer.second_method Descent method to use for the optimzation after :attr:`first_method_iterations` iterations have been taken. *Default: "netwon"* .. method:: DOptimizer.select_method(iteration) Select a descent direction method for the specified iteration. This is called by :meth:`optimize` to choose a descent direction method for each step. The default implementation takes a pre-determined number (:attr:`lower_order_iterations`) of "quasi" steps and then switches to the "newton" method. You can customize the method selection by inheriting :class:`DOptimizer` and overriding this method. .. method:: DOptimizer.select_fallback_method(iteration, current_method) When :meth:`step` finds a bad descent direction (e.g, positive cost derivative), this method is called to figure out what descent direction it should try next. Debugging Tools ^^^^^^^^^^^^^^^ .. method:: DOptimizer.descent_plot(X, U, method='steepest', points=40, legend=True) Create a descent direction plot at X,U for the specified method. This is a useful plot for examining the line search portion of an optimization step. It plots several values along the descent direction line. All values are plotted against the line search parameter :math:`z \in \mathbb{R}`: * The true cost :math:`g(\xi + z\delta\xi) - g(\xi)` * The modeled cost :math:`Dg(\xi)\op\delta\xi z + \half q\op(\delta\xi, \delta\xi)z^2` * The sufficient decrease line: :math:`g(\xi + z \delta\xi) < g(\xi) + \alpha z Dg(\xi)\op\delta\xi` * The armijo evaluation points: :math:`z = \beta^m` This is an example plot for a steepest descent plot during a particular optimization. This plot shows that the true cost increases much faster than the model predicts. As a result, 6 Armijo steps are required before the sufficient decrease condition is satisfied. .. image:: descent-direction-plot-example.png Example usage:: >>> import matplotlib.pyplot as pyplot >>> optimizer.descent_plot(X, U, method='steepest', legend=True) >>> pyplot.show() .. method:: check_dcost(X, U, method='steepest', delta=1e-6, tolerance=1e-5) Check the calculated derivative of the cost function at X,U with a numeric approximation determined from the original cost function. .. method:: DOptimizer.check_ddcost(X, U, method='steepest', delta=1e-6, tolerance=1e-5) Check the second derivative of the cost function at X,U with a numeric approximation determined from the first derivative. DOptimizerMonitor Objects ------------------------- :class:`DOptimizer` objects report optimization progress to their :attr:`monitor` object. The base implementation does nothing. The default monitor :class:`DOptimizerDefaultMonitor` mainly prints reports to the console. You can define your own monitor to gather more detailed information like saving each intermediate trajectory. Note that if you do want to save any values, you should save copies. The optimizer might reuse the same variables in each step to optimize memory usage. .. class:: DOptimizerMonitor This is the base class for Optimizer Monitors. It does absolutely nothing, so you can use this as your monitor if you want completely silent operation. .. method:: DOptimizerMonitor.optimize_begin(X, U) Called when DOptimizer.optimize() is called with the initial trajectory. .. method:: DOptimizerMonitor.optimize_end(converged, X, U, cost) Called before DOptimizer.optimize() returns with the results of the optimization. .. method:: DOptimizerMonitor.step_begin(iteration) Called at the start of each DOptimize.step(). Note that step calls itself with the new method when one method fails, so this might be called multiple times with the same iteration. All calls will be related to the same iteration until step_termination or step_completed are called. .. method:: DOptimizerMonitor.step_info(method, cost, dcost, X, U, dX, dU, Kproj) Called after a descent direction has been calculated. .. method:: DOptimizerMonitor.step_method_failure(method, cost, dcost, fallback_method) Called when a descent method results in a positive cost derivative. .. method:: DOptimizerMonitor.step_termination(cost, dcost) Called if dcost satisfies the descent tolerance, indicating that the current trajectory is a local minimizer. .. method:: DOptimizerMonitor.step_completed(method, cost, nX, nU) Called at the end of an optimization step with information about the new trajectory. .. method:: DOptimizerMonitor.armijo_simulation_failure(armijo_iteration, nX, nU, bX, bU) Called when a simulation fails (usually an instability) during the evaluation of the cost in an armijo step. The Armijo search continues after this. .. method:: DOptimizerMonitor.armijo_search_failure(X, U, dX, dU, cost0, dcost0, Kproj) Called when the Armijo search reaches the maximum number of iterations without satisfying the sufficient decrease criteria. The optimization cannot proceed after this. .. method:: DOptimizerMonitor.armijo_evaluation(armijo_iteration, nX, nU, bX, bU, cost, max_cost) Called after each Armijo evaluation. The semi-trajectory bX,bU was successfully projected into the new trajectory nX,nU and its cost was measured. The search will continue if the cost is greater than the maximum cost. DOptimizerDefaultMonitor Objects ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This is the default monitor for :class:`DOptimizer`. It prints out information to the console and records the cost and cost derivative history. .. class:: DOptimizerDefaultMonitor This is the default DOptimizer Monitor. It mainly prints status updates to stdout and records the cost and dcost history so you can create convergence plots. .. attribute:: DOptimizerDefaultMonitor.cost_history DOptimizerDefaultMonitor.dcost_history Dictionaries mapping the iteration number to the cost and cost derivative, respectively. These are reset at the beginning of each optimization.