ETUCT Class Reference

#include <ETUCT.hh>

Inherits Planner.

Classes

struct  state_info
struct  state_samples

Public Types

typedef const std::vector< float > * state_t

Public Member Functions

std::vector< float > discretizeState (const std::vector< float > &s)
 ETUCT (int numactions, float gamma, float rrange, float lambda, int MAX_ITER, float MAX_TIME, int MAX_DEPTH, int modelType, const std::vector< float > &featmax, const std::vector< float > &featmin, const std::vector< int > &statesPerDim, bool trackActual, int history, Random rng=Random())
 ETUCT (const ETUCT &)
void fillInState (std::vector< float > s, int depth)
virtual int getBestAction (const std::vector< float > &s)
void initStates ()
void logValues (ofstream *of, int xmin, int xmax, int ymin, int ymax)
virtual void planOnNewModel ()
virtual void setFirst ()
virtual void setModel (MDPModel *model)
virtual void setSeeding (bool seed)
virtual bool updateModelWithExperience (const std::vector< float > &last, int act, const std::vector< float > &curr, float reward, bool term)
virtual ~ETUCT ()

Public Attributes

bool ACTDEBUG
bool HISTORYDEBUG
MDPModel * model
bool MODELDEBUG
bool PLANNERDEBUG
bool REALSTATEDEBUG
bool UCTDEBUG

Protected Member Functions

std::vector< float > addVec (const std::vector< float > &a, const std::vector< float > &b)
void calculateReachableStates ()
state_t canonicalize (const std::vector< float > &s)
void canonNextStates (StateActionInfo *modelInfo)
void createPolicy ()
void deleteInfo (state_info *info)
double getSeconds ()
void initNewState (state_t s)
void initStateInfo (state_t s, state_info *info)
void printStates ()
void removeUnreachableStates ()
void resetUCTCounts ()
virtual void savePolicy (const char *filename)
int selectUCTAction (state_info *info)
std::vector< float > simulateNextState (const std::vector< float > &actualState, state_t discState, state_info *info, const std::deque< float > &searchHistory, int action, float *reward, bool *term)
std::vector< float > subVec (const std::vector< float > &a, const std::vector< float > &b)
float uctSearch (const std::vector< float > &actualS, state_t state, int depth, std::deque< float > history)
void updateStateActionFromModel (state_t s, int a, state_info *info)
void updateStateActionHistoryFromModel (const std::vector< float > modState, int a, StateActionInfo *newModel)

Private Attributes

std::vector< float > featmax
std::vector< float > featmin
const float gamma
const int HISTORY_FL_SIZE
const int HISTORY_SIZE
const float lambda
int lastUpdate
const int MAX_DEPTH
const int MAX_ITER
const float MAX_TIME
const int modelType
int nactions
int nstates
const int numactions
double planTime
int prevact
state_info * previnfo
state_t prevstate
const float rrange
std::deque< float > saHistory
bool seedMode
std::map< state_t, state_info > statedata
std::set< std::vector< float > > statespace
const std::vector< int > & statesPerDim
bool timingType
const bool trackActual

Detailed Description

This class defines a modified version of UCT, which plans on a model using Monte Carlo rollouts. Unlike the original UCT, it does not separate values by tree depth, and it incorporates eligibility traces.

Definition at line 25 of file ETUCT.hh.


Member Typedef Documentation

typedef const std::vector<float>* ETUCT::state_t

The implementation maps all sensations to a set of canonical pointers, which serve as the internal representation of environment state.

Definition at line 93 of file ETUCT.hh.


Constructor & Destructor Documentation

ETUCT::ETUCT ( int  numactions,
float  gamma,
float  rrange,
float  lambda,
int  MAX_ITER,
float  MAX_TIME,
int  MAX_DEPTH,
int  modelType,
const std::vector< float > &  featmax,
const std::vector< float > &  featmin,
const std::vector< int > &  statesPerDim,
bool  trackActual,
int  history,
Random  rng = Random() 
)

Standard constructor

Parameters:
numactions    number of actions in the domain
gamma         discount factor
rrange        range of one-step rewards in the domain
lambda        for use with eligibility traces
MAX_ITER      maximum number of Monte Carlo rollouts to perform
MAX_TIME      maximum amount of time to run Monte Carlo rollouts
MAX_DEPTH     maximum depth to perform each rollout to
modelType     specifies the model type
featmax       maximum value of each feature
featmin       minimum value of each feature
statesPerDim  number of values to discretize each feature into
trackActual   track actual real-valued states (or just discrete states)
history       number of previous actions to use for delayed domains
rng           random number generator

Definition at line 15 of file ETUCT.cc.

ETUCT::ETUCT ( const ETUCT & )

Unimplemented copy constructor: internal state cannot be simply copied.

ETUCT::~ETUCT ( ) [virtual]

Definition at line 75 of file ETUCT.cc.


Member Function Documentation

std::vector< float > ETUCT::addVec ( const std::vector< float > &  a,
const std::vector< float > &  b 
) [protected]

Add two vectors together.

Definition at line 889 of file ETUCT.cc.

void ETUCT::calculateReachableStates ( ) [protected]

Calculate which states are reachable from states the agent has actually visited.

ETUCT::state_t ETUCT::canonicalize ( const std::vector< float > &  s) [protected]

Produces a canonical representation of the given sensation.

Parameters:
s  The current sensation from the environment.
Returns:
A pointer to an equivalent state in statespace.

Definition at line 443 of file ETUCT.cc.
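
The canonical-pointer idea can be illustrated with a minimal sketch. It assumes only what this page states: statespace is a std::set of float vectors and state_t is a pointer to one of its elements. The free function canonicalizeSketch is hypothetical, and the actual member may do more (for example, discretize the sensation or initialize bookkeeping for newly seen states).

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Same typedef as ETUCT::state_t in this reference.
typedef const std::vector<float>* state_t;

// Hypothetical free-function version; the real method is a protected member
// that works on the class's own statespace set.
state_t canonicalizeSketch(std::set<std::vector<float> >& statespace,
                           const std::vector<float>& s) {
  // insert() leaves the set unchanged when an equal vector is already stored;
  // either way it returns an iterator to the stored element.
  std::pair<std::set<std::vector<float> >::iterator, bool> result =
      statespace.insert(s);
  state_t canonical = &*result.first;
  assert(*canonical == s);
  return canonical;
}
```

Because std::set never relocates its elements on insertion, the returned pointer stays valid while the element remains in the set, so equal sensations always map to the same pointer and states can be compared by address.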

void ETUCT::canonNextStates ( StateActionInfo *  modelInfo) [protected]

Canonicalize all the next states predicted by this model.

Definition at line 296 of file ETUCT.cc.

void ETUCT::createPolicy ( ) [protected]

Compute a policy from a model.

void ETUCT::deleteInfo ( state_info *  info) [protected]

Delete a state_info struct

Definition at line 529 of file ETUCT.cc.

std::vector< float > ETUCT::discretizeState ( const std::vector< float > &  s)

Return a discretized version of the input state.

Definition at line 864 of file ETUCT.cc.
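
One plausible way to discretize a state is sketched below. The rounding rule is an assumption, not taken from ETUCT.cc: each feature is snapped to the nearest multiple of a bin width derived from featmin, featmax and statesPerDim, and a non-positive bin count is treated as "leave this feature untouched". The function name discretizeSketch and its explicit parameters are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-alone version; the real method reads featmin, featmax
// and statesPerDim from the class's private attributes.
std::vector<float> discretizeSketch(const std::vector<float>& s,
                                    const std::vector<float>& featmin,
                                    const std::vector<float>& featmax,
                                    const std::vector<int>& statesPerDim) {
  assert(s.size() == featmin.size() && s.size() == featmax.size() &&
         s.size() == statesPerDim.size());
  std::vector<float> ds(s.size());
  for (std::size_t i = 0; i < s.size(); ++i) {
    // Assumed convention: no bins (or an empty range) means no discretization.
    const float range = featmax[i] - featmin[i];
    if (statesPerDim[i] <= 0 || range <= 0.0f) {
      ds[i] = s[i];
      continue;
    }
    const float width = range / statesPerDim[i];
    // Snap to the nearest multiple of the bin width.
    ds[i] = width * std::floor(s[i] / width + 0.5f);
  }
  return ds;
}
```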

void ETUCT::fillInState ( std::vector< float >  s,
int  depth 
)

Fill in a state based on featmin and featmax

Definition at line 111 of file ETUCT.cc.

int ETUCT::getBestAction ( const std::vector< float > &  s) [virtual]

Implements Planner.

Definition at line 325 of file ETUCT.cc.

double ETUCT::getSeconds ( ) [protected]

Get the current time in seconds

Definition at line 537 of file ETUCT.cc.

void ETUCT::initNewState ( state_t  s) [protected]

Initialize a new state

Definition at line 132 of file ETUCT.cc.

void ETUCT::initStateInfo ( state_t  s,
state_info *  info 
) [protected]

Initialize state info struct

Definition at line 479 of file ETUCT.cc.

void ETUCT::initStates ( )

Initialize the states for this domain (based on featmin and featmax).

Definition at line 104 of file ETUCT.cc.

void ETUCT::logValues ( ofstream *  of,
int  xmin,
int  xmax,
int  ymin,
int  ymax 
)

Output value function to a file

Definition at line 845 of file ETUCT.cc.

void ETUCT::planOnNewModel ( ) [virtual]

Implements Planner.

Definition at line 381 of file ETUCT.cc.

void ETUCT::printStates ( ) [protected]

Print information for each state.

Definition at line 506 of file ETUCT.cc.

void ETUCT::removeUnreachableStates ( ) [protected]

Remove states from set that were deemed unreachable.

void ETUCT::resetUCTCounts ( ) [protected]

Reset UCT visit counts to some baseline level, to decrease our confidence in the Q-values because the model has changed.

Definition at line 413 of file ETUCT.cc.
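
A rough sketch of what resetting counts to a baseline could look like follows. The struct VisitCounts, the function resetCountsSketch and the exact capping rule are all assumptions for illustration; this page does not document how the baseline is chosen or applied.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the UCT bookkeeping kept per state.
struct VisitCounts {
  int stateVisits;
  std::vector<int> actionVisits;
};

// Cap every per-action count at `baseline`, then recompute the state count
// as the sum of its action counts so the two stay consistent.
void resetCountsSketch(std::vector<VisitCounts>& all, int baseline) {
  assert(baseline >= 0);
  for (std::size_t i = 0; i < all.size(); ++i) {
    VisitCounts& v = all[i];
    int total = 0;
    for (std::size_t a = 0; a < v.actionVisits.size(); ++a) {
      v.actionVisits[a] = std::min(v.actionVisits[a], baseline);
      total += v.actionVisits[a];
    }
    v.stateVisits = total;
  }
}
```

Lower counts widen the UCB confidence bounds, so actions whose values were learned under the old model get re-explored instead of being trusted.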

void ETUCT::savePolicy ( const char *  filename) [protected, virtual]

Reimplemented from Planner.

Definition at line 816 of file ETUCT.cc.

int ETUCT::selectUCTAction ( state_info *  info) [protected]

Select UCT action based on UCB1 algorithm.

Definition at line 695 of file ETUCT.cc.
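
For reference, a textbook UCB1 selection rule is sketched below. It is not the code from ETUCT.cc: the function selectUcb1Sketch, its parameters and the exploration constant c are assumptions, and the real method works on a state_info struct whose fields are not documented on this page. Ties here go to the lowest action index, which the actual implementation may handle differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pick the action maximizing q[a] + c * sqrt(2 ln N / n_a), where N is the
// number of visits to the state and n_a the number of times a was tried.
int selectUcb1Sketch(const std::vector<float>& q,
                     const std::vector<int>& actionVisits,
                     int stateVisits, float c) {
  assert(!q.empty() && q.size() == actionVisits.size());
  const float logN = std::log(static_cast<float>(std::max(stateVisits, 1)));
  int best = 0;
  float bestScore = 0.0f;
  for (std::size_t a = 0; a < q.size(); ++a) {
    // An untried action has an unbounded confidence interval: try it first.
    if (actionVisits[a] == 0) return static_cast<int>(a);
    const float bonus = c * std::sqrt(2.0f * logN / actionVisits[a]);
    const float score = q[a] + bonus;
    if (a == 0 || score > bestScore) {
      best = static_cast<int>(a);
      bestScore = score;
    }
  }
  return best;
}
```

The bonus term shrinks as an action is tried more often, so the rule gradually shifts from exploring rarely tried actions to exploiting the one with the highest estimated value.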

void ETUCT::setFirst ( ) [virtual]

Reimplemented from Planner.

Definition at line 914 of file ETUCT.cc.

void ETUCT::setModel ( MDPModel *  model) [virtual]

Implements Planner.

Definition at line 97 of file ETUCT.cc.

void ETUCT::setSeeding ( bool  seed) [virtual]

Reimplemented from Planner.

Definition at line 923 of file ETUCT.cc.

std::vector< float > ETUCT::simulateNextState ( const std::vector< float > &  actualState,
state_t  discState,
state_info *  info,
const std::deque< float > &  searchHistory,
int  action,
float *  reward,
bool *  term 
) [protected]

Return a sampled state from the next state distribution of the model. Simulate the next state from the given state, action, and possibly history of past actions.

Definition at line 737 of file ETUCT.cc.

std::vector< float > ETUCT::subVec ( const std::vector< float > &  a,
const std::vector< float > &  b 
) [protected]

Subtract two vectors.

Definition at line 901 of file ETUCT.cc.

float ETUCT::uctSearch ( const std::vector< float > &  actualS,
state_t  state,
int  depth,
std::deque< float >  history 
) [protected]

Perform a UCT/Monte Carlo rollout from the given state. If the state is terminal or the depth limit has been reached, return some reward. Otherwise, select an action based on UCB and simulate it to get a reward and next state. Call the search on the next state at depth+1 to get the return from there on. Update the Q-value towards the new value, reward + gamma * searchReturn, update the visit counts used for the confidence bounds, and return q.

From "Bandit Based Monte Carlo Planning" by Kocsis and Szepesvári.

Definition at line 545 of file ETUCT.cc.
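
The rollout steps described above can be sketched on a toy problem. Everything here is an assumption for illustration: the five-state chain model in stepSketch, the NodeSketch bookkeeping, the integer state keys (the class uses state_t pointers), and the sample-average update. Eligibility traces, action histories and real-valued state tracking are left out. Values are stored per state rather than per depth, as the class description says.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <map>
#include <vector>

// Hypothetical per-state bookkeeping for two actions.
struct NodeSketch {
  std::vector<float> q;
  std::vector<int> actionVisits;
  int visits;
  NodeSketch() : q(2, 0.0f), actionVisits(2, 0), visits(0) {}
};

// Toy deterministic model: states 0..4 on a line, action 0 moves left,
// action 1 moves right, and reaching state 4 pays 1 and ends the episode.
int stepSketch(int s, int a, float* reward, bool* term) {
  assert(a == 0 || a == 1);
  const int next = std::max(0, std::min(4, s + (a == 1 ? 1 : -1)));
  *term = (next == 4);
  *reward = *term ? 1.0f : 0.0f;
  return next;
}

// UCB1 over the two actions; untried actions are taken first.
int ucbActionSketch(const NodeSketch& n, float c) {
  int best = 0;
  float bestScore = 0.0f;
  for (int a = 0; a < 2; ++a) {
    if (n.actionVisits[a] == 0) return a;
    const float score = n.q[a] + c * std::sqrt(
        2.0f * std::log(static_cast<float>(n.visits)) / n.actionVisits[a]);
    if (a == 0 || score > bestScore) { best = a; bestScore = score; }
  }
  return best;
}

float uctSearchSketch(std::map<int, NodeSketch>& tree, int s, int depth,
                      int maxDepth, float gamma, float c) {
  if (depth >= maxDepth) return 0.0f;          // depth limit: no further reward
  NodeSketch& node = tree[s];                  // std::map references stay valid
  const int a = ucbActionSketch(node, c);      // select an action with UCB
  float reward = 0.0f;
  bool term = false;
  const int next = stepSketch(s, a, &reward, &term);  // simulate the action
  const float searchReturn =
      term ? 0.0f : uctSearchSketch(tree, next, depth + 1, maxDepth, gamma, c);
  const float target = reward + gamma * searchReturn;
  node.visits += 1;                            // update visit counts
  node.actionVisits[a] += 1;
  node.q[a] += (target - node.q[a]) / node.actionVisits[a];  // move q to target
  return node.q[a];
}
```

Calling uctSearchSketch repeatedly from the start state, up to an iteration or time budget such as MAX_ITER or MAX_TIME, plays the role of the planning loop; the greedy action at the start state is then read off the stored q values.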

bool ETUCT::updateModelWithExperience ( const std::vector< float > &  last,
int  act,
const std::vector< float > &  curr,
float  reward,
bool  term 
) [virtual]

Implements Planner.

Definition at line 145 of file ETUCT.cc.

void ETUCT::updateStateActionFromModel ( state_t  s,
int  a,
state_info *  info 
) [protected]

Update the state_info copy of the model for the given state-action from the MDPModel

Definition at line 251 of file ETUCT.cc.

void ETUCT::updateStateActionHistoryFromModel ( const std::vector< float >  modState,
int  a,
StateActionInfo *  newModel 
) [protected]

Update the state_info copy of the model for the given state-action and k-action history from the MDPModel.

Definition at line 283 of file ETUCT.cc.


Member Data Documentation

bool ETUCT::ACTDEBUG

Definition at line 82 of file ETUCT.hh.

std::vector<float> ETUCT::featmax [private]

Definition at line 212 of file ETUCT.hh.

std::vector<float> ETUCT::featmin [private]

Definition at line 213 of file ETUCT.hh.

const float ETUCT::gamma [private]

Definition at line 229 of file ETUCT.hh.

const int ETUCT::HISTORY_FL_SIZE [private]

Definition at line 240 of file ETUCT.hh.

const int ETUCT::HISTORY_SIZE [private]

Definition at line 239 of file ETUCT.hh.

bool ETUCT::HISTORYDEBUG

Definition at line 85 of file ETUCT.hh.

const float ETUCT::lambda [private]

Definition at line 231 of file ETUCT.hh.

int ETUCT::lastUpdate [private]

Definition at line 225 of file ETUCT.hh.

const int ETUCT::MAX_DEPTH [private]

Definition at line 235 of file ETUCT.hh.

const int ETUCT::MAX_ITER [private]

Definition at line 233 of file ETUCT.hh.

const float ETUCT::MAX_TIME [private]

Definition at line 234 of file ETUCT.hh.

MDPModel* ETUCT::model

The MDPModel used for planning.

Definition at line 88 of file ETUCT.hh.

bool ETUCT::MODELDEBUG

Definition at line 81 of file ETUCT.hh.

const int ETUCT::modelType [private]

Definition at line 236 of file ETUCT.hh.

int ETUCT::nactions [private]

Definition at line 224 of file ETUCT.hh.

int ETUCT::nstates [private]

Definition at line 223 of file ETUCT.hh.

const int ETUCT::numactions [private]

Definition at line 228 of file ETUCT.hh.

bool ETUCT::PLANNERDEBUG

Definition at line 80 of file ETUCT.hh.

double ETUCT::planTime [private]

Definition at line 219 of file ETUCT.hh.

int ETUCT::prevact [private]

Definition at line 216 of file ETUCT.hh.

state_info* ETUCT::previnfo [private]

Definition at line 217 of file ETUCT.hh.

state_t ETUCT::prevstate [private]

Definition at line 215 of file ETUCT.hh.

bool ETUCT::REALSTATEDEBUG

Definition at line 84 of file ETUCT.hh.

const float ETUCT::rrange [private]

Definition at line 230 of file ETUCT.hh.

std::deque<float> ETUCT::saHistory [private]

Current history of previous actions.

Definition at line 210 of file ETUCT.hh.

bool ETUCT::seedMode [private]

Definition at line 221 of file ETUCT.hh.

std::map<state_t, state_info> ETUCT::statedata [private]

Map from canonical state pointers to their state_info structs.

Definition at line 207 of file ETUCT.hh.

std::set<std::vector<float> > ETUCT::statespace [private]

Set of all distinct sensations seen. Pointers to elements of this set serve as the internal representation of the environment state.

Definition at line 204 of file ETUCT.hh.

const std::vector<int>& ETUCT::statesPerDim [private]

Definition at line 237 of file ETUCT.hh.

bool ETUCT::timingType [private]

Definition at line 226 of file ETUCT.hh.

const bool ETUCT::trackActual [private]

Definition at line 238 of file ETUCT.hh.

bool ETUCT::UCTDEBUG

Definition at line 83 of file ETUCT.hh.


The documentation for this class was generated from the following files:

ETUCT.hh
ETUCT.cc

rl_agent
Author(s): Todd Hester
autogenerated on Thu Jun 6 2019 22:00:14