#include <PO_ETUCT.hh>

Classes
struct	state_info
struct	state_samples
Public Types
typedef const std::vector < float > *	state_t
Public Member Functions
std::vector< float >	discretizeState (const std::vector< float > &s)
virtual int	getBestAction (const std::vector< float > &s)
void	logValues (ofstream *of, int xmin, int xmax, int ymin, int ymax)
virtual void	planOnNewModel ()
	PO_ETUCT (int numactions, float gamma, float rrange, float lambda, int MAX_ITER, float MAX_TIME, int MAX_DEPTH, int modelType, const std::vector< float > &featmax, const std::vector< float > &featmin, const std::vector< int > &statesPerDim, bool trackActual, int history, Random rng=Random())
	PO_ETUCT (const PO_ETUCT &)
virtual void	setFirst ()
virtual void	setModel (MDPModel *model)
virtual void	setSeeding (bool seed)
virtual bool	updateModelWithExperience (const std::vector< float > &last, int act, const std::vector< float > &curr, float reward, bool term)
virtual	~PO_ETUCT ()
Public Attributes
bool	ACTDEBUG
bool	HISTORYDEBUG
MDPModel *	model
bool	MODELDEBUG
bool	PLANNERDEBUG
bool	REALSTATEDEBUG
bool	UCTDEBUG
Protected Member Functions
std::vector< float >	addVec (const std::vector< float > &a, const std::vector< float > &b)
void	calculateReachableStates ()
state_t	canonicalize (const std::vector< float > &s)
void	canonNextStates (StateActionInfo *modelInfo)
void	createPolicy ()
void	deleteInfo (state_info *info)
double	getSeconds ()
void	initNewState (state_t s)
void	initStateInfo (state_t s, state_info *info)
void	printStates ()
void	removeUnreachableStates ()
void	resetUCTCounts ()
virtual void	savePolicy (const char *filename)
int	selectUCTAction (state_info *info)
std::vector< float >	simulateNextState (const std::vector< float > &actualState, state_t discState, state_info *info, int action, float *reward, bool *term)
std::vector< float >	subVec (const std::vector< float > &a, const std::vector< float > &b)
float	uctSearch (const std::vector< float > &actualS, state_t state, int depth)
void	updateStateActionFromModel (state_t s, int a, state_info *info)
void	updateStateActionHistoryFromModel (const std::vector< float > modState, int a, StateActionInfo *newModel)
Private Attributes
std::vector< float >	featmax
std::vector< float >	featmin
const float	gamma
const int	HISTORY_FL_SIZE
const int	HISTORY_SIZE
const float	lambda
int	lastUpdate
const int	MAX_DEPTH
const int	MAX_ITER
const float	MAX_TIME
const int	modelType
int	nactions
int	nstates
const int	numactions
double	planTime
int	prevact
state_info *	previnfo
state_t	prevstate
const float	rrange
std::deque< float >	saHistory
bool	seedMode
std::map< state_t, state_info >	statedata
std::set< std::vector< float > >	statespace
const std::vector< int > &	statesPerDim
bool	timingType
const bool	trackActual

Detailed Description

This class defines a modified version of UCT, which plans on a model using Monte Carlo rollouts. Unlike the original UCT, it does not separate values by tree depth, and it incorporates eligibility traces. This version plans over states augmented with k-action histories, for delayed or partially observable domains.

Definition at line 24 of file PO_ETUCT.hh.
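
One way to picture a state augmented with a k-action history is sketched below. This is only an illustration, not the code in PO_ETUCT.cc: the helper names `pushAction` and `augmentState` are hypothetical, and it assumes the history stores action indices only, as floats, in a `std::deque< float >` like `saHistory`.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical helper: record the newest action, keeping at most k entries.
void pushAction(std::deque<float>& history, int action, std::size_t k) {
    assert(action >= 0);
    history.push_back(static_cast<float>(action));
    while (history.size() > k) {
        history.pop_front();  // drop the oldest action
    }
}

// Hypothetical helper: append the stored history to the state features,
// oldest action first.
std::vector<float> augmentState(const std::vector<float>& s,
                                const std::deque<float>& history) {
    std::vector<float> augmented(s);
    augmented.insert(augmented.end(), history.begin(), history.end());
    return augmented;
}
```

With k = 2, a two-feature state becomes a four-element vector whose last two entries are the two most recent actions.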

Member Typedef Documentation

typedef const std::vector<float>* PO_ETUCT::state_t

The implementation maps all sensations to a set of canonical pointers, which serve as the internal representation of environment state.

Definition at line 85 of file PO_ETUCT.hh.

Constructor & Destructor Documentation

PO_ETUCT::PO_ETUCT	(	int	numactions,
		float	gamma,
		float	rrange,
		float	lambda,
		int	MAX_ITER,
		float	MAX_TIME,
		int	MAX_DEPTH,
		int	modelType,
		const std::vector< float > &	featmax,
		const std::vector< float > &	featmin,
		const std::vector< int > &	statesPerDim,
		bool	trackActual,
		int	history,
		Random	rng = `Random()`
	)

Standard constructor

Parameters:

numactions	number of actions in the domain
gamma	discount factor
rrange	range of one-step rewards in the domain
lambda	for use with eligibility traces
MAX_ITER	maximum number of MC rollouts to perform
MAX_TIME	maximum amount of time to run Monte Carlo rollouts
MAX_DEPTH	maximum depth to perform rollout to
modelType	specifies model type
featmax	maximum value of each feature
featmin	minimum value of each feature
statesPerDim	# of values to discretize each feature into
trackActual	track actual real-valued states (or just discrete states)
history	# of previous actions to use for delayed domains
rng	random number generator

Definition at line 16 of file PO_ETUCT.cc.

PO_ETUCT::PO_ETUCT ( const PO_ETUCT & )

Unimplemented copy constructor: internal state cannot be simply copied.

PO_ETUCT::~PO_ETUCT ( ) [virtual]

Definition at line 73 of file PO_ETUCT.cc.

Member Function Documentation

std::vector< float > PO_ETUCT::addVec	(	const std::vector< float > &	a,
		const std::vector< float > &	b
	)		`[protected]`

Add two vectors together.

Definition at line 907 of file PO_ETUCT.cc.
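
A minimal sketch of element-wise vector addition, with the matching subtraction, is shown here for orientation. The names `addVecSketch` and `subVecSketch` are hypothetical, and the equal-length precondition is an assumption; the actual bodies in PO_ETUCT.cc may handle mismatched sizes differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise a + b. Assumes both vectors have the same length.
std::vector<float> addVecSketch(const std::vector<float>& a,
                                const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        c[i] = a[i] + b[i];
    }
    return c;
}

// Element-wise b - a. Assumes both vectors have the same length.
std::vector<float> subVecSketch(const std::vector<float>& a,
                                const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        c[i] = b[i] - a[i];
    }
    return c;
}
```

The operand order in `subVecSketch` (b minus a) is a choice made for this sketch only.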

void PO_ETUCT::calculateReachableStates ( ) [protected]

Calculate which states are reachable from states the agent has actually visited.

PO_ETUCT::state_t PO_ETUCT::canonicalize ( const std::vector< float > & s ) [protected]

Produces a canonical representation of the given sensation.

Parameters:

s	The current sensation from the environment.

Returns: A pointer to an equivalent state in statespace.

Definition at line 465 of file PO_ETUCT.cc.
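
The idea of mapping equal sensations to one canonical pointer can be sketched with a `std::set`, as below. This assumes `statespace` is the `std::set< std::vector< float > >` listed among the private attributes; `canonicalizeSketch` is a hypothetical free function, and any discretization or per-state initialization the real method performs is omitted.

```cpp
#include <cassert>
#include <set>
#include <vector>

typedef const std::vector<float>* state_t;

// std::set::insert returns an iterator to the already-stored element when an
// equal vector is present, so equal sensations share one address. Pointers to
// set elements stay valid when other elements are inserted.
state_t canonicalizeSketch(std::set<std::vector<float> >& statespace,
                           const std::vector<float>& s) {
    auto result = statespace.insert(s);
    state_t canonical = &*result.first;
    assert(*canonical == s);
    return canonical;
}
```

Because the pointers are canonical, two states can be compared or used as map keys by address alone, which is what a `std::map< state_t, state_info >` relies on.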

void PO_ETUCT::canonNextStates ( StateActionInfo * modelInfo ) [protected]

Canonicalize all the next states predicted by this model.

Definition at line 313 of file PO_ETUCT.cc.

void PO_ETUCT::createPolicy ( ) [protected]

Compute a policy from a model.

void PO_ETUCT::deleteInfo ( state_info * info ) [protected]

Delete a state_info struct

Definition at line 551 of file PO_ETUCT.cc.

std::vector< float > PO_ETUCT::discretizeState ( const std::vector< float > & s )

Return a discretized version of the input state.

Definition at line 879 of file PO_ETUCT.cc.
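
For illustration, discretization from `featmin`, `featmax`, and `statesPerDim` could look like the sketch below. It assumes uniform bins whose width is the feature range divided by `statesPerDim`, with each value snapped to the nearest multiple of that width; the binning used in PO_ETUCT.cc may differ. `discretizeSketch` is a hypothetical free function.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

std::vector<float> discretizeSketch(const std::vector<float>& s,
                                    const std::vector<float>& featmin,
                                    const std::vector<float>& featmax,
                                    const std::vector<int>& statesPerDim) {
    assert(s.size() == featmin.size());
    assert(s.size() == featmax.size());
    assert(s.size() == statesPerDim.size());
    std::vector<float> ds(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        assert(statesPerDim[i] > 0);
        // Bin width for this feature (assumed uniform).
        float width = (featmax[i] - featmin[i]) / statesPerDim[i];
        // Snap to the nearest multiple of the bin width.
        ds[i] = width * std::round(s[i] / width);
    }
    return ds;
}
```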

int PO_ETUCT::getBestAction ( const std::vector< float > & s ) [virtual]

Implements Planner.

Definition at line 342 of file PO_ETUCT.cc.

double PO_ETUCT::getSeconds ( ) [protected]

Get the current time in seconds

Definition at line 559 of file PO_ETUCT.cc.

void PO_ETUCT::initNewState ( state_t s ) [protected]

Initialize a new state

Definition at line 107 of file PO_ETUCT.cc.

void PO_ETUCT::initStateInfo	(	state_t	s,
		state_info *	info
	)		`[protected]`

Initialize state info struct

Definition at line 501 of file PO_ETUCT.cc.

void PO_ETUCT::logValues	(	ofstream *	of,
		int	xmin,
		int	xmax,
		int	ymin,
		int	ymax
	)

Output value function to a file

Definition at line 860 of file PO_ETUCT.cc.

void PO_ETUCT::planOnNewModel ( ) [virtual]

Implements Planner.

Definition at line 403 of file PO_ETUCT.cc.

void PO_ETUCT::printStates ( ) [protected]

Print information for each state.

Definition at line 528 of file PO_ETUCT.cc.

void PO_ETUCT::removeUnreachableStates ( ) [protected]

Remove states from set that were deemed unreachable.

void PO_ETUCT::resetUCTCounts ( ) [protected]

Reset UCT visit counts to some baseline level (to decrease our confidence in q-values because the model has changed).

Definition at line 435 of file PO_ETUCT.cc.
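
One plausible reading of "reset to a baseline" is to cap every count at a small value, so that existing q-values are kept but carry less weight. The sketch below shows that reading only; the struct `CountsSketch`, the function name, and the `baseline` parameter are hypothetical, and the baseline actually used in PO_ETUCT.cc is not given here.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the visit-count fields of state_info.
struct CountsSketch {
    int stateVisits;
    std::vector<int> actionVisits;
};

// Cap every visit count at `baseline`; smaller counts are left unchanged.
void resetCountsSketch(std::vector<CountsSketch>& states, int baseline) {
    assert(baseline >= 0);
    for (CountsSketch& c : states) {
        c.stateVisits = std::min(c.stateVisits, baseline);
        for (int& n : c.actionVisits) {
            n = std::min(n, baseline);
        }
    }
}
```

Lower counts widen the UCB confidence bounds, so the planner explores again after the model changes.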

void PO_ETUCT::savePolicy ( const char * filename ) [protected, virtual]

Reimplemented from Planner.

Definition at line 831 of file PO_ETUCT.cc.

int PO_ETUCT::selectUCTAction ( state_info * info ) [protected]

Select UCT action based on UCB1 algorithm.

Definition at line 692 of file PO_ETUCT.cc.
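
As a reference point, the textbook UCB1 rule picks the action maximizing q(a) + c * sqrt(ln N / n(a)), where N is the state's visit count and n(a) the action's visit count. The sketch below implements that rule under stated assumptions: the exploration constant `c`, the tie-breaking, and the "untried actions first" policy are choices made here, not taken from PO_ETUCT.cc.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

int selectUCB1Sketch(const std::vector<float>& q,
                     const std::vector<int>& actionVisits,
                     int stateVisits,
                     float c) {
    assert(!q.empty());
    assert(q.size() == actionVisits.size());
    int best = 0;
    float bestVal = 0.0f;
    // Guard against log(0) if the caller passes an unvisited state.
    float logN = std::log(static_cast<float>(std::max(stateVisits, 1)));
    for (std::size_t a = 0; a < q.size(); ++a) {
        if (actionVisits[a] == 0) {
            return static_cast<int>(a);  // try each action once first
        }
        float bonus = c * std::sqrt(logN / actionVisits[a]);
        float val = q[a] + bonus;
        if (a == 0 || val > bestVal) {
            bestVal = val;
            best = static_cast<int>(a);
        }
    }
    return best;
}
```

A rarely tried action gets a large bonus and can win over an action with a higher q-value, which is the exploration behavior UCB1 is meant to provide.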

void PO_ETUCT::setFirst ( ) [virtual]

Reimplemented from Planner.

Definition at line 932 of file PO_ETUCT.cc.

void PO_ETUCT::setModel ( MDPModel * model ) [virtual]

Implements Planner.

Definition at line 95 of file PO_ETUCT.cc.

void PO_ETUCT::setSeeding ( bool seed ) [virtual]

Reimplemented from Planner.

Definition at line 941 of file PO_ETUCT.cc.

std::vector< float > PO_ETUCT::simulateNextState	(	const std::vector< float > &	actualState,
		state_t	discState,
		state_info *	info,
		int	action,
		float *	reward,
		bool *	term
	)		`[protected]`

Return a sampled state from the next state distribution of the model. Simulate the next state from the given state, action, and possibly history of past actions.

Definition at line 734 of file PO_ETUCT.cc.
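
Sampling from a next-state distribution can be sketched as a cumulative-probability walk. The representation below, a `std::map` from candidate next state to probability, is an assumption made for this example, as is passing the uniform draw `u` in as a parameter; the real method also produces the reward and terminal flag through its pointer arguments.

```cpp
#include <cassert>
#include <map>
#include <vector>

// `outcomes` maps each candidate next state to its probability (assumed to
// sum to about 1). `u` is a uniform random draw in [0, 1).
std::vector<float> sampleNextStateSketch(
        const std::map<std::vector<float>, float>& outcomes, float u) {
    assert(!outcomes.empty());
    float cumulative = 0.0f;
    for (const auto& entry : outcomes) {
        cumulative += entry.second;
        if (u < cumulative) {
            return entry.first;
        }
    }
    // Rounding can leave the total slightly below 1; fall back to the last state.
    return outcomes.rbegin()->first;
}
```

Taking `u` as an argument keeps the sketch deterministic; in practice it would come from the planner's random number generator.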

std::vector< float > PO_ETUCT::subVec	(	const std::vector< float > &	a,
		const std::vector< float > &	b
	)		`[protected]`

Subtract two vectors.

Definition at line 919 of file PO_ETUCT.cc.

float PO_ETUCT::uctSearch	(	const std::vector< float > &	actualS,
		state_t	state,
		int	depth
	)		`[protected]`

Perform UCT/Monte Carlo rollout from the given state.
1. If terminal or at depth, return some reward.
2. Otherwise, select an action based on UCB.
3. Simulate action to get reward and next state.
4. Call search on next state at depth+1 to get reward return from there on.
5. Update q value towards new value: reward + gamma * searchReturn.
6. Update visit counts for confidence bounds.
7. Return q.

From "Bandit Based Monte Carlo Planning" by Kocsis and Szepesvári.

Definition at line 568 of file PO_ETUCT.cc.
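
The rollout steps described for uctSearch can be sketched on a toy problem. Everything in the block is hypothetical: `ToyUCT` is a three-state chain invented for this example (action 0 stays, action 1 moves right, reaching state 2 ends the episode with reward 1), the step size is a sample average, and the exploration constant is 1. Eligibility traces (`lambda`), action histories, and the learned model are left out.

```cpp
#include <cassert>
#include <cmath>

struct ToyUCT {
    enum { kStates = 3, kActions = 2, kTerminal = 2 };
    float gamma;
    int maxDepth;
    float q[kStates][kActions];
    int stateVisits[kStates];
    int actionVisits[kStates][kActions];

    ToyUCT(float g, int d) : gamma(g), maxDepth(d) {
        assert(g >= 0.0f && g <= 1.0f && d > 0);
        for (int s = 0; s < kStates; ++s) {
            stateVisits[s] = 0;
            for (int a = 0; a < kActions; ++a) {
                q[s][a] = 0.0f;
                actionVisits[s][a] = 0;
            }
        }
    }

    // UCB1 choice; untried actions are taken first.
    int selectAction(int s) const {
        int best = 0;
        float bestVal = 0.0f;
        for (int a = 0; a < kActions; ++a) {
            if (actionVisits[s][a] == 0) return a;
            float bonus = std::sqrt(std::log(static_cast<float>(stateVisits[s]))
                                    / actionVisits[s][a]);
            float val = q[s][a] + bonus;
            if (a == 0 || val > bestVal) { bestVal = val; best = a; }
        }
        return best;
    }

    float search(int s, int depth) {
        // Terminal state or depth limit: nothing more to collect.
        if (s == kTerminal || depth >= maxDepth) return 0.0f;
        int a = selectAction(s);
        // Simulate the toy model to get the next state and reward.
        int next = (a == 1) ? s + 1 : s;
        float reward = (next == kTerminal) ? 1.0f : 0.0f;
        // Recurse, then move q towards reward + gamma * searchReturn.
        float target = reward + gamma * search(next, depth + 1);
        ++stateVisits[s];
        ++actionVisits[s][a];
        q[s][a] += (target - q[s][a]) / actionVisits[s][a];
        return q[s][a];
    }
};
```

Calling `search(0, 0)` repeatedly plays the role of the planning loop bounded by MAX_ITER; after enough rollouts, moving right from state 1 has a higher q-value than staying.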

bool PO_ETUCT::updateModelWithExperience	(	const std::vector< float > &	last,
		int	act,
		const std::vector< float > &	curr,
		float	reward,
		bool	term
	)		`[virtual]`