Classes

    class QValueNetwork
        Q-value networks.
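The listing only names the class. Below is a minimal sketch of what a linear Q network of this kind could look like, written against the file's own globals (NX, NU, LEARNING_RATE, listed under Variables). The placeholder shapes, the weight initialization, and the qref/optim members are assumptions consistent with the Qref and optim entries, not the exact implementation.

    import tensorflow as tf   # assumes TensorFlow 1.x, as suggested by tf.InteractiveSession() below

    class QValueNetwork:
        """Linear Q-value network: Q(x, .) = x @ W for a one-hot encoded state x (sketch)."""
        def __init__(self):
            # One-hot encoded state in, one Q-value per discrete control out.
            self.x      = tf.placeholder(shape=[1, NX], dtype=tf.float32)
            self.W      = tf.Variable(tf.random_uniform([NX, NU], 0., 0.01))
            self.qvalue = tf.matmul(self.x, self.W)          # shape [1, NU]
            self.u      = tf.argmax(self.qvalue, axis=1)     # greedy control index

            # Target Q-values and one gradient step on the squared TD error.
            self.qref   = tf.placeholder(shape=[1, NU], dtype=tf.float32)
            loss        = tf.reduce_sum(tf.square(self.qref - self.qvalue))
            self.optim  = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(loss)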
Functions

    def disturb(u, i)
    def onehot(ix, n=NX)
    def rendertrial(maxiter=100)
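The helpers are listed without bodies. The sketches below show plausible implementations inferred from their signatures and from how they are used in the Variables section (onehot feeds qvalue.x, disturb perturbs the greedy control during exploration, rendertrial replays the greedy policy). The exact noise schedule, the DPendulum step/render calls, and the reward convention are assumptions.

    import numpy as np

    def onehot(ix, n=NX):
        """Return a 1 x n one-hot row vector with a 1 at index ix (used to feed qvalue.x)."""
        oh = np.zeros((1, n), dtype=np.float32)
        oh[0, ix] = 1.0
        return oh

    def disturb(u, i):
        """Exploration noise: perturb control u with a magnitude that decays with episode i."""
        u += int(np.random.randn() * 10 / (i / 50 + 10))
        return np.clip(u, 0, NU - 1)

    def rendertrial(maxiter=100):
        """Roll out the greedy policy from a reset state, rendering the pendulum at each step."""
        x = env.reset()
        for _ in range(maxiter):
            u = sess.run(qvalue.u, feed_dict={qvalue.x: onehot(x)})[0]
            x, reward = env.step(u)   # assumed DPendulum API: returns (next state, reward)
            env.render()
            if reward == 1:           # assumed convention: reward 1 when the goal is reached
                break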
Variables

    float DECAY_RATE = 0.99
    env = DPendulum()
        Environment.
    feed_dict
    list h_rwd = []
        History of search.
    float LEARNING_RATE = 0.1
    int NEPISODES = 500
        Hyperparameters.
    int NSTEPS = 50
    NU = env.nu
    NX = env.nx
    optim
    Q2 = sess.run(qvalue.qvalue, feed_dict={qvalue.x: onehot(x2)})
    Qref = sess.run(qvalue.qvalue, feed_dict={qvalue.x: onehot(x)})
    qvalue = QValueNetwork()
    RANDOM_SEED = int((time.time() % 10) * 1000)
        Random seed.
    reward
    float rsum = 0.0
    sess = tf.InteractiveSession()
    u = sess.run(qvalue.u, feed_dict={qvalue.x: onehot(x)})[0]
    x = env.reset()
        Training.
    x2
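Several of the entries above (u, x2, reward, Q2, Qref, rsum, h_rwd) are snapshots of the training loop rather than standalone definitions. The sketch below reconstructs how they plausibly fit together; the DPendulum reset/step API, the qvalue.qref and qvalue.optim members, and the termination test on reward are assumptions.

    import numpy as np

    sess.run(tf.global_variables_initializer())   # assumes TensorFlow 1.x, matching tf.InteractiveSession()

    for episode in range(1, NEPISODES):
        x    = env.reset()
        rsum = 0.0
        for step in range(NSTEPS):
            # Greedy control from the linear Q network, plus exploration noise.
            u = sess.run(qvalue.u, feed_dict={qvalue.x: onehot(x)})[0]
            u = disturb(u, episode)
            x2, reward = env.step(u)              # assumed DPendulum API: (next state, reward)

            # TD target: keep the current prediction everywhere except the taken control,
            # which is regressed toward reward + DECAY_RATE * max_u' Q(x2, u').
            Q2   = sess.run(qvalue.qvalue, feed_dict={qvalue.x: onehot(x2)})
            Qref = sess.run(qvalue.qvalue, feed_dict={qvalue.x: onehot(x)})
            Qref[0, u] = reward + DECAY_RATE * np.max(Q2)

            # One gradient step toward the target, then advance the state.
            sess.run(qvalue.optim, feed_dict={qvalue.x: onehot(x), qvalue.qref: Qref})
            rsum += reward
            x = x2
            if reward == 1:                       # assumed convention: stop when the goal is reached
                break

        h_rwd.append(rsum)                        # history of the cumulated reward per episode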
Example of Q-table learning with a simple discretized 1-pendulum environment using a linear Q network.
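As a hedged usage note: once training has filled h_rwd, the learning curve can be inspected and the greedy policy replayed with rendertrial. The matplotlib plotting and the final average below are illustrative additions, not part of the documented file.

    import numpy as np
    import matplotlib.pyplot as plt

    # Average cumulated reward over the last 100 episodes, then a smoothed learning curve.
    print("Mean reward, last 100 episodes: %.2f" % np.mean(h_rwd[-100:]))
    plt.plot(np.convolve(h_rwd, np.ones(20) / 20.0, mode='valid'))
    plt.xlabel('episode')
    plt.ylabel('cumulated reward (20-episode moving average)')
    plt.show()

    rendertrial()   # replay the greedy policy with rendering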