Classes

| Type | Member |
| --- | --- |
| class | QValueNetwork |
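The summary lists QValueNetwork without documenting it. The script uses its members qvalue.x, qvalue.qvalue and qvalue.u through sess.run. The sketch below is a framework-free stand-in written for illustration, under some assumptions. A single weight matrix W of shape NX × NU maps a one-hot state row to NU Q-values, and the greedy action is their argmax. One gradient-descent step on a squared error pulls the Q-values toward a reference vector. The class name LinearQ, the squared-error loss and the initialization range are assumptions, not details taken from the original TensorFlow class.

```python
import random


class LinearQ:
    """Hypothetical stand-in for QValueNetwork: Q-values = x @ W for a one-hot row x."""

    def __init__(self, nx, nu, learning_rate=0.1, seed=0):
        rng = random.Random(seed)
        self.nu = nu
        self.learning_rate = learning_rate
        # One row of nu weights per discrete state; small random init (assumed).
        self.W = [[rng.uniform(0.0, 0.01) for _ in range(nu)] for _ in range(nx)]

    def qvalue(self, x):
        """Q-values for the input row vector x (length nx), i.e. x @ W."""
        return [sum(xi * row[j] for xi, row in zip(x, self.W)) for j in range(self.nu)]

    def u(self, x):
        """Greedy action: index of the largest Q-value."""
        q = self.qvalue(x)
        return max(range(self.nu), key=q.__getitem__)

    def optim(self, x, qref):
        """One gradient-descent step on loss = sum_j (qref[j] - q[j]) ** 2 (assumed loss)."""
        q = self.qvalue(x)
        for j in range(self.nu):
            dloss_dq = -2.0 * (qref[j] - q[j])
            for i, xi in enumerate(x):
                # dq_j / dW[i][j] = x_i, so only rows with x_i != 0 change.
                self.W[i][j] -= self.learning_rate * dloss_dq * xi
```

With a one-hot input, qvalue returns exactly row x of W, and optim changes only that row. This is why a linear Q network fed with one-hot states behaves like a Q-table.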
Functions

| Type | Member |
| --- | --- |
| def | disturb(u, i) |
| def | onehot(ix, n=NX) |
| def | rendertrial(maxiter=100) |
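Only the signatures are given above. The calls in the Variables table pass onehot(x) as qvalue.x, which suggests onehot turns a discrete state index into a one-hot vector of length n (NX by default). The sketch below shows that encoding in plain Python. It pairs it with one plausible exploration rule for disturb(u, i). That rule adds zero-mean Gaussian noise to the greedy action u, shrinks the noise as i grows, then rounds and clips the result into 0..NU-1. Several details are assumptions: the noise schedule, reading i as an episode counter, the explicit nu and rng parameters, and dropping the n=NX default. The original disturb may work differently.

```python
import random


def onehot(ix, n):
    """Return a length-n list that is 0.0 everywhere except 1.0 at index ix."""
    if not 0 <= ix < n:
        raise ValueError(f"index {ix} out of range for n={n}")
    return [1.0 if i == ix else 0.0 for i in range(n)]


def disturb(u, i, nu, rng=random):
    """Hypothetical exploration rule: perturb action u with noise that shrinks as i grows.

    The schedule sigma = 10 / (i / 50 + 10) is an assumption made for this sketch.
    The noisy action is rounded and clipped into the valid range [0, nu - 1].
    """
    sigma = 10.0 / (i / 50.0 + 10.0)
    noisy = u + round(rng.gauss(0.0, sigma))
    return max(0, min(nu - 1, noisy))


print(onehot(2, 5))  # [0.0, 0.0, 1.0, 0.0, 0.0]
print(disturb(3, 1, nu=5, rng=random.Random(0)))
```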
Variables

| Type | Member |
| --- | --- |
| float | DECAY_RATE = 0.99 |
|  | env = DPendulum() |
|  | feed_dict |
| list | h_rwd = [] |
| float | LEARNING_RATE = 0.1 |
| int | NEPISODES = 500 |
| int | NSTEPS = 50 |
|  | NU = env.nu |
|  | NX = env.nx |
|  | optim |
|  | Q2 = sess.run(qvalue.qvalue, feed_dict={qvalue.x: onehot(x2)}) |
|  | Qref = sess.run(qvalue.qvalue, feed_dict={qvalue.x: onehot(x)}) |
|  | qvalue = QValueNetwork() |
|  | RANDOM_SEED = int((time.time() % 10) * 1000) |
|  | reward |
| float | rsum = 0.0 |
|  | sess = tf.InteractiveSession() |
|  | u = sess.run(qvalue.u, feed_dict={qvalue.x: onehot(x)})[0] |
|  | x = env.reset() |
|  | x2 |
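The variables Q2, Qref, reward and DECAY_RATE point to the core Q-learning step. Q2 holds the network's Q-values at the next state x2, and Qref holds those at the current state x. The sketch below shows the corresponding target on plain lists. It assumes the standard one-step Q-learning rule Qref[u] = reward + DECAY_RATE * max(Q2), with the other entries of Qref left unchanged. The function name bellman_target is hypothetical, and the original script's exact update line is not part of this summary.

```python
DECAY_RATE = 0.99  # discount factor, value taken from the Variables table


def bellman_target(q_x, q_x2, u, reward, decay=DECAY_RATE):
    """Return a copy of q_x with entry u replaced by the one-step Q-learning target.

    q_x    : Q-values at the current state x (the role of Qref)
    q_x2   : Q-values at the next state x2 (the role of Q2)
    u      : index of the action actually taken
    reward : reward received for the transition x -> x2
    """
    target = list(q_x)                      # other actions keep their current values
    target[u] = reward + decay * max(q_x2)  # r + gamma * max_a' Q(x2, a')
    return target


# Worked example with made-up numbers: two actions, action 1 taken, reward 1.
qref = bellman_target(q_x=[0.2, 0.5], q_x2=[0.0, 0.4], u=1, reward=1.0)
# qref is approximately [0.2, 1.396]
```

Only the taken action's entry differs from the current prediction. A squared-error loss between Qref and the network output therefore drives learning through that entry alone.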
Example of Q-table learning with a simple discretized 1-pendulum environment using a linear Q network.
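The sketch below makes the description concrete without DPendulum or TensorFlow. It runs the same kind of training loop on a hypothetical five-state chain environment; env_step, GOAL and the chain itself are inventions for illustration. It reuses the hyperparameters from the Variables table. The linear Q network is a weight matrix W fed with one-hot states, so onehot(x) @ W is simply row x of W. Training uses the one-step target reward + DECAY_RATE * max Q(x2). Exploration is epsilon-greedy, swapped in for disturb, whose exact rule is not documented here. rollout loosely mirrors rendertrial, without rendering.

```python
import random

# Hyperparameters copied from the Variables table.
NEPISODES = 500
NSTEPS = 50
LEARNING_RATE = 0.1
DECAY_RATE = 0.99

# Hypothetical stand-in for DPendulum: states 0..NX-1 on a line,
# action 0 moves left, action 1 moves right, reward 1 on reaching GOAL.
NX, NU = 5, 2
GOAL = NX - 1


def env_step(x, u):
    x2 = max(0, min(GOAL, x + (1 if u == 1 else -1)))
    return x2, (1.0 if x2 == GOAL else 0.0)


def onehot(ix, n=NX):
    return [1.0 if i == ix else 0.0 for i in range(n)]


def qvalue(W, x):
    # Linear network on a one-hot input: onehot(x) @ W, i.e. row x of W.
    h = onehot(x)
    return [sum(h[i] * W[i][j] for i in range(NX)) for j in range(NU)]


def greedy(q, rng=None):
    # Index of the largest Q-value; ties broken at random when rng is given.
    best = max(q)
    ties = [j for j, v in enumerate(q) if v == best]
    return rng.choice(ties) if rng is not None else ties[0]


def train(seed=0, epsilon=0.3):
    rng = random.Random(seed)
    W = [[0.0] * NU for _ in range(NX)]  # network weights, i.e. the Q-table
    h_rwd = []                           # per-episode reward history
    for _ in range(NEPISODES):
        x, rsum = 0, 0.0
        for _ in range(NSTEPS):
            q = qvalue(W, x)
            # Epsilon-greedy exploration, used here in place of disturb.
            u = rng.randrange(NU) if rng.random() < epsilon else greedy(q, rng)
            x2, reward = env_step(x, u)
            Qref = list(q)
            Qref[u] = reward + DECAY_RATE * max(qvalue(W, x2))
            # Gradient step on sum_j (Qref[j] - q[j])**2; only row x, column u moves.
            h = onehot(x)
            for i in range(NX):
                for j in range(NU):
                    W[i][j] += LEARNING_RATE * 2.0 * (Qref[j] - q[j]) * h[i]
            rsum += reward
            x = x2
            if reward == 1.0:
                break
        h_rwd.append(rsum)
    return W, h_rwd


def rollout(W, maxiter=100):
    # Greedy roll-out from state 0 (no rendering).
    x, path = 0, [0]
    for _ in range(maxiter):
        x, reward = env_step(x, greedy(qvalue(W, x)))
        path.append(x)
        if reward == 1.0:
            break
    return path


W, h_rwd = train()
print("success rate:", sum(h_rwd) / NEPISODES)
print("greedy path:", rollout(W))
```

The input is one-hot, so the gradient step reduces to the tabular update W[x][u] += 2 * LEARNING_RATE * (target - W[x][u]). That is the sense in which learning this linear network is Q-table learning.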
qnet.qvalue = QValueNetwork()