Classes
| class | QValueNetwork | Q-value networks |
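QValueNetwork wraps a single linear layer that maps a one-hot state encoding to one Q-value per discrete control, plus the greedy action and a gradient-descent update used during training. Below is a minimal sketch of such a class, assuming TensorFlow 1.x (consistent with the tf.InteractiveSession listed under Variables); only the attribute names x, qvalue, u, qref and optim come from this page, while the layer shape and initialization are assumptions.

```python
import tensorflow as tf

class QValueNetwork:
    """Linear Q-network: one Q-value per discrete control for a one-hot state."""
    def __init__(self):
        # One-hot encoded state (1 x NX).
        self.x = tf.placeholder(shape=[1, NX], dtype=tf.float32)
        # Single linear layer: Q(x, .) = x @ W, one column per control (assumed init).
        W = tf.Variable(tf.random_uniform([NX, NU], 0, 0.01, seed=RANDOM_SEED))
        self.qvalue = tf.matmul(self.x, W)
        # Greedy control: argmax over the NU Q-values.
        self.u = tf.argmax(self.qvalue, axis=1)
        # Reference Q-values (regression target) and SGD update.
        self.qref = tf.placeholder(shape=[1, NU], dtype=tf.float32)
        loss = tf.reduce_sum(tf.square(self.qref - self.qvalue))
        self.optim = tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(loss)
```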
Functions
| def | disturb(u, i) |
| def | onehot(ix, n=NX) |
| def | rendertrial(maxiter=100) |
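These helpers build the one-hot network input, add exploration noise to the greedy control, and replay a greedy rollout in the viewer. A plausible sketch, assuming NumPy and the DPendulum reset/step/render interface implied by the Variables below; the noise schedule in disturb and the rendering loop details are assumptions.

```python
import numpy as np

def onehot(ix, n=NX):
    """Return a 1 x n one-hot row vector with a 1 at index ix."""
    return np.array([[float(i == ix) for i in range(n)]], dtype=np.float32)

def disturb(u, i):
    """Perturb the greedy control u with noise that decays with episode i (assumed schedule)."""
    u += int(np.random.randn() * 10 / (i / 50 + 10))
    return int(np.clip(u, 0, NU - 1))

def rendertrial(maxiter=100):
    """Roll out the greedy policy from a random initial state and display it."""
    x = env.reset()
    for _ in range(maxiter):
        u = sess.run(qvalue.u, feed_dict={qvalue.x: onehot(x)})[0]
        x, reward = env.step(u)
        env.render()
        if reward == 1:
            break
```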
Variables
| float | DECAY_RATE = 0.99 | |
| | env = DPendulum() | Environment |
| | feed_dict | |
| list | h_rwd = [] | History of search |
| float | LEARNING_RATE = 0.1 | |
| int | NEPISODES = 500 | Hyper parameters |
| int | NSTEPS = 50 | |
| | NU = env.nu | |
| | NX = env.nx | |
| | optim | |
| | Q2 = sess.run(qvalue.qvalue, feed_dict={ qvalue.x: onehot(x2) }) | |
| | Qref = sess.run(qvalue.qvalue, feed_dict={ qvalue.x: onehot(x) }) | |
| | qvalue = QValueNetwork() | |
| | RANDOM_SEED = int((time.time()%10)*1000) | Random seed |
| | reward | |
| float | rsum = 0.0 | |
| | sess = tf.InteractiveSession() | |
| | u = sess.run(qvalue.u, feed_dict={ qvalue.x: onehot(x) })[0] | |
| | x = env.reset() | Training |
| | x2 | |
Example of Q-table learning with a simple discretized 1-pendulum environment using a linear Q network.
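The training loop ties the variables above together with the standard one-step Q-learning update: act greedily with exploration noise, then regress the network toward the Bellman target reward + DECAY_RATE * max Q(x2, .). A sketch of that loop, assuming env.step(u) returns the pair (next_state, reward) and that the network exposes a qref placeholder as in the class sketch above; the seeding and episode bookkeeping are assumptions.

```python
import numpy as np
import tensorflow as tf

np.random.seed(RANDOM_SEED)
sess.run(tf.global_variables_initializer())

for episode in range(NEPISODES):
    x = env.reset()
    rsum = 0.0
    for step in range(NSTEPS):
        # Greedy control from the network, perturbed for exploration.
        u = sess.run(qvalue.u, feed_dict={qvalue.x: onehot(x)})[0]
        u = disturb(u, episode)
        x2, reward = env.step(u)

        # Bellman target: keep current predictions, overwrite the entry for u.
        Q2 = sess.run(qvalue.qvalue, feed_dict={qvalue.x: onehot(x2)})
        Qref = sess.run(qvalue.qvalue, feed_dict={qvalue.x: onehot(x)})
        Qref[0, u] = reward + DECAY_RATE * np.max(Q2)

        # One gradient step toward the target.
        sess.run(qvalue.optim, feed_dict={qvalue.x: onehot(x), qvalue.qref: Qref})

        rsum += reward
        x = x2
        if reward == 1:
            break
    h_rwd.append(rsum)  # history of the search: cumulated reward per episode
```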