While working through my TensorFlow tutorials this evening, I ran across a possible way to add randomness to the decision-making of our Project Pendulum AI bot. TensorFlow includes a stochastic optimization technique for training called Adam, exposed as the Adam optimizer.
See the published paper on this method here.
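
To get a feel for what the optimizer actually does, here is a minimal sketch of the Adam update rule in plain Python, applied to a toy one-variable objective. The function name `adam_minimize`, the toy objective, and the learning rate of 0.01 are my own assumptions for illustration and are not part of Project Pendulum; in TensorFlow itself you would normally reach for the built-in `tf.keras.optimizers.Adam` rather than writing this by hand. One thing worth keeping in mind: as far as I can tell, the "stochastic" in the method's name refers to working from noisy gradient estimates (for example, from mini-batches), so whether it gives the bot the kind of randomness we want still needs to be tested.

```python
import math


def adam_minimize(grad_fn, x0, steps=2000, lr=0.01,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a one-variable function using the Adam update rule.

    grad_fn is a hypothetical callable returning the gradient at x.
    """
    x = x0
    m = 0.0  # running average of gradients (first moment)
    v = 0.0  # running average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction: both averages start at zero, so early
        # estimates are scaled up to compensate.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Step size is scaled per-parameter by the gradient magnitude.
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)  # should land close to the true minimum at x = 3
```

In a real training loop the gradient would come from a batch of samples instead of an exact formula, which is where the noise enters.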