This remarkable arXival by Sam Livingstone, Betancourt, Byrne and Girolami takes a look at the conditions under which Hybrid/Hamiltonian Monte Carlo is ergodic wrt a certain distribution , and, in a second step, under which it will converge in total variation at a geometric rate. A nice feature being that they even prove that setting the integration time dynamically can lead to the sampler yielding geometric rate for a larger class of targets. Under idealized conditions for the dynamic integration time and, admittedly, in a certain exponential family.
Whats great about this paper, apart from the results, is how it manages to give the arguments in a rather accessible manner. Considering the depth of the results that is. The ideas are that
- the gradient should at most grow linearly
- for the gradient of to be of any help, it cannot lead further into the tails of the distribution or vanish (both assumption 4.4.1)
- as we approach infinity, every proposed point thats closer to the origin will be accepted (assumption 4.4.2)
If the gradient grows faster than linear, numerics will punch you in the face. That happened to Heiko Strathmann and me just 5 days ago in a GradientIS-type algorithm, where an estimated gradient had length and led to Nirvana. If not the estimation is the problem but the actual gradient, of course you have the same effect. The second assumption is also very intuitive: if the gradient leads into the tails, then it hurts rather than helps convergence. And your posterior is improper. The final assumption basically requires that your chance to improve by proposing a point closer to the origin goes to the further you move away from it.
A helpful guidance they provide is the following. For a class of exponential family distributions given by , whenever the tails of the distribution are no heavier than Laplace () and no thinner than Gaussian (), HMC will be geometrically ergodic. In all other cases, it won’t.
Another thing that caught my eye was their closed form expression for the proposal after leapfrog steps with stepsize , eq. (13):
Which gives me hope that one should be able to compute the proposal probability after any number of leapfrog steps as long as is invertible and thus a Jacobian exists to account for the transformation of hidden in the . A quick check with given by a bayesian posterior shows that this might not be trivial however.