My friend Patrick Jähnichen asked me to read this AISTATS 2014 paper by Ranganath, Gerrish and Blei. For somebody using mainly variational inference, I guess the prospect of not having to derive an individual algorithm for each new model is very appealing (while for poor sampling Ingmar, it would be very appealing to scale like VI does).
However, I found the paper hard to read and unclear in its conclusions. For the Rao-Blackwellization procedure it is unclear what is conditioned upon in the main paper. I guess it’s the variational parameters of all other components. Similarly, I’m not sure where the first equality in equation (13) stems from, which is crucial for the Control Variate ansatz. These things may just be me.
More problematic I think is their evaluation, especially comparing to MH-within-Gibbs. This is a particularly bad sampling scheme, and I don’t even suggest to use HMC instead. One could easily use a simple Adaptive Metropolis algorithm and still be black box. Also, Rao-Blackwellization can be applied in this pure adaptive MCMC setting as well. Another conservative and black-box possibility would be adaptive SMC by Fearnhead. My intuition is that both variants might actually be better than the suggested Black Box VI or at least the difference in the one performance plot of the paper will barely be noticeable.
However, I think this direction of research is a useful one and I hope they can improve upon the current state.