Statistical Rethinking (2nd Edition) with Tensorflow Probability

This repository provides Jupyter notebooks that port the R code fragments found in the chapters of Statistical Rethinking 2nd Edition by Professor Richard McElreath to Python, using the TensorFlow Probability (TFP) framework.

Note - These notebooks are based on the 8th December 2019 draft. I will update the notebooks once the book is released.

Misc Notes

  • Why TensorFlow Probability? There are many great probabilistic programming frameworks (PPLs) out there. I especially like NumPyro and PyMC3. There are two main reasons why I chose to do this exercise in TFP.

    • The first and main reason is to avoid relying on the magic of higher-level libraries. Sometimes they hide details that are necessary to truly understand the subject. As a matter of fact, working with TFP has made me more appreciative of these high-level libraries: they not only provide great helpers but also make the code easy to read and reuse.

    • The second is that I have other investments in the TensorFlow ecosystem, so I am not keen on switching to PyTorch, even though I really like what the Pyro team has done.

    For production use, I strongly recommend using one of these higher-level libraries, i.e. NumPyro or PyMC3.

  • What worked? Well, of course this book is the best there is in this area. The community is also great. I got quick responses from the TensorFlow Probability team whenever I asked questions on the TFP Google group.

  • What was hard? This may be a tad subjective, because I am challenged when it comes to manipulating shapes (high-dimensional arrays). I find numpy difficult, and tensorflow is even harder when it comes to working with multi-dimensional arrays. This is one of the main problems I have faced and continue to face. Another problem is that the stack traces generated by TFP can be really difficult to understand. This is mostly a side effect of graph execution, which makes debugging difficult. Quite often things would work as long as I used only 1 chain, but working with multiple chains requires that you pay special attention to the shapes/batches of the various tensors/distributions.

  • Visualization. I used arviz, which meant converting the output of the various sampling procedures to the format/structure it requires. This led me to discover and learn xarray. It was really worth doing and made it easy to plot the graphs.
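
To make the shape and conversion points above a bit more concrete, here is a minimal sketch in plain NumPy (not TFP itself, and not code from the notebooks). All names and sizes are made up for illustration. It assumes the sampler output is laid out as (draw, chain), which is how `tfp.mcmc.sample_chain` stacks results when the initial state carries a leading chain dimension, and that arviz wants (chain, draw, ...).

```python
import numpy as np

rng = np.random.default_rng(0)
num_draws, num_chains, num_obs = 500, 4, 10  # illustrative sizes only

# Hypothetical observed predictor: one value per observation.
x = rng.normal(size=num_obs)

# With a single chain the slope is a scalar, so `slope * x` just works.
# With several chains there is one slope per chain: shape (num_chains,).
slope = rng.normal(size=num_chains)

# `slope * x` would raise here: shapes (4,) and (10,) do not broadcast.
# A trailing axis gives (4, 1) * (10,) -> (4, 10), one row per chain.
mu = slope[:, np.newaxis] * x
print(mu.shape)

# Stand-in for sampler output in (draw, chain) layout (assumed, see above).
samples_draw_chain = rng.normal(size=(num_draws, num_chains))

# arviz expects (chain, draw, ...), so swap the first two axes.
posterior = {"slope": np.swapaxes(samples_draw_chain, 0, 1)}
print(posterior["slope"].shape)
```

A dictionary of arrays in this (chain, draw, ...) layout is the kind of structure that can then be handed to arviz for plotting; the exact conversion call depends on the arviz version in use.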

Chapters

If you prefer a read-only view of the notebooks (HTML pages), use this link: https://ksachdeva.github.io/rethinking-tensorflow-probability/

If you want to run the notebooks locally -

# install the requirements
pip install -r requirements.txt
# install jupyter in your virtual environment
pip install -r requirements-extra.txt
# do the dev setup (as some common code resides in rethinking module)
pip install -e .

If you prefer to run the notebooks in Binder, click the Binder badge.

Clicking on the links will open the notebooks in Google Colab

Acknowledgements

My immense gratitude goes to Professor Richard McElreath for writing such a wonderful book. His method of teaching has made the somewhat difficult subject of Bayesian statistics approachable, interesting and, to some extent, fun as well. We need more educators like you, Sir!

Another person I want to thank is Du Phan (https://github.com/fehiepsi). He is the main author of NumPyro, a great framework for Bayesian analysis. He has ported Statistical Rethinking (2nd Ed) to NumPyro, and his notebooks were not only inspirational but also of great help to me in creating graphs. I borrowed most of his code fragments when it came to plotting the figures using matplotlib.