Open In Colab

3. Sampling the Imaginary

# Install packages that are not installed in colab
try:
  import google.colab
  !pip install watermark
except:
  pass  
%load_ext watermark
# Core
import numpy as np
import arviz as az
import tensorflow as tf
import tensorflow_probability as tfp
import scipy.stats as stats

# visualization
import seaborn as sns
import matplotlib.pyplot as plt

# aliases
tfd = tfp.distributions
2022-01-19 18:53:42.527069: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-01-19 18:53:42.527118: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
%watermark -p numpy,tensorflow,tensorflow_probability,arviz,scipy,pandas
numpy                 : 1.21.5
tensorflow            : 2.7.0
tensorflow_probability: 0.15.0
arviz                 : 0.11.4
scipy                 : 1.7.3
pandas                : 1.3.5
# config of various plotting libraries
%config InlineBackend.figure_format = 'retina'

We are interested in a blood test that correctly detects vampirisim 95% of time:

\(Pr(Positive|Vampire)\) = 0.95

The test has a false positive rate of:

\(Pr(Positive|Mortal)\) = 0.01

We also know that vampires are rare–about 0.1% of population:

\(Pr(Vampire)\) = 0.001

To compute \(Pr(Vampire|Positive)\) we will apply the Bayes’ rule:

\(Pr(Vampire|Positive) = \frac{Pr(Positive|Vampire) * Pr(Vampire)}{Pr(Positive)}\)

Code 3.1

Pr_Positive_Vampire = 0.95
Pr_Positive_Mortal = 0.01
Pr_Vampire = 0.001
tmp = Pr_Positive_Vampire * Pr_Vampire
Pr_Positive = tmp + Pr_Positive_Mortal * (1 - Pr_Vampire)
Pr_Vampire_Positive = tmp / Pr_Positive
Pr_Vampire_Positive
0.08683729433272395

3.1 Sampling from a grid-approximate posterior

Code 3.2

p_grid = tf.linspace(start=0.0001, stop=0.99999, num=1000)
prob_p = tf.repeat(1.0, 1000)
prob_data = tf.exp(tfd.Binomial(total_count=9, probs=p_grid).log_prob(6))
joint_prob = prob_data * prob_p
posterior = joint_prob / tf.reduce_sum(joint_prob)
WARNING:tensorflow:@custom_gradient grad_fn has 'variables' in signature, but no ResourceVariables were used on the forward pass.
2022-01-19 18:53:45.155769: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2022-01-19 18:53:45.155812: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-01-19 18:53:45.155836: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (fv-az272-145): /proc/driver/nvidia/version does not exist
2022-01-19 18:53:45.156164: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

Code 3.3

samples = tfd.Categorical(probs=posterior).sample(10_000)
sample_rows = p_grid.numpy()[samples]

Code 3.4

plt.scatter(range(len(sample_rows)), sample_rows, alpha=0.2);
../_images/03_sampling_the_imaginary_16_0.png

Code 3.5

az.plot_density({"": sample_rows}, hdi_prob=1);
../_images/03_sampling_the_imaginary_18_0.png

3.2 Sampling to summarize

3.2.1 Intervals of defined boundaries

Code 3.6

tf.reduce_sum(posterior[p_grid < 0.5])
<tf.Tensor: shape=(), dtype=float32, numpy=0.17194836>

Code 3.7

sum(sample_rows < 0.5) / 10_000
0.1737

Code 3.8

sum((sample_rows > 0.5) & (sample_rows < 0.75)) / 10_000
0.6061

3.2.2 Intervals of defined mass

Code 3.9

np.percentile(sample_rows, 80)
0.7579745888710029

Code 3.10

np.percentile(sample_rows, [10, 90])
array([0.4474982 , 0.80982071])

Code 3.11

p_grid = tf.linspace(start=0.0001, stop=0.99999, num=1000)
prior = tf.repeat(1.0, 1000)
likelihood = tf.exp(tfd.Binomial(total_count=3, probs=p_grid).log_prob(3))
joint_prob = likelihood * prior
posterior = joint_prob / tf.reduce_sum(joint_prob)

samples = tfd.Categorical(probs=posterior).sample(10_000)
sample_rows = p_grid.numpy()[samples]

Code 3.12

np.percentile(sample_rows, q=(25, 75))
array([0.70873076, 0.92992765])

Code 3.13

az.hdi(sample_rows, hdi_prob=0.5)
array([0.8418492, 0.99999  ], dtype=float32)

3.2.3 Point Estimates

Code 3.14

p_grid[posterior == max(posterior)]
<tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.99999], dtype=float32)>

Code 3.15

stats.mode(sample_rows)[0]
array([0.9839757], dtype=float32)

Code 3.16

np.mean(sample_rows), np.median(sample_rows)
(0.80089337, 0.8418492)

Code 3.17

sum(posterior * abs(0.5 - p_grid))
<tf.Tensor: shape=(), dtype=float32, numpy=0.31286764>

Code 3.18

loss = list(map(lambda d: tf.reduce_sum(posterior * np.abs(d - p_grid)), p_grid))

Code 3.19

p_grid[tf.math.argmin(loss)]
<tf.Tensor: shape=(), dtype=float32, numpy=0.8408483>

3.3 Sampling to simulate prediction

3.3.1 Dummy data

Code 3.20

tf.exp(tfd.Binomial(total_count=2, probs=0.7).log_prob(np.arange(3)))
WARNING:tensorflow:@custom_gradient grad_fn has 'variables' in signature, but no ResourceVariables were used on the forward pass.
<tf.Tensor: shape=(3,), dtype=float32, numpy=array([0.09      , 0.42000002, 0.48999998], dtype=float32)>

Code 3.21

tfd.Binomial(total_count=2, probs=0.7).sample()
<tf.Tensor: shape=(), dtype=float32, numpy=1.0>

Code 3.22

tfd.Binomial(total_count=2, probs=0.7).sample((10,))
<tf.Tensor: shape=(10,), dtype=float32, numpy=array([2., 1., 1., 2., 1., 1., 2., 2., 2., 1.], dtype=float32)>

Code 3.23

dummy_w = tfd.Binomial(total_count=2, probs=0.7).sample((100000,))

np.unique(dummy_w.numpy(), return_counts=True)[1] / 1e5
array([0.08918, 0.41979, 0.49103])

Code 3.24

dummy_w = tfd.Binomial(total_count=9, probs=0.7).sample((100000,))
ax = sns.histplot(dummy_w.numpy())
ax.set_ylabel("Frequency")
ax.set_xlabel("dummy water count");
../_images/03_sampling_the_imaginary_62_0.png

3.3.2 Model Checking

Code 3.25

w = tfd.Binomial(total_count=9, probs=0.6).sample((int(1e4),))

Code 3.26

w = tfd.Binomial(total_count=9, probs=sample_rows).sample()