3. Sampling the Imaginary
Contents
3. Sampling the Imaginary¶
# Install packages that are not installed in colab
try:
import google.colab
!pip install watermark
except:
pass
%load_ext watermark
# Core
import numpy as np
import arviz as az
import tensorflow as tf
import tensorflow_probability as tfp
import scipy.stats as stats
# visualization
import seaborn as sns
import matplotlib.pyplot as plt
# aliases
tfd = tfp.distributions
2022-01-19 18:53:42.527069: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-01-19 18:53:42.527118: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
%watermark -p numpy,tensorflow,tensorflow_probability,arviz,scipy,pandas
numpy : 1.21.5
tensorflow : 2.7.0
tensorflow_probability: 0.15.0
arviz : 0.11.4
scipy : 1.7.3
pandas : 1.3.5
# config of various plotting libraries
%config InlineBackend.figure_format = 'retina'
We are interested in a blood test that correctly detects vampirisim 95% of time:
\(Pr(Positive|Vampire)\) = 0.95
The test has a false positive rate of:
\(Pr(Positive|Mortal)\) = 0.01
We also know that vampires are rare–about 0.1% of population:
\(Pr(Vampire)\) = 0.001
To compute \(Pr(Vampire|Positive)\) we will apply the Bayes’ rule:
\(Pr(Vampire|Positive) = \frac{Pr(Positive|Vampire) * Pr(Vampire)}{Pr(Positive)}\)
Code 3.1¶
Pr_Positive_Vampire = 0.95
Pr_Positive_Mortal = 0.01
Pr_Vampire = 0.001
tmp = Pr_Positive_Vampire * Pr_Vampire
Pr_Positive = tmp + Pr_Positive_Mortal * (1 - Pr_Vampire)
Pr_Vampire_Positive = tmp / Pr_Positive
Pr_Vampire_Positive
0.08683729433272395
3.1 Sampling from a grid-approximate posterior¶
Code 3.2¶
p_grid = tf.linspace(start=0.0001, stop=0.99999, num=1000)
prob_p = tf.repeat(1.0, 1000)
prob_data = tf.exp(tfd.Binomial(total_count=9, probs=p_grid).log_prob(6))
joint_prob = prob_data * prob_p
posterior = joint_prob / tf.reduce_sum(joint_prob)
WARNING:tensorflow:@custom_gradient grad_fn has 'variables' in signature, but no ResourceVariables were used on the forward pass.
2022-01-19 18:53:45.155769: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2022-01-19 18:53:45.155812: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-01-19 18:53:45.155836: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (fv-az272-145): /proc/driver/nvidia/version does not exist
2022-01-19 18:53:45.156164: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Code 3.3¶
samples = tfd.Categorical(probs=posterior).sample(10_000)
sample_rows = p_grid.numpy()[samples]
Code 3.4¶
plt.scatter(range(len(sample_rows)), sample_rows, alpha=0.2);
Code 3.5¶
az.plot_density({"": sample_rows}, hdi_prob=1);
3.2 Sampling to summarize¶
3.2.1 Intervals of defined boundaries¶
Code 3.6¶
tf.reduce_sum(posterior[p_grid < 0.5])
<tf.Tensor: shape=(), dtype=float32, numpy=0.17194836>
Code 3.7¶
sum(sample_rows < 0.5) / 10_000
0.1737
Code 3.8¶
sum((sample_rows > 0.5) & (sample_rows < 0.75)) / 10_000
0.6061
3.2.2 Intervals of defined mass¶
Code 3.9¶
np.percentile(sample_rows, 80)
0.7579745888710029
Code 3.10¶
np.percentile(sample_rows, [10, 90])
array([0.4474982 , 0.80982071])
Code 3.11¶
p_grid = tf.linspace(start=0.0001, stop=0.99999, num=1000)
prior = tf.repeat(1.0, 1000)
likelihood = tf.exp(tfd.Binomial(total_count=3, probs=p_grid).log_prob(3))
joint_prob = likelihood * prior
posterior = joint_prob / tf.reduce_sum(joint_prob)
samples = tfd.Categorical(probs=posterior).sample(10_000)
sample_rows = p_grid.numpy()[samples]
Code 3.12¶
np.percentile(sample_rows, q=(25, 75))
array([0.70873076, 0.92992765])
Code 3.13¶
az.hdi(sample_rows, hdi_prob=0.5)
array([0.8418492, 0.99999 ], dtype=float32)
3.2.3 Point Estimates¶
Code 3.14¶
p_grid[posterior == max(posterior)]
<tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.99999], dtype=float32)>
Code 3.15¶
stats.mode(sample_rows)[0]
array([0.9839757], dtype=float32)
Code 3.16¶
np.mean(sample_rows), np.median(sample_rows)
(0.80089337, 0.8418492)
Code 3.17¶
sum(posterior * abs(0.5 - p_grid))
<tf.Tensor: shape=(), dtype=float32, numpy=0.31286764>
Code 3.18¶
loss = list(map(lambda d: tf.reduce_sum(posterior * np.abs(d - p_grid)), p_grid))
Code 3.19¶
p_grid[tf.math.argmin(loss)]
<tf.Tensor: shape=(), dtype=float32, numpy=0.8408483>
3.3 Sampling to simulate prediction¶
3.3.1 Dummy data¶
Code 3.20¶
tf.exp(tfd.Binomial(total_count=2, probs=0.7).log_prob(np.arange(3)))
WARNING:tensorflow:@custom_gradient grad_fn has 'variables' in signature, but no ResourceVariables were used on the forward pass.
<tf.Tensor: shape=(3,), dtype=float32, numpy=array([0.09 , 0.42000002, 0.48999998], dtype=float32)>
Code 3.21¶
tfd.Binomial(total_count=2, probs=0.7).sample()
<tf.Tensor: shape=(), dtype=float32, numpy=1.0>
Code 3.22¶
tfd.Binomial(total_count=2, probs=0.7).sample((10,))
<tf.Tensor: shape=(10,), dtype=float32, numpy=array([2., 1., 1., 2., 1., 1., 2., 2., 2., 1.], dtype=float32)>
Code 3.23¶
dummy_w = tfd.Binomial(total_count=2, probs=0.7).sample((100000,))
np.unique(dummy_w.numpy(), return_counts=True)[1] / 1e5
array([0.08918, 0.41979, 0.49103])
Code 3.24¶
dummy_w = tfd.Binomial(total_count=9, probs=0.7).sample((100000,))
ax = sns.histplot(dummy_w.numpy())
ax.set_ylabel("Frequency")
ax.set_xlabel("dummy water count");