11. God Spiked the Integers¶

# Install packages that are not installed in colab
try:
  import google.colab
  %pip install -q watermark
  %pip install git+https://github.com/ksachdeva/rethinking-tensorflow-probability.git
except:
  pass

%load_ext watermark

# Core
import numpy as np
import arviz as az
import pandas as pd
import xarray as xr
import tensorflow as tf
import tensorflow_probability as tfp

# visualization
import matplotlib.pyplot as plt

from rethinking.data import RethinkingDataset
from rethinking.data import dataframe_to_tensors
from rethinking.mcmc import sample_posterior

# aliases
tfd = tfp.distributions
tfb = tfp.bijectors
Root = tfd.JointDistributionCoroutine.Root

2022-01-19 19:12:08.073507: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-01-19 19:12:08.073545: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

%watermark -p numpy,tensorflow,tensorflow_probability,arviz,scipy,pandas,rethinking

numpy                 : 1.21.5
tensorflow            : 2.7.0
tensorflow_probability: 0.15.0
arviz                 : 0.11.4
scipy                 : 1.7.3
pandas                : 1.3.5
rethinking            : 0.1.0

# config of various plotting libraries
%config InlineBackend.figure_format = 'retina'

11.1 Binomial regression¶

11.1.1 Logistic regression: Prosocial chimpanzees¶

Code 11.1¶

Authors start the chapter by highlighting one key difference between Generalized Linear Models and Gaussian Linear Models. He says, that unlike Gaussian Linear Models where we had some interpretation of parameters, in the case of GLM it is the combination of parameters that matter.

Most common & useful GLM models are about counts but they are also difficult to model. What is the reason for that ?

He says - “When we wish to predict the outcome in the form of counts, the scale of parameters is never the same as the scale of outcome. This is another way of saying that it is the combination of parameters that help in prediction and hence some where we lose human intution that could have connected the parmaeters with the outcome.

Binomial Regression -: model the outcome (count) when both of the categories are known.

Poisson Regression -: model the outcome (count) when the maximum count is unknown.

d = RethinkingDataset.Chimpanzees.get_dataset()

Code 11.2¶

We aim to build index variables instead of using dummy variables. Below is a quick way to do it. Treatment here is about the 4 possible situations.

prosoc_left= 0 and condition= 0: Two food items on right and no partner.
prosoc_left= 1 and condition= 0: Two food items on left and no partner.
prosoc_left= 0 and condition= 1: Two food items on right and partner present.
prosoc_left= 1 and condition= 1: Two food items on left and partner present.

# In the book there is an additional 1 in the code snippet.
# That is because of R as in R index does not starts with 0

d["treatment"] = d.prosoc_left + 2 * d.condition

Code 11.3¶

d.reset_index().groupby(["condition", "prosoc_left", "treatment"]).count()["index"]

condition  prosoc_left  treatment
0          0            0            126
           1            1            126
1          0            2            126
           1            3            126
Name: index, dtype: int64

d.describe()

	actor	recipient	condition	block	trial	prosoc_left	chose_prosoc	pulled_left	treatment
count	504.000000	252.00000	504.000000	504.000000	504.000000	504.000000	504.00000	504.000000	504.000000
mean	4.000000	5.00000	0.500000	3.500000	36.500000	0.500000	0.56746	0.579365	1.500000
std	2.001987	2.00398	0.500497	1.709522	20.803253	0.500497	0.49592	0.494151	1.119145
min	1.000000	2.00000	0.000000	1.000000	1.000000	0.000000	0.00000	0.000000	0.000000
25%	2.000000	3.00000	0.000000	2.000000	18.750000	0.000000	0.00000	0.000000	0.750000
50%	4.000000	5.00000	0.500000	3.500000	36.500000	0.500000	1.00000	1.000000	1.500000
75%	6.000000	7.00000	1.000000	5.000000	54.250000	1.000000	1.00000	1.000000	2.250000
max	7.000000	8.00000	1.000000	6.000000	72.000000	1.000000	1.00000	1.000000	3.000000

Code 11.4¶

def model_11_1(alpha_scale):
    def _generator():
        alpha = yield Root(
            tfd.Sample(tfd.Normal(loc=0.0, scale=alpha_scale), sample_shape=1)
        )
        logit = alpha
        pulled_left = yield tfd.Independent(
            tfd.Binomial(total_count=1.0, logits=logit), reinterpreted_batch_ndims=1
        )

    return tfd.JointDistributionCoroutine(_generator, validate_args=True)


jdc_11_1 = model_11_1(alpha_scale=10.0)

Code 11.5¶

prior = jdc_11_1.sample(1000)

2022-01-19 19:12:11.023338: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2022-01-19 19:12:11.023377: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-01-19 19:12:11.023402: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (fv-az272-145): /proc/driver/nvidia/version does not exist
2022-01-19 19:12:11.023622: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

Code 11.6¶

p = tf.math.sigmoid(prior[0])
az.plot_kde(p.numpy())
plt.show()

../_images/11_god_spiked_the_integers_21_0.png

Most of the probability mass in the above plot is piled up near zero or one. A flat prior in the logit space is not a flat prior in the outcome probability space.

jdc_11_1b = model_11_1(alpha_scale=1.5)
prior_with_1_5 = jdc_11_1b.sample(1000)

p = tf.math.sigmoid(prior_with_1_5[0])

az.plot_kde(p.numpy())
plt.show()

../_images/11_god_spiked_the_integers_23_0.png

A more concentrated Normal(0,1.5) prior produces something more reasonable

# simply showing both of the above plots
# at the same time to reproduce figure 11.3

p1 = tf.math.sigmoid(prior[0])
p2 = tf.math.sigmoid(prior_with_1_5[0])
az.plot_kde(p1.numpy())
az.plot_kde(p2.numpy())
plt.show()

../_images/11_god_spiked_the_integers_25_0.png

Code 11.7¶

tdf = dataframe_to_tensors("Chimpanzee", d, {"treatment": tf.int32})

def model_11_2(treatments):
    def _generator():
        alpha = yield Root(tfd.Sample(tfd.Normal(loc=0.0, scale=1.5), sample_shape=1))
        beta_treatment = yield Root(
            tfd.Sample(tfd.Normal(loc=0.0, scale=10.0), sample_shape=4)
        )

        logit = alpha + tf.squeeze(tf.gather(beta_treatment, treatments, axis=-1))

        pulled_left = yield tfd.Independent(
            tfd.Binomial(total_count=1.0, logits=logit), reinterpreted_batch_ndims=1
        )

    return tfd.JointDistributionCoroutine(_generator, validate_args=False)


jdc_11_2 = model_11_2(tdf.treatment)

prior_a, prior_b, _ = jdc_11_2.sample(1000)

prior_a = prior_a.numpy()
prior_b = prior_b.numpy()

def compute_p_for_given_treatment(t):
    p = prior_a[:, 0] + prior_b[:, t]
    return tf.math.sigmoid(p)


p = np.array(list(map(compute_p_for_given_treatment, np.arange(4)))).T

p.shape

(1000, 4)

Code 11.8¶

az.plot_kde(np.abs(p[:, 0] - p[:, 1]))
plt.show()

../_images/11_god_spiked_the_integers_33_0.png

Code 11.9¶

Sample model as 11_2 but with a different scale for beta_treatment

def model_11_3(treatments):
    def _generator():
        alpha = yield Root(tfd.Sample(tfd.Normal(loc=0.0, scale=1.5), sample_shape=1))
        beta_treatment = yield Root(
            tfd.Sample(tfd.Normal(loc=0.0, scale=0.5), sample_shape=4)
        )

        logit = alpha + tf.squeeze(tf.gather(beta_treatment, treatments, axis=-1))

        pulled_left = yield tfd.Independent(
            tfd.Binomial(total_count=1.0, logits=logit), reinterpreted_batch_ndims=1
        )

    return tfd.JointDistributionCoroutine(_generator, validate_args=False)


jdc_11_3 = model_11_3(tdf.treatment)


prior_a, prior_b, _ = jdc_11_3.sample(1000)

prior_a = prior_a.numpy()
prior_b = prior_b.numpy()

p = np.array(list(map(compute_p_for_given_treatment, np.arange(4)))).T

np.mean(np.abs(p[:, 0] - p[:, 1]))

0.096076064

Code 11.10¶

	actor	recipient	condition	block	trial	prosoc_left	chose_prosoc	pulled_left	treatment
0	1	NaN	0	1	2	0	1	0	0
1	1	NaN	0	1	4	0	0	1	0
2	1	NaN	0	1	6	1	0	0	1
3	1	NaN	0	1	8	0	1	0	0
4	1	NaN	0	1	10	1	1	1	1
...	...	...	...	...	...	...	...	...	...
499	7	4.0	1	6	64	1	1	1	3
500	7	6.0	1	6	66	1	1	1	3
501	7	3.0	1	6	68	0	0	1	2
502	7	7.0	1	6	70	0	0	1	2
503	7	2.0	1	6	72	0	0	1	2

504 rows × 9 columns

d["actor_id"] = d.actor.values - 1

tdf = dataframe_to_tensors(
    "Chimpanzee",
    d,
    {"pulled_left": tf.float32, "treatment": tf.int32, "actor_id": tf.int32},
)

Code 11.11¶

def model_11_4(actors, treatments):
    def _generator():
        alpha = yield Root(tfd.Sample(tfd.Normal(loc=0.0, scale=1.5), sample_shape=7))
        beta_treatment = yield Root(
            tfd.Sample(tfd.Normal(loc=0.0, scale=0.5), sample_shape=4)
        )

        term1 = tf.gather(alpha, actors, axis=-1)
        term2 = tf.gather(beta_treatment, treatments, axis=-1)

        logit = term1 + term2

        pulled_left = yield tfd.Independent(
            tfd.Binomial(total_count=1.0, logits=logit), reinterpreted_batch_ndims=1
        )

    return tfd.JointDistributionCoroutine(_generator, validate_args=False)


jdc_11_4 = model_11_4(tdf.actor_id, tdf.treatment)
jdc_11_4.sample()

StructTuple(
  var0=<tf.Tensor: shape=(7,), dtype=float32, numpy=
    array([ 0.72694427,  0.97887087, -0.77214324, -2.3240948 ,  0.29000178,
            0.5599559 ,  0.01908719], dtype=float32)>,
  var1=<tf.Tensor: shape=(4,), dtype=float32, numpy=array([ 0.16739044,  0.06596012, -0.5800672 , -0.34271017], dtype=float32)>,
  var2=<tf.Tensor: shape=(504,), dtype=float32, numpy=
    array([1., 0., 1., 1., 0., 0., 1., 1., 1., 0., 1., 0., 0., 1., 1., 1., 0.,
           0., 1., 1., 0., 1., 0., 1., 1., 1., 0., 0., 1., 1., 1., 1., 1., 1.,
           0., 1., 1., 1., 1., 1., 1., 1., 0., 1., 0., 1., 1., 0., 0., 1., 1.,
           1., 0., 1., 0., 1., 0., 0., 1., 0., 1., 1., 1., 1., 1., 1., 1., 1.,
           1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 0.,
           1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 1., 1., 0., 0., 1.,
           1., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1., 0., 1.,
           0., 1., 1., 1., 1., 0., 1., 1., 0., 0., 1., 1., 1., 1., 1., 0., 0.,
           0., 1., 0., 1., 1., 0., 0., 0., 1., 0., 0., 0., 0., 1., 1., 0., 0.,
           0., 1., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0., 1.,
           0., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0.,
           0., 1., 0., 0., 1., 1., 1., 0., 0., 0., 0., 1., 0., 1., 0., 1., 0.,
           0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1.,
           0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 1., 0.,
           0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.,
           0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0.,
           0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
           1., 1., 1., 1., 1., 1., 0., 1., 1., 0., 1., 1., 1., 1., 0., 0., 0.,
           0., 1., 1., 1., 1., 0., 0., 0., 0., 1., 0., 0., 1., 0., 1., 1., 1.,
           1., 1., 0., 0., 0., 0., 1., 1., 0., 1., 1., 1., 1., 0., 1., 0., 0.,
           1., 1., 1., 0., 1., 0., 1., 0., 1., 0., 1., 1., 0., 1., 1., 0., 1.,
           1., 0., 0., 0., 1., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0.,
           1., 1., 1., 1., 0., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1., 1.,
           1., 1., 0., 1., 1., 0., 0., 0., 1., 0., 0., 1., 0., 1., 0., 1., 0.,
           0., 1., 0., 1., 0., 0., 0., 0., 1., 0., 0., 1., 0., 1., 1., 0., 0.,
           1., 0., 1., 1., 0., 0., 0., 1., 1., 1., 1., 1., 0., 1., 1., 1., 1.,
           0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 0., 1.,
           1., 1., 0., 1., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.,
           1., 0., 0., 1., 1., 0., 0., 1., 0., 1., 0., 0., 0., 0., 1., 1., 1.,
           1., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0.], dtype=float32)>
)

NUMBER_OF_CHAINS = 2

init_state = [tf.zeros([NUMBER_OF_CHAINS, 7]), tf.zeros([NUMBER_OF_CHAINS, 4])]

bijectors = [tfb.Identity(), tfb.Identity()]

observed_data = (tdf.pulled_left,)

posterior_11_4, trace_11_4 = sample_posterior(
    jdc_11_4,
    observed_data=observed_data,
    init_state=init_state,
    bijectors=bijectors,
    params=["alpha", "beta_treatment"],
)


az.summary(trace_11_4, hdi_prob=0.89)

WARNING:tensorflow:@custom_gradient grad_fn has 'variables' in signature, but no ResourceVariables were used on the forward pass.

	mean	sd	hdi_5.5%	hdi_94.5%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
alpha[0]	-0.420	0.332	-0.899	0.145	0.015	0.011	481.0	468.0	1.01
alpha[1]	3.872	0.720	2.693	4.921	0.041	0.030	320.0	334.0	1.00
alpha[2]	-0.726	0.338	-1.228	-0.186	0.015	0.010	533.0	406.0	1.02
alpha[3]	-0.731	0.331	-1.253	-0.214	0.014	0.010	543.0	545.0	1.01
alpha[4]	-0.431	0.332	-0.886	0.208	0.014	0.010	537.0	523.0	1.00
alpha[5]	0.502	0.337	-0.065	1.002	0.015	0.011	485.0	524.0	1.00
alpha[6]	1.972	0.431	1.292	2.636	0.018	0.014	649.0	201.0	1.02
beta_treatment[0]	-0.060	0.275	-0.485	0.389	0.020	0.014	197.0	372.0	1.03
beta_treatment[1]	0.455	0.290	-0.062	0.843	0.018	0.013	270.0	362.0	1.02
beta_treatment[2]	-0.370	0.282	-0.886	0.007	0.016	0.012	297.0	417.0	1.01
beta_treatment[3]	0.353	0.293	-0.158	0.778	0.018	0.013	278.0	481.0	1.01

Clearly interpretation of all these varying parameters is difficult now. The alphas above represent intercepts unique to each chimpanzee. Each of these expresses the tendency of each individual to pull the left lever.

Code 11.12¶

post = trace_11_4.posterior
p_left = tf.math.sigmoid(post["alpha"].values)
az.plot_forest({"p_left": p_left}, combined=True, hdi_prob=0.89)
plt.gca().set(xlim=(-0.01, 1.01))
plt.show()

../_images/11_god_spiked_the_integers_45_0.png

What above plot shows is that 0,2,3,4 show preference for right lever. Two chimps (1 & 6) show strong preference for left lever and it is also reflected in the data

Code 11.13¶

Here we are now considering treatment effects and hoping that they are estimated more precisely

labs = ["R/N", "L/N", "R/P", "L/P"]
az.plot_forest(trace_11_4, combined=True, var_names="beta_treatment", hdi_prob=0.89)
plt.gca().set_yticklabels(labs[::-1])
plt.show()

../_images/11_god_spiked_the_integers_48_0.png

L/N => prosocial on left / no partner
R/P => proscoial on right / partner

What we are looking for is evidence that the chimpanzees choose the prosocial option more when a partner is present. This implies comparing the first row with the third row and the second row with the fourth row

Code 11.14¶

diffs = {
    "db13": post["beta_treatment"][..., 0] - post["beta_treatment"][..., 2],
    "db24": post["beta_treatment"][..., 1] - post["beta_treatment"][..., 3],
}
az.plot_forest(diffs, combined=True)
plt.show()

../_images/11_god_spiked_the_integers_51_0.png

db13 is the difference between no-partner/partner treatments when the prosocial option was on the right.

There is a weak evidence that individual pulled left more when the partner was absent

Code 11.15¶

Compute proportion in each combination of actor & treatment

pl = d.groupby(["actor", "treatment"])["pulled_left"].mean().unstack()
pl.iloc[0, :]

treatment
0    0.333333
1    0.500000
2    0.277778
3    0.555556
Name: 1, dtype: float64

pl

treatment	0	1	2	3
actor
1	0.333333	0.500000	0.277778	0.555556
2	1.000000	1.000000	1.000000	1.000000
3	0.277778	0.611111	0.166667	0.333333
4	0.333333	0.500000	0.111111	0.444444
5	0.333333	0.555556	0.277778	0.500000
6	0.777778	0.611111	0.555556	0.611111
7	0.777778	0.833333	0.944444	1.000000

the cells contain proportions of pulls that were of the left lever

Code 11.16¶

Plotting observed proportions

ax = plt.subplot(
    xlim=(0.5, 28.5),
    ylim=(0, 1.05),
    xlabel="",
    ylabel="proportion left lever",
    xticks=[],
)
plt.yticks(ticks=[0, 0.5, 1], labels=[0, 0.5, 1])
ax.axhline(0.5, c="k", lw=1, ls="--")
for j in range(1, 8):
    ax.axvline((j - 1) * 4 + 4.5, c="k", lw=0.5)
for j in range(1, 8):
    ax.annotate(
        "actor {}".format(j),
        ((j - 1) * 4 + 2.5, 1.1),
        ha="center",
        va="center",
        annotation_clip=False,
    )
for j in np.delete(range(1, 8), 1):
    ax.plot((j - 1) * 4 + np.array([1, 3]), pl.loc[j, [0, 2]], "b")
    ax.plot((j - 1) * 4 + np.array([2, 4]), pl.loc[j, [1, 3]], "b")
x = np.arange(1, 29).reshape(7, 4)
ax.scatter(
    x[:, [0, 1]].reshape(-1),
    pl.values[:, [0, 1]].reshape(-1),
    edgecolor="b",
    facecolor="w",
    zorder=3,
)
ax.scatter(
    x[:, [2, 3]].reshape(-1), pl.values[:, [2, 3]].reshape(-1), marker=".", c="b", s=80
)
yoff = 0.01
ax.annotate("R/N", (1, pl.loc[1, 0] - yoff), ha="center", va="top")
ax.annotate("L/N", (2, pl.loc[1, 1] + yoff), ha="center", va="bottom")
ax.annotate("R/P", (3, pl.loc[1, 2] - yoff), ha="center", va="top")
ax.annotate("L/P", (4, pl.loc[1, 3] + yoff), ha="center", va="bottom")
ax.set_title("observed proportions\n")
plt.tight_layout()

../_images/11_god_spiked_the_integers_58_0.png

Code 11.17¶

dat = {"actor": np.repeat(np.arange(7), 4), "treatment": np.tile(np.arange(4), 7)}


def compute_p_for_given_actor_treatment(params):
    a = params[0]
    t = params[1]
    p = post["alpha"].values[0][:, a] + post["beta_treatment"].values[0][:, t]
    return tf.math.sigmoid(p)


params = zip(dat["actor"], dat["treatment"])

p = np.array(list(map(compute_p_for_given_actor_treatment, params))).T

p_mu = np.mean(p, 0)
p_ci = np.percentile(p, q=(4.5, 95.5), axis=0)

Code 11.18¶

d["side"] = d.prosoc_left  # right 0, left 1
d["cond"] = d.condition  # no partner 0, partner 1

Code 11.19¶

tdf = dataframe_to_tensors(
    "Chimpanzee",
    d,
    {
        "pulled_left": tf.float32,
        "actor_id": tf.int32,
        "side": tf.int32,
        "cond": tf.int32,
    },
)


def model_11_5(actors, sides, conds):
    def _generator():
        alpha = yield Root(tfd.Sample(tfd.Normal(loc=0.0, scale=1.5), sample_shape=7))
        beta_side = yield Root(
            tfd.Sample(tfd.Normal(loc=0.0, scale=0.5), sample_shape=2)
        )
        beta_cond = yield Root(
            tfd.Sample(tfd.Normal(loc=0.0, scale=0.5), sample_shape=2)
        )

        term1 = tf.gather(alpha, actors, axis=-1)
        term2 = tf.gather(beta_side, sides, axis=-1)
        term3 = tf.gather(beta_cond, conds, axis=-1)

        logit = term1 + term2 + term3

        pulled_left = yield tfd.Independent(
            tfd.Binomial(total_count=1.0, logits=logit), reinterpreted_batch_ndims=1
        )

    return tfd.JointDistributionCoroutine(_generator, validate_args=False)


jdc_11_5 = model_11_5(tdf.actor_id, tdf.side, tdf.cond)
jdc_11_5.sample()

StructTuple(
  var0=<tf.Tensor: shape=(7,), dtype=float32, numpy=
    array([-1.5783112 ,  0.5358548 , -2.0334883 ,  0.93882376,  0.03772911,
           -1.7945005 ,  1.2174203 ], dtype=float32)>,
  var1=<tf.Tensor: shape=(2,), dtype=float32, numpy=array([ 0.24999732, -0.52624685], dtype=float32)>,
  var2=<tf.Tensor: shape=(2,), dtype=float32, numpy=array([0.7743274 , 0.52075315], dtype=float32)>,
  var3=<tf.Tensor: shape=(504,), dtype=float32, numpy=
    array([0., 1., 1., 0., 1., 0., 1., 0., 1., 1., 0., 0., 0., 0., 1., 0., 0.,
           1., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 1., 0., 0., 1., 0., 0.,
           1., 0., 0., 1., 0., 0., 1., 1., 1., 1., 0., 1., 0., 1., 0., 1., 0.,
           0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
           0., 0., 1., 0., 1., 1., 1., 1., 1., 1., 1., 1., 0., 1., 0., 1., 1.,
           0., 1., 0., 1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 0., 0., 0., 1.,
           1., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 1.,
           1., 0., 1., 0., 1., 1., 1., 0., 1., 1., 1., 1., 1., 0., 1., 1., 1.,
           0., 1., 1., 1., 1., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 1., 0.,
           0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 1., 0., 1., 0.,
           0., 1., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 1., 0.,
           0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.,
           0., 0., 1., 0., 0., 0., 1., 0., 1., 0., 0., 0., 1., 1., 0., 1., 1.,
           1., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 1., 1., 0., 1., 1.,
           1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
           1., 1., 1., 1., 0., 0., 1., 1., 1., 1., 1., 0., 1., 1., 1., 0., 1.,
           1., 1., 1., 1., 0., 1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 0., 0.,
           1., 1., 1., 0., 0., 1., 1., 1., 0., 1., 1., 0., 1., 0., 1., 0., 1.,
           1., 1., 0., 1., 1., 1., 0., 1., 0., 1., 1., 1., 0., 1., 0., 1., 1.,
           1., 1., 0., 1., 1., 1., 0., 1., 0., 0., 0., 1., 1., 1., 0., 0., 1.,
           1., 0., 1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1., 1., 1., 1.,
           1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.,
           0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 0., 0.,
           0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0.,
           0., 0., 0., 0., 1., 0., 0., 1., 1., 0., 0., 1., 0., 0., 0., 0., 0.,
           0., 0., 1., 0., 0., 0., 0., 1., 1., 1., 0., 0., 1., 1., 1., 1., 1.,
           1., 1., 1., 0., 1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 1., 1., 1.,
           1., 0., 1., 0., 1., 1., 0., 1., 1., 1., 1., 0., 1., 1., 1., 1., 1.,
           1., 1., 1., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 1.,
           0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], dtype=float32)>
)

init_state = [
    tf.zeros([NUMBER_OF_CHAINS, 7]),
    tf.zeros([NUMBER_OF_CHAINS, 2]),
    tf.zeros([NUMBER_OF_CHAINS, 2]),
]

bijectors = [tfb.Identity(), tfb.Identity(), tfb.Identity()]

observed_data = (tdf.pulled_left,)

posterior_11_5, trace_11_5 = sample_posterior(
    jdc_11_5,
    observed_data=observed_data,
    init_state=init_state,
    bijectors=bijectors,
    params=["alpha", "beta_side", "beta_cond"],
)


az.summary(trace_11_5, hdi_prob=0.89)

	mean	sd	hdi_5.5%	hdi_94.5%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
alpha[0]	-0.702	0.424	-1.441	-0.098	0.040	0.028	114.0	144.0	1.00
alpha[1]	3.730	0.696	2.645	4.776	0.060	0.043	136.0	138.0	1.01
alpha[2]	-0.992	0.425	-1.709	-0.334	0.041	0.029	110.0	97.0	1.00
alpha[3]	-1.000	0.422	-1.593	-0.267	0.040	0.029	110.0	147.0	1.00
alpha[4]	-0.702	0.407	-1.339	-0.085	0.039	0.028	110.0	152.0	1.01
alpha[5]	0.220	0.419	-0.370	0.939	0.039	0.028	117.0	236.0	1.00
alpha[6]	1.699	0.462	1.026	2.480	0.039	0.028	142.0	229.0	1.00
beta_side[0]	-0.172	0.299	-0.584	0.346	0.019	0.013	250.0	468.0	1.00
beta_side[1]	0.525	0.317	0.049	1.045	0.022	0.016	203.0	146.0	1.00
beta_cond[0]	0.310	0.328	-0.267	0.785	0.025	0.018	168.0	360.0	1.01
beta_cond[1]	0.078	0.325	-0.478	0.550	0.025	0.018	171.0	337.0	1.00

Code 11.20¶

We need to compute likelihoods in order to use arviz’s compare method

def compute_and_store_log_likelihood_for_model_11_4():

    sample_alpha = posterior_11_4["alpha"]
    sample_beta = posterior_11_4["beta_treatment"]

    ds, _ = jdc_11_4.sample_distributions(value=[sample_alpha, sample_beta, None])

    log_likelihood_11_4 = ds[-1].distribution.log_prob(tdf.pulled_left).numpy()

    # we need to insert this in the sampler_stats
    sample_stats_11_4 = trace_11_4.sample_stats

    coords = [
        sample_stats_11_4.coords["chain"],
        sample_stats_11_4.coords["draw"],
        np.arange(504),
    ]

    sample_stats_11_4["log_likelihood"] = xr.DataArray(
        log_likelihood_11_4,
        coords=coords,
        dims=["chain", "draw", "log_likelihood_dim_0"],
    )


compute_and_store_log_likelihood_for_model_11_4()

def compute_and_store_log_likelihood_for_model_11_5():

    sample_alpha = posterior_11_5["alpha"]
    sample_beta_side = posterior_11_5["beta_side"]
    sample_beta_cond = posterior_11_5["beta_cond"]

    ds, _ = jdc_11_5.sample_distributions(
        value=[sample_alpha, sample_beta_side, sample_beta_cond, None]
    )

    log_likelihood_11_5 = ds[-1].distribution.log_prob(tdf.pulled_left).numpy()

    # we need to insert this in the sampler_stats
    sample_stats_11_5 = trace_11_5.sample_stats

    coords = [
        sample_stats_11_5.coords["chain"],
        sample_stats_11_5.coords["draw"],
        np.arange(504),
    ]

    sample_stats_11_5["log_likelihood"] = xr.DataArray(
        log_likelihood_11_5,
        coords=coords,
        dims=["chain", "draw", "log_likelihood_dim_0"],
    )


compute_and_store_log_likelihood_for_model_11_5()

az.compare({"m11.5": trace_11_5, "m11.4": trace_11_4}, ic="loo")

	rank	loo	p_loo	d_loo	weight	se	dse	warning	loo_scale
m11.5	0	-265.421100	7.883432	0.000000	1.0	9.528981	0.000000	False	log
m11.4	1	-266.360266	8.594999	0.939166	0.0	9.422382	0.675961	False	log

11.1.2 Relative shark and absolute penguin¶

Code 11.23¶

post = trace_11_4.posterior

np.mean(np.exp(post["beta_treatment"][:, :, 3] - post["beta_treatment"][:, :, 1]))

<xarray.DataArray 'beta_treatment' ()>
array(0.94151527, dtype=float32)

Aggregated binomial: Chimpanzees again,condensed¶

Code 11.24¶

d = RethinkingDataset.Chimpanzees.get_dataset()
d["treatment"] = d.prosoc_left + 2 * d.condition
d["side"] = d.prosoc_left  # right 0, left 1
d["cond"] = d.condition  # no partner 0, partner 1
d_aggregated = (
    d.groupby(["treatment", "actor", "side", "cond"])["pulled_left"].sum().reset_index()
)
d_aggregated.rename(columns={"pulled_left": "left_pulls"}, inplace=True)

d_aggregated["actor_id"] = d_aggregated["actor"].values - 1

d_aggregated

	treatment	actor	side	cond	left_pulls	actor_id
0	0	1	0	0	6	0
1	0	2	0	0	18	1
2	0	3	0	0	5	2
3	0	4	0	0	6	3
4	0	5	0	0	6	4
5	0	6	0	0	14	5
6	0	7	0	0	14	6
7	1	1	1	0	9	0
8	1	2	1	0	18	1
9	1	3	1	0	11	2
10	1	4	1	0	9	3
11	1	5	1	0	10	4
12	1	6	1	0	11	5
13	1	7	1	0	15	6
14	2	1	0	1	5	0
15	2	2	0	1	18	1
16	2	3	0	1	3	2
17	2	4	0	1	2	3
18	2	5	0	1	5	4
19	2	6	0	1	10	5
20	2	7	0	1	17	6
21	3	1	1	1	10	0
22	3	2	1	1	18	1
23	3	3	1	1	6	2
24	3	4	1	1	8	3
25	3	5	1	1	9	4
26	3	6	1	1	11	5
27	3	7	1	1	18	6

Code 11.25¶

tdf = dataframe_to_tensors(
    "Chimpanzee",
    d_aggregated,
    {"actor_id": tf.int32, "treatment": tf.int32, "left_pulls": tf.float32},
)


def model_11_6(actors, treatments):
    def _generator():
        alpha = yield Root(tfd.Sample(tfd.Normal(loc=0.0, scale=1.5), sample_shape=7))
        beta_treatment = yield Root(
            tfd.Sample(tfd.Normal(loc=0.0, scale=0.5), sample_shape=4)
        )

        term1 = tf.gather(alpha, actors, axis=-1)
        term2 = tf.gather(beta_treatment, treatments, axis=-1)

        logit = term1 + term2

        pulled_left = yield tfd.Independent(
            tfd.Binomial(total_count=18.0, logits=logit), reinterpreted_batch_ndims=1
        )

    return tfd.JointDistributionCoroutine(_generator, validate_args=False)


jdc_11_6 = model_11_6(tdf.actor_id, tdf.treatment)
jdc_11_6.sample()

StructTuple(
  var0=<tf.Tensor: shape=(7,), dtype=float32, numpy=
    array([-0.7566206 ,  0.36602026,  1.1799304 , -2.2945683 ,  1.754464  ,
            1.5971856 , -0.06230567], dtype=float32)>,
  var1=<tf.Tensor: shape=(4,), dtype=float32, numpy=array([ 0.4967591 , -0.46505028, -0.63481325, -0.15960528], dtype=float32)>,
  var2=<tf.Tensor: shape=(28,), dtype=float32, numpy=
    array([ 7., 13., 16.,  3., 17., 13.,  9.,  3., 10.,  8.,  1., 13., 14.,
            5.,  6.,  7., 12.,  0., 16., 13.,  7.,  7., 10., 14.,  0., 17.,
           14., 11.], dtype=float32)>
)

init_state = [tf.zeros([NUMBER_OF_CHAINS, 7]), tf.zeros([NUMBER_OF_CHAINS, 4])]

bijectors = [tfb.Identity(), tfb.Identity()]

observed_data = (tdf.left_pulls,)

posterior_11_6, trace_11_6 = sample_posterior(
    jdc_11_6,
    observed_data=observed_data,
    init_state=init_state,
    bijectors=bijectors,
    params=["alpha", "beta_treatment"],
)


az.summary(trace_11_6, hdi_prob=0.89)

WARNING:tensorflow:@custom_gradient grad_fn has 'variables' in signature, but no ResourceVariables were used on the forward pass.

	mean	sd	hdi_5.5%	hdi_94.5%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
alpha[0]	-0.456	0.337	-0.992	0.051	0.017	0.012	396.0	419.0	1.00
alpha[1]	3.828	0.739	2.696	4.916	0.041	0.029	327.0	383.0	1.00
alpha[2]	-0.741	0.334	-1.276	-0.233	0.016	0.011	457.0	312.0	1.00
alpha[3]	-0.759	0.358	-1.316	-0.179	0.016	0.012	488.0	343.0	1.00
alpha[4]	-0.459	0.325	-0.985	0.035	0.016	0.012	397.0	399.0	1.00
alpha[5]	0.486	0.346	-0.067	1.009	0.016	0.012	479.0	386.0	1.00
alpha[6]	1.936	0.356	1.318	2.441	0.017	0.012	444.0	334.0	1.00
beta_treatment[0]	-0.048	0.287	-0.513	0.400	0.016	0.011	334.0	326.0	1.00
beta_treatment[1]	0.476	0.282	0.033	0.929	0.017	0.012	261.0	352.0	1.00
beta_treatment[2]	-0.358	0.295	-0.811	0.115	0.017	0.012	287.0	274.0	1.01
beta_treatment[3]	0.372	0.292	-0.103	0.825	0.018	0.013	268.0	326.0	1.00

Code 11.26 (Find a way to get the comparison even if different number of obs)¶

# need to compute likelihood in order to do comparison


def compute_and_store_log_likelihood_for_model_11_6():

    sample_alpha = posterior_11_6["alpha"]
    sample_beta = posterior_11_6["beta_treatment"]

    ds, _ = jdc_11_6.sample_distributions(value=[sample_alpha, sample_beta, None])

    log_likelihood_11_6 = ds[-1].distribution.log_prob(tdf.left_pulls).numpy()

    # we need to insert this in the sampler_stats
    sample_stats_11_6 = trace_11_6.sample_stats

    coords = [
        sample_stats_11_6.coords["chain"],
        sample_stats_11_6.coords["draw"],
        np.arange(28),
    ]

    sample_stats_11_6["log_likelihood"] = xr.DataArray(
        log_likelihood_11_6,
        coords=coords,
        dims=["chain", "draw", "log_likelihood_dim_0"],
    )


compute_and_store_log_likelihood_for_model_11_6()

# in order to use comparison the number of observations should be same
# which is not the case with m11.4 (504) and m11.6 (28)
# and hence not able to compare using the arviz method

import warnings

try:
    az.compare({"m11.6": trace_11_6, "m11.4": trace_11_4}, ic="loo")
except Exception as e:
    warnings.warn("\n{}: {}".format(type(e).__name__, e))

/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/arviz/stats/stats.py:695: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.7 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
  "Estimated shape parameter of Pareto distribution is greater than 0.7 for "

/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/ipykernel_launcher.py:10: UserWarning: 
ValueError: The number of observations should be the same across all models
  # Remove the CWD from sys.path while we load stuff.

Code 11.27¶

# deviance of aggregated 6-in-9
print(-2 * tfd.Binomial(total_count=9.0, probs=0.2).log_prob(6))
# deviance of dis-aggregated
print(
    -2
    * np.sum(tfd.Bernoulli(probs=0.2).log_prob(np.array([1, 1, 1, 1, 1, 1, 0, 0, 0])))
)

WARNING:tensorflow:@custom_gradient grad_fn has 'variables' in signature, but no ResourceVariables were used on the forward pass.

tf.Tensor(11.7904825, shape=(), dtype=float32)
20.652116775512695

11.1.4 Aggregated binomial: Graduate school admissions¶

Code 11.28¶

Evaluate if there is a gender bias in admissions

d = RethinkingDataset.UCBadmit.get_dataset()
d["gid"] = (d["applicant.gender"] != "male").astype(int)
d

	dept	applicant.gender	admit	reject	applications	gid
1	A	male	512	313	825	0
2	A	female	89	19	108	1
3	B	male	353	207	560	0
4	B	female	17	8	25	1
5	C	male	120	205	325	0
6	C	female	202	391	593	1
7	D	male	138	279	417	0
8	D	female	131	244	375	1
9	E	male	53	138	191	0
10	E	female	94	299	393	1
11	F	male	22	351	373	0
12	F	female	24	317	341	1

There are 12 rows in the above dataset however they collectively represent 4526 applications. Counting the rows in the data table is not a sensible way to assess sample size

tdf = dataframe_to_tensors(
    "UCBAdmit", d, {"gid": tf.int32, "applications": tf.float32, "admit": tf.float32}
)

Code 11.29¶

We will model the admission as the outcome that depends on the gender

def model_11_7(gid, N):
    def _generator():
        alpha = yield Root(tfd.Sample(tfd.Normal(loc=0.0, scale=1.5), sample_shape=2))
        p = tf.sigmoid(tf.squeeze(tf.gather(alpha, gid, axis=-1)))

        A = yield tfd.Independent(
            tfd.Binomial(total_count=N, probs=p), reinterpreted_batch_ndims=1
        )

    return tfd.JointDistributionCoroutine(_generator, validate_args=True)


jdc_11_7 = model_11_7(tdf.gid, tdf.applications)

init_state = [
    tf.zeros([NUMBER_OF_CHAINS, 2]),
]

bijectors = [tfb.Identity()]

posterior_11_7, trace_11_7 = sample_posterior(
    jdc_11_7,
    observed_data=(tdf.admit,),
    params=["alpha"],
    init_state=init_state,
    bijectors=bijectors,
)

az.summary(trace_11_7, round_to=2, kind="all", hdi_prob=0.89)

WARNING:tensorflow:@custom_gradient grad_fn has 'variables' in signature, but no ResourceVariables were used on the forward pass.

	mean	sd	hdi_5.5%	hdi_94.5%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
alpha[0]	-0.22	0.04	-0.28	-0.16	0.00	0.00	551.81	442.28	1.00
alpha[1]	-0.81	0.05	-0.88	-0.72	0.01	0.01	28.42	34.73	1.03

Code 11.30¶

Posterior of males is higher than females, here we compute the contrast

diff_a = trace_11_7.posterior["alpha"][:, :, 0] - trace_11_7.posterior["alpha"][:, :, 1]
trace_11_7.posterior["diff_a"] = diff_a

# to add diff_p have to do lot of massaging to make xarray happy !
diff_p = tf.sigmoid(trace_11_7.posterior["alpha"][:, :, 0].values) - tf.sigmoid(
    trace_11_7.posterior["alpha"][:, :, 1].values
)

coords = [trace_11_7.posterior.coords["chain"], trace_11_7.posterior.coords["draw"]]
trace_11_7.posterior["diff_p"] = xr.DataArray(
    diff_p, coords=coords, dims=["chain", "draw"]
)

az.summary(trace_11_7, hdi_prob=0.89)

	mean	sd	hdi_5.5%	hdi_94.5%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
alpha[0]	-0.222	0.038	-0.283	-0.162	0.002	0.001	552.0	442.0	1.00
alpha[1]	-0.814	0.051	-0.884	-0.721	0.010	0.007	28.0	35.0	1.03
diff_a	0.592	0.064	0.496	0.693	0.010	0.007	47.0	78.0	1.01
diff_p	0.138	0.015	0.116	0.160	0.002	0.001	54.0	78.0	1.00

Log odds difference is certainly positive, corresponding to a higher prob of admissions for male applicants.

Code 11.31¶

# compute the posterior predictive given the posterior parameters

N = tf.cast(d.applications.values, dtype=tf.float32)
gid = d.gid.values

# only picking the first chain
sample_alpha = posterior_11_7["alpha"][
    0:,
]

sample_pbar = tf.sigmoid(tf.squeeze(tf.gather(sample_alpha, gid, axis=-1)))

dist = tfd.Binomial(total_count=N, probs=sample_pbar)

# taking the samples from first chain
predictive_samples = dist.sample().numpy()[0]

WARNING:tensorflow:5 out of the last 11 calls to <function _random_binomial at 0x7feb55cbd170> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.

admit_rate = predictive_samples / N

plt.errorbar(
    range(1, 13),
    np.mean(admit_rate, 0),
    np.std(admit_rate, 0) / 2,
    fmt="o",
    c="k",
    mfc="none",
    ms=7,
    elinewidth=1,
)
plt.plot(range(1, 13), np.percentile(admit_rate, 5.5, 0), "k+")
plt.plot(range(1, 13), np.percentile(admit_rate, 94.5, 0), "k+")
# draw lines connecting points from same dept
for i in range(1, 7):
    x = 1 + 2 * (i - 1)
    y1 = d.admit.iloc[x - 1] / d.applications.iloc[x - 1]
    y2 = d.admit.iloc[x] / d.applications.iloc[x]
    plt.plot((x, x + 1), (y1, y2), "bo-")
    plt.annotate(
        d.dept.iloc[x], (x + 0.5, (y1 + y2) / 2 + 0.05), ha="center", color="royalblue"
    )
plt.gca().set(ylim=(0, 1), xticks=range(1, 13), ylabel="admit", xlabel="case")
plt.tight_layout()

../_images/11_god_spiked_the_integers_102_0.png

What above plot shows is that only for departments C & E, there was a lower rate of admissions for females. Now this is not in sync with what we observed earlier where the model was telling that females should expect an overall 14% lower chance of admission.

What is going on ?

Problem here is that male and females do not apply to same departments & departments also vary in their rate of admissions. Females do not apply to A & B that has higher rate of acceptance. Instead they applied to departments that have lower rate (10% of applicants) of acceptance.

Code 11.32¶

Changing the question to be asked -

Instead of asking -

What are the average probabilities of admission for females and males across all departments?

we are now going to ask -

What is the average difference in probability of admis- sion between females and males within departments?

# construct 6 unique indexes such that there are 2 entries per index
dept_id = np.repeat(np.arange(6), 2)


def model_11_8(gid, dept_id, N):
    def _generator():
        alpha = yield Root(tfd.Sample(tfd.Normal(loc=0.0, scale=1.5), sample_shape=2))
        delta = yield Root(tfd.Sample(tfd.Normal(loc=0.0, scale=1.5), sample_shape=6))

        p = tf.sigmoid(
            tf.squeeze(tf.gather(alpha, gid, axis=-1))
            + tf.squeeze(tf.gather(delta, dept_id, axis=-1))
        )

        A = yield tfd.Independent(
            tfd.Binomial(total_count=N, probs=p), reinterpreted_batch_ndims=1
        )

    return tfd.JointDistributionCoroutine(_generator, validate_args=True)


jdc_11_8 = model_11_8(tdf.gid, dept_id, tdf.applications)

init_state = [
    tf.zeros([NUMBER_OF_CHAINS, 2]),
    tf.zeros([NUMBER_OF_CHAINS, 6]),
]

bijectors = [tfb.Identity(), tfb.Identity()]

posterior_11_8, trace_11_8 = sample_posterior(
    jdc_11_8,
    observed_data=(tdf.admit,),
    params=["alpha", "delta"],
    init_state=init_state,
    bijectors=bijectors,
    num_samples=2000,
)

az.summary(trace_11_8, hdi_prob=0.89)

WARNING:tensorflow:5 out of the last 5 calls to <function run_hmc_chain at 0x7feb5545e7a0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.

	mean	sd	hdi_5.5%	hdi_94.5%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
alpha[0]	-0.721	0.580	-1.688	0.271	0.110	0.080	29.0	60.0	1.06
alpha[1]	-0.625	0.583	-1.600	0.391	0.111	0.080	29.0	58.0	1.06
delta[0]	1.304	0.583	0.427	2.398	0.110	0.080	29.0	59.0	1.06
delta[1]	1.257	0.586	0.348	2.313	0.110	0.080	30.0	66.0	1.06
delta[2]	0.042	0.582	-0.824	1.123	0.110	0.078	29.0	59.0	1.06
delta[3]	0.010	0.582	-0.840	1.118	0.110	0.079	29.0	65.0	1.06
delta[4]	-0.434	0.583	-1.302	0.670	0.110	0.078	29.0	62.0	1.06
delta[5]	-1.988	0.595	-2.931	-0.946	0.109	0.078	31.0	67.0	1.06

Code 11.33¶

Computing the contrast

diff_a = trace_11_8.posterior["alpha"][:, :, 0] - trace_11_8.posterior["alpha"][:, :, 1]
diff_p = tf.sigmoid(trace_11_8.posterior["alpha"][:, :, 0].values) - tf.sigmoid(
    trace_11_8.posterior["alpha"][:, :, 1].values
)

trace_11_8.posterior["diff_a"] = diff_a

coords = [trace_11_8.posterior.coords["chain"], trace_11_8.posterior.coords["draw"]]
trace_11_8.posterior["diff_p"] = xr.DataArray(
    diff_p, coords=coords, dims=["chain", "draw"]
)

az.summary(trace_11_8, hdi_prob=0.89)

	mean	sd	hdi_5.5%	hdi_94.5%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
alpha[0]	-0.721	0.580	-1.688	0.271	0.110	0.080	29.0	60.0	1.06
alpha[1]	-0.625	0.583	-1.600	0.391	0.111	0.080	29.0	58.0	1.06
delta[0]	1.304	0.583	0.427	2.398	0.110	0.080	29.0	59.0	1.06
delta[1]	1.257	0.586	0.348	2.313	0.110	0.080	30.0	66.0	1.06
delta[2]	0.042	0.582	-0.824	1.123	0.110	0.078	29.0	59.0	1.06
delta[3]	0.010	0.582	-0.840	1.118	0.110	0.079	29.0	65.0	1.06
delta[4]	-0.434	0.583	-1.302	0.670	0.110	0.078	29.0	62.0	1.06
delta[5]	-1.988	0.595	-2.931	-0.946	0.109	0.078	31.0	67.0	1.06
diff_a	-0.097	0.085	-0.228	0.044	0.002	0.002	1174.0	969.0	1.00
diff_p	-0.021	0.019	-0.048	0.012	0.001	0.001	623.0	820.0	1.01

Code 11.34¶

pg = np.stack(
    list(
        map(
            lambda k: np.divide(
                d.applications[dept_id == k].values, d.applications[dept_id == k].sum()
            ),
            range(6),
        )
    ),
    axis=0,
).T
pg = pd.DataFrame(pg, index=["male", "female"], columns=d.dept.unique())
pg.round(2)

	A	B	C	D	E	F
male	0.88	0.96	0.35	0.53	0.33	0.52
female	0.12	0.04	0.65	0.47	0.67	0.48

Department A receives 88% of its applications from males. Department E receives 33% from males.

11.2 Poisson regression¶

Code 11.35¶

Binomial is good when the count is known but quite often it is not the case.

When the count goes extremely high then mean and variance of Binomial starts to become same and this is known as Poisson Distribution

y = tfd.Binomial(1000, probs=1 / 1000).sample((int(1e5),))
tf.reduce_mean(y).numpy(), np.var(y)

WARNING:tensorflow:6 out of the last 12 calls to <function _random_binomial at 0x7feb55cbd170> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.

(0.99741, 0.99274325)

Useful for binomial events for which the number of trials N is unknown or uncountably large.

Code 11.36¶

Counts of unique tool types for 10 historical Oceanic societies

d = RethinkingDataset.Kline.get_dataset()
d

	culture	population	contact	total_tools	mean_TU
0	Malekula	1100	low	13	3.2
1	Tikopia	1500	low	22	4.7
2	Santa Cruz	3600	low	24	4.0
3	Yap	4791	high	43	5.0
4	Lau Fiji	7400	high	33	5.0
5	Trobriand	8000	high	19	4.0
6	Chuuk	9200	high	40	3.8
7	Manus	13000	low	28	6.6
8	Tonga	17500	high	55	5.4
9	Hawaii	275000	low	71	6.6

To remember - Number of rows in count based models is not same as the sample size

Code 11.37¶

Here the outcome variable is total tools

We want to establish a relationship between log population and our outcome variable.

We also want to see the interaction between contact and log population

So first and foremost let’s create a column for log population

d["P"] = d.population.pipe(np.log).pipe(lambda x: (x - x.mean()) / x.std())
d["cid"] = (d.contact == "high").astype(int)

Code 11.38¶

x = np.linspace(0, 100, 200)
plt.plot(x, tf.exp(tfd.LogNormal(loc=0.0, scale=10.0).log_prob(x)))
plt.show()

../_images/11_god_spiked_the_integers_122_0.png

Code 11.39¶

a = tfd.Normal(loc=0.0, scale=10.0).sample(int(1e4))
lambda_ = tf.exp(a)
tf.reduce_mean(lambda_)

<tf.Tensor: shape=(), dtype=float32, numpy=2154870300000.0>

Code 11.40¶

x = np.linspace(0, 100, 200)
plt.plot(x, tf.exp(tfd.LogNormal(loc=3.0, scale=0.5).log_prob(x)))
plt.show()

../_images/11_god_spiked_the_integers_126_0.png

Code 11.41¶

N = 100
a = tfd.Normal(loc=3.0, scale=0.5).sample(N)
b = tfd.Normal(loc=0.0, scale=10.0).sample(N)
plt.subplot(xlim=(-2, 2), ylim=(0, 100))
x = np.linspace(-2, 2, 100)
for i in range(N):
    plt.plot(x, tf.exp(a[i] + b[i] * x), c="k", alpha=0.5)

../_images/11_god_spiked_the_integers_128_0.png

Code 11.42¶

N = 100
a = tfd.Normal(loc=3.0, scale=0.5).sample(N).numpy()
b = tfd.Normal(loc=0.0, scale=0.2).sample(N).numpy()
plt.subplot(xlim=(-2, 2), ylim=(0, 100))
x = np.linspace(-2, 2, 100)
for i in range(N):
    plt.plot(x, tf.exp(a[i] + b[i] * x), c="k", alpha=0.5)

../_images/11_god_spiked_the_integers_130_0.png

Code 11.43¶

x_seq = np.linspace(np.log(100), np.log(200000), num=100)
lambda_ = np.array(list(map(lambda x: np.exp(a + b * x), x_seq)))

plt.subplot(
    xlim=(np.min(x_seq).item(), np.max(x_seq).item()),
    ylim=(0, 500),
    xlabel="log population",
    ylabel="total tools",
)

plt.plot(x_seq, lambda_, c="k", alpha=0.5)
plt.show()

../_images/11_god_spiked_the_integers_132_0.png

Code 11.44¶

plt.subplot(
    xlim=(np.min(np.exp(x_seq)).item(), np.max(np.exp(x_seq)).item()),
    ylim=(0, 500),
    xlabel="population",
    ylabel="total tools",
)
plt.plot(np.exp(x_seq), lambda_, c="k", alpha=0.5)
plt.show()

../_images/11_god_spiked_the_integers_134_0.png

Code 11.45¶

# dat = dict(T=d.total_tools.values, P=d.P.values, cid=d.cid.values)

# d.total_tools.values

tdf = dataframe_to_tensors(
    "Kline", d, {"total_tools": tf.float32, "P": tf.float32, "cid": tf.int32}
)

# intercept only
def model_11_9():
    def _generator():
        alpha = yield Root(tfd.Sample(tfd.Normal(loc=3.0, scale=0.5), sample_shape=1))
        lambda_ = tf.exp(alpha[..., tf.newaxis])

        T = yield tfd.Independent(
            tfd.Poisson(rate=lambda_), reinterpreted_batch_ndims=1
        )

    return tfd.JointDistributionCoroutine(_generator, validate_args=False)


jdc_11_9 = model_11_9()

init_state = [3.0 * tf.zeros([NUMBER_OF_CHAINS])]

bijectors = [tfb.Identity()]

posterior_11_9, trace_11_9 = sample_posterior(
    jdc_11_9,
    observed_data=(tdf.total_tools,),
    init_state=init_state,
    bijectors=bijectors,
    params=["alpha"],
)

WARNING:tensorflow:From /opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/tensorflow_probability/python/distributions/distribution.py:342: calling Poisson.__init__ (from tensorflow_probability.python.distributions.poisson) with interpolate_nondiscrete is deprecated and will be removed after 2021-02-01.
Instructions for updating:
The `interpolate_nondiscrete` flag is deprecated; instead use `force_probs_to_zero_outside_support` (with the opposite sense).

WARNING:tensorflow:6 out of the last 6 calls to <function run_hmc_chain at 0x7feb5545e7a0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.

# interaction model
def model_11_10(cid, P):
    def _generator():
        alpha = yield Root(tfd.Sample(tfd.Normal(loc=3.0, scale=0.5), sample_shape=2))
        beta = yield Root(tfd.Sample(tfd.Normal(loc=0.0, scale=0.2), sample_shape=2))

        lambda_ = tf.exp(
            tf.squeeze(tf.gather(alpha, cid, axis=-1))
            + tf.squeeze(tf.gather(beta, cid, axis=-1)) * P
        )

        T = yield tfd.Independent(
            tfd.Poisson(rate=lambda_), reinterpreted_batch_ndims=1
        )

    return tfd.JointDistributionCoroutine(_generator, validate_args=True)


jdc_11_10 = model_11_10(tdf.cid, tdf.P)

init_state = [tf.zeros([NUMBER_OF_CHAINS, 2]), tf.zeros([NUMBER_OF_CHAINS, 2])]

bijectors = [tfb.Identity(), tfb.Identity()]

posterior_11_10, trace_11_10 = sample_posterior(
    jdc_11_10,
    observed_data=(tdf.total_tools,),
    init_state=init_state,
    bijectors=bijectors,
    params=["alpha", "beta"],
)

Code 11.46¶

# We need to compute the log likelihood for model_11_9
# and then store it in the trace_11_9
#
# Reason to store it as it is expected by az.compare


def compute_and_store_log_likelihood_for_model_11_9():

    sample_alpha = posterior_11_9["alpha"]

    ds, _ = jdc_11_9.sample_distributions(value=[sample_alpha, None])

    log_likelihood_11_9 = ds[-1].distribution.log_prob(tdf.total_tools).numpy()

    # we need to insert this in the sampler_stats
    sample_stats_11_9 = trace_11_9.sample_stats

    coords = [
        sample_stats_11_9.coords["chain"],
        sample_stats_11_9.coords["draw"],
        np.arange(10),
    ]

    sample_stats_11_9["log_likelihood"] = xr.DataArray(
        log_likelihood_11_9,
        coords=coords,
        dims=["chain", "draw", "log_likelihood_dim_0"],
    )


compute_and_store_log_likelihood_for_model_11_9()

# We need to compute the log likelihood for model_11_10
# and then store it in the trace_11_10
#
# Reason to store it as it is expected by az.compare


def compute_and_store_log_likelihood_for_model_11_10():

    sample_alpha = posterior_11_10["alpha"]
    sample_beta = posterior_11_10["beta"]

    ds, _ = jdc_11_10.sample_distributions(value=[sample_alpha, sample_beta, None])

    log_likelihood_11_10 = ds[-1].distribution.log_prob(tdf.total_tools).numpy()

    # we need to insert this in the sampler_stats
    sample_stats_11_10 = trace_11_10.sample_stats

    coords = [
        sample_stats_11_10.coords["chain"],
        sample_stats_11_10.coords["draw"],
        np.arange(10),
    ]

    sample_stats_11_10["log_likelihood"] = xr.DataArray(
        log_likelihood_11_10,
        coords=coords,
        dims=["chain", "draw", "log_likelihood_dim_0"],
    )


compute_and_store_log_likelihood_for_model_11_10()

az.compare({"m11.9": trace_11_9, "m11.10": trace_11_10}, ic="loo")

/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/arviz/stats/stats.py:695: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.7 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
  "Estimated shape parameter of Pareto distribution is greater than 0.7 for "
/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/arviz/stats/stats.py:695: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.7 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
  "Estimated shape parameter of Pareto distribution is greater than 0.7 for "

	rank	loo	p_loo	d_loo	weight	se	dse	warning	loo_scale
m11.10	0	-42.49497	6.803545	0.0000	0.959477	6.249423	0.000000	True	log
m11.9	1	-70.61657	7.924030	28.1216	0.040523	16.114811	15.528969	True	log

Code 11.47¶

k = az.loo(trace_11_10, pointwise=True).pareto_k.values

cex = 1 + (k - np.min(k)) / (np.max(k) - np.min(k))
plt.scatter(
    d.P,
    d.total_tools,
    s=40 * cex,
    edgecolors=["none" if i == 1 else "b" for i in d.cid],
    facecolors=["none" if i == 0 else "b" for i in d.cid],
)
plt.gca().set(xlabel="log population (std)", ylabel="total tools", ylim=(0, 75))

# set up the horizontal axis values to compute predictions at
ns = 100
P_seq = np.linspace(-1.4, 3, num=ns)


# only using chain 0
sample_alpha = posterior_11_10["alpha"][0]
sample_beta = posterior_11_10["beta"][0]


def predict_and_plot(cid):
    term1 = tf.gather(sample_alpha, [cid], axis=-1)
    term2 = tf.gather(sample_beta, [cid], axis=-1) * P_seq
    lambda_ = tf.exp(term1 + term2)
    lmu = tf.reduce_mean(lambda_, 0)
    lci = tfp.stats.percentile(lambda_, (0.55, 94.5), 0)
    if cid == 0:
        plt.plot(P_seq, lmu, "k--", lw=1.5)
    else:
        plt.plot(P_seq, lmu, "k", lw=1.5)
    plt.fill_between(P_seq, lci[0], lci[1], color="k", alpha=0.2)


# make predictions when cid = 1 i.e. have low contact
predict_and_plot(0)
predict_and_plot(1)

/opt/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/arviz/stats/stats.py:695: UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.7 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
  "Estimated shape parameter of Pareto distribution is greater than 0.7 for "

../_images/11_god_spiked_the_integers_146_1.png

Open points are low contact societies and closed points are the high contact ones. Dashed curve is the posterior mean for low contact societies.

Code 11.48¶

cex = 1 + (k - np.min(k)) / (np.max(k) - np.min(k))

plt.scatter(
    d.population,
    d.total_tools,
    s=40 * cex,
    edgecolors=["none" if i == 1 else "b" for i in d.cid],
    facecolors=["none" if i == 0 else "b" for i in d.cid],
)
plt.gca().set(xlabel="population", ylabel="total tools", xlim=(0, 300000), ylim=(0, 75))

ns = 100
P_seq = np.linspace(-5, 3, num=ns)
# 1.53 is sd of log(population)
# 9 is mean of log(population)
pop_seq = np.exp(P_seq * 1.53 + 9)


def predict_and_plot(cid):
    term1 = tf.gather(sample_alpha, [cid], axis=-1)
    term2 = tf.gather(sample_beta, [cid], axis=-1) * P_seq
    lambda_ = tf.exp(term1 + term2)
    lmu = tf.reduce_mean(lambda_, 0)
    lci = tfp.stats.percentile(lambda_, (0.55, 94.5), 0)
    if cid == 0:
        plt.plot(pop_seq, lmu, "k--", lw=1.5)
    else:
        plt.plot(pop_seq, lmu, "k", lw=1.5)
    plt.fill_between(pop_seq, lci[0], lci[1], color="k", alpha=0.2)


# make predictions when cid = 1 i.e. have low contact
predict_and_plot(0)
predict_and_plot(1)

../_images/11_god_spiked_the_integers_149_0.png

Code 11.49¶

tdf = dataframe_to_tensors(
    "Kline", d, {"total_tools": tf.float32, "population": tf.float32, "cid": tf.int32}
)


def model_11_11(cid, P):
    def _generator():
        g = yield Root(tfd.Sample(tfd.Exponential(1.0), sample_shape=1))
        beta = yield Root(tfd.Sample(tfd.Exponential(1.0), sample_shape=2))
        alpha = yield Root(tfd.Sample(tfd.Normal(loc=1.0, scale=1.0), sample_shape=2))

        term1 = tf.exp(tf.squeeze(tf.gather(alpha, cid, axis=-1)))
        term2 = tf.pow(P, tf.squeeze(tf.gather(beta, cid, axis=-1))) / g

        lambda_ = term1 + term2

        T = yield tfd.Independent(
            tfd.Poisson(rate=lambda_), reinterpreted_batch_ndims=1
        )

    return tfd.JointDistributionCoroutine(_generator, validate_args=True)


jdc_11_11 = model_11_11(tdf.cid, tdf.population)

11.2.2 Negative binomial (gamm-poisson) models¶

11.2.3 Example: Exposure and the offset¶

Code 11.50¶

num_days = 30
y = tfd.Poisson(rate=1.5).sample((num_days,))

Code 11.51¶

num_weeks = 4
y_new = tfd.Poisson(rate=0.5 * 7).sample((num_weeks,))

Code 11.52¶

y_all = np.concatenate([y, y_new])
exposure = np.concatenate([np.repeat(1, 30), np.repeat(7, 4)])
monastery = np.concatenate([np.repeat(0, 30), np.repeat(1, 4)])
d = pd.DataFrame.from_dict(dict(y=y_all, days=exposure, monastery=monastery))

d

	y	days	monastery
0	2.0	1	0
1	2.0	1	0
2	0.0	1	0
3	2.0	1	0
4	1.0	1	0
5	1.0	1	0
6	3.0	1	0
7	2.0	1	0
8	2.0	1	0
9	2.0	1	0
10	3.0	1	0
11	1.0	1	0
12	0.0	1	0
13	1.0	1	0
14	1.0	1	0
15	0.0	1	0
16	2.0	1	0
17	1.0	1	0
18	2.0	1	0
19	2.0	1	0
20	2.0	1	0
21	2.0	1	0
22	0.0	1	0
23	1.0	1	0
24	3.0	1	0
25	2.0	1	0
26	1.0	1	0
27	2.0	1	0
28	0.0	1	0
29	1.0	1	0
30	2.0	7	1
31	5.0	7	1
32	3.0	7	1
33	5.0	7	1

Code 11.53¶

# compute the offset
d["log_days"] = d.days.pipe(np.log)

tdf = dataframe_to_tensors(
    "Monastry", d, {"log_days": tf.float32, "monastery": tf.float32, "y": tf.float32}
)


def model_11_12(log_days, monastery):
    def _generator():
        a = yield Root(tfd.Sample(tfd.Normal(loc=0.0, scale=1.0), sample_shape=1))
        b = yield Root(tfd.Sample(tfd.Normal(loc=0.0, scale=1.0), sample_shape=1))

        lambda_ = tf.exp(log_days + a[..., tf.newaxis] + b[..., tf.newaxis] * monastery)

        T = yield tfd.Independent(
            tfd.Poisson(rate=lambda_), reinterpreted_batch_ndims=1
        )

    return tfd.JointDistributionCoroutine(_generator, validate_args=False)


jdc_11_12 = model_11_12(tdf.log_days, tdf.monastery)

init_state = [tf.zeros([NUMBER_OF_CHAINS]), tf.zeros([NUMBER_OF_CHAINS])]

bijectors = [tfb.Identity(), tfb.Identity()]


posterior_11_12, trace_11_12 = sample_posterior(
    jdc_11_12,
    observed_data=(tdf.y,),
    init_state=init_state,
    bijectors=bijectors,
    params=["alpha", "beta"],
)

Code 11.54¶

# using only the first chain
sample_alpha = posterior_11_12["alpha"][
    0:,
]
sample_beta = posterior_11_12["beta"][
    0:,
]

lambda_old = tf.exp(sample_alpha)
lambda_new = tf.exp(sample_alpha + sample_beta)

az.summary(
    az.from_dict({"lambda_old": lambda_old, "lambda_new": lambda_new}), hdi_prob=0.89
)

	mean	sd	hdi_5.5%	hdi_94.5%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
lambda_old	1.417	0.206	1.085	1.754	0.016	0.012	160.0	214.0	1.0
lambda_new	0.567	0.122	0.376	0.735	0.007	0.005	265.0	381.0	1.0

11.3 Multinomial and categorical models¶

11.3.1 Predictors matched to outcomes¶

Code 11.55¶

Here we simulate career choice from three different careers, each with its own income trait. These traits are used to assign a score to each type of event. Then when the model is fit to the data, one of these scores is held constant, and the other two scores are estimated, using the known income traits

# simulate career choices among 500 individuals
N = 500  # number of individuals
income = np.array([1, 2, 5])  # expected income of each career
score = 0.5 * income  # scores for each career, based on income

# next line converts scores to probabilities
p = tf.nn.softmax(score)

# now simulate choice
# outcome career holds event type values, not counts
career = np.repeat(np.nan, N)  # empty vector of choices for each individual

# sample chosen career for each individual
for i in range(N):
    career[i] = tfd.Categorical(probs=p).sample()

career = career.astype(int)

Code 11.56¶

Categorical likelihood is a multinomial logistic regression distribution.

Each possible career will gets its own linear model with its own features.

def model_11_13(N, K, career_income):
    def _generator():
        print("==== start")
        # intercepts
        alpha = yield Root(
            tfd.Sample(tfd.Normal(loc=0.0, scale=1.0), sample_shape=(K - 1))
        )
        # association of income with choice
        beta = yield Root(tfd.Sample(tfd.HalfNormal(scale=0.5), sample_shape=1))

        print("alpja", alpha.shape)
        print("beta", beta.shape)

        s_1 = tf.gather(alpha, [0], axis=-1) + beta * career_income[0]
        s_2 = tf.gather(alpha, [1], axis=-1) + beta * career_income[1]
        s_3 = tf.zeros_like(s_1)  # pivot

        print("s1", s_1.shape)
        print("s2", s_2.shape)

        ts = tf.stack([s_1, s_2, s_3], axis=1)

        p = tf.nn.softmax(ts)

        print("p", p.shape)

        p_p = p
        print("==== end")

        career = yield tfd.Independent(
            tfd.Categorical(probs=p_p), reinterpreted_batch_ndims=1
        )

    return tfd.JointDistributionCoroutine(_generator, validate_args=True)

Code 11.57 (TODO - not able to make it work with 2 chains)¶

jdc_11_13 = model_11_13(N, 3, income)

jdc_11_13.sample(2)

==== start
alpja (2,)
beta (1,)
s1 (1,)
s2 (1,)
p (1, 3)
==== end
==== start
alpja (2, 2)
beta (2, 1)
s1 (2, 1)
s2 (2, 1)
p (2, 3, 1)
==== end

StructTuple(
  var0=<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
    array([[ 0.9685273 , -0.84437585],
           [ 0.5235515 ,  0.16414931]], dtype=float32)>,
  var1=<tf.Tensor: shape=(2, 1), dtype=float32, numpy=
    array([[0.4134818 ],
           [0.18612705]], dtype=float32)>,
  var2=<tf.Tensor: shape=(2, 3), dtype=int32, numpy=
    array([[0, 0, 0],
           [0, 0, 0]], dtype=int32)>
)

NUMBER_OF_CHAINS_11_13 = 1

init_state = [tf.zeros([NUMBER_OF_CHAINS_11_13, 2]), tf.zeros([NUMBER_OF_CHAINS_11_13])]

bijectors = [tfb.Identity(), tfb.Identity()]

# Not working !

# posterior_11_13, trace_11_13 = sample_posterior(jdc_11_13,
#                                     observed_data=(career,),
#                                     init_state=init_state,
#                                     bijectors=bijectors,
#                                     params=["alpha", "beta"])

# az.summary(trace_11_13)

11.3.3 Multinomial in disguise as Poisson¶

Code 11.60¶

Multinomial in disguise as Poisson. The author shows a technique here on how to fit a multinomial/categorical model as a series of Poisson Likelihoods.

It is computationally easier to use Poisson rather than multinomial likelihoods

d = RethinkingDataset.UCBadmit.get_dataset()

d

	dept	applicant.gender	admit	reject	applications
1	A	male	512	313	825
2	A	female	89	19	108
3	B	male	353	207	560
4	B	female	17	8	25
5	C	male	120	205	325
6	C	female	202	391	593
7	D	male	138	279	417
8	D	female	131	244	375
9	E	male	53	138	191
10	E	female	94	299	393
11	F	male	22	351	373
12	F	female	24	317	341

tdf = dataframe_to_tensors(
    "UCBAdmit",
    d,
    {"applications": tf.float32, "admit": tf.float32, "reject": tf.float32},
)

Code 11.61¶

Using Poisson regression to model rate of admission & rate of rejection

def model_binomial(applications):
    def _generator():
        a = yield Root(tfd.Sample(tfd.Normal(loc=0.0, scale=1.5), sample_shape=1))
        logit = a[..., tf.newaxis]

        T = yield tfd.Independent(
            tfd.Binomial(total_count=applications, logits=logit),
            reinterpreted_batch_ndims=1,
        )

    return tfd.JointDistributionCoroutine(_generator, validate_args=False)


jdc_model_binom = model_binomial(tdf.applications)

init_state = [tf.zeros([NUMBER_OF_CHAINS])]

bijectors = [tfb.Identity()]

observed_data = (tdf.admit,)

posterior_model_binom, trace_model_binom = sample_posterior(
    jdc_model_binom,
    observed_data=observed_data,
    init_state=init_state,
    bijectors=bijectors,
    params=["alpha"],
)

# d["rej"] = d.reject


def model_pois(applications):
    def _generator():
        a = yield Root(tfd.Sample(tfd.Normal(loc=0.0, scale=1.5), sample_shape=2))
        lambda1 = tf.exp(tf.gather(a, 0, axis=-1))
        lambda2 = tf.exp(tf.gather(a, 1, axis=-1))

        A = yield tfd.Independent(
            tfd.Poisson(rate=lambda1[..., tf.newaxis]), reinterpreted_batch_ndims=1
        )

        R = yield tfd.Independent(
            tfd.Poisson(rate=lambda2[..., tf.newaxis]), reinterpreted_batch_ndims=1
        )

    return tfd.JointDistributionCoroutine(_generator, validate_args=True)


jdc_model_pois = model_pois(tdf.applications)

jdc_model_pois.sample(3)

init_state = [tf.zeros([NUMBER_OF_CHAINS, 2])]

bijectors = [tfb.Identity()]


observed_data = (tdf.admit, tdf.reject)

posterior_model_pois, trace_model_pois = sample_posterior(
    jdc_model_pois,
    observed_data=observed_data,
    init_state=init_state,
    bijectors=bijectors,
    params=["alpha"],
)

WARNING:tensorflow:5 out of the last 7 calls to <function random_poisson at 0x7feb55966e60> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.

Code 11.62¶

Inferred binomial probability of admission across the entire dataset

tf.math.sigmoid(np.median(posterior_model_pois["alpha"]))

<tf.Tensor: shape=(), dtype=float32, numpy=0.9947492>

Code 11.63¶

Inplied Poisson probability

k = tf.reduce_mean(posterior_model_pois["alpha"][0], 0)

a1 = k[0]
a2 = k[1]

tf.exp(a1) / (tf.exp(a1) + tf.exp(a2))

<tf.Tensor: shape=(), dtype=float32, numpy=0.39187136>

This is same as that of binomial model

11.4 Censoring and survival¶

Code 11.64¶

x2 = np.min(tfd.Uniform(low=1, high=100).sample((int(1e5), 2)), -1)
x5 = np.min(tfd.Uniform(low=1, high=100).sample((int(1e5), 5)), -1)

x2, x5

(array([76.54987 , 11.961508, 48.184395, ..., 23.755568, 14.266549,
         6.295066], dtype=float32),
 array([ 4.383093 , 38.237564 ,  2.7101636, ..., 45.11505  , 34.192856 ,
        35.747616 ], dtype=float32))

Code 11.65¶

N = 10
M = 2
x = np.sort(tfd.Uniform(low=1, high=100).sample((int(1e5), N)))[:, 1]

Code 11.66 - 11.69 (TODO)¶

d = RethinkingDataset.AustinCats.get_dataset()
d["adopt"] = (d.out_event == "Adoption").astype(int)
dat = dict(
    days_to_event=d.days_to_event.values,
    color_id=(d.color != "Black").astype(int).values,
    adopted=d.adopt.values,
)

	actor	recipient	condition	block	trial	prosoc_left	chose_prosoc	pulled_left	treatment
0	1	NaN	0	1	2	0	1	0	0
1	1	NaN	0	1	4	0	0	1	0
2	1	NaN	0	1	6	1	0	0	1
3	1	NaN	0	1	8	0	1	0	0
4	1	NaN	0	1	10	1	1	1	1
...	...	...	...	...	...	...	...	...	...
499	7	4.0	1	6	64	1	1	1	3
500	7	6.0	1	6	66	1	1	1	3
501	7	3.0	1	6	68	0	0	1	2
502	7	7.0	1	6	70	0	0	1	2
503	7	2.0	1	6	72	0	0	1	2

	treatment	actor	side	cond	left_pulls	actor_id
0	0	1	0	0	6	0
1	0	2	0	0	18	1
2	0	3	0	0	5	2
3	0	4	0	0	6	3
4	0	5	0	0	6	4
5	0	6	0	0	14	5
6	0	7	0	0	14	6
7	1	1	1	0	9	0
8	1	2	1	0	18	1
9	1	3	1	0	11	2
10	1	4	1	0	9	3
11	1	5	1	0	10	4
12	1	6	1	0	11	5
13	1	7	1	0	15	6
14	2	1	0	1	5	0
15	2	2	0	1	18	1
16	2	3	0	1	3	2
17	2	4	0	1	2	3
18	2	5	0	1	5	4
19	2	6	0	1	10	5
20	2	7	0	1	17	6
21	3	1	1	1	10	0
22	3	2	1	1	18	1
23	3	3	1	1	6	2
24	3	4	1	1	8	3
25	3	5	1	1	9	4
26	3	6	1	1	11	5
27	3	7	1	1	18	6

	y	days	monastery
0	2.0	1	0
1	2.0	1	0
2	0.0	1	0
3	2.0	1	0
4	1.0	1	0
5	1.0	1	0
6	3.0	1	0
7	2.0	1	0
8	2.0	1	0
9	2.0	1	0
10	3.0	1	0
11	1.0	1	0
12	0.0	1	0
13	1.0	1	0
14	1.0	1	0
15	0.0	1	0
16	2.0	1	0
17	1.0	1	0
18	2.0	1	0
19	2.0	1	0
20	2.0	1	0
21	2.0	1	0
22	0.0	1	0
23	1.0	1	0
24	3.0	1	0
25	2.0	1	0
26	1.0	1	0
27	2.0	1	0
28	0.0	1	0
29	1.0	1	0
30	2.0	7	1
31	5.0	7	1
32	3.0	7	1
33	5.0	7	1

	actor	recipient	condition	block	trial	prosoc_left	chose_prosoc	pulled_left	treatment
0	1	NaN	0	1	2	0	1	0	0
1	1	NaN	0	1	4	0	0	1	0
2	1	NaN	0	1	6	1	0	0	1
3	1	NaN	0	1	8	0	1	0	0
4	1	NaN	0	1	10	1	1	1	1
...	...	...	...	...	...	...	...	...	...
499	7	4.0	1	6	64	1	1	1	3
500	7	6.0	1	6	66	1	1	1	3
501	7	3.0	1	6	68	0	0	1	2
502	7	7.0	1	6	70	0	0	1	2
503	7	2.0	1	6	72	0	0	1	2

	treatment	actor	side	cond	left_pulls	actor_id
0	0	1	0	0	6	0
1	0	2	0	0	18	1
2	0	3	0	0	5	2
3	0	4	0	0	6	3
4	0	5	0	0	6	4
5	0	6	0	0	14	5
6	0	7	0	0	14	6
7	1	1	1	0	9	0
8	1	2	1	0	18	1
9	1	3	1	0	11	2
10	1	4	1	0	9	3
11	1	5	1	0	10	4
12	1	6	1	0	11	5
13	1	7	1	0	15	6
14	2	1	0	1	5	0
15	2	2	0	1	18	1
16	2	3	0	1	3	2
17	2	4	0	1	2	3
18	2	5	0	1	5	4
19	2	6	0	1	10	5
20	2	7	0	1	17	6
21	3	1	1	1	10	0
22	3	2	1	1	18	1
23	3	3	1	1	6	2
24	3	4	1	1	8	3
25	3	5	1	1	9	4
26	3	6	1	1	11	5
27	3	7	1	1	18	6

	y	days	monastery
0	2.0	1	0
1	2.0	1	0
2	0.0	1	0
3	2.0	1	0
4	1.0	1	0
5	1.0	1	0
6	3.0	1	0
7	2.0	1	0
8	2.0	1	0
9	2.0	1	0
10	3.0	1	0
11	1.0	1	0
12	0.0	1	0
13	1.0	1	0
14	1.0	1	0
15	0.0	1	0
16	2.0	1	0
17	1.0	1	0
18	2.0	1	0
19	2.0	1	0
20	2.0	1	0
21	2.0	1	0
22	0.0	1	0
23	1.0	1	0
24	3.0	1	0
25	2.0	1	0
26	1.0	1	0
27	2.0	1	0
28	0.0	1	0
29	1.0	1	0
30	2.0	7	1
31	5.0	7	1
32	3.0	7	1
33	5.0	7	1

Statistical Rethinking (2nd Edition)

11. God Spiked the Integers

Contents