Dirichlet Distribution(DD):

* — Symmetric DD can be considered a distribution of distributions

* — Each sample from Symmetric DD is a categorical distribution over K categories.

* — Generates samples that are similar discrete distributions

* — It is parameterized G0, a distribution over K categories and a scale

factor

<code>

import numpy as np

from scipy.stats import dirichlet

np.set_printoptions(precision=2)

def stats(scale_factor, G0=[.2, .2, .6], N=10000):

samples = dirichlet(alpha = scale_factor * np.array(G0)).rvs(N)

print ” alpha:”, scale_factor

print ” element-wise mean:”, samples.mean(axis=0)

print “element-wise standard deviation:”, samples.std(axis=0)

print

for scale in [0.1, 1, 10, 100, 1000]:

stats(scale)

</code>

## Dirichlet Process(DP)

- — A way to generalize Dirichlet Distribution
- — Generate samples that are distributions similar to the parameter
- — Also has the parameter determines how much will the samples

vary from H0 - — a sample H of is constructed by drawing a countably

infinite number of samples and then setting

where — is carefully chosen weights that sum to 1

— is the Dirac delta function

* — Since the samples from a DP are similar to a parameter one-way to

test if a DP is generating your dataset is to check if the distributions(of

different attributes/dimensions) you get are similar to each other. Something like

permutation test, but understand the assumptions and caveats.

- — The code for dirichlet sampling can be written as:

<code>

import matplotlib.pyplot as plt

from scipy.stats import beta, norm

def dirichlet_sample_approximation(base_measure, alpha, tol=0.01):

betas = []

pis = []

betas.append(beta(1, alpha).rvs())

pis.append(betas[0])

while sum(pis) < (1.-tol):

s = np.sum([np.log(1 – b) for b in betas])

new_beta = beta(1, alpha).rvs()

betas.append(new_beta)

pis.append(new_beta * np.exp(s))

pis = np.array(pis)

thetas = np.array([base_measure() for _ in pis])

return pis, thetas

def plot_normal_dp_approximation(alpha):

plt.figure()

plt.title(“Dirichlet Process Sample with N(0,1) Base Measure”)

plt.suptitle(“alpha: %s” % alpha)

pis, thetas = dirichlet_sample_approximation(lambda: norm().rvs(), alpha)

pis = pis * (norm.pdf(0) / pis.max())

plt.vlines(thetas, 0, pis, )

X = np.linspace(-4,4,100)

plt.plot(X, norm.pdf(X))

plot_normal_dp_approximation(.1)

plot_normal_dp_approximation(1)

plot_normal_dp_approximation(10)

plot_normal_dp_approximation(1000)

</code>

- — The code for Dirichlet process can be written as :

<code>

from numpy.random import choice

class DirichletProcessSample():

def **init**(self, base_measure, alpha):

self.base_measure = base_measure

self.alpha = alpha

self.cache = []

self.weights = []

self.total_stick_used = 0.

def **call**(self):

remaining = 1.0 – self.total_stick_used

i = DirichletProcessSample.roll_die(self.weights + [remaining])

if i is not None and i < len(self.weights) :

return self.cache[i]

else:

stick_piece = beta(1, self.alpha).rvs() * remaining

self.total_stick_used += stick_piece

self.weights.append(stick_piece)

new_value = self.base_measure()

self.cache.append(new_value)

return new_value

@staticmethod

def roll_die(weights):

if weights:

return choice(range(len(weights)), p=weights)

else:

return None

</code>