API Documentation

Gaussian Proposal

class gmmmc.proposals.gaussian_proposals.GaussianStepCovarProposal(step_sizes=(0.001, ))

Bases: gmmmc.proposals.proposals.Proposal

Methods

get_acceptance() Calculate and return the acceptance rate of the proposal.
get_illegal() Calculate and return the illegal proposal rate of this proposal.
propose(X, gmm, target[, n_jobs]) Propose a new set of GMM covariances (diagonal only).
propose(X, gmm, target, n_jobs=1)

Propose a new set of GMM covariances (diagonal only).

Parameters:

X : 2-D array like of shape (n_samples, n_features)

The observed data or evidence.

gmm : GMM object

The current state (set of gmm parameters) in the Markov Chain

target : GMMPosteriorTarget object

The target distribution to be found.

n_jobs : int

Number of cpu cores to use in the calculation of log probabilities.

Returns:

: GMM

A new GMM object initialised with new covariance parameters.
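
The proposal step can be pictured with a minimal sketch. It is not the library's implementation: the helper name propose_diag_covars is hypothetical, and it assumes that a step size acts as the standard deviation of zero-mean Gaussian noise added to every diagonal entry, and that a proposal is illegal when any entry is not strictly positive.

```python
import random


def propose_diag_covars(covars, step_size, rng):
    """Add zero-mean Gaussian noise to every diagonal covariance entry.

    Returns (proposal, is_legal). A proposal is treated as illegal when
    any entry is <= 0, since a variance must be strictly positive.
    """
    proposal = [[c + rng.gauss(0.0, step_size) for c in row] for row in covars]
    is_legal = all(c > 0.0 for row in proposal for c in row)
    return proposal, is_legal


rng = random.Random(0)
current = [[1.0, 0.5], [0.2, 2.0]]  # shape (n_mixtures, n_features)
candidate, legal = propose_diag_covars(current, step_size=0.001, rng=rng)
```

An illegal candidate would normally be rejected outright and counted towards the illegal proposal rate.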

class gmmmc.proposals.gaussian_proposals.GaussianStepMeansProposal(step_sizes=(0.001, ))

Bases: gmmmc.proposals.proposals.Proposal

Gaussian Proposal distribution for means of a GMM

Methods

get_acceptance() Calculate and return the acceptance rate of the proposal.
get_illegal() Calculate and return the illegal proposal rate of this proposal.
propose(X, gmm, target[, n_jobs]) Propose a new set of GMM means.
propose(X, gmm, target, n_jobs=1)

Propose a new set of GMM means.

Parameters:

X : 2-D array like of shape (n_samples, n_features)

The observed data or evidence.

gmm : GMM object

The current state (set of gmm parameters) in the Markov Chain

target : GMMPosteriorTarget object

The target distribution to be found.

n_jobs : int

Number of cpu cores to use in the calculation of log probabilities.

Returns:

: GMM

A new GMM object initialised with new mean parameters.

class gmmmc.proposals.gaussian_proposals.GaussianStepWeightsProposal(n_mixtures, step_sizes=(0.001, ), threshold=0.001)

Bases: gmmmc.proposals.proposals.Proposal

Methods

get_acceptance() Calculate and return the acceptance rate of the proposal.
get_illegal() Calculate and return the illegal proposal rate of this proposal.
invTransformSimplex(simplex_coords) Transforms a point on the simplex to the original vector space.
propose(X, gmm, target[, n_jobs]) Propose a new set of weight vectors.
transformSimplex(weights) Project weight vector onto the normal simplex.
invTransformSimplex(simplex_coords)

Transforms a point on the simplex to the original vector space.

Parameters:

simplex_coords : array_like of shape (n_mixtures - 1,)

Coordinates of a weight vector on the simplex.

Returns:

: array_like of shape (n_mixtures,)

vector of weights.

propose(X, gmm, target, n_jobs=1)

Propose a new set of weight vectors.

Parameters:

X : 2-D array like of shape (n_samples, n_features)

The observed data or evidence.

gmm : GMM object

The current state (set of gmm parameters) in the Markov Chain

target : GMMPosteriorTarget object

The target distribution to be found.

n_jobs : int

Number of cpu cores to use in the calculation of log probabilities.

Returns:

: GMM

A new GMM object initialised with new weight parameters.

transformSimplex(weights)

Project weight vector onto the normal simplex.

Parameters:

weights : array_like of shape (n_mixtures,)

vector of weights for each gaussian component

Returns:

: array_like of shape (n_mixtures-1,)

vector of weights projected onto the simplex plane
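
The shapes involved, (n_mixtures,) in and (n_mixtures - 1,) out, can be illustrated with a simple round trip. The mapping below (drop the last weight, then recover it from the sum-to-one constraint) is an assumption chosen for clarity; the library may well use a different projection. The function names are hypothetical.

```python
def to_simplex_coords(weights):
    """Map n weights that sum to 1 onto n - 1 free coordinates."""
    return list(weights[:-1])


def from_simplex_coords(coords):
    """Recover the full weight vector; the last weight is the remainder."""
    return list(coords) + [1.0 - sum(coords)]


weights = [0.2, 0.3, 0.5]
coords = to_simplex_coords(weights)      # length n_mixtures - 1
restored = from_simplex_coords(coords)   # length n_mixtures
```

Working in the reduced coordinates lets a Gaussian step move freely without breaking the sum-to-one constraint; only the non-negativity of each weight still has to be checked.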

class gmmmc.proposals.gaussian_proposals.GaussianTuningStepMeansProposal(step_sizes=(0.001, ), limit=200)

Bases: gmmmc.proposals.proposals.Proposal

Gaussian Proposal distribution for means of a GMM

Methods

get_acceptance() Calculate and return the acceptance rate of the proposal.
get_illegal() Calculate and return the illegal proposal rate of this proposal.
propose(X, gmm, target[, n_jobs]) Propose a new set of GMM means.
propose(X, gmm, target, n_jobs=1)

Propose a new set of GMM means.

Parameters:

X : 2-D array like of shape (n_samples, n_features)

The observed data or evidence.

gmm : GMM object

The current state (set of gmm parameters) in the Markov Chain

target : GMMPosteriorTarget object

The target distribution to be found.

n_jobs : int

Number of cpu cores to use in the calculation of log probabilities.

Returns:

: GMM

A new GMM object initialised with new mean parameters.

Generic Proposals

class gmmmc.proposals.proposals.GMMBlockMetropolisProposal(propose_mean=None, propose_covars=None, propose_weights=None, propose_iterations=1)

Bases: gmmmc.proposals.proposals.Proposal

Methods

get_acceptance() Calculate and return the acceptance rate of the proposal.
get_illegal() Calculate and return the illegal proposal rate of this proposal.
propose(X, gmm, target[, n_jobs]) Propose a new set of gmm parameters.
propose(X, gmm, target, n_jobs=1)

Propose a new set of gmm parameters. Calls each proposal function one after another.

Parameters:

X : 2-D array_like of shape (n_samples, n_features)

Feature vectors

gmm : GMM object

Current GMM parameters in the markov chain

target : GMMPosteriorTarget object

Target distribution

n_jobs : int

Number of cpus to use. -1 to use all available cores.

Returns:

: GMM Object

The next state in the Markov Chain.
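
One plausible reading of "calls each proposal function one after another" is a blockwise Metropolis update, sketched below on a toy two-parameter state. Everything here is illustrative: metropolis_block_step, the toy target and the two proposal functions are hypothetical, and symmetric proposals are assumed so that the acceptance ratio reduces to a ratio of target densities.

```python
import math
import random


def metropolis_block_step(state, log_target, block_proposals, rng):
    """Apply each block proposal in turn, each with its own accept/reject test."""
    for propose in block_proposals:
        candidate = propose(state, rng)
        log_ratio = log_target(candidate) - log_target(state)
        # Accept with probability min(1, exp(log_ratio)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            state = candidate
    return state


def log_target(state):
    mean, var = state
    if var <= 0.0:
        return float("-inf")  # outside the support, never accepted
    return -0.5 * mean ** 2 - (var - 1.0) ** 2


def propose_mean(state, rng):
    return (state[0] + rng.gauss(0.0, 0.5), state[1])


def propose_var(state, rng):
    return (state[0], state[1] + rng.gauss(0.0, 0.5))


rng = random.Random(1)
state = (0.0, 1.0)
for _ in range(100):
    state = metropolis_block_step(state, log_target, [propose_mean, propose_var], rng)
```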

class gmmmc.proposals.proposals.Proposal

Bases: object

Methods

get_acceptance() Calculate and return the acceptance rate of the proposal.
get_illegal() Calculate and return the illegal proposal rate of this proposal.
propose(X, gmm, target[, n_jobs])
get_acceptance()

Calculate and return the acceptance rate of the proposal.

Returns:

: double

The acceptance rate of the proposal function

get_illegal()

Calculate and return the illegal proposal rate of this proposal, i.e. the rate at which it proposes values outside the support of the parameter space (e.g. covariances < 0).

Returns:

: double

The illegal proposal rate of the proposal function
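
The two rates amount to simple bookkeeping over the proposals made so far. The sketch below uses a hypothetical ProposalStats class and assumes both rates are fractions of the total number of proposals; the library's internal counters may be organised differently.

```python
class ProposalStats:
    """Counts proposals, acceptances and illegal proposals."""

    def __init__(self):
        self.count_proposed = 0
        self.count_accepted = 0
        self.count_illegal = 0

    def record(self, accepted, illegal):
        self.count_proposed += 1
        self.count_accepted += int(accepted)
        self.count_illegal += int(illegal)

    def acceptance_rate(self):
        return self.count_accepted / self.count_proposed

    def illegal_rate(self):
        return self.count_illegal / self.count_proposed


stats = ProposalStats()
for accepted, illegal in [(True, False), (False, True), (False, False), (True, False)]:
    stats.record(accepted, illegal)
```

A high illegal rate usually suggests that the step size is too large for the region of parameter space being explored.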

propose(X, gmm, target, n_jobs=1)
Parameters:

X : 2-D array_like

Observed data or evidence.

gmm : GMM object

target

n_jobs

Priors

class gmmmc.priors.prior.CovarsStaticPrior(prior_covars)

Bases: gmmmc.priors.prior.GMMParameterPrior

Methods

log_prob(covariances) Log probability of a covariance given we know what its true value ‘should’ be.
log_prob_single(covariance, mixture_num) Log probability of a covariance matrix of a single mixture given we know what its true value should be.
sample()
log_prob(covariances)

Log probability of a covariance given we know what its true value ‘should’ be.

Parameters:

covariances : covariance matrices of GMM distribution

Returns:

: double

0 if covariances are identical to their true values, -inf otherwise.

log_prob_single(covariance, mixture_num)

Log probability of a covariance matrix of a single mixture given we know what its true value should be.

Parameters:

covariance : 1-D array_like of shape (n_features)

covariance matrix for a specific mixture in GMM

mixture_num : int

Index of mixture in GMM

Returns:

: double

0 if covariance is identical to its true value, -inf otherwise

sample()
class gmmmc.priors.prior.DiagCovarsUniformPrior(low, high, n_mixtures, n_features)

Bases: gmmmc.priors.prior.GMMParameterPrior

Methods

log_prob(covars) Compute the log prior probability of the covariances of a GMM.
log_prob_single(covar, mixture_num) Compute the log probability of the covariances for a specific mixture.
sample() Draw a sample for the diagonals of a covariance matrix assuming a uniform prior over each individual element.
log_prob(covars)

Compute the log prior probability of the covariances of a GMM. Since this will be used for monte carlo simulations, we care only that it is proportional to the true probability. For a uniform distribution we can use any value.

Parameters:

covars : 2-D array_like, of shape (n_mixtures, n_features)

covariance vectors for the GMM

Returns:

: double

Proportional to the log probability of the covariances of the GMM. 0.0 if the covariances are within the bounds of the uniform prior, -inf otherwise.

log_prob_single(covar, mixture_num)

Compute the log probability of the covariances for a specific mixture.

Parameters:

covar : 1-D array_like of length n_features

Single diagonal covariance from a single mixture of the GMM.

mixture_num : int

Index of the mixture for the covariance matrix.

Returns:

: double

Proportional to the log prior probability for the covariance. 0.0 if the covariance is within the bounds of the uniform prior, -inf otherwise.

sample()

Draw a sample for the diagonals of a covariance matrix assuming a uniform prior over each individual element.

Returns:

: 2-D array_like of shape (n_mixtures, n_features)

array of diagonal covariances for a GMM.
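
The behaviour described for this prior can be sketched in a few lines. The function names are hypothetical, and the sketch assumes that low and high are scalars applied to every element and that the bounds are inclusive; neither detail is confirmed by the documentation above.

```python
import random


def uniform_log_prob(covars, low, high):
    """Return 0.0 if every entry lies in [low, high], -inf otherwise."""
    inside = all(low <= c <= high for row in covars for c in row)
    return 0.0 if inside else float("-inf")


def uniform_sample(low, high, n_mixtures, n_features, rng):
    """Draw each diagonal entry independently from Uniform(low, high)."""
    return [[rng.uniform(low, high) for _ in range(n_features)]
            for _ in range(n_mixtures)]


rng = random.Random(0)
draw = uniform_sample(0.01, 10.0, n_mixtures=3, n_features=2, rng=rng)
```

Because the density is constant inside the bounds, any constant works as the in-bounds log probability for Monte Carlo purposes; 0.0 is simply the most convenient choice.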

class gmmmc.priors.prior.DiagCovarsWishartPrior(df, scale_matrices)

Bases: gmmmc.priors.prior.GMMParameterPrior

Methods

log_prob(covars) Compute the log prior probability of the covariances of a GMM.
log_prob_single(covar, mixture_num) Compute the log probability of the covariances for a specific mixture.
sample() Draw a sample from the inverse wishart prior distribution.
log_prob(covars)

Compute the log prior probability of the covariances of a GMM. Since this will be used for monte carlo simulations, we care only that it is proportional to the true probability.

Parameters:

covars : 2-D array_like, of shape (n_mixtures, n_features)

covariance vectors for the GMM

Returns:

: double

Proportional to the log probability of the covariances of the GMM with respect to an inverse Wishart distribution.

log_prob_single(covar, mixture_num)

Compute the log probability of the covariances for a specific mixture.

Parameters:

covar : 1-D array_like of length n_features

Single diagonal covariance from a single mixture of the GMM.

mixture_num : int

Index of the mixture for the covariance matrix.

Returns:

: double

Proportional to the log prior probability for the covariance

sample()

Draw a sample from the inverse wishart prior distribution.

Returns:

: 2-D array_like of shape (n_mixtures, n_features)

Return a complete set of diagonal covariance vectors for a GMM

class gmmmc.priors.prior.GMMParameterPrior

Methods

log_prob(params) Calculate the log prior probability.
log_prob_single(param, mixture_num) Compute the log prior probability of a single parameter vector (mean, covariance or weight).
log_prob(params)

Calculate the log prior probability.

Parameters:

params : array_like of varying shape

parameters for a GMM, e.g. means, weights or covariances

Returns:

: double

log prior probability of the parameters

log_prob_single(param, mixture_num)

Compute the log prior probability of a single parameter vector (mean, covariance or weight).

Parameters:

param : 1-D array_like

Vector of parameters for a single mixture.

mixture_num : int

Mixture index for the parameters.

Returns:

: double

log prior probability of parameters

class gmmmc.priors.prior.GMMPrior(means_prior, covars_prior, weights_prior)

Methods

log_prob(gmm)
sample() Compute a sample from the prior distribution of the GMM’s parameters.
log_prob(gmm)
Parameters:

gmm : GMM

object containing parameters

Returns:

: double

log prior probability of the parameters of the input GMM

sample()

Compute a sample from the prior distribution of the GMM’s parameters.

Returns:

: GMM

GMM object containing the sampled parameters

class gmmmc.priors.prior.MeansGaussianPrior(prior_means, covariances)

Bases: gmmmc.priors.prior.GMMParameterPrior

Methods

log_prob(means) Compute the log prior probability of the means of a GMM according to Gaussian priors.
log_prob_single(mean, mixture_num) Compute the log probability of the means for a specific mixture.
sample() Draw a sample from the Gaussian prior distributions of the mean vectors.
log_prob(means)

Compute the log prior probability of the means of a GMM according to Gaussian priors.

Parameters:

means : 2-D array_like, of shape (n_mixtures, n_features)

mean vectors for the GMM

Returns:

: double

Proportional to the log probability of the means of the GMM

log_prob_single(mean, mixture_num)

Compute the log probability of the means for a specific mixture.

Parameters:

mean : 1-D array_like of length n_features

Single mean vector from a single mixture of the GMM.

mixture_num : int

Index of the mixture for the mean.

Returns:

: double

Proportional to the log prior probability for means

sample()

Draw a sample from the Gaussian prior distributions of the mean vectors.

Returns:

: 2-D array_like of shape (n_mixtures, n_features)

Return a complete set of mean vectors for a GMM
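
A Gaussian prior over the means can be evaluated as a sum of per-mixture Gaussian log densities. The sketch below assumes diagonal prior covariances, stored as one variance vector per mixture; the documentation above does not state the expected shape of covariances, so this is an assumption, and the function names are hypothetical.

```python
import math


def diag_gaussian_log_pdf(x, mean, variances):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - mi) ** 2 / v
        for xi, mi, v in zip(x, mean, variances)
    )


def means_log_prior(means, prior_means, prior_variances):
    """Sum the log prior of every mixture's mean vector."""
    return sum(
        diag_gaussian_log_pdf(m, pm, pv)
        for m, pm, pv in zip(means, prior_means, prior_variances)
    )


prior_means = [[0.0, 0.0], [5.0, 5.0]]
prior_variances = [[1.0, 1.0], [2.0, 2.0]]
at_prior = means_log_prior(prior_means, prior_means, prior_variances)
shifted = means_log_prior([[1.0, 0.0], [5.0, 4.0]], prior_means, prior_variances)
```

The prior is highest when every mean sits exactly at its prior mean and falls off quadratically, on the log scale, as the means move away.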

class gmmmc.priors.prior.MeansUniformPrior(low, high, n_mixtures, n_features)

Bases: gmmmc.priors.prior.GMMParameterPrior

Methods

log_prob(means) Compute the log prior probability of the means of a GMM.
log_prob_single(mean, mixture) Compute the log probability of the means for a specific mixture.
sample()
log_prob(means)

Compute the log prior probability of the means of a GMM. Since this will be used for monte carlo simulations, we care only that it is proportional to the true probability. For a uniform distribution we can use any value.

Parameters:

means : 2-D array_like, of shape (n_mixtures, n_features)

mean vectors for the GMM

Returns:

: double

Proportional to the log probability of the means of the GMM. 0.0 if the means are within the bounds of the uniform prior, -inf otherwise.

log_prob_single(mean, mixture)

Compute the log probability of the means for a specific mixture.

Parameters:

mean : 1-D array_like of length n_features

Single mean vector from a single mixture of the GMM.

mixture : int

Index of the mixture for the mean.

Returns:

: double

Proportional to the log prior probability for the means. 0.0 if the means are within the bounds of the uniform prior, -inf otherwise.

sample()
class gmmmc.priors.prior.WeightsDirichletPrior(alpha)

Bases: gmmmc.priors.prior.GMMParameterPrior

Methods

log_prob(weights) Calculate log probability of weight vector according to dirichlet prior.
log_prob_single(weights, mixture_num) Identical to log_prob
sample() Sample from dirichlet distribution.
log_prob(weights)

Calculate log probability of weight vector according to dirichlet prior.

Parameters:

weights : 1-D array_like of shape (n_mixtures)

Returns:

: double

log probability under distribution

log_prob_single(weights, mixture_num)

Identical to log_prob

Parameters:

weights : 1-D array_like of shape (n_mixtures)

mixture_num : unused

Returns:

: double

log probability under distribution

sample()

Sample from dirichlet distribution.

Returns:

: 1-D array like of shape (n_mixtures)

Sample weights from dirichlet distribution.
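
For reference, the Dirichlet log density and a standard way of sampling from it (normalising independent Gamma draws) can be written with the standard library alone. This is a textbook sketch with hypothetical function names, not the library's code, and it assumes every weight is strictly positive.

```python
import math
import random


def dirichlet_log_pdf(weights, alpha):
    """Log density of Dirichlet(alpha) at a weight vector on the simplex."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(w) for a, w in zip(alpha, weights))


def dirichlet_sample(alpha, rng):
    """Draw Gamma(alpha_i, 1) variates and normalise them to sum to 1."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)
sampled_weights = dirichlet_sample([2.0, 2.0, 2.0], rng)
flat_log_pdf = dirichlet_log_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0])
```

With every alpha equal to 1 the density is constant over the simplex, which is the uniform case.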

class gmmmc.priors.prior.WeightsStaticPrior(prior_weights)

Bases: gmmmc.priors.prior.GMMParameterPrior

Methods

log_prob(weights) Returns a log probability assuming weights have a true fixed value.
log_prob_single(weights, mixture_num) Functionally the same as log_prob
sample() Sample true weight vector
log_prob(weights)

Returns a log probability assuming weights have a true fixed value.

Parameters:

weights : 1-D array_like of shape (n_mixtures)

Weight vector for mixture. Must lie on the normal simplex.

Returns:

: double

0 if weights are close to true values, -inf otherwise

log_prob_single(weights, mixture_num)

Functionally the same as log_prob

Parameters:

weights : 1-D array_like of shape (n_mixtures)

Weight vector for mixture. Must lie on the normal simplex.

mixture_num : int

Not used.

Returns:

: double

0 if weights are close to true values, -inf otherwise

sample()

Sample true weight vector

Returns:

: 1-D array_like with shape (n_mixtures)

true weights.

class gmmmc.priors.prior.WeightsUniformPrior(n_mixtures)

Bases: gmmmc.priors.prior.GMMParameterPrior

Methods

log_prob(weights) Returns a log probability according to uniform dirichlet prior.
log_prob_single(weights, mixture_num) Functionally the same as log_prob
sample() Draw sample from dirichlet distribution
log_prob(weights)

Returns a log probability according to uniform dirichlet prior.

Parameters:

weights : 1-D array_like of shape (n_mixtures)

Weight vector for mixture. Must lie on the normal simplex.

Returns:

: double

log probability under uniform prior.

log_prob_single(weights, mixture_num)

Functionally the same as log_prob

Parameters:

weights : 1-D array_like of shape (n_mixtures)

Weight vector for mixture. Must lie on the normal simplex.

mixture_num : not used in this context

Returns:

: double

log probability under uniform prior.

sample()

Draw sample from dirichlet distribution

Returns:

: 1-D array_like of shape (n_mixtures)

Set of weight parameters for a GMM.

GMM Representation

class gmmmc.gmm.GMM(means, covariances, weights)

Attributes

covars
means
weights

Methods

log_likelihood(X[, n_jobs]) Calculate the average log likelihood of the data given the GMM parameters
sample(n_samples) Sample from the GMM.
covars
log_likelihood(X, n_jobs=1)

Calculate the average log likelihood of the data given the GMM parameters

Parameters:

X : 2-D array_like of shape (n_samples, n_features)

Data to be used.

n_jobs : int

Number of CPU cores to use in the calculation

Returns:

: float

average log likelihood of the data given the GMM parameters

Notes

For GMMs with small numbers of mixtures (<10) the use of more than 1 core can slow down the function.
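
The quantity being computed can be written out directly for a diagonal-covariance GMM. The sketch below is a single-threaded, pure-Python illustration with hypothetical function names; it does not model n_jobs and is not the library's implementation. It uses the log-sum-exp trick to combine the mixture components stably.

```python
import math


def log_sum_exp(values):
    """Stable log(sum(exp(v))) for a list of log-domain values."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def diag_gaussian_log_pdf(x, mean, variances):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - mi) ** 2 / v
        for xi, mi, v in zip(x, mean, variances)
    )


def average_log_likelihood(X, means, covars, weights):
    """Mean over samples of log sum_k w_k N(x; mean_k, diag(covar_k))."""
    total = 0.0
    for x in X:
        total += log_sum_exp([
            math.log(w) + diag_gaussian_log_pdf(x, m, c)
            for w, m, c in zip(weights, means, covars)
        ])
    return total / len(X)


X = [[0.0], [0.0]]
single = average_log_likelihood(X, [[0.0]], [[1.0]], [1.0])
split = average_log_likelihood(X, [[0.0], [0.0]], [[1.0], [1.0]], [0.5, 0.5])
```

Splitting one component into two identical halves leaves the likelihood unchanged, which is a quick sanity check for any implementation.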

means
sample(n_samples)

Sample from the GMM.

Parameters:

n_samples : int

Number of samples to draw.

Returns:

: 2-D array_like of shape (n_samples, n_features)

Samples drawn from the GMM distribution
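
Sampling from a GMM is a two-stage draw: pick a component according to the weights, then draw from that component's Gaussian. A minimal sketch for diagonal covariances follows; sample_gmm is a hypothetical stand-in, not the library method.

```python
import math
import random


def sample_gmm(n_samples, means, covars, weights, rng):
    """Draw samples from a diagonal-covariance GMM."""
    samples = []
    for _ in range(n_samples):
        # Stage 1: choose a mixture component with probability equal to its weight.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        # Stage 2: draw each feature independently from that component.
        samples.append([rng.gauss(m, math.sqrt(c))
                        for m, c in zip(means[k], covars[k])])
    return samples


rng = random.Random(0)
draws = sample_gmm(10, [[0.0, 0.0], [5.0, 5.0]], [[1.0, 1.0], [0.5, 0.5]],
                   [0.3, 0.7], rng)
```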

weights

Monte Carlo Algorithms

class gmmmc.monte_carlo.AnnealedImportanceSampling(proposal, priors, betas)

Bases: gmmmc.monte_carlo.MonteCarloBase

Methods

anneal(X, n_jobs[, diagnostics]) A single annealing run from AIS.
sample(X, n_samples[, n_jobs, diagnostics]) Generate samples from the posterior distribution of the parameters.
anneal(X, n_jobs, diagnostics=None)

A single annealing run from AIS.

Parameters:

X : 2-D array_like of shape (n_feature_vectors, n_features)

Feature vectors used as the data for the underlying model.

n_jobs : int

Number of cpu cores to utilise during the Monte Carlo simulation.

diagnostics : empty dictionary, optional

If included, the dictionary passed to the function will contain diagnostic information from each annealing run of AIS. TODO: expand on this

Returns:

: tuple (GMM Object, double)

A single GMM sample from AIS and its corresponding weight.

sample(X, n_samples, n_jobs=1, diagnostics=None)

Generate samples from the posterior distribution of the parameters.

Parameters:

X : 2-D array_like of shape (n_feature_vectors, n_features)

Feature vectors used as the data for the underlying model.

n_samples : int

Number of Monte Carlo samples to be drawn from the posterior distribution.

n_jobs : int

Number of cpu cores to utilise during the Monte Carlo simulation.

diagnostics : empty dictionary, optional

If included, the dictionary passed to the function will contain diagnostic information from each annealing run of AIS. TODO: expand on this

Returns:

: list of tuples (GMM, double)

A list of GMM samples with their corresponding weight.
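
Weighted samples of this kind are typically used through a self-normalised importance sampling estimate. The sketch below assumes the weights are unnormalised and on a linear scale, which the documentation above does not state; if they are log weights, they should be exponentiated first (after subtracting the maximum for stability). The weighted_mean helper is hypothetical and uses plain floats in place of GMM objects.

```python
def weighted_mean(weighted_samples, statistic):
    """Self-normalised estimate of E[statistic(sample)]."""
    total_weight = sum(w for _, w in weighted_samples)
    return sum(w * statistic(s) for s, w in weighted_samples) / total_weight


# Toy stand-in: each "sample" is a float rather than a GMM object.
toy_samples = [(1.0, 1.0), (3.0, 3.0)]
estimate = weighted_mean(toy_samples, lambda s: s)
```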

class gmmmc.monte_carlo.MarkovChain(proposal, prior, initial_gmm)

Bases: gmmmc.monte_carlo.MonteCarloBase

Methods

sample(X, n_samples[, n_jobs]) Sample from the posterior distribution of the GMM parameters
sample(X, n_samples, n_jobs=1)

Sample from the posterior distribution of the GMM parameters

Parameters:

X : 2-D array_like of shape (n_feature_vectors, n_features)

Feature vectors used as the data for the underlying model.

n_samples : int

Number of Monte Carlo samples to be drawn from the posterior distribution.

n_jobs : int

Number of cpu cores to utilise during the Monte Carlo simulation.

Returns:

samples : List of GMM Objects

Returns a list of GMMs (GMM parameters) which are the samples drawn from the distribution.

Notes

It is beneficial to discard a number of samples from the beginning of the chain (burn-in) and to keep only every nth sample (lag), which reduces the correlation between successive samples of the posterior.
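
Burn-in and lag can both be applied to the returned list with a single slice. The helper name and the values of burn_in and lag below are illustrative only; suitable values depend on how quickly the chain mixes.

```python
def thin_chain(samples, burn_in, lag):
    """Drop the first burn_in samples, then keep every lag-th one."""
    return samples[burn_in::lag]


chain = list(range(20))  # stand-in for a list of GMM samples
kept = thin_chain(chain, burn_in=5, lag=3)
```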

class gmmmc.monte_carlo.MonteCarloBase

Bases: object

Methods

sample(X, n_samples)
sample(X, n_samples)

Target Distributions

class gmmmc.posterior.GMMPosteriorTarget(prior, beta=1)

Posterior distribution (the target distribution)

Methods

log_prob(X, gmm, n_jobs)
log_prob(X, gmm, n_jobs)
Parameters:

X : 2-D array_like of shape (n_samples, n_features)

Feature vectors

gmm : GMM object

GMM parameters at which the posterior probability is evaluated

n_jobs : int

Number of cores to use in the calculation.

Returns:

: double

log probability of the posterior up to a constant factor.
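
A common way for a beta parameter to enter such a target, and the one usually used in annealed importance sampling, is to temper the likelihood: log prior plus beta times log likelihood. Whether the library combines the terms in exactly this way is an assumption; the sketch and the function name below are hypothetical and only illustrate the idea.

```python
def tempered_log_posterior(log_prior, log_likelihood, beta=1.0):
    """Unnormalised log posterior with the likelihood raised to the power beta."""
    if log_prior == float("-inf"):
        return log_prior  # outside the prior's support
    return log_prior + beta * log_likelihood


prior_only = tempered_log_posterior(-1.0, -10.0, beta=0.0)
halfway = tempered_log_posterior(-1.0, -10.0, beta=0.5)
full = tempered_log_posterior(-1.0, -10.0, beta=1.0)
```

With beta = 0 the target reduces to the prior, and with beta = 1 it is the full posterior up to a constant.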
