Distributions

Module containing functions that generate samples from several distributions.

icspylab.distributions.generate_gaussian_mixture(eps, mu, sigma, n, p)[source]

Generates data from a Gaussian mixture model (GMM) with the given parameters.

Parameters:
  • eps (list of float) – Proportions of points assigned to each cluster (must sum to 1).

  • mu (list of np.ndarray) – List of mean vectors (centroids) for each cluster (size k).

  • sigma (list of np.ndarray) – List of covariance matrices (size k).

  • n (int) – Total number of data points to generate.

  • p (int) – Dimension of the data, including noise.

Returns:

A tuple containing:

  • data_with_noise (ndarray): Matrix (n, p) of generated data points.

  • labels (ndarray): Array of cluster labels (size n).

Return type:

tuple

Example

>>> import numpy as np
>>> from icspylab.distributions import generate_gaussian_mixture
>>> eps = [0.5, 0.5]
>>> mu = [np.ones(2), np.ones(2)*10]
>>> sigma = [np.eye(2) for _ in range(2)]
>>> X, labels = generate_gaussian_mixture(eps, mu, sigma, n=1000, p=6)
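
The parameter description above can be read as the following minimal sketch. It is not the icspylab implementation: the helper name sketch_gaussian_mixture, the seed argument, the random (rather than exact) allocation of points to clusters, and the reading of "including noise" as extra standard normal columns are all assumptions made for illustration.

```python
import numpy as np

def sketch_gaussian_mixture(eps, mu, sigma, n, p, seed=0):
    """Hypothetical sketch of a Gaussian mixture generator (not icspylab code)."""
    rng = np.random.default_rng(seed)
    d = len(mu[0])  # dimension that carries the cluster structure
    # Draw one cluster label per point, with probabilities eps.
    labels = rng.choice(len(eps), size=n, p=eps)
    data = np.empty((n, d))
    for k in range(len(eps)):
        idx = labels == k
        # Sample the points of cluster k from N(mu[k], sigma[k]).
        data[idx] = rng.multivariate_normal(mu[k], sigma[k], size=int(idx.sum()))
    # Assumption: the remaining p - d columns are standard normal noise.
    noise = rng.standard_normal((n, p - d))
    return np.hstack([data, noise]), labels

eps = [0.5, 0.5]
mu = [np.ones(2), np.ones(2) * 10]
sigma = [np.eye(2) for _ in range(2)]
X, labels = sketch_gaussian_mixture(eps, mu, sigma, n=1000, p=6)
print(X.shape, labels.shape)
```

Under these assumptions the first two columns hold the clustered coordinates and the last four are pure noise, which matches the shapes used in the example (mean vectors of length 2, p=6).
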
icspylab.distributions.generate_powerexp_mixture(eps, mu, sigma, beta, n, p)[source]

Generates data from a mixture of multivariate power exponential distributions (PEM) with the given parameters.

Parameters:
  • eps (list of float) – Proportions of points assigned to each cluster (must sum to 1).

  • mu (list of np.ndarray) – List of mean vectors (centroids) for each cluster (size k).

  • sigma (list of np.ndarray) – List of covariance matrices (size k).

  • beta (float or list of float) – Shape parameters (size k if list).

  • n (int) – Total number of data points to generate.

  • p (int) – Dimension of the data, including noise.

Returns:

A tuple containing:

  • data_with_noise (ndarray): Matrix (n, p) of generated data points.

  • labels (ndarray): Array of cluster labels (size n).

Return type:

tuple

Example

>>> import numpy as np
>>> from icspylab.distributions import generate_powerexp_mixture
>>> eps = [0.5, 0.5]
>>> mu = [np.ones(2), np.ones(2)*10]
>>> sigma = [np.eye(2) for _ in range(2)]
>>> X, labels = generate_powerexp_mixture(eps, mu, sigma, beta=0.8, n=1000, p=6)

icspylab.distributions.generate_randu(n=400, seed=1)[source]

Generate a synthetic dataset based on the classical RANDU pseudo-random number generator.

RANDU is an obsolete linear congruential generator, widely used as a benchmark example of a generator with poor randomness properties.

The implementation follows the standard definition described in the R datasets package manual.

Parameters:
  • n (int, default=400) – Number of data points to generate.

  • seed (int, default=1) – Seed of the generator.

Returns:

ndarray (n, 3)

Example

>>> from icspylab.distributions import generate_randu
>>> X = generate_randu(n=100)
>>> print(X.shape)
(100, 3)
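
For readers unfamiliar with RANDU, its recurrence is x_{k+1} = 65539 * x_k mod 2^31. The sketch below illustrates that recurrence only; it is not the icspylab code, the helper name sketch_randu is hypothetical, and grouping consecutive draws into rows of three is an assumption.

```python
import numpy as np

def sketch_randu(n=400, seed=1):
    """Hypothetical sketch of the RANDU recurrence (not icspylab code)."""
    m = 2**31
    x = seed
    values = []
    for _ in range(3 * n):
        x = (65539 * x) % m   # RANDU step: x_{k+1} = 65539 * x_k mod 2^31
        values.append(x / m)  # scale the integer state to the unit interval
    # Assumption: consecutive draws are grouped into rows of three.
    return np.array(values).reshape(n, 3)

X = sketch_randu(n=100)
print(X.shape)
```

Because 65539 = 2^16 + 3, each row (x, y, z) built this way satisfies the condition that 9x - 6y + z is an integer, so the points lie on a small number of parallel planes. This is the defect that makes RANDU a standard example of poor randomness.
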
icspylab.distributions.generate_student_mixture(eps, mu, sigma, df, n, p)[source]

Generates data from a Student-t mixture model (SMM) with the given parameters.

Parameters:
  • eps (list of float) – Proportions of points assigned to each cluster (must sum to 1).

  • mu (list of ndarray) – List of mean vectors (centroids) for each cluster (size k).

  • sigma (list of ndarray) – List of covariance matrices (size k).

  • df (int or list of int) – Degrees of freedom (size k if list). Must be strictly positive integers.

  • n (int) – Total number of data points to generate.

  • p (int) – Dimension of the data, including noise.

Returns:

A tuple containing:

  • data_with_noise (ndarray): Matrix (n, p) of generated data points.

  • labels (ndarray): Array of cluster labels (size n).

Return type:

tuple

Example

>>> import numpy as np
>>> from icspylab.distributions import generate_student_mixture
>>> eps = [0.5, 0.5]
>>> mu = [np.ones(2), np.ones(2)*10]
>>> sigma = [np.eye(2) for _ in range(2)]
>>> X, labels = generate_student_mixture(eps, mu, sigma, df=2, n=1000, p=6)
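
A single Student-t cluster can be sampled with the standard normal/chi-square construction: a Gaussian vector divided by the square root of an independent chi-square variable over its degrees of freedom. The sketch below shows that construction for one component only; whether icspylab uses it internally is not stated in this documentation, and the helper name sketch_multivariate_t and the seed argument are hypothetical.

```python
import numpy as np

def sketch_multivariate_t(n, mu, sigma, df, seed=0):
    """Hypothetical sketch of multivariate Student-t sampling (not icspylab code)."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    # Centered Gaussian vectors with covariance sigma.
    z = rng.multivariate_normal(np.zeros(mu.size), sigma, size=n)
    # One independent chi-square draw per observation.
    w = rng.chisquare(df, size=n)
    # Rescaling by sqrt(w / df) produces the heavy Student-t tails.
    return mu + z / np.sqrt(w / df)[:, None]

X = sketch_multivariate_t(n=500, mu=np.ones(2), sigma=np.eye(2), df=2)
print(X.shape)
```

Smaller values of df give heavier tails; as df grows, the draws approach those of a Gaussian with the same mean and covariance.
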
icspylab.distributions.multivariate_powerexp(n, scatter, location=None, beta=1)[source]

Generate n observations from a multivariate power exponential distribution.

Parameters:
  • n (int) – Number of observations.

  • scatter (array-like) – Symmetric positive definite scatter matrix (p x p).

  • location (array-like) – Mean vector of dimension p.

  • beta (float) – Shape parameter (> 0). beta = 1 corresponds to the multivariate normal distribution; beta < 1 gives heavier tails.

Returns:

ndarray (n, p)

References

  • Oja, H. (2010), Multivariate Nonparametric Methods with R, Springer.

  • Nordhausen, K., & Oja, H. (2011). Multivariate L1 statistical methods: The package MNM. Journal of Statistical Software, 43, 1-28.

Example

>>> import numpy as np
>>> from icspylab.distributions import multivariate_powerexp
>>> X = multivariate_powerexp(n=100, scatter=np.eye(3), beta=4)
>>> print(X.shape)
(100, 3)
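
One common way to sample such a distribution is a stochastic representation: a uniform direction on the unit sphere, scaled by a random radius and mapped through a square root of the scatter matrix. The sketch below assumes a density proportional to exp(-0.5 * (x' S^-1 x)^beta), for which the radius R satisfies R^(2*beta) ~ Gamma(shape = p / (2*beta), scale = 2). Both this parameterization and the zero default for a missing location are assumptions, the helper name sketch_powerexp is hypothetical, and this is not the icspylab code.

```python
import numpy as np

def sketch_powerexp(n, scatter, location=None, beta=1.0, seed=0):
    """Hypothetical sketch of power exponential sampling (not icspylab code)."""
    rng = np.random.default_rng(seed)
    scatter = np.asarray(scatter, dtype=float)
    p = scatter.shape[0]
    # Assumption: a missing location means the zero vector.
    loc = np.zeros(p) if location is None else np.asarray(location, dtype=float)
    # Uniform directions: normalized standard normal vectors.
    u = rng.standard_normal((n, p))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    # Radius: R**(2*beta) follows Gamma(shape=p/(2*beta), scale=2).
    r = rng.gamma(shape=p / (2 * beta), scale=2.0, size=n) ** (1 / (2 * beta))
    # Map through the Cholesky factor so the shape follows the scatter matrix.
    chol = np.linalg.cholesky(scatter)
    return loc + r[:, None] * (u @ chol.T)

X = sketch_powerexp(n=100, scatter=np.eye(3), beta=4)
print(X.shape)
```

With beta = 1 the squared radius is chi-square with p degrees of freedom, so the sketch reduces to ordinary multivariate normal sampling, consistent with the parameter description above.
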
icspylab.distributions.unifsphere(n, p)[source]

Generate n vectors uniformly distributed on the unit sphere in dimension p.

Parameters:
  • n (int) – Number of observations.

  • p (int) – Dimension of the sphere.

Returns:

ndarray (n, p)

Example

>>> from icspylab.distributions import unifsphere
>>> X = unifsphere(n=100, p=2)
>>> print(X.shape)
(100, 2)
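
A standard way to obtain uniformly distributed points on the unit sphere is to normalize standard normal vectors, since the standard normal distribution is rotationally symmetric. The sketch below illustrates that technique; it is not necessarily how unifsphere is implemented, and the helper name sketch_unifsphere and the seed argument are hypothetical.

```python
import numpy as np

def sketch_unifsphere(n, p, seed=0):
    """Hypothetical sketch: uniform points on the unit sphere (not icspylab code)."""
    rng = np.random.default_rng(seed)
    # Standard normal vectors have a rotation-invariant distribution.
    z = rng.standard_normal((n, p))
    # Dividing each row by its Euclidean norm projects it onto the sphere.
    return z / np.linalg.norm(z, axis=1, keepdims=True)

X = sketch_unifsphere(n=100, p=2)
print(np.allclose(np.linalg.norm(X, axis=1), 1.0))  # True
```
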