Distributions
Module containing functions to generate some distributions.
- icspylab.distributions.generate_gaussian_mixture(eps, mu, sigma, n, p)[source]
Generates data from a Gaussian mixture model (GMM) with the given parameters.
- Parameters:
eps (list of float) – Proportions of points assigned to each cluster (must sum to 1).
mu (list of np.ndarray) – List of mean vectors (centroids) for each cluster (size k).
sigma (list of np.ndarray) – List of covariance matrices (size k).
n (int) – Total number of data points to generate.
p (int) – Dimension of the data, including noise.
- Returns:
- A tuple containing:
data_with_noise (ndarray): Matrix (n, p) of generated data points.
labels (ndarray): Array of cluster labels (size n).
- Return type:
tuple
Example
>>> import numpy as np
>>> from icspylab.distributions import generate_gaussian_mixture
>>> eps = [0.5, 0.5]
>>> mu = [np.ones(2), np.ones(2)*10]
>>> sigma = [np.eye(2) for _ in range(2)]
>>> X, labels = generate_gaussian_mixture(eps, mu, sigma, n=1000, p=6)
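To make the roles of eps, mu, sigma, n and p concrete, here is a minimal NumPy sketch of how such a mixture could be drawn. It is not the icspylab implementation: the function name sketch_gaussian_mixture and the seed argument are hypothetical, and the treatment of the extra p - d columns as independent standard normal noise is an assumption, since the documentation only says that p includes noise.

```python
import numpy as np

def sketch_gaussian_mixture(eps, mu, sigma, n, p, seed=0):
    """Hypothetical sketch for illustration; not the icspylab code."""
    rng = np.random.default_rng(seed)
    k = len(eps)        # number of clusters
    d = len(mu[0])      # dimension carrying the cluster structure
    # Draw a cluster label for every point according to the proportions eps.
    labels = rng.choice(k, size=n, p=eps)
    X = np.empty((n, p))
    for j in range(k):
        idx = labels == j
        # Fill the first d columns with draws from the j-th Gaussian component.
        X[idx, :d] = rng.multivariate_normal(mu[j], sigma[j], size=idx.sum())
    # Assumption: the remaining p - d columns are standard normal noise.
    X[:, d:] = rng.standard_normal((n, p - d))
    return X, labels
```

Called with the arguments of the example above (n=1000, p=6), the sketch returns a (1000, 6) matrix whose first two columns carry the two clusters.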
- icspylab.distributions.generate_powerexp_mixture(eps, mu, sigma, beta, n, p)[source]
Generates data from a mixture of multivariate power exponential distributions (PEM) with the given parameters.
- Parameters:
eps (list of float) – Proportions of points assigned to each cluster (must sum to 1).
mu (list of np.ndarray) – List of mean vectors (centroids) for each cluster (size k).
sigma (list of np.ndarray) – List of covariance matrices (size k).
beta (float or list of float) – Shape parameters (size k if list).
n (int) – Total number of data points to generate.
p (int) – Dimension of the data, including noise.
- Returns:
- A tuple containing:
data_with_noise (ndarray): Matrix (n, p) of generated data points.
labels (ndarray): Array of cluster labels (size n).
- Return type:
tuple
Example
>>> import numpy as np
>>> from icspylab.distributions import generate_powerexp_mixture
>>> eps = [0.5, 0.5]
>>> mu = [np.ones(2), np.ones(2)*10]
>>> sigma = [np.eye(2) for _ in range(2)]
>>> X, labels = generate_powerexp_mixture(eps, mu, sigma, beta=0.8, n=1000, p=6)
- icspylab.distributions.generate_randu(n=400, seed=1)[source]
Generate a synthetic dataset based on the classical RANDU pseudo-random number generator.
RANDU is an obsolete linear congruential generator, widely cited as a textbook example of poor randomness: consecutive triples of its output fall on only 15 planes in the unit cube.
The implementation follows the standard definition described in the R datasets package manual.
- Parameters:
n (int, default=400) – Number of data points to generate.
seed (int, default=1) – Seed of the generator.
- Returns:
ndarray (n, 3)
References
Fortran Language Reference Manual (1999), Compaq.
R Core Team (datasets package), https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/randu.html
Example
>>> from icspylab.distributions import generate_randu
>>> X = generate_randu(n=100)
>>> print(X.shape)
(100, 3)
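For readers unfamiliar with RANDU, the recurrence is x_{j+1} = 65539 * x_j mod 2^31. The following sketch shows one way to turn it into an (n, 3) array. The name sketch_randu is hypothetical, and both the grouping into consecutive non-overlapping triples and the choice to start from the first iterate after the seed are assumptions rather than a description of the icspylab code.

```python
import numpy as np

def sketch_randu(n=400, seed=1):
    """Hypothetical sketch of the RANDU recurrence, for illustration only."""
    m = 2**31
    x = seed
    values = []
    for _ in range(3 * n):
        x = (65539 * x) % m    # RANDU step: x_{j+1} = 65539 * x_j mod 2^31
        values.append(x / m)   # scale to the unit interval
    # Assumption: each row is one consecutive, non-overlapping triple.
    return np.array(values).reshape(n, 3)

X = sketch_randu(n=100)
# Known defect of RANDU: 9*x - 6*y + z is an integer for every triple,
# so the points lie on a small number of parallel planes.
combo = 9 * X[:, 0] - 6 * X[:, 1] + X[:, 2]
```

The integer-valued combination in the last line is what makes RANDU a useful test case: the three coordinates look uniform individually, but jointly they are confined to a few planes.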
- icspylab.distributions.generate_student_mixture(eps, mu, sigma, df, n, p)[source]
Generates data from a Student-t mixture model (SMM) with the given parameters.
- Parameters:
eps (list of float) – Proportions of points assigned to each cluster (must sum to 1).
mu (list of ndarray) – List of mean vectors (centroids) for each cluster (size k).
sigma (list of ndarray) – List of covariance matrices (size k).
df (int or list of int) – Degrees of freedom (size k if list). Must be strictly positive integers.
n (int) – Total number of data points to generate.
p (int) – Dimension of the data, including noise.
- Returns:
- A tuple containing:
data_with_noise (ndarray): Matrix (n, p) of generated data points.
labels (ndarray): Array of cluster labels (size n).
- Return type:
tuple
Example
>>> import numpy as np
>>> from icspylab.distributions import generate_student_mixture
>>> eps = [0.5, 0.5]
>>> mu = [np.ones(2), np.ones(2)*10]
>>> sigma = [np.eye(2) for _ in range(2)]
>>> X, labels = generate_student_mixture(eps, mu, sigma, df=2, n=1000, p=6)
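A single Student-t component can be drawn with the standard normal / chi-square construction, which may help when reasoning about the effect of df. The sketch below is illustrative only: sketch_multivariate_t and its seed argument are hypothetical names, and it treats sigma as the scale matrix of the t distribution, which is an assumption about how icspylab interprets that parameter.

```python
import numpy as np

def sketch_multivariate_t(n, mu, sigma, df, seed=0):
    """Hypothetical sketch of one Student-t component; not the icspylab code."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    # Centered Gaussian draws with the requested scale matrix.
    z = rng.multivariate_normal(np.zeros(mu.size), sigma, size=n)
    # One chi-square mixing variable per observation.
    w = rng.chisquare(df, size=n)
    # Dividing by sqrt(w / df) produces the heavy tails of the t distribution.
    return mu + z / np.sqrt(w / df)[:, None]
```

Small values of df, such as df=2 in the example above, give very heavy tails; as df grows the draws approach those of a Gaussian component.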
- icspylab.distributions.multivariate_powerexp(n, scatter, location=None, beta=1)[source]
Generate n observations from a multivariate power exponential distribution.
- Parameters:
n (int) – Number of observations.
scatter (array-like) – Symmetric positive definite scatter matrix (p x p).
location (array-like, default=None) – Mean vector of dimension p.
beta (float, default=1) – Shape parameter (> 0). beta = 1 corresponds to the multivariate normal distribution; beta < 1 gives heavier tails.
- Returns:
ndarray (n, p)
References
Oja, H. (2010), Multivariate Nonparametric Methods with R, Springer.
Nordhausen, K., & Oja, H. (2011). Multivariate L1 statistical methods: The package MNM. Journal of Statistical Software, 43, 1-28.
Example
>>> import numpy as np
>>> from icspylab.distributions import multivariate_powerexp
>>> X = multivariate_powerexp(n=100, scatter=np.eye(3), beta=4)
>>> print(X.shape)
(100, 3)
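One common way to sample from this family uses the stochastic representation X = location + R * A u, where u is uniform on the unit sphere, A A^T equals the scatter matrix, and R^(2*beta) follows a Gamma distribution with shape p / (2*beta) and scale 2. The sketch below follows that construction under the parametrization in which beta = 1 is Gaussian. The name sketch_powerexp, the seed argument, and the zero-vector default for a missing location are assumptions, and the icspylab implementation may differ.

```python
import numpy as np

def sketch_powerexp(n, scatter, location=None, beta=1.0, seed=0):
    """Hypothetical sketch of power exponential sampling, for illustration only."""
    rng = np.random.default_rng(seed)
    scatter = np.asarray(scatter, dtype=float)
    p = scatter.shape[0]
    # Assumption: a missing location means the zero vector.
    location = np.zeros(p) if location is None else np.asarray(location, dtype=float)
    # Uniform directions on the unit sphere (normalized Gaussian vectors).
    u = rng.standard_normal((n, p))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    # Radius: R^(2*beta) ~ Gamma(shape=p/(2*beta), scale=2).
    r = rng.gamma(shape=p / (2 * beta), scale=2.0, size=n) ** (1 / (2 * beta))
    a = np.linalg.cholesky(scatter)   # scatter = a @ a.T
    return location + (r[:, None] * u) @ a.T
```

With beta = 1 the squared radius is chi-square with p degrees of freedom, so the sketch reduces to ordinary multivariate normal sampling, which matches the description of the beta parameter above.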
- icspylab.distributions.unifsphere(n, p)[source]
Generate n vectors uniformly distributed on the unit sphere in dimension p.
- Parameters:
n (int) – Number of observations.
p (int) – Dimension of the sphere.
- Returns:
ndarray (n, p)
Example
>>> from icspylab.distributions import unifsphere
>>> X = unifsphere(n=100, p=2)
>>> print(X.shape)
(100, 2)
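A standard way to obtain uniformly distributed points on the unit sphere is to normalize standard Gaussian vectors, because the Gaussian distribution is rotationally symmetric. The sketch below illustrates that technique; sketch_unifsphere and its seed argument are hypothetical names, and whether icspylab uses this exact method is an assumption.

```python
import numpy as np

def sketch_unifsphere(n, p, seed=0):
    """Hypothetical sketch: uniform points on the unit sphere in dimension p."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, p))
    # Dividing each row by its Euclidean norm projects it onto the sphere.
    return z / np.linalg.norm(z, axis=1, keepdims=True)

X = sketch_unifsphere(n=100, p=2)
norms = np.linalg.norm(X, axis=1)  # every entry equals 1 up to rounding
```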