Scatter
Module containing scatter matrix calculations and the Scatter class.
This module provides functions to compute scatter matrices, which are essential for the ICS algorithm. The scatter matrices implemented are the covariance matrix, the weighted covariance matrix, and the one-step Tyler shape matrix. Each is returned as a Scatter object, which holds the scatter matrix together with the location (mean) and a label describing the type of scatter matrix. To use ICS with a scatter matrix other than those provided in this module, you need to create a Scatter object yourself, because the S1 and S2 arguments are functions returning Scatter objects.
Most scatters come from the R package ICS.
- class icspylab.scatter.Scatter(location, scatter, label)[source]
Bases: object
A class to represent the scatter matrix and its related data.
- location
The mean location of the data.
- Type:
np.ndarray
- scatter
The scatter matrix.
- Type:
np.ndarray
- label
A label describing the scatter matrix.
- Type:
str
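As an illustration of how a custom scatter could be packaged, the sketch below defines a local stand-in with the three documented attributes (so that it runs without icspylab installed) and a hypothetical scatter function, `cov_biased`, that returns one. Both the stand-in dataclass and `cov_biased` are assumptions made for illustration; in real use you would construct `icspylab.scatter.Scatter(location, scatter, label)` instead.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Scatter:
    """Local stand-in mirroring the documented attributes (not the library class)."""
    location: np.ndarray
    scatter: np.ndarray
    label: str


def cov_biased(X):
    """Hypothetical custom scatter: covariance with a 1/n denominator."""
    X = np.asarray(X, dtype=float)
    location = X.mean(axis=0)
    centered = X - location
    scatter = centered.T @ centered / X.shape[0]
    return Scatter(location=location, scatter=scatter, label="cov_biased")


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
S = cov_biased(X)
print(S.label, S.location.shape, S.scatter.shape)
```

A function with this shape (data matrix in, Scatter-like object out) is what the module description says the S1 and S2 arguments expect.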
- icspylab.scatter.cov(X, location=True)[source]
Compute the covariance matrix.
- Parameters:
X (numpy.ndarray) – The data matrix.
location (bool) – (default: True) Whether to include the mean location.
- Returns:
An object containing the location and scatter matrix.
- Return type:
Scatter
- icspylab.scatter.cov4(X, location=True)[source]
Compute the weighted covariance matrix cov4, which internally uses covW with alpha=1 and cf=1 / (p + 2), where p is the number of columns of X.
- Parameters:
X (numpy.ndarray) – The data matrix.
location (bool) – (default: True) Whether to include the mean location.
- Returns:
An object containing the location and custom weighted scatter matrix.
- Return type:
Scatter
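The consistency factor 1 / (p + 2) can be motivated numerically: for Gaussian data, the expectation of the squared Mahalanobis distance times the outer product of the centered observation is (p + 2) times the covariance, so the rescaled estimate should be close to the ordinary covariance matrix. The following is a plain NumPy sketch of the covW formula with alpha=1, not the library's code; `cov4_sketch` is a hypothetical name, and computing the Mahalanobis distances from the sample covariance with an n - 1 denominator is an assumption.

```python
import numpy as np


def cov4_sketch(X):
    """NumPy sketch of the covW formula with alpha=1 and cf=1/(p+2)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    centered = X - X.mean(axis=0)
    # Assumption: distances use the sample covariance (n - 1 denominator).
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    # Squared Mahalanobis distance of every row.
    d2 = np.einsum("ij,jk,ik->i", centered, S_inv, centered)
    # Weight w(d) = d, consistency factor 1 / (p + 2).
    return (centered * d2[:, None]).T @ centered / (n * (p + 2))


rng = np.random.default_rng(1)
mixing = np.array([[2.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.0, 0.3, 0.5]])
X = rng.normal(size=(20000, 3)) @ mixing  # correlated Gaussian sample
print(np.round(cov4_sketch(X), 2))
print(np.round(np.cov(X, rowvar=False), 2))
```

On non-Gaussian data the two matrices generally differ, and that difference is what ICS exploits when it compares two scatters.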
- icspylab.scatter.covAxis(X, location=True)[source]
Compute the one-step Tyler shape matrix, which internally uses covW with alpha=-1 and cf=p, where p is the number of columns of X.
- Parameters:
X (numpy.ndarray) – The data matrix.
location (bool) – (default: True) Whether to include the mean location.
- Returns:
An object containing the location and scatter matrix.
- Return type:
Scatter
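With alpha=-1 each centered observation is divided by its squared Mahalanobis distance, so every observation contributes equally regardless of how far it lies from the mean. One consequence that is easy to check: if S is the covariance used for the distances, the trace of S⁻¹ times the result equals p. The NumPy sketch below illustrates this; `cov_axis_sketch` is a hypothetical name, and the n - 1 denominator for S is an assumption rather than a statement about the library's internals.

```python
import numpy as np


def cov_axis_sketch(X):
    """NumPy sketch of the covW formula with alpha=-1 and cf=p."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    centered = X - X.mean(axis=0)
    # Assumption: distances use the sample covariance (n - 1 denominator).
    S = np.cov(X, rowvar=False)
    d2 = np.einsum("ij,jk,ik->i", centered, np.linalg.inv(S), centered)
    # Weight w(d) = 1 / d, consistency factor p.
    return p * (centered / d2[:, None]).T @ centered / n


rng = np.random.default_rng(2)
X = rng.standard_t(df=5, size=(500, 4))  # heavy-tailed sample, p = 4
V = cov_axis_sketch(X)
S = np.cov(X, rowvar=False)
trace_ratio = np.trace(np.linalg.solve(S, V))
print(trace_ratio)  # equals p up to floating-point error
```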
- icspylab.scatter.covW(X, location=True, alpha=1, cf=1)[source]
Estimate the scatter matrix with a one-step M-estimator, using the mean and the covariance matrix as the starting point. For more details, see the R documentation of the ICS package (function covW).
- Parameters:
X (numpy.ndarray) – The data matrix.
location (bool) – (default: True) Whether to include the mean location.
alpha (float) – (default: 1) Parameter of the one-step M-estimator.
cf (float) – (default: 1) Consistency factor of the one-step M-estimator.
- Returns:
An object containing the location and weighted scatter matrix.
- Return type:
Scatter
- Details:
For an \(n \times p\) data matrix \(X\), it is given by: \(CovW(X) = \frac{1}{n} \, cf \sum_{i=1}^{n} w(D^2(x_i)) (x_i - \overline{x})^T (x_i - \overline{x})\)
- where:
\(n\) is the number of observations,
\(x_i\) is the i-th observation vector (the i-th row of \(X\)),
\(\overline{x}\) is the mean vector of all observations,
\(w(d) = d^{\alpha}\) is a non-negative and continuous weight function applied to the squared Mahalanobis distance \(D^2(x_i)\),
\(cf\) is a consistency factor.
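The formula above can be written in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the library's implementation: `cov_w_sketch` is a hypothetical name, it returns a plain `(location, scatter)` tuple rather than a Scatter object, and it assumes the squared Mahalanobis distances are computed from the mean and the sample covariance with an n - 1 denominator.

```python
import numpy as np


def cov_w_sketch(X, alpha=1.0, cf=1.0):
    """Return (location, scatter) following the CovW formula."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    location = X.mean(axis=0)
    centered = X - location
    # Assumption: starting covariance uses the n - 1 denominator.
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    # D^2(x_i) for every row of X.
    d2 = np.einsum("ij,jk,ik->i", centered, S_inv, centered)
    weights = d2 ** alpha  # w(d) = d**alpha
    scatter = cf * (centered * weights[:, None]).T @ centered / n
    return location, scatter


rng = np.random.default_rng(3)
X = rng.normal(size=(300, 2))
# Sanity check: alpha=0 gives unit weights, i.e. the 1/n covariance.
_, unweighted = cov_w_sketch(X, alpha=0.0, cf=1.0)
print(np.allclose(unweighted, np.cov(X, rowvar=False, bias=True)))
```

Setting alpha=0 and cf=1 is a convenient check because all weights become 1 and the sum reduces to the covariance with a 1/n denominator.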
References
Tyler, D.E., Critchley, F., Dümbgen, L., and Oja, H. (2009), Invariant coordinate selection, Journal of the Royal Statistical Society, Series B, 71, 549-592. <https://doi.org/10.1111/j.1467-9868.2009.00706.x>.