ICS

Module containing the main Invariant Coordinate Selection (ICS) Class and associated methods.

The ICS class provides methods to fit the ICS model to data, transform data with the fitted model, and print a detailed summary of the results. This module relies on scatter matrices (defined in the Scatter page). The ICS class supports four algorithms for applying ICS to data (‘eigh’, ‘standard’, ‘whiten’, and ‘QR’), selected through the algorithm parameter at instantiation. The choice of scatter matrices, centering of the data, and sign fixing can also be configured.

This implementation is based on the function ICS-S3 from the R package ICS. For more details about the algorithms ‘standard’, ‘whiten’ and ‘QR’, as well as the ‘fix_signs’ argument, see the R package documentation (function ICS-S3).

class icspylab.ics.ICS(S1='cov', S2='cov4', algorithm='eigh', center=False, fix_signs='scores', S1_args=None, S2_args=None, method_select=None, select_args=None)

Bases: TransformerMixin, BaseEstimator

Invariant Coordinate Selection (ICS) Class and associated methods.

This class implements the ICS algorithm: it transforms the data, via the simultaneous diagonalization of two scatter matrices, into an invariant coordinate system or independent components, depending on the underlying assumptions. It supports various scatter matrix calculations and offers multiple algorithms for applying ICS.
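The invariance property described above can be checked numerically. The following is a minimal sketch that uses only NumPy and SciPy rather than icspylab itself; the helpers cov4 and ics_scores are hypothetical, and the cov4 formula is the usual textbook definition, assumed here rather than taken from the icspylab source. It computes centered invariant coordinates for the pair (cov, cov4) on a data set and on an affine transformation of it, then compares the two results.

```python
import numpy as np
from scipy.linalg import eigh


def cov4(X):
    """Fourth-moment scatter (assumed textbook definition, not icspylab's code)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S1 = np.cov(X, rowvar=False)
    # Squared Mahalanobis distances with respect to the mean and covariance.
    r2 = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(S1), Xc)
    return (Xc * r2[:, None]).T @ Xc / (n * (p + 2))


def ics_scores(X):
    """Hypothetical helper: centered invariant coordinates for (cov, cov4)."""
    S1 = np.cov(X, rowvar=False)
    S2 = cov4(X)
    # Generalized eigenproblem S2 v = rho S1 v; eigenvectors satisfy V^T S1 V = I.
    rho, V = eigh(S2, S1)
    order = np.argsort(rho)[::-1]  # sort by decreasing generalized kurtosis
    W = V[:, order].T
    return rho[order], (X - X.mean(axis=0)) @ W.T


rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.uniform(-1, 1, n),
    rng.standard_normal(n),
    rng.exponential(1.0, n),
])

# An arbitrary invertible affine transformation of the data.
A = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 3.0]])
b = np.array([5.0, -2.0, 1.0])
X_affine = X @ A.T + b

rho, Z = ics_scores(X)
rho_affine, Z_affine = ics_scores(X_affine)

same_kurtosis = np.allclose(rho, rho_affine)
same_scores_up_to_sign = np.allclose(np.abs(Z), np.abs(Z_affine), atol=1e-6)
print(same_kurtosis, same_scores_up_to_sign)
```

The scores are compared in absolute value because each invariant coordinate is only determined up to its sign, which is what the fix_signs parameter resolves.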

Parameters:
  • S1 (callable or str, default='cov') – First scatter estimator. If a string is provided, it must be one of the predefined scatter estimators (see the “Available scatter estimators” section below). Otherwise, it must be a callable returning a Scatter object.

  • S2 (callable or str, default='cov4') – Second scatter estimator. If a string is provided, it must be one of the predefined scatter estimators (see the “Available scatter estimators” section below). Otherwise, it must be a callable returning a Scatter object.

  • algorithm ({'eigh', 'standard', 'whiten', 'QR'}, default='eigh') – The algorithm used for computing the invariant coordinates.

  • center (bool, default=False) – Whether the invariant coordinates should be centered with respect to the first location. Centering is only applicable if the first scatter object contains a location component; otherwise center is set to False. Note that this only affects the scores of the invariant components (attribute self.scores_), not the generalized kurtosis values (attribute self.kurtosis_).

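  • fix_signs ({'scores', 'W'}, default='scores') – How to fix the signs of the invariant coordinates. Possible values are 'scores' to fix the signs based on (generalized) skewness values of the coordinates, or 'W' to fix the signs based on the coefficient matrix of the linear transformation.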

  • S1_args (dict or None, default=None) – Additional arguments for S1.

  • S2_args (dict or None, default=None) – Additional arguments for S2.

  • method_select ({'median', 'normal', 'unimodal'} or callable or None, default=None) – The criteria to select the invariant components. If None (default), all components are kept. If a string is provided, it must be either “median” to use the median eigenvalue criterion, “normal” to apply normality tests to the components, or “unimodal” to apply unimodality tests to the components. If callable, it must return a ComponentSelect object. For more information, refer to icspylab.comp_select.

  • select_args (dict or None, default=None) – Additional arguments for method_select.

components_

Invariant axes in feature space: the transformation matrix in which each row contains the coefficients of the linear transformation to the corresponding invariant coordinate. The components are sorted by decreasing kurtosis.

Type:

ndarray

n_components_

Number of components kept.

Type:

int

component_names_

Names of components kept.

Type:

list

kurtosis_

Generalized kurtosis values.

Type:

ndarray

skewness_

Skewness values.

Type:

ndarray

n_features_in_

Number of features seen during fit.

Type:

int

feature_names_in_

Names of features seen during fit. Defined only when X has feature names that are all strings.

Type:

ndarray

S1_X_

Fitted scatter S1. Defined only when center=True.

Type:

ndarray

criteria_out_

Summary of the component selection step. Defined only when method_select is not None.

Type:

dict or None

Available scatter estimators are (see icspylab.scatter for a full description of the available scatters):
  • 'cov': classical covariance matrix and mean

  • 'cov4': fourth-moment estimator

  • 'covAxis': one-step Tyler shape estimator

  • 'covW': one-step M-estimator using mean and covariance matrix as starting point

  • 'mcd': Minimum Covariance Determinant

  • 'tM': location and scatter for a multivariate t-distribution

  • 'tcov': one-step pairwise M-estimator

  • 'tcovAxis': one-step pairwise M-estimator with the same weights as covAxis
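Several of the estimators listed above are one-step M-estimators that reweight the centered observations by a function of their squared Mahalanobis distance. The sketch below illustrates that idea with NumPy only. The function one_step_m_scatter and its parameters alpha and cf are hypothetical names, the 1/n normalization is an assumption, and the weights follow the convention of the R package ICS as far as it is documented there (cov4 with weight r² and factor 1/(p+2), covAxis with weight 1/r² and factor p). It is not the icspylab implementation.

```python
import numpy as np


def one_step_m_scatter(X, alpha, cf):
    """Hypothetical one-step M-estimator of scatter.

    Weights each centered observation by r2 ** alpha, where r2 is its squared
    Mahalanobis distance with respect to the sample mean and covariance, and
    scales the result by the consistency factor cf (1/n normalization assumed).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    r2 = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(S), Xc)
    weights = r2 ** alpha
    return cf * (Xc * weights[:, None]).T @ Xc / n


def cov4_sketch(X):
    # Fourth-moment scatter: weight r2, consistency factor 1 / (p + 2).
    p = X.shape[1]
    return one_step_m_scatter(X, alpha=1.0, cf=1.0 / (p + 2))


def cov_axis_sketch(X):
    # One-step Tyler-type shape estimator: weight 1 / r2, factor p.
    p = X.shape[1]
    return one_step_m_scatter(X, alpha=-1.0, cf=float(p))


rng = np.random.default_rng(42)
X = rng.standard_normal((300, 4)) * np.array([1.0, 2.0, 3.0, 4.0])

S4 = cov4_sketch(X)
SA = cov_axis_sketch(X)

# Affine equivariance: S(X A^T) equals A S(X) A^T for an invertible A.
A = np.triu(np.ones((4, 4))) + np.eye(4)
S4_affine = cov4_sketch(X @ A.T)
print(np.allclose(S4_affine, A @ S4 @ A.T))
```

Affine equivariance is the property that makes a scatter pair usable for ICS, which is why the last line checks it explicitly.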

Supported algorithms:
  1. eigh: directly performs the simultaneous diagonalization of the two scatter matrices using scipy.linalg’s function eigh(\(S_2(X)\), \(S_1(X)\))

  2. standard: performs the spectral decomposition of the symmetric matrix \(S_1(X)^{-1/2}S_2(X)S_1(X)^{-1/2}\)

  3. whiten: whitens the data with respect to the first scatter matrix before computing the second scatter matrix.

  4. QR: numerically stable algorithm based on the QR decomposition, available for a common family of scatter pairs: S1 must be cov and S2 must be one of cov4, covW, or covAxis. See Archimbaud et al. (2023) for details.
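The first three algorithms can be compared on a small example. This is a sketch under stated assumptions, not the icspylab code: it uses NumPy and SciPy directly, the helpers cov4 and inv_sqrt are hypothetical, and cov4 follows the usual textbook definition. The QR variant is not covered. For an affine equivariant second scatter, the three routes should yield the same generalized kurtosis values.

```python
import numpy as np
from scipy.linalg import eigh


def cov4(X):
    """Fourth-moment scatter (assumed textbook definition)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S1 = np.cov(X, rowvar=False)
    r2 = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(S1), Xc)
    return (Xc * r2[:, None]).T @ Xc / (n * (p + 2))


def inv_sqrt(S):
    """Symmetric inverse square root via the spectral decomposition."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


rng = np.random.default_rng(1)
n = 1000
sources = np.column_stack([
    rng.uniform(-1, 1, n),
    rng.standard_normal(n),
    rng.laplace(size=n),
])
mixing = np.array([[1.0, 0.5, 0.0], [0.0, 1.0, 0.5], [0.5, 0.0, 1.0]])
X = sources @ mixing.T

S1 = np.cov(X, rowvar=False)
S2 = cov4(X)

# 'eigh': generalized symmetric eigenproblem S2 v = rho S1 v.
rho_eigh = eigh(S2, S1, eigvals_only=True)[::-1]

# 'standard': spectral decomposition of S1^{-1/2} S2 S1^{-1/2}.
S1_is = inv_sqrt(S1)
M = S1_is @ S2 @ S1_is
vals, U = np.linalg.eigh(M)
order = np.argsort(vals)[::-1]
rho_standard = vals[order]
W = U[:, order].T @ S1_is  # rows are the invariant axes

# 'whiten': whiten with S1 first, then compute S2 on the whitened data.
Y = X @ S1_is
rho_whiten = np.linalg.eigvalsh(cov4(Y))[::-1]

print(rho_eigh)
print(np.allclose(rho_eigh, rho_standard), np.allclose(rho_eigh, rho_whiten))
```

The matrix W obtained this way diagonalizes both scatters at once: W S1 Wᵀ is the identity and W S2 Wᵀ is diagonal with the generalized kurtosis values on its diagonal.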

References

  • Tyler, D.E., Critchley, F., Dumbgen, L. and Oja, H. (2009) Invariant Co-ordinate Selection. Journal of the Royal Statistical Society, Series B, 71(3), 549–592. doi:10.1111/j.1467-9868.2009.00706.x.

  • Nordhausen, K., Oja, H., & Tyler, D. E. (2008). Tools for exploring multivariate data: The package ICS. Journal of Statistical Software, 28, 1-31.

  • For algorithm = ‘QR’, refer to Archimbaud, A., Drmac, Z., Nordhausen, K., Radojcic, U. and Ruiz-Gazen, A. (2023) Numerical Considerations and a New Implementation for Invariant Coordinate Selection. SIAM Journal on Mathematics of Data Science, 5(1), 97–121. doi:10.1137/22M1498759.

Example

>>> from sklearn.datasets import load_iris
>>> from icspylab import ICS
>>> iris = load_iris()
>>> X = iris.data
>>> ics = ICS()
>>> ics.fit(X)
>>> print(ics.kurtosis_)
[1.20739878 1.0269412  0.9292235  0.74046722]

describe()

Print a summary of the ICS model.

The summary reports the algorithm used, whether the data were centered, and how the signs were fixed. It also displays the generalized kurtosis values, the transformation matrix, the transformed data, and the skewness values.

fit(X, y=None)

Fit the ICS model to the data.

This function relies on several helper methods to perform the ICS fit: _compute_first_scatter, _compute_second_scatter, _transform_second_scatter, _compute_transformation, _compute_transformation_qr, _fix_component_signs.

Parameters:
  • X (array-like) – Data to fit the ICS model, where rows are samples and columns are features.

  • y (Ignored) – Not used, present for API consistency by convention.

Returns:

The fitted ICS object.

Return type:

self

fit_transform(X, y=None)

Fit the ICS model and transform the data using the fitted ICS model.

Parameters:
  • X (array-like) – Data to fit and transform.

  • y (Ignored) – Not used, present for API consistency by convention.

Returns:

Transformed matrix in which columns contain the scores of the selected invariant coordinates.

Return type:

ndarray

get_feature_names_out(input_features=None)

inverse_transform(X)

Transform data back to its original space.

In other words, return an X_original whose transform would be X.

Parameters:

X (array-like) – Transformed data of shape (n_samples, n_components), where n_samples is the number of samples and n_components is the number of components.

Returns:

Original data of shape (n_samples, n_features), where n_samples is the number of samples and n_features is the number of features.

Return type:

ndarray
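The round trip between the original space and the invariant coordinates can be sketched with NumPy and SciPy alone. The example assumes that all components are kept and that the scores were centered at the mean; how icspylab handles a reduced set of components is not shown here, and the cov4 helper is a hypothetical stand-in based on the usual textbook definition.

```python
import numpy as np
from scipy.linalg import eigh


def cov4(X):
    """Fourth-moment scatter (assumed textbook definition)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S1 = np.cov(X, rowvar=False)
    r2 = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(S1), Xc)
    return (Xc * r2[:, None]).T @ Xc / (n * (p + 2))


rng = np.random.default_rng(7)
X = rng.exponential(1.0, size=(400, 3)) @ np.array(
    [[1.0, 0.2, 0.0], [0.0, 1.0, 0.3], [0.4, 0.0, 1.0]]
)

loc = X.mean(axis=0)
S1 = np.cov(X, rowvar=False)
S2 = cov4(X)

# Transformation matrix: rows are the invariant axes, sorted by decreasing kurtosis.
rho, V = eigh(S2, S1)
W = V[:, ::-1].T

# Forward transform: centered scores.
Z = (X - loc) @ W.T

# Because W S1 W^T = I, the inverse of W is S1 W^T, so no explicit inversion is needed.
X_back = Z @ W @ S1 + loc

print(np.allclose(X_back, X))
```

Using the identity W⁻¹ = S1 Wᵀ avoids a separate matrix inversion, although calling np.linalg.inv(W) would give the same result.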

plot_kurtosis(**kwargs)

Plot the generalized kurtosis values.

transform(X, y=None)

Transform the data using the fitted ICS model.

This function relies on the helper method _center_data.

Parameters:
  • X (array-like) – Data to transform.

  • y (Ignored) – Not used, present for API consistency by convention.

Returns:

Transformed matrix in which columns contain the scores of the selected invariant coordinates.

Return type:

ndarray