ICS
Module containing the main Invariant Coordinate Selection (ICS) Class and associated methods.
The ICS class provides methods to fit the ICS model to data, transform data using the fitted model, and print a detailed summary of the results. This module relies on scatter matrices (defined in the Scatter page). The ICS class supports several algorithms for applying ICS to data ('eigh', 'standard', 'whiten', and 'QR'), selected through the algorithm parameter at instantiation. The choice of scatter matrices, centering of the data, and sign fixing can also be configured.
This implementation is based on the function ICS-S3 from the R package ICS. For more details about the algorithms ‘standard’, ‘whiten’ and ‘QR’, as well as the ‘fix_signs’ argument, see the R package documentation (function ICS-S3).
- class icspylab.ics.ICS(S1='cov', S2='cov4', algorithm='eigh', center=False, fix_signs='scores', S1_args=None, S2_args=None, method_select=None, select_args=None)[source]
Bases: TransformerMixin, BaseEstimator
Invariant Coordinate Selection (ICS) Class and associated methods.
This class implements the ICS algorithm: it transforms the data, via the simultaneous diagonalization of two scatter matrices, into an invariant coordinate system or independent components, depending on the underlying assumptions. It supports various scatter matrix calculations and offers multiple algorithms for applying ICS.
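The invariance mentioned above can be illustrated with plain NumPy, without using icspylab itself. The sketch below assumes the pair (cov, cov4) and the usual fourth-moment definition of cov4; the helper names cov4 and generalized_kurtosis are hypothetical and are not part of the library. It checks that the generalized kurtosis values, taken here as the eigenvalues of \(S_1(X)^{-1}S_2(X)\), do not change when the data undergo an invertible affine transformation.

```python
import numpy as np

def cov4(X):
    """Fourth-moment scatter (assumed standard definition, scaled by 1 / (p + 2))."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # Squared Mahalanobis distances with respect to the sample covariance
    r2 = np.einsum("ij,ij->i", Xc @ np.linalg.inv(np.cov(X, rowvar=False)), Xc)
    return (Xc * r2[:, None]).T @ Xc / (n * (p + 2))

def generalized_kurtosis(X):
    """Eigenvalues of S1^{-1} S2 for the pair (cov, cov4), in decreasing order."""
    S1, S2 = np.cov(X, rowvar=False), cov4(X)
    vals = np.linalg.eigvals(np.linalg.solve(S1, S2)).real
    return np.sort(vals)[::-1]

rng = np.random.default_rng(2)
X = rng.exponential(size=(400, 3))            # skewed toy data
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, -1.0],
              [1.0, 0.0, 3.0]])               # invertible linear map (det = 5)
b = np.array([10.0, -5.0, 0.5])               # shift
X_affine = X @ A.T + b

k_original = generalized_kurtosis(X)
k_affine = generalized_kurtosis(X_affine)
print(np.allclose(k_original, k_affine))
```

Both scatters are affine equivariant, so the matrix \(S_1^{-1}S_2\) computed on the transformed data is similar to the one computed on the original data, which is why the two sets of eigenvalues agree.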
- Parameters:
S1 (callable or str, default='cov') – First scatter estimator. If a string is provided, it must be one of the predefined scatter estimators (see the “Available scatter estimators” section below). Otherwise, it must be a callable returning a Scatter object.
S2 (callable or str, default='cov4') – Second scatter estimator. If a string is provided, it must be one of the predefined scatter estimators (see the “Available scatter estimators” section below). Otherwise, it must be a callable returning a Scatter object.
algorithm ({'eigh', 'standard', 'whiten', 'QR'}, default='eigh') – The algorithm used for computing the invariant coordinates.
center (bool, default=False) – Whether the invariant coordinates should be centered with respect to the first location. Centering is only applicable if the first scatter object contains a location component; otherwise center is set to False. Note that this only affects the scores of the invariant components (attribute self.scores_), not the generalized kurtosis values (attribute self.kurtosis_).
fix_signs ({'scores', 'W'}, default='scores') – How to fix the signs of the invariant coordinates. Possible values are 'scores' to fix the signs based on (generalized)
S1_args (dict or None, default=None) – Additional arguments for S1.
S2_args (dict or None, default=None) – Additional arguments for S2.
method_select ({'median', 'normal', 'unimodal'} or callable or None, default=None) – The criterion used to select the invariant components. If None (default), all components are kept. If a string is provided, it must be 'median' to use the median eigenvalue criterion, 'normal' to apply normality tests to the components, or 'unimodal' to apply unimodality tests to the components. If callable, it must return a ComponentSelect object. For more information, refer to icspylab.comp_select.
select_args (dict or None, default=None) – Additional arguments for method_select.
- components_
Invariant axes in feature space: the transformation matrix in which each row contains the coefficients of the linear transformation to the corresponding invariant coordinate. The components are sorted by decreasing kurtosis.
- Type:
ndarray
- n_components_
Number of components kept.
- Type:
int
- component_names_
Names of components kept.
- Type:
list
- kurtosis_
Generalized kurtosis values.
- Type:
ndarray
- skewness_
Skewness values.
- Type:
ndarray
- n_features_in_
Number of features seen during fit.
- Type:
int
- feature_names_in_
Names of features seen during fit. Defined only when X has feature names that are all strings.
- Type:
ndarray
- S1_X_
Fitted scatter S1. Defined only when center=True.
- Type:
ndarray
- criteria_out_
Summary of the component selection step. Defined only when method_select is not None.
- Type:
dict or None
- Available scatter estimators are (see icspylab.scatter for a full description of the available scatters):
'cov': classical covariance matrix and mean
'cov4': fourth-moment estimator
'covAxis': one-step Tyler shape estimator
'covW': one-step M-estimator using mean and covariance matrix as starting point
'mcd': Minimum Covariance Determinant
'tM': location and scatter for a multivariate t-distribution
'tcov': one-step pairwise M-estimator
'tcovAxis': one-step pairwise M-estimator with the same weights as covAxis
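To make the default second scatter more concrete, here is a minimal NumPy sketch of a fourth-moment estimator. It assumes the usual definition, \(\frac{1}{n(p+2)}\sum_i r_i^2 (x_i - \bar{x})(x_i - \bar{x})^\top\), where \(r_i\) is the Mahalanobis distance of observation \(i\) with respect to the sample mean and covariance. The function below is a hypothetical stand-in written for illustration, not the icspylab.scatter implementation.

```python
import numpy as np

def cov4(X):
    """Fourth-moment scatter: sum_i r_i^2 (x_i - mean)(x_i - mean)^T / (n (p + 2))."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = np.cov(X, rowvar=False)               # classical covariance (n - 1 denominator)
    # Squared Mahalanobis distances r_i^2 = (x_i - mean)^T S^{-1} (x_i - mean)
    r2 = np.einsum("ij,ij->i", Xc @ np.linalg.inv(S), Xc)
    return (Xc * r2[:, None]).T @ Xc / (n * (p + 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((20000, 3))           # Gaussian toy data
S1 = np.cov(X, rowvar=False)
S2 = cov4(X)
print(np.round(S1, 2))
print(np.round(S2, 2))
```

With the 1 / (p + 2) scaling, the two matrices should be close for Gaussian data, so generalized kurtosis values near 1 indicate directions that look Gaussian with respect to this scatter pair.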
- Supported algorithms:
eigh: directly performs the simultaneous diagonalization of the two scatter matrices using scipy.linalg’s function eigh, called as eigh(\(S_2(X)\), \(S_1(X)\))
standard: performs the spectral decomposition of the symmetric matrix \(S_1(X)^{-1/2}S_2(X)S_1(X)^{-1/2}\)
whiten: whitens the data with respect to the first scatter matrix before computing the second scatter matrix.
QR: numerically stable algorithm based on the QR algorithm for a common family of scatter pairs. It applies when S1 is cov and S2 is one of cov4, covW, or covAxis. See Archimbaud et al. (2023) for details.
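The 'standard' and 'whiten' descriptions above can be sketched in NumPy as follows. This is an illustration under stated assumptions (the pair cov and cov4, toy data, and hypothetical helpers cov4 and inv_sqrt), not the library's internal code. It shows that the resulting matrix W diagonalizes both scatters at once and that the two routes give the same generalized kurtosis values.

```python
import numpy as np

def cov4(X):
    """Fourth-moment scatter (assumed standard definition, scaled by 1 / (p + 2))."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    r2 = np.einsum("ij,ij->i", Xc @ np.linalg.inv(np.cov(X, rowvar=False)), Xc)
    return (Xc * r2[:, None]).T @ Xc / (n * (p + 2))

def inv_sqrt(S):
    """Symmetric inverse square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs / np.sqrt(vals)) @ vecs.T

rng = np.random.default_rng(1)
# Toy data: a correlated Gaussian cloud with a small shifted cluster
mix = np.array([[2.0, 0.5, 0.0], [0.0, 1.0, 0.3], [0.0, 0.0, 0.5]])
X = rng.standard_normal((500, 3)) @ mix
X[:25, 0] += 8.0

S1, S2 = np.cov(X, rowvar=False), cov4(X)
S1_inv_sqrt = inv_sqrt(S1)

# 'standard': spectral decomposition of S1^{-1/2} S2 S1^{-1/2}
vals, U = np.linalg.eigh(S1_inv_sqrt @ S2 @ S1_inv_sqrt)
order = np.argsort(vals)[::-1]                # decreasing generalized kurtosis
kurtosis_std, U = vals[order], U[:, order]
W = U.T @ S1_inv_sqrt                         # rows are the invariant axes

# 'whiten': whiten with S1 first, then compute S2 on the whitened data
Y = (X - X.mean(axis=0)) @ S1_inv_sqrt
kurtosis_whiten = np.sort(np.linalg.eigvalsh(cov4(Y)))[::-1]

print(np.round(W @ S1 @ W.T, 6))              # close to the identity
print(np.round(W @ S2 @ W.T, 6))              # close to diag(kurtosis_std)
```

The 'eigh' route solves the same generalized eigenproblem \(S_2 v = \lambda S_1 v\) in a single call, so it is expected to agree with both results up to numerical error and the signs of the rows of W.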
References
Tyler, D.E., Critchley, F., Dumbgen, L. and Oja, H. (2009) Invariant Co-ordinate Selection. Journal of the Royal Statistical Society, Series B, 71(3), 549–592. doi:10.1111/j.1467-9868.2009.00706.x.
Nordhausen, K., Oja, H., & Tyler, D. E. (2008). Tools for exploring multivariate data: The package ICS. Journal of Statistical Software, 28, 1-31.
For algorithm = ‘QR’, refer to Archimbaud, A., Drmac, Z., Nordhausen, K., Radojcic, U. and Ruiz-Gazen, A. (2023) Numerical Considerations and a New Implementation for Invariant Coordinate Selection. SIAM Journal on Mathematics of Data Science, 5(1), 97–121. doi:10.1137/22M1498759.
Example
>>> from sklearn.datasets import load_iris
>>> from icspylab import ICS
>>> iris = load_iris()
>>> X = iris.data
>>> ics = ICS()
>>> ics.fit(X)
>>> print(ics.kurtosis_)
[1.20739878 1.0269412 0.9292235 0.74046722]
- describe()[source]
Print a summary of the ICS model.
This includes the algorithm used, whether the data was centered, and how the signs were fixed. It also displays the generalized kurtosis, the transformation matrix, the transformed data, and the skewness of the data.
- fit(X, y=None)[source]
Fit the ICS model to the data.
This function relies on several helper methods to perform the ICS fit: _compute_first_scatter, _compute_second_scatter, _transform_second_scatter, _compute_transformation, _compute_transformation_qr, _fix_component_signs.
- Parameters:
X (array-like) – Data to fit the ICS model, where rows are samples and columns are features.
y (Ignored) – Not used, present for API consistency by convention.
- Returns:
The fitted ICS object.
- Return type:
self
- fit_transform(X, y=None)[source]
Fit the ICS model and transform the data using the fitted ICS model.
- Parameters:
X (array-like) – Data to fit and transform.
y (Ignored) – Not used, present for API consistency by convention.
- Returns:
Transformed matrix in which columns contain the scores of the selected invariant coordinates.
- Return type:
ndarray
- inverse_transform(X)[source]
Transform data back to its original space.
In other words, return an X_original whose transform would be X.
- Parameters:
X (array-like) – Transformed data of shape (n_samples, n_components), where n_samples is the number of samples and n_components is the number of components.
- Returns:
Original data of shape (n_samples, n_features), where n_samples is the number of samples and n_features is the number of features.
- Return type:
ndarray
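The relationship between the forward and inverse maps can be sketched as below. The sketch assumes that all components are kept, that no centering is applied, and that the forward map is scores = X @ W.T, with W standing in for components_. The matrix W and both helper functions are hypothetical and are not taken from the library.

```python
import numpy as np

def transform_sketch(X, W):
    """Project each row of X onto the rows of W (stand-in for components_)."""
    return np.asarray(X, dtype=float) @ W.T

def inverse_transform_sketch(Z, W):
    """Undo transform_sketch; requires W to be square and invertible."""
    # Z = X W^T  is equivalent to  W X^T = Z^T, solved without forming inv(W)
    return np.linalg.solve(W, np.asarray(Z, dtype=float).T).T

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 3))
W = np.array([[1.0, 0.2, 0.0],
              [0.0, 1.0, 0.5],
              [0.3, 0.0, 2.0]])               # arbitrary invertible matrix for illustration

Z = transform_sketch(X, W)
X_back = inverse_transform_sketch(Z, W)
print(np.allclose(X_back, X))
```

If the scores were centered, the location would have to be added back after the linear step. When only a subset of components is kept, W is no longer square, so an exact inverse of this form does not exist; how the library handles that case is not covered by this sketch.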
- transform(X, y=None)[source]
Transform the data using the fitted ICS model.
This function relies on the helper method _center_data.
- Parameters:
X (array-like) – Data to transform.
y (Ignored) – Not used, present for API consistency by convention.
- Returns:
Transformed matrix in which columns contain the scores of the selected invariant coordinates.
- Return type:
ndarray
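The effect of centering on the scores can be illustrated with a short NumPy sketch. It assumes that centering subtracts the location of the first scatter (here the sample mean, as for 'cov') before the linear transformation is applied; W is an arbitrary hypothetical stand-in for components_, and none of the names below come from the library.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 3)) + np.array([5.0, -2.0, 1.0])
W = np.array([[1.0, 0.2, 0.0],
              [0.0, 1.0, 0.5],
              [0.3, 0.0, 2.0]])               # stand-in for components_
location = X.mean(axis=0)                     # location attached to the 'cov' scatter

scores_uncentered = X @ W.T
scores_centered = (X - location) @ W.T

# The two score matrices differ only by a constant row, location @ W.T
shift = scores_uncentered - scores_centered
print(np.round(shift[:2], 6))
print(np.round(scores_centered.mean(axis=0), 6))
```

Since centering only shifts every score by the same vector, it leaves the transformation matrix untouched, which is consistent with the note that centering affects the scores but not the generalized kurtosis values.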