Testing
Comparison with R
This section explains how the ICSpyLab package is tested to ensure that it reproduces the results of the original R package.
The testing approach involves the following steps:
Setup: Install the necessary dependencies, including pytest and rpy2 (for calling R from Python).
Data Loading: Load datasets such as iris, wine, and diabetes using scikit-learn.
Running ICS: Run the ICS (Invariant Coordinate Selection) algorithm in both R (through rpy2) and Python.
Comparison: Compare the results from the R implementation and the Python implementation.
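For reference, both implementations solve the same problem. The formulation below is the standard definition of ICS for a pair of scatter matrices, given as background rather than taken from either package's source; here $\mu$ stands for the location used for centering (zero when no centering is applied).

```latex
\begin{aligned}
&\text{Given scatter matrices } S_1(X),\, S_2(X) \in \mathbb{R}^{p \times p}, \text{ find } B \in \mathbb{R}^{p \times p} \text{ such that} \\
&\qquad B\, S_1(X)\, B^\top = I_p, \qquad B\, S_2(X)\, B^\top = D = \operatorname{diag}(\rho_1, \dots, \rho_p), \qquad \rho_1 \ge \dots \ge \rho_p, \\
&\text{equivalently } S_1(X)^{-1} S_2(X)\, b_j = \rho_j\, b_j \text{ for each row } b_j^\top \text{ of } B, \\
&\text{with generalized kurtosis values } \rho_j \text{ and invariant coordinates } Z = (X - \mathbf{1}_n \mu^\top)\, B^\top .
\end{aligned}
```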
Fixtures and Parameters
To streamline testing, pytest fixtures load the data and run the ICS algorithm in both R and Python. The ICS parameters, such as the scatter matrix estimators and the transformation settings, are varied across datasets and configurations so that each combination is tested.
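The parameter sweep described above can be sketched with `itertools.product`. This is a minimal illustration: the dataset and algorithm names come from this page, but the exact grid the real test suite sweeps over is an assumption, and `run_r_ics` in the comment is a hypothetical name for the R-side fixture.

```python
from itertools import product

# Illustrative grid: names come from this page, the chosen values are assumptions.
DATASETS = ["iris", "wine", "diabetes"]
ALGORITHMS = ["standard", "whiten"]
CENTER_OPTIONS = [False, True]


def ics_configurations():
    """Return one settings dict per (dataset, algorithm, center) combination."""
    return [
        {"dataset": dataset, "algorithm": algorithm, "center": center}
        for dataset, algorithm, center in product(DATASETS, ALGORITHMS, CENTER_OPTIONS)
    ]


CONFIGS = ics_configurations()

# In a pytest module, this list would typically feed a parametrized test, e.g.
#   @pytest.mark.parametrize("config", CONFIGS)
#   def test_ics_matches_r(config, load_data, run_py_ics, run_r_ics): ...
# (run_r_ics is a hypothetical fixture name.)
```

Generating the grid once and passing it to `pytest.mark.parametrize` gives each combination its own test case, so a failure report points directly at the offending configuration.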
Main Testing Files
The testing logic is organized into the following main files:
tests.fixtures: the initialization file (__init__.py) of the fixtures package makes the fixtures directory a Python package and imports the fixtures for easy access.
- tests.fixtures.load_data()
Fixture to load different datasets.
This fixture provides a function to load datasets by their name. It ensures that the loaded dataset does not contain any missing values.
- Returns:
A function that takes a dataset name and returns the dataset (X, y).
- Return type:
function
- Raises:
ValueError – If the dataset contains missing values.
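A fixture that "provides a function" is a factory: it returns a closure that the test calls with a dataset name. The sketch below assumes that pattern and that the scikit-learn loaders are called with `return_X_y=True`; the injectable `loaders` argument is a hypothetical addition so the sketch can run without scikit-learn, and the `@pytest.fixture` decorator is omitted so it runs without pytest.

```python
import numpy as np


def load_data(loaders=None):
    """Sketch of a factory fixture returning a name -> (X, y) loader.

    In a test suite this would be decorated with @pytest.fixture.
    """
    if loaders is None:
        # Imported lazily so the sketch can be exercised without scikit-learn.
        from sklearn.datasets import load_diabetes, load_iris, load_wine

        loaders = {"iris": load_iris, "wine": load_wine, "diabetes": load_diabetes}

    def _load(name):
        if name not in loaders:
            raise KeyError(f"Unknown dataset: {name!r}")
        X, y = loaders[name](return_X_y=True)
        X = np.asarray(X, dtype=float)
        # Reject datasets with missing values, as the fixture documents.
        if np.isnan(X).any() or np.isnan(np.asarray(y, dtype=float)).any():
            raise ValueError(f"Dataset {name!r} contains missing values.")
        return X, y

    return _load
```

A test would then request the fixture and call it, for example `X, y = load_data("iris")`.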
- tests.fixtures.run_py_ics()
Fixture to perform ICS in Python.
This fixture provides a function to run the ICS algorithm using the Python implementation. It creates an ICS object, fits and transforms the data, and returns the results.
- Parameters:
X (np.ndarray) – The input data matrix.
S1 (function, optional) – The first scatter matrix function. Default is cov.
S2 (function, optional) – The second scatter matrix function. Default is covW.
algorithm (str, optional) – The algorithm to use. Default is ‘whiten’.
center (bool, optional) – Whether to center the data. Default is False.
fix_signs (str, optional) – Method to fix signs. Default is ‘scores’.
S1_args (dict, optional) – Additional arguments for S1. Default is {}.
S2_args (dict, optional) – Additional arguments for S2. Default is {}.
- Returns:
A dictionary with the results, including the transformation matrix, generalized kurtosis, skewness, and transformed data.
- Return type:
dict
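To make the fixture's return value concrete, the NumPy sketch below computes the same kinds of quantities for the ‘standard’ and ‘whiten’ algorithms. It is a from-scratch stand-in, not ICSpyLab's code. `ics_sketch`, `cov_w`, and the result keys are hypothetical names. The `cov_w` formula (cf / n times a sum of outer products weighted by squared Mahalanobis distances, with alpha = 1 and cf = 1) is an assumed reading of covW. Skewness and fix_signs are omitted, and a sketch-only sign convention is used instead.

```python
import numpy as np


def cov(X):
    """Sample covariance matrix with denominator n - 1."""
    return np.cov(X, rowvar=False)


def cov_w(X, alpha=1.0, cf=1.0):
    """One-step weighted scatter (assumed form of covW):
    cf / n * sum_i d_i**alpha * (x_i - m)(x_i - m)^T, where m is the mean and
    d_i the squared Mahalanobis distance of x_i with respect to m and cov(X)."""
    Xc = X - X.mean(axis=0)
    d = np.einsum("ij,jk,ik->i", Xc, np.linalg.inv(cov(X)), Xc)
    return cf / X.shape[0] * (Xc * (d ** alpha)[:, None]).T @ Xc


def inv_sqrt(S):
    """Symmetric inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def ics_sketch(X, S1=cov, S2=cov_w, algorithm="whiten", center=False):
    """Hypothetical stand-in for ICS; returns W, generalized kurtosis, scores."""
    X = np.asarray(X, dtype=float)
    if algorithm == "whiten":
        # Whiten with S1, then diagonalize S2 computed on the whitened data.
        W1 = inv_sqrt(S1(X))
        kurt, U = np.linalg.eigh(S2(X @ W1))
        B = U.T @ W1
    elif algorithm == "standard":
        # Solve S1^{-1} S2 b = rho b directly, then rescale so b^T S1 b = 1.
        S1_X = S1(X)
        kurt, V = np.linalg.eig(np.linalg.solve(S1_X, S2(X)))
        kurt, V = kurt.real, V.real
        V = V / np.sqrt(np.einsum("ji,jk,ki->i", V, S1_X, V))
        B = V.T
    else:
        raise ValueError(f"algorithm {algorithm!r} is not covered by this sketch")
    # Sort components by decreasing generalized kurtosis.
    order = np.argsort(kurt)[::-1]
    kurt, B = kurt[order], B[order]
    # Sketch-only sign convention: largest-magnitude entry of each row is positive.
    rows = np.arange(B.shape[0])
    B = B * np.sign(B[rows, np.argmax(np.abs(B), axis=1)])[:, None]
    Xc = X - X.mean(axis=0) if center else X
    return {"W": B, "gen_kurtosis": kurt, "scores": Xc @ B.T}
```

Because both scatter functions here are affine equivariant, whitening first and diagonalizing S1⁻¹S2 directly should agree up to rounding error. A consistency test between the two algorithms can check exactly that kind of agreement.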
Validation
The Python results for the ‘standard’ and ‘whiten’ algorithms are validated against the R implementation by comparing:
Transformation matrices
Kurtosis values
Skewness values (if available)
Transformed data
This comparison ensures that the Python package produces results consistent with the R package.
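The comparison can be sketched as a single assertion helper. It assumes that both result sets have already been converted to NumPy arrays under the hypothetical keys `W`, `gen_kurtosis`, `scores`, and optionally `gen_skewness`. Rows of `W` are taken to be the ICS directions and columns of `scores` the matching invariant coordinates. The per-component sign flip is a defensive extra that should be a no-op when both sides apply the same fix_signs rule, and the tolerances are placeholders, not the values used by the package.

```python
import numpy as np


def component_signs(reference, candidate):
    """+1/-1 per row so that each candidate row points the same way as the reference."""
    signs = np.sign(np.sum(reference * candidate, axis=1))
    signs[signs == 0] = 1.0
    return signs


def assert_ics_results_close(r_res, py_res, rtol=1e-6, atol=1e-8):
    """Raise AssertionError if the Python results differ from the R results."""
    np.testing.assert_allclose(py_res["gen_kurtosis"], r_res["gen_kurtosis"], rtol=rtol, atol=atol)
    signs = component_signs(r_res["W"], py_res["W"])
    # Transformation matrix: flip rows.
    np.testing.assert_allclose(py_res["W"] * signs[:, None], r_res["W"], rtol=rtol, atol=atol)
    # Transformed data: flip the matching columns.
    np.testing.assert_allclose(py_res["scores"] * signs[None, :], r_res["scores"], rtol=rtol, atol=atol)
    # Skewness is compared only when both sides provide it.
    if r_res.get("gen_skewness") is not None and py_res.get("gen_skewness") is not None:
        np.testing.assert_allclose(py_res["gen_skewness"] * signs, r_res["gen_skewness"], rtol=rtol, atol=atol)
```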
Unit tests
Other tests include:
Initialization tests
Error-handling tests
Tests with a large dataset (10,000 rows and 10 columns)
Consistency of the ‘standard’ and ‘whiten’ algorithms
Specific tests for the QR algorithm (pending)
For more details and the full testing code, see the tests directory in the source repository.