TransformerMixin

class pyoptex.analysis.mixins.fit_mixin.TransformerMixin(factors=(), Y2X=<function identityY2X>, random_effects=())

Base mixin for all transformers. It extends the TransformerMixin from sklearn. To create your own transformer, subclass it:

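>>> from pyoptex.analysis.mixins.fit_mixin import TransformerMixin
>>> class MyTransformer(TransformerMixin):
...     def _fit(self, X, y):
...         # Your fit code: learn the transformer's state from X and y
...         pass
...
...     def _apply_transform(self, X, y):
...         # Your transform code to transform X and y
...         return X, y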

You must implement two methods: _fit, which fits the transformer to the data (given the encoded and normalized X and the normalized y), and _apply_transform, which applies the transformation to the data.
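The division of labour between the public methods and the two hooks can be illustrated without pyoptex installed. The sketch below uses a hypothetical stand-in base class, SketchTransformerMixin, that is not pyoptex's implementation: it only normalizes y and skips the factor-based encoding and normalization of X that the real mixin performs. DropConstantColumns is likewise a hypothetical example transformer, and normalizing y again inside transform is an assumption of the sketch.

```python
import numpy as np


class SketchTransformerMixin:
    """Hypothetical stand-in: public fit/transform delegate to subclass hooks."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Learned normalization constants get a trailing underscore
        self.y_mean_ = y.mean()
        self.y_std_ = y.std()
        self.n_features_in_ = X.shape[1]
        # The subclass hook receives the normalized y
        self._fit(X, (y - self.y_mean_) / self.y_std_)
        return self

    def transform(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Assumption: y is normalized with the stored constants before the hook
        return self._apply_transform(X, (y - self.y_mean_) / self.y_std_)

    def fit_transform(self, X, y):
        return self.fit(X, y).transform(X, y)


class DropConstantColumns(SketchTransformerMixin):
    """Hypothetical transformer that removes columns of X that never vary."""

    def _fit(self, X, y):
        # Keep only the columns with a nonzero spread in the training data
        self.keep_ = X.std(axis=0) > 0

    def _apply_transform(self, X, y):
        return X[:, self.keep_], y


X = np.array([[1.0, 5.0, 0.0], [2.0, 5.0, 1.0], [3.0, 5.0, 0.0]])
y = np.array([1.0, 2.0, 3.0])
Xt, yt = DropConstantColumns().fit_transform(X, y)
print(Xt.shape)  # (3, 2)
```

The subclass never touches the normalization or bookkeeping; it only fills in what to learn and how to apply it, which is the same split the real mixin asks for.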

Any attribute suffixed with an underscore (_) is only accessible after fitting.
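A minimal sketch of how this convention can be used to tell a fitted object from an unfitted one. It follows the same heuristic as sklearn's check_is_fitted (look for instance attributes that end in "_" and do not start with "__"); Sketch and looks_fitted are hypothetical names and do not reproduce pyoptex's own is_fitted logic.

```python
class Sketch:
    """Hypothetical estimator following the trailing-underscore convention."""

    def __init__(self, factors=()):
        # Constructor parameters: no trailing underscore
        self.factors = factors

    def fit(self, X):
        # Learned attributes: trailing underscore, created only during fitting
        self.n_features_in_ = len(X[0])
        return self


def looks_fitted(obj):
    """Return True if obj has at least one learned (underscore-suffixed) attribute."""
    return any(
        name.endswith("_") and not name.startswith("__") for name in vars(obj)
    )


model = Sketch()
print(looks_fitted(model))  # False
model.fit([[1.0, 2.0], [3.0, 4.0]])
print(looks_fitted(model))  # True
print(model.n_features_in_)  # 2
```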

Note

Transformers should handle both OLS and mixed models, or raise an error for any case they do not support. Use the fit_fn_ attribute to fit a model given some terms and data; it automatically selects between an OLS and a mixed model.
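The second option, refusing a model type explicitly, can be sketched as follows. OLSOnlySketch is a hypothetical transformer and ols_fit_fn a hypothetical OLS stand-in for fit_fn_; the attribute list documents fit_fn_ as func(X, y, terms) but not the format of terms, so the sketch assumes terms is a list of column indices into X. The mixed-model branch is deliberately not implemented.

```python
import numpy as np


def ols_fit_fn(X, y, terms):
    """Hypothetical OLS stand-in for fit_fn_; terms are assumed to be column indices."""
    coef, *_ = np.linalg.lstsq(X[:, terms], y, rcond=None)
    return coef


class OLSOnlySketch:
    """Hypothetical transformer that supports OLS but refuses mixed models."""

    def __init__(self, random_effects=()):
        self.random_effects = random_effects

    def fit(self, X, y):
        # Stand-in for the base-class setup done before _fit is called
        self._re = list(self.random_effects)
        self.fit_fn_ = ols_fit_fn
        self._fit(np.asarray(X, dtype=float), np.asarray(y, dtype=float))
        return self

    def _fit(self, X, y):
        if self._re:
            # Fail loudly rather than silently fitting an OLS to grouped data
            raise NotImplementedError("this transformer does not support mixed models")
        self.coef_ = self.fit_fn_(X, y, list(range(X.shape[1])))


X = np.column_stack([np.ones(4), [0.0, 1.0, 2.0, 3.0]])
y = 1.0 + 2.0 * X[:, 1]
model = OLSOnlySketch().fit(X, y)
print(np.round(model.coef_, 6))  # [1. 2.]

try:
    OLSOnlySketch(random_effects=("batch",)).fit(X, y)
except NotImplementedError as err:
    print(err)  # this transformer does not support mixed models
```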

Note

If you need access to the attributes factors, random_effects or Y2X, use the underscored versions _factors, _re and _Y2X. Because sklearn does not permit modifying the constructor parameters directly, these underscored copies may be adapted during fitting.
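sklearn's conventions expect constructor arguments to be stored unchanged (so that get_params and clone reproduce the estimator), which is why any adaptation happens on the underscored copies. A minimal sketch of the pattern with a hypothetical transformer that drops one factor while fitting; factors are plain strings here rather than pyoptex Factor objects.

```python
class DropFactorSketch:
    """Hypothetical transformer that removes one factor while fitting."""

    def __init__(self, factors=(), random_effects=(), drop="B"):
        # Store constructor arguments exactly as given (sklearn convention)
        self.factors = factors
        self.random_effects = random_effects
        self.drop = drop

    def fit(self, X=None, y=None):
        # Work on copies; the public parameters are never modified
        self._factors = [f for f in self.factors if f != self.drop]
        self._re = list(self.random_effects)
        self.n_features_in_ = len(self._factors)
        return self


t = DropFactorSketch(factors=("A", "B", "C")).fit()
print(t.factors)   # ('A', 'B', 'C')
print(t._factors)  # ['A', 'C']
```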

Attributes

factors (list(Factor)): A list of factors to be used during fitting. It contains the categorical encoding, continuous normalization, etc.
Y2X (func(Y)): The function that transforms a design matrix Y into a model matrix X.
random_effects (list(str)): The names of any random-effect columns. Every random effect is interpreted as a string column and encoded using effect encoding.
n_features_in_ (int): The number of features. Equals len(self._factors).
features_names_in_ (list(str)): The names of the features.
n_encoded_features_ (int): The number of encoded features, equal to Y2X(Y).shape[1].
effect_types_ (np.array(1d)): The type of each factor (effect). A 1 indicates a continuous variable; any higher value indicates a categorical factor with that many levels. Can be used for internal package functions such as encode_model.
coords_ (list): A list of 2d numpy arrays, one per factor, each holding the possible encodings of that factor. Retrieved using the factor.coords_ property.
y_mean_ (float): The mean of y, used for normalization.
y_std_ (float): The standard deviation of y, used for normalization.
fit_fn_ (func(X, y, terms)): A function that fits a model to the data using the specified terms. When random effects are specified, it fits a mixed model; otherwise it fits an OLS model.
Zs_ (np.array(2d)): The group of each run for each random effect, with Zs_.shape[0] == len(self._re) and Zs_.shape[1] == len(X). For example, if the first row is [0, 0, 1, 1], then according to the first random effect the first two runs are in group 0 and the last two runs are in group 1.
is_fitted_ (bool): Whether the transformer has been fitted.
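Several of these attributes are easy to reproduce by hand. The numpy sketch below is illustrative only: it assumes each factor is described by a hypothetical (name, n_levels) pair with n_levels = 1 for a continuous factor, and it builds a Zs_-style matrix by mapping each random-effect column's string labels to integer group ids. pyoptex's own derivation may differ in detail, for example in the order in which groups are numbered or in the standard-deviation convention.

```python
import numpy as np

# Hypothetical factor descriptions: (name, number of levels); 1 means continuous
factors = [("temperature", 1), ("catalyst", 3), ("pressure", 1)]
effect_types = np.array([n_levels for _, n_levels in factors])
is_categorical = effect_types > 1
print(effect_types)  # [1 3 1]

# y normalization with the stored mean and standard deviation
# (ddof=0 here; which convention pyoptex uses is not stated in this section)
y = np.array([10.0, 12.0, 14.0, 16.0])
y_mean, y_std = y.mean(), y.std()
y_norm = (y - y_mean) / y_std
y_back = y_norm * y_std + y_mean  # undoes the normalization

# Zs_-style matrix: one row per random effect, one column per run;
# np.unique numbers the groups in sorted order of their labels
re_columns = {
    "batch": ["b1", "b1", "b2", "b2"],
    "day": ["mon", "tue", "mon", "tue"],
}
Zs = np.array([np.unique(col, return_inverse=True)[1] for col in re_columns.values()])
print(Zs)
# [[0 0 1 1]
#  [0 1 0 1]]
```

The first row of Zs matches the [0, 0, 1, 1] example above: runs 1 and 2 share batch b1, runs 3 and 4 share batch b2.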

__init__(factors=(), Y2X=<function identityY2X>, random_effects=())

Creates the transformer.

Parameters

factors (list(Factor)): A list of factors to be used during fitting. It contains the categorical encoding, continuous normalization, etc.
Y2X (func(Y)): The function to transform a design matrix Y to a model matrix X.
random_effects (list(str)): The names of any random effect columns. Every random effect is interpreted as a string column and encoded using effect encoding.
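Y2X determines the columns of the model matrix, and n_encoded_features_ is its number of columns. A hypothetical custom Y2X for an intercept, main effects and all two-factor interactions is sketched below in plain numpy. It is not a pyoptex helper, and it assumes Y is a numeric array with one column per (encoded) feature.

```python
from itertools import combinations

import numpy as np


def y2x_interactions(Y):
    """Hypothetical Y2X: intercept, main effects and all two-factor interactions."""
    Y = np.asarray(Y, dtype=float)
    intercept = np.ones((Y.shape[0], 1))
    # Products of every pair of columns, each kept as an (n, 1) column
    interactions = [Y[:, [i]] * Y[:, [j]] for i, j in combinations(range(Y.shape[1]), 2)]
    return np.hstack([intercept, Y, *interactions])


Y = np.array([[-1, -1, 1], [1, -1, -1], [-1, 1, -1], [1, 1, 1]])
X = y2x_interactions(Y)
print(X.shape)  # (4, 7): 1 intercept + 3 main effects + 3 interactions
```

Passing such a function as Y2X would change the number of model columns (here 7 instead of 3) without changing the design matrix itself.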

Methods

TransformerMixin.fit(X, y): Fit the transformer to the data.
TransformerMixin.fit_transform(X, y): Fit the transformer to the data and apply the transformation.
TransformerMixin.preprocess_fit(X, y): Preprocess the data before fitting.
TransformerMixin.set_output(*[, transform]): Set the output container.
TransformerMixin.transform(X, y): Apply the transformation to the data.

Attributes

TransformerMixin.is_fitted

Checks whether the transformer has been fitted.