RegressionMixin

class pyoptex.analysis.mixins.fit_mixin.RegressionMixin(factors=(), Y2X=<function identityY2X>, random_effects=())

Base mixin for all regressors. It extends the regressor mixin from sklearn. To create your own regressor, subclass it:

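>>> import numpy as np
>>> from pyoptex.analysis.mixins.fit_mixin import RegressionMixin
>>> class MyRegressor(RegressionMixin):
...     def _fit(self, X, y):
...         # Your fit code: set the fitted parameters (terms_, coef_, ...)
...         pass
...
...     def _predict(self, X):
...         # Optional, only if you require a custom prediction.
...         # The default is equivalent to:
...         return np.sum(X[:, self.terms_] * np.expand_dims(self.coef_, 0), axis=1) \
...             * self.y_std_ + self.y_mean_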

Only one method must be implemented: _fit, which fits your model on the encoded and normalized X and the normalized y. It should set the parameters listed under Parameters below. Inside _fit, you have access to the attributes listed under Attributes below.
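As an illustration of what a _fit body might compute, the sketch below performs an ordinary least-squares fit on every column of the encoded, normalized X. The helper name ols_fit is hypothetical, and treating scale_ as the residual variance with n - p degrees of freedom is an assumption; a real regressor would usually also select terms instead of keeping them all.

```python
import numpy as np


def ols_fit(X, y):
    """Hypothetical stand-in for the body of `_fit`: an ordinary least-squares
    fit on all columns of the encoded, normalized model matrix X.

    Returns the values a `_fit` implementation would store in
    `self.terms_`, `self.coef_` and `self.scale_`.
    """
    n, p = X.shape
    # Use every column of X as a model term.
    terms = np.arange(p)
    # Least-squares coefficients for the normalized response.
    coef, *_ = np.linalg.lstsq(X[:, terms], y, rcond=None)
    # Residual variance with n - p degrees of freedom (assumed meaning of scale_).
    resid = y - X[:, terms] @ coef
    scale = float(resid @ resid / (n - p))
    return terms, coef, scale


# Inside a subclass, `_fit` would then assign the results:
#     self.terms_, self.coef_, self.scale_ = ols_fit(X, y)
```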

Optionally, you can implement your own prediction function; however, if the coefficients and terms are set correctly, this should not be necessary. The _predict function receives a normalized and encoded X.
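The default prediction from the example above can be reproduced in plain NumPy. The sketch below uses a hypothetical helper, default_predict, that receives the fitted terms_, coef_, y_mean_ and y_std_ values as arguments instead of reading them from self.

```python
import numpy as np


def default_predict(X, terms, coef, y_mean, y_std):
    """Mirror of the default `_predict`: a linear combination of the selected
    columns of the encoded, normalized X, mapped back to the scale of y."""
    # Normalized prediction: sum over the model terms of column * coefficient.
    y_norm = np.sum(X[:, terms] * np.expand_dims(coef, 0), axis=1)
    # Undo the normalization y_norm = (y - y_mean) / y_std.
    return y_norm * y_std + y_mean


X = np.array([[1.0, 0.5, -1.0],
              [1.0, -0.5, 1.0]])
pred = default_predict(X, terms=np.array([0, 2]), coef=np.array([0.2, 0.1]),
                       y_mean=10.0, y_std=2.0)
```

Only columns 0 and 2 enter the prediction here, so the middle column of X has no effect on the result.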

Attributes suffixed with an underscore (_) are only available after fitting.
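The trailing-underscore convention comes from sklearn, where an estimator is considered fitted once attributes ending in an underscore exist on it. The hypothetical helper below sketches that idea in plain Python; it is not the actual implementation of is_fitted in this package.

```python
def looks_fitted(estimator):
    """Hypothetical helper: treat an estimator as fitted once it has an
    attribute whose name ends with an underscore (ignoring dunder names),
    following the sklearn convention for attributes set during fitting."""
    return any(
        name.endswith("_") and not name.startswith("__")
        for name in vars(estimator)
    )


class Demo:
    def __init__(self, factors=()):
        self.factors = factors  # constructor parameter: no trailing underscore

    def fit(self):
        self.coef_ = [1.0]  # fitted attribute: trailing underscore
        return self
```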

Note

A regressor should handle both OLS and mixed models, or raise an error otherwise. Use the fit_fn_ attribute to fit a model given some terms and data; it automatically chooses between an OLS and a mixed model.
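A sketch of how a fit function could switch between OLS and a mixed model, assuming random-effect group ids in the layout of Zs_. To keep it short, the mixed-model branch assumes the variance ratios (random-effect variance divided by residual variance) are known and only performs the generalized least-squares step; a real mixed-model fit would estimate these ratios, for example by REML. The names make_fit_fn and ratios are hypothetical.

```python
import numpy as np


def make_fit_fn(Zs, ratios=None):
    """Hypothetical sketch of a fit function that, like fit_fn_, chooses
    between OLS and a mixed model depending on the random effects.

    Zs:     integer array of shape (n_random_effects, n_runs) with group ids.
    ratios: assumed-known variance ratios, one per random effect
            (only used when Zs has at least one row).
    """
    def fit_fn(X, y, terms):
        Xt = X[:, terms]
        if Zs.shape[0] == 0:
            # No random effects: ordinary least squares.
            coef, *_ = np.linalg.lstsq(Xt, y, rcond=None)
            return coef
        # Random effects: generalized least squares with
        # V = I + sum_i ratio_i * Z_i Z_i^T (in units of the residual variance).
        V = np.eye(len(y))
        for groups, ratio in zip(Zs, ratios):
            V += ratio * (groups[:, None] == groups[None, :])
        Vinv_Xt = np.linalg.solve(V, Xt)  # V^{-1} X
        return np.linalg.solve(Xt.T @ Vinv_Xt, Vinv_Xt.T @ y)
    return fit_fn
```

With an empty Zs the function reduces to OLS, which matches the note that the choice between the two models happens automatically.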

Note

If you need access to the attributes factors, random_effects or Y2X, use the underscored versions _factors, _re and _Y2X. Because sklearn does not allow constructor parameters to be modified directly, these underscored copies are the ones that may be adapted during fitting.

Parameters

terms_ (np.array(1d))

The indices of the terms (= columns in X) in the model.

coef_ (np.array(1d))

An array of coefficients corresponding to the terms.

scale_ (float)

The scale (= variance of the fit).

vcomp_ (float)

The estimates of any variance components present.

fit_ (optional)

The result of calling

>>> self.fit_fn_(X, y, self.terms_)

if applicable. If it is not set, summary() is unavailable.

Attributes

factors (list(Factor)): A list of factors to be used during fitting. It contains the categorical encoding, continuous normalization, etc.
Y2X (func(Y)): The function to transform a design matrix Y into a model matrix X.
random_effects (list(str)): The names of any random effect columns. Every random effect is interpreted as a string column and encoded using effect encoding.
n_features_in_ (int): The number of features. Equals len(self._factors).
features_names_in_ (list(str)): The names of the features.
n_encoded_features_ (int): The number of encoded features. Equals Y2X(Y).shape[1].
effect_types_ (np.array(1d)): An array indicating the type of each factor (effect). A 1 indicates a continuous variable; any higher value indicates a categorical factor with that many levels. Can be used with internal package functions such as encode_model.
coords_ (numba.typed.List): A list of 2d numpy arrays. Each element holds the possible encodings of a factor, as retrieved from the factor.coords_ property.
y_mean_ (float): The mean y-value, used in normalization.
y_std_ (float): The standard deviation of the y-values, used in normalization.
fit_fn_ (func(X, y, terms)): A fit function used to fit a model from data and the specified terms. When random effects are specified, it fits a mixed model; otherwise, an OLS model is fitted.
Zs_ (np.array(2d)): The group of each run for every random effect. Zs_.shape[0] == len(self._re) and Zs_.shape[1] == len(X). For example, if the first row is [0, 0, 1, 1], the first two runs belong to group 0 of the first random effect, and the last two runs belong to group 1.
is_fitted_ (bool): Whether the regressor has been fitted.
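To make the Zs_ layout concrete, the sketch below turns one row of group ids into an indicator matrix Z. The product \(Z Z^T\) marks which runs share a group, the building block of the usual mixed-model covariance \(V = \sigma^2 I + \sum_i \sigma_i^2 Z_i Z_i^T\); whether this package builds V in exactly this way is an assumption. The helper group_indicator is hypothetical.

```python
import numpy as np


def group_indicator(groups):
    """Turn one row of Zs_ (a group id per run) into an indicator matrix
    Z of shape (n_runs, n_groups), with Z[r, g] = 1 when run r is in group g.
    Assumes the group ids are the consecutive integers 0, 1, ..., n_groups - 1."""
    groups = np.asarray(groups)
    n_groups = groups.max() + 1
    return (groups[:, None] == np.arange(n_groups)[None, :]).astype(float)


# The example from the text: runs 0 and 1 are in group 0, runs 2 and 3 in group 1.
Z = group_indicator([0, 0, 1, 1])
# Z @ Z.T equals 1 exactly where two runs share a group, and 0 elsewhere.
same_group = Z @ Z.T
```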

__init__(factors=(), Y2X=<function identityY2X>, random_effects=())

Creates the regressor.

Parameters

factors (list(Factor)): A list of factors to be used during fitting. It contains the categorical encoding, continuous normalization, etc.
Y2X (func(Y)): The function to transform a design matrix Y into a model matrix X.
random_effects (list(str)): The names of any random effect columns. Every random effect is interpreted as a string column and encoded using effect encoding.
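Effect encoding, used for the random effect columns, replaces a factor with k levels by k - 1 columns. The sketch below follows the common convention in which every level but one is coded as a unit vector and the remaining level as -1 in every column; which level this package codes as -1 is an assumption (here the last level in sorted order). The helper effect_encode is hypothetical.

```python
import numpy as np


def effect_encode(column):
    """Effect-encode a categorical column, treating its values as strings.

    A factor with k levels becomes k - 1 columns: level j < k - 1 is coded as
    the j-th unit vector, and the last level is coded as -1 in every column.
    """
    values = np.asarray(column).astype(str)
    levels = np.unique(values)  # sorted unique levels
    k = len(levels)
    # Row j of `coding` is the encoding of levels[j].
    coding = np.vstack([np.eye(k - 1), -np.ones((1, k - 1))])
    idx = np.searchsorted(levels, values)
    return coding[idx], levels


encoded, levels = effect_encode([1, 2, 3, 1])
```

Every encoded column sums to zero over a balanced set of levels, which is what distinguishes effect encoding from dummy (0/1) encoding.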

Methods

`RegressionMixin.fit`(X, y)	Fits the data.
`RegressionMixin.formula`([labels])	Creates the prediction formula of the fit for the encoded and normalized data.
`RegressionMixin.model_formula`(model)	Creates the prediction formula of the fit for the encoded and normalized data.
`RegressionMixin.pred_var`(X)	Prediction variances for the new values specified in X.
`RegressionMixin.predict`(X)	Predict on new data after fitting.
`RegressionMixin.preprocess_fit`(X, y)	Preprocesses before fitting the data.
`RegressionMixin.preprocess_predict`(X)	Preprocesses the incoming data before prediction.
`RegressionMixin.score`(X, y[, sample_weight])	Return the coefficient of determination of the prediction.
`RegressionMixin.summary`()	Generates a summary of the fit in case it was stored during training in the fit_ attribute.
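The score method presumably comes from sklearn's regressor mixin, which returns the coefficient of determination \(R^2 = 1 - SS_{res} / SS_{tot}\). The unweighted computation is sketched below with a hypothetical helper, r2_score (not sklearn.metrics.r2_score, which offers more options).

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot (no sample weights)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives 1.0, and always predicting the mean of y gives 0.0.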

Attributes

`RegressionMixin.M_`	Alias for `information_matrix`
`RegressionMixin.Minv_`	Alias for `inv_information_matrix`
`RegressionMixin.V_`	Alias for `obs_cov`
`RegressionMixin.Vinv_`	Alias for `inv_obs_cov`
`RegressionMixin.information_matrix`	The information matrix of the fitted data.
`RegressionMixin.inv_information_matrix`	The inverse of the information matrix.
`RegressionMixin.inv_obs_cov`	The inverse of the observation covariance matrix.
`RegressionMixin.is_fitted`	Checks whether the regressor has been fitted.
`RegressionMixin.obs_cov`	The observation covariance matrix \(V = var(Y)\).
`RegressionMixin.total_var`	The total variance on the normalized y-values.
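The information-matrix attributes follow the usual generalized least-squares definitions. As a sketch, assuming \(M = X^T V^{-1} X\) on the model columns and a prediction variance of \(x^T M^{-1} x\) for each new run (the exact scaling used by pred_var here is an assumption), the quantities can be computed with the hypothetical helpers below. With V equal to the identity, M reduces to the OLS information matrix \(X^T X\).

```python
import numpy as np


def information_matrix(X, V):
    """M = X^T V^{-1} X for a model matrix X (n_runs x n_terms) and an
    observation covariance matrix V (n_runs x n_runs)."""
    return X.T @ np.linalg.solve(V, X)


def pred_var(x_new, Minv):
    """Prediction variance x^T M^{-1} x, evaluated for each row of x_new."""
    return np.einsum("ij,jk,ik->i", x_new, Minv, x_new)


X = np.column_stack([np.ones(4), [-1.0, 1.0, -1.0, 1.0]])
M = information_matrix(X, np.eye(4))  # OLS case: V = I
Minv = np.linalg.inv(M)
variances = pred_var(np.array([[1.0, 0.0], [1.0, 1.0]]), Minv)
```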