RegressionMixin

class pyoptex.analysis.mixins.fit_mixin.RegressionMixin(factors=(), Y2X=<function identityY2X>, random_effects=())

Base mixin for all regressors. This mixin extends the regressor mixin from sklearn. To create your own regressor, subclass it and implement _fit:

>>> class MyRegressor(RegressionMixin):
>>>     def _fit(self, X, y):
>>>         # Your fit code
>>>         pass
>>> 
>>>     def _predict(self, X):
>>>         # Optional, if you require a custom prediction
>>>         # Defaults to
>>>         return np.sum(X[:, self.terms_] * np.expand_dims(self.coef_, 0), axis=1) \
>>>                       * self.y_std_ + self.y_mean_

One function must be implemented: _fit, which fits your model on the encoded and normalized X and the normalized y. It should set the parameters specified below. Inside _fit, you have access to the attributes specified below.

Optionally, you can implement your own prediction function; however, if the coefficients and terms are set correctly, this should not be necessary. The _predict function receives a normalized and encoded X.
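The default prediction shown in the class skeleton above can be reproduced outside the class with plain NumPy. The following is a minimal sketch, assuming an already encoded and normalized X. The function name default_predict and all numbers are hypothetical and not part of pyoptex.

```python
import numpy as np

def default_predict(X, terms, coef, y_mean, y_std):
    """Linear prediction on normalized data, mapped back to the original y scale."""
    # Keep only the model columns and weight each one by its coefficient
    y_norm = np.sum(X[:, terms] * np.expand_dims(coef, 0), axis=1)
    # Undo the normalization of y
    return y_norm * y_std + y_mean

# Hypothetical encoded model matrix with columns: intercept, x1, x2
X = np.array([[1.0, -1.0, 0.5],
              [1.0, 0.0, 0.0],
              [1.0, 1.0, -0.5]])
terms = np.array([0, 1])     # the model uses the intercept and x1 only
coef = np.array([0.5, 2.0])  # coefficients on the normalized scale

y_pred = default_predict(X, terms, coef, y_mean=10.0, y_std=2.0)
print(y_pred)
```

Because terms selects columns of X, a custom _predict is only needed when the model is not linear in the selected columns.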

Any attribute suffixed with _ is only accessible after fitting.

Note

A regressor should handle both OLS and mixed models, or raise an error if it cannot. Use the fit_fn_ attribute to fit a model given some terms and data. It automatically accounts for OLS vs. mixed model.

Note

If you require access to the attributes factors, re or Y2X, use the underscored versions _factors, _re and _Y2X. Because sklearn does not permit adapting the original attributes directly, the underscored versions may be adapted during fitting.

Parameters

terms_ : np.array(1d)

The indices of the terms (= columns in X) in the model.

coef_ : np.array(1d)

An array of coefficients corresponding to the terms.

scale_ : float

The scale (= variance of the fit).

vcomp_ : float

The estimates of any variance components present.

fit_ : optional

The result of calling

>>> fit_fn_(X, y, self.terms_)

if applicable. If not specified, summary is unavailable.
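As an illustration of the values a _fit implementation is expected to produce, here is a minimal OLS-only sketch written as a standalone function. It does not use fit_fn_ or any pyoptex code. The function name ols_fit_sketch, the n_random_effects argument, the residual-variance definition of the scale, and the empty array used for the variance components are all assumptions made for this example.

```python
import numpy as np

def ols_fit_sketch(X, y, n_random_effects=0):
    """Hypothetical OLS-only fit returning the values a _fit would store on self."""
    if n_random_effects > 0:
        # This sketch cannot fit mixed models, so it refuses explicitly
        raise NotImplementedError("mixed models are not supported by this sketch")

    terms = np.arange(X.shape[1])              # use every column of X
    coef, _, _, _ = np.linalg.lstsq(X[:, terms], y, rcond=None)
    resid = y - X[:, terms] @ coef
    dof = X.shape[0] - terms.size              # residual degrees of freedom
    scale = float(resid @ resid / dof)         # residual variance (assumed definition)
    return {
        "terms_": terms,
        "coef_": coef,
        "scale_": scale,
        "vcomp_": np.array([]),                # no variance components in OLS (assumed)
    }

# Encoded X with an intercept and one factor; y is exactly 1 + 2 * x
X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([-1.0, 1.0, 3.0, 5.0])
params = ols_fit_sketch(X, y)
print(params["coef_"], params["scale_"])
```

In a real subclass these values would be assigned to self.terms_, self.coef_, self.scale_ and self.vcomp_ inside _fit.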

Attributes

factors : list(Factor)

A list of factors to be used during fitting. It contains the categorical encoding, continuous normalization, etc.

Y2X : func(Y)

The function to transform a design matrix Y to a model matrix X.

random_effects : list(str)

The names of any random effect columns. Every random effect is interpreted as a string column and encoded using effect encoding.

n_features_in_ : int

The number of features. Equals len(self._factors).

features_names_in_ : list(str)

The names of the features.

n_encoded_features_ : int

The number of encoded features. Equals Y2X(Y).shape[1].

effect_types_ : np.array(1d)

An array indicating the type of each factor (effect). A 1 indicates a continuous variable, anything higher indicates a categorical factor with that many levels. Can be used for internal package functions such as encode_model.

coords_ : numba.typed.List

A list of 2d numpy arrays. Each element corresponds to the possible encodings of a factor. Retrieved using the factor.coords_ property.

y_mean_ : float

The mean y-value, used in normalization.

y_std_ : float

The standard deviation of the y-value, used in normalization.

fit_fn_ : func(X, y, terms)

A fit function used to fit a model from data and the specified terms. When random effects are specified, this fits a mixed model, otherwise an OLS is fitted.

Zs_ : np.array(2d)

The groups of each random effect. Zs_.shape[0] == len(self._re) and Zs_.shape[1] == len(X). For example, if the first row is [0, 0, 1, 1], then the first two runs are in group 0 according to the first random effect, and the last two runs are in group 1.

is_fitted_ : bool

Whether the regressor has been fitted.
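To make the group layout of the random effects concrete, the sketch below expands the example row [0, 0, 1, 1] into a runs-by-groups indicator matrix. This is only an illustration of how the group indices can be read. The helper group_indicator is hypothetical and not part of pyoptex.

```python
import numpy as np

def group_indicator(groups):
    """Turn one row of group indices into a runs-by-groups 0/1 matrix."""
    groups = np.asarray(groups)
    n_groups = groups.max() + 1
    # Row i has a single 1 in the column of the group that run i belongs to
    return (groups[:, None] == np.arange(n_groups)[None, :]).astype(float)

Zs = np.array([[0, 0, 1, 1]])  # one random effect, four runs
Z = group_indicator(Zs[0])
same_group = Z @ Z.T           # entry (i, j) is 1 when runs i and j share a group
print(Z)
print(same_group)
```

Here the first two runs share group 0 and the last two share group 1, so same_group is block diagonal.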

__init__(factors=(), Y2X=<function identityY2X>, random_effects=())

Creates the regressor.

Parameters

factors : list(Factor)

A list of factors to be used during fitting. It contains the categorical encoding, continuous normalization, etc.

Y2X : func(Y)

The function to transform a design matrix Y to a model matrix X.

random_effects : list(str)

The names of any random effect columns. Every random effect is interpreted as a string column and encoded using effect encoding.

Methods

RegressionMixin.fit(X, y)

Fits the data.

RegressionMixin.formula([labels])

Creates the prediction formula of the fit for the encoded and normalized data.

RegressionMixin.model_formula(model)

Creates the prediction formula of the fit for the encoded and normalized data.

RegressionMixin.pred_var(X)

Prediction variances for the new values specified in X.

RegressionMixin.predict(X)

Predict on new data after fitting.

RegressionMixin.preprocess_fit(X, y)

Preprocesses before fitting the data.

RegressionMixin.preprocess_predict(X)

Preprocesses the incoming data before prediction.

RegressionMixin.score(X, y[, sample_weight])

Return the coefficient of determination of the prediction.

RegressionMixin.summary()

Generates a summary of the fit in case it was stored during training in the fit_ attribute.
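The score method listed above reports the coefficient of determination. As a reminder of what that number means, the sketch below computes it by hand, assuming the usual definition R^2 = 1 - SS_res / SS_tot used by sklearn. The function r2_score_sketch is hypothetical and only mirrors that definition without sample weights.

```python
import numpy as np

def r2_score_sketch(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (no sample weights)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
r2_perfect = r2_score_sketch(y_true, y_true)     # predictions equal the observations
r2_mean = r2_score_sketch(y_true, [2.5] * 4)     # always predicting the mean of y
print(r2_perfect, r2_mean)
```

A perfect prediction gives 1, and always predicting the mean of y gives 0.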

Attributes

RegressionMixin.M_

Alias for information_matrix

RegressionMixin.Minv_

Alias for inv_information_matrix

RegressionMixin.V_

Alias for obs_cov

RegressionMixin.Vinv_

Alias for inv_obs_cov

RegressionMixin.information_matrix

The information matrix of the fitted data.

RegressionMixin.inv_information_matrix

The inverse of the information matrix.

RegressionMixin.inv_obs_cov

The inverse of the observation covariance matrix.

RegressionMixin.is_fitted

Checks whether the regressor has been fitted.

RegressionMixin.obs_cov

The observation covariance matrix \(V = \operatorname{var}(Y)\).

RegressionMixin.total_var

The total variance on the normalized y-values.
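The matrix attributes above are related. The sketch below shows one common textbook relationship, assuming the information matrix is defined as M = X^T V^{-1} X. The source does not state the formula pyoptex uses, so treat this definition, the function information_matrix_sketch and the example values as assumptions.

```python
import numpy as np

def information_matrix_sketch(X, V):
    """Return (M, Minv, Vinv) under the assumed definition M = X^T V^{-1} X."""
    Vinv = np.linalg.inv(V)  # inverse of the observation covariance matrix
    M = X.T @ Vinv @ X       # information matrix (assumed definition)
    Minv = np.linalg.inv(M)  # inverse of the information matrix
    return M, Minv, Vinv

# Hypothetical encoded model matrix: intercept plus one two-level factor
X = np.array([[1.0, -1.0],
              [1.0, -1.0],
              [1.0, 1.0],
              [1.0, 1.0]])
V = 2.0 * np.eye(4)  # uncorrelated runs with variance 2, as in a pure OLS setting

M, Minv, Vinv = information_matrix_sketch(X, V)
print(M)
print(Minv)
```

With correlated runs, for example when random effects are present, V would have off-diagonal entries and M would change accordingly.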