MultiRegressionMixin

class pyoptex.analysis.mixins.fit_mixin.MultiRegressionMixin(factors=(), Y2X=<function identityY2X>, random_effects=())

Base mixin for all regressors that output multiple models during model selection. This mixin extends RegressionMixin, which in turn extends the regression mixin from sklearn. To create your own regressor, subclass it:

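>>> import numpy as np
>>> from pyoptex.analysis.mixins.fit_mixin import MultiRegressionMixin
>>>
>>> class MyMultiRegressor(MultiRegressionMixin):
>>>     def _fit(self, X, y):
>>>         # Your fit code: set models_, model_coef_, model_scale_,
>>>         # model_vcomp_, selection_metrics_ and metric_name_
>>>         pass
>>>
>>>     def _predict(self, X):
>>>         # Optional, only needed for a custom prediction.
>>>         # The default is equivalent to:
>>>         return (np.sum(X[:, self.terms_] * np.expand_dims(self.coef_, 0), axis=1)
>>>                 * self.y_std_ + self.y_mean_)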

Only one method must be implemented: _fit, which fits your models on the encoded and normalized X and the normalized y. It must set the parameters listed below, and it has access to the attributes listed below.
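The sketch below illustrates the kind of output a _fit is expected to produce. It is written as a plain function returning a dictionary rather than a method setting attributes on self, so that it runs without pyoptex. The candidate term subsets, ordinary least squares per model, the residual variance as scale, adjusted R² as the selection metric, and the name `toy_multi_fit` are all assumptions for illustration; this is not pyoptex's selection algorithm, and random effects (model_vcomp_) are ignored.

```python
import numpy as np


def toy_multi_fit(X, y, candidates):
    """Hypothetical sketch of the attributes a _fit could set.

    X: encoded and normalized model matrix (n x p), column 0 the intercept
    y: normalized response of length n
    candidates: list of integer arrays, each selecting terms (columns) of X
    """
    n = len(y)
    ss_tot = np.sum((y - y.mean()) ** 2)
    coefs, scales, metrics = [], [], []
    for terms in candidates:
        Xs = X[:, terms]
        coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)   # OLS coefficients
        ss_res = np.sum((y - Xs @ coef) ** 2)
        k = len(terms)
        coefs.append(coef)
        scales.append(ss_res / (n - k))                 # residual variance (assumed scale)
        # Adjusted R^2 as a stand-in selection metric: higher is better
        metrics.append(1.0 - (ss_res / (n - k)) / (ss_tot / (n - 1)))

    order = np.argsort(metrics)[::-1]                   # highest metric first
    return {
        "models_": [np.asarray(candidates[i]) for i in order],
        "model_coef_": [coefs[i] for i in order],
        "model_scale_": np.asarray(scales)[order],
        "selection_metrics_": np.asarray(metrics)[order],
        "metric_name_": "adjusted R2",
    }


# Usage on synthetic data: only the first non-intercept column matters
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
y = 2.0 * X[:, 1] + rng.normal(scale=0.1, size=20)
y = (y - y.mean()) / y.std()                            # normalize y
result = toy_multi_fit(X, y, [np.array([0]), np.array([0, 1]), np.array([0, 1, 2])])
```

All returned lists and arrays are reordered with the same permutation, so entry i of every attribute describes the same model.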

Optionally, you can implement your own prediction function; however, if the coefficients and terms are set correctly, this should not be necessary. The _predict function receives a normalized and encoded X.
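The default prediction is a weighted sum of the selected columns of X, followed by undoing the normalization of y. A standalone NumPy version, with the fitted attributes passed in as plain arguments (the function name `default_predict` is hypothetical), could look like this:

```python
import numpy as np


def default_predict(X, terms, coef, y_mean, y_std):
    """Standalone version of the default prediction.

    X: encoded and normalized model matrix
    terms: column indices of the top model (plays the role of terms_)
    coef: coefficients of those terms (plays the role of coef_)
    y_mean, y_std: normalization constants of y (y_mean_, y_std_)
    """
    # Normalized prediction: weighted sum of the selected columns,
    # identical to X[:, terms] @ coef
    y_norm = np.sum(X[:, terms] * np.expand_dims(coef, 0), axis=1)
    # Undo the normalization of y
    return y_norm * y_std + y_mean


X = np.array([[1.0, 0.5, -1.0],
              [1.0, -0.5, 1.0]])
pred = default_predict(X, np.array([0, 1]), np.array([0.2, 2.0]), y_mean=10.0, y_std=3.0)
```

The third column of X is ignored because it is not among the selected terms.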

Any attribute suffixed by _ is only accessible after fitting.

Note

Contains the same attributes as RegressionMixin.

Note

Predictions are based on the top model (the first model in models_). To predict with any other model, fit that specific model using SimpleRegressor.

For example, assume multi_regr is a fitted multi-regression model whose Y2X function was created from a model stored in a pandas DataFrame:

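>>> model = ...       # the full model used to create the Y2X of multi_regr
>>> multi_regr = ...  # the fitted multi-regression model
>>>
>>> # Select the terms of the second-best model
>>> terms = multi_regr.models_[1]
>>> new_model = model.iloc[terms]
>>> Y2X = model2Y2X(new_model, factors)
>>>
>>> # Fit only this model so that predictions are based on it
>>> regr = SimpleRegressor(factors, Y2X, random_effects).fit(X, y)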
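To see what `model.iloc[terms]` does, the sketch below uses a small hypothetical model DataFrame with one row per term and one column per factor holding that factor's exponent in the term (an assumed layout; the row labels are only added for readability), together with a made-up `models_` list. Selecting rows by integer term indices keeps only the terms of the chosen model.

```python
import numpy as np
import pandas as pd

# Hypothetical full model: rows are terms, columns hold factor exponents
model = pd.DataFrame(
    {"A": [0, 1, 0, 1, 2], "B": [0, 0, 1, 1, 0]},
    index=["intercept", "A", "B", "A*B", "A^2"],
)

# Hypothetical models_, sorted best first; each entry lists term indices
models_ = [np.array([0, 1, 2, 3]), np.array([0, 1, 4])]

terms = models_[1]              # the second-best model
new_model = model.iloc[terms]   # keep only its terms (rows)
print(new_model)
```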

Parameters

models_ : list(np.array(1d))

The list of models, sorted by the selection_metrics_ (highest metric first). Each model is an integer array specifying the selected terms.

model_coef_ : list(np.array(1d))

The coefficients of the models_.

model_scale_ : np.array(1d)

The scale of the models_.

model_vcomp_ : np.array(2d)

The variance components of the models_.

selection_metrics_ : np.array(1d)

The metric of each model, sorted highest first. The selection metric defines the order in which the models should be analyzed.

metric_name_ : str

The name of the selection metric.

__init__(factors=(), Y2X=<function identityY2X>, random_effects=())

Creates the regressor.

Parameters

factors : list(Factor)

A list of factors to be used during fitting. It contains the categorical encoding, continuous normalization, etc.

Y2X : func(Y)

The function to transform a design matrix Y to a model matrix X.

random_effects : list(str)

The names of any random effect columns. Every random effect is interpreted as a string column and encoded using effect encoding.
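Two of these parameters describe data transformations, sketched below in NumPy. A Y2X function maps each row of the design matrix Y to the corresponding row of the model matrix X; `identity_y2x` shows the behaviour assumed for the default identityY2X, and `interaction_y2x` is a hypothetical Y2X for a two-factor model with an intercept, both main effects and their interaction. `effect_encode` illustrates effect (sum-to-zero) encoding of a string column with k levels into k - 1 columns; using the last sorted level as the reference is an assumption, and pyoptex's own level ordering may differ. All three function names are hypothetical.

```python
import numpy as np


def identity_y2x(Y):
    # Assumed behaviour of the default identityY2X: X is Y itself
    return Y


def interaction_y2x(Y):
    # Hypothetical Y2X for the model 1 + A + B + A*B with two continuous
    # factors A and B in the first two columns of Y
    A, B = Y[:, 0], Y[:, 1]
    return np.column_stack([np.ones(len(Y)), A, B, A * B])


def effect_encode(col):
    """Effect (sum-to-zero) encoding of a 1d array of string labels.

    Returns an (n, k - 1) array; the last sorted level is the reference
    level and is coded as -1 in every column (an assumed convention).
    """
    col = np.asarray(col)
    levels = np.unique(col)                 # sorted unique levels
    idx = np.searchsorted(levels, col)      # level index of each row
    k = len(levels)
    enc = np.zeros((len(col), k - 1))
    rows = np.arange(len(col))
    is_ref = idx == k - 1
    enc[rows[~is_ref], idx[~is_ref]] = 1.0  # own column for non-reference levels
    enc[is_ref] = -1.0                      # reference level: -1 everywhere
    return enc


Y = np.array([[-1.0, 1.0], [1.0, 1.0], [0.5, -1.0]])
X = interaction_y2x(Y)
Z = effect_encode(["a", "b", "c", "a"])
```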

Methods

MultiRegressionMixin.fit(X, y)

Fits the data.

MultiRegressionMixin.formula([labels, idx])

Creates the prediction formula of the fit for the encoded and normalized data.

MultiRegressionMixin.model_formula(model[, idx])

Creates the prediction formula of the fit for the encoded and normalized data.

MultiRegressionMixin.plot_selection([ntop])

Creates a selection plot to visually display how the top performing models were selected and ordered.

MultiRegressionMixin.pred_var(X)

Prediction variances for the new values specified in X.

MultiRegressionMixin.predict(X)

Predict on new data after fitting.

MultiRegressionMixin.preprocess_fit(X, y)

Preprocesses the data before fitting.

MultiRegressionMixin.preprocess_predict(X)

Preprocesses the incoming data before prediction.

MultiRegressionMixin.score(X, y[, sample_weight])

Return the coefficient of determination on test data.

MultiRegressionMixin.summary()

Generates a summary of the fit, provided it was stored in the fit_ attribute during training.
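Assuming score follows the scikit-learn regressor convention inherited through the regression mixin, it returns the coefficient of determination \(R^2 = 1 - SS_{res} / SS_{tot}\). A minimal NumPy sketch of that computation, ignoring sample_weight and using the hypothetical name `r2_score_sketch`, is:

```python
import numpy as np


def r2_score_sketch(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot


score = r2_score_sketch([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
```

A perfect prediction gives 1.0, and always predicting the mean of y_true gives 0.0.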

Attributes

MultiRegressionMixin.M_

Alias for information_matrix

MultiRegressionMixin.Minv_

Alias for inv_information_matrix

MultiRegressionMixin.V_

Alias for obs_cov

MultiRegressionMixin.Vinv_

Alias for inv_obs_cov

MultiRegressionMixin.information_matrix

The information matrix of the fitted data.

MultiRegressionMixin.inv_information_matrix

The inverse of the information matrix.

MultiRegressionMixin.inv_obs_cov

The inverse of the observation covariance matrix.

MultiRegressionMixin.is_fitted

Checks whether the regressor has been fitted.

MultiRegressionMixin.obs_cov

The observation covariance matrix \(V = \mathrm{Var}(Y)\).

MultiRegressionMixin.total_var

The total variance on the normalized y-values.
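For a linear mixed model with variance components, the textbook forms of these quantities are \(V = \sigma^2 I + \sum_i \sigma_i^2 Z_i Z_i^T\), the information matrix \(M = X^T V^{-1} X\), and the prediction variance \(x M^{-1} x^T\) of a new model-matrix row \(x\). The NumPy sketch below computes them under those assumptions; the \(Z_i\) are hypothetical 0/1 group-indicator matrices for the random effects, the function names are illustrative, and pyoptex may parametrize V differently (for example, relative to the scale or on normalized y).

```python
import numpy as np


def obs_cov(n, scale, vcomps, Zs):
    """V = scale * I + sum_i vcomps[i] * Z_i Z_i^T (textbook mixed-model form)."""
    V = scale * np.eye(n)
    for s2, Z in zip(vcomps, Zs):
        V += s2 * (Z @ Z.T)
    return V


def information_matrix(X, V):
    # M = X^T V^{-1} X, solving a linear system instead of inverting V
    return X.T @ np.linalg.solve(V, X)


def pred_var(Xnew, M):
    # Diagonal of Xnew M^{-1} Xnew^T: one variance per new row
    Minv = np.linalg.inv(M)
    return np.sum((Xnew @ Minv) * Xnew, axis=1)


# One random effect: two groups of two runs each
Z = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
X = np.column_stack([np.ones(4), [-1.0, 1.0, -1.0, 1.0]])
V = obs_cov(4, scale=1.0, vcomps=[0.5], Zs=[Z])
M = information_matrix(X, V)
pv = pred_var(np.array([[1.0, 0.0]]), M)
```

Without random effects, V reduces to a scaled identity and M to a multiple of \(X^T X\), the ordinary least squares case.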