MultiRegressionMixin
- class pyoptex.analysis.mixins.fit_mixin.MultiRegressionMixin(factors=(), Y2X=<function identityY2X>, random_effects=())
Base mixin for all regressors which output multiple models during model selection. This mixin extends RegressionMixin, which in turn extends the regression mixin from sklearn. To create your own regressor, do
>>> import numpy as np
>>> from pyoptex.analysis.mixins.fit_mixin import MultiRegressionMixin
>>>
>>> class MyMultiRegressor(MultiRegressionMixin):
>>>     def _fit(self, X, y):
>>>         # Your fit code
>>>         pass
>>>
>>>     def _predict(self, X):
>>>         # Optional, if you require a custom prediction
>>>         # Defaults to
>>>         return (
>>>             np.sum(X[:, self.terms_] * np.expand_dims(self.coef_, 0), axis=1)
>>>             * self.y_std_ + self.y_mean_
>>>         )
Only one function must be implemented: _fit, which fits your model to the encoded and normalized X and the normalized y. It should set the parameters specified below. Inside the _fit function, you have access to the attributes specified below.
Optionally, you can implement your own prediction function. However, when the coefficients and terms are set correctly, this should not be necessary. The _predict function receives a normalized and encoded X.
Any attribute suffixed with _ is only accessible after fitting.
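The default prediction quoted above can be tried out in isolation. The following is a minimal sketch using plain NumPy, not the library's own code: the function name default_predict and all numeric values are hypothetical, and the arguments stand in for the terms_, coef_, y_mean_ and y_std_ attributes.

```python
import numpy as np

def default_predict(X, terms, coef, y_mean, y_std):
    """Linear prediction on the normalized scale, mapped back to the y scale."""
    # Keep only the selected model-matrix columns and weight them by the coefficients
    y_norm = np.sum(X[:, terms] * np.expand_dims(coef, 0), axis=1)
    # Undo the normalization of y
    return y_norm * y_std + y_mean

# Hypothetical encoded and normalized model matrix (3 runs, 3 terms)
X = np.array([[1.0, -1.0, 0.5],
              [1.0, 0.0, 2.0],
              [1.0, 1.0, -0.5]])
terms = np.array([0, 2])     # stands in for terms_
coef = np.array([0.5, 2.0])  # stands in for coef_

pred = default_predict(X, terms, coef, y_mean=10.0, y_std=2.0)
```

The sum over weighted columns is equivalent to the matrix product X[:, terms] @ coef, which is why setting the terms and coefficients correctly is normally enough.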
Note
Contains the same attributes as RegressionMixin.
Note
Prediction is based on the top model (the first model in models_). To predict based on any other model, fit that specific model using SimpleRegressor. Assume the terms are described by a model (in a pandas dataframe) and you fitted a multi-regression model multi_regr:
>>> model = ...
>>> multi_regr = ...
>>>
>>> terms = multi_regr.models_[1]
>>> new_model = model.iloc[terms]
>>> Y2X = model2Y2X(new_model, factors)
>>>
>>> regr = SimpleRegressor(factors, Y2X, random_effects).fit(X, y)
Parameters
- models_ : list(np.array(1d))
The list of models, sorted by the selection_metrics_ (highest metric first). Each model is an integer array specifying the selected terms.
- model_coef_ : list(np.array(1d))
The coefficients of the models_.
- model_scale_ : np.array(1d)
The scale of the models_.
- model_vcomp_ : np.array(2d)
The variance components of the models_.
- selection_metrics_ : np.array(1d)
The metric of each model, sorted highest first. The selection metric defines the order in which the models should be analyzed.
- metric_name_ : str
The name of the selection metric.
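The ordering convention described above (highest metric first, with the top model at index 0) can be sketched with plain NumPy. The candidate models and metric values below are hypothetical, and this is only one way a custom _fit might arrange its results before storing them.

```python
import numpy as np

# Hypothetical candidate models: each is an integer array of selected term indices
candidates = [np.array([0, 1]), np.array([0, 1, 2]), np.array([0, 2])]
# Hypothetical metric values, one per candidate (higher is better here)
metrics = np.array([0.71, 0.93, 0.85])

# Indices that sort the metrics from highest to lowest
order = np.argsort(metrics)[::-1]

# Reorder the models and their metrics consistently
models_ = [candidates[i] for i in order]
selection_metrics_ = metrics[order]

# The top model is the first entry
top_model = models_[0]
```

Any other per-model result, such as coefficients, would need the same reordering so that index i refers to the same model in every attribute.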
- __init__(factors=(), Y2X=<function identityY2X>, random_effects=())
Creates the regressor
Parameters
- factors : list(Factor)
A list of factors to be used during fitting. It contains the categorical encoding, continuous normalization, etc.
- Y2X : func(Y)
The function to transform a design matrix Y to a model matrix X.
- random_effects : list(str)
The names of any random effect columns. Every random effect is interpreted as a string column and encoded using effect encoding.
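To illustrate what a Y2X function does, here is a minimal sketch of a hand-written one, assuming Y arrives as an already encoded numeric 2d array. The name interaction_Y2X and the choice of terms (intercept, main effects, two-factor interactions) are hypothetical and not part of the library.

```python
import numpy as np

def interaction_Y2X(Y):
    """Hypothetical Y2X: intercept, main effects and all two-factor interactions."""
    Y = np.asarray(Y, dtype=float)
    n_runs, n_cols = Y.shape
    # Intercept column
    columns = [np.ones(n_runs)]
    # One column per main effect
    columns += [Y[:, i] for i in range(n_cols)]
    # One column per pair of distinct design-matrix columns
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            columns.append(Y[:, i] * Y[:, j])
    return np.column_stack(columns)

# Hypothetical 2-factor, 4-run design matrix
Y = np.array([[-1.0, -1.0],
              [1.0, -1.0],
              [-1.0, 1.0],
              [1.0, 1.0]])
X = interaction_Y2X(Y)
```

Each column of the resulting X corresponds to one term, which is what the integer arrays in models_ index into.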
Methods
MultiRegressionMixin.fit(X, y)
Fits the data.
MultiRegressionMixin.formula([labels, idx])
Creates the prediction formula of the fit for the encoded and normalized data.
MultiRegressionMixin.model_formula(model[, idx])
Creates the prediction formula of the fit for the encoded and normalized data.
Creates a selection plot to visually display how the top performing models were selected and ordered.
Prediction variances for the new values specified in X.
Predict on new data after fitting.
Preprocesses before fitting the data.
Preprocessing the incoming data before prediction.
MultiRegressionMixin.score(X, y[, sample_weight])
Return coefficient of determination on test data.
Generates a summary of the fit in case it was stored during training in the fit_ attribute.
Attributes
Alias for information_matrix.
Alias for inv_information_matrix.
Alias for obs_cov.
Alias for inv_obs_cov.
The information matrix of the fitted data.
The inverse of the information matrix.
The inverse of the observation covariance matrix.
Checks whether the regressor has been fitted.
The observation covariance matrix \(V = var(Y)\).
The total variance on the normalized y-values.
- factorslist(