SamsBnB

class pyoptex.analysis.estimators.sams.bnb.sams_bnb.SamsBnB(model_size, models, nterms, mode=None, dependencies=None, forced_model=None)

Runs the branch-and-bound (BnB) algorithm for SAMS automated model selection.

Attributes

model_size : int

The size of the overfitted models.

models : np.array(2d)

The results returned by the SAMS simulation. A NumPy array with a structured dtype in which each element holds two arrays of length model_size, ('model', np.int64) and ('coeff', np.float64), and one scalar, ('metric', np.float64).

nterms : int

The total number of fixed effects in the encoded, normalized model matrix (equal to X.shape[1] after encoding and normalization). Every term index in models must be strictly smaller than this value.

mode : None or 'weak' or 'strong'

The heredity mode during sampling.

dependencies : np.array(2d)

The dependency matrix of size (N, N), where N is the number of terms in the encoded model (output from Y2X). Term i depends on term j if dep(i, j) = true.

forced_model : np.array(1d)

An integer array of the terms that were forced into every simulation model. Often just the intercept.

kill : np.array(1d)

A boolean array marking the terms that should not be investigated because they cannot occur in the top-performing models. Updated while the algorithm runs.

spm : scipy.sparse.csc_array

A sparse boolean matrix of the models. Has dimensions (models.shape[0], nterms).
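The structured datatype of models and the boolean matrix derived from it can be sketched with plain NumPy. This is only an illustration under assumptions: the values, the variable names sams_dtype and membership, and the flat array of records are made up here, and the dense matrix stands in for the sparse spm that the class stores.

```python
import numpy as np

model_size = 3  # hypothetical size of each overfitted model
nterms = 6      # hypothetical number of columns in the encoded model matrix

# Structured dtype: two arrays of length model_size plus one scalar per element.
sams_dtype = np.dtype([
    ("model", np.int64, (model_size,)),
    ("coeff", np.float64, (model_size,)),
    ("metric", np.float64),
])

# Three made-up simulation results (term indices, coefficients, metric).
models = np.zeros(3, dtype=sams_dtype)
models["model"] = [[0, 1, 2], [0, 1, 4], [0, 2, 4]]
models["coeff"] = [[1.0, 0.5, -0.2], [1.1, 0.4, 0.3], [0.9, -0.1, 0.2]]
models["metric"] = [0.91, 0.88, 0.85]

# Every term index must be strictly smaller than nterms.
assert models["model"].max() < nterms

# Dense boolean membership matrix of shape (number of models, nterms):
# entry [r, t] is True when model r contains term t.
membership = np.zeros((models.shape[0], nterms), dtype=bool)
rows = np.repeat(np.arange(models.shape[0]), model_size)
membership[rows, models["model"].ravel()] = True
```

Summing membership over its rows gives the number of models in which each term occurs, which is the kind of count a frequency-based bound can build on.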

__init__(model_size, models, nterms, mode=None, dependencies=None, forced_model=None)

Initializes the branch-and-bound object.

Parameters

model_size : int

The size of the overfitted models.

models : np.array(2d)

The results returned by the SAMS simulation. A NumPy array with a structured dtype in which each element holds two arrays of length model_size, ('model', np.int64) and ('coeff', np.float64), and one scalar, ('metric', np.float64).

nterms : int

The total number of fixed effects in the encoded, normalized model matrix (equal to X.shape[1] after encoding and normalization). Every term index in models must be strictly smaller than this value.

mode : None or 'weak' or 'strong'

The heredity mode during sampling.

dependencies : np.array(2d)

The dependency matrix of size (N, N), where N is the number of terms in the encoded model (output from Y2X). Term i depends on term j if dep(i, j) = true.

forced_model : None or np.array(1d)

An integer array of the terms that were forced into every simulation model. Often just the intercept.
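To illustrate how a dependency matrix and a heredity mode interact, the sketch below checks a candidate model against a small matrix. It assumes the usual definitions (strong heredity: every term a model term depends on is also in the model; weak heredity: at least one of them is). The helper obeys_heredity and the four-term example are hypothetical and not part of the library.

```python
import numpy as np

# Hypothetical encoded model with 4 terms: 0 = intercept, 1 = A, 2 = B, 3 = A*B.
# dep[i, j] is True when term i depends on term j.
dep = np.zeros((4, 4), dtype=bool)
dep[3, 1] = True  # A*B depends on A
dep[3, 2] = True  # A*B depends on B


def obeys_heredity(model, dep, mode=None):
    """Return True if the term indices in `model` satisfy the heredity `mode`."""
    if mode is None:
        return True  # no heredity constraint
    present = np.zeros(dep.shape[0], dtype=bool)
    present[list(model)] = True
    for term in model:
        parents = dep[term]
        if not parents.any():
            continue  # this term has no dependencies
        satisfied = present[parents]
        if mode == "strong" and not satisfied.all():
            return False
        if mode == "weak" and not satisfied.any():
            return False
    return True
```

With this matrix, the model [0, 1, 3] passes under 'weak' (A is present) but fails under 'strong' (B is missing).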

Methods

SamsBnB.branches(node)

Generates branches by adding possible terms where permitted.

SamsBnB.init_queue(top_results, top_scores)

Initializes the branches queue, starting from the forced model and yielding all possible one-term extensions.

SamsBnB.initialize(nfit)

Initializes the results using a greedy search.

SamsBnB.leaf(node)

Checks whether the model is of full size.

SamsBnB.loop(top_results, top_scores)

Loops through the branch-and-bound algorithm, keeping the top n results.

SamsBnB.node_in_results(node, results)

Checks whether the model is already in the results.

SamsBnB.postloop(top_results, top_scores)

Callback to run after the branch-and-bound algorithm has run.

SamsBnB.postnew(old, new, top)

Kills any terms which do not occur frequently enough.

SamsBnB.preloop(top_results, top_scores)

Kills any terms which do not occur frequently enough.

SamsBnB.prenew(old, new, top)

Function defining what to do after finding a new optimal node and before adding it to the top.

SamsBnB.top(nfit)

Returns the top nfit results using the branch-and-bound algorithm.

SamsBnB.upperbound(node)

Computes the upper bound on the number of times this submodel occurs in the set (i.e., the frequency of this submodel).
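The interplay of branching, leaf detection, and a frequency-based upper bound can be sketched as a generic best-first branch-and-bound. This is not the library's implementation: the functions frequency and most_frequent_submodel are hypothetical, models are plain tuples of term indices, and only the single best submodel is returned rather than the top nfit. The bound is valid because extending a submodel with extra terms can never make it occur more often.

```python
import heapq


def frequency(submodel, models):
    """Count how many models contain every term of `submodel`."""
    sub = set(submodel)
    return sum(1 for m in models if sub <= set(m))


def most_frequent_submodel(models, nterms, size, forced=()):
    """Best-first branch-and-bound for the most frequent submodel of `size` terms."""
    forced = tuple(forced)
    candidates = [t for t in range(nterms) if t not in forced]
    # Max-heap via negated bounds; a node is the tuple of terms added to `forced`.
    queue = [(-frequency(forced, models), ())]
    best_model, best_score = None, -1
    while queue:
        neg_bound, added = heapq.heappop(queue)
        bound = -neg_bound
        if bound <= best_score:
            break  # no remaining node can beat the incumbent
        if len(forced) + len(added) == size:  # leaf: model is of full size
            best_model, best_score = forced + added, bound
            continue
        # Branch: only add larger indices so each submodel is visited once.
        last = added[-1] if added else -1
        for term in candidates:
            if term > last:
                child = added + (term,)
                child_bound = frequency(forced + child, models)
                if child_bound > best_score:
                    heapq.heappush(queue, (-child_bound, child))
    return best_model, best_score


# Made-up simulation models, each forced to contain the intercept (term 0).
example_models = [(0, 1, 2), (0, 1, 4), (0, 2, 4), (0, 1, 3)]
best, count = most_frequent_submodel(example_models, nterms=5, size=2, forced=(0,))
```

Because nodes are popped in order of decreasing bound, the first full-size node reached is already optimal, and the search stops as soon as the next bound cannot exceed it.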