entropies_approx

pyoptex.analysis.estimators.sams.entropy.entropies_approx(submodels, freqs, model_size, dep, mode, forced=None, N=10000, sampler=sample_model_dep_onebyone, eps=1e-06)

Compute the approximate entropy by sampling N random models and observing the frequency of each submodel.

The entropy is computed from \(f_{o}\), the observed frequency of the submodel in the SAMS procedure, and \(f_{t}\), the theoretical frequency when sampling at random. A higher entropy indicates more “surprise”, and therefore that the submodel is more likely to be the correct model.
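
As a rough illustration of how a score built from an observed and a theoretical frequency behaves, the sketch below uses the binary relative entropy (Kullback-Leibler divergence) between the two. This specific form, the function name `surprise_score`, and the way `eps` is applied are assumptions made for illustration; they are not taken from the pyoptex source.

```python
import math

def surprise_score(f_o, f_t, eps=1e-6):
    """Binary relative entropy (in bits) of observed vs. theoretical frequency.

    Assumed form, for illustration only; not the pyoptex implementation.
    """
    # Clip both frequencies away from 0 and 1 so every logarithm is finite.
    f_o = min(max(f_o, eps), 1 - eps)
    f_t = min(max(f_t, eps), 1 - eps)
    return (f_o * math.log2(f_o / f_t)
            + (1 - f_o) * math.log2((1 - f_o) / (1 - f_t)))

# A submodel seen far more often than chance predicts scores high ...
high = surprise_score(f_o=0.60, f_t=0.05)
# ... while one seen about as often as chance predicts scores near zero.
low = surprise_score(f_o=0.05, f_t=0.05)
```

Under this assumed form, the score is zero when the observed and theoretical frequencies agree and grows as they diverge, which matches the "surprise" reading given above.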

Parameters

submodels : list(np.array(1d))

The list of top submodels for each size.

freqs : np.array(1d)

The frequencies of these submodels in the raster plot.

model_size : int

The size of the overfitted models. The overfitted model includes the forced model, so its size must be larger than that of the forced model.

dep : np.array(2d)

The dependency matrix of size (N, N), with N the number of terms in the encoded model (output from Y2X); this N is unrelated to the sampling parameter N below. Term i depends on term j if dep[i, j] is True.

mode : None or 'weak' or 'strong'

The heredity mode during sampling.

forced : None or np.array(1d)

Any terms that must be included in the model.

N : int

The number of random samples to draw to compute the theoretical frequency of a submodel.

sampler : func(dep, model_size, N, forced, mode)

The sampler to use when generating random hereditary models.

eps : float

A numerical stability parameter used when computing the entropy.
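
To make the `dep` and `mode` parameters concrete, the sketch below builds a small dependency matrix by hand and checks candidate models against a heredity mode. It assumes the usual reading of the modes: strong heredity requires every parent term of an included term to be present, weak heredity requires at least one, and None imposes no constraint. The term list, the nested-list matrix (instead of a 2d array), and the helper `obeys_heredity` are hypothetical and not part of the library.

```python
# Hypothetical encoded model; the term order is an assumption.
terms = ["A", "B", "A*B", "A^2"]

# dep[i][j] is True when term i depends on term j.
dep = [
    # A      B      A*B    A^2
    [False, False, False, False],  # A: main effect, no parents
    [False, False, False, False],  # B: main effect, no parents
    [True,  True,  False, False],  # A*B depends on A and on B
    [True,  False, False, False],  # A^2 depends on A
]

def obeys_heredity(model, dep, mode):
    """Check whether the term indices in `model` satisfy the heredity mode."""
    included = set(model)
    for i in model:
        parent_idx = [j for j, flag in enumerate(dep[i]) if flag]
        if not parent_idx or mode is None:
            continue  # no parents, or no heredity constraint
        present = [j in included for j in parent_idx]
        if mode == "strong" and not all(present):
            return False
        if mode == "weak" and not any(present):
            return False
    return True

# The model {A, A*B} contains only one of the two parents of A*B.
weak_ok = obeys_heredity([0, 2], dep, "weak")
strong_ok = obeys_heredity([0, 2], dep, "strong")
```

Here the model passes the weak check but fails the strong one, because B is missing.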

Returns

entropy : np.array(1d)

An array of floats with the same length as submodels, one entropy per submodel.
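
The theoretical frequency behind each returned value is a sampled estimate. The idea can be sketched as follows: draw N random models of the given size and count how often all terms of a submodel appear, giving one estimate per submodel. For brevity, the hypothetical helper `theoretical_freqs` draws terms uniformly and ignores heredity and forced terms, which is a simplifying assumption and not how the library's sampler works.

```python
import random

def theoretical_freqs(submodels, n_terms, model_size, N=10000, seed=0):
    """Estimate, for each submodel, how often it appears in a random model.

    Assumption for illustration: models are drawn uniformly, with no
    heredity constraint and no forced terms.
    """
    rng = random.Random(seed)
    targets = [set(s) for s in submodels]
    hits = [0] * len(targets)
    for _ in range(N):
        # One random model: `model_size` distinct term indices.
        model = set(rng.sample(range(n_terms), model_size))
        for k, target in enumerate(targets):
            if target <= model:  # every term of the submodel is present
                hits[k] += 1
    return [h / N for h in hits]

# Submodels of size 1 and 2 among 10 candidate terms, 4-term random models.
freqs_t = theoretical_freqs([[3], [0, 1]], n_terms=10, model_size=4)
```

In this uniform setting the exact probabilities are 4/10 for the single term and C(8,2)/C(10,4) = 2/15 for the pair, so the sampled estimates should land close to those values.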