entropies_approx

pyoptex.analysis.estimators.sams.entropy.entropies_approx(submodels, freqs, model_size, dep, mode, forced=None, N=10000, sampler=<function sample_model_dep_onebyone>, eps=1e-06)

Compute the approximate entropy by sampling N random models and observing the frequency of each submodel.

The entropy is computed as

where \(f_{o}\) is the observed frequency of the submodel in the SAMS procedure and \(f_{t}\) is the theoretical frequency of that submodel when models are sampled at random. A higher entropy indicates more "surprise": the submodel occurs more often than chance alone would explain, and is therefore more likely to be the correct model.
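As a rough illustration of the sampling idea, the sketch below estimates \(f_{t}\) by drawing random models and counting how often they contain a given submodel, then scores an observed frequency against it. It rests on several assumptions that are not taken from pyoptex: the helper names (sample_models, theoretical_freq, surprise) are hypothetical, sampling ignores heredity and forced terms, and the score is an assumed binary relative entropy, which may differ from the formula this function actually uses.

```python
import numpy as np

def sample_models(n_terms, model_size, n_samples, rng):
    """Draw uniformly random models (ASSUMPTION: no heredity, no forced terms).

    Returns an (n_samples, model_size) array with distinct term indices per row.
    """
    # Argsort of i.i.d. uniforms gives one random permutation per row;
    # keeping the first model_size columns yields a random subset of terms.
    perms = np.argsort(rng.random((n_samples, n_terms)), axis=1)
    return perms[:, :model_size]

def theoretical_freq(submodel, models):
    """Fraction of sampled models that contain every term of the submodel."""
    # Terms are distinct within a row, so a full match means all terms are present.
    hits = np.isin(models, submodel).sum(axis=1) == len(submodel)
    return hits.mean()

def surprise(f_o, f_t, eps=1e-6):
    """ASSUMED score: binary relative entropy of f_o against f_t, in bits."""
    return (f_o * np.log2((f_o + eps) / (f_t + eps))
            + (1 - f_o) * np.log2((1 - f_o + eps) / (1 - f_t + eps)))

rng = np.random.default_rng(0)
models = sample_models(n_terms=10, model_size=5, n_samples=10_000, rng=rng)
submodel = np.array([0, 1])
f_t = theoretical_freq(submodel, models)  # exact value is C(8,3)/C(10,5) = 2/9
print(f_t, surprise(0.9, f_t), surprise(0.3, f_t))
```

Under these assumptions, a submodel observed in 90% of the SAMS models scores far higher than one observed in 30%, because the latter is close to what random sampling alone would produce.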

Parameters

submodels (list(np.array(1d))): The list of top submodels for each size.
freqs (np.array(1d)): The frequencies of these submodels in the raster plot.
model_size (int): The size of the overfitted models. The overfitted model includes the forced model, so its size must be larger than that of the forced model.
dep (np.array(2d)): The dependency matrix of size (n, n), with n the number of terms in the encoded model (output from Y2X). Term i depends on term j if dep[i, j] is True.
mode (None or ‘weak’ or ‘strong’): The heredity mode during sampling.
forced (None or np.array(1d)): Any terms that must be included in the model.
N (int): The number of random samples to draw to compute the theoretical frequency of a submodel.
sampler (func(dep, model_size, N, forced, mode)): The sampler to use when generating random hereditary models.
eps (float): A numerical stability parameter used when computing the entropy.

Returns

entropy (np.array(1d)): An array of floats of the same length as submodels.