statsmodels.imputation.bayes_mi.MI

class statsmodels.imputation.bayes_mi.MI(imp, model, model_args_fn=None, model_kwds_fn=None, formula=None, fit_args=None, fit_kwds=None, xfunc=None, burn=100, nrep=20, skip=10)

MI performs multiple imputation using a provided imputer object.

Parameters:
  • imp (object) – An imputer object, for example an instance of BayesGaussMI.

  • model (model class) – Any statsmodels model class.

  • model_args_fn (function) – A function that takes an imputed dataset as input and returns endog, exog. If the model is fit using a formula, it instead returns a DataFrame used to build the model. Optional when a formula is used.

  • model_kwds_fn (function, optional) – A function taking an imputed dataset as input and returning a dictionary of model keyword arguments.

  • formula (str, optional) – If provided, the model is constructed using the from_formula class method, otherwise the __init__ method is used.

  • fit_args (list-like, optional) – List of arguments to be passed to the fit method.

  • fit_kwds (dict-like, optional) – Keyword arguments to be passed to the fit method.

  • xfunc (function mapping ndarray to ndarray, optional) – A function that is applied to the complete (imputed) data matrix before the model is fit.

  • burn (int) – Number of burn-in iterations.

  • nrep (int) – Number of imputed data sets to use in the analysis.

  • skip (int) – Number of Gibbs iterations to skip between successive multiple imputation fits.

Notes

The imputer object must have an ‘update’ method, and a ‘data’ attribute that contains the current imputed dataset.
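
To illustrate this interface only, here is a sketch of a hypothetical imputer that exposes an update method and a data attribute. The class name ResampleImputer and its resampling rule are assumptions made for this example. The rule (drawing each missing cell from the observed values in the same column) is not a principled imputation model and is shown only to make the required shape of the object concrete.

```python
import numpy as np


class ResampleImputer:
    """Hypothetical imputer: each missing cell is filled with a random
    draw from the observed values in the same column."""

    def __init__(self, data, seed=0):
        self._rng = np.random.default_rng(seed)
        # Work on a float copy so the caller's array is not modified.
        self.data = np.array(data, dtype=float, copy=True)
        self._mask = np.isnan(self.data)
        # Observed values per column (assumes no column is fully missing).
        self._observed = [
            self.data[~self._mask[:, j], j] for j in range(self.data.shape[1])
        ]
        # Fill the missing cells once so that `data` is complete from the start.
        self.update()

    def update(self):
        # Redraw every missing cell; observed cells are left untouched.
        for j, obs in enumerate(self._observed):
            miss = self._mask[:, j]
            self.data[miss, j] = self._rng.choice(obs, size=int(miss.sum()))


toy = ResampleImputer([[1.0, np.nan], [2.0, 5.0], [np.nan, 7.0]])
toy.update()
print(toy.data)
```

Any object with this shape could in principle be passed as imp, although the quality of the pooled inference depends entirely on the imputation model behind update.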

xfunc can be used to introduce domain constraints, e.g. when imputing binary data the imputed continuous values can be rounded to 0/1.
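
As a sketch of such a constraint, the function below rounds one column of the imputed matrix to 0/1. The function name round_binary and the choice of column 2 as the binary variable are assumptions made for this example.

```python
import numpy as np

# Hypothetical: index of the column that holds the binary variable.
BINARY_COLS = [2]


def round_binary(data):
    # Return a copy in which the binary columns are rounded to the
    # nearest integer and then clipped to the range [0, 1].
    out = np.array(data, dtype=float, copy=True)
    out[:, BINARY_COLS] = np.clip(np.rint(out[:, BINARY_COLS]), 0.0, 1.0)
    return out


demo = np.array([[0.3, 1.2, 0.7],
                 [-1.0, 0.5, -0.2],
                 [2.0, 0.1, 1.4]])
rounded = round_binary(demo)
print(rounded[:, 2])
```

A function like this would be passed as xfunc=round_binary. It returns a copy, so the imputer's own continuous values are left unchanged for subsequent iterations.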

Methods

fit([results_cb])

Impute datasets, fit models, and pool results.
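
This page does not state the pooling rule. Pooling in multiple imputation is conventionally done with Rubin's rules, so the sketch below assumes that convention: the pooled estimate is the mean of the per-dataset estimates, and the pooled covariance is the mean within-imputation covariance plus (1 + 1/m) times the between-imputation covariance. The function name pool_rubin is hypothetical and is not part of statsmodels.

```python
import numpy as np


def pool_rubin(params_list, cov_list):
    # params_list: m parameter vectors, one per imputed data set (m >= 2).
    # cov_list: the m corresponding covariance matrices.
    est = np.asarray(params_list, dtype=float)       # shape (m, p)
    m = est.shape[0]
    pooled = est.mean(axis=0)                        # pooled point estimate
    within = np.mean(np.asarray(cov_list, dtype=float), axis=0)
    between = np.atleast_2d(np.cov(est, rowvar=False))
    total = within + (1.0 + 1.0 / m) * between       # pooled covariance
    return pooled, total


pooled, total = pool_rubin(
    [[1.0], [3.0]],
    [[[0.5]], [[0.5]]],
)
print(pooled, total)
```

In this toy call the two estimates are 1 and 3, so the pooled estimate is 2. The within-imputation variance is 0.5 and the between-imputation variance is 2, giving a pooled variance of 0.5 + 1.5 × 2 = 3.5.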