Bayesian Information Criterion

Assumptions

  1. The approximation underlying the BIC is valid only when the sample size is much larger than the number of parameters in the model.
  2. The BIC cannot handle complex collections of models, such as those arising in the variable selection problem in high dimensions.

Definition

The BIC is formally defined as

  $\mathrm{BIC} = k \ln(n) - 2 \ln(\hat{L})$

where

  • $\hat{L}$ is the maximized value of the likelihood function of the model $M$, i.e. $\hat{L} = p(x \mid \hat{\theta}, M)$, where $\hat{\theta}$ is the parameter value that maximizes the likelihood function.
  • $x$ is the observed data.
  • $n$ is the number of data points in $x$, i.e. the number of observations.
  • $k$ is the number of parameters estimated by the model.
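
As an illustration of the definition, the following minimal sketch computes the BIC as k ln(n) - 2 ln(L̂) for a univariate Gaussian model fitted by maximum likelihood. The function names (`bic`, `gaussian_max_log_likelihood`) and the sample data are hypothetical and chosen only for this example; the Gaussian model is an assumption, with k = 2 estimated parameters (mean and variance).

```python
import math


def bic(max_log_likelihood: float, k: int, n: int) -> float:
    """Return k * ln(n) - 2 * ln(L_hat), given ln(L_hat) directly."""
    return k * math.log(n) - 2.0 * max_log_likelihood


def gaussian_max_log_likelihood(x: list[float]) -> float:
    """Maximized log-likelihood of a univariate Gaussian fitted to x.

    Uses the maximum-likelihood estimates: the sample mean and the
    variance with divisor n (not n - 1). Requires non-constant data.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / n
    # At the MLE, the log-likelihood simplifies to -n/2 * (ln(2*pi*var) + 1).
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


# Made-up observations, for illustration only.
data = [2.1, 1.9, 2.4, 2.0, 1.7, 2.3, 2.2, 1.8]
log_l_hat = gaussian_max_log_likelihood(data)

# k = 2 because the mean and the variance are both estimated.
print(bic(log_l_hat, k=2, n=len(data)))
```

When comparing candidate models fitted to the same data, the model with the lower BIC is preferred.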

Remarks

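  • BIC and AIC penalties: BIC is similar to AIC, but with a different penalty for the number of parameters. With AIC this penalty is $2k$, whereas with BIC the penalty is $k \ln(n)$, where $k$ is the number of estimated parameters and $n$ is the number of observations. The BIC penalty is therefore the larger of the two whenever $\ln(n) > 2$, i.e. for $n \geq 8$.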
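
The difference between the two penalties can be sketched numerically. The snippet below assumes the standard penalty terms, 2k for AIC and k ln(n) for BIC, with k the number of estimated parameters and n the number of observations; the function names and the chosen values of k and n are hypothetical.

```python
import math


def aic_penalty(k: int) -> float:
    """AIC penalty term: 2 * k (independent of the sample size)."""
    return 2.0 * k


def bic_penalty(k: int, n: int) -> float:
    """BIC penalty term: k * ln(n) (grows with the sample size)."""
    return k * math.log(n)


k = 3  # illustrative number of parameters
for n in (5, 8, 100, 10_000):
    # Print the sample size followed by the AIC and BIC penalties.
    print(n, aic_penalty(k), round(bic_penalty(k, n), 2))
```

Because the BIC penalty grows with ln(n) while the AIC penalty stays fixed, BIC tends to favour smaller models than AIC as the sample size increases.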