Model Selection Information

Model selection is the task of selecting a statistical model from a set of candidate models, given data. In the simplest cases, a pre-existing set of data is considered. However, the task can also involve the design of experiments such that the data collected is well-suited to the problem of model selection.

Introduction

Figure: The scientific observation cycle.

In its most basic form, model selection is one of the fundamental tasks of scientific inquiry. Determining the principle that explains a series of observations is often linked directly to a mathematical model that predicts those observations. For example, when Galileo performed his inclined plane experiments, he demonstrated that the motion of the balls fitted the parabola predicted by his model.

Of the countless possible mechanisms and processes that could have produced the data, how can one even begin to choose the best model? The mathematical approach commonly taken decides among a set of candidate models, and this set must be chosen by the researcher. Often simple models such as polynomials are used, at least initially. Throughout their book, Burnham and Anderson (2002) emphasize the importance of choosing candidate models based on sound scientific principles about the processes underlying the data.

Once the set of candidate models has been chosen, the mathematical analysis allows us to select the best of these models. What is meant by "best" is controversial. A good model selection technique will balance goodness of fit with simplicity. More complex models are better able to adapt their shape to fit the data (for example, a fifth-order polynomial can exactly fit six points with distinct x-values), but the additional parameters may not represent anything useful. (Perhaps those six points are really just randomly distributed about a straight line.) Goodness of fit is generally determined using a likelihood ratio approach, or an approximation of it, leading to a chi-squared test. Complexity is generally measured by counting the number of parameters in the model.
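The trade-off between fit and simplicity can be illustrated with a minimal Python sketch. It assumes NumPy is available; the six data points, the noise level, and the helper name `rss` are hypothetical choices made for illustration. The sketch fits a straight line and a fifth-order polynomial to six points scattered about a line and compares their residual sums of squares.

```python
import numpy as np

rng = np.random.default_rng(0)

# Six hypothetical points scattered about the straight line y = 2x + 1.
x = np.arange(6, dtype=float)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)


def rss(degree):
    """Residual sum of squares of a least-squares polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.sum((np.polyval(coeffs, x) - y) ** 2))


rss_line = rss(1)     # 2 parameters: slope and intercept
rss_quintic = rss(5)  # 6 parameters: passes through all six points

print(f"line:    RSS = {rss_line:.3g}")
print(f"quintic: RSS = {rss_quintic:.3g}")
```

The quintic leaves essentially no residual because it has as many parameters as there are points, yet its four extra coefficients describe the noise rather than any structure in the data.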

Model selection techniques can be considered as estimators of some physical quantity, such as the probability of the model producing the given data. The bias and variance of this estimator are both important measures of its quality, and asymptotic efficiency is also often considered.

A standard example of model selection is that of curve fitting, where, given a set of points and other background knowledge (e.g. points are a result of i.i.d. samples), we must select a curve that describes the function that generated the points.
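The curve-fitting example can be sketched in Python as a choice among polynomial candidates. This is only one possible approach: it uses the Akaike information criterion (AIC), which the text above does not name, as an assumed way of combining goodness of fit with a parameter count. The data, the candidate degrees, and the helper names `fit_rss` and `aic` are hypothetical, and the errors are assumed to be i.i.d. Gaussian.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: i.i.d. Gaussian noise around a quadratic curve.
n = 50
x = np.linspace(-3.0, 3.0, n)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(scale=1.0, size=n)


def fit_rss(degree):
    """Residual sum of squares of a least-squares polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.sum((np.polyval(coeffs, x) - y) ** 2))


def aic(degree):
    """AIC under Gaussian errors, up to an additive constant.

    The first term rewards goodness of fit; the second penalizes the
    number of polynomial coefficients. The noise variance would add the
    same constant to every candidate, so it is left out.
    """
    k = degree + 1
    return n * np.log(fit_rss(degree) / n) + 2 * k


# Candidate set chosen by the researcher: polynomials of degree 0 to 5.
scores = {d: aic(d) for d in range(6)}
best_degree = min(scores, key=scores.get)

for d, s in scores.items():
    print(f"degree {d}: AIC = {s:.1f}")
print("selected degree:", best_degree)
```

Lower scores are better. Raising the degree never increases the residual sum of squares, so without the penalty term the most complex candidate would always win.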

Methods for choosing the set of candidate models

Experiments for choosing the set of candidate models

Criteria for model selection

See also

References

The above information uses material from Wikipedia and is licensed under the GNU Free Documentation License.
This page was last archived by our server on Sat Mar 17 00:55:54 2012.