How Good is the Model?
The goal of modeling data is not just to find a model that works,
but to find the best possible model. To do this, scientists
need some measure of how well a model fits the data. On the
What is a Model? page,
you developed a simple model to describe how the time a student studies
relates to his or her test scores in your physics class. The initial
model was the equation:
y = (18.3) x + 12.5
where y is the score (in percentage points) and x is the
time spent studying (in hours).
But how good is that model? Scientists and mathematicians use
"goodness-of-fit" tests to describe how well the model matches the data.
There are several such tests. The goal of all such tests is to minimize
a parameter that characterizes how far the data lie from the model.
We'll use a simple test called least-squares fit to illustrate the
principle of goodness-of-fit. Below we briefly describe the test that
the XSPEC software will use when you fit the low-mass X-ray binary data.
The basic principles of getting the best fit between the model and data
are similar.
Least-Squares Fit
The distances between the data points and the model are called the
"residuals" of the model. In the least-squares fit method, we want to
minimize the sum of the squares of the residuals. By squaring each
residual, we ensure that every term in the sum is positive, whether the
data point lies above or below the model. The residuals are illustrated in
the plot below as red lines:
Plot of the data set for test scores versus time studied with the initial linear model shown as a
black line. The red lines represent the residuals. The residuals are the
distance between the data points and the modeled line.
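To make the idea of residuals concrete, here is a minimal Python sketch. The data values are invented for illustration (they are not the data set plotted on this page), and the function names are our own. It measures each residual as the data value minus the model value, then adds up the squares.

```python
# Hypothetical (hours studied, test score in %) pairs, invented for this sketch.
# They are NOT the data plotted on this page.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [40.0, 50.0, 72.0, 81.0, 95.0]

def residuals(xs, ys, slope, intercept):
    """Vertical distance (data minus model) for each data point."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

def sum_squared_residuals(xs, ys, slope, intercept):
    """The quantity a least-squares fit tries to make as small as possible."""
    return sum(r ** 2 for r in residuals(xs, ys, slope, intercept))

# Evaluate the initial model y = 18.3 x + 12.5 against the made-up data.
res = residuals(hours, scores, 18.3, 12.5)
total = sum_squared_residuals(hours, scores, 18.3, 12.5)
print("residuals:", [round(r, 1) for r in res])
print("sum of squared residuals:", round(total, 1))
```

Some residuals come out positive (data above the line) and some negative (data below the line), but every squared residual adds to the total.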
We won't go into the mathematical detail for determining the best fit
by the least-squares fit method, but check out this webpage if you're
interested: Least Squares Fitting from Wolfram MathWorld (http://mathworld.wolfram.com/LeastSquaresFitting.html).
Using their equations, we find that the best-fit straight line for
the data above is given by:
y = (15.1) x + 23.5
The data with this best-fit line and residuals are shown below.
The data is the same as above, but the line is now the best-fit model
as found using the least-squares fit method.
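For readers who want to see the calculation, the sketch below applies the standard closed-form formulas for a straight-line least-squares fit: the slope is the sum of (x - mean x)(y - mean y) divided by the sum of (x - mean x)², and the intercept is mean y minus slope times mean x. The data are invented for illustration, so the resulting line will not match the best-fit line quoted above, and the function name is our own.

```python
# Hypothetical (hours studied, test score in %) pairs, invented for this sketch.
# They are NOT the data plotted on this page.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [40.0, 50.0, 72.0, 81.0, 95.0]

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the straight line that minimizes
    the sum of squared residuals for the given points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = least_squares_line(hours, scores)
print(f"best-fit line: y = ({slope:.1f}) x + {intercept:.1f}")
```

Any other choice of slope or intercept for these points gives a larger sum of squared residuals than the line returned here.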
You can interpret the model to mean that if you don't study at all,
you will probably get a grade of about 23.5% on the next exam (the
intercept of the line). To expect a 90% on the exam, you would need to
study about 4.4 hours, since (90 - 23.5) / 15.1 ≈ 4.4.
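If you want to try other study times or target grades, the short Python sketch below takes the best-fit line y = (15.1) x + 23.5 at face value and evaluates it in both directions. The function names are our own, and the model is only a rough guide for study times near those in the data.

```python
def predicted_score(hours, slope=15.1, intercept=23.5):
    """Score (in percentage points) predicted by the best-fit line."""
    return slope * hours + intercept

def hours_needed(target_score, slope=15.1, intercept=23.5):
    """Study time at which the best-fit line reaches target_score,
    found by solving y = slope * x + intercept for x."""
    return (target_score - intercept) / slope

print(f"no studying: {predicted_score(0):.1f}%")
print(f"hours for 90%: {hours_needed(90):.2f}")
```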
Of course, the model would be better with more data points. For
example, a straight line might not be the best model: if most
students understand the material after studying for 4.5 hours,
additional study time may not increase their grades much. This type of
function would not be linear.
However, no matter the final shape of the best-fit model, the above
example illustrates the basics of how the process works.
Chi-Squared Fit
The software that you will be using to model the low-mass X-ray
binary uses a test called a Chi-squared test to determine the
goodness-of-fit of the data and model. This test is more involved than
the least-squares fit, but works on the same principle of minimizing a
parameter characterizing how far the data lie from the model.
The Chi-squared test determines the goodness-of-fit by computing
the weighted sum of the squared differences between the measured and
calculated values. The test is characterized by the statistic
Χ², which can be written
Χ² = ∑{(1/σi²)[yi-y(xi)]²}
where:
- σi² is the variance (the square of the
calculated error) of each point
- yi is the measured value of y at a given point
- y(xi) is the fitted value of y at that given point
If the fitted values y(xi) are good approximations of
the measured values yi, then the value of Χ² is
low and a good fit can be claimed. If, however, the value of
Χ² is high, the fit is not good.
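The formula above can be sketched in a few lines of Python. This is only an illustration of the statistic itself, not how XSPEC computes it; the function name, the measurements, the errors, and the two candidate models are all invented for this example.

```python
def chi_squared(measured, fitted, sigmas):
    """Weighted sum of squared differences:
    the sum over all points of ((y_i - y(x_i)) / sigma_i)^2."""
    return sum(((y - f) / s) ** 2 for y, f, s in zip(measured, fitted, sigmas))

# Hypothetical measurements with their errors (invented for this sketch).
measured = [10.0, 12.0, 9.0]
sigmas = [1.0, 2.0, 0.5]

# Fitted values from two hypothetical candidate models at the same points.
fit_a = [11.0, 11.0, 10.0]
fit_b = [10.5, 12.5, 9.0]

chi2_a = chi_squared(measured, fit_a, sigmas)
chi2_b = chi_squared(measured, fit_b, sigmas)
print("model A:", chi2_a, " model B:", chi2_b)
```

Because each difference is divided by the error of that point, a miss on a precisely measured point (small σi) raises Χ² more than the same miss on a point with a large error. The model with the lower value is the better fit to these made-up points.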
When you try different models for the low-mass X-ray binary data, you
will, therefore, try to minimize Χ² to find the best fit
possible.