How Good is the Model?
The goal of modeling data is not just to find a model that works,
but to find the best possible model. In order to do this, scientists
need some kind of measure of how well a model fits the data. On the
What is a Model? page,
you developed a simple model to describe how the time a student studies
relates to his or her test scores in your physics class. The initial
model was the equation:
y = (18.3) x + 12.5
where y is the score (in percentage points) and x is the
time spent studying (in hours).
But how good is that model? Scientists and mathematicians use
"goodness-of-fit" tests to describe how well the model matches the data.
There are several such tests. The goal of all such tests is to minimize
a parameter that characterizes how far the data lie from the model.
We'll use a simple test called least-squares fit to illustrate the
principle of goodness-of-fit. Below we briefly describe the test that
the XSPEC software will use when you fit the low-mass X-ray binary data.
The basic principles of getting the best fit between the model and data
are similar.
Least-Squares Fit
The distances between the data points and the model are called the
"residuals" of the model. In the least-squares fit method, we want to
minimize the sum of the squares of the residuals. By squaring each
residual, we ensure that every term in the sum is positive, no matter
whether the data point lies above or below the model. The residuals are
illustrated in the plot below as red lines:
Plot of the data set for test scores versus time studied with the initial linear model shown as a
black line. The red lines represent the residuals, which are the
distances between the data points and the modeled line.
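To make the idea of residuals concrete, here is a minimal Python sketch that computes them for the initial model y = (18.3) x + 12.5. The data points are hypothetical, invented only for illustration (the values plotted on this page are not listed in the text), and the function names are our own.

```python
# Hypothetical (hours studied, test score) pairs, invented for illustration.
# They are NOT the data plotted on this page.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [35.0, 50.0, 70.0, 80.0, 95.0]


def initial_model(x):
    """Initial linear model from the text: y = (18.3) x + 12.5."""
    return 18.3 * x + 12.5


def residuals(xs, ys, predict):
    """Return data minus model for each point (positive if the point lies above the line)."""
    return [y - predict(x) for x, y in zip(xs, ys)]


def sum_of_squares(res):
    """Sum of squared residuals, the quantity a least-squares fit minimizes."""
    return sum(r * r for r in res)


res = residuals(hours, scores, initial_model)
print("residuals:", [round(r, 2) for r in res])
print("sum of squared residuals:", round(sum_of_squares(res), 2))
```

Some residuals come out positive and some negative, which is why the squares, rather than the residuals themselves, are added up.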
We won't go into the mathematical detail for determining the best fit
by the least-squares fit method, but check out this webpage if you're
interested: Least-Squares Fitting from Wolfram MathWorld (http://mathworld.wolfram.com/LeastSquaresFitting.html).
Using their equations, we find that the best-fit straight line for
the data above is given by:
y = (15.1) x + 23.5
The data with this best-fit line and residuals are shown below.
The data is the same as above, but the line is now the best-fit model
as found using the least-squares fit method.
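For readers who want to see how a best-fit straight line can be computed, the following Python sketch uses the standard closed-form least-squares formulas for a line (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x). The data set is hypothetical, so the result will not reproduce the 15.1 and 23.5 quoted above; the function names are our own.

```python
# Hypothetical (hours studied, test score) pairs, invented for illustration.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [35.0, 50.0, 70.0, 80.0, 95.0]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy: how x and y vary together; Sxx: how much x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of squared differences between the data and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


slope, intercept = least_squares_line(hours, scores)
print("best-fit slope:", slope, "intercept:", intercept)

# Compare against the initial guess y = (18.3) x + 12.5 on the same made-up data.
print("best fit:", sum_squared_residuals(hours, scores, slope, intercept))
print("initial :", sum_squared_residuals(hours, scores, 18.3, 12.5))
```

On any data set, the line returned by this calculation should give a sum of squared residuals no larger than that of any other straight line, including the initial guess.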
You can interpret the model to mean that if you don't study at all,
you will probably get a grade of about 24% on the next exam (the
intercept of the line, 23.5). To expect a grade of 90%, you would need
to study about 4.4 hours, since (90 − 23.5) / 15.1 ≈ 4.4.
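Readings like these can be checked by plugging numbers into the best-fit equation y = (15.1) x + 23.5 directly. The short Python sketch below does that; the function names are our own, chosen for illustration.

```python
# Coefficients of the best-fit line quoted in the text: y = (15.1) x + 23.5
SLOPE = 15.1
INTERCEPT = 23.5


def predicted_score(hours):
    """Score (in percentage points) the model predicts for a given study time."""
    return SLOPE * hours + INTERCEPT


def hours_for_score(target):
    """Study time at which the model predicts the target score (the line solved for x)."""
    return (target - INTERCEPT) / SLOPE


print("predicted score with no studying:", predicted_score(0.0))
print("hours needed for a predicted 90%:", round(hours_for_score(90.0), 2))
```

Keep in mind that the model gives the expected trend, not a guarantee for any individual student.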
Of course, the model would be better with more data points. For
example, a straight line might not be the best model – if most
students understand the material after studying for 4.5 hours,
additional study time may not increase their grades much. A function
that levels off in this way would not be linear.
However, no matter the final shape of the best-fit model, the above
example illustrates the basics of how the process works.
Chi-Squared Fit
The software that you will be using to model the low-mass X-ray
binary uses a test called a Chi-squared test to determine the
goodness-of-fit of the data and model. This test is more involved than
the least-squares fit, but works on the same principle of minimizing a
parameter characterizing how far the data lie from the model.
The Chi-squared test determines the goodness-of-fit by computing
the weighted sum of the squared differences between the measured and
calculated values. The test is characterized by the statistic
Χ², which can be written
Χ² = ∑{(1/σ_{i}²) [y_{i} − y(x_{i})]²}
where:
 σ_{i}² is the variance, that is, the square of the
estimated error, of each point
 y_{i} is the measured value of y at a given point
 y(x_{i}) is the fitted value of y at that given point
If the fitted values y(x_{i}) are good approximations of
the measured values y_{i}, then the value of Χ² is
low and a good fit can be claimed. If, however, the value of
Χ² is high, the fit is not good.
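As a rough illustration of the statistic, the Python sketch below evaluates the weighted sum for two candidate lines. The measured values, their errors, and the two lines are all hypothetical, and the function is our own; it is not how XSPEC itself is implemented.

```python
def chi_squared(measured, fitted, sigmas):
    """Weighted sum of squared differences: sum of ((y_i - y(x_i)) / sigma_i)^2."""
    if not (len(measured) == len(fitted) == len(sigmas)):
        raise ValueError("measured, fitted, and sigmas must have the same length")
    return sum(((y - f) / s) ** 2 for y, f, s in zip(measured, fitted, sigmas))


# Hypothetical measurements with an estimated error (sigma) on each point.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
measured = [35.0, 50.0, 70.0, 80.0, 95.0]
sigmas = [2.0, 2.0, 4.0, 2.0, 2.0]

# Two candidate straight-line models evaluated at the same x values.
model_a = [15.0 * x + 21.0 for x in x_values]
model_b = [18.3 * x + 12.5 for x in x_values]

chi2_a = chi_squared(measured, model_a, sigmas)
chi2_b = chi_squared(measured, model_b, sigmas)
print("chi-squared, model A:", round(chi2_a, 3))
print("chi-squared, model B:", round(chi2_b, 3))
```

Because each difference is divided by its own error, a point with a large error counts for less in the sum than a point that was measured precisely. The model with the smaller value is the better fit to these made-up points.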
When you try different models for the low-mass X-ray binary data, you
will, therefore, try to minimize Χ² to find the best fit
possible.
