Department of Chemical Engineering
MATLAB Tutorial
 

Least Squares Linear Regression                                             (last updated  9/9/99)

The information in this tutorial is located in the MATLAB manual. Any page or section numbers refer to the following:

The Student Edition of MATLAB, Version 5, The MATH WORKS Inc. Prentice-Hall, 1997.

======================================================================================

This tutorial contains the following sections:

Objective

Example Problem - Least Squares Linear Regression

Solution Using Linear Algebra Methods in the Regular Version of MATLAB

Solution Using the regress Command in the Statistics Toolbox

=============================================================================================================

Objective

Determine the parameters 'a1', 'a2' and 'a3' in the equation y = a1 + a2*x1 + a3*x2, given a set of n data points (x1, x2, y).

                a1 + a2*x_11 + a3*x_21 = y_1
                a1 + a2*x_12 + a3*x_22 = y_2
                a1 + a2*x_13 + a3*x_23 = y_3
                a1 + a2*x_14 + a3*x_24 = y_4
                etc.

where the first subscript on x identifies the independent variable and the second subscript signifies the data point.

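In matrix notation this is expressed as

                X*a = y

where row i of the n-by-3 matrix X holds 1, the ith value of x1 and the ith value of x2, a = [a1; a2; a3] is the column vector of parameters, and y is the column vector of the n measured values. This holds true for either technique described below.

There are two ways in MATLAB to do a linear least squares regression: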

(a) Using the regress command in the Statistics Toolbox,

(b) Using linear algebra methods in the basic MATLAB package. When you have a linear algebra problem with more equations than unknowns, the backslash operator (x = A\b) returns the least squares solution, which is what a linear regression uses.
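
As a small illustration of the point in (b), the sketch below fits a straight line to four points, which gives four equations in two unknowns. The data values and the variable names t, b, A, p and pne are made up for this illustration and are not part of the example problem. The backslash result is compared with the solution of the normal equations, which should agree to within round-off.

```matlab
% Hypothetical data: four points, two unknowns (intercept and slope).
t = [0; 1; 2; 3];
b = [1.1; 2.9; 5.2; 6.8];

% Four equations, two unknowns: the system A*p = b is overdetermined.
A = [ones(4,1) t];

% Backslash returns the least squares solution of the overdetermined system.
p = A\b;

% The same answer from the normal equations (A'*A)*p = A'*b.
pne = (A'*A)\(A'*b);

fprintf('backslash:        %8.4f %8.4f\n', p(1), p(2))
fprintf('normal equations: %8.4f %8.4f\n', pne(1), pne(2))
```

Backslash is generally preferred over forming the normal equations explicitly, since it avoids computing A'*A.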

=============================================================================================================

Example Problem - Least Squares Linear Regression

Develop a linear correlation to predict the final weight of an animal based on the initial weight and the amount of feed eaten.

(final weight) = a1 + a2*(initial weight) + a3*(feed eaten)

The following data are given:
 

final weight    initial weight    feed eaten
     95               42              272
     77               33              226
     80               33              259
    100               45              292
     97               39              311
     70               36              183
     50               32              173
     80               41              236
     92               40              230
     84               38              235

=============================================================================================================

Solution Using Linear Algebra Methods in the Regular Version of MATLAB.

Using the linear algebra commands of the basic version of MATLAB has the advantage that you can perform linear regression using the Student Edition of MATLAB. The disadvantage is that statistical information is not produced automatically (i.e., no confidence intervals or correlation coefficients, and residuals must be calculated separately).

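This uses linear algebra notation to solve the equation Ax = b with the statement x = A\b. In the m-file below, the matrix x plays the role of A, the coefficient vector a plays the role of x, and the vector fw plays the role of b.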
_______________________________________________________________________________________

Here is the m-file used to solve the example problem using linear algebra methods
_______________________________________________________________________________________

%   Darin Ridgway
%   ChE XXX
%   July 11, 1998

%   Load the independent variables and the dependent variable into vectors.
initwgt = [ 42 33 33 45 39 36 32 41 40 38];
feed = [ 272 226 259 292 311 183 173 236 230 235];
fw = [95; 77; 80; 100; 97; 70; 50; 80; 92; 84] ;

%   Then create the x matrix from these vectors using a for loop.
%   You could create the x matrix directly, but you will possibly want the
%   vectors of the independent variables later for plotting.
%   The dependent variable is already a 10x1 column vector (fw).
x = zeros(10,3);            % preallocate rather than grow x inside the loop
for n = 1:10
    x(n,1) = 1;             % column of ones multiplies a1
    x(n,2) = initwgt(n);
    x(n,3) = feed(n);
end

%   Use the matrix left division operator (backslash).  Note: right division, fw/x, will not work here.
a = x\fw;

%   Calculate the values of the final weight predicted by the equation.
%   Then calculate the difference between the experimental and predicted values
fwpred = x*a;
res = fw - fwpred;

%   Create output
disp('    Darin Ridgway')
disp('    ChE XXX')
disp('    July 11, 1998')
disp('    Example Problem X')
fprintf('\n\n\n   The parameters a1, a2, and a3 respectively are \n')
fprintf('   a1  =  %5.2e \n', a(1))
fprintf('   a2  =  %5.2e \n', a(2))
fprintf('   a3  =  %5.2e \n\n', a(3))

fprintf('  Experimental wgt    Predicted wgt   Difference \n')

for n = 1:10
    fprintf('     %5.2e          %5.2e      %5.2e \n', fw(n), fwpred(n), res(n))
end
_________________________________________________________________________________________________

Here is the response in the Command Window
_________________________________________________________________________________________________

    Darin Ridgway
    ChE XXX
    July 11, 1998
    Example Problem X

   The parameters a1, a2, and a3 respectively are
   a1  =  -2.30e+001
   a2  =  1.40e+000
   a3  =  2.18e-001

  Experimental wgt    Predicted wgt   Difference
     9.50e+001          9.48e+001      1.84e-001
     7.70e+001          7.22e+001      4.76e+000
     8.00e+001          7.94e+001      5.74e-001
     1.00e+002          1.03e+002      -3.36e+000
     9.70e+001          9.91e+001      -2.12e+000
     7.00e+001          6.71e+001      2.93e+000
     5.00e+001          5.93e+001      -9.32e+000
     8.00e+001          8.56e+001      -5.59e+000
     9.20e+001          8.29e+001      9.12e+000
     8.40e+001          8.12e+001      2.82e+000
__________________________________________________________________________
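
The for loop in the m-file above builds the x matrix one row at a time. As the comments there note, the matrix can also be created directly. A minimal sketch of that alternative, using the same data and variable names as the m-file, is:

```matlab
% Data from the example problem (row vectors, as in the m-file).
initwgt = [42 33 33 45 39 36 32 41 40 38];
feed    = [272 226 259 292 311 183 173 236 230 235];
fw      = [95; 77; 80; 100; 97; 70; 50; 80; 92; 84];

% Build the 10x3 x matrix in one statement: a column of ones for a1,
% then the two independent variables transposed into columns.
x = [ones(10,1) initwgt' feed'];

% Same least squares solve as in the m-file.
a = x\fw;
fprintf('a1 = %8.4f   a2 = %8.4f   a3 = %8.4f\n', a(1), a(2), a(3))
```

The vectors initwgt and feed are still available afterwards for plotting.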

After calculating the equation parameters you may wish to plot the function versus the data points, especially if there is a single
independent variable. Use plot to plot the data as discrete points and fplot to plot the function.
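
The following sketch shows one way such a plot might be made for a single independent variable. It assumes, purely for illustration, that final weight is fitted against feed eaten alone; the names b and fit are introduced here and are not part of the example problem. It also assumes a MATLAB version that supports anonymous functions, which the Version 5 Student Edition does not (there fplot takes the function as a string).

```matlab
% Single-variable illustration using the feed and final weight data
% from the example problem (fitting on feed alone is an assumption).
feed = [272 226 259 292 311 183 173 236 230 235]';
fw   = [95 77 80 100 97 70 50 80 92 84]';

% Fit fw = b(1) + b(2)*feed by least squares.
b = [ones(10,1) feed]\fw;

% Fitted line as an anonymous function of the feed eaten.
fit = @(f) b(1) + b(2)*f;

plot(feed, fw, 'o')                   % data as discrete points
hold on
fplot(fit, [min(feed) max(feed)])     % fitted function over the data range
hold off
xlabel('feed eaten')
ylabel('final weight')
```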

==============================================================================================================

Solution Using the regress Command in the Statistics Toolbox

Note: The Statistics Toolbox is not in the Student Edition of MATLAB. You must use a machine with the professional version.

The regress command allows you to obtain a great deal of statistical information. Each piece of information is requested as an output in the call to the regress routine. The full version of the regress command is:

[a,aint,res,rint,stats] = regress(fw,x,alpha)

where

a        the regression coefficients
aint     confidence intervals for the coefficients
res      the residuals
rint     confidence intervals for the residuals
stats    the R-squared statistic, the F statistic and its p-value

Any subset of the list can be used, starting at the left, all the way down to just [a]. If you want a certain piece of information, for example 'stats', you have to include all the parameters up to the desired parameter. For instance you may not be interested in anything more than the coefficients. You would type [a] = regress(fw,x,alpha) and only the value of the vector 'a' would be given.
_______________________________________________________________________

Here is the m-file used to solve the example problem using regress
_______________________________________________________________________

%   Darin Ridgway
%   ChE XXX
%   July 11, 1998

%   Load the independent variables and the dependent variable into vectors.
initwgt = [ 42 33 33 45 39 36 32 41 40 38];
feed = [ 272 226 259 292 311 183 173 236 230 235];
fw = [95; 77; 80; 100; 97; 70; 50; 80; 92; 84] ;

%   Then create the x matrix from these vectors using a for loop.
%   You could create the x matrix directly, but you will possibly want the
%   vectors of the independent variables later for plotting.
%   The dependent variable is already a 10x1 column vector (fw).
x = zeros(10,3);            % preallocate rather than grow x inside the loop
for n = 1:10
    x(n,1) = 1;             % column of ones multiplies a1
    x(n,2) = initwgt(n);
    x(n,3) = feed(n);
end

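%   Set the value of alpha and call the regress command.
%   Note: regress returns 100*(1-alpha)% confidence intervals, so
%   alpha = 0.80 gives 20% intervals. Use alpha = 0.20 for 80% intervals.
alpha = 0.80;
[a,aint,res,rint,stats] = regress(fw,x,alpha)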
________________________________________________________________________

Here is the output when you execute the m-file.  It is all shown here to demonstrate the information regress provides.
In a submission you should use formatted output.
_________________________________________________________________________

a =
    -22.9932
    1.3957
    0.2176

aint =
    -27.6677 -18.3187
    1.2424 1.5490
    0.2024 0.2328

res =
    0.1841
    4.7553
    0.5741
    -3.3552
    -2.1158
    2.9257
    -9.3155
    -5.5862
    9.1152
    2.8184

rint =
    -1.3520 1.7202
    3.3668 6.1439
    -0.7075 1.8557
    -4.6340 -2.0764
    -3.3535 -0.8782
    1.5472 4.3041
    -10.1864 -8.4446
    -6.9905 -4.1818
    7.9047 10.3256
    1.2196 4.4172
 
stats =
    0.8732 24.0934 0.0007
____________________________________________________________________

To calculate the predicted values, use either fwpred = x*a or fwpred = fw - res.
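
As a cross-check that needs only the basic MATLAB package, the sketch below calculates the predicted values and then the R-squared and F statistics from the residuals. The names sse, sst, r2, F, npts and npar are introduced here for illustration. The results should agree with the first two entries of stats shown above; the p-value is not computed because it requires the F distribution.

```matlab
% Data from the example problem, all as column vectors.
initwgt = [42 33 33 45 39 36 32 41 40 38]';
feed    = [272 226 259 292 311 183 173 236 230 235]';
fw      = [95 77 80 100 97 70 50 80 92 84]';

% Least squares coefficients, predicted values and residuals.
x = [ones(10,1) initwgt feed];
a = x\fw;
fwpred = x*a;
res = fw - fwpred;

npts = length(fw);        % number of data points
npar = size(x,2);         % number of fitted parameters

sse = sum(res.^2);                  % sum of squared residuals
sst = sum((fw - mean(fw)).^2);      % total sum of squares about the mean

r2 = 1 - sse/sst;                                   % R-squared statistic
F  = ((sst - sse)/(npar - 1)) / (sse/(npts - npar)); % F statistic

fprintf('R^2 = %6.4f   F = %7.4f\n', r2, F)
```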

After calculating the equation parameters you may wish to plot the function versus the data points, especially if there is a single
independent variable. Use plot to plot the data as discrete points and fplot to plot the function.