% Multiple Linear Regression - mlinreg2.m
% This program performs multiple linear regression on a dataset.
% The input should be an ASCII text file named regmat.dat that contains the matrix of data.
% The file should be in Matlab's current folder (working directory).
% Each column should contain the values of a different variable, and each row should contain
% the values for a different data point. The values of the dependent variable should be in
% the last column. Values within a row should be separated only by spaces. The file should
% contain numerical data only, with no titles or units. Titles, units, and other metadata
% should be kept in a separate "readme" file.
% The m-file automatically determines the number of data points (N) and the number of
% independent variables (k) from the dimensions of the input matrix:
% N = number of rows, k = number of columns - 1.
% The Matlab load command automatically puts the data in regmat.dat into a matrix called
% regmat. The m-file then assigns the values in the first k columns to the independent
% variables and the values in the (k+1)th column to the dependent variable. The index for
% the independent variables is m; the index for the data points is n.
% The m-file automatically generates an equation that is linear in its coefficients and
% includes squared and mixed (cross-product) terms, i.e. a full second-order polynomial in
% the independent variables. For example, for 3 independent variables the equation is
%   y(n) = a(0) + a(1)*x(n,1) + a(2)*x(n,2) + a(3)*x(n,3)
%        + a(4)*x(n,1)*x(n,1) + a(5)*x(n,1)*x(n,2) + a(6)*x(n,1)*x(n,3)
%        + a(7)*x(n,2)*x(n,2) + a(8)*x(n,2)*x(n,3) + a(9)*x(n,3)*x(n,3)
% In general the equation has 1 + k + k*(k+1)/2 coefficients. The equation numbers them
% from a(0), but Matlab vectors are indexed from 1: in the vector a computed at the end of
% this file, a(1) holds the constant term a(0), a(2) holds a(1), and so on.
% The m-file then solves these equations using the method outlined by Dr. Ridgway in his
% tutorials.
% Clear workspace and set the display format
clear; format compact; format short e;

% Initialize indices
m = 0;   % index for variables
n = 1;   % index for data points

% Read in regmat.dat and assign variables
load regmat.dat;
[N, K] = size(regmat);
k = K-1;
x = zeros(N,k);   % preallocate the independent variables
y = zeros(1,N);   % preallocate the dependent variable (row vector for now)
for n = 1:N
    for m = 1:k
        x(n,m) = regmat(n,m);
    end
    y(n) = regmat(n,K);
end

% Now define new variables, which allow mixtures of the original independent variables
xnew = zeros(N, k + k*(k+1)/2);   % preallocate: k linear columns + k*(k+1)/2 products
m = 0;
for q = 1:k
    m = m+1;
    for n = 1:N
        xnew(n,m) = x(n,q);
    end
end
for q = 1:k
    for r = q:k
        m = m+1;
        for n = 1:N
            xnew(n,m) = x(n,q)*x(n,r);
        end
    end
end
knew = m;

% Now we have a matrix xnew. Each row of the matrix corresponds to a different data point.
% Each column of the matrix corresponds to a different independent variable. These
% independent variables are derived from the original independent variables. Referring to
% the example supplied previously, where there are three original independent variables x:
%   xnew(n,1) = x(n,1)          xnew(n,2) = x(n,2)          xnew(n,3) = x(n,3)
%   xnew(n,4) = x(n,1)*x(n,1)   xnew(n,5) = x(n,1)*x(n,2)   xnew(n,6) = x(n,1)*x(n,3)
%   xnew(n,7) = x(n,2)*x(n,2)   xnew(n,8) = x(n,2)*x(n,3)   xnew(n,9) = x(n,3)*x(n,3)
% knew is the number of new independent variables.

% Next we build a coefficient matrix C so we can solve the system of linear equations C*a = y.
% a is the vector of adjustable parameters; the number of adjustable parameters is knew+1.
% The coefficient matrix C has N rows (one for each data point) and knew+1 columns; the
% first column is all 1's (for the constant term). When N > knew+1, the system has more
% equations than unknowns, and Matlab's backslash operator returns the least-squares
% solution for a. For k = 3, knew+1 = 10, so at least 10 data points are needed, and more
% than 10 for a genuine least-squares fit.

% Building C
C = zeros(N, knew+1);
for n = 1:N
    C(n,1) = 1;
end
for q = 2:(knew+1)
    for n = 1:N
        C(n,q) = xnew(n,q-1);
    end
end

y = y';   % Transpose y so it is a column vector, not a row vector

% Now solve the system C*a = y for a using matrix math.
a = C\y
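
% Optional check (an addition sketched here, not part of the original script or of
% Dr. Ridgway's tutorials), assuming C, y and a as computed above: a least-squares
% solution satisfies the normal equations C'*C*a = C'*y, so the residual vector
% resid = y - C*a is orthogonal to every column of C. The relative size of C'*resid
% printed below should therefore be small, on the order of round-off error, unless C is
% badly conditioned (for example, if two columns of regmat.dat are identical or the data
% span very different scales). The names resid and normal_eq_err are hypothetical,
% introduced only for this check.
resid = y - C*a;                                     % residual at each data point
normal_eq_err = norm(C'*resid) / (norm(C)*norm(y))   % relative normal-equation error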