% Multiple Linear Regression - mlinreg.m
%
% This program performs multiple linear regression on a dataset.
%
% The input should be an ASCII text file (named regmat.dat) which contains the matrix of data.
% The file should be in the currently active directory for Matlab.
% Each column should contain the values for a different variable. Each row should contain the
% values for a different datapoint. The values of the dependent variable should be in the
% last column. Values within a row should be separated only by spaces. The file should
% contain numerical data only, no title or units. Titles, units, and other metadata should be
% kept in a separate "readme" file.
%
% The m-file automatically counts the number of datapoints (N) and the number of independent
% variables (k) from the dimensions of the input matrix. N = number of rows. k = number of
% columns - 1.
%
% The load command of Matlab automatically puts the data in regmat.dat into a matrix called
% regmat. The m-file will then assign the values in the first k columns to independent variables
% and the values in the (k+1)th column to the dependent variable. The index for the independent
% variables is m. The index for the data points is n.
%
% The m-file automatically generates a linear equation which includes mixed variables. For
% example, for 3 independent variables, this equation would be . . .
%
%   y = a(0) + a(1)*x(n,1) + a(2)*x(n,2) + a(3)*x(n,3)
%       + a(4)*x(n,1)*x(n,1) + a(5)*x(n,1)*x(n,2) + a(6)*x(n,1)*x(n,3)
%       + a(7)*x(n,2)*x(n,2) + a(8)*x(n,2)*x(n,3) + a(9)*x(n,3)*x(n,3)
%
% It then determines the values of a that minimize the sum, over all data points n, of
% (y(n) - yfit(n))^2, where yfit(n) is the value given by the equation above for data point n.
% Clear workspace and set up screens
clear;
format compact;
format short e;

% Initialize indices
m = 0;   % index for variables
n = 1;   % index for data points

% Read in regmat.dat and assign variables
load regmat.dat;
[N, K] = size(regmat);
k = K-1;
x = zeros(N,k);   % preallocate the independent variables
y = zeros(1,N);   % preallocate the dependent variable (row vector)
for n = 1:N
    for m = 1:k
        x(n,m) = regmat(n,m);
    end
    y(n) = regmat(n,K);
end

% Now define new variables, which allow mixtures of the original independent variables
xnew = zeros(N, k + k*(k+1)/2);   % k linear terms plus k*(k+1)/2 product terms
m = 0;
for q = 1:k
    m = m+1;
    for n = 1:N
        xnew(n,m) = x(n,q);
    end
end
for q = 1:k
    for r = q:k
        m = m+1;
        for n = 1:N
            xnew(n,m) = x(n,q)*x(n,r);
        end
    end
end
knew = m;

% Now we have a matrix xnew. Each row of the matrix corresponds to a different data point.
% Each column of the matrix corresponds to a different independent variable. These independent
% variables are derived from the original independent variables. Referring to the example
% supplied previously, where there are three original independent variables x . . .
%
%   xnew(n,1) = x(n,1)          xnew(n,2) = x(n,2)          xnew(n,3) = x(n,3)
%   xnew(n,4) = x(n,1)*x(n,1)   xnew(n,5) = x(n,1)*x(n,2)   xnew(n,6) = x(n,1)*x(n,3)
%   xnew(n,7) = x(n,2)*x(n,2)   xnew(n,8) = x(n,2)*x(n,3)   xnew(n,9) = x(n,3)*x(n,3)
%
% knew is the number of new independent variables.

% Next we must build the coefficient matrix C and the right-hand-side vector g so we can solve the
% system of linear equations Ca=g. a is the vector of adjustable parameters. The number of
% adjustable parameters is knew+1.
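
% --- Optional cross-check of xnew (a sketch, not part of the original program) ---
% Before building C, it may be worth confirming that xnew has the layout described
% above. The block below rebuilds the same columns with whole-column operations and
% compares the result with xnew. The names xnew_check, knew_expected and max_diff
% are hypothetical and are introduced only for this comparison.
xnew_check = x;                                      % first k columns: the original variables
for q = 1:k
    for r = q:k
        xnew_check = [xnew_check, x(:,q).*x(:,r)];   % append the product column x_q * x_r
    end
end
knew_expected = k + k*(k+1)/2                        % should equal knew
max_diff = max(max(abs(xnew - xnew_check)))          % should be zero if the two constructions agree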
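
% Building C
% Note: the temporary variable is called total rather than sum, so that it does not
% hide Matlab's built-in sum function.
C = zeros(knew+1);
C(1,1) = N;
for q = 2:(knew+1)
    total = 0;   % A temporary variable for summing over all datapoints
    for n = 1:N
        total = total + xnew(n,q-1);
    end
    C(1,q) = total;
    C(q,1) = total;
    for s = 2:(knew+1)
        total = 0;
        for n = 1:N
            total = total + (xnew(n,(q-1))*xnew(n,(s-1)));
        end
        C(q,s) = total;
    end
end
C

% Building g
g = zeros(1,knew+1);
total = 0;   % A temporary variable for summing over all datapoints
for n = 1:N
    total = total + y(n);
end
g(1) = total;
for q = 2:(knew+1)
    total = 0;
    for n = 1:N
        total = total + (y(n)*xnew(n,q-1));
    end
    g(q) = total;
end
g = g'   % Transpose g so it is a column vector, not a row vector

% Now solve the system Ca=g for a using matrix math.
% Matlab indices start at 1, so the first element of a is the constant term a(0);
% the remaining elements follow the order of the columns of xnew.
a = C\g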
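
% --- Optional cross-check of the solution (a sketch, not part of the original program) ---
% Ca = g are the normal equations of the least-squares problem: with the design
% matrix A = [1, xnew], C equals A'*A and g equals A'*y. The same coefficients should
% therefore follow from solving the overdetermined system A*a = y directly with the
% backslash operator. This assumes N >= knew+1 and that the columns of A are linearly
% independent. The names A, a_check, yfit, resid, ssr and max_coef_diff are
% hypothetical and are introduced only for this check.
A = [ones(N,1), xnew];                   % the column of ones carries the constant term a(0)
a_check = A \ y(:);                      % least-squares solution when A has more rows than columns
yfit = A*a;                              % fitted values from the coefficients found above
resid = y(:) - yfit;                     % residuals at each data point
ssr = resid'*resid                       % sum of squared residuals, the quantity being minimized
max_coef_diff = max(abs(a - a_check))    % expected to be near round-off for well-conditioned data
% A large max_coef_diff would suggest that C is ill-conditioned, which can happen
% when the product columns of xnew are nearly collinear.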