Software tutorial/Least squares modelling (linear regression)

One of the best packages for fitting least squares models, in addition to all sorts of other statistical manipulation of data, is the R language; the pages on the 4C3 (Statistics for Engineering) website will help you with that approach.

However, here is a tutorial on how you can use MATLAB or Python to fit a least squares model.

* MATLAB: use the <tt>regress.m</tt> and <tt>regstats.m</tt> functions
* Python: use the [http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.lstsq.html <tt>numpy.linalg.lstsq</tt>] function

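Both routines solve the same underlying problem: given a matrix <math>X</math> with a column of ones (for the intercept) and a column of <math>x</math> values, and a vector of observations <math>y</math>, they return the coefficient vector <math>\mathbf{a}</math> that minimizes the sum of squared residuals. Equivalently, <math>\mathbf{a}</math> solves the normal equations:

:<math>X^T X \mathbf{a} = X^T y \quad\Longrightarrow\quad \mathbf{a} = (X^T X)^{-1} X^T y</math>

In practice both implementations use a numerically stabler factorization (QR or SVD) rather than forming this inverse explicitly.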

{| class="wikitable"
|-
! MATLAB
! Python
|-
| width="50%" valign="top" |
<syntaxhighlight lang="matlab">
x = [1, 2, 3, 4, 5];
y = [2, 3, 4, 4, 5];
y = y(:);
n = numel(x);

X = [ones(n,1) x(:)];
a = regress(y, X)      % Contains a_0=a(1) and a_1=a(2)

plot(x, y, 'o')
hold('on')
grid('on')
plot(x, X*a, 'r')
xlabel('x values')
ylabel('y values')
title('Linear regression of y on x')
xlim([0.9, 5.1])
ylim([1.9, 5.1])
legend({'Original data', 'Fitted line'}, 'Location', 'Best')

% Additional calculations
resids = y - X*a;            % resids = e = y - Xa
RSS = resids' * resids;      % residual sum of squares
TSS = sum((y - mean(y)).^2); % total sum of squares
R2 = 1 - RSS/TSS;

std_error = sqrt(RSS/(n-numel(a)));
std_y = sqrt(TSS/(n-1));     % just the same as std(y)
R2_adj = 1 - (std_error/std_y)^2
</syntaxhighlight>
| width="50%" valign="top" |
<syntaxhighlight lang="python">
import numpy as np
import matplotlib.pyplot as plt

x = np.array([0, 1, 2, 3])
y = np.array([-1, 0.2, 0.9, 2.1])

# Rewrite the line equation y = mx + c as y = Ap, where
# A = [[x 1]] and p = [[m], [c]]; then solve for p with lstsq
A = np.vstack([x, np.ones(len(x))]).T
m, c = np.linalg.lstsq(A, y)[0]
print(m, c)   # roughly 1.0 and -0.95: the fitted line has a
              # gradient of about 1 and cuts the y-axis near -1

# Plot the data along with the fitted line
plt.plot(x, y, 'o', label='Original data', markersize=10)
plt.plot(x, m*x + c, 'r', label='Fitted line')
plt.legend()
plt.show()
</syntaxhighlight>
|}
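Note that the two columns use different example data sets (the Python column follows the example in the NumPy documentation), so their numeric results differ. The Python column also stops at the plot; the goodness-of-fit quantities from the MATLAB column can be computed the same way with NumPy. Below is a minimal sketch that mirrors the MATLAB "Additional calculations" block for the Python example above; the variable names are chosen to match the MATLAB code and are not part of the <tt>numpy.linalg.lstsq</tt> output itself.

<syntaxhighlight lang="python">
import numpy as np

x = np.array([0, 1, 2, 3])
y = np.array([-1, 0.2, 0.9, 2.1])
A = np.vstack([x, np.ones(len(x))]).T
p = np.linalg.lstsq(A, y)[0]
n, k = A.shape                   # n observations, k coefficients

resids = y - A.dot(p)            # e = y - Ap
RSS = np.sum(resids**2)          # residual sum of squares
TSS = np.sum((y - y.mean())**2)  # total sum of squares
R2 = 1 - RSS/TSS

std_error = np.sqrt(RSS/(n - k))   # standard error of the fit
std_y = np.sqrt(TSS/(n - 1))       # just the same as np.std(y, ddof=1)
R2_adj = 1 - (std_error/std_y)**2
</syntaxhighlight>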
