7. Multivariate regression tutorial: aragonite saturation state#

Using data from the West Coast Ocean Acidification (WCOA) cruise, create two different multiple linear regression models to calculate aragonite saturation state (\(\Omega_A\)) between 30 and 300 dbar as a function of more commonly observed variables. This is similar to “Model 1” described in the previous section, following Juranek et al. (2009). Note that there are a couple of issues with this model that led Juranek et al. (2009) to select a different model: bias and multicollinearity.

Juranek, L. W., R. A. Feely, W. T. Peterson, S. R. Alin, B. Hales, K. Lee, C. L. Sabine, and J. Peterson, 2009: A novel method for determination of aragonite saturation state on the continental shelf of central Oregon using multi-parameter relationships with hydrographic data. Geophys. Res. Lett., 36, doi:10.1029/2009GL040778.

First import the necessary packages and load the data.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from scipy import stats
import statsmodels.api as sm
import statsmodels.formula.api as smf
import pingouin as pg

import PyCO2SYS as pyco2
filename07 = 'data/wcoa_cruise_2007/32WC20070511.exc.csv'
df07 = pd.read_csv(filename07,header=29,na_values=-999,
                  parse_dates=[[6,7]])

We use the PyCO2SYS package to calculate the carbonate system parameters. This gives us values for aragonite saturation state that we can put in our dataframe.

c07 = pyco2.sys(df07['ALKALI'], df07['TCARBN'], 1, 2,
               salinity=df07['CTDSAL'], temperature=df07['CTDTMP'], 
                pressure=df07['CTDPRS']);
df07['OmegaA'] = c07['saturation_aragonite'];
/Users/tconnolly/opt/miniconda3/envs/ms263-24/lib/python3.11/site-packages/autograd/tracer.py:48: RuntimeWarning: invalid value encountered in sqrt
  return f_raw(*args, **kwargs)
/Users/tconnolly/opt/miniconda3/envs/ms263-24/lib/python3.11/site-packages/PyCO2SYS/equilibria/p1atm.py:99: RuntimeWarning: invalid value encountered in sqrt
  lnKF = 1590.2 / TempK - 12.641 + 1.525 * IonS**0.5
/Users/tconnolly/opt/miniconda3/envs/ms263-24/lib/python3.11/site-packages/PyCO2SYS/equilibria/p1atm.py:577: RuntimeWarning: overflow encountered in power
  K1 = 10.0 ** -(pK1)
/Users/tconnolly/opt/miniconda3/envs/ms263-24/lib/python3.11/site-packages/PyCO2SYS/equilibria/p1atm.py:583: RuntimeWarning: overflow encountered in power
  K2 = 10.0 ** -(pK2)
/Users/tconnolly/opt/miniconda3/envs/ms263-24/lib/python3.11/site-packages/PyCO2SYS/equilibria/p1atm.py:603: RuntimeWarning: overflow encountered in power
  K1 = 10.0**-pK1
/Users/tconnolly/opt/miniconda3/envs/ms263-24/lib/python3.11/site-packages/PyCO2SYS/equilibria/p1atm.py:611: RuntimeWarning: overflow encountered in power
  K2 = 10.0**-pK2
/Users/tconnolly/opt/miniconda3/envs/ms263-24/lib/python3.11/site-packages/PyCO2SYS/equilibria/p1atm.py:636: RuntimeWarning: overflow encountered in power
  K1 = 10.0**-pK1
/Users/tconnolly/opt/miniconda3/envs/ms263-24/lib/python3.11/site-packages/PyCO2SYS/equilibria/p1atm.py:641: RuntimeWarning: overflow encountered in power
  K2 = 10.0**-pK2
/Users/tconnolly/opt/miniconda3/envs/ms263-24/lib/python3.11/site-packages/PyCO2SYS/equilibria/p1atm.py:653: RuntimeWarning: overflow encountered in power
  K1 = 10.0**-pK1
/Users/tconnolly/opt/miniconda3/envs/ms263-24/lib/python3.11/site-packages/PyCO2SYS/equilibria/p1atm.py:658: RuntimeWarning: overflow encountered in power
  K2 = 10.0**-pK2
/Users/tconnolly/opt/miniconda3/envs/ms263-24/lib/python3.11/site-packages/PyCO2SYS/equilibria/p1atm.py:715: RuntimeWarning: overflow encountered in power
  K2 = 10.0**-pK2
/Users/tconnolly/opt/miniconda3/envs/ms263-24/lib/python3.11/site-packages/PyCO2SYS/solubility.py:41: RuntimeWarning: overflow encountered in power
  KAr = 10.0**logKAr  # this is in (mol/kg-SW)^2
/Users/tconnolly/opt/miniconda3/envs/ms263-24/lib/python3.11/site-packages/PyCO2SYS/solubility.py:25: RuntimeWarning: overflow encountered in power
  KCa = 10.0**logKCa  # this is in (mol/kg-SW)^2 at zero pressure

7.1. Model 1: Temperature, salinity, pressure, dissolved oxygen and nitrate:#

  • Temperature

  • Salinity

  • Pressure

  • Oxygen

  • Nitrate

\[\hat{\Omega}_A = c_{0} + c_{1} \times T + c_{2} \times S + c_{3} \times p + c_{4} \times O_2 + c_{5} \times N\]

First, create a subset of all good data between 30-300 dbar.

ii = ((df07['CTDPRS'] >= 30) & (df07['CTDPRS'] <= 300) &
      (df07['NITRAT_FLAG_W'] ==2) & (df07['PHSPHT_FLAG_W'] ==2) &
      (df07['CTDOXY_FLAG_W'] == 2) & (df07['CTDSAL_FLAG_W'] ==2) &
      (df07['TCARBN_FLAG_W'] == 2) & (df07['ALKALI_FLAG_W'] == 2) & 
      (df07['LATITUDE'] > 41) & (df07['LATITUDE'] < 48))
df07sub = df07[ii]
y = df07sub['OmegaA']

Solve for the coefficients \(\vec{c}\) in the least squares problem:

\[ \hat{y} = \textbf{X}\vec{c} \]

Create a 2-D array X that contains a column of all ones, and additional columns containing the explanatory variables. This 2-D array is called the “design matrix” and should have six columns. What are the explanatory variables in this case?

  • Approach: use np.ones() to create a 2-D array of correct size, then fill in the columns.

k = 5 # number of predictor variables
N = len(df07sub['OmegaA']) # number of observations 
N
159
X = np.ones([N, k+1])
np.shape(X)
(159, 6)
X[:,1] = df07sub['CTDTMP']
X[:,2] = df07sub['CTDSAL']
X[:,3] = df07sub['CTDPRS']
X[:,4] = df07sub['CTDOXY']
X[:,5] = df07sub['NITRAT']
X
array([[  1.   ,   6.395,  33.91 , 250.   , 114.9  ,  31.63 ],
       [  1.   ,   7.163,  33.8  , 175.2  , 145.2  ,  27.63 ],
       [  1.   ,   7.131,  33.65 , 152.2  , 182.3  ,  24.67 ],
       [  1.   ,   7.727,  33.38 , 131.2  , 186.5  ,  23.01 ],
       [  1.   ,   8.203,  32.7  , 109.3  , 259.9  ,  11.88 ],
       [  1.   ,   8.266,  32.47 ,  89.5  , 286.7  ,   7.1  ],
       [  1.   ,   8.624,  32.42 ,  70.5  , 294.7  ,   5.9  ],
       [  1.   ,   8.754,  32.39 ,  60.6  , 297.5  ,   5.61 ],
       [  1.   ,   9.656,  32.39 ,  49.1  , 300.7  ,   4.56 ],
       [  1.   ,  10.23 ,  32.39 ,  38.7  , 298.4  ,   4.1  ],
       [  1.   ,  10.683,  32.39 ,  30.4  , 295.1  ,   3.95 ],
       [  1.   ,   6.877,  33.99 , 249.8  ,  74.6  ,  33.81 ],
       [  1.   ,   7.55 ,  33.9  , 174.5  , 101.9  ,  30.   ],
       [  1.   ,   8.012,  33.74 , 128.9  , 126.6  ,  27.95 ],
       [  1.   ,   7.944,  32.84 ,  89.3  , 234.4  ,  14.97 ],
       [  1.   ,   8.507,  32.49 ,  71.1  , 280.5  ,   7.55 ],
       [  1.   ,   8.709,  32.44 ,  60.3  , 286.5  ,   6.3  ],
       [  1.   ,   8.962,  32.43 ,  49.1  , 294.2  ,   5.69 ],
       [  1.   ,   9.713,  32.43 ,  39.7  , 297.   ,   4.4  ],
       [  1.   ,  10.6  ,  32.46 ,  30.4  , 292.4  ,   3.79 ],
       [  1.   ,   6.863,  33.91 , 174.6  , 108.7  ,  32.75 ],
       [  1.   ,   7.171,  33.87 , 150.   , 115.6  ,  30.78 ],
       [  1.   ,   7.474,  33.79 , 130.7  , 124.1  ,  29.71 ],
       [  1.   ,   7.647,  33.65 , 110.8  , 138.3  ,  28.23 ],
       [  1.   ,   7.656,  33.45 ,  89.8  , 159.   ,  26.62 ],
       [  1.   ,   7.739,  33.01 ,  70.2  , 196.8  ,  21.97 ],
       [  1.   ,   8.035,  32.35 ,  39.6  , 285.3  ,   8.51 ],
       [  1.   ,   8.686,  32.22 ,  30.2  , 291.   ,   5.82 ],
       [  1.   ,   7.717,  33.65 ,  70.3  , 134.7  ,  27.72 ],
       [  1.   ,   7.777,  33.41 ,  60.   , 162.8  ,  24.15 ],
       [  1.   ,   8.005,  33.08 ,  50.2  , 197.8  ,  19.16 ],
       [  1.   ,   8.037,  32.63 ,  39.9  , 233.5  ,  15.76 ],
       [  1.   ,   8.353,  32.29 ,  30.1  , 246.4  ,  11.94 ],
       [  1.   ,   8.047,  33.74 , 109.2  ,  95.3  ,  30.47 ],
       [  1.   ,   8.366,  33.28 ,  90.2  , 162.6  ,  23.74 ],
       [  1.   ,   8.887,  32.52 ,  60.   , 269.1  ,   9.61 ],
       [  1.   ,   9.022,  32.44 ,  50.6  , 284.7  ,   6.68 ],
       [  1.   ,   9.491,  32.39 ,  40.2  , 289.1  ,   5.68 ],
       [  1.   ,  10.132,  32.33 ,  30.8  , 291.4  ,   3.77 ],
       [  1.   ,   5.973,  34.08 , 299.5  ,  43.4  ,  37.99 ],
       [  1.   ,   6.048,  34.07 , 274.3  ,  45.5  ,  37.83 ],
       [  1.   ,   6.078,  34.07 , 250.1  ,  46.3  ,  37.65 ],
       [  1.   ,   6.198,  34.06 , 224.9  ,  50.5  ,  37.35 ],
       [  1.   ,   7.101,  33.99 , 199.8  ,  73.4  ,  34.1  ],
       [  1.   ,   7.471,  33.94 , 174.3  ,  86.6  ,  32.55 ],
       [  1.   ,   7.849,  33.84 , 150.2  , 104.7  ,  30.34 ],
       [  1.   ,   7.945,  33.77 , 130.1  , 117.6  ,  29.1  ],
       [  1.   ,   8.236,  33.63 , 111.8  , 136.9  ,  27.   ],
       [  1.   ,   8.547,  33.01 ,  90.1  , 208.5  ,  19.34 ],
       [  1.   ,   8.723,  32.48 ,  69.7  , 269.3  ,   9.41 ],
       [  1.   ,   8.932,  32.44 ,  60.9  , 283.1  ,   7.24 ],
       [  1.   ,   9.243,  32.4  ,  50.2  , 283.3  ,   6.42 ],
       [  1.   ,   9.891,  32.39 ,  39.6  , 284.   ,   4.94 ],
       [  1.   ,   6.674,  34.03 , 299.6  ,  63.1  ,  35.95 ],
       [  1.   ,   7.022,  34.   , 249.9  ,  72.7  ,  34.58 ],
       [  1.   ,   8.036,  33.85 , 150.6  , 100.5  ,  30.58 ],
       [  1.   ,   8.278,  33.78 , 128.3  , 108.4  ,  29.45 ],
       [  1.   ,   8.774,  33.64 , 110.9  , 131.6  ,  26.57 ],
       [  1.   ,   9.161,  33.29 ,  89.8  , 176.2  ,  21.3  ],
       [  1.   ,   8.87 ,  32.63 ,  69.5  , 254.3  ,  12.13 ],
       [  1.   ,   9.006,  32.46 ,  60.5  , 278.9  ,   8.15 ],
       [  1.   ,   9.329,  32.43 ,  50.   , 282.6  ,   6.09 ],
       [  1.   ,   9.76 ,  32.4  ,  39.4  , 287.6  ,   4.79 ],
       [  1.   ,   7.161,  33.99 , 249.5  ,  77.2  ,  32.49 ],
       [  1.   ,   7.561,  33.94 , 201.2  ,  90.   ,  30.42 ],
       [  1.   ,   7.8  ,  33.92 , 176.2  ,  94.5  ,  29.61 ],
       [  1.   ,   8.185,  33.83 , 149.3  , 104.6  ,  28.17 ],
       [  1.   ,   8.727,  33.56 , 110.3  , 140.1  ,  24.16 ],
       [  1.   ,   9.056,  33.29 ,  90.8  , 185.2  ,  20.77 ],
       [  1.   ,   9.009,  32.87 ,  70.3  , 225.7  ,  14.77 ],
       [  1.   ,   8.774,  32.54 ,  59.3  , 267.   ,   9.81 ],
       [  1.   ,   8.887,  32.43 ,  50.4  , 279.4  ,   7.82 ],
       [  1.   ,   9.134,  32.41 ,  40.8  , 284.8  ,   5.94 ],
       [  1.   ,   9.379,  32.38 ,  30.1  , 285.6  ,   4.99 ],
       [  1.   ,   6.995,  34.01 , 247.6  ,  75.4  ,  32.99 ],
       [  1.   ,   7.463,  33.97 , 200.5  ,  84.3  ,  30.97 ],
       [  1.   ,   7.703,  33.95 , 175.6  ,  90.7  ,  30.37 ],
       [  1.   ,   8.15 ,  33.88 , 148.8  , 100.2  ,  28.94 ],
       [  1.   ,   8.434,  33.78 , 129.4  , 112.6  ,  27.46 ],
       [  1.   ,   8.692,  33.59 , 110.3  , 136.   ,  25.03 ],
       [  1.   ,   9.046,  33.31 ,  91.1  , 179.   ,  20.57 ],
       [  1.   ,   9.067,  32.69 ,  70.   , 251.7  ,  11.75 ],
       [  1.   ,   9.031,  32.51 ,  59.7  , 269.   ,   8.94 ],
       [  1.   ,   9.213,  32.46 ,  49.8  , 276.   ,   7.12 ],
       [  1.   ,   9.505,  32.42 ,  40.4  , 282.9  ,   6.02 ],
       [  1.   ,   7.125,  34.01 , 250.   ,  77.7  ,  31.73 ],
       [  1.   ,   7.73 ,  33.98 , 200.4  ,  87.2  ,  29.81 ],
       [  1.   ,   8.08 ,  33.93 , 175.   ,  94.1  ,  28.95 ],
       [  1.   ,   8.357,  33.87 , 149.5  , 103.7  ,  28.12 ],
       [  1.   ,   8.675,  33.79 , 130.3  , 111.5  ,  26.76 ],
       [  1.   ,   8.867,  33.61 , 109.1  , 136.6  ,  25.42 ],
       [  1.   ,   9.419,  33.25 ,  88.8  , 181.8  ,  19.62 ],
       [  1.   ,  10.006,  32.62 ,  59.9  , 272.6  ,   6.49 ],
       [  1.   ,  10.654,  32.6  ,  39.1  , 280.8  ,   4.81 ],
       [  1.   ,  11.117,  32.54 ,  30.2  , 283.7  ,   3.34 ],
       [  1.   ,   7.192,  33.99 , 249.3  ,  80.4  ,  32.99 ],
       [  1.   ,   7.589,  33.95 , 200.8  ,  94.5  ,  31.45 ],
       [  1.   ,   8.203,  33.88 , 149.8  , 100.7  ,  29.69 ],
       [  1.   ,   8.426,  33.81 , 131.   , 109.1  ,  28.73 ],
       [  1.   ,   8.884,  33.62 , 109.5  , 132.6  ,  25.73 ],
       [  1.   ,   9.233,  33.31 ,  91.1  , 184.7  ,  21.13 ],
       [  1.   ,   9.787,  32.58 ,  59.7  , 273.7  ,   6.55 ],
       [  1.   ,  10.961,  32.47 ,  38.8  , 285.8  ,   2.39 ],
       [  1.   ,   7.128,  34.02 , 250.4  ,  77.3  ,  33.22 ],
       [  1.   ,   7.509,  33.97 , 200.2  ,  95.5  ,  30.73 ],
       [  1.   ,   8.225,  33.88 , 174.9  , 122.1  ,  27.73 ],
       [  1.   ,   8.434,  33.8  , 151.3  , 132.8  ,  26.27 ],
       [  1.   ,   8.695,  33.6  , 129.9  , 155.5  ,  23.05 ],
       [  1.   ,   8.975,  33.44 , 110.2  , 185.9  ,  20.3  ],
       [  1.   ,   9.301,  33.13 ,  91.7  , 217.2  ,  16.04 ],
       [  1.   ,  10.387,  32.8  ,  70.6  , 257.3  ,   8.1  ],
       [  1.   ,  11.075,  32.74 ,  59.6  , 279.   ,   4.68 ],
       [  1.   ,  11.353,  32.68 ,  50.7  , 278.8  ,   2.86 ],
       [  1.   ,  11.515,  32.58 ,  41.5  , 280.7  ,   1.95 ],
       [  1.   ,   7.159,  34.02 , 245.6  ,  76.4  ,  32.15 ],
       [  1.   ,   7.612,  33.99 , 199.5  ,  85.5  ,  31.04 ],
       [  1.   ,  11.468,  32.42 ,  31.9  , 283.7  ,   1.18 ],
       [  1.   ,   6.916,  34.02 , 249.7  ,  72.1  ,  34.35 ],
       [  1.   ,   7.588,  33.96 , 198.2  ,  88.1  ,  31.7  ],
       [  1.   ,   7.804,  33.9  , 175.2  , 113.1  ,  29.2  ],
       [  1.   ,   8.154,  33.84 , 150.6  , 119.9  ,  28.59 ],
       [  1.   ,   8.253,  33.67 , 130.8  , 141.2  ,  26.22 ],
       [  1.   ,   8.787,  33.43 , 110.9  , 173.6  ,  22.13 ],
       [  1.   ,   9.58 ,  32.99 ,  89.3  , 220.8  ,  15.01 ],
       [  1.   ,   9.907,  32.77 ,  68.6  , 251.3  ,   9.94 ],
       [  1.   ,  10.009,  32.69 ,  60.4  , 261.2  ,   7.66 ],
       [  1.   ,  10.142,  32.61 ,  49.7  , 271.2  ,   5.53 ],
       [  1.   ,  10.291,  32.53 ,  39.8  , 275.1  ,   4.6  ],
       [  1.   ,  10.612,  32.4  ,  30.7  , 276.5  ,   2.49 ],
       [  1.   ,   6.613,  34.07 , 275.1  ,  56.4  ,  35.61 ],
       [  1.   ,   6.818,  34.06 , 250.2  ,  61.9  ,  34.72 ],
       [  1.   ,   6.941,  34.04 , 224.1  ,  68.   ,  33.87 ],
       [  1.   ,   7.194,  34.01 , 199.9  ,  74.9  ,  33.22 ],
       [  1.   ,   7.706,  33.93 , 150.4  ,  96.1  ,  30.59 ],
       [  1.   ,   7.821,  33.88 , 130.9  , 110.6  ,  29.1  ],
       [  1.   ,   8.02 ,  33.81 , 111.2  , 108.6  ,  28.81 ],
       [  1.   ,   8.687,  33.4  ,  90.   , 157.3  ,  24.32 ],
       [  1.   ,   9.123,  33.26 ,  69.5  , 200.9  ,  19.99 ],
       [  1.   ,   8.999,  33.12 ,  60.3  , 202.7  ,  18.78 ],
       [  1.   ,   9.393,  33.04 ,  50.5  , 234.1  ,  15.43 ],
       [  1.   ,   9.372,  32.71 ,  39.9  , 259.1  ,  10.76 ],
       [  1.   ,   6.311,  34.09 , 274.7  ,  50.6  ,  36.5  ],
       [  1.   ,   6.605,  34.07 , 224.2  ,  54.6  ,  35.56 ],
       [  1.   ,   6.716,  34.06 , 199.7  ,  55.7  ,  35.07 ],
       [  1.   ,   6.959,  34.05 , 175.2  ,  56.   ,  34.28 ],
       [  1.   ,   7.377,  33.99 , 129.7  ,  74.9  ,  32.38 ],
       [  1.   ,   7.615,  33.95 , 110.7  ,  95.3  ,  31.52 ],
       [  1.   ,   8.036,  33.85 ,  89.6  , 128.3  ,  30.   ],
       [  1.   ,   8.278,  33.78 ,  70.2  , 135.5  ,  28.84 ],
       [  1.   ,   8.454,  33.73 ,  60.   , 146.3  ,  27.54 ],
       [  1.   ,   8.667,  33.69 ,  50.   , 183.6  ,  25.74 ],
       [  1.   ,   8.833,  33.69 ,  40.   , 197.2  ,  24.37 ],
       [  1.   ,   7.36 ,  33.99 ,  80.3  ,  78.5  ,  32.3  ],
       [  1.   ,   7.439,  33.97 ,  69.8  ,  86.4  ,  31.94 ],
       [  1.   ,   7.582,  33.94 ,  59.9  , 107.1  ,  30.87 ],
       [  1.   ,   7.77 ,  33.91 ,  50.3  , 125.   ,  29.57 ],
       [  1.   ,   7.872,  33.89 ,  40.2  , 131.2  ,  29.13 ],
       [  1.   ,   7.92 ,  33.88 ,  30.7  , 144.8  ,  28.67 ],
       [  1.   ,   7.032,  34.03 ,  30.2  ,  68.3  ,  32.9  ]])

Use np.linalg.lstsq to compute the set of coefficients, c.

result = np.linalg.lstsq(X,y)
/var/folders/z7/lmyk7sz94177j166ck0x63h80000gr/T/ipykernel_58373/2890136322.py:1: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`.
  result = np.linalg.lstsq(X,y)
result
(array([-1.19029082e+00, -2.51555185e-03,  9.38753728e-02,  3.36570803e-04,
         5.61953385e-04, -4.24568174e-02]),
 array([0.77762969]),
 6,
 array([2.71587204e+03, 1.28264376e+03, 1.19159901e+02, 1.60456253e+01,
        4.80243679e+00, 3.53983652e-02]))
c = result[0]
c
array([-1.19029082e+00, -2.51555185e-03,  9.38753728e-02,  3.36570803e-04,
        5.61953385e-04, -4.24568174e-02])

Alternative solution using matrix multiplication:

\[\hat{c} = (\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\vec{y}\]
c2 = np.linalg.inv(X.T@X)@X.T@y
c2
array([-1.19029082e+00, -2.51555185e-03,  9.38753728e-02,  3.36570803e-04,
        5.61953385e-04, -4.24568174e-02])

Now calculate the model values, \(\hat{y}\).

yhat = X@c

7.2. Assessing the model for bias#

Plot model vs. observations. Note the systematic underestimate at high and low values of \(\Omega_A\). The residual plot shows this even more clearly.

plt.figure()
plt.plot(y,yhat,'.')
xl = plt.xlim()
yl = plt.ylim()
plt.plot(xl, xl, 'k--')
plt.xlim(xl)
plt.ylim(yl)
plt.xlabel('observed $\Omega_A$')
plt.ylabel('modeled $\Omega_A$');
_images/59fda77036d90267e8d74634fc2669d2ab70acec569df743bdc4b299e8d710e1.png

Plot residuals vs. observations.

plt.figure()
plt.plot(y, yhat-y,'.')
xl = plt.xlim()
plt.plot(xl,[0,0],'k--')
plt.xlim(xl)
plt.xlabel('observed $\Omega_A$')
plt.ylabel('residuals (modeled - observed)');
_images/4a95b574644e72ceda7b59914d50c9b555e16918909cfc5dffc065476f6e7887.png

Alternative approach using statsmodels#

Use statsmodels to get a complete summary of regression statistics. Note that the coefficients are the same as those obtained using Numpy above.

resultsm  = sm.OLS(y, X).fit()
print(resultsm.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 OmegaA   R-squared:                       0.976
Model:                            OLS   Adj. R-squared:                  0.975
Method:                 Least Squares   F-statistic:                     1230.
Date:                Tue, 27 Feb 2024   Prob (F-statistic):          1.44e-121
Time:                        18:20:08   Log-Likelihood:                 197.36
No. Observations:                 159   AIC:                            -382.7
Df Residuals:                     153   BIC:                            -364.3
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -1.1903      2.013     -0.591      0.555      -5.167       2.787
x1            -0.0025      0.015     -0.169      0.866      -0.032       0.027
x2             0.0939      0.064      1.465      0.145      -0.033       0.220
x3             0.0003      0.000      2.105      0.037    2.07e-05       0.001
x4             0.0006      0.001      1.100      0.273      -0.000       0.002
x5            -0.0425      0.005     -7.968      0.000      -0.053      -0.032
==============================================================================
Omnibus:                       14.235   Durbin-Watson:                   0.755
Prob(Omnibus):                  0.001   Jarque-Bera (JB):               21.787
Skew:                           0.487   Prob(JB):                     1.86e-05
Kurtosis:                       4.530   Cond. No.                     7.67e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 7.67e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
resultsm.params
const   -1.190291
x1      -0.002516
x2       0.093875
x3       0.000337
x4       0.000562
x5      -0.042457
dtype: float64
c
array([-1.19029082e+00, -2.51555185e-03,  9.38753728e-02,  3.36570803e-04,
        5.61953385e-04, -4.24568174e-02])

Confronting multicollinearity#

In the output above, statsmodels gave a warning about a high condition number - this is an indicator of how much a small change to the matrix \(\textbf{X}\) will change the solution to the least squares problem \(\hat{c} = (\textbf{X}^T\textbf{X})^{-1}\textbf{X}^T\vec{y}\). It is a measure of the extent to which the columns in \(\textbf{X}\) are linearly independent and closely scaled in magnitude. The same condition number can be calculated using Numpy.

np.linalg.cond(X)
76723.09230168296

Multiple collinearity can also be assessed by looking at the correlation matrix (R) or calculating the variance inflation factor (VIF).

R = df07sub[['CTDTMP', 'CTDSAL', 'CTDPRS', 'CTDOXY', 'NITRAT']].corr()
R
CTDTMP CTDSAL CTDPRS CTDOXY NITRAT
CTDTMP 1.000000 -0.778421 -0.772172 0.822573 -0.863899
CTDSAL -0.778421 1.000000 0.770493 -0.980860 0.979989
CTDPRS -0.772172 0.770493 1.000000 -0.815279 0.786283
CTDOXY 0.822573 -0.980860 -0.815279 1.000000 -0.985540
NITRAT -0.863899 0.979989 0.786283 -0.985540 1.000000
VIF = 1/(1 - R**2)
VIF
CTDTMP CTDSAL CTDPRS CTDOXY NITRAT
CTDTMP inf 2.537683 2.476778 3.092395 3.942003
CTDSAL 2.537683 inf 2.460989 26.375274 25.238230
CTDPRS 2.476778 2.460989 inf 2.982225 2.619453
CTDOXY 3.092395 26.375274 2.982225 inf 34.830366
NITRAT 3.942003 25.238230 2.619453 34.830366 inf

Final model for aragonite saturation state#

Due to issues with bias and multiple collinearity, the following alternative model proposed by Juranek et al. (2009). It is simpler and contains an interaction term.

\( \hat{\Omega}_A = a_0 + a_1 \times (O - Oref) + a_2 \times (O - Oref) \times (T - Tref) \)

An important question is whether this model can predict aragonite saturation state in different years. This model can be tested using an independent data set from later cruises that cover the same region.