pyamr


Step 04 - TSA to estimate trends

Warning

Pending!

In this example, we will learn about important concepts such as time series decomposition, autocorrelation, moving averages, and forecasting methods ranging from basic least squares models to more complex approaches such as ARIMA (AutoRegressive Integrated Moving Average). We will also explore various statistical tools commonly used for time series analysis, empowering you to harness the power of time-dependent data and extract meaningful information from it. Get ready to embark on a brief yet exciting journey into the world of time series analysis!

The diagram in Figure 4.7 describes the methodology suggested to estimate secular trends in AMR from susceptibility test data. Since data corruption might occur in clinical environments, susceptibility test records that are wrongly reported (human or device errors) or duplicated should be discarded. The remaining records are often divided into combinations (tuples defined by sample type or specimen, pathogen and antimicrobial) for which a resistance time series signal needs to be generated using either independent or overlapping time intervals (see xxx). The time series are linearly interpolated to fill sporadic missing values; no additional filters or preprocessing steps are applied. An analysis of stationarity around a trend is carried out to identify interesting combinations, and regression analysis is used to quantify their tendency.
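
As an illustrative sketch of how such a resistance time series could be built for a single combination (the column names, values and independent monthly intervals below are assumptions for illustration, not pyamr's internals):

```python
import pandas as pd

# Hypothetical susceptibility test records for one combination
# (specimen, pathogen, antimicrobial); all values are made up.
df = pd.DataFrame({
    'date': pd.to_datetime(['2021-01-05', '2021-01-20', '2021-02-10',
                            '2021-02-25', '2021-04-02', '2021-04-18']),
    'specimen': 'urine',
    'pathogen': 'ecoli',
    'antimicrobial': 'ciprofloxacin',
    'sensitivity': ['resistant', 'sensitive', 'resistant',
                    'resistant', 'sensitive', 'sensitive'],
})

# Resistance index per independent monthly interval:
# resistant / (resistant + sensitive).
df['resistant'] = df['sensitivity'].eq('resistant')
monthly = (df.groupby([df['date'].dt.to_period('M'), 'specimen',
                       'pathogen', 'antimicrobial'])['resistant']
             .agg(['sum', 'count']))
monthly['index'] = monthly['sum'] / monthly['count']

# Drop the combination levels, reindex to a continuous monthly
# range and linearly interpolate the missing month (March).
series = monthly['index'].droplevel([1, 2, 3])
series = series.reindex(pd.period_range(series.index.min(),
                                        series.index.max(), freq='M'))
series = series.interpolate()
print(series)
```

The interpolated March value falls midway between the February and April resistance indexes, as expected from linear interpolation.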

[Figure 4.7: Diagram of the suggested methodology to estimate secular AMR trends from susceptibility test data]

The linear model (see Equation 4.3) has been selected to quantify resistance tendency for several reasons: (i) the development of resistance in pathogens is an evolutionary response to the selective pressure of antimicrobials, hence large variations over short periods (e.g. consecutive days or months) are not expected; (ii) the slope parameter can be directly translated to change over time, increasing its practicality; and (iii) the offset parameter is closely related to the overall resistance. Hence, the response variable in the regression analysis (resistance index) is described by the explanatory variable (time). The slope (m) ranges within the interval [-1,1], where its sign and absolute value capture the direction and rate of change respectively. The unit of the slope is ∆y/∆x.
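
A minimal sketch of this idea with NumPy; the synthetic series and its 0.01-per-month slope are assumptions for illustration:

```python
import numpy as np

# Synthetic monthly resistance index with a known upward drift;
# the 0.01-per-month slope and the noise level are assumptions.
t = np.arange(24)
rng = np.random.default_rng(42)
res = 0.30 + 0.01 * t + rng.normal(0, 0.02, size=t.size)

# Fit y = m*t + c. The slope m is the change in resistance index
# per time step (delta-y / delta-x): its sign gives the direction
# of the trend and its absolute value the rate of change.
m, c = np.polyfit(t, res, deg=1)
print(f"slope={m:.4f} per month, offset={c:.4f}")
```

The recovered slope is close to the 0.01 used to generate the data, and the offset approximates the overall resistance level at the start of the series.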

  • Least Squares Regression

    The optimization problem in ordinary least squares (OLS) regression minimizes the squared errors to find the best fitting model. OLS assumes identical weights (wi) and independently, normally distributed residuals. In practice, some residuals may have higher variance than others, meaning that those observations are effectively less certain. To account for such variability, weighted least squares (WLS) regression (see Equation 4.4) applies a weighting function to the residuals. The confidence of the computed resistance index (observed variable) depends on the number of susceptibility test records used to compute it. Hence, the sigmoid function has been used to define weights proportional to the population size.

    Warning

    Include equation and/or external reference.

  • Auto Regressive Integrated Moving Average or ARIMA

    An autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model which can also be applied in scenarios where data show evidence of non-stationarity. The autoregressive (AR) part expresses the variable of interest (resistance index) as a function of its own past values. The moving average (MA) part indicates that the regression error is a linear combination of error terms occurring contemporaneously and at various times in the past. An ARIMA(p,d,q) model is described by three parameters: p, the number of autoregressive terms; d, the number of differences needed for stationarity; and q, the number of lagged forecast errors.

    Warning

    Include equation and/or external reference.

    The interpretation of the parameter µ depends on the ARIMA model used for the fitting. In order to estimate the linear trend, only MA models were considered, so that the expected value of µ is the mean of the one-time differenced series; that is, the slope coefficient of the un-differenced series. The Bayesian information criterion (BIC) was used to select the best ARIMA(0,1,q) model, the one with the lowest BIC being preferred.

Let's load the libraries and create the time series.

# Import libraries.
import sys
import warnings
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.robust.norms as norms

# Import pyamr dataset helper.
from pyamr.datasets.load import make_timeseries

# Filter warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

# ----------------------------
# set basic configuration
# ----------------------------
# Matplotlib options
mpl.rc('legend', fontsize=6)
mpl.rc('xtick', labelsize=6)
mpl.rc('ytick', labelsize=6)

# Set pandas configuration.
pd.set_option('display.max_colwidth', 14)
pd.set_option('display.width', 150)
pd.set_option('display.precision', 4)

# ----------------------------
# create data
# ----------------------------
# Create timeseries data
x, y, f = make_timeseries()

Example using WLS

Libraries

from pyamr.core.regression.wls import WLSWrapper
from pyamr.metrics.weights import SigmoidA

# Create method to compute weights from frequencies
W = SigmoidA(r=200, g=0.5, offset=0.0, scale=1.0)

# Note that the function fit will call W.weights(weights) inside and will
# store the W converter in the instance. Therefore, the code executed is
# equivalent to <weights=W.weights(f)> with the only difference being that
# the weight converter is not saved.
wls = WLSWrapper(estimator=sm.WLS).fit(
    exog=x, endog=y, trend='c', weights=f,
    W=W, missing='raise')
(Fitting may emit a statsmodels RuntimeWarning, "divide by zero encountered in log", which explains the infinite AIC, BIC and log-likelihood values reported by the model.)
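
The sigmoid weighting idea can be sketched with a plain logistic function; the midpoint and steepness below are illustrative assumptions and do not correspond to SigmoidA's parameters (r, g, offset, scale):

```python
import numpy as np

# Illustrative sigmoid weighting: resistance indexes backed by more
# susceptibility tests receive weights closer to 1. The midpoint
# (100 tests) and steepness (0.05) are assumptions.
def sigmoid_weights(n_tests, midpoint=100, steepness=0.05):
    n = np.asarray(n_tests, dtype=float)
    return 1.0 / (1.0 + np.exp(-steepness * (n - midpoint)))

freqs = np.array([10, 50, 100, 200, 500])
w = sigmoid_weights(freqs)
print(w.round(3))
```

Observations computed from few tests are down-weighted smoothly rather than discarded, which is the rationale for the sigmoid choice above.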

All information can be seen as a series

# Print series.
print("\nSeries:")
print(wls.as_series())
Series:
wls-rsquared                  0.5042
wls-rsquared_adj              0.4992
wls-fvalue                   99.6762
wls-fprob                        0.0
wls-aic                          inf
wls-bic                          inf
wls-llf                         -inf
wls-mse_model            153748.2628
wls-mse_resid              1542.4777
wls-mse_total              3079.9099
wls-const_coef              269.2284
wls-const_std                18.0069
wls-const_tvalue             14.9514
wls-const_tprob                  0.0
wls-const_cil               233.4942
wls-const_ciu               304.9625
wls-x1_coef                   2.5201
wls-x1_std                    0.2524
wls-x1_tvalue                 9.9838
wls-x1_tprob                     0.0
wls-x1_cil                    2.0192
wls-x1_ciu                    3.0211
wls-s_dw                       0.548
wls-s_jb_value                16.984
wls-s_jb_prob                 0.0002
wls-s_skew                     0.355
wls-s_kurtosis                  4.89
wls-s_omnibus_value            9.863
wls-s_omnibus_prob             0.007
wls-m_dw                      0.1566
wls-m_jb_value                7.0686
wls-m_jb_prob                 0.0292
wls-m_skew                   -0.6131
wls-m_kurtosis                3.4392
wls-m_nm_value                7.5327
wls-m_nm_prob                 0.0231
wls-m_ks_value                0.6094
wls-m_ks_prob                    0.0
wls-m_shp_value                0.939
wls-m_shp_prob                0.0002
wls-m_ad_value                2.7507
wls-m_ad_nnorm                 False
wls-missing                    raise
wls-exog               [[1.0, 0.0...
wls-endog              [26.961124...
wls-trend                          c
wls-weights            [0.0423348...
wls-W                  <pyamr.met...
wls-model              <statsmode...
wls-id                 WLS(c,Sig(...
dtype: object

Or a summary

# Print summary.
print("\nSummary:")
print(wls.as_summary())
Summary:
                            WLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.504
Model:                            WLS   Adj. R-squared:                  0.499
Method:                 Least Squares   F-statistic:                     99.68
Date:                Thu, 15 Jun 2023   Prob (F-statistic):           1.31e-16
Time:                        18:15:44   Log-Likelihood:                   -inf
No. Observations:                 100   AIC:                               inf
Df Residuals:                      98   BIC:                               inf
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        269.2284     18.007     14.951      0.000     233.494     304.962
x1             2.5201      0.252      9.984      0.000       2.019       3.021
==============================================================================
Omnibus:                        9.863   Durbin-Watson:                   0.548
Prob(Omnibus):                  0.007   Jarque-Bera (JB):               16.984
Skew:                           0.355   Prob(JB):                     0.000205
Kurtosis:                       4.890   Cond. No.                         228.
Normal (N):                     7.533   Prob(N):                         0.023
==============================================================================

The regression line can be computed as

# Print regression line.
print("\nRegression line:")
print(wls.line(np.arange(10)))
Regression line:
[269.22835846 271.74850753 274.26865659 276.78880566 279.30895472
 281.82910379 284.34925285 286.86940192 289.38955098 291.90970005]

Let’s see how to make predictions using the wrapper and how to plot the results. In addition, we compare the intervals provided by get_prediction (confidence intervals) with those provided by wls_prediction_std (prediction intervals).

# -----------------------------
# Predictions
# -----------------------------
# Variables.
start, end = None, 180

# Compute predictions (exogenous?). It returns a 2D array
# where the rows contain the time (t), the mean, the lower
# and upper confidence (or prediction?) interval.
preds = wls.get_prediction(start=start, end=end)


# Create figure
fig, ax = plt.subplots(1, 1, figsize=(11,5))

# -----------------------------
# Plotting confidence intervals
# -----------------------------
# Plot truth values.
ax.plot(x, y, color='#A6CEE3', alpha=0.5, marker='o',
        markeredgecolor='k', markeredgewidth=0.5,
        markersize=5, linewidth=0.75, label='Observed')

# Plot forecasted values.
ax.plot(preds[0,:], preds[1, :], color='#FF0000', alpha=1.00,
        linewidth=2.0, label=wls._identifier(short=True))

# Plot the confidence intervals.
ax.fill_between(preds[0, :],
                preds[2, :],
                preds[3, :],
                color='r',
                alpha=0.1)

# Legend
plt.legend()

# Show
plt.show()
[Figure: observed time series with the WLS regression line and its confidence interval]

Example using ARIMA

Note

Since we are using ARIMA, we should have checked for stationarity first!

# Libraries
# Import ARIMA from statsmodels.
from statsmodels.tsa.arima.model import ARIMA
# Import ARIMA wrapper from pyamr.
from pyamr.core.regression.arima import ARIMAWrapper

# ----------------------------
# create data
# ----------------------------
# Create timeseries data
x, y, f = make_timeseries()

# Create exogenous variable
exog = x

# ----------------------------
# fit the model
# ----------------------------
# Create specific arima model.
arima = ARIMAWrapper(estimator=ARIMA).fit(
    endog=y[:80], order=(1,0,0), trend='c', disp=0)

All information can be seen as a series

# Print series
print("\nSeries:")
print(arima.as_series())
Series:
arima-aic                    761.509
arima-bic                   768.6551
arima-hqic                  764.3741
arima-llf                  -377.7545
arima-const_coef            266.5951
arima-const_std              165.575
arima-const_tvalue            1.6101
arima-const_tprob             0.1074
arima-const_cil             -57.9259
arima-const_ciu             591.1161
arima-ar.L1_coef              0.9917
arima-ar.L1_std                0.019
arima-ar.L1_tvalue           52.1312
arima-ar.L1_tprob                0.0
arima-ar.L1_cil               0.9544
arima-ar.L1_ciu                1.029
arima-sigma2_coef           702.5324
arima-sigma2_std              92.619
arima-sigma2_tvalue           7.5852
arima-sigma2_tprob               0.0
arima-sigma2_cil            521.0025
arima-sigma2_ciu            884.0623
arima-m_dw                     1.735
arima-m_jb_value           2049.8936
arima-m_jb_prob                  0.0
arima-m_skew                 -3.8991
arima-m_kurtosis             26.5405
arima-m_nm_value            101.5915
arima-m_nm_prob                  0.0
arima-m_ks_value               0.588
arima-m_ks_prob                  0.0
arima-m_shp_value             0.6822
arima-m_shp_prob                 0.0
arima-m_ad_value              4.6133
arima-m_ad_nnorm               False
arima-converged                 True
arima-endog            [9.1093314...
arima-order                (1, 0, 0)
arima-trend                        c
arima-disp                         0
arima-model            <statsmode...
arima-id               ARIMA(1, 0...
dtype: object

Or a summary

# Print summary.
print("\nSummary:")
print(arima.as_summary())
Summary:
                               SARIMAX Results
==============================================================================
Dep. Variable:                      y   No. Observations:                   80
Model:                 ARIMA(1, 0, 0)   Log Likelihood                -377.754
Date:                Thu, 15 Jun 2023   AIC                            761.509
Time:                        18:15:44   BIC                            768.655
Sample:                             0   HQIC                           764.374
                                 - 80
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const        266.5951    165.575      1.610      0.107     -57.926     591.116
ar.L1          0.9917      0.019     52.131      0.000       0.954       1.029
sigma2       702.5324     92.619      7.585      0.000     521.002     884.062
==============================================================================
                                    Manual
------------------------------------------------------------------------------
Omnibus:                        0.000  Durbin-Watson:                    1.735
Prob(Omnibus):                  0.000  Jarque-Bera (JB):              2049.894
Skew:                          -3.899  Prob(JB):                         0.000
Kurtosis_m:                    26.541  Cond No:
Normal (N):                   101.591  Prob(N):                          0.000
==============================================================================

Let's see how to make predictions using the previously fitted wrapper and how to plot the resulting data for visualization purposes. Two types of prediction are shown: (i) dynamic, in which each prediction is based on the previously predicted values (note that for an ARIMA(0,1,1) model this returns a line), and (ii) non-dynamic, in which each prediction is based on the real values of the time series, regardless of what was predicted for those values.

# ----------------
# Predictions
# ----------------
# Variables.
s, e = 50, 120

# Compute predictions
preds_1 = arima.get_prediction(start=s, end=e, dynamic=False)
preds_2 = arima.get_prediction(start=s, end=e, dynamic=True)

# Create figure
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

# ----------------
# Plot non-dynamic
# ----------------
# Plot truth values.
axes[0].plot(y, color='#A6CEE3', alpha=0.5, marker='o',
             markeredgecolor='k', markeredgewidth=0.5,
             markersize=5, linewidth=0.75, label='Observed')

# Plot forecasted values.
axes[0].plot(preds_1[0, :], preds_1[1, :], color='#FF0000', alpha=1.00,
             linewidth=2.0, label=arima._identifier())

# Plot the confidence intervals.
axes[0].fill_between(preds_1[0, :], preds_1[2, :],
                     preds_1[3, :],
                     color='#FF0000',
                     alpha=0.25)

# ------------
# Plot dynamic
# ------------
# Plot truth values.
axes[1].plot(y, color='#A6CEE3', alpha=0.5, marker='o',
             markeredgecolor='k', markeredgewidth=0.5,
             markersize=5, linewidth=0.75, label='Observed')

# Plot forecasted values.
axes[1].plot(preds_2[0, :], preds_2[1, :], color='#FF0000', alpha=1.00,
             linewidth=2.0, label=arima._identifier())

# Plot the confidence intervals.
axes[1].fill_between(preds_2[0, :], preds_2[2, :],
                     preds_2[3, :],
                     color='#FF0000',
                     alpha=0.25)

# Configure axes
axes[0].set_title("ARIMA non-dynamic")
axes[1].set_title("ARIMA dynamic")

# Format axes
axes[0].grid(True, linestyle='--', linewidth=0.25)
axes[1].grid(True, linestyle='--', linewidth=0.25)

# Legend
axes[0].legend()
axes[1].legend()

# Show
plt.show()
[Figure: ARIMA non-dynamic (left) and dynamic (right) predictions with confidence intervals]

Total running time of the script: ( 0 minutes 0.301 seconds)

Download Python source code: plot_step_04.py

Download Jupyter notebook: plot_step_04.ipynb



© Copyright 2021-2023, Bernard Hernandez.
