.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "_examples\tutorial\guide\plot_step_04.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end ` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr__examples_tutorial_guide_plot_step_04.py:

Step 04 - TSA to estimate trends
================================

.. warning:: Pending!

In this example, we will learn about important concepts such as time series
decomposition, autocorrelation, moving averages, and forecasting methods,
from basic least squares models to more complex approaches such as ARIMA
(AutoRegressive Integrated Moving Average). We will also explore various
statistical tools commonly used for time series analysis, empowering you to
harness the power of time-dependent data and extract meaningful information
from it. Get ready to embark on a brief yet exciting journey into the world
of time series analysis!

.. GENERATED FROM PYTHON SOURCE LINES 18-83

The diagram in Figure 4.7 describes the methodology suggested to estimate
secular trends in AMR from susceptibility test data. Since data corruption
might occur in clinical environments, susceptibility test records that were
wrongly reported (human or device errors) or duplicated should be discarded.
The remaining records are often divided into combinations (tuples defined by
sample type or specimen, pathogen and antimicrobial) for which a resistance
time series signal needs to be generated using either independent or
overlapping time intervals (see xxx). The time series were linearly
interpolated to fill sporadic missing values. No additional filters or
preprocessing steps were applied. An analysis of stationarity around a trend
was carried out to identify interesting combinations, and regression
analysis was used to quantify their tendency.

.. image:: ../../../_static/imgs/sart-diagram.png
    :width: 600
    :align: center
    :alt: Diagram

|

The linear model (see Equation 4.3) has been selected to quantify resistance
tendency for several reasons: (i) the development of resistance in pathogens
is an evolutionary response to the selective pressure of antimicrobials,
hence large variations in short periods (e.g. consecutive days or months)
are not expected; (ii) the slope parameter can be directly translated to
change over time, increasing its practicability; and (iii) the offset
parameter is highly related with the overall resistance. Hence, the response
variable in the regression analysis (resistance index) is described by the
explanatory variable (time). The slope (m) ranges within the interval
[-1, 1], where its sign and absolute value capture the direction and rate of
change respectively. The unit of the slope is ∆y/∆x.

- **Least Squares Regression**

  The optimization problem in ordinary least squares or ``OLS`` regression
  minimizes the sum of squared errors to find the best fitting model.
  Ordinary least squares (OLS) assumes identical weights (wi) and
  independently distributed residuals with a normal distribution. It is
  frequent to observe that some residuals have higher variance than others,
  meaning that those observations are effectively less certain. To account
  for such variability, weighted least squares or ``WLS`` regression (see
  Equation 4.4) applies a weighting function to the residuals. The
  confidence of the computed resistance index (observed variable) depends on
  the number of susceptibility test records used. Hence, the sigmoid
  function has been used to define weights proportional to the population
  size.

  .. warning:: Include equation and/or external reference.
- **Auto Regressive Integrated Moving Average** or ``ARIMA``

  An autoregressive integrated moving average (ARIMA) model is a
  generalization of an autoregressive moving average (ARMA) model which can
  also be applied in scenarios where the data show evidence of
  non-stationarity. The autoregressive (AR) part expresses the variable of
  interest (resistance index) as a function of past values of the variable.
  The moving average (MA) part indicates that the regression error is a
  linear combination of error terms which occurred contemporaneously and at
  various times in the past. An ARIMA(p,d,q) model is described by: p, the
  number of autoregressive terms; d, the number of differences needed for
  stationarity; and q, the number of lagged forecast errors.

  .. warning:: Include equation and/or external reference.

  The interpretation of the parameter µ depends on the ARIMA model used for
  the fitting. In order to estimate the linear trend, it was interesting to
  consider exclusively MA models so that the expected value of µ was the
  mean of the once-differenced series; that is, the slope coefficient of the
  un-differenced series. The Bayesian information criterion (BIC) was used
  to select the best ARIMA(0,1,q) model, with the lowest BIC being
  preferred.

.. GENERATED FROM PYTHON SOURCE LINES 86-87

Let's load the libraries and create the time series.

.. GENERATED FROM PYTHON SOURCE LINES 87-126

.. code-block:: default
    :lineno-start: 88

    # Libraries
    import sys
    import warnings
    import numpy as np
    import pandas as pd
    import matplotlib as mpl
    import matplotlib.pyplot as plt
    import statsmodels.api as sm
    import statsmodels.robust.norms as norms

    # Import weights.
    from pyamr.datasets.load import make_timeseries

    # Filter warnings
    warnings.simplefilter(action='ignore', category=FutureWarning)

    # ----------------------------
    # set basic configuration
    # ----------------------------
    # Matplotlib options
    mpl.rc('legend', fontsize=6)
    mpl.rc('xtick', labelsize=6)
    mpl.rc('ytick', labelsize=6)

    # Set pandas configuration.
    pd.set_option('display.max_colwidth', 14)
    pd.set_option('display.width', 150)
    pd.set_option('display.precision', 4)

    # ----------------------------
    # create data
    # ----------------------------
    # Create timeseries data
    x, y, f = make_timeseries()

.. GENERATED FROM PYTHON SOURCE LINES 127-130

Example using WLS
-----------------

Libraries

.. GENERATED FROM PYTHON SOURCE LINES 130-144

.. code-block:: default
    :lineno-start: 130

    from pyamr.core.regression.wls import WLSWrapper
    from pyamr.metrics.weights import SigmoidA

    # Create method to compute weights from frequencies
    W = SigmoidA(r=200, g=0.5, offset=0.0, scale=1.0)

    # Note that the function fit will call M.weights(weights) inside and will
    # store the M converter in the instance. Therefore, the code executed is
    # equivalent, with the only difference being that the weight converter
    # is not saved.
    wls = WLSWrapper(estimator=sm.WLS).fit( \
        exog=x, endog=y, trend='c', weights=f,
        W=W, missing='raise')

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    c:\users\kelda\desktop\repositories\virtualenvs\venv-py391-pyamr\lib\site-packages\statsmodels\regression\linear_model.py:807: RuntimeWarning: divide by zero encountered in log

.. GENERATED FROM PYTHON SOURCE LINES 145-146

All the information can be seen as a series.

.. GENERATED FROM PYTHON SOURCE LINES 146-151

.. code-block:: default
    :lineno-start: 147

    # Print series.
    print("\nSeries:")
    print(wls.as_series())

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Series:
    wls-rsquared                  0.5042
    wls-rsquared_adj              0.4992
    wls-fvalue                   99.6762
    wls-fprob                        0.0
    wls-aic                          inf
    wls-bic                          inf
    wls-llf                         -inf
    wls-mse_model            153748.2628
    wls-mse_resid              1542.4777
    wls-mse_total              3079.9099
    wls-const_coef              269.2284
    wls-const_std                18.0069
    wls-const_tvalue             14.9514
    wls-const_tprob                  0.0
    wls-const_cil               233.4942
    wls-const_ciu               304.9625
    wls-x1_coef                   2.5201
    wls-x1_std                    0.2524
    wls-x1_tvalue                 9.9838
    wls-x1_tprob                     0.0
    wls-x1_cil                    2.0192
    wls-x1_ciu                    3.0211
    wls-s_dw                       0.548
    wls-s_jb_value                16.984
    wls-s_jb_prob                 0.0002
    wls-s_skew                     0.355
    wls-s_kurtosis                  4.89
    wls-s_omnibus_value            9.863
    wls-s_omnibus_prob             0.007
    wls-m_dw                      0.1566
    wls-m_jb_value                7.0686
    wls-m_jb_prob                 0.0292
    wls-m_skew                   -0.6131
    wls-m_kurtosis                3.4392
    wls-m_nm_value                7.5327
    wls-m_nm_prob                 0.0231
    wls-m_ks_value                0.6094
    wls-m_ks_prob                    0.0
    wls-m_shp_value                0.939
    wls-m_shp_prob                0.0002
    wls-m_ad_value                2.7507
    wls-m_ad_nnorm                 False
    wls-missing                    raise
    wls-exog                  [[1.0, 0.0...
    wls-endog                 [26.961124...
    wls-trend                          c
    wls-weights               [0.0423348...
    wls-W                            ...

                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    const        269.2284     18.007     14.951      0.000     233.494     304.962
    x1             2.5201      0.252      9.984      0.000       2.019       3.021
    ==============================================================================
    Omnibus:                        9.863   Durbin-Watson:                   0.548
    Prob(Omnibus):                  0.007   Jarque-Bera (JB):               16.984
    Skew:                           0.355   Prob(JB):                     0.000205
    Kurtosis:                       4.890   Cond. No.                         228.
    Normal (N):                     7.533   Prob(N):                         0.023
    ==============================================================================

.. GENERATED FROM PYTHON SOURCE LINES 159-160

The regression line can be seen as

.. GENERATED FROM PYTHON SOURCE LINES 160-165

.. code-block:: default
    :lineno-start: 161

    # Print regression line.
    print("\nRegression line:")
    print(wls.line(np.arange(10)))

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Regression line:
    [269.22835846 271.74850753 274.26865659 276.78880566 279.30895472
     281.82910379 284.34925285 286.86940192 289.38955098 291.90970005]

.. GENERATED FROM PYTHON SOURCE LINES 166-170

Let's see how to make predictions using the wrapper and how to plot the
results. In addition, it compares the intervals provided by
``get_prediction`` (confidence intervals) and the intervals provided by
``wls_prediction_std`` (prediction intervals).

.. GENERATED FROM PYTHON SOURCE LINES 170-212

.. code-block:: default
    :lineno-start: 171

    # -----------------------------
    # Predictions
    # -----------------------------
    # Variables.
    start, end = None, 180

    # Compute predictions (exogenous?). It returns a 2D array
    # where the rows contain the time (t), the mean, the lower
    # and upper confidence (or prediction?) intervals.
    preds = wls.get_prediction(start=start, end=end)

    # Create figure
    fig, ax = plt.subplots(1, 1, figsize=(11, 5))

    # -----------------------------
    # Plotting confidence intervals
    # -----------------------------
    # Plot truth values.
    ax.plot(x, y, color='#A6CEE3', alpha=0.5, marker='o',
            markeredgecolor='k', markeredgewidth=0.5,
            markersize=5, linewidth=0.75, label='Observed')

    # Plot forecasted values.
    ax.plot(preds[0, :], preds[1, :], color='#FF0000', alpha=1.00,
            linewidth=2.0, label=wls._identifier(short=True))

    # Plot the confidence intervals.
    ax.fill_between(preds[0, :], preds[2, :], preds[3, :],
                    color='r', alpha=0.1)

    # Legend
    plt.legend()

    # Show
    plt.show()

.. image-sg:: /_examples/tutorial/guide/images/sphx_glr_plot_step_04_001.png
    :alt: plot step 04
    :srcset: /_examples/tutorial/guide/images/sphx_glr_plot_step_04_001.png
    :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 213-215

Example using ARIMA
-------------------

.. GENERATED FROM PYTHON SOURCE LINES 217-219

.. note:: Since we are using ARIMA, we should have checked for stationarity!

.. GENERATED FROM PYTHON SOURCE LINES 219-242

.. code-block:: default
    :lineno-start: 220

    # Libraries
    # Import ARIMA from statsmodels.
    from statsmodels.tsa.arima.model import ARIMA

    # Import ARIMA wrapper from pyamr.
    from pyamr.core.regression.arima import ARIMAWrapper

    # ----------------------------
    # create data
    # ----------------------------
    # Create timeseries data
    x, y, f = make_timeseries()

    # Create exogenous variable
    exog = x

    # ----------------------------
    # fit the model
    # ----------------------------
    # Create specific arima model.
    arima = ARIMAWrapper(estimator=ARIMA).fit( \
        endog=y[:80], order=(1, 0, 0), trend='c', disp=0)

.. GENERATED FROM PYTHON SOURCE LINES 243-244

All the information can be seen as a series.

.. GENERATED FROM PYTHON SOURCE LINES 244-249

.. code-block:: default
    :lineno-start: 245

    # Print series
    print("\nSeries:")
    print(arima.as_series())

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Series:
    arima-aic                   761.509
    arima-bic                   768.6551
    arima-hqic                  764.3741
    arima-llf                  -377.7545
    arima-const_coef            266.5951
    arima-const_std             165.575
    arima-const_tvalue            1.6101
    arima-const_tprob             0.1074
    arima-const_cil             -57.9259
    arima-const_ciu             591.1161
    arima-ar.L1_coef              0.9917
    arima-ar.L1_std               0.019
    arima-ar.L1_tvalue           52.1312
    arima-ar.L1_tprob                0.0
    arima-ar.L1_cil               0.9544
    arima-ar.L1_ciu               1.029
    arima-sigma2_coef           702.5324
    arima-sigma2_std             92.619
    arima-sigma2_tvalue           7.5852
    arima-sigma2_tprob               0.0
    arima-sigma2_cil            521.0025
    arima-sigma2_ciu            884.0623
    arima-m_dw                    1.735
    arima-m_jb_value           2049.8936
    arima-m_jb_prob                  0.0
    arima-m_skew                 -3.8991
    arima-m_kurtosis             26.5405
    arima-m_nm_value            101.5915
    arima-m_nm_prob                  0.0
    arima-m_ks_value              0.588
    arima-m_ks_prob                  0.0
    arima-m_shp_value             0.6822
    arima-m_shp_prob                 0.0
    arima-m_ad_value              4.6133
    arima-m_ad_nnorm              False
    arima-converged                True
    arima-endog               [9.1093314...
    arima-order                (1, 0, 0)
    arima-trend                        c
    arima-disp                         0
    arima-model                      ...

                     coef    std err          z      P>|z|      [0.025      0.975]
    ------------------------------------------------------------------------------
    const        266.5951    165.575      1.610      0.107     -57.926     591.116
    ar.L1          0.9917      0.019     52.131      0.000       0.954       1.029
    sigma2       702.5324     92.619      7.585      0.000     521.002     884.062
    ==============================================================================
    Manual
    ------------------------------------------------------------------------------
    Omnibus:                        0.000   Durbin-Watson:                   1.735
    Prob(Omnibus):                  0.000   Jarque-Bera (JB):             2049.894
    Skew:                          -3.899   Prob(JB):                        0.000
    Kurtosis_m:                    26.541   Cond No:
    Normal (N):                   101.591   Prob(N):                         0.000
    ==============================================================================

.. GENERATED FROM PYTHON SOURCE LINES 257-264

Let's see how to make predictions using the wrapper which has been
previously fitted. It also demonstrates how to plot the resulting data for
visualization purposes. It shows two different types of predictions:
(i) **dynamic predictions**, in which the prediction is based on the
previously predicted values (note that for the case of ARIMA(0,1,1) it
returns a line), and (ii) **non-dynamic predictions**, in which the
prediction is based on the real values of the time series, no matter what
the prediction was for those values.

.. GENERATED FROM PYTHON SOURCE LINES 264-328

.. code-block:: default
    :lineno-start: 265

    # ----------------
    # Predictions
    # ----------------
    # Variables.
    s, e = 50, 120

    # Compute predictions
    preds_1 = arima.get_prediction(start=s, end=e, dynamic=False)
    preds_2 = arima.get_prediction(start=s, end=e, dynamic=True)

    # Create figure
    fig, axes = plt.subplots(1, 2, figsize=(8, 3))

    # ----------------
    # Plot non-dynamic
    # ----------------
    # Plot truth values.
    axes[0].plot(y, color='#A6CEE3', alpha=0.5, marker='o',
                 markeredgecolor='k', markeredgewidth=0.5,
                 markersize=5, linewidth=0.75, label='Observed')

    # Plot forecasted values.
    axes[0].plot(preds_1[0, :], preds_1[1, :], color='#FF0000',
                 alpha=1.00, linewidth=2.0, label=arima._identifier())

    # Plot the confidence intervals.
    axes[0].fill_between(preds_1[0, :], preds_1[2, :], preds_1[3, :],
                         color='#FF0000', alpha=0.25)

    # ------------
    # Plot dynamic
    # ------------
    # Plot truth values.
    axes[1].plot(y, color='#A6CEE3', alpha=0.5, marker='o',
                 markeredgecolor='k', markeredgewidth=0.5,
                 markersize=5, linewidth=0.75, label='Observed')

    # Plot forecasted values.
    axes[1].plot(preds_2[0, :], preds_2[1, :], color='#FF0000',
                 alpha=1.00, linewidth=2.0, label=arima._identifier())

    # Plot the confidence intervals.
    axes[1].fill_between(preds_2[0, :], preds_2[2, :], preds_2[3, :],
                         color='#FF0000', alpha=0.25)

    # Configure axes
    axes[0].set_title("ARIMA non-dynamic")
    axes[1].set_title("ARIMA dynamic")

    # Format axes
    axes[0].grid(True, linestyle='--', linewidth=0.25)
    axes[1].grid(True, linestyle='--', linewidth=0.25)

    # Legend
    axes[0].legend()
    axes[1].legend()

    # Show
    plt.show()

.. image-sg:: /_examples/tutorial/guide/images/sphx_glr_plot_step_04_002.png
    :alt: ARIMA non-dynamic, ARIMA dynamic
    :srcset: /_examples/tutorial/guide/images/sphx_glr_plot_step_04_002.png
    :class: sphx-glr-single-img

.. rst-class:: sphx-glr-timing

**Total running time of the script:** ( 0 minutes 0.301 seconds)

.. _sphx_glr_download__examples_tutorial_guide_plot_step_04.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_step_04.py `

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_step_04.ipynb `

.. only:: html

  .. rst-class:: sphx-glr-signature

  `Gallery generated by Sphinx-Gallery `_