Addendum (see https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-15-276): if you are going to use ARIMA, you might as well just use a random forest...

Original article: https://medium.com/open-machine-learning-course/open-machine-learning-course-topic-9-time-series-analysis-in-python-a270cb05e0b3

What the dataset looks like:

                           y
timestamp
1980-09-25 14:01:00 182.478
1980-09-25 14:02:00 176.231
1980-09-25 14:03:00 183.917
1980-09-25 14:04:00 177.798
1980-09-25 14:05:00 165.469
1980-09-25 14:06:00 181.878
1980-09-25 14:07:00 184.502
1980-09-25 14:08:00 183.303
1980-09-25 14:09:00 177.578
1980-09-25 14:10:00 171.641
1980-09-25 14:11:00 191.014
1980-09-25 14:12:00 184.068
1980-09-25 14:13:00 188.457
1980-09-25 14:14:00 175.739
1980-09-25 14:15:00 175.524
1980-09-25 14:16:00 189.128
1980-09-25 14:17:00 176.885
1980-09-25 14:18:00 167.140
1980-09-25 14:19:00 173.723
1980-09-25 14:20:00 168.460
1980-09-25 14:21:00 177.623
1980-09-25 14:22:00 183.888

Below is the code I wrote while working through the article. Since the original post does not provide its data, I simply used Twitter's example time-series data. Download: https://github.com/nicolasmiller/pyculiarity/blob/master/tests/raw_data.csv

Let me show the final linear-regression code right away (with the xgboost variant commented out):

# coding: utf-8

import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.model_selection import TimeSeriesSplit  # you have everything done for you
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LassoCV, RidgeCV

# for time-series cross-validation set 5 folds
tscv = TimeSeriesSplit(n_splits=5)


def mean_absolute_percentage_error(y_true, y_pred):
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100


def timeseries_train_test_split(X, y, test_size):
    """
    Perform train-test split with respect to time series structure
    """
    # get the index after which test set starts
    test_index = int(len(X) * (1 - test_size))

    X_train = X.iloc[:test_index]
    y_train = y.iloc[:test_index]
    X_test = X.iloc[test_index:]
    y_test = y.iloc[test_index:]

    return X_train, X_test, y_train, y_test


def plotModelResults(model, X_train, X_test, y_train, y_test,
                     plot_intervals=False, plot_anomalies=False):
    """
    Plots modelled vs fact values, prediction intervals and anomalies
    """
    prediction = model.predict(X_test)

    plt.figure(figsize=(15, 7))
    plt.plot(prediction, "g", label="prediction", linewidth=2.0)
    plt.plot(y_test.values, label="actual", linewidth=2.0)

    if plot_intervals:
        cv = cross_val_score(model, X_train, y_train,
                             cv=tscv,
                             scoring="neg_mean_absolute_error")
        mae = cv.mean() * (-1)
        deviation = cv.std()

        scale = 20
        lower = prediction - (mae + scale * deviation)
        upper = prediction + (mae + scale * deviation)

        plt.plot(lower, "r--", label="upper bond / lower bond", alpha=0.5)
        plt.plot(upper, "r--", alpha=0.5)

        if plot_anomalies:
            anomalies = np.array([np.NaN] * len(y_test))
            anomalies[y_test < lower] = y_test[y_test < lower]
            anomalies[y_test > upper] = y_test[y_test > upper]
            plt.plot(anomalies, "o", markersize=10, label="Anomalies")

    error = mean_absolute_percentage_error(prediction, y_test)
    plt.title("Mean absolute percentage error {0:.2f}%".format(error))
    plt.legend(loc="best")
    plt.tight_layout()
    plt.grid(True)
    plt.savefig("linear.png")


def plotCoefficients(model, X_train):
    """
    Plots sorted coefficient values of the model
    """
    coefs = pd.DataFrame(model.coef_, X_train.columns)
    coefs.columns = ["coef"]
    coefs["abs"] = coefs.coef.apply(np.abs)
    coefs = coefs.sort_values(by="abs", ascending=False).drop(["abs"], axis=1)

    plt.figure(figsize=(15, 9))
    coefs.coef.plot(kind='bar')
    plt.grid(True, axis='y')
    plt.hlines(y=0, xmin=0, xmax=len(coefs), linestyles='dashed')
    plt.savefig("linear-cov.png")


def code_mean(data, cat_feature, real_feature):
    """
    Returns a dictionary where keys are unique categories of the cat_feature,
    and values are means over real_feature
    """
    return dict(data.groupby(cat_feature)[real_feature].mean())


def prepareData(series, lag_start, lag_end, test_size, target_encoding=False):
    """
    series: pd.DataFrame
        dataframe with timeseries
    lag_start: int
        initial step back in time to slice target variable
        example - lag_start = 1 means that the model
        will see yesterday's values to predict today
    lag_end: int
        final step back in time to slice target variable
        example - lag_end = 4 means that the model
        will see up to 4 days back in time to predict today
    test_size: float
        size of the test dataset after train/test split as percentage of dataset
    target_encoding: boolean
        if True - add target averages to the dataset
    """
    # copy of the initial dataset
    data = pd.DataFrame(series.copy())
    data.columns = ["y"]

    # lags of series
    for i in range(lag_start, lag_end):
        data["lag_{}".format(i)] = data.y.shift(i)

    # datetime features
    # data.index = data.index.to_datetime()
    data["hour"] = data.index.hour
    data["weekday"] = data.index.weekday
    data['is_weekend'] = data.weekday.isin([5, 6]) * 1

    if target_encoding:
        # calculate averages on train set only
        test_index = int(len(data.dropna()) * (1 - test_size))
        data['weekday_average'] = list(map(
            code_mean(data[:test_index], 'weekday', "y").get, data.weekday))
        data["hour_average"] = list(map(
            code_mean(data[:test_index], 'hour', "y").get, data.hour))

        # drop encoded variables
        data.drop(["hour", "weekday"], axis=1, inplace=True)

    # train-test split
    y = data.dropna().y
    X = data.dropna().drop(['y'], axis=1)
    X_train, X_test, y_train, y_test = \
        timeseries_train_test_split(X, y, test_size=test_size)

    return X_train, X_test, y_train, y_test


def plt_linear():
    data = pd.read_csv('raw_data.csv',
                       usecols=['timestamp', 'count'])
    data['timestamp'] = pd.to_datetime(data['timestamp'])
    data.set_index("timestamp", drop=True, inplace=True)
    data.rename(columns={'count': 'y'}, inplace=True)

    X_train, X_test, y_train, y_test = \
        prepareData(data, lag_start=6, lag_end=25, test_size=0.3, target_encoding=True)

    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)

    # lr = LinearRegression()
    lr = LassoCV(cv=tscv)
    # lr = RidgeCV(cv=tscv)
    # lr.fit(X_train_scaled, y_train)
    """
    from xgboost import XGBRegressor
    lr = XGBRegressor()
    # lr = xgb.XGBRegressor(max_depth=5, learning_rate=0.1, n_estimators=160, silent=False, objective='reg:gamma')
    """

    lr.fit(X_train_scaled, y_train)

    """
    IMPORTANT
    Generally tree-based models poorly handle trends in data, compared to linear models,
    so you have to detrend your series first or use some tricks to make the magic happen.
    Ideally make the series stationary and then use XGBoost, for example, you can forecast
    trend separately with a linear model and then add predictions from xgboost to get final forecast.
    """

    plotModelResults(lr, X_train=X_train_scaled, X_test=X_test_scaled,
                     y_train=y_train, y_test=y_test,
                     plot_intervals=True, plot_anomalies=True)
    plotCoefficients(lr, X_train=X_train)


plt_linear()

The resulting plot: it works quite well for anomaly detection!

You can see that the hour_average and weekday_average features have a fairly large influence.

Open Machine Learning Course. Topic 9. Part 1. Time series analysis in Python

Hi there! We continue our open machine learning course with a new article on time series.

Let's take a look at how to work with time series in Python: what methods and models we can use for prediction, what double and triple exponential smoothing are, what to do if stationarity is not your favorite game, how to build SARIMA and stay alive, and how to make predictions using xgboost. All of this will be applied to a (harsh) real-world example.

Article outline

  1. Introduction
     — Basic definitions
     — Quality metrics
  2. Move, smoothe, evaluate
     — Rolling window estimations
     — Exponential smoothing, Holt-Winters model
     — Time-series cross validation, parameters selection
  3. Econometric approach
     — Stationarity, unit root
     — Getting rid of non-stationarity
     — SARIMA intuition and model building
  4. Linear (and not quite) models on time series
     — Feature extraction
     — Linear models, feature importance
     — Regularization, feature selection
     — XGBoost
  5. Assignment #9

The following content is better viewed and reproduced as a Jupyter-notebook

In
my day to day job I encounter time series-connected tasks almost every
day. The most frequent question is — what will happen with our metrics
in the next day/week/month/etc. — how many players will install the app,
how much time will they spend online, how many actions users will do,
and so forth. We can approach prediction task using different methods,
depending on the required quality of the prediction, length of the
forecasted period, and, of course, time we have to choose features and
tune parameters to achieve desired results.

Introduction

A short definition of a time series:
A time series is a series of data points indexed (or listed or graphed) in time order.

The data is thus organized around relatively deterministic timestamps and, compared with random samples, may contain additional information that we will try to extract.

Let's import some libraries. First and foremost we will need the statsmodels library, which has tons of statistical modeling functions, including time series. For R aficionados (who had to move to Python) statsmodels will definitely look familiar as it supports model definitions like 'Wage ~ Age + Education'.

As an example let’s use some real mobile game data on hourly ads watched by players and daily in-game currency spent:

Forecast quality metrics

Before
actually forecasting, let’s understand how to measure the quality of
predictions and have a look at the most common and widely used metrics

  • R squared, coefficient of determination (in econometrics it can be interpreted as a percentage of variance explained by the model), (-inf, 1] sklearn.metrics.r2_score
  • Mean Absolute Error, it is an interpretable metric because it has the same unit of measurement as the initial series, [0, +inf)
    sklearn.metrics.mean_absolute_error
  • Median Absolute Error, again an interpretable metric, particularly interesting because it is robust to outliers, [0, +inf)
    sklearn.metrics.median_absolute_error
  • Mean Squared Error, most commonly used, gives a higher penalty to big mistakes and vice versa, [0, +inf)
    sklearn.metrics.mean_squared_error
  • Mean Squared Logarithmic Error,
    practically the same as MSE, but we initially take the logarithm of the
    series; as a result we give attention to small mistakes as well. It is
    usually used when the data has exponential trends, [0, +inf)
    sklearn.metrics.mean_squared_log_error
  • Mean Absolute Percentage Error, same as MAE but percentage, — very convenient when you want to explain the quality of the model to your management, [0, +inf), not implemented in sklearn
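As a quick reference, here is a minimal sketch of computing all of the metrics above for two arrays y_true and y_pred; MAPE is written by hand since it is not implemented in sklearn, and the sample numbers are made up purely for illustration.

import numpy as np
from sklearn.metrics import (r2_score, mean_absolute_error, median_absolute_error,
                             mean_squared_error, mean_squared_log_error)

def mean_absolute_percentage_error(y_true, y_pred):
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

print(r2_score(y_true, y_pred))
print(mean_absolute_error(y_true, y_pred))
print(median_absolute_error(y_true, y_pred))
print(mean_squared_error(y_true, y_pred))
print(mean_squared_log_error(y_true, y_pred))          # requires non-negative values
print(mean_absolute_percentage_error(y_true, y_pred))  # in percent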

Excellent, now we know how to measure the quality of the forecasts, what metrics we can use, and how to translate the results to the boss. One little thing is left — building the model.

Move, smoothe, evaluate

Let's start with a naive hypothesis — "tomorrow will be the same as today", but instead of a model like ŷ(t)=y(t−1) (which is actually a great baseline for any time series prediction problem and sometimes impossible to beat with any model) we'll assume that the future value of the variable depends on the average of its n previous values, and therefore we'll use the moving average.

Out: 116805.0
from __future__ import division, print_function, absolute_import

import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.metrics import (r2_score, median_absolute_error, mean_absolute_error,
                             mean_squared_error, mean_squared_log_error)


def mean_absolute_percentage_error(y_true, y_pred):
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100


def moving_average(series, n):
    """
    Calculate average of last n observations
    """
    return np.average(series[-n:])


def plotMovingAverage(window, plot_intervals=True, scale=1.96, plot_anomalies=True):
    """
    window - rolling window size
    plot_intervals - show confidence intervals
    plot_anomalies - show anomalies
    """
    twitter_example_data = pd.read_csv('raw_data.csv',
                                       usecols=['count'])
    # usecols=['timestamp', 'count'])
    series = twitter_example_data
    rolling_mean = series.rolling(window=window, min_periods=1).mean()

    plt.figure(figsize=(20, 4))
    # plt.figure(figsize=(15, 5))
    plt.title("Moving average\n window size = {}".format(window))
    plt.plot(rolling_mean[window:], "g", label="Rolling mean trend")

    # Plot confidence intervals for smoothed values
    if plot_intervals:
        mae = mean_absolute_error(series[window:], rolling_mean[window:])
        deviation = np.std(series[window:] - rolling_mean[window:])
        lower_bond = rolling_mean - (mae + scale * deviation)
        upper_bond = rolling_mean + (mae + scale * deviation)
        plt.plot(upper_bond, "b--", label="Upper Bond / Lower Bond")
        plt.plot(lower_bond, "b--")

        # Having the intervals, find abnormal values
        if plot_anomalies:
            anomalies = pd.DataFrame(index=series.index, columns=series.columns)
            anomalies[series < lower_bond] = series[series < lower_bond]
            anomalies[series > upper_bond] = series[series > upper_bond]
            plt.plot(anomalies, "ro", markersize=10)

    plt.plot(series[window:], 'k', label="Actual values")
    plt.legend(loc="upper left")
    plt.grid(True)
    # plt.show()
    plt.savefig('MA_rolling.png')


plotMovingAverage(4, True, 1.96, True)

The plot produced with the Twitter data:

You can see that the upper and lower bounds are fairly flat, so quite a few anomalies get detected. Overall, the MA approach with a large window is well suited to trend estimation.

Unfortunately we can't make this prediction long-term — to get one for the next step we need the previous value to be actually observed. But the moving average has another use case — smoothing the original time series to reveal trends. Pandas has an implementation available: DataFrame.rolling(window).mean(). The wider the window, the smoother the trend. In the case of very noisy data, which is often encountered in finance, this procedure can help to detect common patterns.

Smoothing by last 4 hours
plotMovingAverage(ads, 4)

Smoothing by last 12 hours
plotMovingAverage(ads, 12)

Smoothing by 24 hours — we get daily trend
plotMovingAverage(ads, 24)

As
you can see, applying daily smoothing on hour data allowed us to
clearly see the dynamics of ads watched. During the weekends the values
are higher (weekends — time to play) and weekdays are generally lower.

We can also plot confidence intervals for our smoothed values
plotMovingAverage(ads, 4, plot_intervals=True)

And
now let’s create a simple anomaly detection system with the help of the
moving average. Unfortunately, in this particular series everything is
more or less normal, so we’ll intentionally make one of the values
abnormal in the dataframe ads_anomaly
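The cell that injects the anomaly is not included in this copy; a minimal sketch of what it might look like (the position -20 and the 0.2 factor are illustrative assumptions, and ads is the article's hourly dataframe):

ads_anomaly = ads.copy()
# say we have an 80% drop of ads watched in one particular hour
ads_anomaly.iloc[-20] = ads_anomaly.iloc[-20] * 0.2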

Let's see if this simple method can catch the anomaly
plotMovingAverage(ads_anomaly, 4, plot_intervals=True, plot_anomalies=True)

Neat! What about the second series (with weekly smoothing)?
plotMovingAverage(currency, 7, plot_intervals=True, plot_anomalies=True)

Oh no! Here is the downside of our simple approach — it did not catch the monthly seasonality in our data and marked almost all 30-day peaks as anomalies. If you don't want that many false alarms, it's best to consider more complex models.

This is the limitation of the MA approach: it cannot capture periodic (seasonal) spikes in the data and simply treats them as anomalies.

Weighted average is a simple modification of the moving average, in which observations have different weights that sum up to one; usually more recent observations are given greater weight.
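The weighting function itself is collapsed in the original post; a minimal sketch, assuming a one-column pandas Series such as ads.Ads and a list of weights ordered from most recent to oldest (the weights [0.6, 0.3, 0.1] below are just an illustrative choice):

def weighted_average(series, weights):
    """Weighted average of the last len(weights) observations, most recent first."""
    return float(sum(series.iloc[-n - 1] * weights[n] for n in range(len(weights))))

# e.g. weighted_average(ads.Ads, [0.6, 0.3, 0.1])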

Out: 98423.0

Exponential smoothing

And now let's take a look at what happens if, instead of weighting the last n values of the time series, we start weighting all available observations while exponentially decreasing the weights as we move further back in historical data. There's a formula for simple exponential smoothing that will help us with that:
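The formula itself does not survive in this copy; a reconstruction consistent with the exponential_smoothing code below, in the plain notation used elsewhere in this article:

ŷ(t) = α·y(t) + (1 − α)·ŷ(t−1)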

Here the model value is a weighted average between the current true value and the previous model values. The α weight is called a smoothing factor. It defines how quickly we will "forget" the last available true observation. The smaller α is, the more influence the previous model values have, and the smoother the series is.

Exponentiality is hiding in the recursiveness of the function — we multiply (1−α) each time by the previous model value which, in its turn, also contains (1−α), and so forth until the very beginning.

 
 
from __future__ import division, print_function, absolute_import

import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.metrics import (r2_score, median_absolute_error, mean_absolute_error,
                             mean_squared_error, mean_squared_log_error)


def mean_absolute_percentage_error(y_true, y_pred):
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100


def exponential_smoothing(series, alpha):
    """
    series - dataset with timestamps
    alpha - float [0.0, 1.0], smoothing parameter
    """
    result = [series[0]]  # first value is same as series
    for n in range(1, len(series)):
        result.append(alpha * series[n] + (1 - alpha) * result[n - 1])
    return result


def plotExponentialSmoothing(series, alphas):
    """
    Plots exponential smoothing with different alphas

    series - dataset with timestamps
    alphas - list of floats, smoothing parameters
    """
    with plt.style.context('seaborn-white'):
        plt.figure(figsize=(15, 7))
        for alpha in alphas:
            plt.plot(exponential_smoothing(series, alpha), label="Alpha {}".format(alpha))
        plt.plot(series, "c", label="Actual")
        plt.legend(loc="best")
        plt.axis('tight')
        plt.title("Exponential Smoothing")
        plt.grid(True)
        plt.savefig('exp_smooth.png')


def plt_exp_smooth():
    twitter_example_data = pd.read_csv('raw_data.csv',
                                       usecols=['count'])
    # usecols=['timestamp', 'count'])
    series = list(twitter_example_data['count'])
    plotExponentialSmoothing(series, [0.3, 0.05])


plt_exp_smooth()
 
The method is fairly simple as well; the choice of the smoothing parameter is something you have to experiment with yourself:

Double exponential smoothing

Until now all we could get from our methods in the best case was just a single future point prediction (and also some nice smoothing), that’s cool but not enough, so let’s extend exponential smoothing so that we can predict two future points (of course, we also get some smoothing).

Series decomposition should help us — we obtain two components: intercept (also called level) ℓ and trend (also called slope) b. We've learnt to predict the intercept (or expected series value) using the previous methods, and now we will apply the same exponential smoothing to the trend, assuming naively (or perhaps not) that the future direction of the time series depends on the previous weighted changes.

As a result we get a set of functions. The first one describes intercept, as before it depends on the current value of the series, and the second term is now split into previous values of the level and of the trend. The second function describes trend — it depends on the level changes at the current step and on the previous value of the trend. In this case β coefficient is a weight in the exponential smoothing. The final prediction is the sum of the model values of the intercept and trend.
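The equations themselves do not survive in this copy; a reconstruction consistent with the double_exponential_smoothing code below (ℓ is the level, b the trend):

ℓ(t) = α·y(t) + (1 − α)·(ℓ(t−1) + b(t−1))
b(t) = β·(ℓ(t) − ℓ(t−1)) + (1 − β)·b(t−1)
ŷ(t+1) = ℓ(t) + b(t)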

Now we have to tune two parameters — α and β. The former is responsible for the series smoothing around trend, and the latter for the smoothing of the trend itself. The bigger the values, the more weight the latest observations will have and the less smoothed the model series will be. Combinations of the parameters may produce really weird results, especially if set manually. We’ll look into choosing parameters automatically in a bit, immediately after triple exponential smoothing.

Essentially a weighted MA-style method:

import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.metrics import (r2_score, median_absolute_error, mean_absolute_error,
                             mean_squared_error, mean_squared_log_error)


def mean_absolute_percentage_error(y_true, y_pred):
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100


def exponential_smoothing(series, alpha):
    """
    series - dataset with timestamps
    alpha - float [0.0, 1.0], smoothing parameter
    """
    result = [series[0]]  # first value is same as series
    for n in range(1, len(series)):
        result.append(alpha * series[n] + (1 - alpha) * result[n - 1])
    return result


def double_exponential_smoothing(series, alpha, beta):
    """
    series - dataset with timeseries
    alpha - float [0.0, 1.0], smoothing parameter for level
    beta - float [0.0, 1.0], smoothing parameter for trend
    """
    # first value is same as series
    result = [series[0]]
    for n in range(1, len(series) + 1):
        if n == 1:
            level, trend = series[0], series[1] - series[0]
        if n >= len(series):  # forecasting
            value = result[-1]
        else:
            value = series[n]
        last_level, level = level, alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        result.append(level + trend)
    return result


def plotDoubleExponentialSmoothing(series, alphas, betas):
    """
    Plots double exponential smoothing with different alphas and betas

    series - dataset with timestamps
    alphas - list of floats, smoothing parameters for level
    betas - list of floats, smoothing parameters for trend
    """
    with plt.style.context('seaborn-white'):
        plt.figure(figsize=(20, 8))
        for alpha in alphas:
            for beta in betas:
                plt.plot(double_exponential_smoothing(series, alpha, beta),
                         label="Alpha {}, beta {}".format(alpha, beta))
        plt.plot(series, label="Actual")
        plt.legend(loc="best")
        plt.axis('tight')
        plt.title("Double Exponential Smoothing")
        plt.grid(True)
        plt.savefig("double_smooth.png")


def plt_double_smooth():
    twitter_example_data = pd.read_csv('raw_data.csv',
                                       usecols=['count'])
    series = list(twitter_example_data['count'])
    plotDoubleExponentialSmoothing(series, alphas=[0.9, 0.02], betas=[0.9, 0.02])


plt_double_smooth()

The smoothing parameters need to be tuned by hand.

Triple exponential smoothing a.k.a. Holt-Winters

Hooray! We’ve successfully reached our next variant of exponential smoothing, this time triple.

In other words, once seasonality is pronounced, the previous methods can no longer cope. Holt-Winters can forecast trend plus seasonality, which is why I think it beats ARIMA here: it doesn't need ARIMA's elaborate data preprocessing, and its parameters can be selected directly with cross-validation.

The idea of this method is that we add another, third component — seasonality. This means we shouldn't use the method if our time series does not have seasonality, which is not the case in our example. The seasonal component in the model will explain repeated variations around the intercept and trend, and it will be described by the length of the season, in other words by the period after which the variations repeat. For each observation in the season there's a separate component; for example, if the length of the season is 7 (weekly seasonality), we will have 7 seasonal components, one for each day of the week.

Now we get a new system:
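The system does not survive in this copy; a reconstruction consistent with the triple_exponential_smoothing code further below (L is the season length, m the number of steps ahead):

ℓ(t) = α·(y(t) − s(t−L)) + (1 − α)·(ℓ(t−1) + b(t−1))
b(t) = β·(ℓ(t) − ℓ(t−1)) + (1 − β)·b(t−1)
s(t) = γ·(y(t) − ℓ(t)) + (1 − γ)·s(t−L)
ŷ(t+m) = ℓ(t) + m·b(t) + s(t − L + (m mod L))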

Intercept now depends on the current value of the series minus the corresponding seasonal component, trend stays unchanged, and the seasonal component depends on the current value of the series minus the intercept and on the previous value of the component. Please take into account that the component is smoothed through all the available seasons, for example, if we have a Monday component then it will only be averaged with other Mondays. You can read more on how averaging works and how the initial approximation of the trend and seasonal components is done here. Now that we have the seasonal component we can predict not one and not even two but an arbitrary number m of future steps, which is very encouraging.

Below is the code for a triple exponential smoothing model, also known by the last names of its creators — Charles Holt and his student Peter Winters. Additionally, the Brutlag method was included in the model to build confidence intervals:
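The formulas do not survive in this copy; roughly, Brutlag's confidence band is the following (note that the implementation below simplifies the deviation update to use the previous step, d(t−1), instead of d(t−T)):

upper(t) = ŷ(t) + scaling_factor · d(t)
lower(t) = ŷ(t) − scaling_factor · d(t)
d(t) = γ·|y(t) − ŷ(t)| + (1 − γ)·d(t−T)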

where T is the length of the season, d is the predicted deviation, and the other parameters were taken from the triple exponential smoothing. You can read more about the method and its applicability to anomalies detection in time series here.

Time series cross validation

Before we start building the model, let's first talk about how to estimate model parameters automatically.

There’s nothing unusual here, as always we have to choose a loss function suitable for the task, that will tell us how close the model approximates data. Then using cross-validation we will evaluate our chosen loss function for given model parameters, calculate gradient, adjust model parameters and so forth, bravely descending to the global minimum of error.

The question is how to do cross-validation on time series, because, you know, time series do have a time structure and one just can't randomly mix values in a fold without preserving this structure, otherwise all time dependencies between observations will be lost. That's why we will have to use a slightly trickier approach to optimizing the model parameters. I don't know if there's an official name for it, but on CrossValidated, where one can find all the answers but the Answer to the Ultimate Question of Life, the Universe, and Everything, "cross-validation on a rolling basis" was proposed as a name.

The idea is rather simple — we train our model on a small segment of the time series, from the beginning until some t, make predictions for the next t+n steps and calculate an error. Then we expand our training sample up to the t+n value, make predictions from t+n until t+2∗n, and continue moving the test segment of the time series until we hit the last available observation. As a result we have as many folds as n-step intervals fit between the initial training sample and the last observation.

Now, knowing how to set cross-validation, we will find optimal parameters for the Holt-Winters model, recall that we have daily seasonality in ads, hence the slen=24 parameter

In the Holt-Winters model, as well as in the other models of exponential smoothing, there’s a constraint on how big smoothing parameters could be, each of them is in the range from 0 to 1, therefore to minimize loss function we have to choose an algorithm that supports constraints on model parameters, in our case — Truncated Newton conjugate gradient.
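The optimization code is collapsed in the original post; below is a hedged sketch of a rolling-basis CV score and the TNC minimization, assuming the HoltWinters class shown in the demo further down and MSLE as the loss function.

import numpy as np
from scipy.optimize import minimize
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_squared_log_error

def timeseriesCVscore(params, series, loss_function=mean_squared_log_error, slen=24):
    """Return the mean cross-validation loss for given (alpha, beta, gamma)."""
    errors = []
    values = series.values
    alpha, beta, gamma = params
    tscv = TimeSeriesSplit(n_splits=3)
    for train, test in tscv.split(values):
        model = HoltWinters(series=values[train], slen=slen,
                            alpha=alpha, beta=beta, gamma=gamma,
                            n_preds=len(test))
        model.triple_exponential_smoothing()
        predictions = model.result[-len(test):]
        actual = values[test]
        errors.append(loss_function(actual, predictions))
    return np.mean(np.array(errors))

# TNC supports the 0..1 box constraints on alpha, beta and gamma:
# opt = minimize(timeseriesCVscore, x0=[0, 0, 0],
#                args=(data, mean_squared_log_error),
#                method="TNC", bounds=((0, 1), (0, 1), (0, 1)))
# alpha_final, beta_final, gamma_final = opt.x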

0.11652680227350454 0.002677697431105852 0.05820973606789237
CPU times: user 1.96 s, sys: 17.3 ms, total: 1.98 s
Wall time: 2 s

Chart rendering code

Judging by the chart, our model was able to successfully approximate the initial time series, catching daily seasonality, overall downwards trend and even some anomalies. If you take a look at the modeled deviation, you can clearly see that the model reacts quite sharply to the changes in the structure of the series but then quickly returns deviation to the normal values, “forgetting” the past. This feature of the model allows us to quickly build anomaly detection systems even for quite noisy series without spending too much time and money on preparing data and training the model.

Deviation increases when we have anomalies

We’ll apply the same algorithm for the second series which, as we know, has trend and 30-day seasonality

0.012841445048055122 0.04883371471892228 0.00943678056045777
CPU times: user 3.03 s, sys: 24.8 ms, total: 3.05 s
Wall time: 3.11 s

Looks quite adequate: the model has caught both the upward trend and the seasonal spikes, and overall fits our values nicely

Caught some anomalies as well

Deviation increases as the forecast period moves further out

Demo code for Holt-Winters:

import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.metrics import (r2_score, median_absolute_error, mean_absolute_error,
                             mean_squared_error, mean_squared_log_error)


def mean_absolute_percentage_error(y_true, y_pred):
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100


class HoltWinters:
    """
    Holt-Winters model with anomaly detection using the Brutlag method

    # series - initial time series
    # slen - length of a season
    # alpha, beta, gamma - Holt-Winters model coefficients
    # n_preds - predictions horizon
    # scaling_factor - sets the width of the confidence interval by Brutlag (usually takes values from 2 to 3)
    """

    def __init__(self, series, slen, alpha, beta, gamma, n_preds, scaling_factor=1.96):
        self.series = series
        self.slen = slen
        self.alpha = alpha
        self.beta = beta
        self.gamma = gamma
        self.n_preds = n_preds
        self.scaling_factor = scaling_factor

    def initial_trend(self):
        sum = 0.0
        for i in range(self.slen):
            sum += float(self.series[i + self.slen] - self.series[i]) / self.slen
        return sum / self.slen

    def initial_seasonal_components(self):
        seasonals = {}
        season_averages = []
        n_seasons = int(len(self.series) / self.slen)
        # let's calculate season averages
        for j in range(n_seasons):
            season_averages.append(sum(self.series[self.slen * j:self.slen * j + self.slen]) / float(self.slen))
        # let's calculate initial values
        for i in range(self.slen):
            sum_of_vals_over_avg = 0.0
            for j in range(n_seasons):
                sum_of_vals_over_avg += self.series[self.slen * j + i] - season_averages[j]
            seasonals[i] = sum_of_vals_over_avg / n_seasons
        return seasonals

    def triple_exponential_smoothing(self):
        self.result = []
        self.Smooth = []
        self.Season = []
        self.Trend = []
        self.PredictedDeviation = []
        self.UpperBond = []
        self.LowerBond = []

        seasonals = self.initial_seasonal_components()

        for i in range(len(self.series) + self.n_preds):
            if i == 0:  # components initialization
                smooth = self.series[0]
                trend = self.initial_trend()
                self.result.append(self.series[0])
                self.Smooth.append(smooth)
                self.Trend.append(trend)
                self.Season.append(seasonals[i % self.slen])
                self.PredictedDeviation.append(0)
                self.UpperBond.append(self.result[0] +
                                      self.scaling_factor *
                                      self.PredictedDeviation[0])
                self.LowerBond.append(self.result[0] -
                                      self.scaling_factor *
                                      self.PredictedDeviation[0])
                continue

            if i >= len(self.series):  # predicting
                m = i - len(self.series) + 1
                self.result.append((smooth + m * trend) + seasonals[i % self.slen])
                # when predicting we increase uncertainty on each step
                self.PredictedDeviation.append(self.PredictedDeviation[-1] * 1.01)
            else:
                val = self.series[i]
                last_smooth, smooth = smooth, self.alpha * (val - seasonals[i % self.slen]) + \
                    (1 - self.alpha) * (smooth + trend)
                trend = self.beta * (smooth - last_smooth) + (1 - self.beta) * trend
                seasonals[i % self.slen] = self.gamma * (val - smooth) + (1 - self.gamma) * seasonals[i % self.slen]
                self.result.append(smooth + trend + seasonals[i % self.slen])
                # Deviation is calculated according to Brutlag algorithm.
                self.PredictedDeviation.append(self.gamma * np.abs(self.series[i] - self.result[i])
                                               + (1 - self.gamma) * self.PredictedDeviation[-1])

            self.UpperBond.append(self.result[-1] +
                                  self.scaling_factor *
                                  self.PredictedDeviation[-1])
            self.LowerBond.append(self.result[-1] -
                                  self.scaling_factor *
                                  self.PredictedDeviation[-1])
            self.Smooth.append(smooth)
            self.Trend.append(trend)
            self.Season.append(seasonals[i % self.slen])


def plt_hot_winter():
    twitter_example_data = pd.read_csv('raw_data.csv',
                                       usecols=['count'])
    data = twitter_example_data['count']
    # data = ads.Ads[:-20]  # leave some data for testing

    # initializing model parameters alpha, beta and gamma
    # x = [0, 0, 0]
    #
    # # Minimizing the loss function
    # opt = minimize(timeseriesCVscore, x0=x,
    #                args=(data, mean_squared_log_error),
    #                method="TNC", bounds=((0, 1), (0, 1), (0, 1))
    #                )
    #
    # # Take optimal values...
    # alpha_final, beta_final, gamma_final = opt.x
    # print(alpha_final, beta_final, gamma_final)

    # !!!!!!!!!!!!!!! use gridCV to choose best ARG !!!!!!!!!!!!!!!!
    alpha_final, beta_final, gamma_final = 0.11652680227350454, 0.002677697431105852, 0.05820973606789237

    # ...and train the model with them, forecasting for the next 50 steps
    model = HoltWinters(data, slen=1440,
                        alpha=alpha_final,
                        beta=beta_final,
                        gamma=gamma_final,
                        n_preds=50, scaling_factor=6)
    model.triple_exponential_smoothing()

    def plotHoltWinters(series, plot_intervals=False, plot_anomalies=False):
        """
        series - dataset with timeseries
        plot_intervals - show confidence intervals
        plot_anomalies - show anomalies
        """
        plt.figure(figsize=(20, 10))
        plt.plot(model.result, label="Model")
        plt.plot(series.values, label="Actual")
        error = mean_absolute_percentage_error(series.values, model.result[:len(series)])
        plt.title("Mean Absolute Percentage Error: {0:.2f}%".format(error))

        if plot_anomalies:
            anomalies = np.array([np.NaN] * len(series))
            anomalies[series.values < model.LowerBond[:len(series)]] = \
                series.values[series.values < model.LowerBond[:len(series)]]
            anomalies[series.values > model.UpperBond[:len(series)]] = \
                series.values[series.values > model.UpperBond[:len(series)]]
            plt.plot(anomalies, "o", markersize=10, label="Anomalies")

        if plot_intervals:
            plt.plot(model.UpperBond, "r--", alpha=0.5, label="Up/Low confidence")
            plt.plot(model.LowerBond, "r--", alpha=0.5)
            plt.fill_between(x=range(0, len(model.result)), y1=model.UpperBond,
                             y2=model.LowerBond, alpha=0.2, color="grey")

        plt.vlines(len(series), ymin=min(model.LowerBond), ymax=max(model.UpperBond), linestyles='dashed')
        plt.axvspan(len(series) - 20, len(model.result), alpha=0.3, color='lightgrey')
        plt.grid(True)
        plt.axis('tight')
        plt.legend(loc="best", fontsize=13)
        plt.savefig("hot_winter.png")

    plotHoltWinters(data, plot_intervals=True, plot_anomalies=True)


plt_hot_winter()

Econometric approach

Stationarity

Before we start modeling we should mention such an important property of time series as stationarity.

If the process is stationary that means it doesn’t change its statistical properties over time, namely mean and variance do not change over time (constancy of variance is also called homoscedasticity), also covariance function does not depend on the time (should only depend on the distance between observations). You can see this visually on the pictures from the post of Sean Abu:

  • The red graph below is not stationary because the mean increases over time.
  • We were unlucky with the variance, see the varying spread of values over time
  • Finally, the covariance of the i th term and the (i + m) th term should not be a function of time. In the following graph, you will notice the spread becomes closer as the time increases. Hence, the covariance is not constant with time for the right chart.

So why is stationarity so important? Because it's easy to make predictions on a stationary series as we assume that the future statistical properties will not be different from the currently observed ones. Most time-series models in one way or another model and predict those properties (mean or variance, for example); that's why predictions would be wrong if the original series were not stationary. Unfortunately most of the time series we see outside of textbooks are non-stationary, but we can (and should) change this.

So, to fight non-stationarity we have to know our enemy so to say. Let’s see how to detect it. To do that we will now take a look at the white noise and random walks and we will learn how to get from one to another for free, without registration and SMS.

White noise chart:

So the process generated by the standard normal distribution is stationary and oscillates around 0 with a deviation of 1. Now based on this process we will generate a new one where each next value will depend on the previous one: x(t)=ρ*x(t−1)+e(t)

Chart rendering code
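The chart rendering code itself isn't included in this copy; a minimal sketch of generating and plotting such processes (the ρ values and random seed are illustrative):

import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

np.random.seed(42)

def ar1_process(rho, n=1000):
    """Generate x(t) = rho * x(t-1) + e(t) with standard normal noise e(t)."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + np.random.normal()
    return x

plt.figure(figsize=(12, 6))
for i, rho in enumerate([0, 0.6, 0.9, 1]):   # rho = 0 is white noise, rho = 1 a random walk
    plt.subplot(2, 2, i + 1)
    plt.plot(ar1_process(rho))
    plt.title("rho = {}".format(rho))
plt.tight_layout()
plt.savefig("rho_processes.png")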

On the first chart you can see the same stationary white noise you've seen before. On the second one the value of ρ increased to 0.6; as a result wider cycles appeared on the chart but overall it is still stationary. The third chart deviates even more from the 0 mean but still oscillates around it. Finally, the value of ρ equal to 1 gives us a random walk process — a non-stationary time series.

This happens because after reaching the critical value the series x(t)=ρ*x(t−1)+e(t) does not return to its mean value. If we subtract x(t−1) from the left and the right side we will get x(t)−x(t−1)=(ρ−1)*x(t−1)+e(t), where the expression on the left is called the first difference. If ρ=1 then the first difference gives us stationary white noise e(t). This fact is the main idea of the Dickey-Fuller test for the stationarity of time series (presence of a unit root). If we can get stationary series from non-stationary using the first difference we call those series integrated of order 1. Null hypothesis of the test — time series is non-stationary, was rejected on the first three charts and was accepted on the last one. We’ve got to say that the first difference is not always enough to get stationary series as the process might be integrated of order d, d > 1 (and have multiple unit roots), in such cases the augmented Dickey-Fuller test is used that checks multiple lags at once.

We can fight non-stationarity using different approaches — various order differences, trend and seasonality removal, smoothing, also using transformations like Box-Cox or logarithmic.

Getting rid of non-stationarity and building SARIMA

Now let’s build an ARIMA model by walking through all the circles of hell stages of making series stationary.

Chart rendering code
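The tsplot helper called throughout this section (e.g. tsplot(ads_diff[24+1:], lags=60)) is never defined in this copy; a hedged reconstruction using statsmodels:

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.stattools import adfuller

def tsplot(y, lags=None, figsize=(12, 7)):
    """Plot the series with its ACF/PACF and report the Dickey-Fuller p-value."""
    plt.figure(figsize=figsize)
    layout = (2, 2)
    ts_ax = plt.subplot2grid(layout, (0, 0), colspan=2)
    acf_ax = plt.subplot2grid(layout, (1, 0))
    pacf_ax = plt.subplot2grid(layout, (1, 1))

    ts_ax.plot(y)
    p_value = adfuller(y)[1]
    ts_ax.set_title("Time Series Analysis Plots\n Dickey-Fuller: p={0:.5f}".format(p_value))
    plot_acf(y, lags=lags, ax=acf_ax)
    plot_pacf(y, lags=lags, ax=pacf_ax)
    plt.tight_layout()
    plt.savefig("tsplot.png")

# Seasonal difference (period 24) followed by a first difference, as described below:
# ads_diff = ads.Ads - ads.Ads.shift(24)
# ads_diff = ads_diff - ads_diff.shift(1)
# tsplot(ads_diff[24+1:], lags=60)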

Surprisingly, initial series are stationary, Dickey-Fuller test rejected null hypothesis that a unit root is present. Actually, it can be seen on the plot itself — we don’t have a visible trend, so mean is constant, variance is pretty much stable throughout the series. The only thing left is seasonality which we have to deal with before modelling. To do so let’s take “seasonal difference” which means a simple subtraction of series from itself with a lag that equals the seasonal period.

That's better, the visible seasonality is gone; however the autocorrelation function still has too many significant lags. To remove them we'll take first differences — subtraction of the series from itself with lag 1

Perfect! Our series now look like something undescribable, oscillating around zero, Dickey-Fuller indicates that it’s stationary and the number of significant peaks in ACF has dropped. We can finally start modelling!

ARIMA-family Crash-Course

A few words about the model. Letter by letter we'll build the full name — SARIMA(p,d,q)(P,D,Q,s), the Seasonal AutoRegressive Integrated Moving Average model:

  • AR(p) — autoregression model, i.e., regression of the time series onto itself. Basic assumption — current series values depend on its previous values with some lag (or several lags). The maximum lag in the model is referred to as p. To determine the initial p you need to have a look at PACF plot — find the biggest significant lag, after which most other lags are becoming not significant.
  • MA(q) — moving average model. Without going into detail it models the error of the time series, again the assumption is — current error depends on the previous with some lag, which is referred to as q. Initial value can be found on ACF plot with the same logic.

Let’s have a small break and combine the first 4 letters:

AR(p) + MA(q) = ARMA(p,q)

What we have here is the Autoregressive–moving-average model! If the series is stationary, it can be approximated with those 4 letters. Shall we continue?

  • I(d)— order of integration. It is simply the number of nonseasonal differences needed for making the series stationary. In our case it’s just 1, because we used first differences.

Adding this letter to four previous gives us ARIMA model which knows how to handle non-stationary data with the help of nonseasonal differences. Awesome, last letter left!

  • S(s) — this letter is responsible for seasonality and equals the season period length of the series

After attaching the last letter we find out that instead of one additional parameter we get three in a row — (P,D,Q)

  • P — order of autoregression for seasonal component of the model, again can be derived from PACF, but this time you need to look at the number of significant lags, which are the multiples of the season period length, for example, if the period equals 24 and looking at PACF we see 24-th and 48-th lags are significant, that means initial P should be 2.
  • Q — same logic, but for the moving average model of the seasonal component, use ACF plot
  • D — order of seasonal integration. Can be equal to 1 or 0, depending on whether seasonal differences were applied or not

Now, knowing how to set initial parameters, let’s have a look at the final plot once again and set the parameters:
tsplot(ads_diff[24+1:], lags=60)

  • p is most probably 4, since it’s the last significant lag on PACF after which most others are becoming not significant.
  • d just equals 1, because we had first differences
  • q should be somewhere around 4 as well as seen on ACF
  • P might be 2, since 24-th and 48-th lags are somewhat significant on PACF
  • D again equals 1 — we performed seasonal differentiation
  • Q is probably 1, 24-th lag on ACF is significant, while 48-th is not

Now we want to test various models and see which one is better
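The grid-search cell is collapsed in the original post; a hedged sketch of the brute-force AIC selection over (p, q, P, Q), assuming the article's hourly series ads.Ads and the d=D=1, s=24 choices made above (the parameter ranges are my guess around the initial estimates; they happen to give the 54 combinations reported below):

from itertools import product
import statsmodels.api as sm

ps, qs = range(2, 5), range(2, 5)
Ps, Qs = range(0, 2), range(0, 3)
d, D, s = 1, 1, 24

parameters_list = list(product(ps, qs, Ps, Qs))
print(len(parameters_list))   # number of combinations to try

results = []
best_aic = float("inf")
for p, q, P, Q in parameters_list:
    try:
        model = sm.tsa.statespace.SARIMAX(ads.Ads, order=(p, d, q),
                                          seasonal_order=(P, D, Q, s)).fit(disp=-1)
    except ValueError:
        continue   # some combinations fail to converge or are ill-defined
    results.append([(p, q, P, Q), model.aic])
    if model.aic < best_aic:
        best_model, best_aic = model, model.aic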

Out: 54
CPU times: user 41.3 s, sys: 2.76 s, total: 44.1 s
Wall time: 37.4 s

Let’s inspect the residuals of the model
tsplot(best_model.resid[24+1:], lags=60)

Well,
it’s clear that the residuals are stationary, there are no apparent
autocorrelations, let’s make predictions using our model

In
the end we got quite adequate predictions, our model on average was
wrong by 4.01%, which is very very good, but overall costs of preparing
data, making series stationary and brute-force parameters selecting
might not be worth this accuracy.

Linear (and not quite) models on time series

Small lyrical digression again. Often in my job I have to build models with the only principle guiding me known as fast, good, cheap.
That means some of the models will never be “production ready” as they
demand too much time for the data preparation (for example, SARIMA), or
require frequent re-training on new data (again, SARIMA), or are
difficult to tune (good example — SARIMA), so it’s very often much
easier to select a couple of features from the existing time series and
build a simple linear regression or, say, a random forest. Good and
cheap.

Maybe this approach is not backed up by theory and breaks different assumptions (like the Gauss-Markov theorem, especially the one about the errors being uncorrelated), but it's very useful in practice and quite frequently used in machine learning competitions.

Feature extraction

Alright, the model needs features and all we have is a 1-dimensional time series to work with. What features can we extract?

Lags of time series, of course

Window statistics:

  • Max/min value of series in a window
  • Average/median value in a window
  • Window variance
  • etc.

Date and time features:

  • Minute of an hour, hour of a day, day of the week, you get it
  • Is this day a holiday? Maybe something special happened? Make it a boolean feature

Target encoding

Forecasts from other models (though we can lose the speed of prediction this way)

Let’s run through some of the methods and see what we can extract from our ads series

Lags of time series

Shifting the series n steps back we get a feature column where the current value of time series is aligned with its value at the time t−n.
If we make a 1 lag shift and train a model on that feature, the model
will be able to forecast 1 step ahead having observed current state of
the series. Increasing the lag, say, up to 6 will allow the model to
make predictions 6 steps ahead, however it will use data, observed 6
steps back. If something fundamentally changes the series during that
unobserved period, the model will not catch the changes and will return
forecasts with big error. So, during the initial lag selection one has
to find a balance between the optimal prediction quality and the length
of forecasting horizon.

Wonderful, we got ourselves a dataset here, why don’t we train a model?

Well, simple lags and linear regression gave us predictions that are not that far from SARIMA in quality. There are lots of unnecessary features, but we'll do feature selection a bit later. Now let's continue engineering!

We'll add into our dataset the hour, the day of the week, and a boolean for the weekend. To do so we need to transform the current dataframe index into datetime format and extract hour and weekday out of it.

We can visualize the resulting features

Blue spiky line — hour feature, green ladder — weekday, red bump — weekends!

Since we now have different scales of variables — thousands for the lag features and tens for the categorical ones, it's reasonable to transform them onto the same scale to continue exploring feature importances and, later, regularization.

Test error goes down a little bit and judging by the coefficients plot we can say that weekday and is_weekend are rather useful features

Target encoding

I’d
like to add another variant of encoding categorical variables — by mean
value. If it’s undesirable to explode dataset by using tons of dummy
variables that can lead to the loss of information about the distance,
and if they can’t be used as real values because of the conflicts like
“0 hours < 23 hours”, then it’s possible to encode a variable with a
little bit more interpretable values. Natural idea is to encode with the
mean value of the target variable. In our example every day of the week
and every hour of the day can be encoded by the corresponding average
number of ads watched during that day or hour. It’s very important to
make sure that the mean value is calculated over train set only (or over
current cross-validation fold only), so that the model is not aware of
the future.

Let’s have a look at hour averages

Finally, put all the transformations together in a single function

Here comes overfitting! Hour_average
variable was so great on train dataset that the model decided to
concentrate all its forces on it - as a result the quality of prediction
dropped. This problem can be approached in a variety of ways, for
example, we can calculate target encoding not for the whole train set,
but for some window instead, that way encodings from the last observed
window will probably describe current series state better. Or we can
just drop it manually, since we're sure here it makes things only worse.

Regularization and feature selection

As we already know, not all features are equally healthy: some may lead to overfitting and should be removed. Besides manual inspection we can apply regularization. The two most popular regression models with regularization are Ridge and Lasso regression. They both add constraints to our loss function.

In the case of Ridge regression the constraint is the sum of squares of the coefficients multiplied by the regularization coefficient, i.e. the bigger a feature's coefficient is, the bigger our loss will be, hence we will try to optimize the model while keeping the coefficients fairly low.

As a result of such regularization which has a proud name L2 we’ll have higher bias and lower variance, so the model will generalize better (at least that’s what we hope will happen).

Second model — Lasso regression,
here we add to the loss function not squares but absolute values of the
coefficients, as a result during the optimization process coefficients
of unimportant features may become zeroes, so Lasso regression allows
for automated feature selection. This regularization type is called L1.
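In formula form (a sketch; w are the model weights, λ the regularization coefficient):

Ridge (L2): Loss = Σ (y(i) − X(i)·w)² + λ·Σ w(j)²
Lasso (L1): Loss = Σ (y(i) − X(i)·w)² + λ·Σ |w(j)|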

First, make sure we have things to drop and data truly has highly correlated features

Prettier than some modern art

We can clearly see how the coefficients are getting closer and closer to zero (though never actually reaching it) as their importance in the model drops

Lasso regression turned out to be more conservative: it removed the 23rd lag from the most important features (and also dropped 5 features completely), which only made the quality of prediction better.

Boosting

Why not try XGBoost now?
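The notebook cell is not reproduced in this copy; a minimal sketch, reusing the scaled features and the plotModelResults helper from the code at the top of this post (xgboost must be installed):

from xgboost import XGBRegressor

xgb_model = XGBRegressor()
xgb_model.fit(X_train_scaled, y_train)

plotModelResults(xgb_model,
                 X_train=X_train_scaled, X_test=X_test_scaled,
                 y_train=y_train, y_test=y_test,
                 plot_intervals=True, plot_anomalies=True)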

Here is the winner! The smallest error on the test set among all the models we’ve tried so far.

Yet this victory is deceiving, and it might not be the brightest idea to fit xgboost as soon as you get your hands on time series data. Generally, tree-based models handle trends in data poorly compared to linear models, so you have to detrend your series first or use some tricks to make the magic happen. Ideally, make the series stationary and then use XGBoost; for example, you can forecast the trend separately with a linear model and then add the predictions from xgboost to get the final forecast.

Conclusion

We got acquainted with different time series analysis and prediction methods and approaches. Unfortunately, or maybe luckily, there's no silver bullet to solve this kind of problem. Methods developed in the 60s of the last century (and some even at the beginning of the XIX century) are still popular, along with LSTMs and RNNs (not covered in this article). Partially this is related to the fact that the prediction task, like any other data-related task, is creative in so many aspects and definitely requires research. In spite of the large number of formal quality metrics and approaches to parameter estimation, it's often required to seek and try something different for each time series. Last but not least, the balance between quality and cost is important. As a good example, the SARIMA model mentioned here more than once can produce spectacular results after due tuning, but might require many hours of tambourine dancing time series manipulation, while a simple linear regression model can be built in 10 minutes and give more or less comparable results.

Assignment #9

Full
versions of assignments are announced each week in a new run of the
course (October 1, 2018). Meanwhile, you can practice with a demo
version: Kaggle Kernel, nbviewer.

Useful resources


Author: Dmitry Sergeyev. Translated and edited by Borys Zibrov, and Yuanyuan Pao.

Takeaway on time-series forecasting: ARIMA is the hardest to use (its data-preprocessing workflow is not well suited to industrial applications); a linear regression model is simple and practical, and quite good at predicting trends; xgboost is not really suited to trend prediction, although it can be used when the data is stationary.
