Applying RNNs to Stock Price Prediction

Introduction

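RNNs and LSTMs perform particularly well on temporal data, which is why they are effective in speech recognition. We use the open, high, low and close prices of the previous 25 days to predict the closing price of the next period. Each time sequence is normalized to a Gaussian distribution and fed to a 2-layer LSTM model for training. The article has two parts: forex price prediction and intraday price prediction (30-minute, 15-minute, 5-minute bars, and so on). The source code is available at the end of the article!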
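The helper functions that perform the Gaussian normalization are omitted from the listings in this article, so the following is only a minimal sketch of one way it could work. It assumes each window is standardized with a single mean and standard deviation computed over all of its values. The names `standardize_window` and `destandardize` are hypothetical; the article's own helpers (`normDist`, `deNormDist`) may behave differently.

```python
import statistics

def standardize_window(window):
    """Scale a window (list of rows of floats) to zero mean and unit variance.

    Assumption: one mean/std pair is computed over every value in the window.
    """
    flat = [value for row in window for value in row]
    mean = statistics.fmean(flat)
    std = statistics.pstdev(flat)
    if std == 0:
        std = 1.0  # avoid division by zero on a perfectly flat window
    scaled = [[(value - mean) / std for value in row] for row in window]
    return scaled, mean, std

def destandardize(value, mean, std):
    """Map a normalized value (e.g. a model output) back to the price scale."""
    return value * std + mean

# Toy window: two periods of open, high, low, close.
window = [[1.10, 1.12, 1.09, 1.11],
          [1.11, 1.13, 1.10, 1.12]]
scaled, mean, std = standardize_window(window)
restored = destandardize(scaled[0][0], mean, std)
print(restored)
```

Keeping `mean` and `std` alongside each window is what makes it possible to convert the network's output back into a price.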

Forex prediction

a. Daily data is pulled from Yahoo with DataReader

b. Only the training set is preprocessed because we create a separate test set later on

c. “model_forex” is the model to build and train.

d. Create separate daily test set by specifying dates which start after your training set ends.

e. You can see “model_forex” is plugged in here for running the prediction

predicted_st = predict_standard(X_test_stock,y_test_stock, model_forex)

Intraday prediction

a. Intraday data is pulled from Google's API. The second argument is the bar period in seconds (900 secs = 15 mins) and the third argument is the number of days; the maximum look-back for Google's API is 15 days, I believe.

df = get_google_data(INTRA_DAY_TICKER, 900, 150)

b. Preprocess the full set of data and train test split it with “train_test_split_intra”

c. “model_intra” is the model to build and train.

d. You can see “model_intra” is plugged in here for running the prediction

predicted_intra = predict_intra(X_test_intra,y_test_intra, model_intra)
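`train_test_split_intra` is not shown in the article. For time series, a split normally keeps chronological order so that the test set lies strictly after the training set, and the sketch below illustrates that idea. The function name `chronological_split` and the `test_fraction` parameter are hypothetical and are not the author's implementation.

```python
def chronological_split(samples, test_fraction=0.1):
    """Split an ordered list of samples without shuffling.

    The last `test_fraction` of the samples becomes the test set, so no
    future observation leaks into training.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    split_at = int(len(samples) * (1.0 - test_fraction))
    return samples[:split_at], samples[split_at:]

# Toy usage: 20 ordered samples, last quarter held out.
train, test = chronological_split(list(range(20)), test_fraction=0.25)
print(len(train), len(test))
```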

Code

SITE = "http://en.wikipedia.org/wiki/List_of_S%26P_500_companies"

def scrape_list(site):
    hdr = {'User-Agent': 'Mozilla/5.0'}
    req = urllib2.Request(site, headers=hdr)
    page = urllib2.urlopen(req)
    soup = BeautifulSoup(page)
    table = soup.find('table', {'class': 'wikitable sortable'})
    sector_tickers = dict()
    for row in table.findAll('tr'):
        col = row.findAll('td')
        if len(col) > 0:
            sector = str(col[3].string.strip()).lower().replace(' ', '_')
            ticker = str(col[0].string.strip())
            if sector not in sector_tickers:
                sector_tickers[sector] = list()
            sector_tickers[sector].append(ticker)
    return sector_tickers

sector_tickers = scrape_list(SITE)
## Helper functions to normalize and denormalize values
(omitted)

# Sequence length, or number of trading days
SEQ_LENGTH = 25

# Number of units in the two hidden (LSTM) layers
N_HIDDEN = 256

# Number of attributes used for each trading day
num_attr = 4

# Out of those attributes, how many are indicators
num_indicators = 0

# Variable to help define how far you want your y to reach
REWARD_LAG = 1

# How many days ahead you want to predict
LOOK_AHEAD = 5

# Window stride
STRIDE = 1
def _load_data(data, n_prev = SEQ_LENGTH):
    docX, docY = [], []
    for i in range(len(data)-n_prev):
        x,y = norm(data.iloc[i:i+n_prev,:num_attr].as_matrix(),data.iloc[i+n_prev-1,num_attr:].as_matrix())
        docX.append(x)
        docY.append(y)
    alsX = np.array(docX)
    alsY = np.array(docY)
    return alsX, alsY

def _load_data_test(data, n_prev = SEQ_LENGTH):
    docX, docY = [], []
    num_sequences = (len(data)-n_prev+1)/STRIDE
    for i in range(num_sequences):
        i = i*STRIDE
        x = (data.iloc[i:i+n_prev,:num_attr].as_matrix())
        y = (data.iloc[i+n_prev-1,num_attr:].as_matrix())
        #x,y = norm(data.iloc[i:i+n_prev,:num_attr].as_matrix(),data.iloc[i+n_prev-1,num_attr:].as_matrix())
        docX.append(x)
        docY.append(y)
    alsX = np.array(docX)
    alsY = np.array(docY)
    return alsX, alsY

def _load_data_norm(data, n_prev = SEQ_LENGTH):
    docX, docY = [], []
    for i in range(len(data)-n_prev):
        x = np.array((data.iloc[i:i+n_prev,:num_attr].as_matrix()))
        y = np.array((data.iloc[i+n_prev-1,num_attr:].as_matrix()))
(omitted)
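The indexing in `_load_data` can be hard to read at a glance, so here is a minimal sketch of the same sliding-window idea on plain Python lists. It leaves out the normalization step (the `norm` helper is not shown in the article), and the function name `make_windows` and the toy data are hypothetical.

```python
def make_windows(rows, n_prev, num_attr):
    """Build (inputs, targets) from rows laid out as [features..., target].

    Follows the indexing used in `_load_data`: the inputs are `n_prev`
    consecutive rows of features, and the target is read from the last
    row of the window.
    """
    inputs, targets = [], []
    for i in range(len(rows) - n_prev):
        window = [row[:num_attr] for row in rows[i:i + n_prev]]
        target = rows[i + n_prev - 1][num_attr:]
        inputs.append(window)
        targets.append(target)
    return inputs, targets

# Toy data: two feature columns plus one target column per row.
rows = [[float(t), float(t) + 0.5, float(t) + 1.0] for t in range(6)]
X, y = make_windows(rows, n_prev=3, num_attr=2)
print(len(X), X[0], y[0])
```

Because the target column of each row already holds a future value, reading it from the last row of the window gives the value that follows the window.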

Forex data

## Dataset on just a single ticker to test performance
df = data.DataReader('EUR=X', 'yahoo', datetime(2010,8,1), datetime(2014,8,1))
# df['RSI'] = ta.RSI(df.Close.values,timeperiod=14)
# _,_, macdhist = ta.MACD(df.Close.values, fastperiod=12, slowperiod=26, signalperiod=9)
# df['MACDHist'] = macdhist
## Add the predicted column Y as the last column; it can be defined however you think is a good representation of a good decision
## Clean the rest of the DataFrame
y = []
for i in range(0,len(df)):
    if i >= (len(df)- STRIDE):
        y.append(None)
    else:
        if (REWARD_LAG > 1):
            val = 0
            for n in range(REWARD_LAG):
                val = val + df['Close'][i+n+1]
            val = val / float(REWARD_LAG)
            y.append(val)
        else:
            y.append(df['Close'][i+REWARD_LAG])
df['Y_Values'] =np.asarray(y)
df = df.dropna()
#print (df)
sliced_df = df.drop(['Adj Close','Volume'] ,axis=1)
#print (sliced_df)
#(X_train, y_train), (X_test, y_test) = train_test_split(sliced_df)
(X_train, y_train) = train_test_split(sliced_df)
print(X_train[0],y_train[0])
print (X_train.shape,y_train.shape)
(array([[-0.76244909, -0.75153814, -1.36800657, -1.28695383],
      [-1.28305706, -1.17005084, -1.66649887, -1.50673145],

(omitted)
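The target column `Y_Values` is the next close when `REWARD_LAG` is 1, and the mean of the next `REWARD_LAG` closes otherwise. The sketch below restates that rule on a plain list. The function name `build_targets` is hypothetical, and unlike the forex listing (which guards the tail of the series with `STRIDE`), this sketch marks every row that lacks `reward_lag` future closes.

```python
def build_targets(closes, reward_lag=1):
    """Return one target per row: the mean of the next `reward_lag` closes.

    Rows without enough future closes get None, which plays the role of
    the rows later removed by `dropna()` in the listing.
    """
    targets = []
    for i in range(len(closes)):
        future = closes[i + 1:i + 1 + reward_lag]
        if len(future) < reward_lag:
            targets.append(None)
        else:
            targets.append(sum(future) / float(reward_lag))
    return targets

closes = [10.0, 11.0, 12.0, 13.0]
print(build_targets(closes, reward_lag=1))
print(build_targets(closes, reward_lag=2))
```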

Intraday data

def get_google_data(symbol, period, window):
    url_root = 'http://www.google.com/finance/getprices?i='
    url_root += str(period) + '&p=' + str(window)
    url_root += 'd&f=d,o,h,l,c,v&df=cpct&q=' + symbol
    print(url_root)
    response = urllib2.urlopen(url_root)
    data = response.read().split('\n')
    #actual data starts at index = 7
    #first line contains full timestamp,
    #every other line is offset of period from timestamp
    parsed_data = []
    anchor_stamp = ''
    end = len(data)
    for i in range(7, end):
        cdata = data[i].split(',')
        if 'a' in cdata[0]:
            #first one record anchor timestamp
            anchor_stamp = cdata[0].replace('a', '')
            cts = int(anchor_stamp)
        else:
            try:
                coffset = int(cdata[0])
                cts = int(anchor_stamp) + (coffset * period)
                parsed_data.append((dt.datetime.fromtimestamp(float(cts)), float(cdata[1]), float(cdata[2]), float(cdata[3]), float(cdata[4]), float(cdata[5])))
            except:
                pass # for time zone offsets thrown into data
    df = pd.DataFrame(parsed_data)
    df.columns = ['ts', 'Open', 'High', 'Low', 'Close', 'Volume']
    df.index = df.ts
    del df['ts']
    return df
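The timestamp handling in `get_google_data` relies on an anchor-plus-offset encoding: a first field prefixed with `a` carries a full Unix timestamp, and later rows carry an integer offset counted in multiples of the period. Here is a small, self-contained sketch of just that decoding step on hard-coded sample fields. The function name `resolve_timestamps` is hypothetical, and two choices differ from the listing: it also keeps the anchor row itself, and it uses UTC rather than local time.

```python
import datetime as dt

def resolve_timestamps(first_fields, period):
    """Turn the first CSV field of each row into a UTC datetime.

    A field such as 'a1500000000' is an anchor holding a full Unix timestamp;
    a plain integer such as '3' is an offset of 3 * period seconds from the
    most recent anchor.
    """
    anchor = None
    stamps = []
    for field in first_fields:
        if field.startswith('a'):
            anchor = int(field[1:])
            seconds = anchor
        else:
            if anchor is None:
                raise ValueError("offset row seen before any anchor row")
            seconds = anchor + int(field) * period
        stamps.append(dt.datetime.fromtimestamp(seconds, tz=dt.timezone.utc))
    return stamps

# Sample fields: one anchor followed by two 15-minute offsets.
stamps = resolve_timestamps(['a1500000000', '1', '2'], period=900)
print(stamps[1] - stamps[0])
```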

Creating a separate intraday dataset

df = get_google_data('AAPL', 900, 150)
#print(df)
plt.plot(df['Close'].values[:])
y = []
for i in range(0,len(df)):
    if i >= (len(df)- REWARD_LAG):
        y.append(None)
    else:
        if (REWARD_LAG > 1):
            val = 0
            for n in range(REWARD_LAG):
                val = val + df['Close'][i+n+1]
            val = val / float(REWARD_LAG)
            y.append(val)
            print('here')
        else:
            y.append(df['Close'][i+REWARD_LAG])
df['Y_Values'] =np.asarray(y)
df = df.dropna()
sliced_df = df.drop(['Volume'] ,axis=1)
#print(sliced_df)
(X_train, y_train), (X_test, y_test) = train_test_split_intra(sliced_df)
#print(X_train[0],y_train[0])
print(len(X_train),len(X_test))
#print(X_test[0],y_test[0])
(1168, 108)

Building the network architecture

model_intra = Sequential()
model_intra.add(LSTM(N_HIDDEN, return_sequences=True, activation='tanh', input_shape=(SEQ_LENGTH, num_attr)))
#model_intra.add(LSTM(N_HIDDEN, return_sequences=True, activation='tanh'))
model_intra.add(LSTM(N_HIDDEN, return_sequences=False, activation='tanh'))
model_intra.add(Dense(1,activation='linear'))
model_intra.compile(loss="mean_squared_error", optimizer='adam')

model_intra_full = Sequential()
model_intra_full.add(LSTM(N_HIDDEN, return_sequences=True, activation='tanh', input_shape=(SEQ_LENGTH, num_attr)))
#model_intra_full.add(LSTM(N_HIDDEN, return_sequences=True, activation='tanh'))
model_intra_full.add(LSTM(N_HIDDEN, return_sequences=False, activation='tanh'))
model_intra_full.add(Dense(1,activation='linear'))
model_intra_full.compile(loss="mean_squared_error", optimizer='adam')

model_forex = Sequential()
model_forex.add(LSTM(N_HIDDEN, return_sequences=True, activation='tanh', input_shape=(SEQ_LENGTH, num_attr)))
#model_forex.add(LSTM(N_HIDDEN, return_sequences=True, activation='tanh'))
model_forex.add(LSTM(N_HIDDEN, return_sequences=False, activation='tanh'))
model_forex.add(Dense(1,activation='linear'))
model_forex.compile(loss="mean_squared_error", optimizer
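As a sanity check on the size of this architecture, the number of trainable parameters can be worked out by hand. The sketch below uses the standard LSTM count of four gates, each with an input kernel, a recurrent kernel and a bias. The helper names `lstm_params` and `dense_params` are hypothetical, and the calculation assumes biases are enabled in every layer.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """One weight per input per unit, plus one bias per unit."""
    return input_dim * units + units

# Values taken from the constants defined earlier in the article.
N_HIDDEN, num_attr = 256, 4

first_lstm = lstm_params(num_attr, N_HIDDEN)    # 4 input features -> 256 units
second_lstm = lstm_params(N_HIDDEN, N_HIDDEN)   # 256 -> 256 units
output_layer = dense_params(N_HIDDEN, 1)        # 256 -> 1 linear output
total = first_lstm + second_lstm + output_layer
print(first_lstm, second_lstm, output_layer, total)
```

The sequence length does not appear in the count because an LSTM reuses the same weights at every time step. With roughly a thousand training windows, a model of this size has far more parameters than samples, which is worth keeping in mind when reading the validation loss.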

Fitting the model

print(X_train.shape)
print(y_train.shape)
(1018, 25, 4)
(1018, 1)
model_intra.fit(X_train, y_train, batch_size=50, nb_epoch=
Train on 1156 samples, validate on 12 samples
Epoch 1/150
1156/1156 [==============================] - 1s - loss: 1.9575 - val_loss: 0.5494
Epoch 2/150
1156/1156 [==============================] - 1s - loss: 1.4731 - val_loss: 0.4006

(omitted)

Helper functions for performance evaluation

#Function to normalize the test input then denormalize the result. Calculate the rmse of the predicted values on the test set
def predict(X_test,y_test, myModel):
    predicted = []
    for example in X_test:
        x = copy.copy(example)
        #print (x)
        x_norm, mn, mx = normalize(x)
        toPred = []
        toPred.append(x_norm)
        add = np.array(toPred)
        #Predict for the standard model
        predict_standard = myModel.predict(add)
        pred_st = copy.copy(predict_standard)
        y_real_st = deNormalizeY(pred_st,mn,mx)
        predicted.append(y_real_st[0])
        #Predict for the bidirectional model
        # predict_bidirectional = myModel.predict([add,add])
        # pred_bi = copy.copy(predict_bidirectional)
        # y_real_bi = deNormalizeY(pred_bi,mn,mx)
        # predicted.append(y_real_bi[0])
(omitted)
df_test = data.DataReader('EUR=X', 'yahoo', datetime(2014,8,1), datetime(2015,8,1))
# df_test['RSI'] = ta.RSI(df_test.Close.values,timeperiod=14)
# _,_, macdhist = ta.MACD(df_test.Close.values, fastperiod=12, slowperiod=26, signalperiod=9)
# df_test['MACDHist'] = macdhist
y = []
for i in range(0,len(df_test)):
    if i >= (len(df_test)- STRIDE):
        y.append(None)
    else:
        if (REWARD_LAG > 1):
            val = 0
            for n in range(REWARD_LAG):
                val = val + df_test['Close'][i+n+1]
            val = val / float(REWARD_LAG)
            y.append(val)
        else:
            y.append(df_test['Close'][i+REWARD_LAG])
(omitted)

MAE for LSTM is: [0.0035823152701196983]
MAE for doing nothing is: [0.0045693478326778786]
RMSE for LSTM is: [0.0050684837061917686]
RMSE for doing nothing is: [0.0061416562709802761]
Net profit for 0.0 threshhold is 245.261025777 making 234 trades
Net profit for 0.001 threshhold is 242.673572498 making 201 trades
(omitted)

Intraday evaluation and results

def predict_intra(X_test, y_test, myModel):
    print(len(X_test))
    predicted = []
    for example in X_test:
        #Transform the example into a Gaussian distribution
        x_norm, mean, std = normDist(np.array(example))
        #Add examples to array to predict
        toPred = []
        toPred.append(x_norm)
        add = np.array(toPred)
        #Predict these examples
        predict_standard = myModel.predict(add)
        pred = copy.copy(predict_standard)
        y_real = deNormDist(pred,mean,std)
        predicted.append(y_real[0])
    return predicted

predicted_intra = predict_intra(X_test,y_test, model_intra)
plt.figure(figsize=(20,20))
plt.plot(y_test)
plt.plot(predicted_intra)
plt.show()

MAE and RMSE evaluation

sum_error = 0
sum_error_donothing = 0
for i in range(len(predicted_intra)):
    if i>0:
        sum_error = sum_error + abs(predicted_intra[i] - y_test[i])
        sum_error_donothing = sum_error_donothing + abs(predicted_intra[i] - y_test[i-1])
MAE_lstm = sum_error/len(predicted_intra)
MAE_donothing = sum_error_donothing/len(predicted_intra)
print("MAE for LSTM is: " + str(MAE_lstm))
print("MAE for doing nothing is: " + str(MAE_donothing))
MAE for LSTM is: [0.091961468484759237]
MAE for doing nothing is: [0.16699238882416201]
sum_error = 0
sum_error_donothing = 0
for i in range(len(predicted_intra)):
    if i>0:
        sum_error = sum_error + (predicted_intra[i] - y_test[i])**2
        sum_error_donothing = sum_error_donothing + (predicted_intra[i] - y_test[i-1])**2
RMSE_lstm = (sum_error/len(predicted_intra))**(1.0/2.0)
RMSE_donothing = (sum_error_donothing/len(predicted_intra))**(1.0/2.0)
print("RMSE for LSTM is: " + str(RMSE_lstm))
print("RMSE for doing nothing is: " + str(RMSE_dono
RMSE for LSTM is: [0.15719269057322682]
RMSE for doing nothing is: [0.23207816758496383]
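A "do nothing" baseline is commonly defined as forecasting that the next value equals the last observed one, which means comparing `y_test[i]` with `y_test[i-1]`. The listing above instead measures the distance between the model's prediction and the previous actual value, so the sketch below shows the conventional variant for comparison. The function names are hypothetical and the numbers are toy data, not results from the article.

```python
import math

def mae(predicted, actual):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def persistence_forecast(actual):
    """Forecast each value as the previously observed value."""
    return actual[:-1]

actual = [10.0, 10.5, 10.2, 10.8]
model_pred = [10.4, 10.3, 10.6]   # hypothetical model forecasts for actual[1:]
targets = actual[1:]
naive_pred = persistence_forecast(actual)

print("MAE model:", mae(model_pred, targets), "MAE naive:", mae(naive_pred, targets))
print("RMSE model:", rmse(model_pred, targets), "RMSE naive:", rmse(naive_pred, targets))
```

A model is only adding value if it beats this persistence forecast on the same targets.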

Evaluating the trading policy

net_profits = []
protits_per_trade = []
for i in range(50):
    THRESH = i/10000.0
    LOT_SIZE = 100
    net_profit = 0
    num_trades = 0
    for i in range(len(predicted_intra)):
        if i>1:
            predicted_change = ((predicted_intra[i] / y_test[i-1]) - 1)
            #print(predicted_change)
            actual_change = (predicted_intra[i] -  y_test[i])*LOT_SIZE
            if predicted_change >= THRESH:
                #print("Buy")
                net_profit = net_profit + actual_change
                num_trades = num_trades + 1
(omitted)
(array([327.67074597699519], dtype=object), 106)
(array([322.81673063817777], dtype=object), 103)
plt.plot(net_profits)
plt.show()

plt.plot(protits_per_trade)
plt.show()
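Part of the policy listing is omitted, so the following is only a rough sketch of a threshold sweep for a long-only rule, not the author's code. It assumes a trade is opened when the forecast rise over the previous actual value meets the threshold, and that the realized profit is the actual move times the lot size, which differs from the `actual_change` expression in the listing. The function name `evaluate_policy` and the toy prices are hypothetical, and transaction costs are ignored.

```python
def evaluate_policy(predicted, actual, threshold, lot_size=100):
    """Long-only rule: buy when the forecast rise meets the threshold.

    Returns (net_profit, num_trades). Profit is the realized move from the
    previous actual value to the current one, times the lot size.
    """
    net_profit = 0.0
    num_trades = 0
    for i in range(1, len(predicted)):
        predicted_change = predicted[i] / actual[i - 1] - 1.0
        if predicted_change >= threshold:
            net_profit += (actual[i] - actual[i - 1]) * lot_size
            num_trades += 1
    return net_profit, num_trades

predicted = [100.0, 101.5, 100.2, 102.0]
actual = [100.0, 101.0, 100.5, 101.5]
for step in range(3):
    threshold = step / 100.0
    print(threshold, evaluate_policy(predicted, actual, threshold))
```

Raising the threshold trades less often; plotting net profit and profit per trade against the threshold, as the article does, shows where that trade-off pays off.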

Other

buyTotal = 0
sellTotal = 0
correct = 0
sellCorrect = 0
buyCorrect = 0
for i in range(len(predicted_st)):
    realAnswer = y_test_stock[i][0][0]
    if predicted_st[i][1] > predicted_st[i][0]:
        predicted = 0 #Buy
    else:
        predicted = 1 #Sell
    if realAnswer == 0:
        ##This is where the actual answer is Buy:Up:[0,1]:0
        buyTotal = buyTotal + 1
        if predicted == realAnswer:
            buyCorrect = buyCorrect + 1
            correct = correct + 1
(omitted)
(349, 730, 0.4780821917808219)
(210, 382, 0.5497382198952879)
(139, 348, 0.3994252873563218)
0.523287671233
0.476712328767
MMM
AYI
ALK
ALLE
(omitted)

Creating a baseline RMSE

totalCorrect = 0
total = 0
for stock in testing_dataframes[:50]:
    X_test_stock, y_test_stock = _load_data_test(stock[1])
    predicted_st = predict_standard(X_test_stock,y_test_stock, model)
    buyTotal = 0
    sellTotal = 0
    correct = 0
    sellCorrect = 0
    buyCorrect = 0
(omitted)
#Count the number of positive and the number of negative calls you got right
totalCorrect = 0
total = 0
buyTotal = 0
sellTotal = 0
correct = 0
sellCorrect = 0
buyCorrect = 0
for i in range(len(predicted_st)):
    realAnswer = y_test_stock[i][0][0]
    if predicted_st[i][1] > predicted_st[i][0]:
        predicted = 0 #Buy
(omitted)
(104, 235, 0.4425531914893617)
(104, 104, 1.0)
(0, 131, 0.0)
0.442553191489
0.557446808511
from sklearn.metrics import f1_score
##Calculate F1 score
actual = []
result = []
for y in y_test_merged:
    if y[0] == 0:
        actual.append(0)
    else:
        actual.append(1)
for y in predicted_st:
    if y[1] > y[0]:
        result.append(0)
    else:
        result.append(1)
score = f1_score(actual,result,average='weighted',pos_label=1)
print(score)
0.498192044998
#Same percentage calculations but with a threshold
THRESH = 0.1
totalCorrect = 0
total = 0
noDecision = 0
buyTotal = 0
sellTotal = 0
correct = 0
sellCorrect = 0
buyCorrect = 0
for i in range(len(predicted_st)):
    realAnswer = y_test_merged[i][0]
    if predicted_st[i][1] - THRESH > .5:
        predicted = 0 #Buy
    elif predicted_st[i][0] - THRESH > .5:
        predicted = 1 #Sell
    else:
        predicted = 2 #Pass, do not count towards percentages because you make no decision if .6>x>.4
(omitted)
(347, 750, 0.46266666666666667)
(190, 351, 0.5413105413105413)
(157, 399, 0.39348370927318294)
If you just predicted all Up 0.468
If you just predicted all Down 0.532
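The thresholded accuracy above only counts calls where the model is confident enough; everything else is a "pass". Since the counting code is partly omitted, here is a compact sketch of that rule. The function name `thresholded_accuracy` and the toy probabilities are hypothetical; the label convention (0 for up/buy, 1 for down/sell, with the up probability at index 1) follows the comments in the listing.

```python
def thresholded_accuracy(probabilities, labels, thresh):
    """Accuracy over confident calls only.

    `probabilities` holds [p_down, p_up] pairs and `labels` uses 0 for up
    (buy) and 1 for down (sell). Returns (correct, decided, skipped).
    """
    correct = decided = skipped = 0
    for (p_down, p_up), label in zip(probabilities, labels):
        if p_up - thresh > 0.5:
            call = 0  # buy
        elif p_down - thresh > 0.5:
            call = 1  # sell
        else:
            skipped += 1  # not confident enough, make no decision
            continue
        decided += 1
        if call == label:
            correct += 1
    return correct, decided, skipped

probs = [[0.2, 0.8], [0.45, 0.55], [0.7, 0.3], [0.65, 0.35]]
labels = [0, 0, 1, 0]
print(thresholded_accuracy(probs, labels, thresh=0.1))
```

Dividing `correct` by `decided` rather than by the total number of samples keeps skipped calls from counting as errors.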
thresholds = []
totalAcc = []
positiveAcc = []
negativeAcc = []
##Graph this graph of the threshold vs accuracy
for i in range(10):
    thresh = i/100.0
    totalCorrect = 0
    total = 0
    noDecision = 0
    buyTotal = 0
    sellTotal = 0
    correct = 0
    sellCorrect = 0
    buyCorrect = 0
    for i in range(len(predicted_st)):
        realAnswer = y_test_merged[i][0]
        if predicted_st[i][1] - thresh > .5:
            predicted = 0 #Buy
        elif predicted_st[i][0] - thresh > .5:
            predicted = 1 #Sell
(omitted)
plt.plot(totalAcc)
plt.show()

plt.plot(positiveAcc)
plt.show()

plt.plot(negativeAcc)
plt.show()

The tests show that, for daily price prediction, forex performs better than traditional stocks because it has less noise.
