Edgar Jullien, Antoine Settelen, Simon Weiss
CFM Data Challenge January 2021 with College de France and ENS https://challengedata.ens.fr/challenges/60/
In many stock exchanges, at the end of a trading day, an auction takes place for each stock. Each stock is then exchanged at a single price, based on the interest that market participants show in the auction.
It is advantageous to do some trading during this auction instead of during the preceding continuous trading, as trading costs are usually lower.
In addition, some market participants (day traders, market makers…) prefer to not hold stocks overnight, because events might affect the stock price between the close of the market and the price at the open on the next day (elections, company announcements, etc.), which may result in a loss. For these market participants, the auction is the last chance to limit such a risk, as it is an opportunity to get rid of their remaining stocks and not hold any overnight.
For market participants that instead want to hold a specific number of stocks by the end of the day (asset managers…), the auction is their last chance to reach this target. This is in principle important, because they have optimized this number of stocks to hold. For example, if they predict that the price of a stock should rise, then it is in their interest to buy as much of the stock as possible, within the limits set by how much they can invest and by the financial risk that they are ready to take.
Market participants may thus want to estimate the expected number of stocks available during the auction, as this allows them to gauge how many they can hope to buy or sell during this final, financially advantageous trading opportunity.
The goal of this challenge is to predict the volume (total value of stock exchanged) available for auction, for 900 stocks over about 350 days.
Input data
The prediction of the auction volume for a given stock on a given day can be made based on the following 126 input columns:
pid: a Product ID, that represents a stock.
day: day of the data sample, as an integer. The ordering is chronological, with day 0 coming before day 1, etc.
abs_retn (n from 0 to 60): absolute values of stock returns (relative price change) between the last known price (typically the price at the beginning of period n) and the end of period n (as a percentage), where the periods cover a good part of the trading day, don't overlap, and have the same duration. Return n=0 comes before return n=1, etc.
rel_voln: like abs_retn, but represents the volume traded during period n as a fraction of the total volume traded over the covered periods (thus, they sum to 1 over a day). The periods are the same as for the returns. (See the sanity check after this list.)
LS and NLV: two quantities associated with the trades of the day for the stock in question. Their nature is kept undisclosed for this challenge.
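As a quick sanity check of these conventions (a sketch, assuming x_train has been loaded as in the code further below), the relative volumes should sum to roughly 1 per row and the absolute returns should never be negative:
vol_cols = [c for c in x_train.columns if c.startswith("rel_vol")]
ret_cols = [c for c in x_train.columns if c.startswith("abs_ret")]
print(x_train[vol_cols].sum(axis=1).describe())  # each row should sum to ~1 (up to missing values)
print((x_train[ret_cols] < 0).any().any())       # expected False: returns are given as absolute values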
Output data
The output data contains, for a given stock and a given day, the natural logarithm of the auction volume (= total value of traded stocks), as a fraction of the total volume in the 61 given periods. Thus, if the auction volume represents 10 % of the volume traded over all the periods of a day, the target is log 0.10 = -2.30…
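A small worked example of this convention (illustrative only):
import numpy as np
fraction = 0.10              # auction volume = 10 % of the day's traded volume
target = np.log(fraction)    # ≈ -2.302585, the value stored as the target
print(np.exp(target))        # ≈ 0.10: a predicted target maps back to a volume fraction with exp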
Training and test data
The 900 stocks found in the training and test data are the same: it is therefore in principle possible to devise predictions that are customized for each stock.
The training data contains information on about 800 different days, while the test data requires auction volume predictions for about 350 days.
Furthermore, the test inputs correspond to days that come after those of the training data. A challenge is that auction volumes can evolve over time (for instance by becoming relatively larger and larger), but we only see what the past (training) auction volumes look like.
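Because of this chronological layout, any validation scheme should keep validation days strictly after training days. A minimal sketch (names are illustrative; train_df stands for a merged training frame with a day column):
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

days = np.sort(train_df["day"].unique())
tscv = TimeSeriesSplit(n_splits=3)
for fit_idx, val_idx in tscv.split(days):
    fit_mask = train_df["day"].isin(days[fit_idx])
    val_mask = train_df["day"].isin(days[val_idx])
    print(fit_mask.sum(), val_mask.sum())  # validation days always come after the fitting days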
Presentation ideas:
The evolution of the volume in the future is not necessarily linear. => Feature: evolution of the global volume per day.
On a given day (see the AMF report idea, e.g. elections), the market can behave in a particular way that has consequences on the volume.
Ideas from the AMF report (France only):
In a given quarter, the auction share for end-of-quarter months (March, June, September and December) is about 4% to 6% higher than that observed for the other months (because of derivative products). => Feature: encode the quarter / month.
The days on which quarterly derivatives expire are not only amongst the most active days, they are also the days on which the share of the closing auction reaches its highest level (cf. the AMF graph).
=> Encode these specific days (see the sketch after this list).
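A hypothetical encoding of these two ideas; the challenge only provides an integer day index, so the day_to_date mapping and the expiry_dates list below are assumptions rather than available data:
x_train["date"] = x_train["day"].map(day_to_date)                            # hypothetical day index -> calendar date mapping
x_train["month"] = x_train["date"].dt.month
x_train["is_quarter_end_month"] = x_train["month"].isin([3, 6, 9, 12]).astype(int)
x_train["is_expiry_day"] = x_train["date"].isin(expiry_dates).astype(int)    # hypothetical list of quarterly expiry dates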
Origins:
Rapid development of passive management in Europe, notably ETFs, which have to increase their volume at the end of the day (vs. the US). => If we had the dates, we could add the share of passive funds in equity funds.
Best execution obligation: the closing auction offers a single, simple reference price.
Avoiding HFT (high-frequency trading).
The amplifying effect of VWAP-type execution algorithms, which adapt their execution volumes to that of the market in general.
=> New column: Abs_var_price * Abs_vol (see the sketch below).
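One possible way to build such an interaction column from the challenge inputs (a minimal sketch once x_train and x_test are loaded; the feature name is ours): multiply, period by period, the absolute return by the relative volume, then sum over the day.
ret_cols = [f"abs_ret{i}" for i in range(61)]
vol_cols = [f"rel_vol{i}" for i in range(61)]
for df in (x_train, x_test):
    df["ret_x_vol"] = (df[ret_cols].values * df[vol_cols].values).sum(axis=1)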
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
#pip install -U scikit-learn
#pip install delayed
import pandas as pd
import numpy as np
import os
import glob
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import TimeSeriesSplit
import xgboost
from sklearn.model_selection import train_test_split
#import delayed
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score
import lightgbm as lgb
from lightgbm import LGBMRegressor
from sklearn.datasets import make_moons
from sklearn.model_selection import RandomizedSearchCV
Load LightGBM in Google Colab for future use of the GPU
"""!rm -r /content/LightGBM
!git clone --recursive https://github.com/Microsoft/LightGBM
%cd /content/LightGBM
!mkdir build
!cmake -DUSE_GPU=1 #avoid ..
!make -j$(nproc)
!sudo apt-get -y install python-pip
!sudo -H pip install setuptools pandas numpy scipy scikit-learn -U
%cd /content/LightGBM/python-package
!sudo python setup.py install --precompile"""
data_dir = "/content/drive/MyDrive/U4_Prediction_stock_auction_volumes/dataset"
data_list = glob.glob(os.path.join(data_dir, '**.csv'))
data_list
['/content/drive/MyDrive/U4_Prediction_stock_auction_volumes/dataset/x_train.csv', '/content/drive/MyDrive/U4_Prediction_stock_auction_volumes/dataset/x_test.csv', '/content/drive/MyDrive/U4_Prediction_stock_auction_volumes/dataset/submission_csv_file_random_example.csv', '/content/drive/MyDrive/U4_Prediction_stock_auction_volumes/dataset/y_train.csv']
y_train = pd.read_csv("%s/y_train.csv" % data_dir, sep=",")
x_train = pd.read_csv("%s/x_train.csv" % data_dir, sep=",")
x_test=pd.read_csv("%s/x_test.csv" % data_dir, sep=",")
np.any(np.isnan(x_train))
x_train.fillna(0, inplace=True)
x_test.fillna(0, inplace=True)
We create min, max, std and median features from the return and volume columns
x_train['min_ret'] = np.min(x_train.iloc[:,3:63], axis=1)
x_train['max_ret'] = np.max(x_train.iloc[:,3:63], axis=1)
x_train['std_ret'] = np.std(x_train.iloc[:,3:63], axis=1)
x_train['median_ret'] = np.median(x_train.iloc[:,3:63], axis=1)
x_test['min_ret'] = np.min(x_test.iloc[:,3:63], axis=1)
x_test['max_ret'] = np.max(x_test.iloc[:,3:63], axis=1)
x_test['std_ret'] = np.std(x_test.iloc[:,3:63], axis=1)
x_test['median_ret'] = np.median(x_test.iloc[:,3:63], axis=1)
x_train['min_vol'] = np.min(x_train.iloc[:,64:125], axis=1)
x_train['max_vol'] = np.max(x_train.iloc[:,64:125], axis=1)
x_train['std_vol'] = np.std(x_train.iloc[:,64:125], axis=1)
x_train['median_vol'] = np.median(x_train.iloc[:,64:125], axis=1)
x_test['min_vol'] = np.min(x_test.iloc[:,64:125], axis=1)
x_test['max_vol'] = np.max(x_test.iloc[:,64:125], axis=1)
x_test['std_vol'] = np.std(x_test.iloc[:,64:125], axis=1)
x_test['median_vol'] = np.median(x_test.iloc[:,64:125], axis=1)
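The same statistics could be produced with a small helper applied to both frames, which avoids copy-paste slips such as the x_train/x_test mix-up corrected above (a sketch, equivalent up to the ddof convention of np.std):
def add_row_stats(df, cols, suffix):
    # row-wise statistics over the given columns, stored as min_/max_/std_/median_<suffix>
    df['min_' + suffix] = df[cols].min(axis=1)
    df['max_' + suffix] = df[cols].max(axis=1)
    df['std_' + suffix] = df[cols].std(axis=1, ddof=0)  # ddof=0 to mimic np.std
    df['median_' + suffix] = df[cols].median(axis=1)

ret_cols = [c for c in x_train.columns if c.startswith('abs_ret')]
vol_cols = [c for c in x_train.columns if c.startswith('rel_vol')]
for df in (x_train, x_test):
    add_row_stats(df, ret_cols, 'ret')
    add_row_stats(df, vol_cols, 'vol')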
We convert pid (a categorical variable) to dummies for XGBoost (which does not handle categorical features natively)
x_train=pd.get_dummies(x_train,columns=['pid'])
x_test=pd.get_dummies(x_test,columns=['pid'])
for f in x_train.columns[134:1034]:
    x_train[f] = x_train[f].astype('category')

for f in x_test.columns[134:1034]:
    x_test[f] = x_test[f].astype('category')
First we merge x_train and y_train and drop the ID column
train_df = y_train.merge(x_train, on="ID")
train_df.drop(['ID'], axis=1, inplace=True)
train_X_, test_X_, train_y_, test_y_ = train_test_split(train_df.iloc[:,1:134], train_df['target'], test_size=0.2, random_state=42)
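Note that this split is random over rows, whereas the real test set lies strictly after the training days; a day-based hold-out (as used further below with a cut-off at day 600) is a closer proxy for the leaderboard. A minimal sketch of that alternative:
cutoff_day = 600  # illustrative cut-off, the same value as used later in the notebook
fit_part = train_df[train_df['day'] <= cutoff_day]
val_part = train_df[train_df['day'] > cutoff_day]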
import delayed
For the EDA we focus on a single product; we use pid 10.
pid_10=train_dataset[train_dataset['pid']==10]
pid_10.head()
day_sum = pid_10.groupby("day")[["sum_ret", "median_vol"]].agg("sum").reset_index()
from plotly.subplots import make_subplots
import plotly.graph_objects as go

fig = make_subplots(rows=2, cols=1)
fig.add_trace(go.Scatter(x=day_sum.day,
y=day_sum.median_vol,
#showlegend=False,
mode="lines",
name="median_vol",
#marker=dict(color="mediumseagreen"),
),
row=1,col=1
)
fig.add_trace(go.Scatter(x=day_sum.day,
y=day_sum.sum_ret,
#showlegend=False,
mode="lines",
name="sum_ret",
#marker=dict(color="mediumseagreen")
),
row=2,col=1
)
fig.update_layout(height=1000, title_text="pid 10: daily median_vol and sum_ret")
fig.show()
from IPython.display import Image
Image(filename='C:/Users/swp/Documents/_Perso/Cours/M2/U4.Artificial_Intelligence/artificial_intelligence_for_finance/Git/U4_Prediction_stock_auction_volumes/img/1stplot.png', width=500, height=500)
## For each pid: median_vol and sum_ret by day
dep_pid = train_dataset[["day", "pid", "sum_ret", "median_vol"]].reset_index()
fig = make_subplots(rows=1, cols=1)
for pid_i in dep_pid['pid'].unique()[0:2]:
dep_pid_df = dep_pid[dep_pid['pid']==pid_i]
fig.add_trace(go.Scatter(x=dep_pid_df["day"],
y=dep_pid_df["median_vol"],
#showlegend=True,
mode="lines",
name=str(pid_i),
#marker=dict(color="mediumseagreen")
),
row=1,col=1
)
fig.update_layout(title_text="Vol Mean Over pid by day-by-day")
fig.show()
Image(filename='C:/Users/swp/Documents/_Perso/Cours/M2/U4.Artificial_Intelligence/artificial_intelligence_for_finance/Git/U4_Prediction_stock_auction_volumes/img/2sdplot.png', width=1000, height=800)
fig = make_subplots(rows=1, cols=1)
for pid_i in dep_pid['pid'].unique():
dep_pid_df = dep_pid[dep_pid['pid']==pid_i]
fig.add_trace(go.Scatter(x=dep_pid_df["day"],
y=dep_pid_df["sum_ret"],
#showlegend=True,
mode="lines",
name=str(pid_i),
#marker=dict(color="mediumseagreen")
),
row=1,col=1
)
fig.update_layout(title_text="Sum ret Mean Over pid day-by-day")
fig.show()
from IPython.display import Image
Image(filename='C:/Users/swp/Documents/_Perso/Cours/M2/U4.Artificial_Intelligence/artificial_intelligence_for_finance/Git/U4_Prediction_stock_auction_volumes/img/3plot.png', width=1000, height=800)
We first run a linear regression on our training set in order to obtain an error (residual) variable, which we then try to predict in a second stage with a boosted-tree model (LightGBM).
regrLin = LinearRegression()
regrLin.fit(train_X_, train_y_)
test_X_['predict'] = regrLin.predict(test_X_)
(mean_squared_error(test_y_, test_X_['predict']))
test_y_=pd.DataFrame(test_y_)
test_X_['error']=test_y_["target"] - test_X_['predict']
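For reference, the whole two-stage idea (linear baseline, boosted trees fitted on the residual, final prediction = baseline + predicted residual) can be written compactly as below; X_fit, y_fit and X_new are illustrative names, not the exact variables used in the remaining cells:
from sklearn.linear_model import LinearRegression
from lightgbm import LGBMRegressor

baseline = LinearRegression().fit(X_fit, y_fit)                          # stage 1: linear baseline
residuals = y_fit - baseline.predict(X_fit)                              # what the linear model missed
residual_model = LGBMRegressor(random_state=314).fit(X_fit, residuals)   # stage 2: boosted trees on the residuals
y_hat = baseline.predict(X_new) + residual_model.predict(X_new)          # final prediction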
Add back the categorical pid columns for the boosted-tree model
test_X_=test_X_.join(train_df[train_df.columns[134:1034]])
Store days for time series split
date_series = test_X_['day']
min_date = date_series.min()
max_date = date_series.max()
dates = list(range(min_date, max_date + 1))
len(dates)
Create new train and test datasets from test_X_, split at day 600
X_error_train = test_X_[test_X_['day']<=600].drop(['error'], axis = 1)
y_error_train = test_X_[test_X_['day']<=600]['error']
X_error_test = test_X_[test_X_['day']>600].drop(['error'], axis = 1)
y_error_test = test_X_[test_X_['day']>600]['error']
def learning_rate_010_decay_power_099(current_iter):
    base_learning_rate = 0.1
    lr = base_learning_rate * np.power(.99, current_iter)
    return lr if lr > 1e-3 else 1e-3

def learning_rate_010_decay_power_0995(current_iter):
    base_learning_rate = 0.1
    lr = base_learning_rate * np.power(.995, current_iter)
    return lr if lr > 1e-3 else 1e-3

def learning_rate_005_decay_power_099(current_iter):
    base_learning_rate = 0.05
    lr = base_learning_rate * np.power(.99, current_iter)
    return lr if lr > 1e-3 else 1e-3
fit_params={"early_stopping_rounds":30,
"eval_metric" : 'neg_mean_squared_error',
"eval_set" : [(X_error_test,y_error_test)],
'eval_names': ['valid'],
'callbacks': [lgb.reset_parameter(learning_rate=learning_rate_010_decay_power_099)],
'verbose': 100,
'categorical_feature':'auto'}
from scipy.stats import randint as sp_randint
from scipy.stats import uniform as sp_uniform
param_test ={'num_leaves': sp_randint(6, 50),
'min_child_samples': sp_randint(100, 500),
'min_child_weight': [1e-5, 1e-3, 1e-2, 1e-1, 1, 1e1, 1e2, 1e3, 1e4],
'subsample': sp_uniform(loc=0.2, scale=0.8),
'colsample_bytree': sp_uniform(loc=0.4, scale=0.6),
'reg_alpha': [0, 1e-1, 1, 2, 5, 7, 10, 50, 100],
'reg_lambda': [0, 1e-1, 1, 5, 10, 20, 50, 100]}
model = LGBMRegressor( random_state=314, n_estimators=1000, device='gpu')
tscv = TimeSeriesSplit(n_splits=2)
n_HP_points_to_test = 100
gs = RandomizedSearchCV(
estimator=model,
param_distributions=param_test,
n_iter= n_HP_points_to_test,
scoring= 'neg_mean_squared_error',
cv= tscv,
refit=True,
random_state=314,
verbose=True)
#gs.fit(X_error_train, y_error_train, **fit_params)
#print('Best score reached: {} with params: {} '.format(gs.best_score_, gs.best_params_))
We got the following parameters from our hyper-parameter search (RandomizedSearchCV):
opt_parameters = {'colsample_bytree': 0.7916380440478592,
'min_child_samples': 211,
'min_child_weight': 1,
'num_leaves': 45,
'reg_alpha': 2,
'reg_lambda': 20,
'subsample': 0.5211522776637936}
model_final = LGBMRegressor(**model.get_params())
model_final.set_params(**opt_parameters)
We fit the LightGBM residual model on the error training set
model_final.fit(X_error_train, y_error_train)
X_error_train['predict']
We refit the linear model with the entire train dataset
regrLin = LinearRegression()
regrLin.fit(x_train.iloc[:,1:134], y_train.iloc[:,1])
x_test['predict'] = regrLin.predict(x_test.iloc[:,1:134])
x_test
x_test['error'] = model_final.predict(x_test.iloc[:,1:])
x_test['target'] = x_test['predict'] + x_test['error']
predictions = x_test[['ID', 'target']]
predictions.columns = ['ID', 'target']
predictions.to_csv('predictions_02.02.17.15.csv', sep=',', index=False)
%reset -f
import pandas as pd
import numpy as np
import os
import glob
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import TimeSeriesSplit
import xgboost
from sklearn.model_selection import train_test_split
import delayed
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score
import lightgbm as lgb
from lightgbm import LGBMRegressor
from sklearn.datasets import make_moons
from sklearn.model_selection import RandomizedSearchCV
import math
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.tree import DecisionTreeClassifier
import sys
from numpy import mean
import pickle
data_dir = "/content/drive/MyDrive/M2/U4_Prediction_stock_auction_volumes/dataset"
data_list = glob.glob(os.path.join(data_dir, '**.csv'))
y_train = pd.read_csv("%s/y_train.csv" % data_dir, sep=",")
x_train = pd.read_csv("%s/x_train.csv" % data_dir, sep=",")
x_test=pd.read_csv("%s/x_test.csv" % data_dir, sep=",")
all_features = list(x_train.columns)
columns_with_missing_values = x_train.columns[x_train.isnull().any()]
x_train[columns_with_missing_values].isnull().sum()
return_cols = [c for c in x_train.columns if c.startswith("abs_ret")]
volume_cols = [c for c in x_train.columns if c.startswith("rel_vol")]
target_exp_col = ["target_exp"]
date_col = ["day"]
prod_id_col=["pid"]
other_cols = ["LS" , "NLV"]
# sum of NaNs per row
for df in [x_train, x_test]:
    df["return_nan"] = df[return_cols].isnull().sum(axis=1)

# replace NaNs by interpolation along the periods
for df in [x_train, x_test]:
    for x in [return_cols, volume_cols]:
        df[x] = df[x].interpolate(axis=1, limit_direction="both", inplace=False)
del df
x_train = y_train.merge(x_train, on="ID")
x_train["is_train"] = True
x_test["is_train"] = False
x_test["target"] = None
all_data = pd.concat([x_train, x_test])
all_data['target_exp'] = all_data['target'].apply(lambda x : math.exp(x))
def get_stats_groupby(all_data, groupby_cols):
    for groupby_obj in groupby_cols:
        groupby_col = groupby_obj["id"]
        cols = groupby_obj["cols"]
        group_by = all_data.groupby([groupby_col])
        data_arr = []
        data_arr.append({"i": "avg", "d": group_by[cols].mean()})
        #data_arr.append({"i": "skew", "d": group_by[cols].skew()})
        #data_arr.append({"i": "kurt", "d": group_by[cols].apply(pd.DataFrame.kurt)})
        data_arr.append({"i": "std", "d": group_by[cols].std()})
        data_arr.append({"i": "median", "d": group_by[cols].median()})
        #data_arr.append({"i": "nan", "d": all_data.isnull().groupby(all_data[groupby_col])[cols].sum()})
        all_data_stats = all_data.copy()
        all_data_stats.set_index([groupby_col], inplace=True)
        for obj_data in data_arr:
            names = ['%s_%s_%s' % (obj_data["i"], groupby_col, col) for col in cols]
            all_data_stats[names] = (obj_data["d"]).astype("float32")
        all_data_stats.reset_index(inplace=True)
    return all_data_stats
#group by day to get more features
calculation_group_by =[
{"id":"day",
"cols": volume_cols + return_cols,
}
]
all_data_stats = get_stats_groupby(all_data, calculation_group_by)
In order to later identify the "special days" with a high auction (fixing) volume:
def get_target_groupby(all_data, groupby_cols):
    for groupby_obj in groupby_cols:
        groupby_col = groupby_obj["id"]
        cols = groupby_obj["cols"]
        group_by = all_data.groupby([groupby_col])
        data_arr = []
        data_arr.append({"i": "avg", "d": group_by[cols].mean()})
        #data_arr.append({"i": "skew", "d": group_by[cols].skew()})
        #data_arr.append({"i": "kurt", "d": group_by[cols].apply(pd.DataFrame.kurt)})
        data_arr.append({"i": "std", "d": group_by[cols].std()})
        data_arr.append({"i": "median", "d": group_by[cols].median()})
        data_arr.append({"i": "sum", "d": group_by[cols].sum()})
        #data_arr.append({"i": "min", "d": group_by[cols].min()})
        #data_arr.append({"i": "max", "d": group_by[cols].max()})
        #data_arr.append({"i": "nan", "d": all_data.isnull().groupby(all_data[groupby_col])[cols].sum()})
        all_data_stats = all_data.copy()
        all_data_stats.set_index([groupby_col], inplace=True)
        for obj_data in data_arr:
            names = ['%s_%s_%s' % (obj_data["i"], groupby_col, col) for col in cols]
            all_data_stats[names] = (obj_data["d"]).astype("float32")
        all_data_stats.reset_index(inplace=True)
    return all_data_stats
target_group_by =[
{"id":"day",
"cols": ["target_exp"],
}
]
all_data_stats = get_target_groupby(all_data_stats, target_group_by)
all_data_stats['min_ret'] = np.min(all_data_stats.iloc[:,4:64], axis=1)
all_data_stats['max_ret'] = np.max(all_data_stats.iloc[:,4:64], axis=1)
all_data_stats['std_ret'] = np.std(all_data_stats.iloc[:,4:64], axis=1)
all_data_stats['median_ret'] = np.median(all_data_stats.iloc[:,4:64], axis=1)
all_data_stats['sum_ret'] = np.sum(all_data_stats.iloc[:,4:64], axis=1)
all_data_stats['min_vol'] = np.min(all_data_stats.iloc[:,65:126], axis=1)
all_data_stats['max_vol'] = np.max(all_data_stats.iloc[:,65:126], axis=1)
all_data_stats['std_vol'] = np.std(all_data_stats.iloc[:,65:126], axis=1)
all_data_stats['median_vol'] = np.median(all_data_stats.iloc[:,65:126], axis=1)
def get_groupby_med(all_data, groupby_cols):
    for groupby_obj in groupby_cols:
        groupby_col = groupby_obj["id"]
        cols = groupby_obj["cols"]
        group_by = all_data.groupby([groupby_col])
        data_arr = []
        data_arr.append({"i": "median", "d": group_by[cols].median()})
        #data_arr.append({"i": "nan", "d": all_data.isnull().groupby(all_data[groupby_col])[cols].sum()})
        all_data_stats = all_data.copy()
        all_data_stats.set_index([groupby_col], inplace=True)
        for obj_data in data_arr:
            names = ['%s_%s_%s' % (obj_data["i"], groupby_col, col) for col in cols]
            all_data_stats[names] = (obj_data["d"]).astype("float32")
        all_data_stats.reset_index(inplace=True)
    return all_data_stats
dic_med =[
{"id":"day",
"cols": ["sum_ret"],
}
]
all_data_stats = get_groupby_med(all_data_stats, dic_med)
Before: compute the day-over-day (relative) variation of sum_ret
target_analysis = all_data_stats[['day','median_day_sum_ret']]
target_analysis=target_analysis.sort_values(by='day').groupby(by='day').mean()
target_analysis['median_day_sum_ret_before'] = None
for i in target_analysis.index:
    if i == 0:
        target_analysis['median_day_sum_ret_before'][i] = 0
    else:
        target_analysis['median_day_sum_ret_before'][i] = (target_analysis['median_day_sum_ret'][i] - target_analysis['median_day_sum_ret'][j]) / target_analysis['median_day_sum_ret'][j]
    j = i
target_analysis['day']=target_analysis.index
target_analysis.index.name=''
target_analysis
(target_analysis.sort_values(by = 'median_day_sum_ret_before', ascending=False)[['day','median_day_sum_ret_before']].head(n =12)).sort_values(by = 'day')
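The loop above is just a day-over-day percentage change; as a cross-check, the same series can be obtained in a single pandas call (with the first day set to 0, as in the loop):
alt_before = target_analysis['median_day_sum_ret'].sort_index().pct_change().fillna(0)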
K-means clustering on the (signed) daily variation 'median_day_sum_ret_before'
kmeans = KMeans(n_clusters=5, random_state=0).fit(target_analysis['median_day_sum_ret_before'].values.reshape(-1, 1))
target_analysis['kmeans_cluster_median_day_sum_ret_before'] = kmeans.labels_
target_analysis
K-means clustering on the absolute value of 'median_day_sum_ret_before'
kmeans_abs = KMeans(n_clusters=3, random_state=0).fit(abs(target_analysis['median_day_sum_ret_before'].values).reshape(-1, 1))
target_analysis['abs_kmeans_cluster_median_day_sum_ret_before'] = kmeans_abs.labels_
target_analysis
all_data_stats = all_data_stats.merge(target_analysis[['day','median_day_sum_ret_before','kmeans_cluster_median_day_sum_ret_before','abs_kmeans_cluster_median_day_sum_ret_before']], how="inner",on="day")
del target_analysis
train_dataset = all_data_stats[all_data_stats['day']<805]
test_dataset = all_data_stats[all_data_stats['day']>=805]
train_dataset['median_target_dummy'] = (train_dataset['median_day_target_exp'] >0.52).astype(int)
return_cols = [c for c in train_dataset.columns if c.startswith("abs_ret")]
volume_cols = [c for c in train_dataset.columns if c.startswith("rel_vol")]
all_cols = [c for c in train_dataset.columns if not (c.endswith("day_target_exp") or (c =='median_day_target_dummy') or (c == 'target') or (c== 'target_exp') or (c=='is_train'))]
date_col = ["day"]
prod_id_col=["pid"]
other_cols = ["LS" , "NLV"]
test_dataset= test_dataset[test_dataset.columns & all_cols]
train_dataset = train_dataset[train_dataset.columns & all_cols]
train_dataset['pid']=train_dataset['pid'].astype('category')
test_dataset['pid']=test_dataset['pid'].astype('category')
train_dataset['median_target_dummy']=train_dataset['median_target_dummy'].astype('category')
train_dataset['ID']=train_dataset['ID'].astype('category')
test_dataset['ID']= test_dataset['ID'].astype('category')
features = [c for c in train_dataset.columns if not (c.endswith("median_target_dummy"))]
label = ["median_target_dummy"]
X_classif = train_dataset[features]
y_classif = train_dataset[label]
def evaluate_clf(clf, features, labels, num_iters=10, test_size=0.3):
    print(clf)
    accuracy = []
    precision = []
    recall = []
    first = True
    for trial in range(num_iters):
        features_train, features_test, labels_train, labels_test = \
            train_test_split(features, labels, test_size=test_size)
        clf.fit(features_train, labels_train)
        predictions = clf.predict(features_test)
        accuracy.append(accuracy_score(labels_test, predictions))
        precision.append(precision_score(labels_test, predictions))
        recall.append(recall_score(labels_test, predictions))
        if trial % 10 == 0:
            if first:
                sys.stdout.write('\nProcessing')
            sys.stdout.write('.')
            sys.stdout.flush()
            first = False
    print("done.\n")
    print("precision: {}".format(mean(precision)))
    print("recall: {}".format(mean(recall)))
    print("accuracy: {}".format(mean(accuracy)))
    return len(labels_test)
    #return mean(precision), mean(recall)  # unreachable in the original, kept as a comment
tre_clf=DecisionTreeClassifier(random_state=42)
#evaluate_clf(tre_clf, X_classif,y_classif['median_target_dummy'])
precision: 1.0 recall: 1.0 accuracy: 1.0
205345
tre_clf.fit(X_classif,y_classif['median_target_dummy'])
# save the classifier
#with open('/content/drive/MyDrive/M2/U4_Prediction_stock_auction_volumes/notebook/rf_classifier.pkl', 'wb') as fid:
#pickle.dump(tre_clf, fid)
# load it again
with open('/content/drive/MyDrive/M2/U4_Prediction_stock_auction_volumes/notebook/rf_classifier.pkl', 'rb') as fid:
    tre_clf = pickle.load(fid)
test_dataset['median_target_dummy'] = tre_clf.predict(test_dataset)
data_dir = "/content/drive/MyDrive/M2/U4_Prediction_stock_auction_volumes/dataset"
data_list = glob.glob(os.path.join(data_dir, '**.csv'))
y_train = pd.read_csv("%s/y_train.csv" % data_dir, sep=",")
train_dataset
train_dataset=train_dataset.merge(y_train, on="ID")
del y_train
train_dataset.to_csv('/content/drive/MyDrive/M2/U4_Prediction_stock_auction_volumes/dataset/clean_dataset/train_dataset.csv',sep=',', index=False)
test_dataset.to_csv('/content/drive/MyDrive/M2/U4_Prediction_stock_auction_volumes/dataset/clean_dataset/test_dataset.csv',sep=',', index=False)
%reset -f
import pandas as pd
import numpy as np
import os
import glob
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import TimeSeriesSplit
import xgboost
from sklearn.model_selection import train_test_split
import delayed
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score
import lightgbm as lgb
from lightgbm import LGBMRegressor
from sklearn.datasets import make_moons
from sklearn.model_selection import RandomizedSearchCV
import math
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.tree import DecisionTreeClassifier
import sys
from numpy import mean
import pickle
from sklearn.feature_selection import f_regression
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn import preprocessing
from sklearn.metrics import mean_squared_error
"""!rm -r /content/LightGBM
!git clone --recursive https://github.com/Microsoft/LightGBM
%cd /content/LightGBM
!mkdir build
!cmake -DUSE_GPU=1 #avoid ..
!make -j$(nproc)
!sudo apt-get -y install python-pip
!sudo -H pip install setuptools pandas numpy scipy scikit-learn -U
%cd /content/LightGBM/python-package
!sudo python setup.py install --precompile"""
data_dir = "/content/drive/MyDrive/U4_Prediction_stock_auction_volumes/dataset/clean_dataset"
data_list = glob.glob(os.path.join(data_dir, '**.csv'))
train_dataset = pd.read_csv("%s/train_dataset.csv" % data_dir, sep=",")
test_dataset = pd.read_csv("%s/test_dataset.csv" % data_dir, sep=",")
We don't need the ID column in the ML model
ID_train=train_dataset['ID']
ID_test=test_dataset['ID']
train_dataset=train_dataset.drop("ID",axis=1)
test_dataset=test_dataset.drop("ID",axis=1)
train_dataset['pid']=train_dataset['pid'].astype('category')
train_dataset['median_target_dummy']=train_dataset['median_target_dummy'].astype('category')
test_dataset['pid']=test_dataset['pid'].astype('category')
test_dataset['median_target_dummy']=test_dataset['median_target_dummy'].astype('category')
cat_features=["pid","median_target_dummy"]
#train_dataset=pd.get_dummies(train_dataset,columns=['pid','median_target_dummy'])
#test_dataset=pd.get_dummies(test_dataset,columns=['pid','median_target_dummy'])
y_train = train_dataset[['pid','target']]
train_dataset = train_dataset.drop(columns = ['target'])
features = [c for c in train_dataset.columns if c !="target"]
label = ["target"]
def getTopFeatures(train_x, train_y, n_features=15):
    f_val_dict = {}
    p_val_dict = {}
    f_val, p_val = f_regression(train_x, train_y)
    for i in range(len(f_val)):
        if math.isnan(f_val[i]):
            f_val[i] = 0.0
        f_val_dict[i] = f_val[i]
        if math.isnan(p_val[i]):
            p_val[i] = 0.0
        p_val_dict[i] = p_val[i]
    sorted_f = sorted(f_val_dict.items(), key=lambda item: item[1], reverse=True)
    sorted_p = sorted(p_val_dict.items(), key=lambda item: item[1], reverse=True)
    feature_indexs = []
    for i in range(0, n_features):
        feature_indexs.append(sorted_f[i][0])
    return feature_indexs
Selected_features = getTopFeatures(train_dataset,y_train['target'])
Selected_features = np.array(Selected_features)
train_dataset.iloc[:,Selected_features]
  | NLV | sum_ret | median_ret | median_target_dummy | std_ret | return_nan | avg_day_rel_vol0 | median_day_rel_vol0 | std_day_rel_vol0 | max_ret | day | max_vol | rel_vol0 | abs_ret15 | abs_ret9 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.646580 | 0.739680 | 0.000000 | 0 | 0.022135 | 0 | 0.049047 | 0.039607 | 0.045797 | 0.102399 | 0 | 0.076994 | 0.017012 | 0.000000 | 0.073260 |
1 | 0.835479 | 1.878094 | 0.000000 | 0 | 0.047647 | 0 | 0.049047 | 0.039607 | 0.045797 | 0.218818 | 0 | 0.135543 | 0.086902 | 0.088771 | 0.110302 |
2 | 1.270225 | 1.492592 | 0.020502 | 0 | 0.028894 | 0 | 0.049047 | 0.039607 | 0.045797 | 0.109649 | 0 | 0.056237 | 0.050771 | 0.000000 | 0.082079 |
3 | 1.288022 | 1.120654 | 0.012723 | 0 | 0.021687 | 0 | 0.049047 | 0.039607 | 0.045797 | 0.102119 | 0 | 0.038170 | 0.033444 | 0.025569 | 0.050988 |
4 | 0.553135 | 1.284760 | 0.021200 | 0 | 0.024410 | 0 | 0.049047 | 0.039607 | 0.045797 | 0.106270 | 0 | 0.082588 | 0.071315 | 0.063884 | 0.000000 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
684477 | -1.292121 | 16.987811 | 0.252966 | 1 | 0.272910 | 2 | 0.045077 | 0.035815 | 0.035454 | 1.434599 | 770 | 0.151295 | 0.019060 | 0.665004 | 0.672834 |
684478 | -1.104483 | 8.717839 | 0.102250 | 1 | 0.143534 | 5 | 0.045077 | 0.035815 | 0.035454 | 0.754875 | 770 | 0.086605 | 0.024668 | 0.040858 | 0.000000 |
684479 | -1.179556 | 5.703267 | 0.000000 | 1 | 0.127382 | 3 | 0.045077 | 0.035815 | 0.035454 | 0.431034 | 770 | 0.098189 | 0.030452 | 0.000000 | 0.428266 |
684480 | -1.370424 | 8.491105 | 0.074686 | 1 | 0.149656 | 1 | 0.045077 | 0.035815 | 0.035454 | 0.601052 | 770 | 0.092877 | 0.045159 | 0.000000 | 0.370370 |
684481 | -1.490859 | 9.398699 | 0.112740 | 1 | 0.200471 | 6 | 0.045077 | 0.035815 | 0.035454 | 1.186441 | 770 | 0.071583 | 0.007989 | 0.000000 | 0.396601 |
684482 rows × 15 columns
l_reg = LinearRegression()
rf_reg = RandomForestRegressor(max_depth = 5,max_features = 'sqrt',n_estimators = 10, random_state = 42)
gb_reg = GradientBoostingRegressor(random_state = 42)
tre_reg =DecisionTreeRegressor(random_state=42)
knn_clf = KNeighborsRegressor(n_neighbors=3)
def evaluate_pred_reg(clf, features, labels, num_iters=5, test_size=0.3):
    print(clf)
    mean_squared_error_score = []
    first = True
    for trial in range(num_iters):
        features_train, features_test, labels_train, labels_test = \
            train_test_split(features, labels, test_size=test_size)
        clf.fit(features_train, labels_train)
        predictions = clf.predict(features_test)
        mean_squared_error_score.append(mean_squared_error(labels_test, predictions))
        if trial % 10 == 0:
            if first:
                sys.stdout.write('\nProcessing')
            sys.stdout.write('.')
            sys.stdout.flush()
            first = False
    #print("done.\n")
    #print("mse: {}".format(mean(mean_squared_error_score)))
    return mean(mean_squared_error_score)
%time
#evaluate_pred_reg(l_reg, train_dataset.iloc[:,Selected_features], y_train['target'])
#evaluate_pred_reg(rf_reg, train_dataset.iloc[:,Selected_features], y_train['target'])
#evaluate_pred_reg(gb_reg, train_dataset.iloc[:,Selected_features], y_train['target'])
#evaluate_pred_reg(tre_reg, train_dataset.iloc[:,Selected_features], y_train['target'])
#evaluate_pred_reg(knn_clf,train_dataset.iloc[:,Selected_features],y_train['target'])
Get the list of unique pids
pid = list(train_dataset['pid'].unique())
train_dataset_2 = train_dataset
test_dataset_2 = test_dataset
features = [c for c in train_dataset.columns if ((c !="target") &(c !="pid") & (c !="day") & (c !="median_target_dummy"))]
scaler = preprocessing.StandardScaler()
train_dataset_2[features] = scaler.fit_transform(train_dataset_2[features])
test_dataset_2[features] = scaler.transform(test_dataset_2[features])  # transform only: reuse the scaler fitted on the training set
resultat = []
i=0
for pid_i in pid:
    resultat.append(evaluate_pred_reg(l_reg, train_dataset_2[train_dataset_2['pid'] == pid_i].iloc[:, Selected_features], y_train[y_train['pid'] == pid_i]['target']))
    print(i)
    i += 1
[Output truncated: the loop printed "LinearRegression() Processing.<i>" for each pid, i = 0 … 733.]
Processing.734 LinearRegression() Processing.735 LinearRegression() Processing.736 LinearRegression() Processing.737 LinearRegression() Processing.738 LinearRegression() Processing.739 LinearRegression() Processing.740 LinearRegression() Processing.741 LinearRegression() Processing.742 LinearRegression() Processing.743 LinearRegression() Processing.744 LinearRegression() Processing.745 LinearRegression() Processing.746 LinearRegression() Processing.747 LinearRegression() Processing.748 LinearRegression() Processing.749 LinearRegression() Processing.750 LinearRegression() Processing.751 LinearRegression() Processing.752 LinearRegression() Processing.753 LinearRegression() Processing.754 LinearRegression() Processing.755 LinearRegression() Processing.756 LinearRegression() Processing.757 LinearRegression() Processing.758 LinearRegression() Processing.759 LinearRegression() Processing.760 LinearRegression() Processing.761 LinearRegression() Processing.762 LinearRegression() Processing.763 LinearRegression() Processing.764 LinearRegression() Processing.765 LinearRegression() Processing.766 LinearRegression() Processing.767 LinearRegression() Processing.768 LinearRegression() Processing.769 LinearRegression() Processing.770 LinearRegression() Processing.771 LinearRegression() Processing.772 LinearRegression() Processing.773 LinearRegression() Processing.774 LinearRegression() Processing.775 LinearRegression() Processing.776 LinearRegression() Processing.777 LinearRegression() Processing.778 LinearRegression() Processing.779 LinearRegression() Processing.780 LinearRegression() Processing.781 LinearRegression() Processing.782 LinearRegression() Processing.783 LinearRegression() Processing.784 LinearRegression() Processing.785 LinearRegression() Processing.786 LinearRegression() Processing.787 LinearRegression() Processing.788 LinearRegression() Processing.789 LinearRegression() Processing.790 LinearRegression() Processing.791 LinearRegression() Processing.792 LinearRegression() Processing.793 LinearRegression() Processing.794 LinearRegression() Processing.795 LinearRegression() Processing.796 LinearRegression() Processing.797 LinearRegression() Processing.798 LinearRegression() Processing.799 LinearRegression() Processing.800 LinearRegression() Processing.801 LinearRegression() Processing.802 LinearRegression() Processing.803 LinearRegression() Processing.804 LinearRegression() Processing.805 LinearRegression() Processing.806 LinearRegression() Processing.807 LinearRegression() Processing.808 LinearRegression() Processing.809 LinearRegression() Processing.810 LinearRegression() Processing.811 LinearRegression() Processing.812 LinearRegression() Processing.813 LinearRegression() Processing.814 LinearRegression() Processing.815 LinearRegression() Processing.816 LinearRegression() Processing.817 LinearRegression() Processing.818 LinearRegression() Processing.819 LinearRegression() Processing.820 LinearRegression() Processing.821 LinearRegression() Processing.822 LinearRegression() Processing.823 LinearRegression() Processing.824 LinearRegression() Processing.825 LinearRegression() Processing.826 LinearRegression() Processing.827 LinearRegression() Processing.828 LinearRegression() Processing.829 LinearRegression() Processing.830 LinearRegression() Processing.831 LinearRegression() Processing.832 LinearRegression() Processing.833 LinearRegression() Processing.834 LinearRegression() Processing.835 LinearRegression() Processing.836 LinearRegression() Processing.837 LinearRegression() Processing.838 
LinearRegression() Processing.839 LinearRegression() Processing.840 LinearRegression() Processing.841 LinearRegression() Processing.842 LinearRegression() Processing.843 LinearRegression() Processing.844 LinearRegression() Processing.845 LinearRegression() Processing.846 LinearRegression() Processing.847 LinearRegression() Processing.848 LinearRegression() Processing.849 LinearRegression() Processing.850 LinearRegression() Processing.851 LinearRegression() Processing.852 LinearRegression() Processing.853 LinearRegression() Processing.854 LinearRegression() Processing.855 LinearRegression() Processing.856 LinearRegression() Processing.857 LinearRegression() Processing.858 LinearRegression() Processing.859 LinearRegression() Processing.860 LinearRegression() Processing.861 LinearRegression() Processing.862 LinearRegression() Processing.863 LinearRegression() Processing.864 LinearRegression() Processing.865 LinearRegression() Processing.866 LinearRegression() Processing.867 LinearRegression() Processing.868 LinearRegression() Processing.869 LinearRegression() Processing.870 LinearRegression() Processing.871 LinearRegression() Processing.872 LinearRegression() Processing.873 LinearRegression() Processing.874 LinearRegression() Processing.875 LinearRegression() Processing.876 LinearRegression() Processing.877 LinearRegression() Processing.878 LinearRegression() Processing.879 LinearRegression() Processing.880 LinearRegression() Processing.881 LinearRegression() Processing.882 LinearRegression() Processing.883 LinearRegression() Processing.884 LinearRegression() Processing.885 LinearRegression() Processing.886 LinearRegression() Processing.887 LinearRegression() Processing.888 LinearRegression() Processing.889 LinearRegression() Processing.890 LinearRegression() Processing.891 LinearRegression() Processing.892 LinearRegression() Processing.893 LinearRegression() Processing.894 LinearRegression() Processing.895 LinearRegression() Processing.896 LinearRegression() Processing.897 LinearRegression() Processing.898 LinearRegression() Processing.899
mean(resultat)
#list(train_dataset['pid'].unique())
0.3702027724976742
MSE by number of selected features: 50 features: 0.39; 25 features: 0.37; 20 features: 0.37; 15 features: 0.36; 10 features: 0.40
resultat_df = pd.DataFrame(resultat, columns = ['MSE'])
resultat_df['pid'] = pid
bad_pid = list(resultat_df[resultat_df['MSE']>0.4]['pid'])
len(bad_pid)
252
len(test_dataset)
311744
for pid in list(train_dataset['pid'].unique()):
    l_reg_2 = LinearRegression()
    l_reg_2.fit(train_dataset_2[train_dataset_2['pid'] == pid].iloc[:, Selected_features], y_train[y_train['pid'] == pid]['target'])
    y_test = pd.DataFrame(l_reg_2.predict(test_dataset_2[test_dataset_2['pid'] == pid].iloc[:, Selected_features]), columns=['target'])
    y_test = y_test.set_index(test_dataset_2[test_dataset_2['pid'] == pid].index)
    if pid == 360:  # 360 is the first pid returned by unique(), so it initialises y_final
        y_final = y_test
    else:
        y_final = pd.concat([y_final, y_test], axis=0)
y_final = y_final.sort_index()
y_train['day'] = train_dataset['day']
train_X_error, test_X_error, train_y_error, test_y_error = train_test_split(train_dataset_2, y_train, test_size=0.5, random_state=42)
date_series = test_X_error['day']
pid = list(train_X_error['pid'].unique())
pid[0]
183
for pid_id in pid:
    print(pid_id)
    l_reg_error = LinearRegression()
    l_reg_error.fit(train_X_error[train_X_error['pid'] == pid_id].iloc[:, Selected_features], train_y_error[train_y_error['pid'] == pid_id]['target'])
    y_test_error = pd.DataFrame(l_reg_error.predict(test_X_error[test_X_error['pid'] == pid_id].iloc[:, Selected_features]), columns=['target'])
    y_test_error = y_test_error.set_index(test_X_error[test_X_error['pid'] == pid_id].index)
    if pid_id == pid[0]:
        y_final_error = y_test_error
    else:
        y_final_error = pd.concat([y_final_error, y_test_error], axis=0)
y_final_error = y_final_error.sort_index()
test_X_error['predict'] = y_final_error
[Output truncated: the loop prints every pid id in the training split, starting 183 420 288 526 584 ... and ending ... 852 539 683 508 229 256.]
(pandas emits a SettingWithCopyWarning here: a value is being set on a copy of a slice of a DataFrame; using .loc[row_indexer, col_indexer] = value would avoid it.)
(mean_squared_error(test_y_error['target'], test_X_error['predict']))
0.3777353663860586
y_test_error = pd.DataFrame(y_test_error)
test_X_error['error']=test_y_error["target"] - test_X_error['predict']
(pandas emits the same SettingWithCopyWarning here when the new 'error' column is assigned on a slice.)
min_date = date_series.min()
max_date = date_series.max()
dates = list(range(min_date, max_date + 1))
len(dates)
805
X_error_train = test_X_error[test_X_error['day']<=600].drop(['error'], axis = 1)
y_error_train = test_X_error[test_X_error['day']<=600]['error']
X_error_test = test_X_error[test_X_error['day']>600].drop(['error'], axis = 1)
y_error_test = test_X_error[test_X_error['day']>600]['error']
Selected_features2 = [125, 497, 496, 506, 495, 126, 127, 371, 249, 494, 0, 499, 63,
17, 11,507]
X_error_train['median_target_dummy'] = X_error_train['median_target_dummy'].astype('int')
test_X_error.iloc[:,Selected_features2]
test_X_error['error']
99589 -0.236138 535740 -1.854041 228479 2.750132 214338 0.557808 242520 0.979978 ... 343544 0.557250 646428 0.265962 142340 -0.130043 143115 -0.377147 446301 0.105719 Name: error, Length: 342241, dtype: float64
def learning_rate_010_decay_power_099(current_iter):
    base_learning_rate = 0.1
    lr = base_learning_rate * np.power(.99, current_iter)
    return lr if lr > 1e-3 else 1e-3

def learning_rate_010_decay_power_0995(current_iter):
    base_learning_rate = 0.1
    lr = base_learning_rate * np.power(.995, current_iter)
    return lr if lr > 1e-3 else 1e-3

def learning_rate_005_decay_power_099(current_iter):
    base_learning_rate = 0.05
    lr = base_learning_rate * np.power(.99, current_iter)
    return lr if lr > 1e-3 else 1e-3
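As a quick numeric check (ours, not the notebook's): with the 0.99-power schedule the learning rate decays from 0.1 and hits the 1e-3 floor after about 459 boosting iterations, since 0.1 * 0.99^n = 1e-3 at n = log(0.01)/log(0.99), roughly 458.
# Illustrative only: print the decayed learning rate at a few iterations
for it in (0, 100, 200, 458, 459, 1000):
    print(it, round(learning_rate_010_decay_power_099(it), 6))
# expected: 0.1, ~0.036603, ~0.013398, ~0.001002, 0.001 (floored), 0.001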
fit_params={"early_stopping_rounds":30,
"eval_metric" : 'neg_mean_squared_error',
"eval_set" : [(X_error_test.iloc[:,Selected_features2],y_error_test)],
'eval_names': ['valid'],
'callbacks': [lgb.reset_parameter(learning_rate=learning_rate_010_decay_power_099)],
'verbose': 100,
'categorical_feature':'auto'}
from scipy.stats import randint as sp_randint
from scipy.stats import uniform as sp_uniform
param_test ={'num_leaves': sp_randint(6, 50),
'min_child_samples': sp_randint(100, 500),
'min_child_weight': [1e-5, 1e-3, 1e-2, 1e-1, 1, 1e1, 1e2, 1e3, 1e4],
'subsample': sp_uniform(loc=0.2, scale=0.8),
'colsample_bytree': sp_uniform(loc=0.4, scale=0.6),
'reg_alpha': [0, 1e-1, 1, 2, 5, 7, 10, 50, 100],
'reg_lambda': [0, 1e-1, 1, 5, 10, 20, 50, 100]}
model = LGBMRegressor( random_state=314, n_estimators=1000, device='gpu')
tscv = TimeSeriesSplit(n_splits=2)
n_HP_points_to_test = 100
gs = RandomizedSearchCV(
estimator=model,
param_distributions=param_test,
n_iter= n_HP_points_to_test,
scoring= 'neg_mean_squared_error',
cv= tscv,
refit=True,
random_state=314,
verbose=True)
#gs.fit(X_error_train.iloc[:,Selected_features2], y_error_train)
print('Best score reached: {} with params: {} '.format(gs.best_score_, gs.best_params_))
AttributeError: 'RandomizedSearchCV' object has no attribute 'best_score_'. This is expected here: the gs.fit call above is commented out, so the search was never actually run and no best score exists.
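Had the search been run, the fit_params defined above would normally be forwarded through the search object so that LightGBM receives the early-stopping and callback arguments. A sketch of the intended call (not executed in the notebook):
gs.fit(X_error_train.iloc[:, Selected_features2], y_error_train, **fit_params)
print('Best score reached: {} with params: {} '.format(gs.best_score_, gs.best_params_))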
opt_parameters = {'colsample_bytree': 0.7076074093370144,
'min_child_samples': 105,
'min_child_weight': 1e-05,
'num_leaves': 26,
'reg_alpha': 5,
'reg_lambda': 5,
'subsample': 0.7468773130235173}
LGBM_final = LGBMRegressor(**model.get_params())
evaluate_pred_reg(LGBM_final, test_X_error.iloc[:,Selected_features2], test_X_error['error'])
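evaluate_pred_reg is presumably a helper defined earlier in the notebook; a minimal stand-in consistent with how it is used here (fit the estimator on part of the given data, report the MSE on the rest) is sketched below. Note also that LGBM_final as written copies model.get_params() and so does not pick up opt_parameters; the commented line shows what we assume was intended. Both are our assumptions, not the notebook's exact code.
# Hypothetical stand-in for evaluate_pred_reg (the real helper is defined earlier in the notebook)
def evaluate_pred_reg_sketch(estimator, X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
    estimator.fit(X_tr, y_tr)
    return mean_squared_error(y_te, estimator.predict(X_te))
# Presumed intent for LGBM_final above: LGBMRegressor(**{**model.get_params(), **opt_parameters})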
Add the error correction only for the badly predicted pids
test_dataset_2['predict'] = y_final
test_dataset_2['median_target_dummy']=test_dataset_2['median_target_dummy'].astype('category')
test_dataset_2['error'] = LGBM_final.predict(test_dataset_2.iloc[:,Selected_features2])
pid_test = list(test_dataset_2['pid'].unique())
i = 0
for pid_id in pid_test:
    print(i)
    if pid_id == pid_test[0]:
        #if pid_id in bad_pid :
        #    y_predict_final = test_dataset_2[test_dataset_2['pid'] == pid_id]['predict'] + test_dataset_2[test_dataset_2['pid'] == pid_id]['error']
        #else :
        y_predict_final = test_dataset_2[test_dataset_2['pid'] == pid_id]['predict']
        y_predict_final['index'] = test_dataset_2[test_dataset_2['pid'] == pid_id].index
    else:
        if pid_id in bad_pid:
            y_predict = test_dataset_2[test_dataset_2['pid'] == pid_id]['predict']
            y_predict = y_predict + test_dataset_2[test_dataset_2['pid'] == pid_id]['error']
        else:
            y_predict = test_dataset_2[test_dataset_2['pid'] == pid_id]['predict']
        y_predict['index'] = test_dataset_2[test_dataset_2['pid'] == pid_id].index
        y_predict_final = pd.concat([y_predict_final, y_predict], axis=0)
    i += 1
Predictions
%reset -f
import pandas as pd
import numpy as np
import os
import glob
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import TimeSeriesSplit
import xgboost
from sklearn.model_selection import train_test_split
#import delayed
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score
import lightgbm as lgb
from lightgbm import LGBMRegressor
from sklearn.datasets import make_moons
from sklearn.model_selection import RandomizedSearchCV
import math
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.tree import DecisionTreeClassifier
import sys
from numpy import mean
import pickle
from sklearn.feature_selection import f_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn import preprocessing
from torch.autograd import Variable
from matplotlib import colors
from matplotlib.colors import Normalize
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.subplots import make_subplots
from tqdm import tqdm_notebook as tqdm
import joblib
import torch
from torch import nn
from fastprogress import master_bar, progress_bar
import random
data_dir = "/content/drive/MyDrive/U4_Prediction_stock_auction_volumes/dataset/clean_dataset"
data_list = glob.glob(os.path.join(data_dir, '**.csv'))
train_dataset = pd.read_csv("%s/train_dataset.csv" % data_dir, sep=",")
test_dataset = pd.read_csv("%s/test_dataset.csv" % data_dir, sep=",")
train_dataset.head()
[train_dataset.head() output: 5 rows × 509 columns. Visible columns: day, ID, pid, abs_ret0 ... abs_ret36, ..., median_day_abs_ret36 ... median_day_abs_ret60, min_ret, max_ret, std_ret, median_ret, sum_ret, min_vol, max_vol, std_vol, median_vol, median_day_sum_ret, median_day_sum_ret_before, kmeans_cluster_median_day_sum_ret_before, abs_kmeans_cluster_median_day_sum_ret_before, median_target_dummy, target. Example first rows (all day 0): pid 360 with target -3.4036, pid 203 with target -2.1938, pid 398 with target -2.3875, pid 258 with target -2.4675, pid 444 with target -2.3803.]
We don't need the ID column in the ML models.
ID_train=train_dataset['ID']
ID_test=test_dataset['ID']
train_dataset=train_dataset.drop("ID",axis=1)
test_dataset=test_dataset.drop("ID",axis=1)
y_train = train_dataset[['pid','target']]
#features = [c for c in train_dataset.columns if ((c !="target") &(c !="pid") & (c !="day") & (c !="median_target_dummy"))]
#scaler = preprocessing.StandardScaler()
#train_dataset[features] = scaler.fit_transform(train_dataset[features])
#test_dataset[features]=scaler.fit_transform(test_dataset[features])
features = [c for c in train_dataset.columns if ((c !="target") & (c !="pid") & (c !="day") & (c !="median_target_dummy"))]
x_scaler = preprocessing.MinMaxScaler((-1,1))
train_dataset[features] = x_scaler.fit_transform(train_dataset[features])
target = [c for c in train_dataset.columns if (c =="target")]
y_scaler = preprocessing.MinMaxScaler((-1,1))
train_dataset['target'] = y_scaler.fit_transform(train_dataset[target])
#test_dataset['target']=0
test_dataset[features] = x_scaler.transform(test_dataset[features])  # apply the scaler fitted on the training set (fit_transform here would refit it on test data)
#test_dataset[target]=y_scaler.fit_transform(test_dataset[target])
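One caveat worth noting (our addition): the target was scaled to (-1, 1) with y_scaler, so any prediction produced downstream lives in that scaled space and has to be mapped back before it can be read as a log auction-volume fraction. A minimal sketch, where pred_scaled is a hypothetical 1-D array of model outputs:
# Hypothetical back-transform of scaled predictions to the original log-volume scale
pred_log_volume = y_scaler.inverse_transform(np.asarray(pred_scaled).reshape(-1, 1)).ravel()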
pid_selection = pd.DataFrame(train_dataset['pid'].value_counts()).sort_values(by=['pid'], ascending=False)
346+10
356
pid_selection=list(pid_selection[pid_selection['pid']>790].index)
SEED = 42
def seed_everything(seed):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
seed_everything(SEED)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
### This function builds sliding windows: sequences of seq_length days of inputs, with the following labels_length days as labels and future features ###
def sliding_windows(data, seq_length, labels_length):
    x = []
    y = []
    z = []
    for i in range(len(data) - (seq_length + labels_length)):
        _x = data.iloc[i:(i + seq_length), :]
        _y = data.iloc[(i + seq_length):(i + seq_length + labels_length), 506:507]
        _z = data.iloc[(i + seq_length):(i + seq_length + labels_length), :506]
        x.append(np.array(_x))
        y.append(np.array(_y))
        z.append(np.array(_z))
    return x, y, z
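As a quick, illustrative sanity check (not part of the original notebook): on a toy frame of 30 rows and 507 columns, with seq_length=10 and labels_length=5, the function returns 30 - (10 + 5) = 15 windows, each input of shape (10, 507), each label slice of shape (5, 1) (column 506, which is the scaled target in the real data) and each future-features slice of shape (5, 506).
# Toy check of sliding_windows (illustrative only)
toy = pd.DataFrame(np.random.rand(30, 507))
x_w, y_w, z_w = sliding_windows(toy, seq_length=10, labels_length=5)
print(len(x_w), x_w[0].shape, y_w[0].shape, z_w[0].shape)   # 15 (10, 507) (5, 1) (5, 506)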
class Encoder(nn.Module):
    def __init__(self, seq_len, n_features, embedding_dim=64):
        super(Encoder, self).__init__()
        self.seq_len, self.n_features = seq_len, n_features
        self.embedding_dim, self.hidden_dim = embedding_dim, embedding_dim
        self.num_layers = 3
        self.rnn1 = nn.LSTM(
            input_size=n_features,
            hidden_size=self.hidden_dim,
            num_layers=3,
            batch_first=True,
            dropout=0.35
        )
    def forward(self, x):
        x = x.reshape((1, self.seq_len, self.n_features))
        h_1 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_dim).to(device))
        c_1 = Variable(torch.zeros(self.num_layers, x.size(0), self.hidden_dim).to(device))
        x, (hidden, cell) = self.rnn1(x, (h_1, c_1))
        #return hidden_n.reshape((self.n_features, self.embedding_dim))
        return hidden, cell
class Decoder(nn.Module):
    def __init__(self, seq_len, input_dim=64, n_features=1):
        super(Decoder, self).__init__()
        self.seq_len, self.input_dim = seq_len, input_dim
        self.hidden_dim, self.n_features = input_dim, n_features
        self.rnn1 = nn.LSTM(
            input_size=n_features,
            hidden_size=input_dim,
            num_layers=3,
            batch_first=True,
            dropout=0.35
        )
        self.output_layer = nn.Linear(self.hidden_dim, n_features)
    def forward(self, x, input_hidden, input_cell):
        x = x.reshape((1, 1, self.n_features))
        #print("decode input", x.size())
        x, (hidden_n, cell_n) = self.rnn1(x, (input_hidden, input_cell))
        x = self.output_layer(x)
        return x, hidden_n, cell_n
class Seq2Seq(nn.Module):
    def __init__(self, seq_len, n_features, embedding_dim=64, output_length=0):
        super(Seq2Seq, self).__init__()
        self.encoder = Encoder(seq_len, n_features, embedding_dim).to(device)
        self.n_features = n_features
        self.output_length = output_length
        self.decoder = Decoder(seq_len, embedding_dim, n_features).to(device)
    def forward(self, x, prev_y, features):
        hidden, cell = self.encoder(x)
        # Prepare placeholder for the decoder outputs
        targets_ta = []
        # the previous output becomes the next input to the LSTM cell
        dec_input = prev_y
        #dec_input = torch.cat([prev_output, curr_features], dim=1)
        # iterate over the LSTM decoder, once per required output day
        for out_days in range(self.output_length):
            prev_x, prev_hidden, prev_cell = self.decoder(dec_input, hidden, cell)
            hidden, cell = prev_hidden, prev_cell
            prev_x = prev_x[:, :, 0:1]
            #print("preve x shape is:", prev_x.size())
            #print("features shape is:", features[out_days+1].size())
            if out_days + 1 < self.output_length:
                dec_input = torch.cat([prev_x, features[out_days+1].reshape(1, 1, 506)], dim=2)
            targets_ta.append(prev_x.reshape(1))
        targets = torch.stack(targets_ta)
        return targets
def init_weights(m):
    for name, param in m.named_parameters():
        nn.init.uniform_(param.data, -0.08, 0.08)
def train_model(model, TrainX, Trainy, ValidX, Validy, Valid_features, seq_length, n_epochs, train_features, optimizer, criterion, scheduler):
    history = dict(train=[], val=[])
    #best_model_wts = copy.deepcopy(model.state_dict())
    best_loss = 10000.0
    mb = master_bar(range(1, n_epochs + 1))
    for epoch in mb:
        model = model.train()
        train_losses = []
        for i in progress_bar(range(TrainX.size()[0]), parent=mb):
            seq_inp = TrainX[i, :, :].to(device)
            seq_true = Trainy[i, :, :].to(device)
            features = train_features[i, :, :].to(device)
            optimizer.zero_grad()
            seq_pred = model(seq_inp, seq_inp[seq_length-1:seq_length, :], features)
            loss = criterion(seq_pred, seq_true)
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1)
            optimizer.step()
            train_losses.append(loss.item())
        val_losses = []
        model = model.eval()
        with torch.no_grad():
            for i in progress_bar(range(ValidX.size()[0]), parent=mb):
                seq_inp = ValidX[i, :, :].to(device)
                seq_true = Validy[i, :, :].to(device)
                features = Valid_features[i, :, :].to(device)
                seq_pred = model(seq_inp, seq_inp[seq_length-1:seq_length, :], features)
                loss = criterion(seq_pred, seq_true)
                val_losses.append(loss.item())
        train_loss = np.mean(train_losses)
        val_loss = np.mean(val_losses)
        history['train'].append(train_loss)
        history['val'].append(val_loss)
        if val_loss < best_loss:
            best_loss = val_loss
            torch.save(model.state_dict(), 'best_model_n_features.pt')
            print("saved best model epoch:", epoch, "val loss is:", val_loss)
        print(f'Epoch {epoch}: train loss {train_loss} val loss {val_loss}')
        scheduler.step(val_loss)
    #model.load_state_dict(best_model_wts)
    return model.eval(), history
def seq2seq_global(pid, seq_length, epoch):
    test_pid = test_dataset[test_dataset['pid'] == pid]
    test_pid["target"] = None
    train_pid = train_dataset[train_dataset['pid'] == pid]
    all_pid = pd.concat([train_pid, test_pid])
    print(len(all_pid))
    if (len(all_pid) == 0):
        return ('No value for this pid')
    all_pid = all_pid.set_index(['day'])
    train_size = int((all_pid.shape[0] - len(test_pid)) * 0.55)
    valid_size = (all_pid.shape[0] - len(test_pid)) - train_size
    print("train size is:", train_size)
    print("validation size is:", valid_size)
    train_data = all_pid.iloc[0:train_size, :]
    valid_data = all_pid.iloc[train_size:train_size + valid_size, :]
    print("train data shape is:", train_data.shape)
    print("validation data shape is:", valid_data.shape)
    labels_length = len(test_pid)
    train_X, train_y, train_features = sliding_windows(train_data, seq_length, labels_length)
    print("train X has:", len(train_X), "series")
    print("train labels has:", len(train_y), "series")
    valid_X, valid_y, valid_features = sliding_windows(valid_data, seq_length, labels_length)
    if (len(valid_X) == 0):
        return ('no validation dataset for this pid')
    print("validiation X has:", len(valid_X), "series")
    print("Validiation labels has:", len(valid_y), "series")
    trainX = Variable(torch.Tensor(train_X))
    trainy = Variable(torch.Tensor(train_y))
    train_features = Variable(torch.Tensor(train_features))
    validX = Variable(torch.Tensor(valid_X))
    validy = Variable(torch.Tensor(valid_y))
    valid_features = Variable(torch.Tensor(valid_features))
    print("trainX shape is:", trainX.size())
    print("trainy shape is:", trainy.size())
    print("train features shape is:", train_features.size())
    print("validX shape is:", validX.size())
    print("validy shape is:", validy.size())
    print("valid features shape is:", valid_features.size())
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    n_features = trainX.shape[2]
    model = Seq2Seq(seq_length, n_features, 512, output_length=len(test_pid))
    model = model.to(device)
    model
    print(model)
    model.apply(init_weights)
    #optimizer = torch.optim.RMSprop(model.parameters())
    #optimizer = torch.optim.Adam(model.parameters(), lr=4e-3, weight_decay=1e-5)
    #optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    criterion = torch.nn.MSELoss().to(device)
    #lambda1 = lambda epoch: 0.65 ** epoch
    #scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda1)
    #scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5e-3, eta_min=1e-8, last_epoch=-1)
    #scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=10, factor=0.5, min_lr=1e-7, eps=1e-08)
    optimizer = torch.optim.AdamW(model.parameters(), lr=0.0001)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5, mode='min', factor=0.7, verbose=True, min_lr=1e-5)
    model, history = train_model(
        model=model,
        TrainX=trainX, Trainy=trainy,
        ValidX=validX, Validy=validy,
        Valid_features=valid_features,
        seq_length=seq_length,
        n_epochs=epoch,
        train_features=train_features,
        optimizer=optimizer,
        criterion=criterion,
        scheduler=scheduler
    )
    TestX = np.array(all_pid.iloc[-2 * len(test_pid):-len(test_pid), :])
    Testy = np.array(all_pid.iloc[-len(test_pid):, :])
    TestX = Variable(torch.Tensor(TestX))
    Testy = Variable(torch.Tensor(Testy))
    model.eval()
    with torch.no_grad():
        seq_inp = TestX.to(device)
        seq_pred = model(TestX[0:seq_length, :].to(device), seq_inp[seq_length-1:seq_length, :], seq_inp[:, :506])
        #seq_pred = model(TestX[-seq_length:, :].to(device), seq_inp[seq_length-1:seq_length, :], seq_inp[:, :506])
        data_predict = seq_pred.cpu().numpy()
    #labels = Testy
    #data_predict.flatten()
    #original_data = all_pid.iloc[-len(test_pid):, :]
    #final = pd.DataFrame(original_data['target'])
    pred = data_predict.flatten()
    pred_df = pd.DataFrame(pred)
    pred_df = pred_df.set_index([test_pid.index])
    test_pid['target'] = pred_df
    return test_pid
df_princip = pd.DataFrame([])
df_princip = pd.concat([df_princip, df_princip], axis = 0)
pid_selection[0:3]
[214, 663, 291]
df_princip = pd.DataFrame([])
for pid in progress_bar(pid_selection[250:300]):
    df_aux = seq2seq_global(pid=pid, seq_length=10, epoch=8)
    df_princip = pd.concat([df_princip, df_aux], axis=0)
df_princip.to_csv('/content/drive/MyDrive/U4_Prediction_stock_auction_volumes/prediction/lstm_pid/250_300.csv', sep=',', index=True)
[Output truncated: per-pid training logs for the pids in pid_selection[250:300], one block per pid, all following the same pattern. Representative run: 1147 or 1148 rows for the pid (440 training days, 361 validation days, plus the 346- or 347-day test horizon); 83-84 training windows and 4-5 validation windows of shape (10, 507), labels of shape (346-347, 1), future features of shape (346-347, 506); model summary Seq2Seq(Encoder: LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35); Decoder: LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) + Linear(in_features=512, out_features=507)). Each model trains for 8 epochs with the best checkpoint kept on validation loss; across the logged runs the best validation MSE per pid lies between roughly 0.008 and 0.030 (first run: 0.0281 at epoch 1 improving to 0.0214 at epoch 8).]
saved best model epoch: 1 val loss is: 0.010966712236404419 Epoch 1: train loss 0.013024195241520093 val loss 0.010966712236404419 Epoch 2: train loss 0.009970704115749825 val loss 0.012485087849199772 Epoch 3: train loss 0.009387591232856115 val loss 0.01384250670671463 Epoch 4: train loss 0.008786952028804947 val loss 0.013580685667693614 Epoch 5: train loss 0.008546433162077196 val loss 0.011857693083584309 Epoch 6: train loss 0.00837054360835325 val loss 0.01354286503046751 Epoch 7: train loss 0.008265527913213841 val loss 0.012693470157682896 Epoch 7: reducing learning rate of group 0 to 7.0000e-05. Epoch 8: train loss 0.008176606536532441 val loss 0.014306812919676304 1147 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 84 series train labels has: 84 series validiation X has: 5 series Validiation labels has: 5 series trainX shape is: torch.Size([84, 10, 507]) trainy shape is: torch.Size([84, 346, 1]) train features shape is: torch.Size([84, 346, 506]) validX shape is: torch.Size([5, 10, 507]) validy shape is: torch.Size([5, 346, 1]) valid features shape is: torch.Size([5, 346, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.014959665946662426 Epoch 1: train loss 0.021845070822607903 val loss 0.014959665946662426 saved best model epoch: 2 val loss is: 0.014542113989591599 Epoch 2: train loss 0.01460221926459954 val loss 0.014542113989591599 saved best model epoch: 3 val loss is: 0.014184136874973774 Epoch 3: train loss 0.014069749264135248 val loss 0.014184136874973774 Epoch 4: train loss 0.0135891757506345 val loss 0.014396696910262108 Epoch 5: train loss 0.012970597627350972 val loss 0.015514268167316913 Epoch 6: train loss 0.012858074180604447 val loss 0.015198434889316558 Epoch 7: train loss 0.012175099713550437 val loss 0.016466397792100906 Epoch 8: train loss 0.01223073469563609 val loss 0.014763793349266053 1148 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 83 series train labels has: 83 series validiation X has: 4 series Validiation labels has: 4 series trainX shape is: torch.Size([83, 10, 507]) trainy shape is: torch.Size([83, 347, 1]) train features shape is: torch.Size([83, 347, 506]) validX shape is: torch.Size([4, 10, 507]) validy shape is: torch.Size([4, 347, 1]) valid features shape is: torch.Size([4, 347, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.020190811716020107 Epoch 1: train loss 0.022231685670923037 val loss 0.020190811716020107 Epoch 2: train loss 0.019295742735266685 val loss 0.020251433365046978 Epoch 3: train loss 0.018910494532032186 val loss 0.020892996340990067 Epoch 4: train loss 0.018144074401043982 val loss 0.021352517418563366 Epoch 5: train loss 0.017045841989926546 val loss 0.02140090800821781 Epoch 6: train loss 0.016400209807577622 val loss 0.022557467687875032 Epoch 7: train loss 0.01602406778473811 val loss 0.02212440362200141 Epoch 7: reducing learning rate of group 0 to 7.0000e-05. Epoch 8: train loss 0.015437888867973563 val loss 0.024813936557620764 1147 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 84 series train labels has: 84 series validiation X has: 5 series Validiation labels has: 5 series trainX shape is: torch.Size([84, 10, 507]) trainy shape is: torch.Size([84, 346, 1]) train features shape is: torch.Size([84, 346, 506]) validX shape is: torch.Size([5, 10, 507]) validy shape is: torch.Size([5, 346, 1]) valid features shape is: torch.Size([5, 346, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.013164098374545574 Epoch 1: train loss 0.01489217880935896 val loss 0.013164098374545574 Epoch 2: train loss 0.012754622325744657 val loss 0.014462347328662872 Epoch 3: train loss 0.012433396785386972 val loss 0.016141487285494804 Epoch 4: train loss 0.012274353720602534 val loss 0.016288101300597192 Epoch 5: train loss 0.012099924275562876 val loss 0.01772027388215065 Epoch 6: train loss 0.01184494013986772 val loss 0.017316967621445654 Epoch 7: train loss 0.011586978089153058 val loss 0.01975127197802067 Epoch 7: reducing learning rate of group 0 to 7.0000e-05. Epoch 8: train loss 0.01131754141256568 val loss 0.018372759222984314 1147 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 84 series train labels has: 84 series validiation X has: 5 series Validiation labels has: 5 series trainX shape is: torch.Size([84, 10, 507]) trainy shape is: torch.Size([84, 346, 1]) train features shape is: torch.Size([84, 346, 506]) validX shape is: torch.Size([5, 10, 507]) validy shape is: torch.Size([5, 346, 1]) valid features shape is: torch.Size([5, 346, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.017112010344862937 Epoch 1: train loss 0.02151948382102308 val loss 0.017112010344862937 saved best model epoch: 2 val loss is: 0.0166277751326561 Epoch 2: train loss 0.014921831155550621 val loss 0.0166277751326561 Epoch 3: train loss 0.014666681211175663 val loss 0.01667882949113846 Epoch 4: train loss 0.014535534361909543 val loss 0.01667424701154232 Epoch 5: train loss 0.014308384752699308 val loss 0.01675509437918663 Epoch 6: train loss 0.014046995856222651 val loss 0.017667219042778015 Epoch 7: train loss 0.01396703772202489 val loss 0.01668851003050804 Epoch 8: train loss 0.013682414899535832 val loss 0.017268473282456398 Epoch 8: reducing learning rate of group 0 to 7.0000e-05. 1148 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 83 series train labels has: 83 series validiation X has: 4 series Validiation labels has: 4 series trainX shape is: torch.Size([83, 10, 507]) trainy shape is: torch.Size([83, 347, 1]) train features shape is: torch.Size([83, 347, 506]) validX shape is: torch.Size([4, 10, 507]) validy shape is: torch.Size([4, 347, 1]) valid features shape is: torch.Size([4, 347, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.015205624746158719 Epoch 1: train loss 0.016135292363364296 val loss 0.015205624746158719 saved best model epoch: 2 val loss is: 0.015169401420280337 Epoch 2: train loss 0.013607606864029384 val loss 0.015169401420280337 Epoch 3: train loss 0.012927827144782227 val loss 0.015285322442650795 saved best model epoch: 4 val loss is: 0.01469058683142066 Epoch 4: train loss 0.012478400100336736 val loss 0.01469058683142066 saved best model epoch: 5 val loss is: 0.014537865994498134 Epoch 5: train loss 0.012093275695680136 val loss 0.014537865994498134 Epoch 6: train loss 0.011671197600662708 val loss 0.01489582797512412 saved best model epoch: 7 val loss is: 0.014214539900422096 Epoch 7: train loss 0.01113715536712882 val loss 0.014214539900422096 Epoch 8: train loss 0.010382379966238177 val loss 0.015344538725912571 1148 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 83 series train labels has: 83 series validiation X has: 4 series Validiation labels has: 4 series trainX shape is: torch.Size([83, 10, 507]) trainy shape is: torch.Size([83, 347, 1]) train features shape is: torch.Size([83, 347, 506]) validX shape is: torch.Size([4, 10, 507]) validy shape is: torch.Size([4, 347, 1]) valid features shape is: torch.Size([4, 347, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.02098910929635167 Epoch 1: train loss 0.02183373187141246 val loss 0.02098910929635167 saved best model epoch: 2 val loss is: 0.020979054272174835 Epoch 2: train loss 0.018669372805988932 val loss 0.020979054272174835 saved best model epoch: 3 val loss is: 0.017973075155168772 Epoch 3: train loss 0.018113320334309554 val loss 0.017973075155168772 saved best model epoch: 4 val loss is: 0.01739439321681857 Epoch 4: train loss 0.017396412587848055 val loss 0.01739439321681857 saved best model epoch: 5 val loss is: 0.016439820174127817 Epoch 5: train loss 0.01707279433058687 val loss 0.016439820174127817 Epoch 6: train loss 0.01650673265467925 val loss 0.018318782094866037 Epoch 7: train loss 0.016218577534050108 val loss 0.017679209355264902 Epoch 8: train loss 0.01660407993224371 val loss 0.019378353841602802 1148 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 83 series train labels has: 83 series validiation X has: 4 series Validiation labels has: 4 series trainX shape is: torch.Size([83, 10, 507]) trainy shape is: torch.Size([83, 347, 1]) train features shape is: torch.Size([83, 347, 506]) validX shape is: torch.Size([4, 10, 507]) validy shape is: torch.Size([4, 347, 1]) valid features shape is: torch.Size([4, 347, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.03442043624818325 Epoch 1: train loss 0.021191165098045247 val loss 0.03442043624818325 saved best model epoch: 2 val loss is: 0.031434847973287106 Epoch 2: train loss 0.018877854166799282 val loss 0.031434847973287106 saved best model epoch: 3 val loss is: 0.0286637875251472 Epoch 3: train loss 0.018387557972357214 val loss 0.0286637875251472 saved best model epoch: 4 val loss is: 0.028158247005194426 Epoch 4: train loss 0.018098168041422426 val loss 0.028158247005194426 Epoch 5: train loss 0.018004386775554663 val loss 0.02838826458901167 Epoch 6: train loss 0.017828026719122047 val loss 0.034308554604649544 Epoch 7: train loss 0.017714400383004224 val loss 0.030002973042428493 Epoch 8: train loss 0.017138237840529667 val loss 0.02885685209184885 1148 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 83 series train labels has: 83 series validiation X has: 4 series Validiation labels has: 4 series trainX shape is: torch.Size([83, 10, 507]) trainy shape is: torch.Size([83, 347, 1]) train features shape is: torch.Size([83, 347, 506]) validX shape is: torch.Size([4, 10, 507]) validy shape is: torch.Size([4, 347, 1]) valid features shape is: torch.Size([4, 347, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.015227733412757516 Epoch 1: train loss 0.018918600898370684 val loss 0.015227733412757516 saved best model epoch: 2 val loss is: 0.01513720746152103 Epoch 2: train loss 0.015379210608641067 val loss 0.01513720746152103 Epoch 3: train loss 0.015104946119329297 val loss 0.015402469784021378 Epoch 4: train loss 0.014825314687318113 val loss 0.015212445054203272 Epoch 5: train loss 0.014558545998241528 val loss 0.015390918590128422 Epoch 6: train loss 0.014323898434010615 val loss 0.015498350374400616 Epoch 7: train loss 0.014269261834133103 val loss 0.018121593166142702 Epoch 8: train loss 0.013668799380132234 val loss 0.021221890579909086 Epoch 8: reducing learning rate of group 0 to 7.0000e-05. 1148 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 83 series train labels has: 83 series validiation X has: 4 series Validiation labels has: 4 series trainX shape is: torch.Size([83, 10, 507]) trainy shape is: torch.Size([83, 347, 1]) train features shape is: torch.Size([83, 347, 506]) validX shape is: torch.Size([4, 10, 507]) validy shape is: torch.Size([4, 347, 1]) valid features shape is: torch.Size([4, 347, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.013226635754108429 Epoch 1: train loss 0.016044948092964757 val loss 0.013226635754108429 saved best model epoch: 2 val loss is: 0.012970123440027237 Epoch 2: train loss 0.013096151200224119 val loss 0.012970123440027237 Epoch 3: train loss 0.012722889275615474 val loss 0.013319585705175996 saved best model epoch: 4 val loss is: 0.012775912648066878 Epoch 4: train loss 0.012560197623469025 val loss 0.012775912648066878 Epoch 5: train loss 0.012282265103366002 val loss 0.012980219675228 saved best model epoch: 6 val loss is: 0.012427295092493296 Epoch 6: train loss 0.011875061037759465 val loss 0.012427295092493296 Epoch 7: train loss 0.011166983575795788 val loss 0.013699022587388754 Epoch 8: train loss 0.010854268569992968 val loss 0.015928468201309443 1147 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 84 series train labels has: 84 series validiation X has: 5 series Validiation labels has: 5 series trainX shape is: torch.Size([84, 10, 507]) trainy shape is: torch.Size([84, 346, 1]) train features shape is: torch.Size([84, 346, 506]) validX shape is: torch.Size([5, 10, 507]) validy shape is: torch.Size([5, 346, 1]) valid features shape is: torch.Size([5, 346, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.028280925005674362 Epoch 1: train loss 0.019775247434154153 val loss 0.028280925005674362 Epoch 2: train loss 0.01567284884818253 val loss 0.030015609040856362 saved best model epoch: 3 val loss is: 0.027744071930646895 Epoch 3: train loss 0.015309822202349702 val loss 0.027744071930646895 saved best model epoch: 4 val loss is: 0.025708923861384392 Epoch 4: train loss 0.014967368195010792 val loss 0.025708923861384392 Epoch 5: train loss 0.014803968358873612 val loss 0.029012543708086015 Epoch 6: train loss 0.014158132652352964 val loss 0.0329549215734005 saved best model epoch: 7 val loss is: 0.023839811235666274 Epoch 7: train loss 0.013760681641066358 val loss 0.023839811235666274 Epoch 8: train loss 0.013555467672025165 val loss 0.02549779824912548 1148 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 83 series train labels has: 83 series validiation X has: 4 series Validiation labels has: 4 series trainX shape is: torch.Size([83, 10, 507]) trainy shape is: torch.Size([83, 347, 1]) train features shape is: torch.Size([83, 347, 506]) validX shape is: torch.Size([4, 10, 507]) validy shape is: torch.Size([4, 347, 1]) valid features shape is: torch.Size([4, 347, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.017300904262810946 Epoch 1: train loss 0.021283909005213934 val loss 0.017300904262810946 saved best model epoch: 2 val loss is: 0.017181545961648226 Epoch 2: train loss 0.01619386536080435 val loss 0.017181545961648226 Epoch 3: train loss 0.015703756429524308 val loss 0.01739394525066018 Epoch 4: train loss 0.015579846137797976 val loss 0.017365333158522844 Epoch 5: train loss 0.015354345900466642 val loss 0.01760115148499608 Epoch 6: train loss 0.015276684739952346 val loss 0.017653692048043013 Epoch 7: train loss 0.01510164814900203 val loss 0.018763678148388863 Epoch 8: train loss 0.014948111207011235 val loss 0.017416466027498245 Epoch 8: reducing learning rate of group 0 to 7.0000e-05. 1148 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 83 series train labels has: 83 series validiation X has: 4 series Validiation labels has: 4 series trainX shape is: torch.Size([83, 10, 507]) trainy shape is: torch.Size([83, 347, 1]) train features shape is: torch.Size([83, 347, 506]) validX shape is: torch.Size([4, 10, 507]) validy shape is: torch.Size([4, 347, 1]) valid features shape is: torch.Size([4, 347, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.019285273738205433 Epoch 1: train loss 0.016546568070280265 val loss 0.019285273738205433 saved best model epoch: 2 val loss is: 0.018670279998332262 Epoch 2: train loss 0.012371358248483703 val loss 0.018670279998332262 saved best model epoch: 3 val loss is: 0.018648356664925814 Epoch 3: train loss 0.012011683770421758 val loss 0.018648356664925814 saved best model epoch: 4 val loss is: 0.017541783396154642 Epoch 4: train loss 0.011482842625623726 val loss 0.017541783396154642 saved best model epoch: 5 val loss is: 0.0171653819270432 Epoch 5: train loss 0.011396192575255072 val loss 0.0171653819270432 Epoch 6: train loss 0.011005618385072932 val loss 0.018164426553994417 Epoch 7: train loss 0.010295962432332068 val loss 0.02018228964880109 Epoch 8: train loss 0.008886760588825107 val loss 0.01846550963819027 1148 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 83 series train labels has: 83 series validiation X has: 4 series Validiation labels has: 4 series trainX shape is: torch.Size([83, 10, 507]) trainy shape is: torch.Size([83, 347, 1]) train features shape is: torch.Size([83, 347, 506]) validX shape is: torch.Size([4, 10, 507]) validy shape is: torch.Size([4, 347, 1]) valid features shape is: torch.Size([4, 347, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.018206078093498945 Epoch 1: train loss 0.020126427961401194 val loss 0.018206078093498945 saved best model epoch: 2 val loss is: 0.018024377524852753 Epoch 2: train loss 0.016365895519055516 val loss 0.018024377524852753 Epoch 3: train loss 0.015658448528812593 val loss 0.01807125099003315 Epoch 4: train loss 0.014866173065390932 val loss 0.018032348714768887 Epoch 5: train loss 0.013914803054228604 val loss 0.01895546680316329 Epoch 6: train loss 0.012724222772451768 val loss 0.01840045629069209 Epoch 7: train loss 0.011589826320309237 val loss 0.020018348935991526 Epoch 8: train loss 0.010181635034730635 val loss 0.020148200914263725 Epoch 8: reducing learning rate of group 0 to 7.0000e-05. 1148 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 83 series train labels has: 83 series validiation X has: 4 series Validiation labels has: 4 series trainX shape is: torch.Size([83, 10, 507]) trainy shape is: torch.Size([83, 347, 1]) train features shape is: torch.Size([83, 347, 506]) validX shape is: torch.Size([4, 10, 507]) validy shape is: torch.Size([4, 347, 1]) valid features shape is: torch.Size([4, 347, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.019210746977478266 Epoch 1: train loss 0.018789479778294104 val loss 0.019210746977478266 saved best model epoch: 2 val loss is: 0.018433145247399807 Epoch 2: train loss 0.014104839908071312 val loss 0.018433145247399807 Epoch 3: train loss 0.013502216796918088 val loss 0.02058341819792986 Epoch 4: train loss 0.01316139320786818 val loss 0.01920843217521906 Epoch 5: train loss 0.012600415498467094 val loss 0.022248287685215473 Epoch 6: train loss 0.012116464831114533 val loss 0.018949071411043406 Epoch 7: train loss 0.011513305435248887 val loss 0.0227605695836246 Epoch 8: train loss 0.010824513655290547 val loss 0.023020644672214985 Epoch 8: reducing learning rate of group 0 to 7.0000e-05. 1148 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 83 series train labels has: 83 series validiation X has: 4 series Validiation labels has: 4 series trainX shape is: torch.Size([83, 10, 507]) trainy shape is: torch.Size([83, 347, 1]) train features shape is: torch.Size([83, 347, 506]) validX shape is: torch.Size([4, 10, 507]) validy shape is: torch.Size([4, 347, 1]) valid features shape is: torch.Size([4, 347, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.014182951068505645 Epoch 1: train loss 0.015157732181251049 val loss 0.014182951068505645 saved best model epoch: 2 val loss is: 0.014170864131301641 Epoch 2: train loss 0.013253443010420684 val loss 0.014170864131301641 saved best model epoch: 3 val loss is: 0.014102759072557092 Epoch 3: train loss 0.013016681074647301 val loss 0.014102759072557092 Epoch 4: train loss 0.012868770229888249 val loss 0.01411436963826418 saved best model epoch: 5 val loss is: 0.013993639731779695 Epoch 5: train loss 0.012842784400086805 val loss 0.013993639731779695 Epoch 6: train loss 0.01283735841378031 val loss 0.014033997198566794 Epoch 7: train loss 0.012698653434593993 val loss 0.014030550606548786 Epoch 8: train loss 0.012480765771883798 val loss 0.01427493616938591 1148 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 83 series train labels has: 83 series validiation X has: 4 series Validiation labels has: 4 series trainX shape is: torch.Size([83, 10, 507]) trainy shape is: torch.Size([83, 347, 1]) train features shape is: torch.Size([83, 347, 506]) validX shape is: torch.Size([4, 10, 507]) validy shape is: torch.Size([4, 347, 1]) valid features shape is: torch.Size([4, 347, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.018830459099262953 Epoch 1: train loss 0.014067703162331179 val loss 0.018830459099262953 Epoch 2: train loss 0.011883765043892774 val loss 0.018914734944701195 saved best model epoch: 3 val loss is: 0.01856868714094162 Epoch 3: train loss 0.011499060205666416 val loss 0.01856868714094162 Epoch 4: train loss 0.01123984513736992 val loss 0.018892429769039154 Epoch 5: train loss 0.011030105997370669 val loss 0.018817367497831583 saved best model epoch: 6 val loss is: 0.01723636779934168 Epoch 6: train loss 0.01070099224123251 val loss 0.01723636779934168 Epoch 7: train loss 0.010196247093199965 val loss 0.017323991283774376 Epoch 8: train loss 0.010130242018186185 val loss 0.021211490500718355 1147 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 84 series train labels has: 84 series validiation X has: 5 series Validiation labels has: 5 series trainX shape is: torch.Size([84, 10, 507]) trainy shape is: torch.Size([84, 346, 1]) train features shape is: torch.Size([84, 346, 506]) validX shape is: torch.Size([5, 10, 507]) validy shape is: torch.Size([5, 346, 1]) valid features shape is: torch.Size([5, 346, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.034313428401947024 Epoch 1: train loss 0.017842648617391075 val loss 0.034313428401947024 saved best model epoch: 2 val loss is: 0.03355739638209343 Epoch 2: train loss 0.015239586177769871 val loss 0.03355739638209343 saved best model epoch: 3 val loss is: 0.03147742860019207 Epoch 3: train loss 0.014864106429740787 val loss 0.03147742860019207 saved best model epoch: 4 val loss is: 0.03106333427131176 Epoch 4: train loss 0.01467611456644677 val loss 0.03106333427131176 Epoch 5: train loss 0.014498403772623056 val loss 0.03191235587000847 Epoch 6: train loss 0.014108227011525915 val loss 0.03132810294628143 saved best model epoch: 7 val loss is: 0.030810684338212012 Epoch 7: train loss 0.01377756605368285 val loss 0.030810684338212012 Epoch 8: train loss 0.013790523001391972 val loss 0.03489984646439552 1148 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 83 series train labels has: 83 series validiation X has: 4 series Validiation labels has: 4 series trainX shape is: torch.Size([83, 10, 507]) trainy shape is: torch.Size([83, 347, 1]) train features shape is: torch.Size([83, 347, 506]) validX shape is: torch.Size([4, 10, 507]) validy shape is: torch.Size([4, 347, 1]) valid features shape is: torch.Size([4, 347, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.02379463752731681 Epoch 1: train loss 0.016721529167998267 val loss 0.02379463752731681 Epoch 2: train loss 0.013306526074477708 val loss 0.023939634207636118 saved best model epoch: 3 val loss is: 0.022545862942934036 Epoch 3: train loss 0.012797650522047496 val loss 0.022545862942934036 saved best model epoch: 4 val loss is: 0.020886266138404608 Epoch 4: train loss 0.012556395426123258 val loss 0.020886266138404608 saved best model epoch: 5 val loss is: 0.020839196164160967 Epoch 5: train loss 0.01228491671338498 val loss 0.020839196164160967 Epoch 6: train loss 0.012056406250739672 val loss 0.020919280126690865 Epoch 7: train loss 0.011700195697955338 val loss 0.023905588779598475 Epoch 8: train loss 0.011422486785036254 val loss 0.022687969263643026 1148 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 83 series train labels has: 83 series validiation X has: 4 series Validiation labels has: 4 series trainX shape is: torch.Size([83, 10, 507]) trainy shape is: torch.Size([83, 347, 1]) train features shape is: torch.Size([83, 347, 506]) validX shape is: torch.Size([4, 10, 507]) validy shape is: torch.Size([4, 347, 1]) valid features shape is: torch.Size([4, 347, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.028548785019665956 Epoch 1: train loss 0.01918166811880936 val loss 0.028548785019665956 saved best model epoch: 2 val loss is: 0.027547269128262997 Epoch 2: train loss 0.016320437013384807 val loss 0.027547269128262997 saved best model epoch: 3 val loss is: 0.023621768224984407 Epoch 3: train loss 0.015461720930165556 val loss 0.023621768224984407 saved best model epoch: 4 val loss is: 0.022199569270014763 Epoch 4: train loss 0.014599622597536409 val loss 0.022199569270014763 saved best model epoch: 5 val loss is: 0.01994157861918211 Epoch 5: train loss 0.013857599628348666 val loss 0.01994157861918211 saved best model epoch: 6 val loss is: 0.01993605261668563 Epoch 6: train loss 0.013460675433996212 val loss 0.01993605261668563 Epoch 7: train loss 0.012920259733695582 val loss 0.028154442086815834 Epoch 8: train loss 0.012712180782782745 val loss 0.02579958690330386 1148 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 83 series train labels has: 83 series validiation X has: 4 series Validiation labels has: 4 series trainX shape is: torch.Size([83, 10, 507]) trainy shape is: torch.Size([83, 347, 1]) train features shape is: torch.Size([83, 347, 506]) validX shape is: torch.Size([4, 10, 507]) validy shape is: torch.Size([4, 347, 1]) valid features shape is: torch.Size([4, 347, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.05394638795405626 Epoch 1: train loss 0.02217924217861819 val loss 0.05394638795405626 saved best model epoch: 2 val loss is: 0.03731164522469044 Epoch 2: train loss 0.017024208533476633 val loss 0.03731164522469044 Epoch 3: train loss 0.016386669757495444 val loss 0.039628478698432446 Epoch 4: train loss 0.016159391086772983 val loss 0.04215957410633564 Epoch 5: train loss 0.015610584113971296 val loss 0.04547310620546341 Epoch 6: train loss 0.015018225120133665 val loss 0.0421921219676733 Epoch 7: train loss 0.013906018034252057 val loss 0.04685465432703495 Epoch 8: train loss 0.012762270778058523 val loss 0.05135950446128845 Epoch 8: reducing learning rate of group 0 to 7.0000e-05. 1147 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 84 series train labels has: 84 series validiation X has: 5 series Validiation labels has: 5 series trainX shape is: torch.Size([84, 10, 507]) trainy shape is: torch.Size([84, 346, 1]) train features shape is: torch.Size([84, 346, 506]) validX shape is: torch.Size([5, 10, 507]) validy shape is: torch.Size([5, 346, 1]) valid features shape is: torch.Size([5, 346, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.020092541724443434 Epoch 1: train loss 0.03458834228859771 val loss 0.020092541724443434 saved best model epoch: 2 val loss is: 0.01966281794011593 Epoch 2: train loss 0.028968674995537316 val loss 0.01966281794011593 Epoch 3: train loss 0.02847803798725917 val loss 0.019898752495646477 Epoch 4: train loss 0.02783358704653524 val loss 0.020083936303853987 Epoch 5: train loss 0.02685919643512794 val loss 0.022667815908789635 Epoch 6: train loss 0.02578524701918165 val loss 0.02254798971116543 Epoch 7: train loss 0.025301491509058645 val loss 0.024956594035029412 Epoch 8: train loss 0.023965115437195414 val loss 0.02243170849978924 Epoch 8: reducing learning rate of group 0 to 7.0000e-05. 1147 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 84 series train labels has: 84 series validiation X has: 5 series Validiation labels has: 5 series trainX shape is: torch.Size([84, 10, 507]) trainy shape is: torch.Size([84, 346, 1]) train features shape is: torch.Size([84, 346, 506]) validX shape is: torch.Size([5, 10, 507]) validy shape is: torch.Size([5, 346, 1]) valid features shape is: torch.Size([5, 346, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.018691856414079666 Epoch 1: train loss 0.015075799687543795 val loss 0.018691856414079666 saved best model epoch: 2 val loss is: 0.016743604838848115 Epoch 2: train loss 0.011274311658261078 val loss 0.016743604838848115 saved best model epoch: 3 val loss is: 0.015733280405402183 Epoch 3: train loss 0.010784541389771871 val loss 0.015733280405402183 saved best model epoch: 4 val loss is: 0.01493067853152752 Epoch 4: train loss 0.010468072512940992 val loss 0.01493067853152752 saved best model epoch: 5 val loss is: 0.01396320629864931 Epoch 5: train loss 0.010273774198832967 val loss 0.01396320629864931 Epoch 6: train loss 0.010152335478258985 val loss 0.01509640421718359 Epoch 7: train loss 0.009939371153623575 val loss 0.015360203571617603 Epoch 8: train loss 0.009734672849022206 val loss 0.016934173926711083 1147 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 84 series train labels has: 84 series validiation X has: 5 series Validiation labels has: 5 series trainX shape is: torch.Size([84, 10, 507]) trainy shape is: torch.Size([84, 346, 1]) train features shape is: torch.Size([84, 346, 506]) validX shape is: torch.Size([5, 10, 507]) validy shape is: torch.Size([5, 346, 1]) valid features shape is: torch.Size([5, 346, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.009812348708510398 Epoch 1: train loss 0.011285956860298202 val loss 0.009812348708510398 saved best model epoch: 2 val loss is: 0.009745930135250092 Epoch 2: train loss 0.008836348430209216 val loss 0.009745930135250092 saved best model epoch: 3 val loss is: 0.008233178034424781 Epoch 3: train loss 0.008487518688309052 val loss 0.008233178034424781 Epoch 4: train loss 0.008354479280699576 val loss 0.008608040399849415 Epoch 5: train loss 0.008203677733295731 val loss 0.009136944264173507 Epoch 6: train loss 0.008047335454085399 val loss 0.01005605049431324 Epoch 7: train loss 0.00795249206878777 val loss 0.009007071517407894 Epoch 8: train loss 0.007779829442456719 val loss 0.008799588494002818 1147 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 84 series train labels has: 84 series validiation X has: 5 series Validiation labels has: 5 series trainX shape is: torch.Size([84, 10, 507]) trainy shape is: torch.Size([84, 346, 1]) train features shape is: torch.Size([84, 346, 506]) validX shape is: torch.Size([5, 10, 507]) validy shape is: torch.Size([5, 346, 1]) valid features shape is: torch.Size([5, 346, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.046714216470718384 Epoch 1: train loss 0.025826233106532266 val loss 0.046714216470718384 saved best model epoch: 2 val loss is: 0.036852990835905076 Epoch 2: train loss 0.017339234961019384 val loss 0.036852990835905076 Epoch 3: train loss 0.016987837496257964 val loss 0.03941105902194977 Epoch 4: train loss 0.016534364349874004 val loss 0.038830258697271344 Epoch 5: train loss 0.016321501849840086 val loss 0.04223173260688782 Epoch 6: train loss 0.015760072229784868 val loss 0.04350303635001183 saved best model epoch: 7 val loss is: 0.031138669326901437 Epoch 7: train loss 0.015607024698207775 val loss 0.031138669326901437 Epoch 8: train loss 0.015360589615911954 val loss 0.04178188741207123 1147 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 84 series train labels has: 84 series validiation X has: 5 series Validiation labels has: 5 series trainX shape is: torch.Size([84, 10, 507]) trainy shape is: torch.Size([84, 346, 1]) train features shape is: torch.Size([84, 346, 506]) validX shape is: torch.Size([5, 10, 507]) validy shape is: torch.Size([5, 346, 1]) valid features shape is: torch.Size([5, 346, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.015667536482214927 Epoch 1: train loss 0.019867362498882272 val loss 0.015667536482214927 Epoch 2: train loss 0.016448528018026126 val loss 0.016121906042099 Epoch 3: train loss 0.015962623469975023 val loss 0.01637333519756794 Epoch 4: train loss 0.015212009771771375 val loss 0.01812833398580551 Epoch 5: train loss 0.01442949089132959 val loss 0.017379732429981233 Epoch 6: train loss 0.01342326432599553 val loss 0.018800373747944833 saved best model epoch: 7 val loss is: 0.015595637075603009 Epoch 7: train loss 0.012723049825234782 val loss 0.015595637075603009 Epoch 8: train loss 0.011465448320710234 val loss 0.017078741267323495 1147 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 84 series train labels has: 84 series validiation X has: 5 series Validiation labels has: 5 series trainX shape is: torch.Size([84, 10, 507]) trainy shape is: torch.Size([84, 346, 1]) train features shape is: torch.Size([84, 346, 506]) validX shape is: torch.Size([5, 10, 507]) validy shape is: torch.Size([5, 346, 1]) valid features shape is: torch.Size([5, 346, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.020668099448084832 Epoch 1: train loss 0.02967769932001829 val loss 0.020668099448084832 Epoch 2: train loss 0.02613047111247267 val loss 0.02074289359152317 saved best model epoch: 3 val loss is: 0.017906352505087854 Epoch 3: train loss 0.02546164056374913 val loss 0.017906352505087854 Epoch 4: train loss 0.025093235141996826 val loss 0.01892692781984806 Epoch 5: train loss 0.024894387949080693 val loss 0.018244649469852447 Epoch 6: train loss 0.024683797865041664 val loss 0.01858431287109852 Epoch 7: train loss 0.02448495506264624 val loss 0.019112270697951315 Epoch 8: train loss 0.024250379852240996 val loss 0.019647125527262686 1148 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 83 series train labels has: 83 series validiation X has: 4 series Validiation labels has: 4 series trainX shape is: torch.Size([83, 10, 507]) trainy shape is: torch.Size([83, 347, 1]) train features shape is: torch.Size([83, 347, 506]) validX shape is: torch.Size([4, 10, 507]) validy shape is: torch.Size([4, 347, 1]) valid features shape is: torch.Size([4, 347, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.015214960556477308 Epoch 1: train loss 0.023879382961306227 val loss 0.015214960556477308 Epoch 2: train loss 0.021978633276310312 val loss 0.015262369066476822 Epoch 3: train loss 0.021634442776621104 val loss 0.01572700683027506 Epoch 4: train loss 0.021202881876603668 val loss 0.015354686183854938 saved best model epoch: 5 val loss is: 0.014800063567236066 Epoch 5: train loss 0.02075151594198612 val loss 0.014800063567236066 Epoch 6: train loss 0.0203720742544855 val loss 0.014824989484623075 Epoch 7: train loss 0.019777210743491907 val loss 0.014900928363204002 saved best model epoch: 8 val loss is: 0.014570496045053005 Epoch 8: train loss 0.019429599148142768 val loss 0.014570496045053005 1148 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 83 series train labels has: 83 series validiation X has: 4 series Validiation labels has: 4 series trainX shape is: torch.Size([83, 10, 507]) trainy shape is: torch.Size([83, 347, 1]) train features shape is: torch.Size([83, 347, 506]) validX shape is: torch.Size([4, 10, 507]) validy shape is: torch.Size([4, 347, 1]) valid features shape is: torch.Size([4, 347, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.027693807613104582 Epoch 1: train loss 0.023926184748310642 val loss 0.027693807613104582 saved best model epoch: 2 val loss is: 0.026748798321932554 Epoch 2: train loss 0.019899560183466197 val loss 0.026748798321932554 Epoch 3: train loss 0.019001476332006686 val loss 0.028921309392899275 saved best model epoch: 4 val loss is: 0.02486493205651641 Epoch 4: train loss 0.01856664794845035 val loss 0.02486493205651641 Epoch 5: train loss 0.018266919798341143 val loss 0.028284423518925905 Epoch 6: train loss 0.01773496434451586 val loss 0.025459649972617626 Epoch 7: train loss 0.017064981050340527 val loss 0.02514745807275176 Epoch 8: train loss 0.017074198136667173 val loss 0.026253627613186836 1147 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 84 series train labels has: 84 series validiation X has: 5 series Validiation labels has: 5 series trainX shape is: torch.Size([84, 10, 507]) trainy shape is: torch.Size([84, 346, 1]) train features shape is: torch.Size([84, 346, 506]) validX shape is: torch.Size([5, 10, 507]) validy shape is: torch.Size([5, 346, 1]) valid features shape is: torch.Size([5, 346, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.022796295955777167 Epoch 1: train loss 0.02261271919789059 val loss 0.022796295955777167 saved best model epoch: 2 val loss is: 0.021967212110757826 Epoch 2: train loss 0.018470588073666607 val loss 0.021967212110757826 Epoch 3: train loss 0.017743334218504884 val loss 0.02328215353190899 saved best model epoch: 4 val loss is: 0.019774146005511285 Epoch 4: train loss 0.016960191595855923 val loss 0.019774146005511285 Epoch 5: train loss 0.01626850844227842 val loss 0.019993208721280097 Epoch 6: train loss 0.015780294535770303 val loss 0.02285667508840561 Epoch 7: train loss 0.01506876507552252 val loss 0.02358766607940197 saved best model epoch: 8 val loss is: 0.019446101784706116 Epoch 8: train loss 0.014813025304604144 val loss 0.019446101784706116 1147 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 84 series train labels has: 84 series validiation X has: 5 series Validiation labels has: 5 series trainX shape is: torch.Size([84, 10, 507]) trainy shape is: torch.Size([84, 346, 1]) train features shape is: torch.Size([84, 346, 506]) validX shape is: torch.Size([5, 10, 507]) validy shape is: torch.Size([5, 346, 1]) valid features shape is: torch.Size([5, 346, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.014570785872638225 Epoch 1: train loss 0.023787150691662515 val loss 0.014570785872638225 saved best model epoch: 2 val loss is: 0.013694306090474128 Epoch 2: train loss 0.021030711564457134 val loss 0.013694306090474128 Epoch 3: train loss 0.01997078371988166 val loss 0.01402215901762247 Epoch 4: train loss 0.01928362755903176 val loss 0.013727952912449836 Epoch 5: train loss 0.01871166302866879 val loss 0.01389888245612383 Epoch 6: train loss 0.018077501761061803 val loss 0.014110727608203888 Epoch 7: train loss 0.017437264256711518 val loss 0.014011814445257186 Epoch 8: train loss 0.01718548558918493 val loss 0.014346869476139546 Epoch 8: reducing learning rate of group 0 to 7.0000e-05. 1148 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 83 series train labels has: 83 series validiation X has: 4 series Validiation labels has: 4 series trainX shape is: torch.Size([83, 10, 507]) trainy shape is: torch.Size([83, 347, 1]) train features shape is: torch.Size([83, 347, 506]) validX shape is: torch.Size([4, 10, 507]) validy shape is: torch.Size([4, 347, 1]) valid features shape is: torch.Size([4, 347, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
saved best model epoch: 1 val loss is: 0.016903340816497803 Epoch 1: train loss 0.015456319057259214 val loss 0.016903340816497803 saved best model epoch: 2 val loss is: 0.016595954075455666 Epoch 2: train loss 0.012360795131738645 val loss 0.016595954075455666 Epoch 3: train loss 0.011992699556411749 val loss 0.017002688720822334 saved best model epoch: 4 val loss is: 0.016476061660796404 Epoch 4: train loss 0.011878999028669065 val loss 0.016476061660796404 Epoch 5: train loss 0.011669660660337252 val loss 0.01649800967425108 Epoch 6: train loss 0.011226874114160078 val loss 0.01774146594107151 Epoch 7: train loss 0.010646454922585603 val loss 0.018164494074881077 Epoch 8: train loss 0.009747146247291422 val loss 0.016854883637279272 1147 train size is: 440 validation size is: 361 train data shape is: (440, 507) validation data shape is: (361, 507) train X has: 84 series train labels has: 84 series validiation X has: 5 series Validiation labels has: 5 series trainX shape is: torch.Size([84, 10, 507]) trainy shape is: torch.Size([84, 346, 1]) train features shape is: torch.Size([84, 346, 506]) validX shape is: torch.Size([5, 10, 507]) validy shape is: torch.Size([5, 346, 1]) valid features shape is: torch.Size([5, 346, 506]) Seq2Seq( (encoder): Encoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) ) (decoder): Decoder( (rnn1): LSTM(507, 512, num_layers=3, batch_first=True, dropout=0.35) (output_layer): Linear(in_features=512, out_features=507, bias=True) ) )
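The `Seq2Seq(...)` summary printed after each run corresponds to an encoder–decoder pair of 3-layer LSTMs over the 507-dimensional daily feature vector, with a linear head projecting back to feature space. The module below is a minimal sketch reconstructed from that printout only; the actual forward logic (teacher forcing, how the 10-day encoder window drives the 346/347-step decoder) is defined in `seq2seq_global` earlier in the notebook and may differ.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    # Sketch matching the printed repr: LSTM(507, 512, num_layers=3, dropout=0.35).
    def __init__(self, n_features=507, hidden_size=512, num_layers=3, dropout=0.35):
        super().__init__()
        self.rnn1 = nn.LSTM(n_features, hidden_size, num_layers=num_layers,
                            batch_first=True, dropout=dropout)

    def forward(self, x):
        # x: (batch, seq_len, 507) -> keep only the final hidden/cell states
        _, (hidden, cell) = self.rnn1(x)
        return hidden, cell

class Decoder(nn.Module):
    # Mirrors the encoder and projects hidden states back to the 507-dim feature space.
    def __init__(self, n_features=507, hidden_size=512, num_layers=3, dropout=0.35):
        super().__init__()
        self.rnn1 = nn.LSTM(n_features, hidden_size, num_layers=num_layers,
                            batch_first=True, dropout=dropout)
        self.output_layer = nn.Linear(hidden_size, n_features)

    def forward(self, x, hidden, cell):
        out, (hidden, cell) = self.rnn1(x, (hidden, cell))
        return self.output_layer(out), hidden, cell

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = Encoder()
        self.decoder = Decoder()

    def forward(self, src, trg):
        # src: (batch, 10, 507) encoder window; trg: (batch, horizon, 507) decoder inputs
        hidden, cell = self.encoder(src)
        out, _, _ = self.decoder(trg, hidden, cell)
        return out
```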
# Run the per-pid seq2seq model on successive slices of the test pids and
# checkpoint each slice's predictions to its own CSV.
df_princip = pd.DataFrame([])
for pid in progress_bar(list(test_dataset['pid'].unique())[100:200]):
    df_aux = seq2seq_global(pid=pid, seq_length=10, epoch=5)
    df_princip = pd.concat([df_princip, df_aux], axis=0)
df_princip.to_csv('princip_200.csv', sep=',', index=False)

df_princip = pd.DataFrame([])
for pid in progress_bar(list(test_dataset['pid'].unique())[200:300]):
    df_aux = seq2seq_global(pid=pid, seq_length=10, epoch=5)
    df_princip = pd.concat([df_princip, df_aux], axis=0)
df_princip.to_csv('princip_300.csv', sep=',', index=False)

df_princip = pd.DataFrame([])
for pid in progress_bar(list(test_dataset['pid'].unique())[300:400]):
    df_aux = seq2seq_global(pid=pid, seq_length=10, epoch=5)
    df_princip = pd.concat([df_princip, df_aux], axis=0)
df_princip.to_csv('princip_400.csv', sep=',', index=False)
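`seq2seq_global` itself is defined earlier in the notebook. From the logs above, its training loop keeps the weights with the lowest validation loss ("saved best model epoch: ...") and lets a scheduler cut the learning rate on plateau ("reducing learning rate of group 0 to 7.0000e-05"). Below is a minimal, hedged sketch of such a loop; `train_model`, `train_loader`, and `valid_loader` are stand-in names, and the learning rate, factor, and patience are only inferred from the printed messages.

```python
import copy
import torch

def train_model(model, train_loader, valid_loader, n_epochs=8, lr=1e-4, device="cpu"):
    """Sketch of the loop implied by the logs: MSE loss, best-model checkpointing
    on validation loss, and ReduceLROnPlateau (1e-4 * 0.7 = the 7.0000e-05 seen above)."""
    criterion = torch.nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, factor=0.7, patience=5, verbose=True)  # patience is a guess
    best_loss, best_state = float("inf"), None

    for epoch in range(1, n_epochs + 1):
        model.train()
        train_loss = 0.0
        for src, trg in train_loader:
            optimizer.zero_grad()
            pred = model(src.to(device), trg.to(device))
            loss = criterion(pred, trg.to(device))
            loss.backward()
            optimizer.step()
            train_loss += loss.item() / len(train_loader)

        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for src, trg in valid_loader:
                pred = model(src.to(device), trg.to(device))
                val_loss += criterion(pred, trg.to(device)).item() / len(valid_loader)

        scheduler.step(val_loss)
        if val_loss < best_loss:
            best_loss, best_state = val_loss, copy.deepcopy(model.state_dict())
            print(f"saved best model epoch: {epoch} val loss is: {val_loss}")
        print(f"Epoch {epoch}: train loss {train_loss} val loss {val_loss}")

    model.load_state_dict(best_state)
    return model
```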
lstm_0_50 = pd.read_csv("/content/drive/MyDrive/U4_Prediction_stock_auction_volumes/prediction/lstm_pid/0_50.csv", sep=",")
lstm_50_100 = pd.read_csv("/content/drive/MyDrive/U4_Prediction_stock_auction_volumes/prediction/lstm_pid/50_100.csv", sep=",")
lstm_100_150 = pd.read_csv("/content/drive/MyDrive/U4_Prediction_stock_auction_volumes/prediction/lstm_pid/100_150.csv", sep=",")
lstm_150_200 = pd.read_csv("/content/drive/MyDrive/U4_Prediction_stock_auction_volumes/prediction/lstm_pid/150_200.csv", sep=",")
lstm_200_250 = pd.read_csv("/content/drive/MyDrive/U4_Prediction_stock_auction_volumes/prediction/lstm_pid/200_250.csv", sep=",")
def prep_bf_stacking(lstm_prep):
    # Restore the row index written by to_csv (the 'Unnamed: 0' column) and drop the helper column.
    lstm_prep = lstm_prep.set_index([lstm_prep['Unnamed: 0']])
    lstm_prep = lstm_prep.drop('Unnamed: 0', axis=1)
    lstm_prep.index.name = ""
    return lstm_prep
lstm_0_50 = prep_bf_stacking(lstm_0_50)
lstm_50_100 = prep_bf_stacking(lstm_50_100)
lstm_100_150 = prep_bf_stacking(lstm_100_150)
lstm_150_200 = prep_bf_stacking(lstm_150_200)
lstm_200_250 = prep_bf_stacking(lstm_200_250)
lstm_total = pd.concat([lstm_0_50, lstm_50_100, lstm_100_150, lstm_150_200, lstm_200_250], axis=0)
lstm_total[target] = y_scaler.inverse_transform(lstm_total[target])
lstm_total = lstm_total[['pid', 'target']]
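Here `y_scaler` is the scaler fitted on the target earlier in the notebook, so `inverse_transform` maps the network outputs back to the original log-volume scale (and `target` is presumably the list of target column names defined earlier, so `lstm_total[target]` is 2-D, as `inverse_transform` requires). For illustration only, assuming a `MinMaxScaler` fitted on a hypothetical `train_dataset` target column:

```python
from sklearn.preprocessing import MinMaxScaler

# Illustration: the notebook's y_scaler is fitted earlier; train_dataset and the
# feature_range are assumptions, chosen because the scaled features sit near [-1, 1].
y_scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = y_scaler.fit_transform(train_dataset[['target']])   # scale used during training
log_volume = y_scaler.inverse_transform(scaled)               # back to log-volume scale
```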
lstm_total['target'][60]
-1.6157601871487626
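The predictions are still on the log scale of the target; exponentiating recovers the auction-volume fraction for this row:

```python
import math
math.exp(-1.6157601871487626)   # ≈ 0.199, i.e. roughly 20 % of the intraday volume
```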
lstm_total
(index) | day | pid | abs_ret0 | abs_ret1 | abs_ret2 | abs_ret3 | abs_ret4 | abs_ret5 | abs_ret6 | abs_ret7 | abs_ret8 | abs_ret9 | abs_ret10 | abs_ret11 | abs_ret12 | abs_ret13 | abs_ret14 | abs_ret15 | abs_ret16 | abs_ret17 | abs_ret18 | abs_ret19 | abs_ret20 | abs_ret21 | abs_ret22 | abs_ret23 | abs_ret24 | abs_ret25 | abs_ret26 | abs_ret27 | abs_ret28 | abs_ret29 | abs_ret30 | abs_ret31 | abs_ret32 | abs_ret33 | abs_ret34 | abs_ret35 | abs_ret36 | abs_ret37 | ... | median_day_abs_ret36 | median_day_abs_ret37 | median_day_abs_ret38 | median_day_abs_ret39 | median_day_abs_ret40 | median_day_abs_ret41 | median_day_abs_ret42 | median_day_abs_ret43 | median_day_abs_ret44 | median_day_abs_ret45 | median_day_abs_ret46 | median_day_abs_ret47 | median_day_abs_ret48 | median_day_abs_ret49 | median_day_abs_ret50 | median_day_abs_ret51 | median_day_abs_ret52 | median_day_abs_ret53 | median_day_abs_ret54 | median_day_abs_ret55 | median_day_abs_ret56 | median_day_abs_ret57 | median_day_abs_ret58 | median_day_abs_ret59 | median_day_abs_ret60 | min_ret | max_ret | std_ret | median_ret | sum_ret | min_vol | max_vol | std_vol | median_vol | median_day_sum_ret | median_day_sum_ret_before | kmeans_cluster_median_day_sum_ret_before | abs_kmeans_cluster_median_day_sum_ret_before | median_target_dummy | target
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
60 | 805 | 214 | -0.998792 | -0.996975 | -0.994605 | -0.967703 | -0.982166 | -0.985294 | -0.996117 | -0.975179 | -0.987705 | -0.989708 | -0.994561 | -0.997324 | -0.985755 | -0.993519 | -0.990352 | -0.985058 | -0.975604 | -0.946162 | -0.987141 | -0.983203 | -0.997235 | -0.944920 | -0.975361 | -0.966539 | -0.961316 | -0.979002 | -0.993838 | -0.994253 | -0.989112 | -0.970188 | -0.994048 | -0.955722 | -0.997139 | -1.000000 | -0.982392 | -0.981662 | -1.000000 | -1.000000 | ... | -0.918722 | -0.916961 | -0.841969 | -0.775322 | -0.891495 | -0.898596 | -0.948592 | -0.869917 | -0.888285 | -0.893857 | -0.957270 | -0.897335 | -0.909165 | -0.911040 | -0.913363 | -0.880827 | -0.869772 | -0.921078 | -0.976912 | -0.900332 | -0.951813 | -0.921976 | -0.872759 | -0.960537 | -0.922098 | -1.0 | -0.996068 | -0.994955 | -0.929561 | -0.969322 | -0.916057 | -0.837442 | -0.931418 | -0.902111 | -0.827446 | -0.609281 | 1.0 | -1.0 | 0 | -1.615760 |
960 | 806 | 214 | -0.999396 | -0.998791 | -0.994623 | -0.975238 | -0.989876 | -0.976661 | -0.988469 | -0.971904 | -0.969477 | -0.993173 | -1.000000 | -0.993346 | -0.981117 | -0.990335 | -0.985612 | -1.000000 | -0.990304 | -1.000000 | -0.961704 | -1.000000 | -0.994495 | -1.000000 | -0.980398 | -0.993334 | -0.987179 | -0.989568 | -1.000000 | -1.000000 | -0.989176 | -0.970358 | -0.964492 | -0.980442 | -0.994307 | -0.958707 | -0.982511 | -0.996356 | -1.000000 | -0.996933 | ... | -0.935412 | -0.959510 | -0.886201 | -0.912009 | -0.919540 | -0.914398 | -0.560111 | -0.871354 | -0.925036 | -0.914469 | -0.909200 | -0.919728 | -0.920893 | -0.955698 | -0.913621 | -0.937541 | -0.904180 | -0.951593 | -0.967610 | -0.928309 | -0.918630 | -0.931308 | -0.939301 | -0.921071 | -0.926814 | -1.0 | -0.996688 | -0.995560 | -0.943922 | -0.975497 | -0.766372 | -0.789268 | -0.903389 | -0.926190 | -0.871451 | -0.456159 | 1.0 | -1.0 | 0 | -1.751287 |
1860 | 807 | 214 | -0.998491 | -0.998188 | -0.974930 | -0.965436 | -0.992405 | -0.994162 | -0.992296 | -0.975391 | -0.963428 | -1.000000 | -0.989187 | -0.998671 | -0.990561 | -0.996777 | -1.000000 | -0.985136 | -1.000000 | -0.976179 | -0.993612 | -0.966602 | -0.997244 | -0.969505 | -0.985277 | -0.973327 | -0.993593 | -0.979149 | -1.000000 | -0.985723 | -0.994588 | -0.975313 | -1.000000 | -0.985309 | -1.000000 | -0.988727 | -0.994157 | -0.985387 | -0.987831 | -1.000000 | ... | -0.904738 | -0.926459 | -0.855547 | -0.865862 | -0.746405 | -0.900262 | -0.941300 | -0.913384 | -0.341424 | -0.767287 | -0.853354 | -0.760114 | -0.845250 | -0.942385 | -0.904648 | -0.928085 | -0.887867 | -0.835214 | -0.950566 | -0.878652 | -0.880590 | -0.614363 | -0.674310 | -0.897351 | -0.818738 | -1.0 | -0.995782 | -0.994614 | -0.915877 | -0.971216 | -0.812191 | -0.761734 | -0.910984 | -0.914299 | -0.872113 | -0.361213 | -1.0 | -1.0 | 0 | -1.937945 |
2760 | 808 | 214 | -0.996371 | -0.999395 | -0.978450 | -0.970281 | -0.982292 | -0.988317 | -0.953737 | -0.989417 | -0.993872 | -0.986314 | -0.994569 | -0.997328 | -0.914630 | -0.987080 | -0.927790 | -0.977594 | -0.985365 | -0.976075 | -0.922936 | -0.987440 | -0.994476 | -0.951106 | -0.960638 | -1.000000 | -0.942036 | -0.994755 | -1.000000 | -0.985639 | -0.991832 | -0.995035 | -0.964278 | -0.975409 | -1.000000 | -0.996220 | -0.982361 | -1.000000 | -0.995924 | -1.000000 | ... | -0.846363 | -0.894526 | -0.854136 | -0.859500 | -0.673064 | -0.887111 | -0.714475 | -0.759479 | -0.717495 | -0.765939 | -0.860156 | -0.890289 | -0.865347 | -0.819580 | -0.872172 | -0.859830 | -0.863936 | -0.879732 | -0.905963 | -0.946556 | -0.864930 | -0.577143 | -0.845183 | -0.833443 | -0.849790 | -1.0 | -0.994554 | -0.993190 | -0.915140 | -0.960635 | -0.439156 | -0.860797 | -0.936142 | -0.914892 | -0.661516 | 0.127179 | -0.5 | 1.0 | 0 | -2.067138 |
3659 | 809 | 214 | -0.997556 | -0.996324 | -0.994536 | -0.989934 | -0.979377 | -0.952498 | -0.980380 | -0.971361 | -0.972002 | -1.000000 | -0.977932 | -1.000000 | -0.985550 | -0.993425 | -0.980403 | -1.000000 | -0.980176 | -0.987844 | -1.000000 | -0.991473 | -0.994376 | -0.950179 | -0.979994 | -0.979623 | -0.980420 | -0.989383 | -0.981293 | -0.994183 | -0.997247 | -0.994978 | -0.951848 | -0.990039 | -0.991298 | -0.984696 | -0.994046 | -0.985108 | -0.991745 | -0.993739 | ... | -0.867292 | -0.906663 | -0.676915 | -0.690599 | -0.907964 | -0.862460 | -0.739261 | -0.893753 | -0.887520 | -0.876114 | -0.873154 | -0.880359 | -0.895330 | -0.669400 | -0.865957 | -0.886505 | -0.919783 | -0.773840 | -0.968083 | -0.911080 | -0.931815 | -0.951224 | -0.933929 | -0.902397 | -0.904556 | -1.0 | -0.995101 | -0.994709 | -0.942801 | -0.970560 | -0.645645 | -0.459052 | -0.815742 | -0.930379 | -0.749276 | -0.521098 | 1.0 | -1.0 | 0 | -2.118526 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
308303 | 1148 | 816 | -0.988902 | -0.996943 | -0.898575 | -0.744733 | -0.869373 | -1.000000 | -0.980317 | -0.826938 | -0.901061 | -0.907459 | -0.963142 | -0.997740 | -0.935811 | -0.920509 | -0.828483 | -1.000000 | -0.941814 | -0.959284 | -0.934377 | -0.957050 | -0.981149 | -0.927059 | -0.974819 | -0.960097 | -0.967040 | -0.919622 | -0.884404 | -0.941048 | -0.962668 | -0.957346 | -0.969275 | -0.923740 | -0.990106 | -0.986962 | -0.979730 | -0.949286 | -0.971920 | -0.989368 | ... | -0.462782 | -0.434240 | -0.254473 | 0.134350 | 0.023775 | 0.403296 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 0.930054 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | -0.245930 | 1.000000 | 1.000000 | 0.719844 | 1.000000 | 0.176973 | 1.000000 | -1.0 | -0.968925 | -0.971523 | -0.831285 | -0.882056 | -0.998104 | -0.785168 | -0.734404 | -0.775849 | 1.000000 | 0.275295 | -0.5 | 1.0 | 0 | -1.715138 |
309203 | 1149 | 816 | -0.974261 | -0.983288 | -0.936148 | -1.000000 | -0.905785 | -0.918508 | -0.912293 | -0.929086 | -0.970475 | -0.945829 | -0.895029 | -0.969583 | -0.995835 | -0.957328 | -0.974606 | -0.816049 | -0.923142 | -0.747968 | -0.920594 | -0.977802 | -0.995125 | -0.989207 | -0.956597 | -0.988216 | -0.926387 | -0.917029 | -0.848602 | -0.924642 | -0.971575 | -0.948062 | -0.875618 | -0.845273 | -0.995023 | -0.888747 | -0.959131 | -0.955346 | -0.943267 | -0.973121 | ... | -0.344313 | -0.335965 | -0.508588 | 0.007597 | 0.129695 | -0.707337 | -0.149776 | -0.636016 | -0.182767 | -0.346883 | -0.518401 | -0.569716 | -0.682240 | -0.448331 | -0.638369 | -0.069430 | 0.277347 | -0.572872 | -0.863817 | -0.741330 | -0.701464 | -0.234399 | -0.659514 | -0.791788 | -0.046810 | -1.0 | -0.974261 | -0.975985 | -0.653955 | -0.850259 | -0.977982 | -0.794262 | -0.904196 | -0.927531 | 0.182952 | -0.934862 | 1.0 | 1.0 | 0 | -1.706263 |
310103 | 1150 | 816 | -1.000000 | -0.962225 | -0.877061 | -0.970017 | -0.912505 | -0.949280 | -0.963168 | -0.852739 | -0.962545 | -0.946479 | -0.773129 | -0.992982 | -0.962669 | -0.988676 | -0.974712 | -1.000000 | -0.923243 | -0.916298 | -0.820077 | -0.963102 | -0.922042 | -0.860397 | -0.905341 | -0.903812 | -0.944164 | -1.000000 | -0.872156 | -0.980067 | -1.000000 | -0.995691 | -0.943163 | -0.923081 | -0.970135 | -0.980300 | -0.867209 | -0.853127 | -0.950097 | -0.959487 | ... | -0.640014 | -0.487649 | -0.158274 | -0.099397 | -0.006392 | -0.759825 | -0.643929 | -0.479827 | -0.282993 | -0.552543 | -0.719045 | -0.095346 | -0.224267 | -0.578415 | -0.778971 | 0.522835 | -0.303654 | -0.338124 | -0.900506 | -0.194498 | -0.693373 | -0.540350 | -0.573285 | -0.612770 | -0.428338 | -1.0 | -0.962284 | -0.968009 | -0.618345 | -0.848266 | -0.997417 | -0.860894 | -0.914206 | -0.926418 | 0.286611 | -0.254908 | -1.0 | -1.0 | 0 | -1.699045 |
311003 | 1151 | 816 | -1.000000 | -0.989725 | -0.966860 | -0.937718 | -1.000000 | -0.980414 | -0.954833 | -0.881935 | -0.923530 | -0.994284 | -0.936455 | -0.995528 | -0.976178 | -0.891467 | -0.951403 | -0.849723 | -0.991842 | -0.889888 | -0.903029 | -0.936487 | -1.000000 | -0.958897 | -0.933933 | -0.971957 | -0.989193 | -0.929688 | -0.866102 | -0.918066 | -0.977082 | -0.974945 | -0.849686 | -0.966996 | -0.995200 | -0.980992 | -0.950732 | -0.975391 | -0.986336 | -0.989634 | ... | -0.747407 | -0.569196 | -0.692772 | 0.004465 | -0.587183 | 0.011805 | -0.685823 | -0.427228 | -0.658543 | -0.352214 | -0.075219 | -0.718618 | -0.550562 | -0.655667 | -0.772794 | 0.079369 | -0.670612 | -0.761274 | -0.711509 | -0.157695 | -0.358010 | -0.704198 | -0.642796 | -0.755170 | -0.411631 | -1.0 | -0.989741 | -0.984299 | -0.762756 | -0.898068 | -0.998067 | -0.869935 | -0.921197 | -0.927111 | -0.026124 | -0.659225 | 1.0 | -1.0 | 0 | -1.693075 |
311612 | 1128 | 816 | -1.000000 | -1.000000 | -0.967490 | -0.910143 | -0.862053 | -0.885492 | -0.907785 | -1.000000 | -0.922057 | -0.984578 | -0.930665 | -0.971915 | -0.992883 | -1.000000 | -0.906380 | -0.932928 | -0.920089 | -0.964439 | -0.990466 | -0.931466 | -0.995902 | -0.900181 | -0.912654 | -0.985099 | -0.990438 | -0.984446 | -0.972597 | -0.987197 | -1.000000 | -0.882033 | -0.982252 | -0.955964 | -0.987162 | -0.943578 | -0.982466 | -0.972597 | -0.981747 | -0.997695 | ... | -0.701248 | -0.594348 | -0.628593 | 0.012691 | 0.334055 | -0.294596 | -0.004128 | 0.002900 | 0.006637 | 0.000066 | -0.034039 | 0.003856 | 0.005251 | -0.006055 | -0.002882 | -0.007120 | 0.001369 | -0.011290 | -0.629587 | -0.007951 | -0.001893 | -0.135828 | -0.001658 | -0.408540 | 0.009783 | -1.0 | -0.983592 | -0.982456 | -0.830703 | -0.913254 | -0.997266 | -0.811982 | -0.808236 | -0.807458 | 0.081386 | 0.194020 | -0.5 | 1.0 | 0 | -1.684866 |
86549 rows × 508 columns
pred_lstm = pd.DataFrame(lstm_total['target'])
pred_lstm['target'][60]
0.030344121
pred_edg = pd.read_csv('/content/drive/MyDrive/U4_Prediction_stock_auction_volumes/prediction/predictions_ed_8.csv',sep=',')
for index in pred_edg.index:
    if index in lstm_total.index:
        # Use the per-pid LSTM prediction wherever one is available;
        # keep the encoder-decoder prediction otherwise.
        pred_edg.loc[index, 'target'] = lstm_total.loc[index, 'target']
pred_edg
pred_edg.to_csv('/content/drive/MyDrive/U4_Prediction_stock_auction_volumes/prediction/predictions_stacking.csv', sep=',', index=False)
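Because both frames share the same row index, the row-by-row overwrite above can also be expressed with index alignment; a sketch of that alternative, which should produce the same stacked file assuming the indices are unique:
# Overwrite the encoder-decoder predictions with the per-pid LSTM predictions
# on the rows where an LSTM prediction exists, using index alignment instead of a loop.
common = pred_edg.index.intersection(lstm_total.index)
pred_edg.loc[common, 'target'] = lstm_total.loc[common, 'target']
pred_edg.to_csv('/content/drive/MyDrive/U4_Prediction_stock_auction_volumes/prediction/predictions_stacking.csv',
                sep=',', index=False)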