As COVID-19 spreads around the world, the use of the web and of online services is clearly accelerating, confirming the central role these technologies play in our modern world.
In a March letter on emergency preparedness in the context of COVID-19, the European Central Bank warns that the number of cyberthreats has increased dramatically and urges institutions to assess the “risks of increased cyber security related fraud, aimed both to customers or to the institution via phishing mails, etc.”
One of the most widely recognized online security threats is the phishing attack. The purpose of this fraud is to imitate a real website, for example an internet banking, e-commerce, or social networking site, so as to acquire confidential data such as usernames, passwords, and financial or health-related information from potential victims.
What is Phishing?
Phishing sites are crafted to lure users into thinking they are on a legitimate website. The goal of a phishing website is thus to appear as credible as possible, so that it is indistinguishable from the legitimate site.
The coarser phishing websites still show distinctive visual differences from the legitimate site, as in the example below of an Amazon login page. The most successful ones are only recognizable by other characteristics of the page itself, such as the URL, which will not correspond to the server of the legitimate site.
An Amazon phishing website (a coarse example)
It is in this context that we will use the dataset built by professors Rami Mohammad and T.L. McCluskey of the University of Huddersfield and Fadi Thabtah of the Canadian University of Dubai, published on the well-known UCI Machine Learning Repository.
The database is a collection of 11,055 website samples.
Each sample has 30 website parameters (features) that have proved to be sound and effective in predicting phishing websites, and a Result label (target) identifying whether it is a phishing website (-1) or a legitimate one (1).
Our problem is thus a supervised binary classification problem. We will divide our dataset into train and test samples, train models on the training set, and find which model gives the best accuracy score in detecting whether a website is a phishing one or not.
The features in our database are divided into 4 main groups.
For each feature described below, the value was constructed with an if/else rule and takes the value 1, -1, or 0. A value of 0 means the feature is considered SUSPICIOUS, i.e. it could indicate either a phishing or a legitimate website.
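As an illustration, here is a minimal sketch of how one such if/else rule could encode a feature. This is not the authors' actual code; the 54/75 character thresholds for the URL-length rule are assumptions for the example.

```r
# Hypothetical sketch of an if/else feature rule; the 54/75 character
# thresholds are assumptions, not taken from this notebook.
encode_url_length <- function(url) {
  n <- nchar(url)
  if (n < 54) {
    1    # legitimate
  } else if (n <= 75) {
    0    # SUSPICIOUS
  } else {
    -1   # phishing
  }
}

encode_url_length("https://example.com/login")  # short URL -> 1
```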
Let us describe the features of the first category, following the feature descriptions provided by the authors of the dataset.
Address Bar based Features
We hope that this part will help the reader better understand the database used here.
In order not to overload this notebook, and since a full understanding of the variables requires detailed technical explanations, we refer the reader to the complete description provided by the authors.
In order to address the problem described in point 1.4, I have implemented 3 main types of classification algorithms (tree classifiers, logistic regression, neural networks). First, we will process the data and do some EDA. Then we will create models and tune their hyperparameters, and finally we will assess and compare the models in order to find the best one. Of all the models developed, the boosted tree model has the highest accuracy, followed by the random forest classifier and logistic regression. So, according to our project, a boosted tree would best predict whether a website is a phishing website or not.
Have a good read!
library(RWeka)
library(BCA)
library(car)
library(xgboost)
library(ggplot2)
library(randomForest)
library(DataExplorer)
library(caret)
library(tree)
library(extraTrees)
library(h2o)
library(nnet)
library(corrplot)
library(Hmisc)
library(rpart)
library(plyr)
library(DT)
dataset <-read.arff(url("https://archive.ics.uci.edu/ml/machine-learning-databases/00327/Training%20Dataset.arff"))
head(dataset)
## having_IP_Address URL_Length Shortining_Service having_At_Symbol
## 1 -1 1 1 1
## 2 1 1 1 1
## 3 1 0 1 1
## 4 1 0 1 1
## 5 1 0 -1 1
## 6 -1 0 -1 1
## double_slash_redirecting Prefix_Suffix having_Sub_Domain SSLfinal_State
## 1 -1 -1 -1 -1
## 2 1 -1 0 1
## 3 1 -1 -1 -1
## 4 1 -1 -1 -1
## 5 1 -1 1 1
## 6 -1 -1 1 1
## Domain_registeration_length Favicon port HTTPS_token Request_URL
## 1 -1 1 1 -1 1
## 2 -1 1 1 -1 1
## 3 -1 1 1 -1 1
## 4 1 1 1 -1 -1
## 5 -1 1 1 1 1
## 6 -1 1 1 -1 1
## URL_of_Anchor Links_in_tags SFH Submitting_to_email Abnormal_URL Redirect
## 1 -1 1 -1 -1 -1 0
## 2 0 -1 -1 1 1 0
## 3 0 -1 -1 -1 -1 0
## 4 0 0 -1 1 1 0
## 5 0 0 -1 1 1 0
## 6 0 0 -1 -1 -1 0
## on_mouseover RightClick popUpWidnow Iframe age_of_domain DNSRecord
## 1 1 1 1 1 -1 -1
## 2 1 1 1 1 -1 -1
## 3 1 1 1 1 1 -1
## 4 1 1 1 1 -1 -1
## 5 -1 1 -1 1 -1 -1
## 6 1 1 1 1 1 1
## web_traffic Page_Rank Google_Index Links_pointing_to_page Statistical_report
## 1 -1 -1 1 1 -1
## 2 0 -1 1 1 1
## 3 1 -1 1 0 -1
## 4 1 -1 1 -1 1
## 5 0 -1 1 1 1
## 6 1 -1 1 -1 -1
## Result
## 1 -1
## 2 -1
## 3 -1
## 4 -1
## 5 1
## 6 1
str(dataset)
## 'data.frame': 11055 obs. of 31 variables:
## $ having_IP_Address : Factor w/ 2 levels "-1","1": 1 2 2 2 2 1 2 2 2 2 ...
## $ URL_Length : Factor w/ 3 levels "1","0","-1": 1 1 2 2 2 2 2 2 2 1 ...
## $ Shortining_Service : Factor w/ 2 levels "1","-1": 1 1 1 1 2 2 2 1 2 2 ...
## $ having_At_Symbol : Factor w/ 2 levels "1","-1": 1 1 1 1 1 1 1 1 1 1 ...
## $ double_slash_redirecting : Factor w/ 2 levels "-1","1": 1 2 2 2 2 1 2 2 2 2 ...
## $ Prefix_Suffix : Factor w/ 2 levels "-1","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ having_Sub_Domain : Factor w/ 3 levels "-1","0","1": 1 2 1 1 3 3 1 1 3 1 ...
## $ SSLfinal_State : Factor w/ 3 levels "-1","1","0": 1 2 1 1 2 2 1 1 2 2 ...
## $ Domain_registeration_length: Factor w/ 2 levels "-1","1": 1 1 1 2 1 1 2 2 1 1 ...
## $ Favicon : Factor w/ 2 levels "1","-1": 1 1 1 1 1 1 1 1 1 1 ...
## $ port : Factor w/ 2 levels "1","-1": 1 1 1 1 1 1 1 1 1 1 ...
## $ HTTPS_token : Factor w/ 2 levels "-1","1": 1 1 1 1 2 1 2 1 1 2 ...
## $ Request_URL : Factor w/ 2 levels "1","-1": 1 1 1 2 1 1 2 2 1 1 ...
## $ URL_of_Anchor : Factor w/ 3 levels "-1","0","1": 1 2 2 2 2 2 1 2 2 2 ...
## $ Links_in_tags : Factor w/ 3 levels "1","-1","0": 1 2 2 3 3 3 3 2 1 1 ...
## $ SFH : Factor w/ 3 levels "-1","1","0": 1 1 1 1 1 1 1 1 1 1 ...
## $ Submitting_to_email : Factor w/ 2 levels "-1","1": 1 2 1 2 2 1 1 2 2 2 ...
## $ Abnormal_URL : Factor w/ 2 levels "-1","1": 1 2 1 2 2 1 1 2 2 2 ...
## $ Redirect : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
## $ on_mouseover : Factor w/ 2 levels "1","-1": 1 1 1 1 2 1 1 1 1 1 ...
## $ RightClick : Factor w/ 2 levels "1","-1": 1 1 1 1 1 1 1 1 1 1 ...
## $ popUpWidnow : Factor w/ 2 levels "1","-1": 1 1 1 1 2 1 1 1 1 1 ...
## $ Iframe : Factor w/ 2 levels "1","-1": 1 1 1 1 1 1 1 1 1 1 ...
## $ age_of_domain : Factor w/ 2 levels "-1","1": 1 1 2 1 1 2 2 1 2 2 ...
## $ DNSRecord : Factor w/ 2 levels "-1","1": 1 1 1 1 1 2 1 1 1 1 ...
## $ web_traffic : Factor w/ 3 levels "-1","0","1": 1 2 3 3 2 3 1 2 3 2 ...
## $ Page_Rank : Factor w/ 2 levels "-1","1": 1 1 1 1 1 1 1 1 2 1 ...
## $ Google_Index : Factor w/ 2 levels "1","-1": 1 1 1 1 1 1 1 1 1 1 ...
## $ Links_pointing_to_page : Factor w/ 3 levels "1","0","-1": 1 1 2 3 1 3 2 2 2 2 ...
## $ Statistical_report : Factor w/ 2 levels "-1","1": 1 2 1 2 2 1 1 2 2 2 ...
## $ Result : Factor w/ 2 levels "-1","1": 1 1 1 1 2 2 1 1 2 1 ...
datatable(dataset, filter = 'top',options = list())
plot_intro(dataset)
As introduced in 1.5, all our variables are categorical (discrete). Our dataset has 31 columns and 11,055 rows.
For the sake of simplicity, we rename our columns with shorter, homogenized names.
colnames(dataset)
## [1] "having_IP_Address" "URL_Length"
## [3] "Shortining_Service" "having_At_Symbol"
## [5] "double_slash_redirecting" "Prefix_Suffix"
## [7] "having_Sub_Domain" "SSLfinal_State"
## [9] "Domain_registeration_length" "Favicon"
## [11] "port" "HTTPS_token"
## [13] "Request_URL" "URL_of_Anchor"
## [15] "Links_in_tags" "SFH"
## [17] "Submitting_to_email" "Abnormal_URL"
## [19] "Redirect" "on_mouseover"
## [21] "RightClick" "popUpWidnow"
## [23] "Iframe" "age_of_domain"
## [25] "DNSRecord" "web_traffic"
## [27] "Page_Rank" "Google_Index"
## [29] "Links_pointing_to_page" "Statistical_report"
## [31] "Result"
cols<-c("HavingIP","LongURL","ShortURL","Symbol","ddRedirecting","PrefixSuffix","SubDomain","HTTPS","DomainRegLen","Favicon","Port","HTTPsToken","RequestURL","AnchorURL", "LinksInTag","SFH","SubEmail","AbnormalURL","Redirect","OnMouseover","RightClick","PopUp","Iframe","AgeOfDomain","DNSRecord","WebTraffic","PageRank","GoogleIndex","LinkToPage","StatsReport","Class")
names(dataset)<-cols
colnames(dataset)
## [1] "HavingIP" "LongURL" "ShortURL" "Symbol"
## [5] "ddRedirecting" "PrefixSuffix" "SubDomain" "HTTPS"
## [9] "DomainRegLen" "Favicon" "Port" "HTTPsToken"
## [13] "RequestURL" "AnchorURL" "LinksInTag" "SFH"
## [17] "SubEmail" "AbnormalURL" "Redirect" "OnMouseover"
## [21] "RightClick" "PopUp" "Iframe" "AgeOfDomain"
## [25] "DNSRecord" "WebTraffic" "PageRank" "GoogleIndex"
## [29] "LinkToPage" "StatsReport" "Class"
Let us first check if there are any missing values in our dataset.
introduce(dataset)
## rows columns discrete_columns continuous_columns all_missing_columns
## 1 11055 31 31 0 0
## total_missing_values complete_rows total_observations memory_usage
## 1 0 11055 342705 1394416
There are no missing values.
table(dataset$Class)
##
## -1 1
## 4898 6157
prop.table(table(dataset$Class))
##
## -1 1
## 0.4430574 0.5569426
The classes are reasonably balanced, so we will not need to implement methods for imbalanced datasets such as SMOTE.
We will see in our third part, with our first trees, the importance of the variables in our prediction models. We can already notice that this dataset was intentionally built with features which collectively contribute to deciding whether a website is phishing or not. Thus, feature selection will probably not be necessary.
corr<-rcorr(as.matrix(dataset))
dataset_coeff = corr$r
corrplot(dataset_coeff, method="square",type="upper", order="hclust", tl.col="black", tl.srt=45)
sort(dataset_coeff[,31],decreasing= TRUE )
## Class HTTPS AnchorURL PrefixSuffix WebTraffic
## 1.000000e+00 7.147412e-01 6.929345e-01 3.486056e-01 3.461031e-01
## SubDomain RequestURL LinksInTag SFH GoogleIndex
## 2.983233e-01 2.533723e-01 2.482285e-01 2.214190e-01 1.289505e-01
## AgeOfDomain PageRank HavingIP StatsReport DNSRecord
## 1.214964e-01 1.046449e-01 9.416009e-02 7.985672e-02 7.571775e-02
## LongURL Symbol OnMouseover Port LinkToPage
## 5.742963e-02 5.294779e-02 4.183844e-02 3.641885e-02 3.257390e-02
## SubEmail RightClick PopUp Favicon Iframe
## 1.824901e-02 1.265323e-02 8.588679e-05 -2.795247e-04 -3.393524e-03
## Redirect ddRedirecting HTTPsToken AbnormalURL ShortURL
## -2.011346e-02 -3.860761e-02 -3.985390e-02 -6.048764e-02 -6.796589e-02
## DomainRegLen
## -2.257895e-01
We use the first graph and the attached table to identify the variables most correlated with the target.
Although some features are highly correlated with each other (>0.5), we choose to keep them all for more precision in our model.
We observe that the variables HTTPS and AnchorURL are the most correlated with the target.
Let us plot the distribution of Class for the features most correlated with the target (HTTPS, AnchorURL, PrefixSuffix).
qplot(HTTPS, data=dataset, geom="bar", fill=Class) +
theme(legend.position = "top") +
theme(axis.text.x=element_text(angle = -20, hjust = 0))
We can see from this graph that the distribution of classes follows a fairly clear logic: phishing websites fall mainly into the suspicious category of the HTTPS feature. However, a small proportion of phishing websites appear legitimate according to this feature (fortunately, there is work left for our models!). Finally, all websites with a suspicious HTTPS value (0) are phishing websites (class -1).
Let us analyse what happens for our second most correlated feature, AnchorURL.
qplot(AnchorURL, data=dataset, geom="bar", fill=Class) +
theme(legend.position = "top") +
theme(axis.text.x=element_text(angle = -20, hjust = 0))
The distribution of Class follows mainly the same logic as before. The sites considered phishing according to this feature are indeed all phishing. A small number of legitimate sites are misclassified by this variable. The vast majority of sites considered suspicious according to the feature are legitimate sites (unlike with the HTTPS feature).
qplot(PrefixSuffix, data=dataset, geom="bar", fill=Class) +
theme(legend.position = "top")
qplot(WebTraffic, data=dataset, geom="bar", fill=Class) +
theme(legend.position = "top")
What happens for the feature most negatively correlated with the target? Let us draw a bar plot for the DomainRegLen feature.
qplot(DomainRegLen, data=dataset, geom="bar", fill=Class) +
theme(legend.position = "top")
As we could have expected, we notice a majority of misclassifications according to this feature: class -1 samples take the value 1 on the feature, and vice versa.
Now that we have described the relation between the features and the target variable, let us conclude this EDA part by plotting the distribution of all variables as bar plots, to get a global view of their behavior.
plot_bar(dataset)
This ends our EDA part. We can move on to our fourth part: building models.
Now that we have completed the first three parts, let us build the datasets on which our models will be trained.
Negative label values can cause problems for some parameterized models. To resolve this, I converted the -1 labels to 0.
dataset <- within(dataset, {
Class <- Recode(Class, '-1=0', as.factor=TRUE)
})
dataset$Sample <- create.samples(dataset, est = 0.70, val = 0.30, rand.seed = 1)
trainingset<-dataset[dataset$Sample=="Estimation",]
testset<-dataset[dataset$Sample=="Validation",]
trainingset<-trainingset[,-32]
testset<-testset[,-32]
Let us build our first classification tree using tree package.
tree1.train <- tree(Class~.,data=trainingset)
summary(tree1.train)
##
## Classification tree:
## tree(formula = Class ~ ., data = trainingset)
## Variables actually used in tree construction:
## [1] "HTTPS" "AnchorURL" "LinksInTag" "WebTraffic"
## Number of terminal nodes: 8
## Residual mean deviance: 0.3906 = 3019 / 7730
## Misclassification error rate: 0.09085 = 703 / 7738
Our first tree reveals the importance of 4 variables in predicting our target. HTTPS, AnchorURL and WebTraffic were, as we saw previously, the 3 variables most correlated with our target class. LinksInTag is also important in the prediction here.
Let us plot our first tree …
plot(tree1.train)
text(tree1.train,pretty = 0)
… and apply our model into test dataset.
tree1.predict<-predict(tree1.train, newdata=testset[,-31], type="class")
Now that we have applied our model, we will plot the confusion matrix and use the accuracy score to assess model performance.
c1<-confusionMatrix(factor(tree1.predict),factor(testset$Class))
c1
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 1188 44
## 1 292 1792
##
## Accuracy : 0.8987
## 95% CI : (0.8879, 0.9087)
## No Information Rate : 0.5537
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.7916
##
## Mcnemar's Test P-Value : < 2.2e-16
##
## Sensitivity : 0.8027
## Specificity : 0.9760
## Pos Pred Value : 0.9643
## Neg Pred Value : 0.8599
## Prevalence : 0.4463
## Detection Rate : 0.3583
## Detection Prevalence : 0.3715
## Balanced Accuracy : 0.8894
##
## 'Positive' Class : 0
##
Let us use cross-validation to prune the tree optimally. We run K-fold cross-validation with cv.tree to find the deviance (or the number of misclassifications) as a function of the cost-complexity parameter k.
tree1.val <- tree(Class~.,data=trainingset)
cv.val1.tree = cv.tree(tree1.val, FUN = prune.tree,K=10)
plot(cv.val1.tree)
We can observe a large drop in deviance between sizes 1 and 2. We will pick size 4.
tree1_optimal = prune.tree(tree1.train, best=4)
summary(tree1_optimal)
##
## Classification tree:
## snip.tree(tree = tree1.train, nodes = c(5L, 7L))
## Variables actually used in tree construction:
## [1] "HTTPS" "AnchorURL"
## Number of terminal nodes: 4
## Residual mean deviance: 0.5 = 3867 / 7734
## Misclassification error rate: 0.0924 = 715 / 7738
summary(tree1.train)
##
## Classification tree:
## tree(formula = Class ~ ., data = trainingset)
## Variables actually used in tree construction:
## [1] "HTTPS" "AnchorURL" "LinksInTag" "WebTraffic"
## Number of terminal nodes: 8
## Residual mean deviance: 0.3906 = 3019 / 7730
## Misclassification error rate: 0.09085 = 703 / 7738
plot(tree1_optimal)
text(tree1_optimal, pretty=0)
tree1.predict_optimal<-predict(tree1_optimal, newdata=testset[,-31], type="class")
c1<-confusionMatrix(factor(tree1.predict_optimal),factor(testset$Class))
c1
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 1323 164
## 1 157 1672
##
## Accuracy : 0.9032
## 95% CI : (0.8926, 0.9131)
## No Information Rate : 0.5537
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.8042
##
## Mcnemar's Test P-Value : 0.7377
##
## Sensitivity : 0.8939
## Specificity : 0.9107
## Pos Pred Value : 0.8897
## Neg Pred Value : 0.9142
## Prevalence : 0.4463
## Detection Rate : 0.3990
## Detection Prevalence : 0.4484
## Balanced Accuracy : 0.9023
##
## 'Positive' Class : 0
##
We can observe an increase in our accuracy thanks to this pruning!
Let us build a second tree and compare its accuracy score.
Let us use the CART algorithm with the rpart package. CART chooses split variables for classification by minimizing an impurity measure (Gini index or entropy). Here we will use the Gini index and start with a complexity parameter (cp) of 0.
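As a reminder of the splitting criterion, the Gini impurity of a node with class-proportion vector p can be computed as follows (a minimal sketch, not part of rpart's own code):

```r
# Gini impurity of a node: 1 minus the sum of squared class proportions.
# It is 0 for a pure node and 0.5 for a perfectly mixed two-class node.
gini <- function(p) 1 - sum(p^2)

gini(c(0.5, 0.5))  # 0.5: maximally impure two-class node
gini(c(1, 0))      # 0: pure node
```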
tree2.train = rpart(Class~., data=trainingset,cp=0)
plot(tree2.train)
text(tree2.train,pretty=0)
This first graph is quite unreadable. This can be explained by the fact that rpart, grown here with cp = 0, integrates far more variables and splits in its model than the previous tree.
tree2.train$cptable
## CP nsplit rel error xerror xstd
## 1 7.507314e-01 0 1.00000000 1.0000000 0.012780313
## 2 4.008192e-02 1 0.24926858 0.2492686 0.008055952
## 3 1.053248e-02 2 0.20918666 0.2091867 0.007452945
## 4 4.681100e-03 5 0.17758923 0.1854886 0.007058456
## 5 3.218256e-03 8 0.16325336 0.1764190 0.006898731
## 6 2.633119e-03 10 0.15681685 0.1632534 0.006657220
## 7 2.340550e-03 16 0.13633704 0.1524283 0.006449273
## 8 2.047981e-03 17 0.13399649 0.1427736 0.006255939
## 9 1.755413e-03 18 0.13194851 0.1418958 0.006237970
## 10 1.170275e-03 19 0.13019310 0.1375073 0.006147096
## 11 8.777063e-04 22 0.12668227 0.1354593 0.006104085
## 12 6.582797e-04 24 0.12492686 0.1325336 0.006041955
## 13 5.851375e-04 29 0.12053833 0.1296080 0.005978994
## 14 4.388531e-04 45 0.10883558 0.1272674 0.005928009
## 15 3.900917e-04 57 0.10181393 0.1237566 0.005850472
## 16 3.510825e-04 63 0.09947338 0.1240492 0.005856983
## 17 2.925688e-04 68 0.09771796 0.1225863 0.005824337
## 18 1.755413e-04 72 0.09654769 0.1234640 0.005843952
## 19 9.752292e-05 77 0.09566998 0.1228789 0.005830884
## 20 5.851375e-05 83 0.09508484 0.1237566 0.005850472
## 21 0.000000e+00 88 0.09479228 0.1260971 0.005902307
Here we can see the complexity parameter (CP) decreasing with the number of splits. The rel error column is relative to the root node: its first value is 1, and every subsequent value is compared against it.
xerror: the cross-validation error, averaged over multiple train/test folds.
xstd: its standard deviation.
Our task here is to pick the cp value with the lowest cross-validation error (xerror).
plotcp(tree2.train)
A good choice of cp for pruning is often the leftmost value for which the mean lies below the horizontal line.
cp_tree2.train = tree2.train$cptable[which(tree2.train$cptable[,"xerror"]==min(tree2.train$cptable[,"xerror"])),"CP"]
cp_tree2.train
## [1] 0.0002925688
tree2_optimal = prune(tree2.train, cp=cp_tree2.train)
tree2.predict<-predict(tree2_optimal, newdata=testset[,-31], type="class")
c2<-confusionMatrix(factor(tree2.predict),factor(testset$Class))
c2
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 1246 68
## 1 234 1768
##
## Accuracy : 0.9089
## 95% CI : (0.8986, 0.9185)
## No Information Rate : 0.5537
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.8137
##
## Mcnemar's Test P-Value : < 2.2e-16
##
## Sensitivity : 0.8419
## Specificity : 0.9630
## Pos Pred Value : 0.9482
## Neg Pred Value : 0.8831
## Prevalence : 0.4463
## Detection Rate : 0.3758
## Detection Prevalence : 0.3963
## Balanced Accuracy : 0.9024
##
## 'Positive' Class : 0
##
From our second confusion matrix, we can observe a very slight improvement in our accuracy score.
Random forests reduce the variance of the forecasts of a single decision tree, thus improving performance. They do this by combining n decision trees in a bagging approach. We do not prune the trees.
Each tree in the random forest is trained on a random bootstrap sample of the data, and the predictions are then aggregated by majority vote.
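The bagging idea above can be sketched by hand. This is illustrative only and reuses the trainingset and testset objects built earlier; note that randomForest additionally draws a random subset of features at each split, which this sketch omits.

```r
library(rpart)

set.seed(1)
n_trees <- 25
# Each column holds one bootstrap tree's predictions on the test set
votes <- sapply(seq_len(n_trees), function(i) {
  boot <- trainingset[sample(nrow(trainingset), replace = TRUE), ]
  fit  <- rpart(Class ~ ., data = boot)
  as.character(predict(fit, newdata = testset, type = "class"))
})
# Aggregate the trees' predictions by majority vote, row by row
bagged_pred <- apply(votes, 1, function(v) names(which.max(table(v))))
```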
tree3.train = randomForest(Class~., data=trainingset, ntree=1000, do.trace=T)
## ntree OOB 1 2
## 1: 7.15% 7.61% 6.78%
## 2: 6.63% 6.89% 6.42%
## 3: 6.50% 6.63% 6.40%
## 4: 5.80% 6.01% 5.62%
## 5: 5.94% 6.06% 5.85%
## 6: 5.46% 5.73% 5.24%
## 7: 5.40% 5.96% 4.96%
## 8: 5.27% 5.85% 4.81%
## 9: 4.98% 5.55% 4.53%
## 10: 4.87% 5.49% 4.38%
## 11: 4.44% 5.15% 3.88%
## 12: 4.28% 4.82% 3.86%
## 13: 4.43% 4.89% 4.06%
## 14: 4.26% 4.80% 3.83%
## 15: 4.14% 4.80% 3.62%
## 16: 3.96% 4.62% 3.43%
## 17: 4.06% 4.77% 3.50%
## 18: 3.88% 4.51% 3.38%
## 19: 3.85% 4.48% 3.36%
## 20: 3.74% 4.48% 3.15%
##   (rows 21 to 488 omitted: the OOB error declines gradually and stabilises around 3.3% to 3.5%)
## 489: 3.39% 4.53% 2.48%
## 490: 3.41% 4.53% 2.52%
## 491: 3.40% 4.53% 2.50%
## 492: 3.42% 4.56% 2.52%
## 493: 3.39% 4.51% 2.50%
## 494: 3.41% 4.51% 2.55%
## 495: 3.40% 4.51% 2.52%
## 496: 3.40% 4.53% 2.50%
## 497: 3.44% 4.59% 2.52%
## 498: 3.41% 4.53% 2.52%
## 499: 3.42% 4.56% 2.52%
## 500: 3.42% 4.56% 2.52%
## 501: 3.41% 4.56% 2.50%
## 502: 3.42% 4.56% 2.52%
## 503: 3.44% 4.59% 2.52%
## 504: 3.42% 4.56% 2.52%
## 505: 3.42% 4.56% 2.52%
## 506: 3.45% 4.59% 2.55%
## 507: 3.44% 4.56% 2.55%
## 508: 3.44% 4.59% 2.52%
## 509: 3.44% 4.56% 2.55%
## 510: 3.45% 4.59% 2.55%
## 511: 3.44% 4.56% 2.55%
## 512: 3.44% 4.56% 2.55%
## 513: 3.44% 4.59% 2.52%
## 514: 3.44% 4.62% 2.50%
## 515: 3.48% 4.65% 2.55%
## 516: 3.44% 4.59% 2.52%
## 517: 3.45% 4.59% 2.55%
## 518: 3.46% 4.62% 2.55%
## 519: 3.46% 4.62% 2.55%
## 520: 3.44% 4.59% 2.52%
## 521: 3.42% 4.56% 2.52%
## 522: 3.41% 4.56% 2.50%
## 523: 3.44% 4.59% 2.52%
## 524: 3.44% 4.59% 2.52%
## 525: 3.44% 4.59% 2.52%
## 526: 3.44% 4.59% 2.52%
## 527: 3.42% 4.59% 2.50%
## 528: 3.42% 4.59% 2.50%
## 529: 3.42% 4.59% 2.50%
## 530: 3.41% 4.56% 2.50%
## 531: 3.41% 4.56% 2.50%
## 532: 3.41% 4.56% 2.50%
## 533: 3.41% 4.56% 2.50%
## 534: 3.41% 4.56% 2.50%
## 535: 3.41% 4.56% 2.50%
## 536: 3.42% 4.59% 2.50%
## 537: 3.41% 4.56% 2.50%
## 538: 3.40% 4.53% 2.50%
## 539: 3.39% 4.51% 2.50%
## 540: 3.39% 4.51% 2.50%
## 541: 3.39% 4.51% 2.50%
## 542: 3.40% 4.53% 2.50%
## 543: 3.40% 4.53% 2.50%
## 544: 3.39% 4.51% 2.50%
## 545: 3.39% 4.51% 2.50%
## 546: 3.40% 4.53% 2.50%
## 547: 3.39% 4.51% 2.50%
## 548: 3.39% 4.51% 2.50%
## 549: 3.39% 4.51% 2.50%
## 550: 3.39% 4.51% 2.50%
## 551: 3.39% 4.51% 2.50%
## 552: 3.40% 4.53% 2.50%
## 553: 3.40% 4.53% 2.50%
## 554: 3.40% 4.53% 2.50%
## 555: 3.39% 4.51% 2.50%
## 556: 3.39% 4.51% 2.50%
## 557: 3.39% 4.53% 2.48%
## 558: 3.39% 4.53% 2.48%
## 559: 3.39% 4.51% 2.50%
## 560: 3.37% 4.51% 2.48%
## 561: 3.39% 4.53% 2.48%
## 562: 3.39% 4.53% 2.48%
## 563: 3.39% 4.51% 2.50%
## 564: 3.39% 4.51% 2.50%
## 565: 3.40% 4.53% 2.50%
## 566: 3.39% 4.51% 2.50%
## 567: 3.37% 4.51% 2.48%
## 568: 3.39% 4.51% 2.50%
## 569: 3.39% 4.51% 2.50%
## 570: 3.36% 4.51% 2.45%
## 571: 3.37% 4.51% 2.48%
## 572: 3.39% 4.51% 2.50%
## 573: 3.39% 4.51% 2.50%
## 574: 3.39% 4.51% 2.50%
## 575: 3.39% 4.51% 2.50%
## 576: 3.39% 4.51% 2.50%
## 577: 3.39% 4.51% 2.50%
## 578: 3.36% 4.51% 2.45%
## 579: 3.36% 4.51% 2.45%
## 580: 3.39% 4.51% 2.50%
## 581: 3.36% 4.51% 2.45%
## 582: 3.37% 4.51% 2.48%
## 583: 3.37% 4.51% 2.48%
## 584: 3.37% 4.51% 2.48%
## 585: 3.39% 4.53% 2.48%
## 586: 3.40% 4.56% 2.48%
## 587: 3.41% 4.56% 2.50%
## 588: 3.41% 4.59% 2.48%
## 589: 3.41% 4.56% 2.50%
## 590: 3.41% 4.59% 2.48%
## 591: 3.41% 4.56% 2.50%
## 592: 3.41% 4.56% 2.50%
## 593: 3.41% 4.56% 2.50%
## 594: 3.40% 4.56% 2.48%
## 595: 3.40% 4.53% 2.50%
## 596: 3.42% 4.56% 2.52%
## 597: 3.42% 4.56% 2.52%
## 598: 3.40% 4.53% 2.50%
## 599: 3.42% 4.56% 2.52%
## 600: 3.41% 4.56% 2.50%
## 601: 3.41% 4.56% 2.50%
## 602: 3.41% 4.56% 2.50%
## 603: 3.41% 4.56% 2.50%
## 604: 3.41% 4.56% 2.50%
## 605: 3.42% 4.59% 2.50%
## 606: 3.41% 4.56% 2.50%
## 607: 3.42% 4.59% 2.50%
## 608: 3.42% 4.59% 2.50%
## 609: 3.42% 4.59% 2.50%
## 610: 3.41% 4.56% 2.50%
## 611: 3.42% 4.59% 2.50%
## 612: 3.44% 4.59% 2.52%
## 613: 3.41% 4.56% 2.50%
## 614: 3.42% 4.56% 2.52%
## 615: 3.42% 4.56% 2.52%
## 616: 3.44% 4.56% 2.55%
## 617: 3.41% 4.56% 2.50%
## 618: 3.41% 4.56% 2.50%
## 619: 3.44% 4.59% 2.52%
## 620: 3.42% 4.56% 2.52%
## 621: 3.42% 4.56% 2.52%
## 622: 3.42% 4.56% 2.52%
## 623: 3.44% 4.59% 2.52%
## 624: 3.42% 4.59% 2.50%
## 625: 3.44% 4.62% 2.50%
## 626: 3.45% 4.62% 2.52%
## 627: 3.44% 4.59% 2.52%
## 628: 3.44% 4.62% 2.50%
## 629: 3.46% 4.62% 2.55%
## 630: 3.44% 4.59% 2.52%
## 631: 3.45% 4.56% 2.57%
## 632: 3.44% 4.56% 2.55%
## 633: 3.44% 4.56% 2.55%
## 634: 3.44% 4.59% 2.52%
## 635: 3.46% 4.59% 2.57%
## 636: 3.44% 4.56% 2.55%
## 637: 3.42% 4.56% 2.52%
## 638: 3.42% 4.56% 2.52%
## 639: 3.44% 4.59% 2.52%
## 640: 3.46% 4.62% 2.55%
## 641: 3.45% 4.62% 2.52%
## 642: 3.44% 4.59% 2.52%
## 643: 3.44% 4.59% 2.52%
## 644: 3.45% 4.62% 2.52%
## 645: 3.46% 4.62% 2.55%
## 646: 3.48% 4.65% 2.55%
## 647: 3.46% 4.65% 2.52%
## 648: 3.45% 4.62% 2.52%
## 649: 3.45% 4.62% 2.52%
## 650: 3.46% 4.65% 2.52%
## 651: 3.46% 4.65% 2.52%
## 652: 3.46% 4.65% 2.52%
## 653: 3.46% 4.65% 2.52%
## 654: 3.45% 4.62% 2.52%
## 655: 3.46% 4.65% 2.52%
## 656: 3.45% 4.62% 2.52%
## 657: 3.46% 4.65% 2.52%
## 658: 3.44% 4.62% 2.50%
## 659: 3.44% 4.62% 2.50%
## 660: 3.44% 4.62% 2.50%
## 661: 3.44% 4.62% 2.50%
## 662: 3.44% 4.62% 2.50%
## 663: 3.44% 4.62% 2.50%
## 664: 3.44% 4.62% 2.50%
## 665: 3.42% 4.59% 2.50%
## 666: 3.44% 4.62% 2.50%
## 667: 3.44% 4.62% 2.50%
## 668: 3.44% 4.62% 2.50%
## 669: 3.42% 4.59% 2.50%
## 670: 3.44% 4.62% 2.50%
## 671: 3.45% 4.65% 2.50%
## 672: 3.45% 4.62% 2.52%
## 673: 3.45% 4.62% 2.52%
## 674: 3.44% 4.59% 2.52%
## 675: 3.45% 4.62% 2.52%
## 676: 3.44% 4.59% 2.52%
## 677: 3.45% 4.62% 2.52%
## 678: 3.45% 4.62% 2.52%
## 679: 3.44% 4.62% 2.50%
## 680: 3.44% 4.59% 2.52%
## 681: 3.42% 4.59% 2.50%
## 682: 3.42% 4.59% 2.50%
## 683: 3.44% 4.59% 2.52%
## 684: 3.45% 4.62% 2.52%
## 685: 3.45% 4.62% 2.52%
## 686: 3.45% 4.59% 2.55%
## 687: 3.45% 4.59% 2.55%
## 688: 3.44% 4.59% 2.52%
## 689: 3.41% 4.53% 2.52%
## 690: 3.42% 4.56% 2.52%
## 691: 3.42% 4.53% 2.55%
## 692: 3.41% 4.53% 2.52%
## 693: 3.41% 4.51% 2.55%
## 694: 3.44% 4.53% 2.57%
## 695: 3.42% 4.53% 2.55%
## 696: 3.40% 4.53% 2.50%
## 697: 3.41% 4.53% 2.52%
## 698: 3.44% 4.56% 2.55%
## 699: 3.40% 4.51% 2.52%
## 700: 3.40% 4.51% 2.52%
## 701: 3.39% 4.51% 2.50%
## 702: 3.39% 4.51% 2.50%
## 703: 3.40% 4.51% 2.52%
## 704: 3.41% 4.51% 2.55%
## 705: 3.40% 4.51% 2.52%
## 706: 3.41% 4.51% 2.55%
## 707: 3.41% 4.53% 2.52%
## 708: 3.41% 4.53% 2.52%
## 709: 3.41% 4.53% 2.52%
## 710: 3.41% 4.53% 2.52%
## 711: 3.41% 4.53% 2.52%
## 712: 3.41% 4.53% 2.52%
## 713: 3.41% 4.53% 2.52%
## 714: 3.41% 4.53% 2.52%
## 715: 3.42% 4.53% 2.55%
## 716: 3.40% 4.51% 2.52%
## 717: 3.40% 4.51% 2.52%
## 718: 3.39% 4.51% 2.50%
## 719: 3.39% 4.51% 2.50%
## 720: 3.40% 4.51% 2.52%
## 721: 3.40% 4.51% 2.52%
## 722: 3.40% 4.51% 2.52%
## 723: 3.40% 4.51% 2.52%
## 724: 3.40% 4.51% 2.52%
## 725: 3.40% 4.51% 2.52%
## 726: 3.40% 4.51% 2.52%
## 727: 3.40% 4.51% 2.52%
## 728: 3.41% 4.53% 2.52%
## 729: 3.41% 4.53% 2.52%
## 730: 3.41% 4.53% 2.52%
## 731: 3.41% 4.53% 2.52%
## 732: 3.42% 4.56% 2.52%
## 733: 3.42% 4.56% 2.52%
## 734: 3.41% 4.53% 2.52%
## 735: 3.42% 4.56% 2.52%
## 736: 3.42% 4.56% 2.52%
## 737: 3.42% 4.56% 2.52%
## 738: 3.44% 4.59% 2.52%
## 739: 3.42% 4.56% 2.52%
## 740: 3.44% 4.59% 2.52%
## 741: 3.44% 4.59% 2.52%
## 742: 3.41% 4.56% 2.50%
## 743: 3.42% 4.56% 2.52%
## 744: 3.41% 4.53% 2.52%
## 745: 3.41% 4.53% 2.52%
## 746: 3.41% 4.53% 2.52%
## 747: 3.41% 4.53% 2.52%
## 748: 3.41% 4.53% 2.52%
## 749: 3.41% 4.53% 2.52%
## 750: 3.41% 4.53% 2.52%
## 751: 3.41% 4.53% 2.52%
## 752: 3.41% 4.53% 2.52%
## 753: 3.41% 4.53% 2.52%
## 754: 3.41% 4.53% 2.52%
## 755: 3.39% 4.51% 2.50%
## 756: 3.39% 4.51% 2.50%
## 757: 3.42% 4.53% 2.55%
## 758: 3.44% 4.53% 2.57%
## 759: 3.40% 4.51% 2.52%
## 760: 3.41% 4.53% 2.52%
## 761: 3.41% 4.53% 2.52%
## 762: 3.42% 4.53% 2.55%
## 763: 3.41% 4.53% 2.52%
## 764: 3.41% 4.53% 2.52%
## 765: 3.41% 4.56% 2.50%
## 766: 3.41% 4.53% 2.52%
## 767: 3.41% 4.53% 2.52%
## 768: 3.41% 4.53% 2.52%
## 769: 3.41% 4.53% 2.52%
## 770: 3.41% 4.53% 2.52%
## 771: 3.42% 4.56% 2.52%
## 772: 3.42% 4.56% 2.52%
## 773: 3.42% 4.56% 2.52%
## 774: 3.42% 4.56% 2.52%
## 775: 3.41% 4.56% 2.50%
## 776: 3.41% 4.56% 2.50%
## 777: 3.41% 4.56% 2.50%
## 778: 3.41% 4.56% 2.50%
## 779: 3.40% 4.56% 2.48%
## 780: 3.40% 4.56% 2.48%
## 781: 3.40% 4.56% 2.48%
## 782: 3.40% 4.56% 2.48%
## 783: 3.39% 4.53% 2.48%
## 784: 3.40% 4.56% 2.48%
## 785: 3.40% 4.56% 2.48%
## 786: 3.40% 4.56% 2.48%
## 787: 3.40% 4.56% 2.48%
## 788: 3.41% 4.59% 2.48%
## 789: 3.40% 4.56% 2.48%
## 790: 3.42% 4.59% 2.50%
## 791: 3.42% 4.59% 2.50%
## 792: 3.44% 4.62% 2.50%
## 793: 3.44% 4.62% 2.50%
## 794: 3.42% 4.59% 2.50%
## 795: 3.42% 4.59% 2.50%
## 796: 3.44% 4.62% 2.50%
## 797: 3.42% 4.59% 2.50%
## 798: 3.44% 4.59% 2.52%
## 799: 3.42% 4.59% 2.50%
## 800: 3.44% 4.59% 2.52%
## 801: 3.44% 4.59% 2.52%
## 802: 3.44% 4.59% 2.52%
## 803: 3.45% 4.59% 2.55%
## 804: 3.45% 4.59% 2.55%
## 805: 3.44% 4.59% 2.52%
## 806: 3.44% 4.59% 2.52%
## 807: 3.44% 4.59% 2.52%
## 808: 3.44% 4.59% 2.52%
## 809: 3.44% 4.59% 2.52%
## 810: 3.44% 4.59% 2.52%
## 811: 3.42% 4.59% 2.50%
## 812: 3.44% 4.62% 2.50%
## 813: 3.44% 4.62% 2.50%
## 814: 3.45% 4.62% 2.52%
## 815: 3.45% 4.65% 2.50%
## 816: 3.46% 4.65% 2.52%
## 817: 3.45% 4.65% 2.50%
## 818: 3.45% 4.65% 2.50%
## 819: 3.44% 4.62% 2.50%
## 820: 3.45% 4.62% 2.52%
## 821: 3.46% 4.62% 2.55%
## 822: 3.46% 4.62% 2.55%
## 823: 3.46% 4.62% 2.55%
## 824: 3.45% 4.62% 2.52%
## 825: 3.48% 4.65% 2.55%
## 826: 3.48% 4.65% 2.55%
## 827: 3.48% 4.65% 2.55%
## 828: 3.48% 4.65% 2.55%
## 829: 3.48% 4.65% 2.55%
## 830: 3.46% 4.62% 2.55%
## 831: 3.46% 4.62% 2.55%
## 832: 3.46% 4.62% 2.55%
## 833: 3.45% 4.59% 2.55%
## 834: 3.45% 4.62% 2.52%
## 835: 3.46% 4.62% 2.55%
## 836: 3.46% 4.62% 2.55%
## 837: 3.48% 4.65% 2.55%
## 838: 3.48% 4.65% 2.55%
## 839: 3.48% 4.65% 2.55%
## 840: 3.48% 4.65% 2.55%
## 841: 3.46% 4.65% 2.52%
## 842: 3.48% 4.65% 2.55%
## 843: 3.46% 4.65% 2.52%
## 844: 3.46% 4.62% 2.55%
## 845: 3.46% 4.62% 2.55%
## 846: 3.46% 4.62% 2.55%
## 847: 3.48% 4.68% 2.52%
## 848: 3.48% 4.68% 2.52%
## 849: 3.42% 4.62% 2.48%
## 850: 3.44% 4.62% 2.50%
## 851: 3.45% 4.65% 2.50%
## 852: 3.46% 4.65% 2.52%
## 853: 3.45% 4.65% 2.50%
## 854: 3.46% 4.65% 2.52%
## 855: 3.48% 4.68% 2.52%
## 856: 3.46% 4.65% 2.52%
## 857: 3.45% 4.62% 2.52%
## 858: 3.45% 4.62% 2.52%
## 859: 3.48% 4.65% 2.55%
## 860: 3.46% 4.62% 2.55%
## 861: 3.46% 4.62% 2.55%
## 862: 3.48% 4.65% 2.55%
## 863: 3.48% 4.65% 2.55%
## 864: 3.48% 4.65% 2.55%
## 865: 3.48% 4.65% 2.55%
## 866: 3.48% 4.65% 2.55%
## 867: 3.48% 4.65% 2.55%
## 868: 3.48% 4.65% 2.55%
## 869: 3.46% 4.65% 2.52%
## 870: 3.46% 4.65% 2.52%
## 871: 3.46% 4.65% 2.52%
## 872: 3.48% 4.65% 2.55%
## 873: 3.48% 4.65% 2.55%
## 874: 3.48% 4.65% 2.55%
## 875: 3.46% 4.65% 2.52%
## 876: 3.46% 4.65% 2.52%
## 877: 3.45% 4.62% 2.52%
## 878: 3.45% 4.62% 2.52%
## 879: 3.46% 4.62% 2.55%
## 880: 3.48% 4.65% 2.55%
## 881: 3.46% 4.62% 2.55%
## 882: 3.45% 4.62% 2.52%
## 883: 3.46% 4.62% 2.55%
## 884: 3.45% 4.59% 2.55%
## 885: 3.45% 4.59% 2.55%
## 886: 3.42% 4.59% 2.50%
## 887: 3.44% 4.59% 2.52%
## 888: 3.46% 4.59% 2.57%
## 889: 3.44% 4.56% 2.55%
## 890: 3.44% 4.56% 2.55%
## 891: 3.46% 4.59% 2.57%
## 892: 3.44% 4.56% 2.55%
## 893: 3.44% 4.56% 2.55%
## 894: 3.45% 4.56% 2.57%
## 895: 3.44% 4.56% 2.55%
## 896: 3.42% 4.56% 2.52%
## 897: 3.42% 4.56% 2.52%
## 898: 3.42% 4.56% 2.52%
## 899: 3.41% 4.56% 2.50%
## 900: 3.44% 4.56% 2.55%
## 901: 3.44% 4.56% 2.55%
## 902: 3.45% 4.56% 2.57%
## 903: 3.42% 4.53% 2.55%
## 904: 3.45% 4.56% 2.57%
## 905: 3.45% 4.56% 2.57%
## 906: 3.44% 4.53% 2.57%
## 907: 3.42% 4.53% 2.55%
## 908: 3.45% 4.56% 2.57%
## 909: 3.45% 4.56% 2.57%
## 910: 3.45% 4.56% 2.57%
## 911: 3.44% 4.56% 2.55%
## 912: 3.46% 4.59% 2.57%
## 913: 3.45% 4.59% 2.55%
## 914: 3.44% 4.59% 2.52%
## 915: 3.45% 4.59% 2.55%
## 916: 3.45% 4.59% 2.55%
## 917: 3.44% 4.56% 2.55%
## 918: 3.44% 4.56% 2.55%
## 919: 3.45% 4.59% 2.55%
## 920: 3.44% 4.56% 2.55%
## 921: 3.44% 4.56% 2.55%
## 922: 3.44% 4.56% 2.55%
## 923: 3.42% 4.56% 2.52%
## 924: 3.44% 4.56% 2.55%
## 925: 3.44% 4.56% 2.55%
## 926: 3.44% 4.56% 2.55%
## 927: 3.44% 4.56% 2.55%
## 928: 3.46% 4.59% 2.57%
## 929: 3.45% 4.56% 2.57%
## 930: 3.46% 4.59% 2.57%
## 931: 3.46% 4.59% 2.57%
## 932: 3.45% 4.56% 2.57%
## 933: 3.46% 4.59% 2.57%
## 934: 3.46% 4.59% 2.57%
## 935: 3.45% 4.59% 2.55%
## 936: 3.45% 4.56% 2.57%
## 937: 3.45% 4.56% 2.57%
## 938: 3.46% 4.59% 2.57%
## 939: 3.44% 4.56% 2.55%
## 940: 3.45% 4.56% 2.57%
## 941: 3.44% 4.56% 2.55%
## 942: 3.44% 4.56% 2.55%
## 943: 3.44% 4.56% 2.55%
## 944: 3.42% 4.53% 2.55%
## 945: 3.44% 4.53% 2.57%
## 946: 3.44% 4.53% 2.57%
## 947: 3.45% 4.56% 2.57%
## 948: 3.44% 4.53% 2.57%
## 949: 3.42% 4.53% 2.55%
## 950: 3.45% 4.56% 2.57%
## 951: 3.44% 4.56% 2.55%
## 952: 3.46% 4.59% 2.57%
## 953: 3.48% 4.59% 2.59%
## 954: 3.48% 4.62% 2.57%
## 955: 3.46% 4.59% 2.57%
## 956: 3.42% 4.53% 2.55%
## 957: 3.44% 4.56% 2.55%
## 958: 3.44% 4.56% 2.55%
## 959: 3.44% 4.56% 2.55%
## 960: 3.45% 4.59% 2.55%
## 961: 3.44% 4.56% 2.55%
## 962: 3.42% 4.53% 2.55%
## 963: 3.42% 4.53% 2.55%
## 964: 3.44% 4.56% 2.55%
## 965: 3.42% 4.56% 2.52%
## 966: 3.44% 4.56% 2.55%
## 967: 3.44% 4.56% 2.55%
## 968: 3.44% 4.59% 2.52%
## 969: 3.42% 4.56% 2.52%
## 970: 3.42% 4.56% 2.52%
## 971: 3.42% 4.56% 2.52%
## 972: 3.42% 4.56% 2.52%
## 973: 3.41% 4.56% 2.50%
## 974: 3.42% 4.56% 2.52%
## 975: 3.42% 4.56% 2.52%
## 976: 3.41% 4.56% 2.50%
## 977: 3.42% 4.56% 2.52%
## 978: 3.41% 4.56% 2.50%
## 979: 3.41% 4.56% 2.50%
## 980: 3.42% 4.56% 2.52%
## 981: 3.42% 4.56% 2.52%
## 982: 3.44% 4.56% 2.55%
## 983: 3.44% 4.56% 2.55%
## 984: 3.46% 4.56% 2.59%
## 985: 3.46% 4.56% 2.59%
## 986: 3.44% 4.56% 2.55%
## 987: 3.46% 4.56% 2.59%
## 988: 3.45% 4.56% 2.57%
## 989: 3.44% 4.56% 2.55%
## 990: 3.44% 4.56% 2.55%
## 991: 3.45% 4.56% 2.57%
## 992: 3.44% 4.56% 2.55%
## 993: 3.45% 4.56% 2.57%
## 994: 3.46% 4.56% 2.59%
## 995: 3.44% 4.56% 2.55%
## 996: 3.46% 4.59% 2.57%
## 997: 3.44% 4.56% 2.55%
## 998: 3.44% 4.56% 2.55%
## 999: 3.46% 4.59% 2.57%
## 1000: 3.46% 4.59% 2.57%
varImpPlot(tree3.train)
plot(tree3.train)
Let us now apply our model, which reduces forecast variance by averaging the predictions of trees grown on bootstrapped subsets, to the test dataset.
tree3.predict = predict(tree3.train,newdata=testset[,-31],type="class")
c3<-confusionMatrix(factor(tree3.predict),factor(testset$Class))
c3
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 1401 37
## 1 79 1799
##
## Accuracy : 0.965
## 95% CI : (0.9582, 0.971)
## No Information Rate : 0.5537
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.929
##
## Mcnemar's Test P-Value : 0.0001408
##
## Sensitivity : 0.9466
## Specificity : 0.9798
## Pos Pred Value : 0.9743
## Neg Pred Value : 0.9579
## Prevalence : 0.4463
## Detection Rate : 0.4225
## Detection Prevalence : 0.4337
## Balanced Accuracy : 0.9632
##
## 'Positive' Class : 0
##
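As a quick sanity check (not part of the original analysis), the headline statistics above can be recomputed by hand from the four cells of the confusion matrix, remembering that caret treats class 0 as the positive class here:

```python
# Cross-check of caret's statistics from the four cell counts (positive class = 0).
pred0_ref0, pred0_ref1 = 1401, 37   # row "Prediction 0"
pred1_ref0, pred1_ref1 = 79, 1799   # row "Prediction 1"

total = pred0_ref0 + pred0_ref1 + pred1_ref0 + pred1_ref1
accuracy    = (pred0_ref0 + pred1_ref1) / total          # correct / all
sensitivity = pred0_ref0 / (pred0_ref0 + pred1_ref0)     # recall of class 0
specificity = pred1_ref1 / (pred0_ref1 + pred1_ref1)     # recall of class 1
ppv         = pred0_ref0 / (pred0_ref0 + pred0_ref1)     # precision for class 0

print(round(accuracy, 4), round(sensitivity, 4),
      round(specificity, 4), round(ppv, 4))
# matches the 0.965 / 0.9466 / 0.9798 / 0.9743 figures reported by caret
```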
We obtain here an accuracy score of 0.965, which is much better than the previous models! Let us compare it with one final tree-based model.
Finally, the last tree-based model is the boosted tree, also known as XGBoost. XGBoost is a well-known and efficient open-source implementation of the gradient-boosted tree algorithm.
Gradient boosting is a supervised learning algorithm that attempts to accurately predict a target variable by combining the estimates of a set of simpler, weaker models. XGBoost minimizes a regularized objective function (with L1 and L2 penalties) that combines a convex loss function (based on the difference between the predicted and target outputs) and a penalty term for model complexity (in other words, for the regression tree functions).
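As a sketch, the regularized objective just described can be written in the standard XGBoost notation (assuming $K$ additive regression trees $f_k$):

$$\mathcal{L} = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k=1}^{K} \Omega(f_k), \qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^2$$

where $l$ is the convex loss, $T$ is the number of leaves of a tree, $w$ its vector of leaf weights, and $\gamma$, $\lambda$ control the complexity penalty.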
Training proceeds iteratively, adding new trees that predict the residuals (errors) of the previous trees; these new trees are then combined with the previous ones to make the final prediction.
In other words, we build a tree, look at which observations are predicted poorly, and give them a higher weight in the next iteration.
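To make the residual-fitting idea concrete, here is a minimal, hypothetical sketch (in Python, with decision stumps and squared loss; this illustrates the principle only, not the actual xgboost internals):

```python
import numpy as np

# Illustrative only: gradient boosting with depth-1 trees (stumps) on squared
# loss. Each round fits a stump to the residuals of the current ensemble and
# adds it with a small learning rate (shrinkage), mirroring eta in xgboost.

def fit_stump(x, r):
    """Best single-threshold split on x for predicting residuals r."""
    best_err, best = np.inf, None
    for t in np.unique(x)[:-1]:          # last value would leave the right side empty
        left, right = r[x <= t], r[x > t]
        lm, rm = left.mean(), right.mean()
        err = ((left - lm) ** 2).sum() + ((right - rm) ** 2).sum()
        if err < best_err:
            best_err, best = err, (t, lm, rm)
    return best

def boost(x, y, n_rounds=200, eta=0.1):
    pred = np.full(len(y), y.mean())     # start from the global mean
    for _ in range(n_rounds):
        residuals = y - pred             # negative gradient of squared loss
        t, lm, rm = fit_stump(x, residuals)
        pred += eta * np.where(x <= t, lm, rm)
    return pred

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.0, 1.0, 1.0, 3.0, 3.0, 3.0])
pred = boost(x, y)
print(np.round(pred, 3))                 # close to y after enough rounds
```

With each round the ensemble closes a fraction `eta` of the remaining gap to the targets, which is why a small learning rate requires more boosting iterations but generalizes better.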
Let us build our model with a maximum of 1000 boosting iterations.
tree4.train = xgboost::xgboost(
  data = data.matrix(trainingset[,-31]),
  label = as.numeric(as.character(trainingset$Class)),
  nrounds = 1000,
  params = list(booster = "gbtree", eta = 0.10, max_depth = 3,
                objective = "binary:logistic",
                subsample = 0.50, colsample_bytree = 0.50)
)
## [1] train-error:0.136211
## [2] train-error:0.103257
## ... (iterations 3-558 omitted; train-error falls steadily)
## [559] train-error:0.029853
## [560] train-error:0.029853
## [561] train-error:0.029853
## [562] train-error:0.029853
## [563] train-error:0.029853
## [564] train-error:0.029853
## [565] train-error:0.029723
## [566] train-error:0.029982
## [567] train-error:0.029594
## [568] train-error:0.029465
## [569] train-error:0.029336
## [570] train-error:0.029336
## [571] train-error:0.029077
## [572] train-error:0.029207
## [573] train-error:0.028690
## [574] train-error:0.028948
## [575] train-error:0.028948
## [576] train-error:0.028819
## [577] train-error:0.028560
## [578] train-error:0.028431
## [579] train-error:0.028302
## [580] train-error:0.028302
## [581] train-error:0.028690
## [582] train-error:0.028560
## [583] train-error:0.028690
## [584] train-error:0.028302
## [585] train-error:0.028173
## [586] train-error:0.028560
## [587] train-error:0.028173
## [588] train-error:0.028043
## [589] train-error:0.028043
## [590] train-error:0.028043
## [591] train-error:0.028302
## [592] train-error:0.028173
## [593] train-error:0.028431
## [594] train-error:0.028431
## [595] train-error:0.028173
## [596] train-error:0.028173
## [597] train-error:0.028043
## [598] train-error:0.028173
## [599] train-error:0.028173
## [600] train-error:0.028173
## [601] train-error:0.028173
## [602] train-error:0.028302
## [603] train-error:0.028431
## [604] train-error:0.027914
## [605] train-error:0.027914
## [606] train-error:0.027785
## [607] train-error:0.027785
## [608] train-error:0.027785
## [609] train-error:0.027785
## [610] train-error:0.027526
## [611] train-error:0.028173
## [612] train-error:0.028173
## [613] train-error:0.028302
## [614] train-error:0.027914
## [615] train-error:0.028173
## [616] train-error:0.027656
## [617] train-error:0.027526
## [618] train-error:0.028173
## [619] train-error:0.028173
## [620] train-error:0.027785
## [621] train-error:0.028043
## [622] train-error:0.027914
## [623] train-error:0.027914
## [624] train-error:0.027914
## [625] train-error:0.027656
## [626] train-error:0.027139
## [627] train-error:0.027139
## [628] train-error:0.026622
## [629] train-error:0.027010
## [630] train-error:0.026493
## [631] train-error:0.026493
## [632] train-error:0.026880
## [633] train-error:0.027010
## [634] train-error:0.026751
## [635] train-error:0.026363
## [636] train-error:0.026493
## [637] train-error:0.026622
## [638] train-error:0.026751
## [639] train-error:0.026880
## [640] train-error:0.027139
## [641] train-error:0.027139
## [642] train-error:0.026622
## [643] train-error:0.026622
## [644] train-error:0.026622
## [645] train-error:0.026622
## [646] train-error:0.026622
## [647] train-error:0.026622
## [648] train-error:0.026880
## [649] train-error:0.027397
## [650] train-error:0.027010
## [651] train-error:0.026880
## [652] train-error:0.026880
## [653] train-error:0.026622
## [654] train-error:0.026622
## [655] train-error:0.026622
## [656] train-error:0.026493
## [657] train-error:0.026493
## [658] train-error:0.026363
## [659] train-error:0.026363
## [660] train-error:0.026363
## [661] train-error:0.027139
## [662] train-error:0.026880
## [663] train-error:0.027139
## [664] train-error:0.026880
## [665] train-error:0.026363
## [666] train-error:0.026622
## [667] train-error:0.026622
## [668] train-error:0.026622
## [669] train-error:0.027010
## [670] train-error:0.026751
## [671] train-error:0.026880
## [672] train-error:0.026234
## [673] train-error:0.025976
## [674] train-error:0.026751
## [675] train-error:0.026751
## [676] train-error:0.025976
## [677] train-error:0.026105
## [678] train-error:0.026363
## [679] train-error:0.026751
## [680] train-error:0.026751
## [681] train-error:0.026493
## [682] train-error:0.026880
## [683] train-error:0.026622
## [684] train-error:0.026880
## [685] train-error:0.026751
## [686] train-error:0.026622
## [687] train-error:0.026493
## [688] train-error:0.026622
## [689] train-error:0.026622
## [690] train-error:0.026234
## [691] train-error:0.025588
## [692] train-error:0.025588
## [693] train-error:0.025717
## [694] train-error:0.025846
## [695] train-error:0.025717
## [696] train-error:0.025976
## [697] train-error:0.026105
## [698] train-error:0.026234
## [699] train-error:0.026234
## [700] train-error:0.026363
## [701] train-error:0.025976
## [702] train-error:0.026234
## [703] train-error:0.026105
## [704] train-error:0.026234
## [705] train-error:0.026234
## [706] train-error:0.025846
## [707] train-error:0.026622
## [708] train-error:0.026363
## [709] train-error:0.026234
## [710] train-error:0.025846
## [711] train-error:0.025846
## [712] train-error:0.025846
## [713] train-error:0.026105
## [714] train-error:0.026105
## [715] train-error:0.026105
## [716] train-error:0.025976
## [717] train-error:0.025976
## [718] train-error:0.025846
## [719] train-error:0.026105
## [720] train-error:0.025717
## [721] train-error:0.026105
## [722] train-error:0.025717
## [723] train-error:0.025976
## [724] train-error:0.025846
## [725] train-error:0.025846
## [726] train-error:0.025846
## [727] train-error:0.025976
## [728] train-error:0.025976
## [729] train-error:0.025846
## [730] train-error:0.025717
## [731] train-error:0.025717
## [732] train-error:0.025717
## [733] train-error:0.025717
## [734] train-error:0.025717
## [735] train-error:0.025717
## [736] train-error:0.025717
## [737] train-error:0.025588
## [738] train-error:0.025459
## [739] train-error:0.025459
## [740] train-error:0.025330
## [741] train-error:0.025071
## [742] train-error:0.024813
## [743] train-error:0.024813
## [744] train-error:0.024813
## [745] train-error:0.025071
## [746] train-error:0.025200
## [747] train-error:0.025200
## [748] train-error:0.025200
## [749] train-error:0.025071
## [750] train-error:0.025200
## [751] train-error:0.025071
## [752] train-error:0.024942
## [753] train-error:0.024942
## [754] train-error:0.024942
## [755] train-error:0.024942
## [756] train-error:0.024942
## [757] train-error:0.024683
## [758] train-error:0.024942
## [759] train-error:0.025071
## [760] train-error:0.024425
## [761] train-error:0.024296
## [762] train-error:0.024166
## [763] train-error:0.024813
## [764] train-error:0.024813
## [765] train-error:0.024813
## [766] train-error:0.025200
## [767] train-error:0.025200
## [768] train-error:0.024942
## [769] train-error:0.025071
## [770] train-error:0.025071
## [771] train-error:0.024813
## [772] train-error:0.024425
## [773] train-error:0.024425
## [774] train-error:0.024166
## [775] train-error:0.024296
## [776] train-error:0.024296
## [777] train-error:0.024813
## [778] train-error:0.024942
## [779] train-error:0.024942
## [780] train-error:0.025200
## [781] train-error:0.024683
## [782] train-error:0.025588
## [783] train-error:0.025330
## [784] train-error:0.025717
## [785] train-error:0.025459
## [786] train-error:0.025846
## [787] train-error:0.025459
## [788] train-error:0.025330
## [789] train-error:0.025330
## [790] train-error:0.024683
## [791] train-error:0.024942
## [792] train-error:0.024813
## [793] train-error:0.024683
## [794] train-error:0.024683
## [795] train-error:0.024425
## [796] train-error:0.024683
## [797] train-error:0.024683
## [798] train-error:0.024683
## [799] train-error:0.024683
## [800] train-error:0.024813
## [801] train-error:0.024942
## [802] train-error:0.024942
## [803] train-error:0.025071
## [804] train-error:0.024942
## [805] train-error:0.024942
## [806] train-error:0.024942
## [807] train-error:0.024554
## [808] train-error:0.024683
## [809] train-error:0.024683
## [810] train-error:0.024942
## [811] train-error:0.025071
## [812] train-error:0.024942
## [813] train-error:0.025071
## [814] train-error:0.025071
## [815] train-error:0.025071
## [816] train-error:0.024942
## [817] train-error:0.025200
## [818] train-error:0.025717
## [819] train-error:0.025459
## [820] train-error:0.025588
## [821] train-error:0.025588
## [822] train-error:0.025330
## [823] train-error:0.025071
## [824] train-error:0.025200
## [825] train-error:0.025459
## [826] train-error:0.025459
## [827] train-error:0.025200
## [828] train-error:0.025071
## [829] train-error:0.025200
## [830] train-error:0.024425
## [831] train-error:0.024166
## [832] train-error:0.024166
## [833] train-error:0.024166
## [834] train-error:0.024166
## [835] train-error:0.024296
## [836] train-error:0.024166
## [837] train-error:0.024166
## [838] train-error:0.024425
## [839] train-error:0.024554
## [840] train-error:0.024425
## [841] train-error:0.024037
## [842] train-error:0.024296
## [843] train-error:0.024296
## [844] train-error:0.024037
## [845] train-error:0.024037
## [846] train-error:0.023908
## [847] train-error:0.024037
## [848] train-error:0.024037
## [849] train-error:0.023908
## [850] train-error:0.024296
## [851] train-error:0.024554
## [852] train-error:0.024296
## [853] train-error:0.024166
## [854] train-error:0.024166
## [855] train-error:0.023908
## [856] train-error:0.023908
## [857] train-error:0.023520
## [858] train-error:0.023520
## [859] train-error:0.023908
## [860] train-error:0.023779
## [861] train-error:0.024037
## [862] train-error:0.023908
## [863] train-error:0.024166
## [864] train-error:0.024554
## [865] train-error:0.024296
## [866] train-error:0.024166
## [867] train-error:0.023650
## [868] train-error:0.023779
## [869] train-error:0.023262
## [870] train-error:0.023391
## [871] train-error:0.023520
## [872] train-error:0.023520
## [873] train-error:0.023650
## [874] train-error:0.023650
## [875] train-error:0.023779
## [876] train-error:0.023520
## [877] train-error:0.023650
## [878] train-error:0.023262
## [879] train-error:0.023520
## [880] train-error:0.023391
## [881] train-error:0.023391
## [882] train-error:0.023520
## [883] train-error:0.023262
## [884] train-error:0.023133
## [885] train-error:0.023779
## [886] train-error:0.024037
## [887] train-error:0.023520
## [888] train-error:0.023391
## [889] train-error:0.023391
## [890] train-error:0.023520
## [891] train-error:0.023779
## [892] train-error:0.023779
## [893] train-error:0.023650
## [894] train-error:0.023650
## [895] train-error:0.023908
## [896] train-error:0.023650
## [897] train-error:0.023779
## [898] train-error:0.023650
## [899] train-error:0.023779
## [900] train-error:0.024037
## [901] train-error:0.023779
## [902] train-error:0.023650
## [903] train-error:0.023650
## [904] train-error:0.023908
## [905] train-error:0.023650
## [906] train-error:0.023650
## [907] train-error:0.023650
## [908] train-error:0.023779
## [909] train-error:0.023779
## [910] train-error:0.023779
## [911] train-error:0.024166
## [912] train-error:0.023908
## [913] train-error:0.023779
## [914] train-error:0.023779
## [915] train-error:0.024554
## [916] train-error:0.024554
## [917] train-error:0.024296
## [918] train-error:0.024166
## [919] train-error:0.023908
## [920] train-error:0.024166
## [921] train-error:0.024296
## [922] train-error:0.024425
## [923] train-error:0.023908
## [924] train-error:0.023908
## [925] train-error:0.023520
## [926] train-error:0.023391
## [927] train-error:0.023133
## [928] train-error:0.023520
## [929] train-error:0.023908
## [930] train-error:0.024166
## [931] train-error:0.023650
## [932] train-error:0.023779
## [933] train-error:0.024166
## [934] train-error:0.023262
## [935] train-error:0.023003
## [936] train-error:0.023650
## [937] train-error:0.023391
## [938] train-error:0.023391
## [939] train-error:0.023520
## [940] train-error:0.023520
## [941] train-error:0.023650
## [942] train-error:0.023908
## [943] train-error:0.023520
## [944] train-error:0.023520
## [945] train-error:0.023908
## [946] train-error:0.023908
## [947] train-error:0.024166
## [948] train-error:0.024037
## [949] train-error:0.024037
## [950] train-error:0.024166
## [951] train-error:0.024166
## [952] train-error:0.024166
## [953] train-error:0.023779
## [954] train-error:0.023908
## [955] train-error:0.023779
## [956] train-error:0.023262
## [957] train-error:0.023520
## [958] train-error:0.023520
## [959] train-error:0.023520
## [960] train-error:0.023520
## [961] train-error:0.023391
## [962] train-error:0.023391
## [963] train-error:0.023650
## [964] train-error:0.023391
## [965] train-error:0.023391
## [966] train-error:0.023391
## [967] train-error:0.023391
## [968] train-error:0.023391
## [969] train-error:0.023262
## [970] train-error:0.023133
## [971] train-error:0.022874
## [972] train-error:0.023003
## [973] train-error:0.023262
## [974] train-error:0.023003
## [975] train-error:0.023003
## [976] train-error:0.023003
## [977] train-error:0.023003
## [978] train-error:0.023133
## [979] train-error:0.023133
## [980] train-error:0.023262
## [981] train-error:0.023133
## [982] train-error:0.023262
## [983] train-error:0.023520
## [984] train-error:0.023262
## [985] train-error:0.022874
## [986] train-error:0.022745
## [987] train-error:0.022745
## [988] train-error:0.023133
## [989] train-error:0.022874
## [990] train-error:0.022874
## [991] train-error:0.022874
## [992] train-error:0.023133
## [993] train-error:0.023133
## [994] train-error:0.022616
## [995] train-error:0.023003
## [996] train-error:0.023003
## [997] train-error:0.023133
## [998] train-error:0.023133
## [999] train-error:0.023133
## [1000] train-error:0.023133
tree4.predict <- predict(tree4.train, newdata=data.matrix(trainingset[,-31]))
tree4.predict <- round(tree4.predict, 0)  # turn predicted probabilities into 0/1 labels
c4 <- confusionMatrix(factor(tree4.predict), factor(trainingset$Class))  # evaluated on the training set
c4
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 3316 77
## 1 102 4243
##
## Accuracy : 0.9769
## 95% CI : (0.9733, 0.9801)
## No Information Rate : 0.5583
## P-Value [Acc > NIR] : < 2e-16
##
## Kappa : 0.9531
##
## Mcnemar's Test P-Value : 0.07284
##
## Sensitivity : 0.9702
## Specificity : 0.9822
## Pos Pred Value : 0.9773
## Neg Pred Value : 0.9765
## Prevalence : 0.4417
## Detection Rate : 0.4285
## Detection Prevalence : 0.4385
## Balanced Accuracy : 0.9762
##
## 'Positive' Class : 0
##
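As a sanity check, the accuracy reported above can be recomputed directly from the confusion-matrix counts: correctly classified observations divided by the total.

```r
# Accuracy from the confusion-matrix counts above:
# (true 0s + true 1s) / total observations
(3316 + 4243) / (3316 + 77 + 102 + 4243)
## [1] 0.9768674
```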
Our latest model has the highest accuracy score so far, above 0.97. Let’s see if other model categories can do better!
First, let’s recall that a logistic regression model uses a linear combination of the predictors
\[ \eta({\bf x}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_{p - 1} x_{p - 1} \] As with ordinary linear regression, we will fit a glm model and perform the usual hypothesis tests, such as the Wald test with its p-value:
\[ H_0: \beta_j = 0 \quad \text{vs} \quad H_1: \beta_j \neq 0 \]
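For the binomial(logit) family used below, the linear predictor is mapped to a probability through the logistic function:
\[ p({\bf x}) = \frac{e^{\eta({\bf x})}}{1 + e^{\eta({\bf x})}} \]
so each coefficient \(\beta_j\) acts on the log-odds scale.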
GLM <- glm(Class ~.,family=binomial(logit), data=trainingset)
summary(GLM)
##
## Call:
## glm(formula = Class ~ ., family = binomial(logit), data = trainingset)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2301 -0.0445 0.0000 0.1548 3.1477
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.46835 0.66694 -9.699 < 2e-16 ***
## HavingIP1 1.82176 0.17400 10.470 < 2e-16 ***
## LongURL0 -0.61566 0.56618 -1.087 0.276865
## LongURL-1 -0.17935 0.21185 -0.847 0.397223
## ShortURL-1 1.35503 0.37405 3.623 0.000292 ***
## Symbol-1 -0.37081 0.23079 -1.607 0.108124
## ddRedirecting1 0.40411 0.46408 0.871 0.383871
## PrefixSuffix1 18.19612 259.35659 0.070 0.944067
## SubDomain0 -0.01111 0.14386 -0.077 0.938438
## SubDomain1 1.46232 0.14940 9.788 < 2e-16 ***
## HTTPS1 3.31983 0.13442 24.698 < 2e-16 ***
## HTTPS0 -2.14136 0.38437 -5.571 2.53e-08 ***
## DomainRegLen1 -0.54259 0.15414 -3.520 0.000431 ***
## Favicon-1 0.74898 0.49198 1.522 0.127919
## Port-1 -0.56953 0.44727 -1.273 0.202888
## HTTPsToken1 -1.23516 0.31518 -3.919 8.89e-05 ***
## RequestURL-1 -0.31667 0.14517 -2.181 0.029157 *
## AnchorURL0 5.08621 0.32591 15.606 < 2e-16 ***
## AnchorURL1 7.02454 0.37321 18.822 < 2e-16 ***
## LinksInTag-1 -1.18591 0.16409 -7.227 4.94e-13 ***
## LinksInTag0 0.33349 0.17034 1.958 0.050255 .
## SFH1 1.10353 0.20767 5.314 1.07e-07 ***
## SFH0 1.40274 0.25842 5.428 5.70e-08 ***
## SubEmail1 0.03552 0.26835 0.132 0.894694
## AbnormalURL1 -0.65947 0.33995 -1.940 0.052394 .
## Redirect1 -1.04356 0.24849 -4.200 2.67e-05 ***
## OnMouseover-1 -0.39531 0.37075 -1.066 0.286305
## RightClick-1 -0.46711 0.47480 -0.984 0.325209
## PopUp-1 0.32236 0.47623 0.677 0.498463
## Iframe-1 0.73470 0.43783 1.678 0.093336 .
## AgeOfDomain1 -0.14691 0.12718 -1.155 0.248044
## DNSRecord1 1.61293 0.17201 9.377 < 2e-16 ***
## WebTraffic0 -1.73844 0.19543 -8.896 < 2e-16 ***
## WebTraffic1 0.60989 0.17094 3.568 0.000360 ***
## PageRank1 0.11900 0.14563 0.817 0.413842
## GoogleIndex-1 -1.27195 0.16162 -7.870 3.55e-15 ***
## LinkToPage0 -1.81833 0.16890 -10.766 < 2e-16 ***
## LinkToPage-1 -1.43320 0.28010 -5.117 3.11e-07 ***
## StatsReport1 0.64961 0.23788 2.731 0.006316 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 10621.8 on 7737 degrees of freedom
## Residual deviance: 2119.7 on 7699 degrees of freedom
## AIC: 2197.7
##
## Number of Fisher Scoring iterations: 18
1 - (GLM$deviance/GLM$null.deviance)
## [1] 0.800435
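This quantity is McFadden’s pseudo-\(R^2\), which compares the fitted model’s deviance with that of the intercept-only (null) model:
\[ R^2_{\text{McFadden}} = 1 - \frac{D_{\text{residual}}}{D_{\text{null}}} = 1 - \frac{2119.7}{10621.8} \approx 0.80 \]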
The output begins by reporting the distribution of the deviance residuals.
The “Coefficients” table presents the results of our regression analysis. We are particularly interested in columns 1, 2 and 5: the variable name, its regression coefficient, and whether that coefficient is significantly different from zero.
Instead of removing by hand the variables whose p-values are above the 0.05 threshold, we use stepwise variable selection to find the set of variables that yields the minimum AIC, and compare it with the variables highlighted by the decision tree.
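Recall that for a fitted model with \(k\) estimated parameters and maximized likelihood \(\hat{L}\),
\[ \mathrm{AIC} = -2 \ln \hat{L} + 2k \]
Note that in R’s step() function, the argument k sets the penalty per parameter: the default k = 2 corresponds to the classical AIC, so the call below with k = 1 applies a lighter penalty (the printed column is still labelled “AIC”).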
WES.STEP <- step(GLM, direction="both", k=1)
## Start: AIC=2158.73
## Class ~ HavingIP + LongURL + ShortURL + Symbol + ddRedirecting +
## PrefixSuffix + SubDomain + HTTPS + DomainRegLen + Favicon +
## Port + HTTPsToken + RequestURL + AnchorURL + LinksInTag +
## SFH + SubEmail + AbnormalURL + Redirect + OnMouseover + RightClick +
## PopUp + Iframe + AgeOfDomain + DNSRecord + WebTraffic + PageRank +
## GoogleIndex + LinkToPage + StatsReport
##
## Df Deviance AIC
## - SubEmail 1 2119.8 2157.8
## - PopUp 1 2120.2 2158.2
## - LongURL 2 2121.3 2158.3
## - PageRank 1 2120.4 2158.4
## - ddRedirecting 1 2120.5 2158.5
## - RightClick 1 2120.7 2158.7
## <none> 2119.7 2158.7
## - OnMouseover 1 2120.9 2158.9
## - AgeOfDomain 1 2121.1 2159.1
## - Port 1 2121.4 2159.4
## - Favicon 1 2122.0 2160.0
## - Symbol 1 2122.3 2160.3
## - Iframe 1 2122.6 2160.6
## - AbnormalURL 1 2123.6 2161.6
## - RequestURL 1 2124.5 2162.5
## - StatsReport 1 2127.2 2165.2
## - DomainRegLen 1 2132.2 2170.2
## - ShortURL 1 2134.3 2172.3
## - HTTPsToken 1 2135.8 2173.8
## - Redirect 1 2137.8 2175.8
## - SFH 2 2175.3 2212.3
## - GoogleIndex 1 2182.8 2220.8
## - DNSRecord 1 2212.1 2250.1
## - HavingIP 1 2237.1 2275.1
## - LinkToPage 2 2246.8 2283.8
## - SubDomain 2 2251.5 2288.5
## - LinksInTag 2 2269.6 2306.6
## - PrefixSuffix 1 2310.6 2348.6
## - WebTraffic 2 2411.2 2448.2
## - AnchorURL 2 3035.4 3072.4
## - HTTPS 2 3113.7 3150.7
##
## Step: AIC=2157.75
## Class ~ HavingIP + LongURL + ShortURL + Symbol + ddRedirecting +
## PrefixSuffix + SubDomain + HTTPS + DomainRegLen + Favicon +
## Port + HTTPsToken + RequestURL + AnchorURL + LinksInTag +
## SFH + AbnormalURL + Redirect + OnMouseover + RightClick +
## PopUp + Iframe + AgeOfDomain + DNSRecord + WebTraffic + PageRank +
## GoogleIndex + LinkToPage + StatsReport
##
## Df Deviance AIC
## - PopUp 1 2120.2 2157.2
## - LongURL 2 2121.3 2157.3
## - PageRank 1 2120.4 2157.4
## - ddRedirecting 1 2120.5 2157.5
## - RightClick 1 2120.7 2157.7
## <none> 2119.8 2157.8
## - OnMouseover 1 2120.9 2157.9
## - AgeOfDomain 1 2121.1 2158.1
## + SubEmail 1 2119.7 2158.7
## - Favicon 1 2122.0 2159.0
## - Port 1 2122.1 2159.1
## - Symbol 1 2122.3 2159.3
## - Iframe 1 2122.6 2159.6
## - AbnormalURL 1 2123.6 2160.6
## - RequestURL 1 2124.5 2161.5
## - StatsReport 1 2127.2 2164.2
## - DomainRegLen 1 2132.2 2169.2
## - ShortURL 1 2134.3 2171.3
## - HTTPsToken 1 2135.8 2172.8
## - Redirect 1 2137.9 2174.9
## - SFH 2 2175.3 2211.3
## - GoogleIndex 1 2182.9 2219.9
## - DNSRecord 1 2212.2 2249.2
## - HavingIP 1 2237.3 2274.3
## - LinkToPage 2 2247.0 2283.0
## - SubDomain 2 2251.9 2287.9
## - LinksInTag 2 2271.3 2307.3
## - PrefixSuffix 1 2310.7 2347.7
## - WebTraffic 2 2411.3 2447.3
## - AnchorURL 2 3035.8 3071.8
## - HTTPS 2 3114.0 3150.0
##
## Step: AIC=2157.22
## Class ~ HavingIP + LongURL + ShortURL + Symbol + ddRedirecting +
## PrefixSuffix + SubDomain + HTTPS + DomainRegLen + Favicon +
## Port + HTTPsToken + RequestURL + AnchorURL + LinksInTag +
## SFH + AbnormalURL + Redirect + OnMouseover + RightClick +
## Iframe + AgeOfDomain + DNSRecord + WebTraffic + PageRank +
## GoogleIndex + LinkToPage + StatsReport
##
## Df Deviance AIC
## - LongURL 2 2121.7 2156.7
## - PageRank 1 2120.9 2156.9
## - OnMouseover 1 2121.0 2157.0
## - ddRedirecting 1 2121.1 2157.1
## - RightClick 1 2121.1 2157.1
## <none> 2120.2 2157.2
## - AgeOfDomain 1 2121.6 2157.6
## + PopUp 1 2119.8 2157.8
## + SubEmail 1 2120.2 2158.2
## - Symbol 1 2122.8 2158.8
## - Port 1 2122.9 2158.9
## - Iframe 1 2123.2 2159.2
## - AbnormalURL 1 2124.5 2160.5
## - RequestURL 1 2125.0 2161.0
## - StatsReport 1 2127.8 2163.8
## - Favicon 1 2131.5 2167.5
## - DomainRegLen 1 2132.8 2168.8
## - ShortURL 1 2134.9 2170.9
## - HTTPsToken 1 2136.0 2172.0
## - Redirect 1 2138.1 2174.1
## - SFH 2 2175.8 2210.8
## - GoogleIndex 1 2183.1 2219.1
## - DNSRecord 1 2212.8 2248.8
## - HavingIP 1 2237.8 2273.8
## - LinkToPage 2 2248.4 2283.4
## - SubDomain 2 2253.1 2288.1
## - LinksInTag 2 2276.2 2311.2
## - PrefixSuffix 1 2311.1 2347.1
## - WebTraffic 2 2411.3 2446.3
## - AnchorURL 2 3036.3 3071.3
## - HTTPS 2 3114.0 3149.0
##
## Step: AIC=2156.73
## Class ~ HavingIP + ShortURL + Symbol + ddRedirecting + PrefixSuffix +
## SubDomain + HTTPS + DomainRegLen + Favicon + Port + HTTPsToken +
## RequestURL + AnchorURL + LinksInTag + SFH + AbnormalURL +
## Redirect + OnMouseover + RightClick + Iframe + AgeOfDomain +
## DNSRecord + WebTraffic + PageRank + GoogleIndex + LinkToPage +
## StatsReport
##
## Df Deviance AIC
## - ddRedirecting 1 2122.2 2156.2
## - OnMouseover 1 2122.4 2156.4
## - RightClick 1 2122.6 2156.6
## - AgeOfDomain 1 2122.6 2156.6
## <none> 2121.7 2156.7
## - PageRank 1 2122.9 2156.9
## + LongURL 2 2120.2 2157.2
## + PopUp 1 2121.3 2157.3
## + SubEmail 1 2121.7 2157.7
## - Symbol 1 2124.2 2158.2
## - Iframe 1 2124.7 2158.7
## - Port 1 2124.7 2158.7
## - AbnormalURL 1 2125.6 2159.6
## - RequestURL 1 2127.2 2161.2
## - StatsReport 1 2129.4 2163.4
## - Favicon 1 2133.0 2167.0
## - DomainRegLen 1 2134.7 2168.7
## - ShortURL 1 2135.9 2169.9
## - HTTPsToken 1 2137.1 2171.1
## - Redirect 1 2139.6 2173.6
## - GoogleIndex 1 2185.5 2219.5
## - SFH 2 2188.9 2221.9
## - DNSRecord 1 2214.2 2248.2
## - HavingIP 1 2241.3 2275.3
## - LinkToPage 2 2251.2 2284.2
## - SubDomain 2 2253.8 2286.8
## - LinksInTag 2 2279.4 2312.4
## - PrefixSuffix 1 2316.2 2350.2
## - WebTraffic 2 2411.3 2444.3
## - AnchorURL 2 3045.8 3078.8
## - HTTPS 2 3119.6 3152.6
##
## Step: AIC=2156.2
## Class ~ HavingIP + ShortURL + Symbol + PrefixSuffix + SubDomain +
## HTTPS + DomainRegLen + Favicon + Port + HTTPsToken + RequestURL +
## AnchorURL + LinksInTag + SFH + AbnormalURL + Redirect + OnMouseover +
## RightClick + Iframe + AgeOfDomain + DNSRecord + WebTraffic +
## PageRank + GoogleIndex + LinkToPage + StatsReport
##
## Df Deviance AIC
## - OnMouseover 1 2122.8 2155.8
## - RightClick 1 2123.1 2156.1
## - AgeOfDomain 1 2123.1 2156.1
## <none> 2122.2 2156.2
## - PageRank 1 2123.4 2156.4
## + PopUp 1 2121.7 2156.7
## + ddRedirecting 1 2121.7 2156.7
## + LongURL 2 2121.1 2157.1
## + SubEmail 1 2122.2 2157.2
## - Symbol 1 2124.7 2157.7
## - Iframe 1 2125.1 2158.1
## - Port 1 2125.2 2158.2
## - AbnormalURL 1 2125.9 2158.9
## - RequestURL 1 2127.9 2160.9
## - StatsReport 1 2129.7 2162.7
## - Favicon 1 2133.4 2166.4
## - DomainRegLen 1 2135.0 2168.0
## - ShortURL 1 2136.8 2169.8
## - HTTPsToken 1 2137.3 2170.3
## - Redirect 1 2145.0 2178.0
## - SFH 2 2189.4 2221.4
## - GoogleIndex 1 2189.1 2222.1
## - DNSRecord 1 2220.1 2253.1
## - HavingIP 1 2246.8 2279.8
## - LinkToPage 2 2253.4 2285.4
## - SubDomain 2 2253.8 2285.8
## - LinksInTag 2 2279.9 2311.9
## - PrefixSuffix 1 2316.4 2349.4
## - WebTraffic 2 2411.6 2443.6
## - AnchorURL 2 3053.1 3085.1
## - HTTPS 2 3120.6 3152.6
##
## Step: AIC=2155.75
## Class ~ HavingIP + ShortURL + Symbol + PrefixSuffix + SubDomain +
## HTTPS + DomainRegLen + Favicon + Port + HTTPsToken + RequestURL +
## AnchorURL + LinksInTag + SFH + AbnormalURL + Redirect + RightClick +
## Iframe + AgeOfDomain + DNSRecord + WebTraffic + PageRank +
## GoogleIndex + LinkToPage + StatsReport
##
## Df Deviance AIC
## - RightClick 1 2123.5 2155.5
## - AgeOfDomain 1 2123.6 2155.6
## <none> 2122.8 2155.8
## - PageRank 1 2124.0 2156.0
## + OnMouseover 1 2122.2 2156.2
## + ddRedirecting 1 2122.4 2156.4
## + PopUp 1 2122.6 2156.6
## + SubEmail 1 2122.7 2156.7
## + LongURL 2 2121.7 2156.7
## - Iframe 1 2125.1 2157.1
## - Symbol 1 2125.4 2157.4
## - Port 1 2126.0 2158.0
## - AbnormalURL 1 2126.0 2158.0
## - RequestURL 1 2128.6 2160.6
## - StatsReport 1 2130.5 2162.5
## - Favicon 1 2133.8 2165.8
## - DomainRegLen 1 2135.4 2167.4
## - ShortURL 1 2137.4 2169.4
## - HTTPsToken 1 2138.4 2170.4
## - Redirect 1 2145.2 2177.2
## - SFH 2 2190.1 2221.1
## - GoogleIndex 1 2189.7 2221.7
## - DNSRecord 1 2221.4 2253.4
## - HavingIP 1 2248.5 2280.5
## - SubDomain 2 2253.8 2284.8
## - LinkToPage 2 2256.5 2287.5
## - LinksInTag 2 2280.6 2311.6
## - PrefixSuffix 1 2317.6 2349.6
## - WebTraffic 2 2412.1 2443.1
## - AnchorURL 2 3054.0 3085.0
## - HTTPS 2 3124.7 3155.7
##
## Step: AIC=2155.52
## Class ~ HavingIP + ShortURL + Symbol + PrefixSuffix + SubDomain +
## HTTPS + DomainRegLen + Favicon + Port + HTTPsToken + RequestURL +
## AnchorURL + LinksInTag + SFH + AbnormalURL + Redirect + Iframe +
## AgeOfDomain + DNSRecord + WebTraffic + PageRank + GoogleIndex +
## LinkToPage + StatsReport
##
## Df Deviance AIC
## - AgeOfDomain 1 2124.3 2155.3
## <none> 2123.5 2155.5
## + RightClick 1 2122.8 2155.8
## - PageRank 1 2124.8 2155.8
## - Iframe 1 2125.1 2156.1
## + OnMouseover 1 2123.1 2156.1
## + ddRedirecting 1 2123.1 2156.1
## + PopUp 1 2123.4 2156.4
## + SubEmail 1 2123.5 2156.5
## + LongURL 2 2122.5 2156.5
## - Symbol 1 2126.1 2157.1
## - Port 1 2126.5 2157.5
## - AbnormalURL 1 2126.7 2157.7
## - RequestURL 1 2129.1 2160.1
## - StatsReport 1 2131.7 2162.7
## - Favicon 1 2134.4 2165.4
## - DomainRegLen 1 2136.6 2167.6
## - ShortURL 1 2138.0 2169.0
## - HTTPsToken 1 2139.6 2170.6
## - Redirect 1 2146.6 2177.6
## - SFH 2 2190.9 2220.9
## - GoogleIndex 1 2190.9 2221.9
## - DNSRecord 1 2222.0 2253.0
## - HavingIP 1 2248.7 2279.7
## - SubDomain 2 2255.1 2285.1
## - LinkToPage 2 2257.7 2287.7
## - LinksInTag 2 2282.0 2312.0
## - PrefixSuffix 1 2317.9 2348.9
## - WebTraffic 2 2413.5 2443.5
## - AnchorURL 2 3055.5 3085.5
## - HTTPS 2 3126.9 3156.9
##
## Step: AIC=2155.34
## Class ~ HavingIP + ShortURL + Symbol + PrefixSuffix + SubDomain +
## HTTPS + DomainRegLen + Favicon + Port + HTTPsToken + RequestURL +
## AnchorURL + LinksInTag + SFH + AbnormalURL + Redirect + Iframe +
## DNSRecord + WebTraffic + PageRank + GoogleIndex + LinkToPage +
## StatsReport
##
## Df Deviance AIC
## <none> 2124.3 2155.3
## + AgeOfDomain 1 2123.5 2155.5
## + RightClick 1 2123.6 2155.6
## - Iframe 1 2125.9 2155.9
## + ddRedirecting 1 2123.9 2155.9
## + OnMouseover 1 2124.0 2156.0
## - PageRank 1 2126.1 2156.1
## + PopUp 1 2124.2 2156.2
## + SubEmail 1 2124.3 2156.3
## + LongURL 2 2123.7 2156.7
## - Symbol 1 2126.9 2156.9
## - Port 1 2127.2 2157.2
## - AbnormalURL 1 2127.4 2157.4
## - RequestURL 1 2129.6 2159.6
## - StatsReport 1 2132.4 2162.4
## - Favicon 1 2134.9 2164.9
## - DomainRegLen 1 2137.2 2167.2
## - ShortURL 1 2138.6 2168.6
## - HTTPsToken 1 2140.5 2170.5
## - Redirect 1 2146.9 2176.9
## - SFH 2 2192.1 2221.1
## - GoogleIndex 1 2193.1 2223.1
## - DNSRecord 1 2222.8 2252.8
## - HavingIP 1 2249.2 2279.2
## - LinkToPage 2 2257.7 2286.7
## - SubDomain 2 2260.2 2289.2
## - LinksInTag 2 2284.0 2313.0
## - PrefixSuffix 1 2317.9 2347.9
## - WebTraffic 2 2420.4 2449.4
## - AnchorURL 2 3055.8 3084.8
## - HTTPS 2 3126.9 3155.9
summary(WES.STEP)
##
## Call:
## glm(formula = Class ~ HavingIP + ShortURL + Symbol + PrefixSuffix +
## SubDomain + HTTPS + DomainRegLen + Favicon + Port + HTTPsToken +
## RequestURL + AnchorURL + LinksInTag + SFH + AbnormalURL +
## Redirect + Iframe + DNSRecord + WebTraffic + PageRank + GoogleIndex +
## LinkToPage + StatsReport, family = binomial(logit), data = trainingset)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.3326 -0.0432 0.0000 0.1536 3.1649
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.48947 0.53035 -12.236 < 2e-16 ***
## HavingIP1 1.84636 0.17138 10.773 < 2e-16 ***
## ShortURL-1 1.20373 0.32761 3.674 0.000239 ***
## Symbol-1 -0.36210 0.22710 -1.594 0.110826
## PrefixSuffix1 18.18982 259.75289 0.070 0.944172
## SubDomain0 -0.00312 0.14290 -0.022 0.982583
## SubDomain1 1.42462 0.14527 9.806 < 2e-16 ***
## HTTPS1 3.30811 0.13317 24.842 < 2e-16 ***
## HTTPS0 -2.09365 0.37455 -5.590 2.27e-08 ***
## DomainRegLen1 -0.54856 0.15369 -3.569 0.000358 ***
## Favicon-1 0.89764 0.28512 3.148 0.001642 **
## Port-1 -0.64348 0.38109 -1.689 0.091308 .
## HTTPsToken1 -1.14285 0.28434 -4.019 5.84e-05 ***
## RequestURL-1 -0.32761 0.14271 -2.296 0.021697 *
## AnchorURL0 5.08489 0.32369 15.709 < 2e-16 ***
## AnchorURL1 7.00139 0.37002 18.922 < 2e-16 ***
## LinksInTag-1 -1.21232 0.16291 -7.442 9.93e-14 ***
## LinksInTag0 0.32843 0.16956 1.937 0.052759 .
## SFH1 1.17393 0.19381 6.057 1.39e-09 ***
## SFH0 1.42952 0.25628 5.578 2.43e-08 ***
## AbnormalURL1 -0.54517 0.31358 -1.739 0.082115 .
## Redirect1 -1.08024 0.23123 -4.672 2.99e-06 ***
## Iframe-1 0.41720 0.33864 1.232 0.217946
## DNSRecord1 1.63773 0.16982 9.644 < 2e-16 ***
## WebTraffic0 -1.69961 0.19228 -8.839 < 2e-16 ***
## WebTraffic1 0.59798 0.16938 3.530 0.000415 ***
## PageRank1 0.18496 0.13933 1.328 0.184332
## GoogleIndex-1 -1.30378 0.15897 -8.201 2.38e-16 ***
## LinkToPage0 -1.83713 0.16704 -10.998 < 2e-16 ***
## LinkToPage-1 -1.44360 0.26838 -5.379 7.49e-08 ***
## StatsReport1 0.65986 0.23214 2.842 0.004476 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 10621.8 on 7737 degrees of freedom
## Residual deviance: 2124.3 on 7707 degrees of freedom
## AIC: 2186.3
##
## Number of Fisher Scoring iterations: 18
1 - (WES.STEP$deviance/WES.STEP$null.deviance) # McFadden R^2
## [1] 0.800001
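McFadden's pseudo-R² compares the deviance of the fitted model with that of the null (intercept-only) model; plugging in the deviances reported above:
\[ R^2_{\text{McFadden}} = 1 - \frac{D_{\text{model}}}{D_{\text{null}}} = 1 - \frac{2124.3}{10621.8} \approx 0.80 \]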
We now apply our regression model to the test data.
predict1_reg <- predict(WES.STEP,newdata=testset[,-31],type="response")
head(predict1_reg)
## 1 2 9 20 23 25
## 0.001517046 0.721022036 0.995673838 0.454112634 0.349650039 0.997354470
The object predict1_reg is a vector holding the predicted outcomes for the test data. The values are probabilities between 0 and 1 (due to the argument type='response').
Let us convert these probabilities into class labels (0 or 1) using a 0.5 threshold.
predict1_reg<-ifelse(predict1_reg>0.5, 1, 0)
head(predict1_reg)
## 1 2 9 20 23 25
## 0 1 1 0 0 1
\[ \hat{C}(x) = \begin{cases} 1 & \hat{p}(x) > 0.5 \\ 0 & \hat{p}(x) \leq 0.5 \end{cases} \]
c5<-confusionMatrix(factor(predict1_reg),factor(testset$Class))
c5
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 1333 81
## 1 147 1755
##
## Accuracy : 0.9312
## 95% CI : (0.9221, 0.9396)
## No Information Rate : 0.5537
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.8603
##
## Mcnemar's Test P-Value : 1.672e-05
##
## Sensitivity : 0.9007
## Specificity : 0.9559
## Pos Pred Value : 0.9427
## Neg Pred Value : 0.9227
## Prevalence : 0.4463
## Detection Rate : 0.4020
## Detection Prevalence : 0.4264
## Balanced Accuracy : 0.9283
##
## 'Positive' Class : 0
##
We can see that our regression model performs well and is competitive with the accuracy scores of the tree-based models.
A neural network consists of a collection of highly interconnected elements that transform a set of inputs into a set of desired outputs. The result of the transformation is determined by the characteristics of the elements and by the weights associated with the interconnections among them. Given an input, the network produces a probability estimate that it matches the patterns it has been trained to recognize. The network learns by being trained on both the inputs and the desired outputs of the problem, and its configuration is refined until satisfactory results are obtained; it thus gains experience over time as it is trained on data related to the problem.
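As a toy illustration of this idea (not the model we fit below; the numbers are made up), a single element of such a network computes a weighted sum of its inputs and squashes it through an activation function to produce a probability-like output; the weights are what training adjusts:

```r
# A single "neuron": weighted sum of inputs passed through a sigmoid activation.
sigmoid <- function(z) 1 / (1 + exp(-z))
x <- c(1, -1, 0.5)       # illustrative feature vector
w <- c(0.8, -0.4, 0.3)   # connection weights (these are what training adjusts)
b <- 0.1                 # bias term
p <- sigmoid(sum(w * x) + b)  # output in (0, 1), read as a probability estimate
p
```

A full network chains many such elements in layers, which is what the h2o and nnet models below do.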
We use here the well-known R package h2o to build a neural network model.
Let us first initialize the H2O environment and convert our training and test sets into H2O frames.
h2o.init()
## Connection successful!
##
## R is connected to the H2O cluster:
## H2O cluster uptime: 1 days 21 hours
## H2O cluster timezone: Europe/Paris
## H2O data parsing timezone: UTC
## H2O cluster version: 3.30.0.2
## H2O cluster version age: 1 month and 6 days
## H2O cluster name: H2O_started_from_R_swp_amr953
## H2O cluster total nodes: 1
## H2O cluster total memory: 1.53 GB
## H2O cluster total cores: 4
## H2O cluster allowed cores: 4
## H2O cluster healthy: TRUE
## H2O Connection ip: localhost
## H2O Connection port: 54321
## H2O Connection proxy: NA
## H2O Internal Security: FALSE
## H2O API Extensions: Amazon S3, Algos, AutoML, Core V3, TargetEncoder, Core V4
## R Version: R version 3.6.3 (2020-02-29)
h2o.train <- as.h2o(trainingset)
h2o.test <- as.h2o(testset)
Let us build our model with described parameters.
h2o.model <- h2o.deeplearning(x = setdiff(names(trainingset), c("Class")),
                              y = "Class",
                              training_frame = h2o.train,
                              standardize = TRUE,        # standardize data
                              hidden = c(100, 100, 100), # 3 hidden layers of 100 nodes each
                              rate = 0.01,               # learning rate
                              epochs = 1000,             # passes over the data (deliberately high so the model can compete with the tree classifiers)
                              seed = 1234                # reproducibility seed
                              )
## Warning in .h2o.processResponseWarnings(res): rate cannot be specified if adaptive_rate is enabled..
We then apply our model to the test dataset.
h2o.prediction <- as.data.frame(h2o.predict(h2o.model, h2o.test))
c6<-confusionMatrix(factor(h2o.prediction$predict),factor(testset$Class))
c6
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 1401 52
## 1 79 1784
##
## Accuracy : 0.9605
## 95% CI : (0.9533, 0.9669)
## No Information Rate : 0.5537
## P-Value [Acc > NIR] : < 2e-16
##
## Kappa : 0.9199
##
## Mcnemar's Test P-Value : 0.02311
##
## Sensitivity : 0.9466
## Specificity : 0.9717
## Pos Pred Value : 0.9642
## Neg Pred Value : 0.9576
## Prevalence : 0.4463
## Detection Rate : 0.4225
## Detection Prevalence : 0.4382
## Balanced Accuracy : 0.9591
##
## 'Positive' Class : 0
##
As a comparison, we also fit a single-hidden-layer neural network with the nnet package (10 hidden units, up to 500 iterations).
model_nnet <- nnet(Class ~ ., data = trainingset, size = 10, maxit = 500)
## # weights: 401
## initial value 7229.216008
## iter 10 value 1378.974336
## iter 20 value 1156.017874
## iter 30 value 1072.220165
## iter 40 value 908.194273
## iter 50 value 779.870238
## iter 60 value 688.362668
## iter 70 value 598.045200
## iter 80 value 517.569671
## iter 90 value 457.772468
## iter 100 value 422.330390
## iter 110 value 402.950924
## iter 120 value 389.435807
## iter 130 value 378.942650
## iter 140 value 370.599448
## iter 150 value 364.212302
## iter 160 value 362.498818
## iter 170 value 360.472550
## iter 180 value 357.991926
## iter 190 value 355.873394
## iter 200 value 355.194006
## iter 210 value 354.455385
## iter 220 value 353.757369
## iter 230 value 352.406584
## iter 240 value 351.783645
## iter 250 value 351.207384
## iter 260 value 350.832206
## iter 270 value 350.633662
## iter 280 value 350.337568
## iter 290 value 350.035898
## iter 300 value 349.689900
## iter 310 value 349.259129
## iter 320 value 348.641011
## iter 330 value 348.568714
## iter 340 value 348.167418
## iter 350 value 347.882373
## iter 360 value 347.708100
## iter 370 value 347.441220
## iter 380 value 347.261646
## iter 390 value 347.010012
## iter 400 value 346.738320
## iter 410 value 346.468868
## iter 420 value 346.046006
## iter 430 value 345.932199
## iter 440 value 345.669345
## iter 450 value 345.463716
## iter 460 value 345.355821
## iter 470 value 344.933877
## iter 480 value 344.610040
## iter 490 value 344.383182
## iter 500 value 344.126434
## final value 344.126434
## stopped after 500 iterations
pred_nnet <- predict(model_nnet, testset[,-31],type = "class")
c7<-confusionMatrix(factor(pred_nnet),factor(testset$Class))
c7
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 1390 70
## 1 90 1766
##
## Accuracy : 0.9517
## 95% CI : (0.9439, 0.9588)
## No Information Rate : 0.5537
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.9022
##
## Mcnemar's Test P-Value : 0.1331
##
## Sensitivity : 0.9392
## Specificity : 0.9619
## Pos Pred Value : 0.9521
## Neg Pred Value : 0.9515
## Prevalence : 0.4463
## Detection Rate : 0.4192
## Detection Prevalence : 0.4403
## Balanced Accuracy : 0.9505
##
## 'Positive' Class : 0
##
Let us display all our confusion matrices in order to choose the best model.
c1
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 1323 164
## 1 157 1672
##
## Accuracy : 0.9032
## 95% CI : (0.8926, 0.9131)
## No Information Rate : 0.5537
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.8042
##
## Mcnemar's Test P-Value : 0.7377
##
## Sensitivity : 0.8939
## Specificity : 0.9107
## Pos Pred Value : 0.8897
## Neg Pred Value : 0.9142
## Prevalence : 0.4463
## Detection Rate : 0.3990
## Detection Prevalence : 0.4484
## Balanced Accuracy : 0.9023
##
## 'Positive' Class : 0
##
c2
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 1246 68
## 1 234 1768
##
## Accuracy : 0.9089
## 95% CI : (0.8986, 0.9185)
## No Information Rate : 0.5537
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.8137
##
## Mcnemar's Test P-Value : < 2.2e-16
##
## Sensitivity : 0.8419
## Specificity : 0.9630
## Pos Pred Value : 0.9482
## Neg Pred Value : 0.8831
## Prevalence : 0.4463
## Detection Rate : 0.3758
## Detection Prevalence : 0.3963
## Balanced Accuracy : 0.9024
##
## 'Positive' Class : 0
##
c3
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 1401 37
## 1 79 1799
##
## Accuracy : 0.965
## 95% CI : (0.9582, 0.971)
## No Information Rate : 0.5537
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.929
##
## Mcnemar's Test P-Value : 0.0001408
##
## Sensitivity : 0.9466
## Specificity : 0.9798
## Pos Pred Value : 0.9743
## Neg Pred Value : 0.9579
## Prevalence : 0.4463
## Detection Rate : 0.4225
## Detection Prevalence : 0.4337
## Balanced Accuracy : 0.9632
##
## 'Positive' Class : 0
##
c4
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 3316 77
## 1 102 4243
##
## Accuracy : 0.9769
## 95% CI : (0.9733, 0.9801)
## No Information Rate : 0.5583
## P-Value [Acc > NIR] : < 2e-16
##
## Kappa : 0.9531
##
## Mcnemar's Test P-Value : 0.07284
##
## Sensitivity : 0.9702
## Specificity : 0.9822
## Pos Pred Value : 0.9773
## Neg Pred Value : 0.9765
## Prevalence : 0.4417
## Detection Rate : 0.4285
## Detection Prevalence : 0.4385
## Balanced Accuracy : 0.9762
##
## 'Positive' Class : 0
##
c5
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 1333 81
## 1 147 1755
##
## Accuracy : 0.9312
## 95% CI : (0.9221, 0.9396)
## No Information Rate : 0.5537
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.8603
##
## Mcnemar's Test P-Value : 1.672e-05
##
## Sensitivity : 0.9007
## Specificity : 0.9559
## Pos Pred Value : 0.9427
## Neg Pred Value : 0.9227
## Prevalence : 0.4463
## Detection Rate : 0.4020
## Detection Prevalence : 0.4264
## Balanced Accuracy : 0.9283
##
## 'Positive' Class : 0
##
c6
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 1401 52
## 1 79 1784
##
## Accuracy : 0.9605
## 95% CI : (0.9533, 0.9669)
## No Information Rate : 0.5537
## P-Value [Acc > NIR] : < 2e-16
##
## Kappa : 0.9199
##
## Mcnemar's Test P-Value : 0.02311
##
## Sensitivity : 0.9466
## Specificity : 0.9717
## Pos Pred Value : 0.9642
## Neg Pred Value : 0.9576
## Prevalence : 0.4463
## Detection Rate : 0.4225
## Detection Prevalence : 0.4382
## Balanced Accuracy : 0.9591
##
## 'Positive' Class : 0
##
c7
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 1390 70
## 1 90 1766
##
## Accuracy : 0.9517
## 95% CI : (0.9439, 0.9588)
## No Information Rate : 0.5537
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.9022
##
## Mcnemar's Test P-Value : 0.1331
##
## Sensitivity : 0.9392
## Specificity : 0.9619
## Pos Pred Value : 0.9521
## Neg Pred Value : 0.9515
## Prevalence : 0.4463
## Detection Rate : 0.4192
## Detection Prevalence : 0.4403
## Balanced Accuracy : 0.9505
##
## 'Positive' Class : 0
##
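The accuracy figures reported by caret can be recomputed directly from the confusion-matrix counts as (correct predictions) / (all predictions). As a sanity check, here is the calculation for the counts of the c4 matrix above:

```r
# Recompute accuracy from the c4 counts: sum of the diagonal (correct
# predictions) divided by the total number of predictions.
cm4 <- matrix(c(3316, 102, 77, 4243), nrow = 2,
              dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))
accuracy <- sum(diag(cm4)) / sum(cm4)
round(accuracy, 4)  # 0.9769, matching caret's report for c4
```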
Since our dataset is balanced, we have chosen to evaluate our models with a simple metric: rather than using ROC curves or AIC, we simply keep the model with the best accuracy. From the confusion matrices above, we can rank our models by performance: first comes the Boosted Tree, then the Random Forest and the neural network models. The logistic regression model achieved better accuracy than our first two classification trees.
Although the performance of the seven machine learning methods used is quite comparable, we found that the Boosted Trees model achieved the best results. We also found that a simple regression method can sometimes outperform certain types of trees at classification, and that neural networks achieve competitive results. Our results demonstrate the potential of machine learning for detecting and classifying phishing websites.
One interesting future development would be to build an online website phishing detector as a web application powered by these models, using R Shiny.
Here is an example of such a website, with some screenshots: https://malicious-url-detectorv5.herokuapp.com/
Homepage - An Example of a Phishing Detector Web Application
Malicious Website - An Example of a Phishing Detector Web Application
Building such an application would be very interesting because it would allow us to use the models in practical cases and to validate our conclusions about model performance with new input data. I began researching how to develop such a website, and I propose here the major steps I identified to implement it.
Within an R Shiny script:
- Use the web-scraping R package "rvest".
- Extract all the information needed to compute our 30 features.
- Build if/else functions based on the rules used to build the original dataset, like this Python script: https://github.com/srimani-programmer/Phishing-URL-Detector/blob/master/feature_extraction.py
- https://phishtank.com/index.php provides URLs of confirmed phishing websites to feed our models.
- Once the feature vector is rebuilt, use our models to predict whether the input URL is a phishing one or not, and report the score.
- Show the results on an R Shiny server web page.
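As a sketch of what one of those if/else feature rules might look like in R (the function name and the -1/1 coding are illustrative assumptions, following the dataset's convention), here is a possible HavingIP check, which flags a URL whose host is a raw IP address:

```r
# Illustrative rule for the HavingIP feature: returns 1 (phishing indicator)
# when the host part of the URL is a raw IPv4 address, -1 (legitimate) otherwise.
having_ip <- function(url) {
  host <- sub("^https?://", "", url)  # drop the scheme
  host <- sub("/.*$", "", host)       # drop the path
  if (grepl("^\\d{1,3}(\\.\\d{1,3}){3}$", host)) 1 else -1
}
having_ip("http://125.98.3.123/fake.html")  # 1
having_ip("https://www.amazon.com/login")   # -1
```

The other 29 features would be implemented as similar small functions, each returning the -1/0/1 coding the models were trained on.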
The following is a list of helpful contributors.
Thank you for reading !