Title: A Fast Implementation of Adaboost
Description: Implements AdaBoost based on a C++ backend. It is blazingly fast and especially useful for large, in-memory data sets. The package uses decision trees as weak classifiers; once trained, the classifiers can be used to predict new data. Currently, only binary classification tasks are supported. The package implements the AdaBoost.M1 algorithm and the Real AdaBoost (SAMME.R) algorithm.
Authors: Sourav Chatterjee [aut, cre]
Maintainer: Sourav Chatterjee <[email protected]>
License: MIT + file LICENSE
Version: 1.0.0
Built: 2024-11-06 04:11:16 UTC
Source: https://github.com/souravc83/fastAdaboost
Implements Freund and Schapire's Adaboost.M1 algorithm
adaboost(formula, data, nIter, ...)
formula | Formula for the model
data | Input data frame
nIter | Number of weak classifiers (boosting iterations)
... | Other optional arguments, not currently implemented
This implements the Adaboost.M1 algorithm for a binary classification task. The target variable must be a factor with exactly two levels. The final classifier is a linear combination of weak decision tree classifiers.
object of class adaboost
Freund, Y. and Schapire, R.E. (1996): "Experiments with a new boosting algorithm". In Proceedings of the Thirteenth International Conference on Machine Learning, pp. 148–156, Morgan Kaufmann.
real_adaboost, predict.adaboost
fakedata <- data.frame(X = c(rnorm(100, 0, 1), rnorm(100, 1, 1)),
                       Y = c(rep(0, 100), rep(1, 100)))
fakedata$Y <- factor(fakedata$Y)
test_adaboost <- adaboost(Y ~ X, data = fakedata, nIter = 10)
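For intuition, the weight update that AdaBoost.M1 performs at each boosting round can be sketched in a few lines of plain R. This is a toy illustration of the textbook algorithm, not the package's C++ code; the function name and the {-1, +1} label encoding are assumptions made for the sketch.

```r
# Toy sketch of one AdaBoost.M1 round (illustrative only, not the
# package's C++ implementation).
# y: true labels in {-1, +1}; pred: weak learner's predictions;
# w: current example weights (summing to 1).
adaboost_m1_round <- function(y, pred, w) {
  err <- sum(w * (pred != y))           # weighted error of the weak learner
  alpha <- 0.5 * log((1 - err) / err)   # classifier weight (its "vote")
  w_new <- w * exp(-alpha * y * pred)   # up-weight misclassified examples
  w_new <- w_new / sum(w_new)           # renormalise to sum to 1
  list(alpha = alpha, w = w_new)
}

y    <- c(1, 1, -1, -1)
pred <- c(1, -1, -1, -1)                # one mistake, on example 2
w    <- rep(0.25, 4)
out  <- adaboost_m1_round(y, pred, w)
# misclassified example 2 now holds half the total weight
```

The final strong classifier is then the sign of the alpha-weighted sum of the weak learners' votes, which is what the package evaluates at prediction time.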
fastAdaboost provides a blazingly fast implementation of both the discrete and real AdaBoost algorithms, based on a C++ backend. The goal of the package is to provide fast performance on large, in-memory data sets.
Sourav Chatterjee
Freund, Y. and Schapire, R.E. (1996): "Experiments with a new boosting algorithm". In Proceedings of the Thirteenth International Conference on Machine Learning, pp. 148–156, Morgan Kaufmann.
Zhu, Ji, et al. (2006): "Multi-class AdaBoost". Ann Arbor 1001.48109: 1612.
fakedata <- data.frame(X = c(rnorm(100, 0, 1), rnorm(100, 1, 1)),
                       Y = c(rep(0, 100), rep(1, 100)))
fakedata$Y <- factor(fakedata$Y)
test_adaboost <- adaboost(Y ~ X, data = fakedata, nIter = 10)
pred <- predict(test_adaboost, newdata = fakedata)
print(pred$error)
Returns a single weak decision tree classifier that is part of the strong classifier
get_tree(object, tree_num)
object | An object of class adaboost
tree_num | Integer index of the tree to extract
Returns an individual tree from the adaboost object. This can give the user some insight into the individual building blocks of the strong classifier.
object of class rpart
fakedata <- data.frame(X = c(rnorm(100, 0, 1), rnorm(100, 1, 1)),
                       Y = c(rep(0, 100), rep(1, 100)))
fakedata$Y <- factor(fakedata$Y)
test_adaboost <- adaboost(Y ~ X, data = fakedata, nIter = 10)
tree <- get_tree(test_adaboost, 5)
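Since the returned value is an ordinary rpart object, the standard rpart tools can be used to inspect it. A minimal sketch, using a directly fitted rpart tree as a stand-in for the value of get_tree():

```r
library(rpart)

# Stand-in for the rpart object that get_tree() returns: any rpart
# fit behaves the same way for inspection purposes.
tree <- rpart(Species ~ ., data = iris)

print(tree)    # text view of the splits
summary(tree)  # split details and variable importance
plot(tree)     # dendrogram of the tree...
text(tree)     # ...annotated with split labels
```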
Predictions for a model fit with the AdaBoost.M1 algorithm
## S3 method for class 'adaboost' predict(object, newdata, ...)
object | An object of class adaboost
newdata | Data frame on which to predict
... | Arguments passed to predict.default
Makes predictions for an adaboost object on a new data set. The target variable is not required for prediction to work; however, the user must ensure that the test data has the same columns that were used as inputs to fit the original model. The error component of the prediction object (as in pred$error) gives the error on the test set if the test data is labeled.
The predicted object is a list with the following components:

formula | The formula used.
votes | Total weighted votes achieved by each class.
class | The class predicted by the classifier.
prob | A matrix with the predicted probability of each class for each observation.
error | The error on the test data if labeled, otherwise NULL.
fakedata <- data.frame(X = c(rnorm(100, 0, 1), rnorm(100, 1, 1)),
                       Y = c(rep(0, 100), rep(1, 100)))
fakedata$Y <- factor(fakedata$Y)
test_adaboost <- adaboost(Y ~ X, data = fakedata, nIter = 10)
pred <- predict(test_adaboost, newdata = fakedata)
print(pred$error)
print(table(pred$class, fakedata$Y))
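Because the prediction object exposes class, prob, and error, derived quantities are easy to compute by hand. A sketch with made-up stand-in values rather than an actual fastAdaboost fit; the assumption that the prob columns follow the order of the factor levels is mine:

```r
# Stand-in prediction object with the same shape as the list returned
# by predict.adaboost (values are invented for illustration).
truth <- factor(c(0, 0, 1, 1))
pred <- list(class = factor(c(0, 1, 1, 1)),
             prob  = matrix(c(0.9, 0.4, 0.2, 0.1,   # assumed P(class 0)
                              0.1, 0.6, 0.8, 0.9),  # assumed P(class 1)
                            ncol = 2))

err    <- mean(pred$class != truth)             # same quantity as pred$error
strict <- ifelse(pred$prob[, 2] > 0.7, 1, 0)    # custom cutoff stricter than 0.5
```

Thresholding prob directly, as in the last line, is useful when the two kinds of misclassification carry different costs.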
Predictions for a model fit with the real_adaboost algorithm
## S3 method for class 'real_adaboost' predict(object, newdata, ...)
object | An object of class real_adaboost
newdata | Data frame on which to predict
... | Arguments passed to predict.default
Makes predictions on a new data set using the real_adaboost algorithm. The target variable is not required for prediction to work; however, the user must ensure that the test data has the same columns that were used as inputs to fit the original model. The error component of the prediction object (as in pred$error) gives the error on the test set if the test data is labeled.
The predicted object is a list with the following components:

formula | The formula used.
votes | Total weighted votes achieved by each class.
class | The class predicted by the classifier.
prob | A matrix with the predicted probability of each class for each observation.
error | The error on the test data if labeled, otherwise NULL.
fakedata <- data.frame(X = c(rnorm(100, 0, 1), rnorm(100, 1, 1)),
                       Y = c(rep(0, 100), rep(1, 100)))
fakedata$Y <- factor(fakedata$Y)
test_real_adaboost <- real_adaboost(Y ~ X, data = fakedata, nIter = 10)
pred <- predict(test_real_adaboost, newdata = fakedata)
print(pred$error)
print(table(pred$class, fakedata$Y))
S3 method to print an adaboost object
## S3 method for class 'adaboost' print(x, ...)
x | An object of class adaboost
... | Arguments passed to print.default
Displays basic information on the model, such as the function call, the dependent variable, the number of trees, and the weights assigned to each tree
None
fakedata <- data.frame(X = c(rnorm(100, 0, 1), rnorm(100, 1, 1)),
                       Y = c(rep(0, 100), rep(1, 100)))
fakedata$Y <- factor(fakedata$Y)
test_adaboost <- adaboost(Y ~ X, data = fakedata, nIter = 10)
print(test_adaboost)
S3 method to print a real_adaboost object
## S3 method for class 'real_adaboost' print(x, ...)
x | An object of class real_adaboost
... | Arguments passed to print.default
Displays basic information on the model, such as the function call, the dependent variable, and the number of trees
None
fakedata <- data.frame(X = c(rnorm(100, 0, 1), rnorm(100, 1, 1)),
                       Y = c(rep(0, 100), rep(1, 100)))
fakedata$Y <- factor(fakedata$Y)
test_real_adaboost <- real_adaboost(Y ~ X, data = fakedata, nIter = 10)
print(test_real_adaboost)
Implements Zhu et al.'s Real AdaBoost (SAMME.R) algorithm
real_adaboost(formula, data, nIter, ...)
formula | Formula for the model
data | Input data frame
nIter | Number of weak classifiers (boosting iterations)
... | Other optional arguments, not currently implemented
This implements the Real AdaBoost algorithm for a binary classification task. The target variable must be a factor with exactly two levels. The final classifier is a linear combination of weak decision tree classifiers. Real AdaBoost uses the class probabilities of the weak classifiers to iteratively update the example weights, and has been found to have lower generalization error than AdaBoost.M1 for the same number of iterations.
object of class real_adaboost
Zhu, Ji, et al. (2006): "Multi-class AdaBoost". Ann Arbor 1001.48109: 1612.
adaboost, predict.real_adaboost
fakedata <- data.frame(X = c(rnorm(100, 0, 1), rnorm(100, 1, 1)),
                       Y = c(rep(0, 100), rep(1, 100)))
fakedata$Y <- factor(fakedata$Y)
test_real_adaboost <- real_adaboost(Y ~ X, data = fakedata, nIter = 10)
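The "real" in Real AdaBoost refers to weak learners casting real-valued votes derived from their class probabilities, rather than hard {-1, +1} votes. A sketch of the two-class SAMME.R-style contribution of a single tree (illustrative only; the function name and the probability clamping are my own choices, not the package's internals):

```r
# Two-class SAMME.R-style vote: half the log-odds of the tree's
# predicted probability for class 1 (cf. Zhu et al.).
# This toy sketch is not the package's C++ implementation.
real_contrib <- function(p1, eps = 1e-12) {
  p1 <- pmin(pmax(p1, eps), 1 - eps)  # keep probabilities away from 0 and 1
  0.5 * log(p1 / (1 - p1))            # confident trees vote more strongly
}

real_contrib(0.5)  # an uninformative tree contributes exactly 0
real_contrib(0.9)  # a confident vote for class 1: a positive score
```

The strong classifier sums these contributions over all trees and predicts by their sign, which is why a few highly confident trees can dominate many uncertain ones.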