Derek Andersen and Joanne Chau

chefboost Package

We will be utilizing the chefboost package to build our CART trees. chefboost is a simple-to-use Python package for building decision tree models that supports the ID3, C4.5, CART, and CHAID algorithms, as well as regression trees.
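Under the hood, CART chooses classification splits by minimizing Gini impurity. As a quick illustration of that criterion (my own minimal sketch, not chefboost's internal code):

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# A pure node has impurity 0; a 50/50 split is maximally impure for two classes
print(gini_impurity(["Yes", "Yes", "Yes"]))       # 0.0
print(gini_impurity(["Yes", "No", "Yes", "No"]))  # 0.5
```

CART evaluates each candidate split by the weighted impurity of the child nodes it would create, and picks the split with the lowest result.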
# Install chefboost package
!pip install chefboost
We will be using chefboost to train the models and pandas to load and inspect the datasets.
from chefboost import Chefboost as chef
import pandas as pd
Load and view the dataset.
# Load the dataset
classification = pd.read_csv("drive/My Drive/chefboost-master/tests/dataset/golf.txt")
# View the dataset
classification
The syntax for the model configuration is:
{"algorithm" : "[Type of Algorithm to Apply]"}
There are more algorithms than just CART and Regression available for model training with the chefboost package. I encourage you to explore all possible algorithms in its GitHub repository.
config = {'algorithm': 'CART'} # Using the CART algorithm
class_model = chef.fit(classification.copy(), config)
Now we can create some dummy data to evaluate with our trained model.
# Instances are of the form [Outlook, Temp, Humidity, Wind]
test_set = [
    ['Sunny', 'Cool', 'High', 'Strong'],
    ['Sunny', 'Cool', 'High', 'Weak'],
    ['Overcast', 'Cool', 'High', 'Weak'],
    ['Overcast', 'Mild', 'Normal', 'Weak'],
    ['Rain', 'Hot', 'Normal', 'Strong'],
    ['Rain', 'Hot', 'High', 'Weak']
]
# Evaluate
print("Predictions:")
for instance in test_set:
    print(instance, "decision:", chef.predict(class_model, instance))
# Load and preview the dataset we are working with to understand features being used
regression = pd.read_csv('drive/My Drive/chefboost-master/tests/dataset/golf4.txt')
regression
This is where chefboost can be confusing. Although configuring the model is simple, we demonstrated earlier that classification trees use the CART algorithm. When using chefboost, remember that CART builds classification trees and Regression builds regression trees.
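The difference matters because a regression tree scores candidate splits by how much they reduce the variance of a numeric target, rather than by class impurity. A minimal sketch of that idea (my own illustration, not chefboost's internals):

```python
def variance(values):
    """Population variance of the target values in a node."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def variance_reduction(parent, left, right):
    """Drop in size-weighted variance achieved by a split."""
    n = len(parent)
    weighted = (len(left) / n) * variance(left) + (len(right) / n) * variance(right)
    return variance(parent) - weighted

# Separating the low targets from the high ones removes almost all variance
parent = [10, 12, 40, 42]
print(variance_reduction(parent, [10, 12], [40, 42]))  # 225.0
```

The split with the largest variance reduction becomes the node's decision rule, and a leaf predicts the mean of the targets that reach it.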
# Train and configure the model using the Regression algorithm
config = {"algorithm" : "Regression"}
reg_model = chef.fit(regression.copy(), config)
Unlike the classification tree seen earlier, the regression tree's training output reports its accuracy and the margin of error across the training instances. With this small training set, the error is about 8%.
# Compare a single prediction to its actual value
# for a better understanding of the error
# Using the row at index 4 of the dataset
test_instance = ["Rain", 70, 96, "Weak"]
prediction = round(chef.predict(reg_model, test_instance), 1)
actual = regression.iloc[4]["Decision"]
print("Prediction: ", prediction, "| Actual : ", actual, "| Error: ", round((actual - prediction), 1))
# Compare all predictions to accurate numbers
# Take the absolute value for error
for index, instance in regression.iterrows():
    prediction = round(chef.predict(reg_model, instance), 1)
    actual = instance["Decision"]
    print("Prediction: ", prediction, "| Actual : ", actual, "| Error: ", round(abs(actual - prediction), 1))
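The per-instance errors printed above can also be rolled up into a single mean absolute error. A small self-contained sketch with illustrative numbers (not the golf4 data):

```python
def mean_absolute_error(actuals, predictions):
    """Average of |actual - predicted| across all instances."""
    return sum(abs(a - p) for a, p in zip(actuals, predictions)) / len(actuals)

# Hypothetical actual vs. predicted target values
actuals     = [26, 30, 48, 46, 62]
predictions = [25, 33, 45, 46, 60]
print(mean_absolute_error(actuals, predictions))  # 1.8
```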
Now we can create a dummy test set and feed it to our model for predictions. These test instances are similar to the ones we used for the classification tree example earlier.
# Instances are of the form [Outlook, Temp, Humidity, Wind]
test_set = [
    ['Sunny', 50, 75, 'Strong'],
    ['Sunny', 54, 85, 'Weak'],
    ['Overcast', 43, 87, 'Weak'],
    ['Overcast', 65, 60, 'Weak'],
    ['Rain', 90, 55, 'Strong'],
    ['Rain', 98, 90, 'Weak']
]
# Evaluate
for instance in test_set:
    print(instance, "Prediction: ", round(chef.predict(reg_model, instance)))