Practical Example: How to Use a Machine Learning Library to Train and Evaluate a Prediction Model
Here's an example of how to use a machine learning library, specifically scikit-learn, to train and evaluate a prediction model using the popular Iris dataset.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score

# Load the Iris dataset: 150 samples, 4 features, 3 classes
iris = load_iris()
X, y = iris.data, iris.target

# Initialize the logistic regression model
# (max_iter is raised from the default of 100 so the solver has room to converge)
model = LogisticRegression(max_iter=200)

# Perform 5-fold cross-validation; returns one accuracy score per fold
cv_scores = cross_val_score(model, X, y, cv=5)

# Calculate the mean accuracy across all folds
mean_accuracy = np.mean(cv_scores)

# Train the model on the entire dataset
model.fit(X, y)

# Make predictions on the same dataset the model was trained on
predictions = model.predict(X)

# Calculate accuracy on those predictions (this is training accuracy)
accuracy = accuracy_score(y, predictions)

# Print the results
print("Cross-Validation Accuracy:", mean_accuracy)
print("Overall Accuracy:", accuracy)
In this example, we first load the Iris dataset using the load_iris() function from scikit-learn. Then, we initialize a logistic regression model with the LogisticRegression class.
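To get a feel for what load_iris() returns, it can help to inspect the arrays before modelling. The following is a minimal sketch; the variable names simply mirror the example above.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# X holds one row per flower and one column per measurement
print(X.shape)  # (150, 4)

# y holds one integer class label per flower
print(y.shape)  # (150,)

# Human-readable names for the columns and the classes
print(iris.feature_names)
print(list(iris.target_names))
```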
Next, we perform k-fold cross-validation using the cross_val_score() function with the cv=5 parameter, which splits the dataset into 5 folds. Each fold serves once as the held-out test set while the model is trained on the other four. The cv_scores variable stores the accuracy score for each fold.
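To make the fold mechanics concrete, the loop below sketches roughly what cross_val_score() does for a classifier when cv is an integer: it builds stratified folds, fits a fresh copy of the model on the training part of each fold, and scores it on the held-out part. This is an illustration under stated assumptions, not the library's internal code; the names folds, fold_model, and manual_scores are made up for this sketch, and max_iter=200 is an assumed setting chosen to give the solver room to converge.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200)

# Stratified folds keep the class proportions similar in every fold
folds = StratifiedKFold(n_splits=5)

manual_scores = []
for train_idx, test_idx in folds.split(X, y):
    fold_model = clone(model)  # fresh, unfitted copy for each fold
    fold_model.fit(X[train_idx], y[train_idx])
    # score() returns accuracy for classifiers
    manual_scores.append(fold_model.score(X[test_idx], y[test_idx]))

manual_scores = np.array(manual_scores)
print("Per-fold accuracy:", manual_scores)
print("Mean accuracy:", manual_scores.mean())
```

Each sample lands in the test part of exactly one fold, so every score is computed on data that the fold's model did not see during fitting.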
After that, we train the model on the entire dataset using the fit() method. We then make predictions on the same dataset and calculate the accuracy of those predictions using the accuracy_score() function.
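Finally, we print the cross-validation accuracy, which is the mean of the per-fold accuracy scores, and the accuracy of the model on the entire dataset. Because the second figure is measured on the same data the model was trained on, it tends to be optimistic; the cross-validation accuracy is the better estimate of how the model performs on unseen data.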
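The accuracy computed on the full dataset is measured on data the model has already seen. One common alternative, which is not part of the example above, is to hold out a test set with train_test_split() and report accuracy on it separately. The sketch below assumes an 80/20 split, random_state=0 for reproducibility, and max_iter=200; all three are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the samples; stratify=y keeps class proportions equal in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit only on the training part
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Compare accuracy on seen data with accuracy on unseen data
train_accuracy = accuracy_score(y_train, model.predict(X_train))
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print("Training accuracy:", train_accuracy)
print("Held-out accuracy:", test_accuracy)
```

A single held-out split is cheaper than cross-validation but depends on which samples happen to land in the test set, which is why the two are often reported together.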