Chapter 7 Naive Bayes
7.1 TL;DR
- What it does
  - Provides yet another way to try to classify observations into one or more categories
- When to do it
  - When you’re rounding out your exhaustive try-them-all process to try to determine the best classifier for your data, after linear discriminant analysis and quadratic discriminant analysis
- How to do it
  - With the naiveBayes() function from the e1071 library
- How to assess it
  - As with the other methods, assess it on test data after training and see if its accuracy is better than the other methods
7.2 What it does
Naive Bayes assumes that, within each class, there is no association between the predictors. According to James et al. (2021), this still yields “decent results,” especially in datasets where the number of observations \(n\) is smallish relative to the number of variables \(p\), even though the independence assumption is generally not true.
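Concretely, if \(\pi_k\) is the prior probability of class \(k\) and \(f_{kj}\) is the density of the \(j\)th predictor within class \(k\), the independence assumption lets the posterior factor into a product of one-dimensional densities (this is the form given in James et al. (2021)):

\[
\Pr(Y = k \mid X = x) = \frac{\pi_k \prod_{j=1}^{p} f_{kj}(x_j)}{\sum_{l=1}^{K} \pi_l \prod_{j=1}^{p} f_{lj}(x_j)}
\]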
7.3 When to do it
When you’re rounding out your assessment of the accuracy of the various classification methods, looking for the best one.
TODO: there must be a better explanation for this.
7.4 How to do it
Exactly like linear discriminant analysis and quadratic discriminant analysis, except with a naiveBayes() call instead of qda() or lda(). Train the model on training data, assess it against test data.
library(MASS)   # the Boston data set (a version is also in ISLR2)
library(dplyr)  # mutate() and %>%
library(e1071)  # naiveBayes()

data(Boston)
boston <- Boston %>%
  mutate(
    # Create the categorical crim_above_med response variable
    crim_above_med = as.factor(ifelse(crim > median(crim), "Yes", "No"))
  )
We again split the data into training and test sets:
set.seed(1235)
boston.training <- rep(FALSE, nrow(boston))
boston.training[sample(nrow(boston), nrow(boston) * 0.8)] <- TRUE
boston.test <- !boston.training
and call naiveBayes() on the training data:
boston_nb.fit <-
  naiveBayes(
    crim_above_med ~ nox,
    data = boston,
    subset = boston.training
  )
7.5 How to assess it
boston_nb.fit
##
## Naive Bayes Classifier for Discrete Predictors
##
## Call:
## naiveBayes.default(x = X, y = Y, laplace = laplace)
##
## A-priori probabilities:
## Y
## No Yes
## 0.5247525 0.4752475
##
## Conditional probabilities:
## nox
## Y [,1] [,2]
## No 0.4710519 0.05635737
## Yes 0.6357865 0.10210051
In the output above, we see our A-priori probabilities, which, like the Prior probabilities of lda() and qda(), show the percentage of observations in each of the outcome categories:
> mean(boston[boston.training,]$crim_above_med == "Yes")
[1] 0.4752475
> mean(boston[boston.training,]$crim_above_med == "No")
[1] 0.5247525
The Conditional probabilities are identical to the Group means from lda() and qda(), showing the means of the predictor variable (nox in this case) for each of the categories:

boston[boston.training, ] %>%
  group_by(crim_above_med) %>%
  summarize("Group means" = mean(nox))
## # A tibble: 2 × 2
## crim_above_med `Group means`
## <fct> <dbl>
## 1 No 0.471
## 2 Yes 0.636
All of these results are the same as lda() and qda(). The Naive Bayes output also includes a second column in the Conditional probabilities section, which is the standard deviation of the (in this case nox) variable in each category.
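We can verify that second column the same way we verified the means; a quick sketch (output not shown here):

boston[boston.training, ] %>%
  group_by(crim_above_med) %>%
  summarize(
    "Group means" = mean(nox),
    "Group SDs"   = sd(nox)
  )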
After the fit is built, we can run prediction as with the other methods, to predict the crim_above_med variable from nox (note that we just make a table of the prediction output itself, without the $class attribute):

boston_nb.pred <- predict(boston_nb.fit, boston[boston.test, ])
table(boston_nb.pred, boston[boston.test, ]$crim_above_med)
##
## boston_nb.pred No Yes
## No 38 12
## Yes 3 49
In this example, just like LDA and QDA, Naive Bayes made the correct categorization 87 times out of 102, with 3 false positives and 12 false negatives. And again, we can compute the prediction accuracy as the proportion of correct predictions:
mean(boston_nb.pred == boston[boston.test, ]$crim_above_med)
## [1] 0.8529412
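If we want the class posterior probabilities rather than just the predicted classes, predict() on a naiveBayes fit also accepts type = "raw"; a quick sketch (output not shown):

boston_nb.post <- predict(boston_nb.fit, boston[boston.test, ], type = "raw")
head(boston_nb.post)  # one row per test observation, one column per class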
We see again here that Naive Bayes performed identically to lda() and qda(). This means we probably need a better example for this document.
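For intuition about where these predictions come from: for a numeric predictor such as nox, naiveBayes() models the variable within each class as Gaussian, using the means and standard deviations from the Conditional probabilities table, so a single posterior can be reproduced by hand. A rough sketch, using the values printed above and a made-up nox value of 0.55:

# Priors, per-class means, and per-class standard deviations copied from the
# model output above; 0.55 is just a hypothetical nox value for illustration.
priors <- c(No = 0.5247525,  Yes = 0.4752475)
mus    <- c(No = 0.4710519,  Yes = 0.6357865)
sds    <- c(No = 0.05635737, Yes = 0.10210051)

x <- 0.55
unnormalized <- priors * dnorm(x, mean = mus, sd = sds)
unnormalized / sum(unnormalized)  # approximate P(crim_above_med | nox = 0.55)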
7.6 Where to learn more
- Section 4.4.4 in James et al. (2021)