Chapter 7 Naive Bayes
7.1 TL;DR
- What it does
  - Provides yet another way to try to classify observations into one or more categories
- When to do it
  - When you’re rounding out your exhaustive try-them-all process to try to determine the best classifier for your data, after linear discriminant analysis and quadratic discriminant analysis
- How to do it
  - With the naiveBayes() function from the e1071 library
- How to assess it
  - As with the other methods, assess it on test data after training and see if its accuracy is better than the other methods
7.2 What it does
Naive Bayes assumes that, within each class, there is no association between the predictors. According to James et al. (2021), this still yields “decent results,” especially in datasets where the number of observations \(n\) is smallish relative to the number of variables \(p\), even though the independence assumption is generally not true.
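Concretely, if \(\pi_k\) is the prior probability of class \(k\) and \(f_{kj}\) is the density of the \(j\)th predictor within class \(k\), the independence assumption lets the posterior factor into a product of one-dimensional densities (this is the form given in James et al. (2021)):

\[
\Pr(Y = k \mid X = x) = \frac{\pi_k \prod_{j=1}^{p} f_{kj}(x_j)}{\sum_{l=1}^{K} \pi_l \prod_{j=1}^{p} f_{lj}(x_j)}
\]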
7.3 When to do it
When you’re rounding out your assessment of the accuracy of the various classification methods, looking for the best one.
TODO: there must be a better explanation for this.
7.4 How to do it
Exactly like linear discriminant analysis and quadratic discriminant analysis, except with a naiveBayes() call instead of qda() or lda(). Train the model on training data, assess it against test data.
library(MASS)   # the Boston data set (a version is also in ISLR2)
library(dplyr)  # mutate() and %>%
library(e1071)  # naiveBayes()

data(Boston)
boston <- Boston %>%
  mutate(
    # Create the categorical crim_above_med response variable
    crim_above_med = as.factor(ifelse(crim > median(crim), "Yes", "No"))
  )
We again split the data into training and test sets:
set.seed(1235)
boston.training <- rep(FALSE, nrow(boston))
boston.training[sample(nrow(boston), nrow(boston) * 0.8)] <- TRUE
boston.test <- !boston.training
and call naiveBayes() on the training data:
boston_nb.fit <-
  naiveBayes(
    crim_above_med ~ nox,
    data = boston,
    subset = boston.training
  )
7.5 How to assess it
boston_nb.fit
##
## Naive Bayes Classifier for Discrete Predictors
##
## Call:
## naiveBayes.default(x = X, y = Y, laplace = laplace)
##
## A-priori probabilities:
## Y
## No Yes
## 0.5247525 0.4752475
##
## Conditional probabilities:
## nox
## Y [,1] [,2]
## No 0.4710519 0.05635737
## Yes 0.6357865 0.10210051
In the output above, we see our A-priori probabilities, which, like the Prior probabilities of lda() and qda(), show the percentage of observations in each of the outcome categories:
> mean(boston[boston.training,]$crim_above_med == "Yes")
[1] 0.4752475
> mean(boston[boston.training,]$crim_above_med == "No")
[1] 0.5247525
The Conditional probabilities are identical to the Group means from lda() and qda(), showing the means of the predictor variable (nox in this case) for each of the categories:

boston[boston.training, ] %>%
  group_by(crim_above_med) %>%
  summarize("Group means" = mean(nox))
## # A tibble: 2 × 2
## crim_above_med `Group means`
## <fct> <dbl>
## 1 No 0.471
## 2 Yes 0.636
All of these results are the same as lda() and qda(). The Naive Bayes output also includes a second column in the Conditional probabilities section, which is the standard deviation of the (in this case nox) variable in each category.
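We can verify that second column the same way we verified the means; a quick sketch (output not shown here):

boston[boston.training, ] %>%
  group_by(crim_above_med) %>%
  summarize(
    "Group means" = mean(nox),
    "Group SDs"   = sd(nox)
  )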
After the fit is built, we can run prediction as with the other methods, to predict the crim_above_med variable from nox (note that we just make a table of the prediction output itself, without the $class attribute):

boston_nb.pred <- predict(boston_nb.fit, boston[boston.test, ])
table(boston_nb.pred, boston[boston.test, ]$crim_above_med)
##
## boston_nb.pred No Yes
## No 38 12
## Yes 3 49
In this example, just like LDA and QDA, Naive Bayes made the correct categorization 87 times out of 102, with 3 false positives and 12 false negatives. And again, we can compute the prediction accuracy as the proportion of correct predictions:
mean(boston_nb.pred == boston[boston.test, ]$crim_above_med)
## [1] 0.8529412
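If we want the class posterior probabilities rather than just the predicted classes, predict() on a naiveBayes fit also accepts type = "raw"; a quick sketch (output not shown):

boston_nb.post <- predict(boston_nb.fit, boston[boston.test, ], type = "raw")
head(boston_nb.post)  # one row per test observation, one column per class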
We see again here that Naive Bayes performed identically to lda() and qda(). This means we probably need a better example for this document.
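For intuition about where these predictions come from: for a numeric predictor such as nox, naiveBayes() models the variable within each class as Gaussian, using the means and standard deviations from the Conditional probabilities table, so a single posterior can be reproduced by hand. A rough sketch, using the values printed above and a made-up nox value of 0.55:

# Priors, per-class means, and per-class standard deviations copied from the
# model output above; 0.55 is just a hypothetical nox value for illustration.
priors <- c(No = 0.5247525,  Yes = 0.4752475)
mus    <- c(No = 0.4710519,  Yes = 0.6357865)
sds    <- c(No = 0.05635737, Yes = 0.10210051)

x <- 0.55
unnormalized <- priors * dnorm(x, mean = mus, sd = sds)
unnormalized / sum(unnormalized)  # approximate P(crim_above_med | nox = 0.55)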
7.6 Where to learn more
- Section 4.4.4 in James et al. (2021)