Math - Naive Bayes Classification Algorithm (Notes)

Naive Bayes

“After updating our initial beliefs about something with objective new information, we get a new, improved belief.” — Mathematician Thomas Bayes (1702-1761)

When you cannot directly know the true nature of something, you can judge the probability of its underlying attributes from how often events related to it are observed.

The more observed events that support a particular attribute, the more likely that attribute is to hold.

In 1774, French mathematician Pierre-Simon Laplace (1749-1827) independently rediscovered Bayes’ formula.

Figure 16

Alternative notation:

Figure 17
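The figures themselves are not reproduced in these notes; the standard statement of Bayes' formula, and the expanded alternative notation it presumably corresponds to, are:

```latex
% Bayes' formula: posterior belief from prior and likelihood
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}

% Alternative notation: the evidence P(B) expanded over all hypotheses A_j
P(A_i \mid B) = \frac{P(B \mid A_i)\, P(A_i)}{\sum_{j} P(B \mid A_j)\, P(A_j)}
```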

Making Computers Distinguish Fruits

We need to convert fruit characteristics into data that computers can understand. The most common way is to extract attributes of objects in the real world and convert them into numbers.

For example: shape, skin color, stripe pattern, weight, feel in the hand, taste.

Figure 19

Convert these descriptions into numbers, turning weight from a continuous value into discrete bins, because the Naive Bayes model used here works with discrete attribute values.

Figure 20
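As a rough illustration (the exact numeric codes used in the figures are not shown here, so the encoding below is an assumption), the attributes might be mapped to discrete values like this:

```python
# Hypothetical encoding of fruit attributes as discrete values.
# The actual codes used in the article's figures may differ.

SHAPE = {"oval": 1, "round": 2}              # shape
TASTE = {"sour": 1, "sweet": 2}              # taste
COLOR = {"red": 1, "orange": 2, "green": 3}  # skin color

def discretize_weight(grams: float) -> int:
    """Bin a continuous weight (in grams) into discrete levels."""
    if grams < 200:
        return 1   # light (e.g. apple, orange)
    elif grams < 2000:
        return 2   # medium
    return 3       # heavy (e.g. watermelon)

# Example: a round, sweet, orange-skinned fruit weighing 150 g
fruit = {
    "shape": SHAPE["round"],
    "taste": TASTE["sweet"],
    "color": COLOR["orange"],
    "weight": discretize_weight(150),
}
print(fruit)  # {'shape': 2, 'taste': 2, 'color': 2, 'weight': 1}
```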

Next, expand the sample: just three fruits are not enough training data for Naive Bayes classification.

Figure 21

How We Use Bayes’ Formula

Estimate posterior probability using prior probability and conditional probability.

Figure 22

Assume that the different attributes of a data object are independent of each other in how they affect its classification. If attributes f_i and f_j both appear in data object o, then the probability that o belongs to category c is:
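Written out in standard notation (the figure below presumably shows this same formula), with the naive independence assumption applied to the numerator:

```latex
P(c \mid f_i, f_j)
  = \frac{P(f_i, f_j \mid c)\, P(c)}{P(f_i, f_j)}
  = \frac{P(f_i \mid c)\, P(f_j \mid c)\, P(c)}{P(f_i, f_j)}
```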

The Naive Bayes algorithm assumes that features are independent of each other, which is why both sides can be equal. This is also the origin of the word “naive” in Naive Bayes classification.

Figure 23

Use data from 10 fruits to build a Naive Bayes model.

Figure 24
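The counts in the figure are not reproduced here, so the training data below is made up for illustration; the sketch only shows how the priors and conditional probabilities would be counted from a sample:

```python
from collections import Counter, defaultdict

# Hypothetical training samples (attribute -> discrete code), not the
# actual 10 fruits from the article's figure.
training = [
    ({"shape": 2, "taste": 2}, "apple"),
    ({"shape": 2, "taste": 1}, "apple"),
    ({"shape": 2, "taste": 2}, "sweet orange"),
    ({"shape": 1, "taste": 2}, "sweet orange"),
    ({"shape": 2, "taste": 2}, "watermelon"),
]

# Prior probability P(c): fraction of samples in each class.
class_counts = Counter(label for _, label in training)
priors = {c: n / len(training) for c, n in class_counts.items()}

# Conditional probability P(f=v | c): fraction of class-c samples with f=v.
cond_counts = defaultdict(Counter)  # (class, attribute) -> Counter of values
for attrs, label in training:
    for attr, value in attrs.items():
        cond_counts[(label, attr)][value] += 1

def conditional(attr, value, label):
    return cond_counts[(label, attr)][value] / class_counts[label]

print(priors)
print(conditional("taste", 2, "apple"))  # P(taste=2 | apple)
```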

Smoothing

For some classes, certain attribute values never appear in the training data, so the counted probability comes out as 0. We usually replace such a “zero probability” with a very small value, smaller than the minimum statistical probability in the dataset; here we take 0.01. This technique of filling in attribute values that never appeared in the training data is called smoothing.
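A minimal sketch of the idea, using the 0.01 floor mentioned above:

```python
EPS = 0.01  # small value that replaces a zero probability, as in the text

def smoothed(probability: float) -> float:
    """Replace a zero (never-observed) probability with a small constant."""
    return probability if probability > 0 else EPS

# Example: an attribute value never observed for a class
print(smoothed(0.0))   # 0.01
print(smoothed(0.25))  # 0.25
```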

Example:

Suppose we have a new fruit with a round shape and sweet taste. According to Naive Bayes, what are the probabilities that it belongs to apple, sweet orange, and watermelon respectively?

Figure 25

Here apple denotes classification as an apple, shape-2 denotes a shape attribute value of 2 (i.e., round), and taste-2 denotes a taste attribute value of 2 (i.e., sweet). Similarly, we can calculate the probabilities that this fruit is a sweet orange or a watermelon.
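The exact expression evaluated in the figure is not reproduced here, but by Bayes' formula the quantity computed for the apple class is proportional to:

```latex
% The denominator P(shape-2, taste-2) is the same for every class,
% so the classes can be compared by these products alone.
P(\text{apple} \mid \text{shape-2}, \text{taste-2})
  \propto P(\text{shape-2} \mid \text{apple})\,
          P(\text{taste-2} \mid \text{apple})\,
          P(\text{apple})
```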

Figure 26

Comparing these three values, 0.00198 < 0.00798 < 0.26934, so the computer can conclude that the fruit is most likely a sweet orange.

Naive Bayes Classification Mainly Includes These Steps

  • Prepare Data: For the fruit classification case, we collect several fruit instances and, starting from common fruit attributes, convert them into data the computer can understand. This data is also called the training samples.

  • Build Model: From the fruit instances at hand, the computer counts the prior probability of each fruit class and of each attribute value, as well as the conditional probability of a given attribute value appearing under a given fruit class. This process is also called training on the samples.

  • Classify New Data: For a new fruit's attribute data, the computer runs the calculation on the established model to obtain the probability that the fruit belongs to each class, which achieves the classification. This process is also called prediction; a compact end-to-end sketch follows this list.
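Putting the three steps together, a minimal sketch might look like the following. The training data and attribute codes are made up for illustration (not the article's 10 fruits), and the 0.01 smoothing floor from above is applied whenever a count is zero:

```python
from collections import Counter, defaultdict

EPS = 0.01  # smoothing floor for attribute values never seen in training

# 1. Prepare data: hypothetical training samples.
training = [
    ({"shape": 1, "taste": 1}, "apple"),
    ({"shape": 2, "taste": 2}, "apple"),
    ({"shape": 2, "taste": 2}, "sweet orange"),
    ({"shape": 2, "taste": 2}, "sweet orange"),
    ({"shape": 2, "taste": 2}, "watermelon"),
    ({"shape": 1, "taste": 1}, "watermelon"),
]

# 2. Build model: count priors P(c) and conditionals P(f=v | c).
class_counts = Counter(label for _, label in training)
priors = {c: n / len(training) for c, n in class_counts.items()}
cond = defaultdict(Counter)
for attrs, label in training:
    for attr, value in attrs.items():
        cond[(label, attr)][value] += 1

def p_attr_given_class(attr, value, label):
    p = cond[(label, attr)][value] / class_counts[label]
    return p if p > 0 else EPS  # smoothing

# 3. Classify new data: multiply the prior by the conditional of each attribute.
def classify(new_fruit):
    scores = {}
    for label in priors:
        score = priors[label]
        for attr, value in new_fruit.items():
            score *= p_attr_given_class(attr, value, label)
        scores[label] = score
    return max(scores, key=scores.get), scores

label, scores = classify({"shape": 2, "taste": 2})  # round and sweet
print(label, scores)
```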

Advantages and Disadvantages of Naive Bayes Classification

  • Advantages:
  1. Simple algorithm logic, easy to implement

  2. Small time and space overhead during classification

  • Disadvantages:

In theory, the Naive Bayes model has the smallest error rate compared with other classification methods. In practice, however, this is not always the case, because the model assumes that attributes are independent of each other, and this assumption often does not hold in real applications. When there are many attributes, or the attributes are strongly correlated, classification performance degrades.

Article Link:

https://alili.tech/en/archive/6iwpimvelxh/
