
Overview

Naive Bayes is a classification algorithm that is well suited to classifying text in high-dimensional data sets.

Some places where Naive Bayes can be used are sentiment analysis, classifying news articles, and spam filtering.

Naive Bayes is a probability-based algorithm: it predicts the likelihood of an event occurring out of all possible outcomes.

Bayesian probability allows for the calculation of conditional probabilities: the probability of a certain event can be calculated given partial knowledge from related events.
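In symbols, this is Bayes' theorem (the standard statement, added here for reference): P(A|B) = P(B|A) × P(A) / P(B), where P(A|B) is the probability of event A given that event B has occurred. For example, the probability that a message is spam given that it contains the word "free" can be computed from how often "free" appears in spam messages, how common spam is overall, and how common the word "free" is in general.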


Multinomial NB Algorithm

Step 1: Import the libraries needed

Step 2: Import the data set

Step 3: Data preprocessing (split the data into a 70% training set and a 30% test set)

Step 4: Feature scaling (transform each feature to a mean of 0 and a standard deviation of 1)

Step 5: Train the model

Step 6: Construct the confusion matrix (helps assess the quality of the model)

Step 7: Visualize the model
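Since the code for this page is written in R, here is a minimal sketch of the seven steps above. It uses the built-in iris data and the e1071 package as stand-ins (both are assumptions for illustration, not the original code), and note that e1071's naiveBayes() fits Gaussian distributions to numeric features, so this shows the workflow rather than the multinomial variant specifically:

# Step 1: import the libraries needed (e1071 provides naiveBayes())
library(e1071)

# Step 2: import the data set (iris ships with R; swap in your own read.csv() here)
data(iris)

# Step 3: data preprocessing: split into 70% training and 30% test
set.seed(42)
train_idx <- sample(seq_len(nrow(iris)), size = floor(0.7 * nrow(iris)))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Step 4: feature scaling: center each feature to mean 0 and sd 1,
# using the training-set statistics for both sets
num_cols <- 1:4
train_means <- sapply(train[, num_cols], mean)
train_sds   <- sapply(train[, num_cols], sd)
train[, num_cols] <- scale(train[, num_cols], center = train_means, scale = train_sds)
test[, num_cols]  <- scale(test[, num_cols],  center = train_means, scale = train_sds)

# Step 5: train the model
model <- naiveBayes(Species ~ ., data = train)

# Step 6: construct the confusion matrix on the test set
preds <- predict(model, test)
cm <- table(Predicted = preds, Actual = test$Species)
print(cm)
cat("Accuracy:", sum(diag(cm)) / sum(cm), "\n")

# Step 7: visualize the model via a heatmap of the confusion matrix
heatmap(as.matrix(cm), Rowv = NA, Colv = NA, scale = "none",
        xlab = "Actual", ylab = "Predicted")

Scaling with the training set's means and standard deviations (rather than letting each set scale itself) keeps information from the test set out of the training step.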


[Figure: the Naive Bayes formula (Bayes Theory)]

[Figure: a sample of Naive Bayes in action]

Standard Multinomial NB

  • Used for document classification problems

  • Features needed: word frequencies extracted from the document (sketched below)
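As a tiny illustration of that feature extraction (a hypothetical sketch in base R, not taken from the original write-up), documents can be converted into a matrix of word counts like so:

# three toy documents
docs <- c("free money now", "meeting at noon", "free free prize")
# the vocabulary is every distinct word across the documents
vocab <- sort(unique(unlist(strsplit(docs, " "))))
# document-term matrix: one row per document, one column per word,
# entries count how often each word occurs in each document
dtm <- t(sapply(strsplit(docs, " "),
                function(words) table(factor(words, levels = vocab))))
print(dtm)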

Bernoulli NB

  • Used for data that has binary or boolean attributes (True/False, Yes/No, etc.)

  • Word frequency is less important; only the presence or absence of a word is modeled

Bernoulli probabilities: P(X=1) = p and P(X=0) = 1 - p.
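A minimal sketch of a Bernoulli NB fit in R, assuming the naivebayes package (which provides bernoulli_naive_bayes()); the data here is a made-up binary term-presence matrix:

library(naivebayes)

# toy term-presence matrix: 1 = the word appears in the document, 0 = it does not
set.seed(1)
x <- matrix(rbinom(200, 1, 0.4), nrow = 20,
            dimnames = list(NULL, paste0("word", 1:10)))
y <- factor(sample(c("ham", "spam"), 20, replace = TRUE))

# laplace = 1 applies the smoothing discussed in the Laplace Smoothing section below
model <- bernoulli_naive_bayes(x, y, laplace = 1)
predict(model, newdata = x[1:5, ])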

Advantages of NB

  • A large dataset is not needed

  • Straightforward implementation

  • Converges quickly

  • Highly scalable

  • Can handle continuous and categorical values

  • Not sensitive to irrelevant data

  • Provides real-time predictions

Disadvantages of NB

  • Has trouble with the zero-frequency problem (i.e., it assigns 0 probability to categorical values that never appear in the training data)

  • Assumes all attributes are independent, which rarely holds in real life

  • Is a poor probability estimator, so the probabilities it outputs should be taken with a grain of salt

Laplace Smoothing

As mentioned in the disadvantages section of NB, zero probabilities are an issue. The way to combat this challenge is smoothing: we add a small pseudo-count alpha to every feature, ideally alpha = 1. The higher the value of alpha, the more the estimated probability of a word is pushed toward 0.5.
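Concretely, the standard add-alpha estimate for word counts looks like this:

P(word | class) = (count(word, class) + alpha) / (total words in class + alpha × vocabulary size)

With alpha = 1, a word that never appears in a class receives a small positive probability instead of 0; as alpha grows, every word's estimate is pulled toward the uniform value.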

The advantage of using Laplace smoothing is that it ensures there are no zero probabilities.

The disadvantage of this smoothing technique is that it changes the event probabilities: to raise a probability above 0, the probabilities of the other events must be decreased to satisfy the law of total probability.

Information Sourced From: Medium, Codingninjas, Turing, Bernoulli NB - Medium

Data Prep

The data set that will be used for the Naive Bayes analysis is the Harvard Migraine data set. Please use the button above to download it! Initial cleaning has already been completed; check out the EDA tab for more information. In summary, the cleaning standardized the column names and the values, for example, Visual loss entries of "Weak bilat" were changed to "Weak Bilat". There were several discrepancies in how the data was entered into the data set, so time was taken to smooth those out.

For the Naive Bayes analysis, the data is split into a training set and a test set. One key requirement of Naive Bayes is that these two sets be mutually exclusive: no record may appear in both, otherwise the accuracy estimate is inflated. Separately, the algorithm assumes that the features are conditionally independent given the class. A sketch of the split is below.
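A minimal sketch of that split in R (assuming the cleaned data lives in a data frame called migraine; the name and the seed are illustrative):

set.seed(123)
train_idx <- sample(seq_len(nrow(migraine)), size = floor(0.7 * nrow(migraine)))
train <- migraine[train_idx, ]
test  <- migraine[-train_idx, ]
# sanity check: the two sets are mutually exclusive
stopifnot(length(intersect(rownames(train), rownames(test))) == 0)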

[Figure: Training Set]

[Figure: Testing Set]

Code & Results

The code will be completed and explored in R.

For the initial Naive Bayes run, the model was trained and tested on the original 50 records in the data set.

[Figure: heatmap visualization]

How did the model perform? Based on the confusion matrices and accuracy statistics for both the training and test data sets, the model was about 60% accurate on each. Not great!

From this visualization, most of the records fell into the normal pulse-rate category. The next largest chunk of records fell into the weak bilateral pulse category.

It appears that having a normal pulse comes with a higher rate of visual loss than a weak bilateral pulse (weak on both sides of the brain), but having a weak bilateral pulse comes with a higher rate of ipsilateral (same-side) vision loss, which in this case means vision loss in both eyes.

[Figure: accuracy scores (Training)]

[Figure: accuracy scores (Test)]

[Figure: accuracy scores (resampled test set)]

Since the data set was so small, with only 50 records, the NB was retested by pulling 50,000 samples with replacement from the data set, which were then split 70% / 30% into training and test sets, as sketched below.

The results on the left are from the new test data set comprising 15,000 records. The model performed with about 59% accuracy, close to the accuracy of the initial training set: not great, but not as bad as the original test!
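A hypothetical sketch of that resampling step (again assuming the cleaned data is in a data frame called migraine):

set.seed(123)
# draw 50,000 rows with replacement from the 50-record set
boot <- migraine[sample(seq_len(nrow(migraine)), size = 50000, replace = TRUE), ]
# re-split 70% / 30%; the 30% slice is the 15,000-record test set described above
idx <- sample(seq_len(nrow(boot)), size = floor(0.7 * nrow(boot)))
train_big <- boot[idx, ]
test_big  <- boot[-idx, ]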

Conclusion

From the Naive Bayes classifying algorithm, the key takeaway is that anybody who experiences a migraine can experience some sort of vision loss, or none at all. Visually, the diagram implies that those who have a weak bilateral pulse (weak on both sides of the brain) experience vision loss in both eyes more frequently than those who have a normal pulse. There really seem to be only two categories, normal pulse or weak bilateral pulse; not many patients had a weak left or a weak right pulse.

Doing some further research, a study about arterial pressure and heart rate changes found a correlation with increased arterial pressure during cluster migraine attacks; arterial pressure was not a factor for standard migraines. It would be interesting to expand this study to look at heart rate over the brain and determine whether it has any impact, or whether any additional trends can be found based on diagnosis.
