
We’ll take a Keras network designed for continuous (linear) output, and convert it into a network for binary classification, which can divide data into two classes (for example: “dog” vs “cat”), and can be used for things like sentiment analysis ("positive" vs "negative").

Instructor: [00:00] We have a neural network defined, which takes in four numbers and returns a numerical value that represents the mean of those four numbers. Instead, what if we wanted to classify our data with the network?

[00:11] For example, we'll call numbers less than 50 low, and numbers greater than 50 high. In this example, the numbers are all less than 50, so the class of this set of numbers is low. The main structure of the network can remain exactly the same. We're still taking in four numbers, and the hidden layers are still fine. It's only the output that is different.
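As a sketch of that labeling rule (using the mean of the four numbers and the threshold of 50 as described in the example), it might look like:

```python
import numpy as np

def label(numbers):
    """Label a set of four numbers: 0 for "low", 1 for "high".

    Assumption from the example: a set counts as "low" when its
    mean is below 50, and "high" otherwise.
    """
    return 0 if np.mean(numbers) < 50 else 1

print(label([10, 20, 30, 40]))  # -> 0 ("low")
print(label([60, 70, 80, 90]))  # -> 1 ("high")
```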

[00:36] We have a linear activation function on the output, but that will give us values like 12 or -1.5. Instead, we want to pick a class, low or high. First, let's assign the number zero to the class low, and the number one to the class high.

[00:51] Then, instead of a linear activation, we should use sigmoid, which is a function that will return a single number between zero and one. The closer it is to zero, the more likely the input is to be low. The closer it is to one, the more likely the input is to be high.
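Sigmoid itself is just 1 / (1 + e^(-z)); a quick NumPy sketch (the sample inputs are made up) shows how it squashes any raw output into the zero-to-one range:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(-4.0))  # close to 0: the input is likely "low"
print(sigmoid(0.0))   # exactly 0.5: undecided
print(sigmoid(4.0))   # close to 1: the input is likely "high"
```

In Keras, this change only requires swapping the output layer's activation, e.g. `Dense(1, activation='sigmoid')` instead of a linear activation.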

[01:06] There are some changes to make in the compile step as well. First, change the loss from mean squared error to binary cross entropy, a loss function that optimizes for assigning each input to one of two classes.
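To see what binary cross entropy rewards and penalizes, it can be computed by hand for a single prediction (the formula is standard; the sample probabilities here are made up):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred):
    # Loss for one example: small when the predicted probability
    # agrees with the true class, large when it confidently disagrees.
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(binary_cross_entropy(1, 0.9))  # confident and correct: small loss
print(binary_cross_entropy(1, 0.1))  # confident and wrong: large loss
```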

[01:21] Then we want to add a new metric to our model called accuracy, which tells us the percentage of our training, validation, or test data points that were correctly classified. Now we can create our x_train and y_train NumPy arrays and fill in several examples.

[01:43] For the y_train values, instead of the mean of the four numbers, we'll use zero for low and one for high. The first three inputs are labeled zero, and the next three are labeled one. We can also define a validation set in the same way, and then train the model with the fit method, using the same parameters we would for a linear output.
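Putting the pieces together, a minimal end-to-end sketch might look like the following. The transcript doesn't show the original code, so the hidden-layer size, the optimizer, and the specific sample values are all assumptions; only the sigmoid output, the binary cross-entropy loss, and the accuracy metric come directly from the lesson:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Training data: four numbers per example (assumed sample values).
# Labels are 0 ("low") or 1 ("high").
x_train = np.array([
    [10, 20, 30, 40],
    [ 5, 15, 25, 35],
    [20, 30, 10, 40],
    [60, 70, 80, 90],
    [55, 65, 75, 85],
    [90, 80, 70, 60],
], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])

# A validation set defined the same way.
x_val = np.array([[15, 25, 35, 45], [65, 75, 85, 95]], dtype=float)
y_val = np.array([0, 1])

model = keras.Sequential([
    keras.Input(shape=(4,)),
    layers.Dense(8, activation='relu'),      # assumed hidden-layer size
    layers.Dense(1, activation='sigmoid'),   # sigmoid instead of linear output
])

model.compile(
    loss='binary_crossentropy',  # instead of mean_squared_error
    optimizer='adam',            # assumed optimizer
    metrics=['accuracy'],        # report fraction correctly classified
)

history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=100,
    verbose=0,
)
```

After fitting, `history.history` contains `loss`, `accuracy`, `val_loss`, and `val_accuracy` for each of the 100 epochs, and `model.predict` returns a probability between zero and one for each input.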

[02:11] When we run that, we see 100 epochs of training; the loss value is still reported, but now accuracy appears for both training and validation. In just 100 epochs, we can reach 100 percent accuracy on our small data set.

[02:30] This is admittedly a small data set, and a very contrived example, but it shows the potential power of a fully connected neural network for problems like two-class classification or sentiment analysis.