How do Convolutional Neural Networks work?

Today we are going to talk about convolutional neural networks (CNNs), an important part of deep learning.

Convolutional neural networks are similar to ordinary artificial neural networks: each neuron receives some inputs, performs a dot product, and optionally follows it with a non-linearity.

According to Wikipedia -

In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery.

How do Convolutional Neural Networks Work

In short, when a deep-learning model is trained, each input image passes through convolutional layers (which apply filters), pooling layers, and finally an activation function that expresses the output as class probabilities.

Convolutional neural networks are mainly designed to process and understand images. A CNN is a type of feed-forward artificial neural network used in image recognition and processing, and also in NLP.

First we will look at the architecture of a CNN, and then we will see how it works.

Convolutional Neural Networks Architecture

Input Layer:- The input layer is the first layer in a CNN. It holds the raw pixel values of the image as a 3D matrix with height, width, and depth. The depth is 3 if the image is in RGB format and 1 if it is grayscale.
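To make the height, width, and depth idea concrete, here is a minimal sketch in plain Python. The pixel values, the helper names `image_shape` and `to_grayscale`, and the simple channel average used for grayscale are all assumptions made for illustration; real libraries usually use a weighted conversion.

```python
# A tiny 2x2 RGB image as nested lists: image[row][col] = [R, G, B].
# The pixel values are made up for illustration.
rgb_image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]

def image_shape(image):
    """Return (height, width, depth) of a nested-list image."""
    height = len(image)
    width = len(image[0])
    depth = len(image[0][0])
    return (height, width, depth)

def to_grayscale(image):
    """Average the channels of each pixel, giving a depth-1 image.

    Assumption: a plain mean of R, G, and B, chosen for simplicity.
    """
    return [[[sum(pixel) // len(pixel)] for pixel in row] for row in image]

print(image_shape(rgb_image))                # (2, 2, 3)
print(image_shape(to_grayscale(rgb_image)))  # (2, 2, 1)
```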

Convolutional Layer:- A convolutional layer applies filters, also known as kernels. A filter is typically a small matrix such as 3*3, smaller than the input image. Its main job is to extract features: it slides across the image and, at each position, takes the dot product between its weights and the patch of the image beneath it. The filter has the same depth as its input (the number of channels). Common filter sizes are 3, 5, or 7. A smaller filter picks up smaller, more local features, while a larger filter picks up larger features; the right choice depends on the use case.
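The sliding dot product can be sketched in a few lines of plain Python. This is a simplified illustration, not how a library implements it: it assumes a single-channel image, no padding, and a stride of 1, and the function name `convolve2d`, the 4*4 image, and the hand-written edge kernel are all made up for the example. In a real CNN the kernel values are learned during training.

```python
def convolve2d(image, kernel):
    """Slide a square `kernel` over a 2D `image` (no padding, stride 1).

    At each position, multiply the overlapping values element-wise and
    sum them. Strictly speaking this is cross-correlation, which is what
    deep-learning libraries compute under the name "convolution".
    """
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(k):
                for dj in range(k):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# Made-up 4x4 grayscale patch with a vertical edge between columns 1 and 2
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hand-written kernel that responds to dark-to-bright vertical edges
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = convolve2d(image, kernel)
print(feature_map)  # [[27, 27], [27, 27]]
```

Note that the 4*4 input shrinks to a 2*2 feature map, because a 3*3 kernel only fits in two positions along each axis.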

Pooling Layer:- The next layer in a CNN is the pooling layer, also known as downsampling or subsampling. Pooling layers simplify the information collected by the convolutional layer by shrinking the feature maps. This reduces the computation, and the number of parameters in later layers, which would otherwise cost a lot of time and resources on complex images and can contribute to overfitting. Pooling layers also make the model more robust to slight changes in the input image.

Max pooling is as simple as it sounds: it takes the largest value from each patch of the activation map, places it in a new, smaller matrix alongside the max values from the other patches, and discards the rest of the information. Average pooling can be used instead, where each patch is replaced by the average of its values rather than the maximum. Average pooling is used less often than max pooling: averaging tends to dilute strong activations, whereas max pooling keeps the strongest response in each patch.
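Both kinds of pooling can be sketched with one small function. This is an illustrative sketch only: the name `pool2d`, its parameters, and the 4*4 feature map are assumptions, and it handles a single channel with no padding.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Apply max or average pooling to a 2D feature map (no padding)."""
    out = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            # Collect the values inside the current size x size patch
            patch = [feature_map[i + di][j + dj]
                     for di in range(size) for dj in range(size)]
            if mode == "max":
                row.append(max(patch))
            else:
                row.append(sum(patch) / len(patch))
        out.append(row)
    return out

# Made-up 4x4 feature map
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 4, 3, 8],
]
print(pool2d(feature_map, mode="max"))      # [[6, 4], [7, 9]]
print(pool2d(feature_map, mode="average"))  # [[3.75, 2.25], [3.5, 5.0]]
```

With a 2*2 window and a stride of 2, each output value summarizes one non-overlapping patch, so the 4*4 map becomes 2*2.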

[Figure: max pooling]

Activation Function:- An activation function makes the neural network non-linear. ReLU is the most commonly used activation function in the hidden layers of CNN models. In the output layer of a classifier, a function such as softmax (or sigmoid for binary problems) limits each output to between 0 and 1, so the outputs can be read as a probability distribution over the classes. We will discuss activation functions in detail in the next post, How to use activation functions in CNN.
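As a rough illustration, here are two widely used activation functions written in plain Python: ReLU, typically applied inside the network, and softmax, typically applied to the final scores of a classifier. The input scores are made up, and the function names are just for this sketch.

```python
import math

def relu(x):
    """Pass positive values through unchanged and clip negatives to zero."""
    return max(0.0, x)

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    # Subtracting the largest score first keeps exp() numerically stable
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print([relu(v) for v in [-2.0, 0.5, 3.0]])  # [0.0, 0.5, 3.0]

probs = softmax([2.0, 1.0, 0.1])  # made-up scores for three classes
print(probs)       # each value lies between 0 and 1
print(sum(probs))  # approximately 1.0
```

The class with the highest raw score also gets the highest probability, which is how the network's output is read as a prediction.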

[Figure: average pooling]

Use of Convolutional Neural Networks

CNNs are used in many applications, such as image classification, image recognition, image segmentation, object detection, face recognition, natural language processing, text classification, and text recognition.

Suppose we have thousands of images of people participating in a marathon and we want to send each person their own photos. How can we do that? Matching the images to people one by one by hand would take a lot of time. Instead, we can use a CNN to recognize the person in each image and send the related images to that person.

As another example, suppose we have a lot of images of cats and dogs and we want to classify them. With the help of a CNN, we can.

So, CNN is used in many applications.

Convolutional Neural Networks Implementation

Here is the code for the CNN model in TensorFlow.

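from tensorflow.keras import layers, models

# Convolutional base: three Conv2D layers with max pooling in between
model = models.Sequential()
# 32 filters of size 3x3, applied to 32x32 RGB input images
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
# 2x2 max pooling halves the height and width of the feature maps
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))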

The first layer in the model is Conv2D, a convolutional layer that extracts features from the input image. The first argument is the number of filters in the layer, and the second is the size of each filter. The activation argument sets the activation function, and input_shape gives the shape of the images we are going to feed into the model. Here the input shape is 32*32*3: 32 pixels high, 32 pixels wide, and a depth of 3 because the images are in RGB format. For grayscale images the depth would be 1.
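The number of filters and the filter size together determine how many trainable parameters a convolutional layer has. As a small worked example, the sketch below counts them for the three Conv2D layers of the model above, assuming each filter spans the full input depth and has one bias term. The helper name `conv2d_params` is made up for this illustration.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter for a standard 2D convolution layer."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

# Arguments: kernel height, kernel width, input depth, number of filters
print(conv2d_params(3, 3, 3, 32))   # 896
print(conv2d_params(3, 3, 32, 64))  # 18496
print(conv2d_params(3, 3, 64, 64))  # 36928
```

The input depth of the second and third layers is the number of filters in the layer before them, since pooling does not change the depth.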

The second layer in the model is MaxPooling2D, a pooling layer that reduces the size of the feature maps. The first argument is the pool size. The second, optional argument is the stride, the number of pixels the pooling window moves each step. If no stride is given, it defaults to the pool size, so here it is 2.
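To see how the filter size and stride shrink the image, the sketch below traces the height (or width) of the data through the five layers of the model above. It assumes no padding, a stride of 1 for the convolutions, and a stride of 2 for the 2*2 pooling layers. The helper name `conv_output_size` is made up for this illustration.

```python
def conv_output_size(n, kernel, stride=1):
    """Output length along one axis for an unpadded convolution or pooling."""
    return (n - kernel) // stride + 1

size = 32        # input images are 32 pixels high and wide
trace = [size]
# (kernel size, stride) for Conv2D, MaxPooling2D, Conv2D, MaxPooling2D, Conv2D
for kernel, stride in [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]:
    size = conv_output_size(size, kernel, stride)
    trace.append(size)

print(trace)  # [32, 30, 15, 13, 6, 4]
```

Each 3*3 convolution trims 2 pixels from each axis, and each pooling layer roughly halves it, so the final feature maps are 4*4.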

Thanks for reading this post. I hope you liked it. If you have any questions, feel free to comment below.