Convolutional Neural Networks (CNNs) in Computer Vision

Chanchala Gorale
4 min read · Jun 8, 2024


Training image data using convolutional neural networks (CNNs) in computer vision involves several key steps. Here’s an overview of the process:

1. Data Preparation

Collecting Data:

  • Gather a large and diverse set of labeled images. The labels correspond to the category or class each image belongs to (e.g., “cat”, “dog”, “car”).

Preprocessing Data:

  • Resizing: Resize all images to the same dimensions (e.g., 224x224 pixels) so the network receives a fixed input shape.
  • Normalization: Scale pixel values (e.g., from 0–255 to 0–1) to stabilize and speed up training.
  • Augmentation: Apply random transformations such as rotation, flipping, and cropping to increase the diversity of the training set and reduce overfitting; a minimal sketch follows below.
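
Here is a minimal preprocessing and augmentation sketch, assuming TensorFlow 2.x Keras preprocessing layers; the 224x224 target size and the augmentation parameters are illustrative choices, not requirements:

import tensorflow as tf
from tensorflow.keras import layers

# Illustrative pipeline: resize and rescale every image, then augment.
preprocess = tf.keras.Sequential([
    layers.Resizing(224, 224),    # force a common input size
    layers.Rescaling(1.0 / 255),  # scale pixel values from 0-255 to 0-1
])

augment = tf.keras.Sequential([
    layers.RandomFlip('horizontal'),  # mirror images left-right at random
    layers.RandomRotation(0.1),       # rotate by up to 10% of a full circle
])

images = tf.random.uniform((8, 256, 256, 3), maxval=255)  # stand-in batch
batch = augment(preprocess(images), training=True)  # augment only during training
print(batch.shape)  # (8, 224, 224, 3)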

2. Model Architecture

Convolutional Layers:

  • Filters/Kernels: Small matrices (e.g., 3x3, 5x5) that slide over the input and perform convolution operations to extract features such as edges, textures, and shapes.
  • Stride: The step size with which the filter moves across the image; larger strides shrink the spatial dimensions of the output feature map.
  • Padding: Adding zeros around the border of the input to control the spatial size of the output feature map. The sketch below shows how stride and padding affect output shapes.
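
A quick shape check, assuming TensorFlow 2.x; the filter count, stride, and input size are arbitrary examples:

import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 224, 224, 3))  # one 224x224 RGB image

# 'same' padding zero-pads the border, so stride 2 simply halves each dimension.
conv = layers.Conv2D(filters=16, kernel_size=3, strides=2, padding='same')
print(conv(x).shape)  # (1, 112, 112, 16)

# 'valid' padding drops border positions: 224 - 3 + 1 = 222 with stride 1.
conv_valid = layers.Conv2D(filters=16, kernel_size=3, strides=1, padding='valid')
print(conv_valid(x).shape)  # (1, 222, 222, 16)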

Pooling Layers:

  • Max Pooling: Reduces the spatial dimensions of the feature maps by taking the maximum value in a window (e.g., 2x2), which also helps achieve a degree of translation invariance.
  • Average Pooling: Takes the average of values in a window; less common in practice than max pooling. The sketch below shows the shape reduction.
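
Continuing with toy tensors, a 2x2 max pool halves each spatial dimension while leaving the channel count unchanged:

import tensorflow as tf
from tensorflow.keras import layers

feature_maps = tf.random.normal((1, 112, 112, 16))  # toy feature maps

# A 2x2 max pool halves height and width and keeps the channel count.
pool = layers.MaxPooling2D(pool_size=(2, 2))
print(pool(feature_maps).shape)  # (1, 56, 56, 16)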

Fully Connected Layers:

  • Flatten the final set of feature maps into a single vector and pass it through fully connected layers to make predictions.
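
A sketch of a typical classification head; the 64-unit hidden layer and 10 output classes are chosen only to mirror the CIFAR-10 example later in this post:

import tensorflow as tf
from tensorflow.keras import layers

head = tf.keras.Sequential([
    layers.Flatten(),                        # e.g. (batch, 4, 4, 64) -> (batch, 1024)
    layers.Dense(64, activation='relu'),     # learned combination of all features
    layers.Dense(10, activation='softmax'),  # one probability per class
])
print(head(tf.random.normal((2, 4, 4, 64))).shape)  # (2, 10)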

Activation Functions:

  • ReLU (Rectified Linear Unit): Applies the non-linear transformation f(x) = max(0, x), introducing non-linearity and allowing the network to learn complex patterns.
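
ReLU is simple enough to demonstrate directly; this tiny NumPy sketch shows negative inputs being clipped to zero:

import numpy as np

def relu(x):
    return np.maximum(0, x)  # element-wise max(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]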

3. Training Process

Forward Propagation:

  • Pass the input image through the network, performing convolutions, activations, and pooling operations to obtain predictions.

Loss Function:

  • Cross-Entropy Loss: Commonly used for classification tasks to measure the difference between the predicted probabilities and the true labels; a small worked example follows.
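
A small worked example using TensorFlow's built-in loss class; with the true class at index 1, the loss reduces to -log(0.8):

import tensorflow as tf

loss_fn = tf.keras.losses.CategoricalCrossentropy()
y_true = tf.constant([[0.0, 1.0, 0.0]])  # true class is index 1 (one-hot)
y_pred = tf.constant([[0.1, 0.8, 0.1]])  # predicted probabilities
print(loss_fn(y_true, y_pred).numpy())   # -log(0.8) ≈ 0.223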

Backward Propagation:

  • Calculate the gradient of the loss function with respect to each weight in the network using the chain rule.
  • Update the weights using optimization algorithms such as stochastic gradient descent (SGD) or Adam; one complete training step is sketched below.
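
The following is a minimal sketch of one training step using TensorFlow's GradientTape; the tiny model and random batch are hypothetical stand-ins, not a recommended architecture:

import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical stand-ins: a tiny model and one random batch of 16 images.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(8, 3, activation='relu'),
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])
loss_fn = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

images = tf.random.normal((16, 32, 32, 3))
labels = tf.one_hot(tf.random.uniform((16,), maxval=10, dtype=tf.int32), depth=10)

with tf.GradientTape() as tape:
    predictions = model(images, training=True)  # forward propagation
    loss = loss_fn(labels, predictions)         # cross-entropy loss

grads = tape.gradient(loss, model.trainable_variables)            # backward propagation
optimizer.apply_gradients(zip(grads, model.trainable_variables))  # weight update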

4. Optimization

Learning Rate:

  • A hyperparameter that controls the size of the steps taken during optimization. It needs careful tuning: too high a rate can cause training to diverge, too low a rate makes training slow.

Regularization:

  • Techniques like dropout (randomly setting a fraction of activations to zero during training) and weight decay (L2 regularization) help prevent overfitting.

Batch Size:

  • The number of training samples processed in one forward/backward pass. It balances memory usage against convergence speed. The sketch below ties the learning rate, regularization, and batch size together.
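
A sketch wiring all three knobs together in Keras; the values used (1e-3 learning rate, 0.5 dropout, 1e-4 L2 coefficient, batch size 32) are illustrative defaults, not recommendations:

import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    layers.Flatten(),
    layers.Dense(64, activation='relu',
                 kernel_regularizer=regularizers.l2(1e-4)),  # weight decay (L2)
    layers.Dropout(0.5),  # randomly zero half the activations during training
    layers.Dense(10, activation='softmax'),
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # step size
              loss='categorical_crossentropy',
              metrics=['accuracy'])

x = tf.random.normal((256, 32, 32, 3))  # stand-in data
y = tf.one_hot(tf.random.uniform((256,), maxval=10, dtype=tf.int32), 10)
model.fit(x, y, batch_size=32, epochs=1)  # 256 / 32 = 8 weight updates per epoch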

5. Evaluation and Testing

Validation Set:

  • A separate set of images not seen by the model during training, used to tune hyperparameters and evaluate model performance.

Testing Set:

  • A final set of images used to assess the model’s generalization ability after training is complete. A simple three-way split is sketched below.
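
One way to carve out the three sets, sketched with plain NumPy on stand-in data; the 80/10/10 proportions are a common convention, not a rule:

import numpy as np

rng = np.random.default_rng(0)
images = rng.random((1000, 32, 32, 3))  # stand-in dataset
labels = rng.integers(0, 10, size=1000)

# Shuffle once, then carve out 10% for testing and 10% for validation.
idx = rng.permutation(len(images))
test_idx, val_idx, train_idx = idx[:100], idx[100:200], idx[200:]

x_train, y_train = images[train_idx], labels[train_idx]
x_val, y_val = images[val_idx], labels[val_idx]
x_test, y_test = images[test_idx], labels[test_idx]
print(len(x_train), len(x_val), len(x_test))  # 800 100 100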

6. Fine-Tuning and Transfer Learning

Pretrained Models:

  • Use models pretrained on large datasets like ImageNet as a starting point and fine-tune them on your specific dataset, which can significantly improve performance and reduce training time.
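
A minimal transfer-learning sketch, assuming the MobileNetV2 backbone from tf.keras.applications with ImageNet weights; any pretrained backbone and head size would work the same way, and the 10-class head here is hypothetical:

import tensorflow as tf
from tensorflow.keras import layers

# Load a backbone pretrained on ImageNet, without its classification head.
base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                         include_top=False, weights='imagenet')
base.trainable = False  # freeze pretrained weights; unfreeze later to fine-tune

model = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation='softmax'),  # new head for your own classes
])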

Example Workflow

  1. Data Loading: Load and preprocess the images.
  2. Model Definition: Define the CNN architecture (e.g., layers, activation functions).
  3. Compilation: Compile the model with a loss function, optimizer, and evaluation metrics.
  4. Training: Train the model on the training set while monitoring performance on the validation set.
  5. Evaluation: Evaluate the model on the test set to determine its final performance.

Example of a CNN Image Training Model Using Keras

Here’s a simple example using the Keras library in Python to create and train a CNN on the CIFAR-10 dataset, which consists of 60,000 32x32 color images in 10 classes.

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

# Load and preprocess the CIFAR-10 dataset
(train_images, train_labels), (test_images, test_labels) = cifar10.load_data()
train_images, test_images = train_images / 255.0, test_images / 255.0 # Normalize pixel values

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

# Define the CNN model
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))

# Evaluate the model
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print(f'Test accuracy: {test_acc}')

# Save the model
model.save('cnn_cifar10_model.h5')

Explanation of the Code

Data Loading and Preprocessing:

  • Load the CIFAR-10 dataset and normalize the pixel values to be between 0 and 1.
  • Convert the labels to categorical format using one-hot encoding.

Model Definition:

  • The model stacks three convolutional layers with increasing filter counts (32, 64, 64), each using ReLU activation; the first two are followed by max pooling layers.
  • The output of the last convolutional layer is flattened and passed through a fully connected (dense) layer with 64 units and ReLU activation.
  • The final layer is a dense layer with 10 units (one per class) and a softmax activation function that outputs the class probabilities.

Model Compilation:

  • Compile the model using the Adam optimizer, categorical cross-entropy loss, and accuracy as the evaluation metric.

Model Training:

  • Train the model for 10 epochs, monitoring performance after each epoch on the test images passed as validation data. (In a stricter workflow, a separate validation split would be held out of the training data instead.)

Model Evaluation:

  • Evaluate the trained model on the test set to determine its accuracy.

Model Saving:

  • Save the trained model to a file for future use.

By following these steps, a CNN can be effectively trained to recognize patterns and objects within images, leading to robust computer vision applications.
