Project: Build a Traffic Sign Recognition Program

Udacity - Self-Driving Car NanoDegree

Overview

This project is about using deep learning, specifically convolutional neural networks, to classify images of German traffic signs. This is an important problem: being able to classify a traffic sign image by type allows a self-driving car to make important driving decisions.

I wrote a less technical write-up of this project on Medium here.

Exploring the Dataset

In Step 1, I explore the dataset and identify some relevant information:

  • There are 34,799 training images and 12,630 testing images
  • The images are indeed 32x32 pixels with RGB color channels
  • There are 43 different classes of traffic signs labeled 0-42
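For reference, here is a minimal sketch of how those figures can be computed. It assumes the dataset comes as the pickled `train.p`, `valid.p`, and `test.p` files with `features`/`labels` keys used in the Udacity project; the file paths and dictionary keys are assumptions, so adjust them to your copy of the data.

```python
import pickle
import numpy as np

def load_split(path):
    # Each pickle file is assumed to hold a dict with "features" and "labels".
    with open(path, "rb") as f:
        data = pickle.load(f)
    return data["features"], data["labels"]

X_train, y_train = load_split("train.p")
X_valid, y_valid = load_split("valid.p")
X_test, y_test = load_split("test.p")

print("Training images:  ", X_train.shape[0])        # 34,799
print("Testing images:   ", X_test.shape[0])         # 12,630
print("Image shape:      ", X_train.shape[1:])       # (32, 32, 3) -> 32x32 RGB
print("Number of classes:", len(np.unique(y_train))) # 43, labeled 0-42
```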

Pre-Processing the Data

I did very minimal pre-processing. With about 35,000 training images, I didn’t feel like augmenting the data was necessary to get good results, so I used only the raw images from the dataset.

I kept the images in RGB instead of converting to grayscale, because I felt the color information was not only valuable but important. One major way to identify signs is by the colors they include, and these colors are used consistently and are easily distinguishable. I figured an appropriate model should pick up on these traits.

The two pre-processing steps I performed are:

  1. Shuffling
  2. Normalizing

I pre-shuffled the data to make sure it was well mixed, with no groupings of sign types to skew the development of the model. Then I normalized the data to [0, 1] by dividing the numpy array by 255.

Conveniently, the data came pre-divided into training, validation, and test sets. All I had to do was pre-process each of these sets accordingly.
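A minimal sketch of those two steps, applied to the splits loaded above. It uses `sklearn.utils.shuffle` for the shuffling, which is an assumption about tooling rather than a record of the original code:

```python
import numpy as np
from sklearn.utils import shuffle

def preprocess(X, y):
    # Shuffle so no grouping of sign types influences the training order.
    X, y = shuffle(X, y, random_state=0)
    # Normalize pixel values from [0, 255] to [0, 1].
    X = X.astype(np.float32) / 255.0
    return X, y

X_train, y_train = preprocess(X_train, y_train)
X_valid, y_valid = preprocess(X_valid, y_valid)
X_test, y_test = preprocess(X_test, y_test)
```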

The Model

LeNet architecture

My model can best be described as the standard LeNet architecture (pictured above) with a few tweaks. The two major tweaks I made are:

  1. Account for color images instead of just grayscale
  2. Add a third convolutional layer

The table below shows each layer in the model. I used convolutions, activations, maxpooling, and linear combinations to classify the 32x32 color images into one of 43 traffic sign categories.

All activations used are ReLUs.

| Layer | Description |
| --- | --- |
| Input | Pass in the RGB images of shape 32x32x3. |
| Convolution with Activation | Transform the space from 32x32x3 to 28x28x18 using a 5x5 filter. No maxpooling is used here to preserve space for future convolutions. |
| Convolution with Activation | Transform the space from 28x28x18 to 24x24x48 using a 5x5 filter. |
| Maxpool | Condense the space to 12x12x48. |
| Convolution with Activation | Perform a final convolution turning the 12x12x48 space into an 8x8x96 space using a 4x4 filter. (A 4x4 filter was used instead of 5x5 to avoid shrinking the space too far.) |
| Maxpool | Condense the space to 4x4x96, bringing the total volume from 6,144 down to 1,536. |
| Flatten | Flatten the space down to one dimension of size 1,536 in preparation for the fully-connected layers. |
| Fully-Connected with Activation | Calculate linear combinations of the features, bringing the space down to size 360. |
| Fully-Connected with Activation | Calculate linear combinations of the features, bringing the space down to size 252. |
| Output | Calculate linear combinations of the 252 features to produce estimates for each of the 43 classes of traffic signs. |
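Below is a sketch of an architecture matching this table, written with tf.keras rather than the project's original TensorFlow code. One caveat: a 4x4 "valid" convolution on a 12x12 input gives 9x9, not 8x8, so the sketch uses a 5x5 filter for the third convolution to reproduce the stated 8x8x96 volume; treat the exact filter size there as an assumption.

```python
import tensorflow as tf

def build_model(num_classes=43):
    # LeNet-style model with three convolutional layers, as described above.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(32, 32, 3)),                 # 32x32x3 RGB input
        tf.keras.layers.Conv2D(18, 5, activation="relu"),  # -> 28x28x18
        tf.keras.layers.Conv2D(48, 5, activation="relu"),  # -> 24x24x48
        tf.keras.layers.MaxPooling2D(2),                   # -> 12x12x48
        tf.keras.layers.Conv2D(96, 5, activation="relu"),  # -> 8x8x96 (see caveat above)
        tf.keras.layers.MaxPooling2D(2),                   # -> 4x4x96
        tf.keras.layers.Flatten(),                         # -> 1,536 features
        tf.keras.layers.Dense(360, activation="relu"),
        tf.keras.layers.Dense(252, activation="relu"),
        tf.keras.layers.Dense(num_classes),                # logits for 43 classes
    ])

model = build_model()
model.summary()
```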

Why this Model?

While there is nothing strange or revolutionary about my model, it's worth explaining how I arrived at this architecture.

I started with the standard LeNet model and first adjusted it to accept three color channels. This produced results similar to the grayscale images. I expected the additional color data to improve the model, but it didn't.

To give the model room for the additional color channels, I roughly multiplied the output depth of each layer by three. I figured that with more data to digest, the model needed more space to capture the additional relationships. Expanding the model this way increased the validation accuracy from 88% to 93% - quite an improvement.

Finally, I reasoned that there is probably more abstraction to capture in a traffic sign than in a handwritten digit. To accommodate this, I added a third convolutional layer and adjusted the rest of the model accordingly. Adding the third convolutional layer further improved the validation accuracy from 93% to 97%.

Training the Model

After designing the model, I built the training pipeline and executed it with the model. I one-hot encoded the labels, used softmax cross entropy to determine loss, and used the Adam optimizer.

The learning rate was set to 0.001, the batch size to 128, and the number of epochs to 10.
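A hedged sketch of an equivalent training setup in tf.keras (not the original pipeline, which used plain TensorFlow): one-hot labels, softmax cross-entropy on the logits, the Adam optimizer at 0.001, batch size 128, and 10 epochs.

```python
import tensorflow as tf

# One-hot encode the integer labels for the 43 classes.
y_train_oh = tf.keras.utils.to_categorical(y_train, num_classes=43)
y_valid_oh = tf.keras.utils.to_categorical(y_valid, num_classes=43)

# Softmax cross-entropy loss computed from the logits, optimized with Adam.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

model.fit(
    X_train, y_train_oh,
    validation_data=(X_valid, y_valid_oh),
    batch_size=128,
    epochs=10,
)
```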

The final results were a validation accuracy of 97% and a test accuracy of 95%. While those are close, I was a tad disappointed the test accuracy was lower. This probably indicates a bit of overfitting in the model, but I'm not certain how strong an indicator a 2% difference is.

Testing the Network on My Own Images

As a helpful exercise I downloaded five different pictures of German traffic signs from the web to see how my model would classify them. The five images, after pre-processing, are shown below.

Images (left to right): 30 speed limit, 100 speed limit, Yield, Road work, Go straight or right.

The results of the model’s predictions for these five images (in order) are summarized in the table below.

| Actual Sign | Predicted Sign | Softmax Probability |
| --- | --- | --- |
| 30 speed limit | 30 speed limit | 77% |
| 100 speed limit | 100 speed limit | 100% |
| Yield | Yield | 100% |
| Road work | Road work | 100% |
| Go straight or right | Keep right | 68% |

As seen from the table, the model correctly classified four of the five signs (80%). It's comforting to me that the model mistook the "go straight or right" sign for a "keep right" sign, since those are at least close in nature.
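For completeness, here is a sketch of how new images can be run through the model to produce softmax probabilities like those in the table. The file names are hypothetical, and the images are resized to 32x32 and scaled to [0, 1] to match the training pre-processing.

```python
import numpy as np
import tensorflow as tf
from PIL import Image

# Hypothetical file names for the five downloaded signs.
paths = ["speed_30.jpg", "speed_100.jpg", "yield.jpg",
         "road_work.jpg", "straight_or_right.jpg"]

# Resize to 32x32 RGB and normalize to [0, 1], matching the training data.
images = np.stack([
    np.asarray(Image.open(p).convert("RGB").resize((32, 32)), dtype=np.float32) / 255.0
    for p in paths
])

logits = model.predict(images)
probs = tf.nn.softmax(logits, axis=1).numpy()

for path, prob in zip(paths, probs):
    top = prob.argmax()
    print(f"{path}: predicted class {top} with probability {prob[top]:.0%}")
```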
