This is an experimental TensorFlow implementation of Faster R-CNN (TFFRCNN), mainly based on the work of smallcorgi and rbgirshick. I have reorganized the libraries under the lib
path so that each Python module is independent of the others, which makes the code easy to understand and rewrite.
For details about Faster R-CNN, please refer to the paper "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" by Shaoqing Ren, Kaiming He, Ross Girshick and Jian Sun.
- ResNet network support
- KITTI object detection dataset support
- Position-Sensitive ROI Pooling (psroi_pooling), not yet tested
- Hard example mining
- Data augmentation
- PVANet
- TensorFlow 1.0
- Multi-layer architecture (HyperNet)
- More hacks...
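The hard example mining item refers to concentrating training on the region proposals the network currently handles worst. As a rough illustration only, the sketch below keeps the RoIs with the highest per-RoI loss. It is a simplified form of online hard example mining (the published OHEM method also applies non-maximum suppression to the RoIs before ranking them), and `select_hard_examples` is a hypothetical name, not a function from this repository.

```python
def select_hard_examples(roi_losses, batch_size):
    """Return indices of the `batch_size` RoIs with the largest loss.

    Simplified hard example mining sketch: RoIs are ranked by loss only;
    no NMS is applied before ranking.
    """
    k = min(batch_size, len(roi_losses))
    # Sort RoI indices by their loss, highest first, and keep the top k
    ranked = sorted(range(len(roi_losses)), key=lambda i: roi_losses[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    losses = [0.1, 2.5, 0.7, 1.9, 0.05]
    print(select_hard_examples(losses, 2))  # [1, 3]
```

In a full training loop, only the selected RoIs would contribute to the classification and regression losses in the backward pass.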
Requirements for TensorFlow (see: TensorFlow)
Python packages you might not have:
(installing Anaconda is recommended)
- For training the end-to-end version of Faster R-CNN with VGG16, 3 GB of GPU memory is sufficient (using cuDNN).
- Clone the Faster R-CNN repository
git clone
- Build the Cython modules
cd TFFRCNN/lib
make # compile Cython and roi_pooling_op; you may need to modify it for your platform
After successfully completing the basic installation, you'll be ready to run the demo.
To run the demo
python ./faster_rcnn/ --model model_path
The demo performs detection using a VGG16 network trained for detection on PASCAL VOC 2007.
Download the training, validation and test data and the VOCdevkit
wget wget wget
Extract all of these tar files into one directory named
tar xvf VOCtrainval_06-Nov-2007.tar
tar xvf VOCtest_06-Nov-2007.tar
tar xvf VOCdevkit_08-Jun-2007.tar
It should have this basic structure:
$VOCdevkit/                           # development kit
$VOCdevkit/VOCcode/                   # VOC utility code
$VOCdevkit/VOC2007                    # image sets, annotations, etc.
# ... and several other directories ...
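Before creating the symlink, it can help to confirm that extraction produced the expected layout. The sketch below checks for the directories listed above; the three VOC2007 sub-folders it also checks (Annotations, ImageSets/Main, JPEGImages) are an assumption based on the standard PASCAL VOC 2007 release, and `missing_voc_dirs` is a hypothetical helper.

```python
import os
import sys

# "VOCcode" and "VOC2007" come from the layout above; the VOC2007
# sub-folders are assumed from the standard PASCAL VOC 2007 release.
EXPECTED_DIRS = [
    "VOCcode",
    "VOC2007",
    os.path.join("VOC2007", "Annotations"),
    os.path.join("VOC2007", "ImageSets", "Main"),
    os.path.join("VOC2007", "JPEGImages"),
]


def missing_voc_dirs(devkit_root):
    """Return the expected sub-directories that do not exist under devkit_root."""
    return [d for d in EXPECTED_DIRS
            if not os.path.isdir(os.path.join(devkit_root, d))]


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "VOCdevkit"
    missing = missing_voc_dirs(root)
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("VOCdevkit layout looks complete")
```

Run it with the devkit path as the only argument, for example `python check_voc.py $VOCdevkit` (the script file name is arbitrary).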
Create symlinks for the PASCAL VOC dataset
cd $TFFRCNN/data
ln -s $VOCdevkit VOCdevkit2007
Download the pre-trained VGG16 model and put it in the path
Run the training script:
cd $TFFRCNN
python ./faster_rcnn/ \
    --gpu 0 \
    --weights ./data/pretrain_model/VGG_imagenet.npy \
    --imdb voc_2007_trainval \
    --iters 70000 \
    --cfg ./experiments/cfgs/faster_rcnn_end2end.yml \
    --network VGGnet_train \
    --set EXP_DIR exp_dir
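The trailing `--set EXP_DIR exp_dir` arguments in the training command presumably override individual configuration keys after the YAML file given by `--cfg` is loaded. The sketch below shows one way such a command line could be parsed and applied, assuming a nested-dict config and KEY VALUE pairs where dotted keys address nested entries. `build_parser` and `apply_overrides` are hypothetical names, and the repository's training script may handle these flags differently.

```python
import argparse
from ast import literal_eval


def build_parser():
    # Hypothetical parser mirroring the flags used in the command above
    p = argparse.ArgumentParser(description="Train a Faster R-CNN network (sketch)")
    p.add_argument("--gpu", type=int, default=0)
    p.add_argument("--weights", default=None)
    p.add_argument("--imdb", default=None)
    p.add_argument("--iters", type=int, default=None)
    p.add_argument("--cfg", dest="cfg_file", default=None)
    p.add_argument("--network", default=None)
    # Everything after --set is collected as a flat list of KEY VALUE pairs
    p.add_argument("--set", dest="set_cfgs", nargs=argparse.REMAINDER, default=None)
    return p


def apply_overrides(cfg, pairs):
    """Apply KEY VALUE pairs (dotted keys allowed) to a nested config dict."""
    if len(pairs) % 2 != 0:
        raise ValueError("--set expects KEY VALUE pairs")
    for key, raw in zip(pairs[0::2], pairs[1::2]):
        node = cfg
        *parents, leaf = key.split(".")
        for name in parents:
            node = node[name]
        if leaf not in node:
            raise KeyError("unknown config key: " + key)
        try:
            value = literal_eval(raw)  # "0.01" -> 0.01, "True" -> True
        except (ValueError, SyntaxError):
            value = raw  # bare strings such as exp_dir stay strings
        node[leaf] = value
    return cfg
```

For example, `apply_overrides(cfg, build_parser().parse_args(argv).set_cfgs)` would set `cfg["EXP_DIR"]` to `"exp_dir"` for the command above. Rejecting unknown keys catches typos that would otherwise be silently ignored.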
Run profiling:
cd $TFFRCNN
# install a visualization tool
sudo apt-get install graphviz
./experiments/profiling/ # generate an image ./experiments/profiling/profile.png
Download the KITTI detection dataset
Extract all of these tar files into
and the directory structure looks like this:
KITTI
    |-- training
        |-- image_2
            |-- [000000-007480].png
        |-- label_2
            |-- [000000-007480].txt
    |-- testing
        |-- image_2
            |-- [000000-007517].png
        |-- label_2
            |-- [000000-007517].txt
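Each label_2/*.txt file holds one object per line. According to the KITTI object development kit, a line has 15 space-separated values: the object type, truncation, occlusion state, observation angle alpha, the 2D box (left, top, right, bottom in pixels), the 3D dimensions (height, width, length in meters), the 3D location (x, y, z in camera coordinates) and rotation_y. The parser below is a minimal sketch of reading those fields; `KittiObject`, `parse_kitti_line` and `read_kitti_labels` are hypothetical helpers, not part of TFFRCNN.

```python
from collections import namedtuple

# Fields of one line in a KITTI label_2/*.txt file (detection result
# files append a 16th "score" value, which this sketch ignores).
KittiObject = namedtuple(
    "KittiObject",
    ["type", "truncated", "occluded", "alpha", "bbox",
     "dimensions", "location", "rotation_y"],
)


def parse_kitti_line(line):
    v = line.split()
    if len(v) < 15:
        raise ValueError("expected at least 15 fields, got %d" % len(v))
    return KittiObject(
        type=v[0],
        truncated=float(v[1]),
        occluded=int(v[2]),
        alpha=float(v[3]),
        bbox=tuple(float(x) for x in v[4:8]),         # left, top, right, bottom
        dimensions=tuple(float(x) for x in v[8:11]),  # height, width, length
        location=tuple(float(x) for x in v[11:14]),   # x, y, z
        rotation_y=float(v[14]),
    )


def read_kitti_labels(path):
    """Parse every non-empty line of one label file."""
    with open(path) as f:
        return [parse_kitti_line(line) for line in f if line.strip()]
```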
Convert KITTI into Pascal VOC format
cd $TFFRCNN
./experiments/scripts/ \
    --kitti $TFFRCNN/data/KITTI \
    --out $TFFRCNN/data/KITTIVOC
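Conceptually, the conversion writes one PASCAL VOC-style XML annotation per KITTI image. The sketch below builds such a tree with the standard library's `xml.etree.ElementTree`. Lower-casing class names, dropping DontCare regions and rounding box coordinates to integers are illustrative assumptions; the actual conversion script may map classes or handle VOC's 1-based pixel coordinates differently. `kitti_to_voc_xml` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def kitti_to_voc_xml(image_name, width, height, objects, skip=("DontCare",)):
    """Build a VOC-style <annotation> element for one image.

    objects: iterable of (class_name, (left, top, right, bottom)) in KITTI
    pixel coordinates. Skipping DontCare and lower-casing names are
    assumptions made for this sketch.
    """
    ann = ET.Element("annotation")
    ET.SubElement(ann, "folder").text = "KITTIVOC"
    ET.SubElement(ann, "filename").text = image_name
    size = ET.SubElement(ann, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"
    for name, (left, top, right, bottom) in objects:
        if name in skip:
            continue
        obj = ET.SubElement(ann, "object")
        ET.SubElement(obj, "name").text = name.lower()
        ET.SubElement(obj, "difficult").text = "0"
        box = ET.SubElement(obj, "bndbox")
        # VOC stores integer pixel coordinates; KITTI floats are rounded here
        coords = zip(("xmin", "ymin", "xmax", "ymax"), (left, top, right, bottom))
        for tag, val in coords:
            ET.SubElement(box, tag).text = str(int(round(val)))
    return ann
```

The resulting element can be saved with `ET.ElementTree(ann).write("Annotations/000000.xml")`.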
The output directory looks like this:
KITTIVOC
    |-- Annotations
        |-- [000000-007480].xml
    |-- ImageSets
        |-- Main
            |-- [train|val|trainval].txt
    |-- JPEGImages
        |-- [000000-007480].jpg
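The files under ImageSets/Main simply list image IDs, one per line. The sketch below produces them with a seeded random split; the split ratio used by the conversion script is not stated here, so the `val_fraction` parameter and the function name `write_splits` are hypothetical.

```python
import os
import random


def write_splits(image_ids, out_dir, val_fraction=0.5, seed=0):
    """Write train.txt, val.txt and trainval.txt (one image ID per line)."""
    ids = sorted(image_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for a given seed
    n_val = int(len(ids) * val_fraction)
    splits = {
        "val": sorted(ids[:n_val]),
        "train": sorted(ids[n_val:]),
        "trainval": sorted(ids),
    }
    os.makedirs(out_dir, exist_ok=True)
    for name, subset in splits.items():
        with open(os.path.join(out_dir, name + ".txt"), "w") as f:
            f.write("".join(i + "\n" for i in subset))
    return splits
```

The IDs could be collected from the image folder, for example with `sorted(os.path.splitext(f)[0] for f in os.listdir("KITTIVOC/JPEGImages"))`. Fixing the seed keeps the split reproducible across runs.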
Training on
is just like training on Pascal VOC 2007:
python ./faster_rcnn/ \
    --gpu 0 \
    --weights ./data/pretrain_model/VGG_imagenet.npy \
    --imdb kittivoc_train \
    --iters 160000 \
    --cfg ./experiments/cfgs/faster_rcnn_kitti.yml \
    --network VGGnet_train