
How to train custom objects in YOLOv2


This article is based on [1]. We want a way to train YOLOv2 to detect the object classes we are interested in. Darknet has a Windows port maintained by AlexeyAB [2]. First of all, we need to build darknet.exe from that repository to train and test with. Go to build/darknet, open darknet.sln with Visual Studio 2015, set the solution platform to x64, and rebuild the solution; it should generate darknet.exe successfully.

Next, we need to label the objects in the images that will serve as training data. I use the BBox Label Tool [3] to mark the objects' coordinates (python ./main.py). The tool's image root folder is ./Images; create a sub-folder (002) and enter 002 to make the tool load all *.jpg files from it. We then draw a bounding box around each object to mark where it is. The output is the objects' image-space coordinates, stored under ./Labels/002.
However, this coordinate format is different from what YOLOv2 expects: YOLOv2 needs coordinates relative to the image dimensions. The BBox Label Tool output is
[obj number]
[bounding box left X] [bounding box top Y] [bounding box right X] [bounding box bottom Y],
and YOLOv2 wants

[category number] [object center in X] [object center in Y] [object width in X] [object height in Y].
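For reference, the conversion is just a matter of normalizing by the image width and height. Below is a minimal Python sketch; the function name and the single category 0 are illustrative assumptions, and the real script [4] also reads the image sizes from disk.

def bbox_to_yolo(left, top, right, bottom, img_w, img_h, category=0):
    # BBox Label Tool gives absolute corner coordinates; YOLOv2 wants the
    # box center and size, each divided by the corresponding image dimension.
    x_center = (left + right) / 2.0 / img_w
    y_center = (top + bottom) / 2.0 / img_h
    width = (right - left) / float(img_w)
    height = (bottom - top) / float(img_h)
    return "%d %.6f %.6f %.6f %.6f" % (category, x_center, y_center, width, height)

# Example: a box from (50, 80) to (250, 180) in a 640x480 image.
print(bbox_to_yolo(50, 80, 250, 180, 640, 480))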

Therefore, we need a converter to do this conversion; a ready-made one is available as a script [4]. Change lines 34 and 35 to set the input and output paths, then run python ./convert.py. After that, move the output *.txt files into the same folder as the *.jpg files. Next, we edit train.txt and test.txt to describe which images form our training set and which serve as the test set.

In train.txt
data/002/images.jpg
data/002/images1.jpg
data/002/images2.jpg
data/002/images3.jpg

In test.txt
data/002/large.jpg
data/002/maxresdefault1.jpg
data/002/testimage2.jpg
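These lists can be written by hand, or generated by a small script that splits the images in the folder. A minimal sketch follows; the data/002 path and the 90/10 split ratio are illustrative assumptions.

import glob

# Collect all training images and split them into train/test lists.
images = sorted(p.replace("\\", "/") for p in glob.glob("data/002/*.jpg"))
split = int(len(images) * 0.9)

with open("train.txt", "w") as f:
    f.write("\n".join(images[:split]) + "\n")
with open("test.txt", "w") as f:
    f.write("\n".join(images[split:]) + "\n")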


Then we create the YOLOv2 configuration files. In cfg/obj.data, edit it to define what the train and test files are:
classes= 1 
train  = train.txt 
valid  = test.txt 
names = cfg/obj.names 
backup = backup/


In cfg/obj.names, add the label names for the training classes, one per line, like
Subaru


For the final file, we duplicate yolo-voc.cfg as yolo-obj.cfg. Set batch=64 so that 64 images are used for every training step, and adjust subdivisions (e.g. 8) to fit the GPU's VRAM. Set classes=1, the number of categories we want to detect. In line 237, set filters=(classes + 5)*5; in our case filters=30.
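For reference, the relevant parts of yolo-obj.cfg end up looking roughly like this (a sketch only; the exact values and line numbers depend on your GPU and Darknet version):

[net]
batch=64
subdivisions=8

# the last [convolutional] layer, right before the [region] layer
# filters = (classes + 5) * 5 = (1 + 5) * 5 = 30
[convolutional]
filters=30

[region]
classes=1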

Training
YOLOv2 requires a set of convolutional weights to start training from; Darknet provides a set pre-trained on ImageNet. This darknet19_448.conv.23 file (76 MB) can be downloaded from the official YOLOv2 website.

In a terminal, type darknet.exe detector train cfg/obj.data cfg/yolo-obj.cfg darknet19_448.conv.23 to start training.
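If training is interrupted, it can usually be resumed from the most recent snapshot in the backup folder, e.g. darknet.exe detector train cfg/obj.data cfg/yolo-obj.cfg backup\yolo-obj_1000.weights (the snapshot name depends on how many iterations have completed).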


Testing 
After training, we will get the trained weights in the backup folder. Just type darknet.exe detector test cfg/obj.data cfg/yolo-obj.cfg backup\yolo-obj_2000.weights data/testimage.jpg to verify the result.

[1] https://timebutt.github.io/static/how-to-train-yolov2-to-detect-custom-objects/
[2] https://github.com/AlexeyAB/darknet
[3] https://github.com/puzzledqs/BBox-Label-Tool
[4] https://github.com/Guanghan/darknet/blob/master/scripts/convert.py
