Saturday, November 17, 2018

How to train custom objects in YOLOv2


This article is based on [1]. We want a way to train detection of the object classes we are interested in. Darknet has a Windows version ported by AlexeyAB [2]. First of all, we need to build darknet.exe from AlexeyAB's repository to help us train and test data. Go to build/darknet, open darknet.sln with VS 2015, and configure it for the x64 solution platform. Rebuild the solution; it should succeed in generating darknet.exe.

Then, we need to label the objects in the images that will be used as training data. I use the BBox-Label-Tool [3] to mark the objects' coordinates in the training images (python ./main.py). This tool's image root folder is ./Images; we can create a sub-folder (002) and enter 002 to make the tool load all *.jpg files from there. We then draw bounding boxes in the tool to mark where the objects are. The outputs are image-space coordinates, stored at ./Labels/002.
However, this coordinate format is different from what YOLOv2 expects: YOLOv2 needs coordinates relative to the image dimensions. The BBox-Label-Tool output is
[obj number]
[bounding box left X] [bounding box top Y] [bounding box right X] [bounding box bottom Y],
and YOLOv2 wants

[category number] [object center X] [object center Y] [object width] [object height], all relative to the image width and height.
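
The mapping between the two formats is straightforward. The sketch below illustrates the conversion for a single box (an illustration only, not the actual convert.py from [4]; bbox_to_yolo and its arguments are hypothetical names, and img_w/img_h are the image dimensions, e.g. read with PIL):

# A minimal sketch of the coordinate conversion (illustration only).
def bbox_to_yolo(left, top, right, bottom, img_w, img_h, class_id=0):
    # Center of the box, normalized by the image dimensions.
    x_center = (left + right) / 2.0 / img_w
    y_center = (top + bottom) / 2.0 / img_h
    # Box size, normalized by the image dimensions.
    width = (right - left) / float(img_w)
    height = (bottom - top) / float(img_h)
    return "%d %.6f %.6f %.6f %.6f" % (class_id, x_center, y_center, width, height)

For example, a 100x200 box centered in a 400x400 image becomes "0 0.500000 0.500000 0.250000 0.500000".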

Therefore, we need a converter to do this conversion. We can get one from this script [4]; change lines 34 and 35 to set the input and output paths, then run python ./convert.py. After that, move the output *.txt files and the *.jpg files into the same folder. Next, edit train.txt and test.txt to describe which images form our training set and which serve as the test set.

In train.txt
data/002/images.jpg
data/002/images1.jpg
data/002/images2.jpg
data/002/images3.jpg

In test.txt
data/002/large.jpg
data/002/maxresdefault1.jpg
data/002/testimage2.jpg
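
If there are many images, these lists don't have to be written by hand. A few lines of Python can generate them (a hypothetical helper, assuming the images live under data/002; split the files between train.txt and test.txt as you see fit):

import glob

# Collect every *.jpg under data/002 and write one path per line into train.txt.
with open("train.txt", "w") as f:
    for path in sorted(glob.glob("data/002/*.jpg")):
        f.write(path + "\n")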


Then, create the YOLOv2 configuration files. First, edit cfg/obj.data to define where the train and test lists are:
classes= 1 
train  = train.txt 
valid  = test.txt 
names = cfg/obj.names 
backup = backup/


In cfg/obj.names, add the label names of the training classes, one per line, like
Subaru


For the final file, we duplicate yolo-voc.cfg as yolo-obj.cfg and edit it. Set batch to control how many images are used for every training step, and subdivisions to adjust the GPU VRAM requirement. Set classes=1, the number of categories we want to detect. In line 237, set filters=(classes + 5)*5; in our case, filters=30.
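
For reference, after these edits the relevant lines in yolo-obj.cfg look roughly like this (the batch and subdivisions values follow the tutorial in [1]; the exact line numbers depend on the cfg file you start from):

batch=64
subdivisions=8
classes=1
filters=30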

Training
YOLOv2 requires a set of convolutional weights to start training from; Darknet provides a set that was pre-trained on ImageNet. This darknet19_448.conv.23 file can be downloaded (76 MB) from the official YOLOv2 website.

Type darknet.exe detector train cfg/obj.data cfg/yolo-obj.cfg darknet19_448.conv.23 in a terminal to start training.


Testing 
After training, we will get the trained weights in the backup folder. Just type darknet.exe detector test cfg/obj.data cfg/yolo-obj.cfg backup\yolo-obj_2000.weights data/testimage.jpg to verify the result.

[1] https://timebutt.github.io/static/how-to-train-yolov2-to-detect-custom-objects/
[2] https://github.com/AlexeyAB/darknet
[3] https://github.com/puzzledqs/BBox-Label-Tool
[4] https://github.com/Guanghan/darknet/blob/master/scripts/convert.py

Friday, April 27, 2018

Fast subsurface scattering

Fig.1 - Fast Subsurface scattering of Stanford Bunny

This is based on the implementation in three.js. It provides a cheap, fast, and convincing approximation of light transport in translucent surfaces. It follows a talk from GDC 2011 [1], and the approach is used by the Frostbite 2 and Unity engines [1][2][3]. Traditionally, when a ray intersects a surface, we need to calculate how it bounces after the intersection. Materials can be divided roughly into three types. Opaque: light can't go through the geometry and the ray is bounced back. Transparent: the ray passes through the surface almost completely, probably losing a little energy as it leaves. Translucent: after entering the surface, the ray is bounced around internally, as in Fig. 2 below.

Fig.2 - BSSRDF [1]

In the translucent case, we have several subsurface scattering approaches to solve the problem. When light travels inside the shape, we need to account for how the diffuse contribution varies with the thickness of the object. As Fig. 3 below shows, when light leaves the surface it is diffused and attenuated based on the thickness of the shape.

Fig.3 - Translucent lighting [1]

Thus, we need a way to determine the thickness inside the surface. The most direct way is to compute ambient occlusion and bake the local thickness into a thickness map. A thickness map like Fig. 4 below is easy to generate from DCC tools.

Fig.4 - Local thickness map of Stanford Bunny

Then, we can start to implement our approximate subsurface scattering approach.

void Subsurface_Scattering(const in IncidentLight directLight, const in vec2 uv, const in vec3 geometryViewDir, const in vec3 geometryNormal, inout vec3 directDiffuse) {
  // Sample the pre-computed local thickness map.
  vec3 thickness = thicknessColor * texture2D(thicknessMap, uv).r;
  // Distort the light direction by the surface normal to fake internal scattering.
  vec3 scatteringHalf = normalize(directLight.direction + (geometryNormal * thicknessDistortion));
  // The exit light travels opposite to the incident light, hence the negated half vector.
  float scatteringDot = pow(saturate(dot(geometryViewDir, -scatteringHalf)), thicknessPower) * thicknessScale;
  vec3 scatteringIllu = (scatteringDot + thicknessAmbient) * thickness;
  directDiffuse += scatteringIllu * thicknessAttenuation * directLight.color;
}

The tricky part is that the direction of the exit light is opposite to the direction of the incident light. Therefore, we compute the light attenuation with dot(geometryViewDir, -scatteringHalf). Besides that, there are several parameters worth discussing in detail.

thicknessAmbient
- Ambient light value
- Visible from all angles even at the back side of surfaces

thicknessPower
- Power value of direct translucency
- View independent

thicknessDistortion
- Subsurface distortion
- Shift the surface normal
- View dependent

thicknessMap
- Pre-computed local thickness map
- Attenuates the back diffuse color with the local thickness map
- Can be utilized for both of direct and indirect lights

Because the local thickness map is precomputed, this approach doesn't work for animated/morphing objects or concave objects. The alternative is to compute a real-time ambient occlusion map with inverted normals, or to generate a real-time thickness map.


Reference:
[1] GDC 2011 – Approximating Translucency for a Fast, Cheap and Convincing Subsurface Scattering Look, https://colinbarrebrisebois.com/2011/03/07/gdc-2011-approximating-translucency-for-a-fast-cheap-and-convincing-subsurface-scattering-look/
[2] Fast Subsurface Scattering in Unity Part 1,  https://www.alanzucconi.com/2017/08/30/fast-subsurface-scattering-1/
[3] Fast Subsurface Scattering in Unity Part 2,  https://www.alanzucconi.com/2017/08/30/fast-subsurface-scattering-2/

Thursday, February 22, 2018

Physically-Based Rendering in WebGL

According to the image below from Physically Based Shading at Disney, the left is real chrome, the middle is the PBR approach, and the right is Blinn-Phong. We can see that PBR is much closer to the real material, and the difference lies mainly in the specular lighting term.


Blinn-Phong

The most important part of the specular term in Blinn-Phong is that it uses the half-vector, dot(normal, halfDir), instead of the reflection-based dot(reflectDir, viewDir) of traditional Phong, which avoids Phong's hard specular cutoff problem.

vec3 BRDF_Specular_BlinnPhong( vec3 lightDir, vec3 viewDir, vec3 normal, vec3 specularColor, float shininess ) {
  vec3 halfDir = normalize( lightDir + viewDir );
  float dotNH = saturate( dot( normal, halfDir ) );
  float dotLH = saturate( dot( lightDir, halfDir ) );
  vec3 F = F_Schlick( specularColor, dotLH );
  float G = G_BlinnPhong_Implicit( );
  float D = D_BlinnPhong( shininess, dotNH );
  return F * ( G * D );
}

Physically-Based rendering

Regarding the GGX lighting model, the UE4 shading presentation by Brian Karis takes the Cook-Torrance separation of terms with three factors:

D) GGX Distribution
F) Schlick-Fresnel
V) Schlick approximation of Smith solved with GGX

float G1V(float dotNV, float k) {
  return 1.0 / (dotNV * (1.0 - k) + k);
}

float BRDF_Specular_GGX(vec3 N, vec3 V, vec3 L, float roughness, float f0) {
  float alpha = roughness * roughness;
  vec3 H = normalize(V + L);

  float dotNL = saturate(dot(N, L));
  float dotNV = saturate(dot(N, V));
  float dotNH = saturate(dot(N, H));
  float dotLH = saturate(dot(L, H));

  float F, D, vis;

  // D
  float alphaSqr = alpha * alpha;
  float pi = 3.14159;
  float denom = dotNH * dotNH * (alphaSqr - 1.0) + 1.0;
  D = alphaSqr / (pi * denom * denom);

  // F
  float dotLH5 = pow(1.0 - dotLH, 5.0);
  F = f0 + (1.0 - f0) * (dotLH5);

  // V
  float k = alpha / 2.0;
  vis = G1V(dotNL, k) * G1V(dotNV, k);

  float specular = dotNL * D * F * vis;
  return specular;
}

Unreal Engine utilizes an approximation from Physically Based Shading on Mobile. We can see the specular term is shortened for performance on mobile platforms. (three.js' standard material adopts this approach as well.)

half3 EnvBRDFApprox( half3 SpecularColor, half Roughness,half NoV )
{
  const half4 c0 = { -1, -0.0275, -0.572, 0.022 };
  const half4 c1 = { 1, 0.0425, 1.04, -0.04 };
  half4 r = Roughness * c0 + c1;
  half a004 = min( r.x * r.x, exp2( -9.28 * NoV ) ) * r.x + r.y;
  half2 AB = half2( -1.04, 1.04 ) * a004 + r.zw;
  return SpecularColor * AB.x + AB.y;
}

Result:

http://daoshengmu.github.io/dsmu/pbr/webgl_materials_pbr.html


Reference:
[1] GGX Shading Model For Metallic Reflections,  http://www.neilblevins.com/cg_education/ggx/ggx.htm
[2] Optimizing GGX Shaders with dot(L,H), http://filmicworlds.com/blog/optimizing-ggx-shaders-with-dotlh/
[3] Physically Based Shading in Call of Duty: Black Ops, http://blog.selfshadow.com/publications/s2013-shading-course/lazarov/s2013_pbs_black_ops_2_notes.pdf

Thursday, July 13, 2017

Setup TensorFlow with GPU support on Windows

TensorFlow with GPU support runs computations much faster than the CPU-only build, but it needs some additional setup, especially for CUDA. First of all, follow the guideline from https://www.tensorflow.org/install/install_windows. TensorFlow on Windows currently only supports Python 3; I suggest using Python 3.5.3 or below. Then, install CUDA 8.0 and download cuDNN v6.0.

Then, move the files from the cuDNN v6.0 package that you downloaded into the path where you installed CUDA 8.0, like "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0", following the mapping below:

cudnn-8.0-windows10-x64-v6.0\cuda\bin\cudnn64_6.dll -> CUDA\v8.0\bin
cudnn-8.0-windows10-x64-v6.0\cuda\include\cudnn.h -> CUDA\v8.0\include
cudnn-8.0-windows10-x64-v6.0\cuda\lib\x64\cudnn.lib -> CUDA\v8.0\lib\x64

You don't need to add the folder path of cudnn-8.0-windows10-x64-v6.0 to your %PATH%. Now we can confirm the installation is ready.

Steps:
1. Create a virtualenv under your working folder:
virtualenv --system-site-packages tensorflow
2. Activate it
tensorflow\Scripts\activate
It shows (tensorflow)$
3. Install TensorFlow with GPU support
pip3 install --upgrade tensorflow-gpu
4. Import TensorFlow to confirm it is ready
(tensorflow) %YOUR_PATH%\tensorflow>python
Python 3.5.2 [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
>>>

If it doesn't show any error, that means it works.
But if you see an error message like No module named '_pywrap_tensorflow_internal', take a look at issues 9469 and 7705. It is usually a cuDNN version problem or cuDNN not being found; please follow the steps mentioned above.
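
As an additional sanity check (not part of the original install guide), you can also ask TensorFlow to list the devices it sees, using the TF 1.x API:

>>> from tensorflow.python.client import device_lib
>>> device_lib.list_local_devices()

If CUDA and cuDNN are set up correctly, the output should include a device whose device_type is "GPU" in addition to the CPU.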


Saturday, August 20, 2016

Webrender 1.0

Source code: https://github.com/servo/webrender

Thursday, August 18, 2016

AR on the Web

Because of the popularity of Pokémon Go, lots of people have started to discuss the possibility of AR (augmented reality) on the Web. Thanks to Jerome Etienne's slides, I got some ideas for making this AR demo.

First of all, it is based on three.js and js-aruco. three.js is a WebGL framework that helps us construct and load 3D models. js-aruco is a JavaScript port of ArUco, a minimal library for augmented reality applications based on OpenCV. These two projects make it possible to implement a Web AR proof of concept.

Then, I would like to introduce how to implement this demo. First, we need to use navigator.getUserMedia to get the video stream from our webcam. This function is not supported by all browser vendors; please take a look at the support status.


navigator.getUserMedia = ( navigator.getUserMedia ||
                       navigator.webkitGetUserMedia ||
                       navigator.mozGetUserMedia ||
                       navigator.msGetUserMedia);

if (navigator.getUserMedia) {
    navigator.getUserMedia( { 'video': true }, gotStream, noStream);
}

The above code shows how to get a media stream in JavaScript. In this demo, I just need video, and the stream is passed to the gotStream callback function. In gotStream, I hand the stream to my video element so it is displayed on screen, and then call the setupAR module. In setupAR(), I initialize the AR module and set up my model and scene scale. After that, I just wait for new video frames and get the AR detection results from js-aruco in the updateVideoStream() function.

In updateVideoStream(), as in the picture above, the current video frame is drawn into an imageData maintained by a 2D canvas. The imageData is then sent to the AR detector, which checks whether there is any marker in it and returns an array of the markers detected in that frame. Every marker holds the (x, y) coordinates of its corners. We can use these corner coordinates for lots of applications; in my demo, I draw the corners and the marker id on top of the video. The most interesting part is that we can leverage the markers to update the pose of a 3D model.

POS.Posit gives us a library that helps derive the transformation pose from the corners. A pose contains a rotation matrix and a translation vector in 3D space. Therefore, it is very easy to show a 3D model on a marker, except that we need to do some coordinate conversion. Keep in mind that the video stream lives in a 2D space, so we first have to transform the corners into 3D space.


for (i = 0; i < corners.length; ++ i){
   corner = corners[i];
   // from 2D canvas space to 3D world space
   corner.x = corner.x - (canvas.width / 2);
   corner.y = (canvas.height/2) - corner.y;
}
Moreover, we need to apply this rotation matrix to the 3D model's rotation, converted to Euler angles.
   dae.rotation.x = -Math.asin(-rotation[1][2]);
   dae.rotation.y = -Math.atan2(rotation[0][2], rotation[2][2]) - 90;
   dae.rotation.z = Math.atan2(rotation[1][0], rotation[1][1]);

At last, apply the translation vector to the 3D model's position.
   dae.position.x = translation[0];
   dae.position.y = translation[1];
   dae.position.z = -translation[2] * offsetScale;


Demo video: https://www.youtube.com/watch?v=68O5w1oIURM
Demo link: http://daoshengmu.github.io/ConsoleGameOnWeb/webar.html (Best for Firefox)

Sunday, July 17, 2016

How to setup RustDT

RustDT is an IDE for Rust. If you are someone like me who needs an IDE for learning a language and developing efficiently, you should give RustDT a try (https://github.com/RustDT/RustDT/blob/latest/documentation/UserGuide.md#user-guide).

Enable code completion.

Here you go!