Saturday, November 17, 2018

How to train custom objects in YOLOv2


This article is based on [1]. We want a way to train YOLOv2 on the object classes we are interested in. Darknet has a Windows port maintained by AlexeyAB [2]. First of all, we need to build darknet.exe from AlexeyAB's repository to help us train and test data. Go to build/darknet, open darknet.sln with VS 2015, and set the solution platform to x64. Rebuild the solution! It should successfully generate darknet.exe. Then, we need to label the objects in the images that are used as training data. I use the BBox-Label-Tool [3] to record the objects' coordinates in the training images (python ./main.py). This tool loads images from its ./Images root folder; we can create a sub-folder (002) and enter 002 to make the tool load all *.jpg files from there. We then draw bounding boxes in this tool to mark where the objects are. The outputs are image-space coordinates and are stored at ./Labels/002.
However, this coordinate format is different from what YOLOv2 expects: YOLOv2 needs coordinates relative to the image dimensions. The BBox-Label-Tool output is
[obj number]
[bounding box left X] [bounding box top Y] [bounding box right X] [bounding box bottom Y],
and YOLOv2 wants

[category number] [object center in X] [object center in Y] [object width in X] [object height in Y].

Therefore, we need a converter to do this conversion. We can get the converter from this script [4] and change lines 34 and 35 to set the input and output paths, then run python ./convert.py. After that, we move the output *.txt files and the *.jpg files into the same folder. Next, we edit train.txt and test.txt to describe which images form our training set and which serve as the test set.
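The core of that conversion can be sketched as follows (a minimal sketch, not the actual script from [4]; the function name and argument order are illustrative):

```python
def bbox_to_yolo(left, top, right, bottom, img_w, img_h):
    """Convert absolute corner coordinates (BBox-Label-Tool output)
    to YOLO's relative center/size format."""
    x_center = (left + right) / 2.0 / img_w
    y_center = (top + bottom) / 2.0 / img_h
    width = (right - left) / float(img_w)
    height = (bottom - top) / float(img_h)
    return x_center, y_center, width, height
```

For a 100x200 box centered in a 640x480 image, this yields values in [0, 1], which is what the per-image .txt label files expect after the category number.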

In train.txt
data/002/images.jpg
data/002/images1.jpg
data/002/images2.jpg
data/002/images3.jpg

In test.txt
data/002/large.jpg
data/002/maxresdefault1.jpg
data/002/testimage2.jpg


Then, create the YOLOv2 configuration files. In cfg/obj.data, edit it to define where the train and test file lists are.
classes= 1 
train  = train.txt 
valid  = test.txt 
names = cfg/obj.names 
backup = backup/


In cfg/obj.names, add the label names for the training classes, one per line, like
Subaru


For the final file, we duplicate the yolo-voc.cfg file as yolo-obj.cfg. Set batch=64 so that 64 images are used for every training step, and subdivisions=8 to split the batch and fit it in GPU VRAM (increase it if you run out of memory). Set classes=1, the number of categories we want to detect. In line 237, set filters=(classes + 5)*5; in our case filters=30.
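As a quick sanity check on that region-layer filter count (this helper is illustrative, not part of Darknet):

```python
def region_filters(classes, coords=4, num_anchors=5):
    # Each of the 5 anchor boxes predicts: 4 box coordinates,
    # 1 objectness score, and one score per class,
    # hence (classes + 5) * 5 for YOLOv2's region layer.
    return (classes + coords + 1) * num_anchors
```

For a single class this gives 30, matching the value placed in yolo-obj.cfg; for the 20 VOC classes it gives the original 125.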

Training
YOLOv2 requires a set of pre-trained convolutional weights to start from; Darknet provides a set that was pre-trained on ImageNet. This darknet19_448.conv.23 file can be downloaded (76 MB) from the official YOLOv2 website.

Type darknet.exe detector train cfg/obj.data cfg/yolo-obj.cfg darknet19_448.conv.23 in a terminal to start training.


Testing 
After training, we will find the trained weights in the backup folder. Just type darknet.exe detector test cfg/obj.data cfg/yolo-obj.cfg backup\yolo-obj_2000.weights data/testimage.jpg to verify the result.

[1] https://timebutt.github.io/static/how-to-train-yolov2-to-detect-custom-objects/
[2] https://github.com/AlexeyAB/darknet
[3] https://github.com/puzzledqs/BBox-Label-Tool
[4] https://github.com/Guanghan/darknet/blob/master/scripts/convert.py

Friday, April 27, 2018

Fast subsurface scattering

Fig.1 - Fast Subsurface scattering of Stanford Bunny

This post is based on the implementation in three.js. It provides a cheap, fast, and convincing approximation of light transport in translucent surfaces. It follows a talk from GDC 2011 [1], and the approach is used by the Frostbite 2 and Unity engines [1][2][3]. Traditionally, when a ray intersects a surface, we need to calculate how it continues after the intersection. Materials can be roughly divided into three types. Opaque: light can't go through the geometry, and the ray is bounced back. Transparent: the ray passes through the surface entirely, probably losing a little energy on the way out. Translucent: the ray entering the surface is bounced around internally, as in Fig. 2 below.

Fig.2 - BSSRDF [1]

In the case of translucency, we have several subsurface scattering approaches available. When light travels inside the shape, we need to consider how the diffuse contribution varies with the thickness of the object. As Fig. 3 below shows, when light leaves a surface, it diffuses and is attenuated according to the thickness of the shape.

Fig.3 - Translucent lighting [1]

Thus, we need a way to determine the thickness inside surfaces. The most direct way is to calculate ambient occlusion and bake the local thickness into a thickness map. A thickness map like Fig. 4 below is easy to generate from DCC tools.

Fig.4 - Local thickness map of Stanford Bunny

Then, we can start to implement our approximate subsurface scattering approach.

void Subsurface_Scattering(const in IncidentLight directLight, const in vec2 uv, const in vec3 geometryViewDir, const in vec3 geometryNormal, inout vec3 directDiffuse) {
  // Sample the pre-computed local thickness.
  vec3 thickness = thicknessColor * texture2D(thicknessMap, uv).r;
  // Distort the light direction by the surface normal.
  vec3 scatteringHalf = normalize(directLight.direction + (geometryNormal * thicknessDistortion));
  // The exit light travels opposite to the incident light, hence -scatteringHalf.
  float scatteringDot = pow(saturate(dot(geometryViewDir, -scatteringHalf)), thicknessPower) * thicknessScale;
  vec3 scatteringIllu = (scatteringDot + thicknessAmbient) * thickness;
  directDiffuse += scatteringIllu * thicknessAttenuation * directLight.color;
}

The tricky part is that the exit light's direction is opposite to the incident light. Therefore, we use dot(geometryViewDir, -scatteringHalf) as the attenuation term. We also have several parameters that are worth discussing in detail.
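Numerically, that view-dependent term behaves like this (a scalar Python sketch with illustrative parameter values, not the three.js code; the dot product is collapsed to a plain cosine):

```python
import math

def scattering_term(cos_view_to_back_light, thickness_power=2.66,
                    thickness_scale=16.0):
    # pow(saturate(dot(V, -H)), thicknessPower) * thicknessScale
    # from the shader, with saturate written out as a clamp.
    clamped = min(max(cos_view_to_back_light, 0.0), 1.0)
    return math.pow(clamped, thickness_power) * thickness_scale
```

Looking straight toward the back light (cosine 1.0) gives the full scale, and the term falls off sharply as the view turns away; negative cosines clamp to zero.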

thicknessAmbient
- Ambient light value
- Visible from all angles even at the back side of surfaces

thicknessPower
- Power value of direct translucency
- View independent

thicknessDistortion
- Subsurface distortion
- Shift the surface normal
- View dependent

thicknessMap
- Pre-computed local thickness map
- Attenuates the back diffuse color with the local thickness map
- Can be utilized for both of direct and indirect lights

Because the local thickness map is precomputed, this doesn't work for animated/morphed objects or concave objects. The alternatives are computing a real-time ambient occlusion map with inverted normals, or generating a real-time thickness map.


Reference:
[1] GDC 2011 – Approximating Translucency for a Fast, Cheap and Convincing Subsurface Scattering Look, https://colinbarrebrisebois.com/2011/03/07/gdc-2011-approximating-translucency-for-a-fast-cheap-and-convincing-subsurface-scattering-look/
[2] Fast Subsurface Scattering in Unity Part 1,  https://www.alanzucconi.com/2017/08/30/fast-subsurface-scattering-1/
[3] Fast Subsurface Scattering in Unity Part 2,  https://www.alanzucconi.com/2017/08/30/fast-subsurface-scattering-2/

Thursday, February 22, 2018

Physically-Based Rendering in WebGL

According to the image from Physically Based Shading at Disney shown below, the left is real chrome, the middle is a PBR approach, and the right is Blinn-Phong. We can see that PBR is much closer to the real material, and the difference lies in the specular lighting term.


Blinn-Phong

The most important part of the specular term in Blinn-Phong is that it uses the half-vector between the light and view directions, dot(normal, halfDir), instead of the reflection vector of the traditional Phong lighting model, which avoids Phong's hard specular cutoff problem.

vec3 BRDF_Specular_BlinnPhong( vec3 lightDir, vec3 viewDir, vec3 normal, vec3 specularColor, float shininess ) {
  vec3 halfDir = normalize( lightDir + viewDir );
  float dotNH = saturate( dot( normal, halfDir ) );
  float dotLH = saturate( dot( lightDir, halfDir ) );
  vec3 F = F_Schlick( specularColor, dotLH );
  float G = G_BlinnPhong_Implicit( );
  float D = D_BlinnPhong( shininess, dotNH );
  return F * ( G * D );
}

Physically-Based rendering

Regarding the GGX lighting model, the UE4 shading presentation by Brian Karis takes the Cook-Torrance separation of terms with three factors:

D) GGX Distribution
F) Schlick-Fresnel
V) Schlick approximation of Smith solved with GGX

float G1V(float dotNV, float k) {
  return 1.0 / (dotNV * (1.0 - k) + k);
}

float BRDF_Specular_GGX(vec3 N, vec3 V, vec3 L, float roughness, float f0) {
  float alpha = roughness * roughness;
  vec3 H = normalize(V + L);

  float dotNL = saturate(dot(N, L));
  float dotNV = saturate(dot(N, V));
  float dotNH = saturate(dot(N, H));
  float dotLH = saturate(dot(L, H));

  float F, D, vis;

  // D
  float alphaSqr = alpha * alpha;
  float pi = 3.14159;
  float denom = dotNH * dotNH * (alphaSqr - 1.0) + 1.0;
  D = alphaSqr / (pi * denom * denom);

  // F
  float dotLH5 = pow(1.0 - dotLH, 5.0);
  F = f0 + (1.0 - f0) * (dotLH5);

  // V
  float k = alpha / 2.0;
  vis = G1V(dotNL, k) * G1V(dotNV, k);

  float specular = dotNL * D * F * vis;
  return specular;
}
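As a numeric sanity check, the GGX distribution (D) term above can be ported to a scalar sketch (illustrative only, not part of any engine):

```python
import math

def ggx_distribution(dot_nh, roughness):
    # D = alpha^2 / (pi * ((N.H)^2 * (alpha^2 - 1) + 1)^2),
    # with alpha = roughness^2 as in the shader above.
    alpha = roughness * roughness
    alpha_sqr = alpha * alpha
    denom = dot_nh * dot_nh * (alpha_sqr - 1.0) + 1.0
    return alpha_sqr / (math.pi * denom * denom)
```

At dot_nh = 1.0 the denominator collapses to alpha^2, so D peaks at 1 / (pi * alpha^2); smoother surfaces (lower roughness) concentrate that peak, rougher ones spread it out.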

Unreal Engine utilizes an approximation from Physically Based Shading on Mobile. We can see the specular term is shortened for performance on mobile platforms. (three.js' Standard material adopts this approach as well.)

half3 EnvBRDFApprox( half3 SpecularColor, half Roughness,half NoV )
{
  const half4 c0 = { -1, -0.0275, -0.572, 0.022 };
  const half4 c1 = { 1, 0.0425, 1.04, -0.04 };
  half4 r = Roughness * c0 + c1;
  half a004 = min( r.x * r.x, exp2( -9.28 * NoV ) ) * r.x + r.y;
  half2 AB = half2( -1.04, 1.04 ) * a004 + r.zw;
  return SpecularColor * AB.x + AB.y;
}
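For experimenting with that fit outside the shader, here is a scalar Python port (the function name is ours; the constants are copied from the HLSL above):

```python
def env_brdf_approx(specular_color, roughness, n_dot_v):
    # Polynomial fit of the pre-integrated environment BRDF,
    # following the half4 constants in the HLSL above.
    c0 = (-1.0, -0.0275, -0.572, 0.022)
    c1 = (1.0, 0.0425, 1.04, -0.04)
    r = tuple(roughness * a + b for a, b in zip(c0, c1))
    a004 = min(r[0] * r[0], 2.0 ** (-9.28 * n_dot_v)) * r[0] + r[1]
    ab_x = -1.04 * a004 + r[2]
    ab_y = 1.04 * a004 + r[3]
    return specular_color * ab_x + ab_y
```

At roughness 0 and a head-on view, the result stays essentially equal to the input specular color, as a mirror-like response should; at roughness 1 the specular contribution is heavily damped.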

Result:

http://daoshengmu.github.io/dsmu/pbr/webgl_materials_pbr.html


Reference:
[1] GGX Shading Model For Metallic Reflections,  http://www.neilblevins.com/cg_education/ggx/ggx.htm
[2] Optimizing GGX Shaders with dot(L,H), http://filmicworlds.com/blog/optimizing-ggx-shaders-with-dotlh/
[3] Physically Based Shading in Call of Duty: Black Ops, http://blog.selfshadow.com/publications/s2013-shading-course/lazarov/s2013_pbs_black_ops_2_notes.pdf