Sunday, December 19, 2010

This Year's Engine Development Is Done

This week, all of the engine work planned for this year is being verified and signed off one item at a time. This system has gone from nothing when I joined the company to having every technique implemented, and I have gone from a clueless fresh graduate to someone who can slowly stand on his own. I really have grown a lot through this process.

I still can't see who this engine will be used by or which project it can help. That is what discourages me... I worked overtime three or four days a week for it; all I wanted was to let Taiwan see our technical ability, to ride this ship and challenge the world... but the ship never sets sail. I feel a bit powerless, my passion being snuffed out bit by bit. Now "good enough" is good enough; I can no longer push myself to do better, and I no longer make demands of my colleagues either - as long as everyone is happy...

Whatever next year's new goal is - maybe we build another new ship, maybe we stop building ships altogether - no matter how it turns out, I still made myself into a shipwright. Thank you to the partners who worked hard alongside me.

Cascaded shadow map implementation done

This week I finished the cascaded shadow map implementation. I use two splits to store the shadow maps. The fine split gives a clear result for nearby objects; when objects are far enough away, we store them into the rough texture instead. The fine texture is around 1024 x 1024 and the rough texture can be 512 x 512; past a certain distance I simply stop writing into the textures at all.

I provide splitlambda to adjust the image resolution between the fine and rough textures, and splitscale can be used to handle the situation when a texel is close to the near plane.
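Below is a minimal C++ sketch of how a splitlambda-style parameter is typically applied, assuming the usual practical split scheme that blends a logarithmic and a uniform distribution; the function and parameter names are mine, not the engine's.

// Compute cascade split distances; splitLambda blends between a uniform and
// a logarithmic distribution (an assumption about how splitlambda is used).
#include <cmath>
#include <cstdio>
#include <vector>

std::vector<float> ComputeSplitDistances(float nearPlane, float farPlane,
                                         int splitCount, float splitLambda)
{
    std::vector<float> splits(splitCount + 1);
    for (int i = 0; i <= splitCount; ++i)
    {
        float t   = static_cast<float>(i) / splitCount;
        float log = nearPlane * std::pow(farPlane / nearPlane, t); // logarithmic split
        float uni = nearPlane + (farPlane - nearPlane) * t;        // uniform split
        splits[i] = splitLambda * log + (1.0f - splitLambda) * uni;
    }
    return splits;
}

int main()
{
    // Two splits: the fine 1024x1024 map covers [splits[0], splits[1]],
    // the rough 512x512 map covers [splits[1], splits[2]].
    std::vector<float> splits = ComputeSplitDistances(1.0f, 1000.0f, 2, 0.5f);
    for (int i = 0; i < static_cast<int>(splits.size()); ++i)
        std::printf("split %d: %.2f\n", i, splits[i]);
    return 0;
}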


Perspective view - has a slight artifact

Orthogonal view - improves the artifact

Sunday, December 12, 2010

web plugin default resource folder

IE 8.0
your desktop

Google chrome
C:\Documents and Settings\Administrator\Local Settings\Application Data\Google\Chrome\Application\8.0.552.215

Mozilla  Firefox
C:\Program Files\Mozilla Firefox

The Windows program c:\windows\regsvr32.exe is used to register and unregister ActiveX DLLs and OCXs. (An OCX file is merely a renamed DLL file.) You run regsvr32 as follows:
                 Register:  regsvr32 dllFileName.dll
               UnRegister:  regsvr32 /u dllFileName.dll
 

"plugins": [
    { "path": "your_npapi_plugin.dll" }    
  ],
 
<script>
var plugin = document.getElementById("MyNPAPIPluginId");
...</script>
 
 
http://stackoverflow.com/questions/9392536/developing-chrome-extensions-using-npapi-in-c

http://www.firebreath.org/display/documentation/Deploying+and+updating+your+plugin

Friday, November 26, 2010

tex2D vs. tex2Dproj

texCoord( texX, texY, texZ, texW ) is the texture coordinate after the texture-matrix transform [ the position goes through the world, view, projection, and texture matrices ].


We can use texCoord to fetch the texture:
1.  float4 color = tex2D( sampler, float2( texX / texW, texY / texW ) );
2.  float4 color = tex2Dproj( sampler, float4( texX, texY, texZ, texW ) );

The two methods give the same result, because the tex2Dproj operator performs the divide by w inside its interface.
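As a small illustration of the divide that tex2Dproj performs, here is a CPU-side C++ sketch of the coordinate math only; the struct and function names are hypothetical and just mirror the two shader lines above.

#include <cstdio>

struct Float2 { float x, y; };
struct Float4 { float x, y, z, w; };

// Equivalent of method 1: divide by w yourself, then sample with tex2D.
Float2 ManualProjectiveCoord(const Float4& texCoord)
{
    Float2 uv = { texCoord.x / texCoord.w, texCoord.y / texCoord.w };
    return uv;
}

// Equivalent of method 2: tex2Dproj receives the raw float4 and performs
// the same divide by w internally, so the lookup coordinate is identical.
Float2 Tex2DprojCoord(const Float4& texCoord)
{
    return ManualProjectiveCoord(texCoord);
}

int main()
{
    Float4 texCoord = { 2.0f, 4.0f, 1.0f, 8.0f };
    Float2 uv = Tex2DprojCoord(texCoord);
    std::printf("projected uv = (%f, %f)\n", uv.x, uv.y); // (0.25, 0.5)
    return 0;
}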


http://www.gamedev.net/community/forums/topic.asp?topic_id=408894
http://bbs.gameres.com/showthread.asp?threadid=104316

Friday, November 19, 2010

Cascaded Shadow Map

http://http.developer.nvidia.com/GPUGems3/gpugems3_ch10.html

http://www.gamedev.net/community/forums/topic.asp?topic_id=399014

http://appsrv.cse.cuhk.edu.hk/~fzhang/pssm_vrcia/

http://hax.fi/asko/PSSM.html

hardware shadow map
http://developer.nvidia.com/object/hwshadowmap_paper.html
http://developer.nvidia.com/forums/index.php?showtopic=34
http://forum.beyond3d.com/showthread.php?t=43182
http://www.gamedev.net/community/forums/topic.asp?topic_id=515318

Friday, November 5, 2010

The d3d example in browsers - IE

http://www.gamedev.net/community/forums/topic.asp?topic_id=495203

http://stackoverflow.com/questions/202567/web-browser-in-a-fullscreen-direct3d-application

http://www.ozzu.com/programming-forum/and-directx-plus-web-browser-t95817.html

http://bytes.com/topic/c-sharp/answers/237895-directx-window-web-browser

http://efreedom.com/Question/1-202567/Web-Browser-Fullscreen-Direct3D-Application

http://thedailyreviewer.com/dotnet/view/directx-window-in-web-browser-104239128

http://ubrowser.com/

cross platform
http://www.firebreath.org/display/documentation/Building+on+Windows#BuildingonWindows-BuildingtheFireBreathPlugin 

If Firefox 3.5 crashes when your D3D app initializes, the problem is the D3D device creation flags; you should try adding the D3DCREATE_FPU_PRESERVE flag. (http://stackoverflow.com/questions/1310034/directx-firefox-plugin-rendering-artifacts/1321058#1321058)
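A minimal sketch of creating the device with that flag, assuming a window handle and a filled-in D3DPRESENT_PARAMETERS already exist; everything else here is the standard D3D9 call.

#include <d3d9.h>

IDirect3DDevice9* CreateDeviceForBrowserPlugin(IDirect3D9* d3d, HWND hWnd,
                                               D3DPRESENT_PARAMETERS* pp)
{
    IDirect3DDevice9* device = NULL;
    // D3DCREATE_FPU_PRESERVE keeps the FPU state (double precision) intact,
    // so the browser's floating-point behaviour is not disturbed.
    d3d->CreateDevice(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, hWnd,
                      D3DCREATE_HARDWARE_VERTEXPROCESSING | D3DCREATE_FPU_PRESERVE,
                      pp, &device);
    return device; // NULL if creation failed
}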

Results in Firefox, Google Chrome, and IE:

An OpenGL Sample as Firefox Plugin

An OpenGL Sample as Firefox Plugin

http://www.codeproject.com/KB/openGL/FirefoxOpenGL.aspx?fid=459173&df=90&mpp=25&noise=3&sort=Position&view=Quick&select=2821101&fr=1


Building Firefox Plugins using Visual Studio

http://www.apijunkie.com/APIJunkie/blog/post/2008/09/Building-Firefox-Plugins-using-Visual-Studio.aspx

Tuesday, November 2, 2010

Volume Texel


In the OGRE engine, you should have a 3D texture, and this example is implemented with the fixed-function pipeline.

material Examples/VTDarkStuff
{
    technique
    {
        pass
        {
            diffuse 0.0 0.0 0.0
            ambient 0.1 0.1 0.15
            cull_hardware none
            lighting on
        }
      
    }
}

Saturday, October 30, 2010

Volumetric Fog

Volumetric Fog
http://www.gamedev.net/reference/articles/article677.asp 

Volumetric Rendering in Realtime
http://www.gamasutra.com/view/feature/3033/volumetric_rendering_in_realtime.php 

Volumetric Fog II 
http://www.evl.uic.edu/sjames/cs525/shader.html 

Volumetric Fog

http://www.apgardner.karoo.net/gl/fog.html

Fog Polygon Volumes

http://developer.download.nvidia.com/SDK/9.5/Samples/samples.html#FogPolygonVolumes3

Crysis volFog
http://www.youtube.com/watch?v=UuTTAa4R4dA

UE3: Fog Volumes

http://udn.epicgames.com/Three/FogVolumes.html

Monday, October 25, 2010

MRT format

When I use different render target formats for deferred lighting, D3DFMT_A8R8G8B8 is better than D3DFMT_A16B16G16R16F as long as you don't have precision issues. My experiment shows 530 vs. 487 FPS.

Monday, October 18, 2010

PS3 Cell Processor


http://www.ps3station.com/cell.php

Sunday, October 17, 2010

Thankful... Another Bottleneck Broken Through

I can be a pessimistic person sometimes. When I hit a bug I blame myself for being bad at math and bad at English, and start doubting whether I'm really fit for this kind of work... things other people might figure out quickly take me more time than everyone else...

But today I took the work home and solved a bug that had cost me almost a week, and earned another experience point. I don't know whether I just got lucky again~~~ but it proves I can still keep pushing, so keep it up!!! Tomorrow some overseas customers are coming to exchange ideas, and I've been reminded that I absolutely have to speak up and ask questions! So my poor spoken English will have to face everyone; I hope it passes. I really need to find time for some serious lessons...

Deferred lighting

     Today I finished debugging my deferred lighting. Here are some notes from the experience:
  1. In the geometry pass, I prepare 4 render targets:
    • depth map ( view position and depth; actually we could store only the depth value ) - removed, since we can just save the clip-space depth (z/w) instead
    • normal map ( view-space normal ) + clip-space depth (z/w)
    • diffuse map ( diffuse color )
    • specular map ( specular color and shininess )
  2. Draw the lights:
    • First, draw the ambient light: fetch the diffuse texture, combine it with the ambient light, fetch the depth value from the depth map, and write to the frame buffer and depth buffer. If you have a z-prepass you don't need to rewrite depth into the z-buffer.
    • Second, for the sun light draw a full-screen rectangle ( in clip-space coordinates ) and compute the directional lighting, multiplying the diffuse map, the specular map, and the lighting formula together. Enable additive alpha blending (one plus one) and disable depth writes.
    • Third, for an omni light draw the light volume ( a sphere ) and use the point-light formula. Enable additive alpha blending (one plus one), use front-face culling, disable depth writes, and set the depth test to greater-than (a render-state sketch for this pass follows the list). We can use the clip-space depth to transform the pixel back to view space and do the lighting there. Importantly, the texture coordinates for the MRT maps must be transformed from screen space to image space in the pixel shader!!! Otherwise you will see the light volume slide across your screen.

           float2 vTexcoord = IN.texCoord.xy / IN.texCoord.z;   
          vTexcoord.x = 0.5 * (1 + vTexcoord.x);
          vTexcoord.y = 0.5 * (1 - vTexcoord.y);

      Thanks to this article (http://www.gamedev.net/community/forums/topic.asp?topic_id=557716) for giving me the idea.
    • Fourth, a spot light would just draw a cone mesh; we don't support this kind of light yet.
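The following C++ sketch collects the D3D9 render states described in the omni-light step above; the helper name is mine and the device is assumed to be created elsewhere.

#include <d3d9.h>

void SetOmniLightVolumeStates(IDirect3DDevice9* device)
{
    // Additive blending: frameBuffer = src * 1 + dest * 1.
    device->SetRenderState(D3DRS_ALPHABLENDENABLE, TRUE);
    device->SetRenderState(D3DRS_SRCBLEND,  D3DBLEND_ONE);
    device->SetRenderState(D3DRS_DESTBLEND, D3DBLEND_ONE);

    // Front-face culling: shade the back faces of the light sphere so the
    // light keeps working when the camera moves inside the volume.
    device->SetRenderState(D3DRS_CULLMODE, D3DCULL_CW);

    // Depth test "greater" against the scene depth, but never write depth.
    device->SetRenderState(D3DRS_ZENABLE, D3DZB_TRUE);
    device->SetRenderState(D3DRS_ZWRITEENABLE, FALSE);
    device->SetRenderState(D3DRS_ZFUNC, D3DCMP_GREATER);
}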


      The figure below shows the ambient + sun + three omni lights result:

World space deferred lighting discussion: http://www.gamedev.net/community/forum/topic.asp?topic_id=544144&whichpage=1&#3508871

Friday, October 15, 2010

D24S8 depth sampling wrong

http://www.gamedev.net/community/forums/topic.asp?topic_id=515318

Friday, October 1, 2010

Deferred Rendering

Deferred Shading - nvidia
http://developer.nvidia.com/object/6800_leagues_deferred_shading.html
 
Deferred Rendering Demystified
http://www.gamedev.net/reference/programming/features/defRender/

Deferred Shading
http://www.3dvia.com/studio/documentation/user-manual/shaders/deferred-shading

G-buffer, Geometry Buffer (Saito and Takahashi 1990)

Thursday, September 30, 2010

Mac OS VMware

       To evaluate whether I actually want to buy a MacBook, I decided to first install Mac OS on my PC and try it out. The most convenient way is to install it through VMware; the detailed installation steps are at the following link:
http://junclj.blogspot.com/2010/02/vmware-workstation-70mac-os-x-snow.html

      One thing worth noting is that any installers that need to be transferred to the Mac to run, such as the audio driver, have to be placed in the VMware shared folder configured on the PC side.
      In the end the network worked without any configuration, which really surprised me!! Music and video play fine as long as the file format is supported. The pity is that the VM's performance is limited, so it still can't keep up one-to-one... Below is a screenshot of the successful result:

Tuesday, September 28, 2010

Game Rendering Papers

Game Rendering:
http://www.gamerendering.com/2008/11/01/deferred-lightning/

Beyond3D:
http://www.beyond3d.com/content/articles/19/9

Monday, September 13, 2010

Dive Into Python 3

http://diveintopython3.org/

The web version of Dive Into Python 3

Friday, September 10, 2010

reference to pointer

// Variant 1: GetData allocates the struct and hands it back through a
// reference to a pointer, so the caller receives the new address.
#include <cstring>
#include <tchar.h>

// sFOO and XE_NULL come from the engine; they are assumed to look like
// this so the example compiles on its own.
#define XE_NULL 0
struct sFOO
{
    char foo[16];
};

void GetData( sFOO *&fooIn )
{
    const char *pCH = "ABC";
    sFOO *lPoo = new sFOO();

    memcpy( lPoo->foo, pCH, strlen(pCH) + 1 ); // +1 copies the terminator too

    fooIn = lPoo; // the caller's pointer now points at the new object
}

int _tmain(int argc, _TCHAR* argv[])
{
    sFOO *lpFOO = XE_NULL;

    GetData( lpFOO );
    delete lpFOO;      // the caller owns the allocation
    return 0;
}

// Variant 2: the caller already owns the storage; GetData fills it in
// through the pointer instead of allocating. (Shown as an alternative to
// the functions above, not compiled together with them.)
void GetData( sFOO *&fooIn )
{
    const char *pCH = "ABC";
    sFOO lPoo; // = new sFOO();

    memcpy( lPoo.foo, pCH, strlen(pCH) + 1 );

    *fooIn = lPoo; // copy into the storage the caller provided
}

int _tmain(int argc, _TCHAR* argv[])
{
    sFOO   Foo;
    sFOO  *lpFOO = &Foo; // = XE_NULL;

    GetData( lpFOO );
    return 0;
}

Thursday, September 9, 2010

The Mono runtime

The Mono runtime engine provides a Just-in-Time compiler (JIT) and an Ahead-of-Time compiler (AOT).

Mono has both an optimizing just-in-time (JIT) runtime and an interpreter runtime. The interpreter runtime is far less complex and is primarily used in the early stages before a JIT version for that architecture is constructed. The interpreter is not supported on architectures where the JIT has been ported.

The idea is to allow developers to pre-compile their code to native code to reduce startup time, and the working set that is used at runtime in the just-in-time compiler.

When an assembly (a Mono/.NET executable) is installed in the system, it is then possible to pre-compile the code and have the JIT compiler tune the generated code to the particular CPU on which the software is installed.

The code produced by Mono's ahead-of-time compiler is Position Independent Code (PIC), which tends to be a bit slower than regular JITed code, but what you lose in performance with PIC you gain by being able to use all the available optimizations.

http://www.mono-project.com/Mono:Runtime

Wednesday, September 8, 2010

Computer Language Benchmarks Game

http://shootout.alioth.debian.org/u64/benchmark.php?test=all&lang=csharp&lang2=v8

Thursday, August 26, 2010

Post-Build copy DLL to directory

 First, write our batch file - build.bat:
 
XCOPY ..\..\folder\filename %1 /S /Y

@pause

 Put it next to the project file. Then, in the project's properties in MSVC,
enter call ./build.bat $(OutDir) in the Post-Build Event. This will
copy our .dll next to the executable.
 
Batch tutorial
http://bbs.nsysu.edu.tw/txtVersion/boards/msdos/M.1078700757.A.html

Batch commands
http://ca95.pixnet.net/blog/post/3922827

Creating a Batch File to copy a directory
http://en.kioskea.net/forum/affich-30405-creating-a-batch-file-to-copy-a-directory 


 Pre-build Event/Post-build Event Command Line Dialog Box

http://msdn.microsoft.com/en-us/library/42x5kfw4%28VS.80%29.aspx

Monday, August 23, 2010

D3D instancing

D3D instancing: technical overview and practice (in Chinese)
http://www.cnitblog.com/updraft/articles/56980.html

 Efficiently Drawing Multiple Instances of Geometry (Direct3D 9)

http://msdn.microsoft.com/en-us/library/bb173349%28VS.85%29.aspx 


Instancing Sample
http://msdn.microsoft.com/en-us/library/ee418269%28VS.85%29.aspx

Tuesday, August 17, 2010

Release Bug...

 This is already the third time I've run into this: everything is fine in the debug build, but running the release build produces unpredictable errors and then the program won't even shut down... For now I can only guess at memory overwrites; I hope I can get the problem solved soon @@

It turned out to be the optimization settings (fiber-safe, whole program optimization) in the project settings of the DLL and EXE. With the projects I have, the EXE, DLL, and static lib can't all use the same optimization options. Someday I really should figure out what each of those actually means.

Tuesday, August 10, 2010

Template in .cpp file

- The compiler uses a template class to create types by substituting the template parameters; this process is called instantiation.

- The type that is created from a template class is called a specialization.
  
//---------------------------------------------------------------
// in main.cpp
#include "temp.h"

int main()
{
    Foo<int> A;

    Foo<int> B = A;

    return 0;
}


// in temp.h
template < typename T >
class Foo
{
public:
    Foo();
    ~Foo();

    Foo( const Foo &rhs )
    {
        int test = 0;
    }
};


// in temp.cpp (the member definitions live here, not in the header)
#include "temp.h"

template < typename T >
Foo<T>::Foo()
{
    int test = 0;
}

template < typename T >
Foo<T>::~Foo()
{
    int test = 0;
}


template class Foo<int>; // explicit instantiation for the types we use
//---------------------------------------------------------------


With this approach, we don't have huge headers, and hence the build time will drop. Also, the header files will be "cleaner" and more readable. However, we don't have the benefits of lazy instantiation here (explicit instantiation generates the code for all member functions)

reference: http://www.codeproject.com/KB/cpp/templatesourceorg.aspx

Sunday, August 8, 2010

Shader Model 5.0

With Shader Model 5, Microsoft applies certain concepts of object-oriented programming to its shader language, HLSL, unlike preceding versions, which mainly introduced new capabilities (dynamic branching, integer support, etc.).

Other additions include an increase in the maximum texture size from 4K x 4K to 16K x 16K and the possibility of limiting the number of mipmaps loaded in VRAM. There's also the possibility of changing the depth value of a pixel without disabling functionality like early-Z checking, support for double-precision floating-point types, scatter memory writes, etc.


Reference: http://www.tomshardware.com/reviews/opengl-directx,2019-9.html
GDC 2009, http://cmpmedia.vo.llnwd.net/o1/vault/gdc09/slides/100_Handout%206.pdf

Skybox Render

Skyboxes are often frame-buffer-bandwidth bound, so there are two ways to optimize them:
(1) render them last, reading (but not writing) depth, and allow the early-z optimizations along with regular depth buffering to save bandwidth.
(2) render the skybox first, and disable all depth reads and writes. Which option will save you more bandwidth is a function of the target hardware.

If a large portion of the skybox is obscured, the first technique will likely be better; otherwise, the second one may save more bandwidth.
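A minimal D3D9 sketch of option (1), drawing the skybox last with the depth test on but depth writes off; DrawSkyboxGeometry is a hypothetical helper, not something from the reference.

#include <d3d9.h>

void DrawSkyboxGeometry(IDirect3DDevice9* /*device*/)
{
    // issue the skybox draw calls here
}

void RenderSkyboxLast(IDirect3DDevice9* device)
{
    device->SetRenderState(D3DRS_ZENABLE, D3DZB_TRUE);     // read depth
    device->SetRenderState(D3DRS_ZWRITEENABLE, FALSE);     // but never write it
    device->SetRenderState(D3DRS_ZFUNC, D3DCMP_LESSEQUAL); // early-z rejects covered pixels

    DrawSkyboxGeometry(device);

    device->SetRenderState(D3DRS_ZWRITEENABLE, TRUE);      // restore for the next frame
}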

reference: http://http.developer.nvidia.com/GPUGems/gpugems_ch28.html

Sunday, August 1, 2010

GPU Gems

Gems 1: http://http.developer.nvidia.com/GPUGems/gpugems_part01.html
Gems 2: http://http.developer.nvidia.com/GPUGems2/gpugems2_part01.html
Gems 3: http://http.developer.nvidia.com/GPUGems3/gpugems3_part01.html

Thursday, July 22, 2010

Normal Maps

  1. Bump map
    Uses grey texels to record the bumps, which are combined with the vertex normal. Bump maps require additional information describing how the bump intensity range maps to global-space distance units in order to be converted to a normal map.

       Bump map elevation
  2. Object space normal map
    Cheaper to compute than a tangent-space normal map, but it only supports rigid meshes, can't handle deformed meshes, and isn't suited to tiling or symmetric models.
  3. Tangent space normal map
    A little more expensive. It supports deformed meshes and every result that an object-space normal map or a bump map can produce, so this solution is more flexible for the artists' pipeline.
     Left: tangent space. Right: bump.
    Both have the same result on the cube

reference:  Understanding Normal Maps, http://www.pixologic.com/docs/index.php/Understanding_Normal_Maps#Object_Space_Map_Uses
http://tech-artists.org/wiki/Normal_mapping

Calculation tangent space:
http://jerome.jouvie.free.fr/opengl-tutorials/Lesson8.php
http://www.terathon.com/code/tangent.html

Monday, July 19, 2010

Using the Intel(R) MKL Memory Management

http://software.intel.com/sites/products/documentation/hpc/mkl/win/index.htm

Intel MKL has memory management software that controls the memory buffers used by the library functions.

Wednesday, July 14, 2010

Hello, Chairman

Today, for the first time, the chairman - who shares my birthday - remembered my name. The first time I was asked to speak in the meeting I stammered... everything went black in front of my eyes, how embarrassing. Fortunately I gradually started speaking more fluently. I clearly still need more practice giving reports...

Monday, July 12, 2010

Shader Special Concern Point

Shader Fog: http://randomchaosuk.blogspot.com/2007/07/shader-fog.html
In Shader 3.0 or higher, DirectX removed fog states: http://www.3dvia.com/forums/topic/how-to-use-fog-in-shader
SM3 Fog:http://xna-uk.net/blogs/randomchaos/archive/2007/10/15/generic-xna-sm3-fog.aspx
Fog Formulas:http://msdn.microsoft.com/en-us/library/bb324452%28VS.85%29.aspx

Texture Stage and Sampler States
A pixel shader completely replaces the pixel-blending functionality specified by the multi-texture blender including operations previously defined by the texture stage states. Texture sampling and filtering operations which were controlled by the standard texture stage states for minification, magnification, mip filtering, and the wrap addressing modes, can be initialized in shaders. The application is free to change these states without requiring the regeneration of the currently bound shader. Setting state can be made even easier if your shaders are designed within an effect.

Saturday, July 10, 2010

Scene Graph and Hierarchical Clustering Structure

http://www.gamedev.net/community/forums/topic.asp?topic_id=128965
http://www.gamedev.net/community/forums/topic.asp?topic_id=110342
http://www.gamedev.net/community/forums/topic.asp?topic_id=181233
http://www.gamedev.net/community/forums/topic.asp?topic_id=540401

Game Engines List

Game engine list in Wiki:
http://wiki.gamedev.net/index.php/Game_Engines

Friday, July 9, 2010

Exposure Correction

http://freespace.virgin.net/hugo.elias/graphics/x_posure.htm

Friday, June 25, 2010

Depth Bias

To solve the Z-fighting issue, use depth bias.
An application can help ensure that coplanar polygons are rendered properly by adding a bias to the z-values that the system uses when rendering the sets of coplanar polygons. To add a z-bias to a set of polygons, call the IDirect3DDevice9::SetRenderState method just before rendering them, setting the State parameter to D3DRS_DEPTHBIAS, and the Value parameter to a value between 0-16 inclusive. A higher z-bias value increases the likelihood that the polygons you render will be visible when displayed with other coplanar polygons.
Offset = m * D3DRS_SLOPESCALEDEPTHBIAS + D3DRS_DEPTHBIAS
where m is the maximum depth slope of the triangle being rendered.
m = max(abs(delta z / delta x), abs(delta z / delta y))  
reference: http://msdn.microsoft.com/en-us/library/bb205599%28VS.85%29.aspx 
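A minimal D3D9 sketch of setting these two render states; D3D9 expects the float values passed bit-for-bit through the DWORD parameter, and the example values in the comment are placeholders, not recommendations.

#include <d3d9.h>

void ApplyDepthBias(IDirect3DDevice9* device, float slopeScale, float depthBias)
{
    // The floats are reinterpreted as DWORDs, which is the usual D3D9 idiom.
    device->SetRenderState(D3DRS_SLOPESCALEDEPTHBIAS,
                           *reinterpret_cast<DWORD*>(&slopeScale));
    device->SetRenderState(D3DRS_DEPTHBIAS,
                           *reinterpret_cast<DWORD*>(&depthBias));
}

// Example: bias the coplanar decal polygons slightly toward the viewer.
// ApplyDepthBias(device, -0.1f, -0.0005f);   // values are scene-dependent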
 

Z Slope Scale = 0.0, Depth Bias = 0.0

Z Slope Scale = 0.1, Depth Bias = 0.0

Z Slope Scale = -0.2, Depth Bias = 0.0

Z Slope Scale = -0.0, Depth Bias = -0.01

Z Slope Scale = -0.0, Depth Bias = 0.01

Program demo: http://www.codesampler.com/dx9src/dx9src_5.htm

Friday, June 18, 2010

Occlusion Culling Using Direct3D 9.0

In the view-frustum culling scenario, the culled-in objects are as the first figure below shows:
But the practical view looks like the second figure: we only need to render three objects, the red, purple, and blue ones. The green and yellow ones are hidden.
In this situation we can get help from the GPU hardware. Direct3D provides the occlusion query to count the number of visible pixels. If the number is zero, the object is fully occluded; if it is greater than zero, the object is visible to the viewer.

The IDirect3DQuery9 process is presented below:
  1. Render every object's bounding mesh
  2. For every object:

    1. Begin query
    2. Re-render the bounding mesh
    3. End query
    4. Retrieve occlusion query data. If the pixels visible are greater than zero, the object should be rendered. Otherwise, the object should be occluded from rendering.
In the first step, we need to decide on the bounding mesh. Use a bounding box? A sphere? The suitable way is to use a low-vertex-count bounding mesh instead of a bounding box or sphere.
A bounding sphere may have a vertex count similar to a bounding mesh, but it can't approximate the object as well; a bounding mesh tests occlusion effectively and is accurate enough. Render the bounding meshes first to make sure the scene is present in the Z-buffer.
 
In the second step, the occlusion query determines each object's visibility status. If the query reports zero pixels, the object is excluded from the final draw; otherwise it is included in the render list. Rendering the queries to a much smaller surface (320 x 240 pixels) can be used to improve performance.
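A minimal D3D9 sketch of the query part of this flow; DrawBoundingMesh is a hypothetical helper, and a real engine would read the result a frame later rather than spinning on it as this sample does.

#include <d3d9.h>

void DrawBoundingMesh(IDirect3DDevice9* /*device*/)
{
    // draw the object's low-poly bounding mesh here
}

bool IsObjectVisible(IDirect3DDevice9* device)
{
    IDirect3DQuery9* query = NULL;
    if (FAILED(device->CreateQuery(D3DQUERYTYPE_OCCLUSION, &query)))
        return true; // queries unsupported: fall back to always drawing

    query->Issue(D3DISSUE_BEGIN);
    DrawBoundingMesh(device);      // re-render the bounding mesh inside the query
    query->Issue(D3DISSUE_END);

    // Spin until the result is ready (for illustration only).
    DWORD visiblePixels = 0;
    while (query->GetData(&visiblePixels, sizeof(DWORD), D3DGETDATA_FLUSH) == S_FALSE)
        ;

    query->Release();
    return visiblePixels > 0;
}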


reference: Occlusion Culling Using DirectX 9, http://www.gamedev.net/reference/programming/features/occlusionculling/
CodeSampler: http://www.codesampler.com/dx9src/dx9src_7.htm#dx9_occlusion_query
Direct3D 9: http://msdn.microsoft.com/en-us/library/bb147308%28VS.85%29.aspx
Image-Based Occlusion Culling: http://cggmwww.csie.nctu.edu.tw/~danki/myweb/projects/hom/index.html  
GPU Gems2: Chapter 6. Hardware Occlusion Queries Made Useful http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter06.html

Sunday, June 13, 2010

Shadow Application In A Soccer Game


Maybe you will wonder, looking at this snapshot, how so many shadows are presented in one frame.

There is a trick: the gorgeous shadow result doesn't always appear. In the usual gameplay situation it only looks like the next snapshot.
In that snapshot there is no self-shadowing at all; the shadow approach looks like planar shadows.

During the replay clips, planar shadows and shadow maps are used together. In the planar shadow phase, all shadows are rendered into one render target, and then the shadow map is prepared on another render target (sized 1024x2048 - why, I still don't understand). In the final rendering pass, the planar shadows are applied by mapping that render target onto a rectangle; as for the shadow map, the shadow texel is fetched in each object's pixel shader.

Monday, June 7, 2010

Order-Independent Transparency



Depth sorting alpha blended objects
http://blogs.msdn.com/b/shawnhar/archive/2009/02/18/depth-sorting-alpha-blended-objects.aspx

Order Independent Transparency with Dual Depth Peeling
http://developer.download.nvidia.com/SDK/10/opengl/src/dual_depth_peeling/doc/DualDepthPeeling.pdf

Interactive Order-Independent Transparency
http://developer.nvidia.com/object/Interactive_Order_Transparency.html

Alpha blending without sorting
http://www.gamedev.net/community/forums/topic.asp?topic_id=405755

Depth-sort based Alpha Blending
http://www.opengpu.org/bbs/archiver/?tid-422.html

Sort-Independent Alpha Blending
http://0rz.tw/DSYnW

Order-independent transparency
http://www.wolfgang-engel.info/blogs/?p=96

Sunday, June 6, 2010

Current shadow map method

  1. Switch render targets (see the sketch after this list). Keep the current pass's color surface and depth surface, then set the shadow map's color surface and depth surface, so that the frame buffer data ends up in the screen-space texture - the shadow map.
  2. We use an R32F color map, sized 1024x1024, to store the shadow texture. DX9 doesn't allow sampling the depth texture directly, but graphics card vendors provide ways to fetch it ( Nvidia: hardware shadow maps, PCF 2x2; ATI: Fetch4 ), although that may produce warnings in the DX9 debug runtime. If we can fetch the depth buffer as a texture, we can disable color writes and use depth bias to solve depth fighting, which effectively decreases the cost of the shadow map pass.
  3. In the second rendering pass, fetch the shadow map and apply PCF 2x2 to blur the aliasing.
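The following C++ sketch fleshes out step 1 under some assumptions: the shadow map texture was created elsewhere with D3DUSAGE_RENDERTARGET and D3DFMT_R32F, its depth-stencil surface already exists, and error handling is omitted.

#include <d3d9.h>

void RenderShadowMapPass(IDirect3DDevice9* device,
                         IDirect3DTexture9* shadowMapTex,   // D3DFMT_R32F, 1024x1024 render target
                         IDirect3DSurface9* shadowDepthStencil)
{
    IDirect3DSurface9* oldColor = NULL;
    IDirect3DSurface9* oldDepth = NULL;
    device->GetRenderTarget(0, &oldColor);            // keep the current pass surfaces
    device->GetDepthStencilSurface(&oldDepth);

    IDirect3DSurface9* shadowColor = NULL;
    shadowMapTex->GetSurfaceLevel(0, &shadowColor);

    device->SetRenderTarget(0, shadowColor);           // redirect to the shadow map
    device->SetDepthStencilSurface(shadowDepthStencil);
    device->Clear(0, NULL, D3DCLEAR_TARGET | D3DCLEAR_ZBUFFER,
                  D3DCOLOR_XRGB(255, 255, 255), 1.0f, 0);

    // ... render the scene from the light's point of view, writing depth into R32F ...

    device->SetRenderTarget(0, oldColor);               // restore the previous pass
    device->SetDepthStencilSurface(oldDepth);

    shadowColor->Release();
    oldColor->Release();
    oldDepth->Release();
}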
ftp://download.nvidia.com/developer/presentations/2004/GPU_Jackpot/Shadow_Mapping.pdf

http://msdn.microsoft.com/en-us/library/ee416324%28VS.85%29.aspx

Matrix multiplication in HLSL

In HLSL we can use float constants to pass values into the shader code. There is one thing that confused me: do the row-major and column-major rules give the same result?

When we use SetConstantF to fill the registers, the shader assembly records them as float4 vectors. At processing time those vectors are combined with your variable through dot products. So when you use mul( pos, WorldMtx ); and
WorldMtx is:
{
_00, _01, _02, _03,
_10, _11, _12, _13,
_20, _21, _22, _23,
_30, _31, _32, _33
}
WorldMtx needs to be transposed on the CPU side, and the computation is:
Output.x = dot( vector(pos.xyz, 1.0f), vector(_00,_10,_20,_30) );
Output.y = dot( vector(pos.xyz, 1.0f), vector(_01,_11,_21,_31) );
Output.z = dot( vector(pos.xyz, 1.0f), vector(_02,_12,_22,_32) );

Or you can use mul( WorldMtx, pos ); without transposing WorldMtx. In that case the result is:
Output = pos.x * vector(_00,_01,_02,_03)
       + pos.y * vector(_10,_11,_12,_13)
       + pos.z * vector(_20,_21,_22,_23)
       + 1.0f  * vector(_30,_31,_32,_33);

Although the result is the same, you can see that the second form expands differently in the shader code (a multiply plus a chain of multiply-adds instead of four dot products).
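As a CPU-side companion to the two conventions above, here is a minimal D3D9/D3DX sketch; the constant register indices (c0 and c4) are arbitrary choices for the example.

#include <d3d9.h>
#include <d3dx9.h>

void UploadWorldMatrix(IDirect3DDevice9* device, const D3DXMATRIX& world)
{
    // Convention A: shader does mul(pos, WorldMtx) -> transpose on the CPU
    // so each constant register receives a column of the original matrix.
    D3DXMATRIX transposed;
    D3DXMatrixTranspose(&transposed, &world);
    device->SetVertexShaderConstantF(0, (const float*)&transposed, 4);

    // Convention B: shader does mul(WorldMtx, pos) -> upload as-is, each
    // register receives a row of the original matrix.
    device->SetVertexShaderConstantF(4, (const float*)&world, 4);
}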

Saturday, June 5, 2010

Such a Dark Meeting

Yesterday's meeting left me feeling a bit disheartened. Maybe this is just how society works... someone gets judged a poor fit and is simply removed from around you. If your everyday performance can't earn trust, and you don't keep recharging yourself with new knowledge, people will find fault with you. Sitting there listening, I felt I should take it as a warning myself: be faster, absorb more new things, or I could be replaced at any time by the fresh blood that comes after me...

SSAO - Screen Space Ambient Occlusion

For the SSAO approach, you can refer to these:
http://mtlung.blogspot.com/2008/09/ssao.html
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter14.html

Friday, June 4, 2010

Hardware Occlusion Queries

The GPU provides a query mode to test whether pixels would actually be drawn. In the previous frame, render only the model's bounding box and query whether it is really visible in the view. Then we render only the objects whose bounding boxes are inside the clip space.
But sometimes this method produces more draw calls, and you need to wait a few frames for the results, so it has some overhead.
http://www.gamedev.net/community/forums/topic.asp?topic_id=377484

http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter06.html

http://cggmwww.csie.nctu.edu.tw/~danki/myweb/projects/hom/index.html

Fast and Simple Occlusion Culling using Hardware-Based.pdf

Occlusion Culling Using DirectX 9
http://www.gamedev.net/reference/programming/features/occlusionculling/

Wednesday, June 2, 2010

Websites introducing shaders

HLSL Shaders
http://knol.google.com/k/hlsl-shaders#

Introduction to Shader Programming
http://www.gamedev.net/columns/hardcore/dxshader1/
http://www.gamedev.net/columns/hardcore/dxshader2/
http://www.gamedev.net/columns/hardcore/dxshader3/
http://www.gamedev.net/columns/hardcore/dxshader4/
http://www.gamedev.net/columns/hardcore/dxshader5/

Sunday, May 30, 2010

Shadow Mapping and Shadow Volumes

http://www.devmaster.net/articles/shadow_techniques/

Abstract

In recent years, both Williams’ original Z-buffer shadow mapping algorithm [Williams 1978] and Crow’s shadow volumes [Crow 1977] have seen many variations, additions and enhancements, greatly increasing the visual quality and efficiency of renderings using these techniques. Additionally, the fast evolution of commodity graphics hardware allows for a nearly complete mapping of these algorithms to such devices (depending on the GPU’s capabilities) which results in real-time display rates as seen in the real-time version of Pixar’s Luxo Jr and the use of hardware shadow maps therein. In this article, we describe the major contributions since Williams’ and Crow’s original publications in 1978 and 1977 respectively, briefly present both the shadow mapping (which is computed in image space) and the shadow volume algorithms, present more sophisticated approaches to shadow mapping which are better suited to high quality off-line renderers and describe the aliasing problems inherent in all shadow algorithms which operate in image space (and proposed solutions). Finally, we describe new extensions to the existing algorithms such as perspective shadow maps as described by [Stamminger and Drettakis 2002] in the 2002 SIGGRAPH conference, and robust stenciled shadow volumes by Mark Kilgard [Everitt and Kilgard 2002].

Saving the shadow map as a texture

http://www.gamedev.net/community/forums/topic.asp?topic_id=180931
When the shadow map produced in the first pass has a format like R32F, it has to be converted to an ARGB8 format before it can be saved.

http://www.gamedev.net/community/forums/topic.asp?topic_id=534392&whichpage=1&#3453499

This tells me that DX9 has no way to fetch the depth texture directly through the surface (the Xbox 360 apparently can), unless you use Nvidia's PCF or ATI's Fetch4, but both of those make the D3D debug runtime report it as invalid.

http://www.gamedev.net/community/forums/topic.asp?topic_id=566495
This one says OpenGL can get the depth texture through an FBO, but D3D doesn't support this until 10.0; in the end it suggests that if you need to work across different APIs you should still render an R32F color texture.

For now the simplest and cheapest method is still an R32F color map.

http://developer.nvidia.com/object/hwshadowmap_paper.html

Hardware Shadow mapping - Nvidia

Notes on render targets

Quoted from: http://blog.csdn.net/xoyojank/archive/2009/02/11/3876843.aspx

1. Setting a render target causes the viewport to become the same size as the render target.

2. The anti-aliasing type must match the depth-stencil buffer's.

3. The render target's format must be compatible with the depth-stencil buffer's format; you can check this with IDirect3D9::CheckDepthStencilMatch.

4. The depth-stencil buffer's size must be >= the render target's size.

5. Slot 0 of IDirect3DDevice9::SetRenderTarget cannot be NULL.

6. A texture with usage D3DUSAGE_RENDERTARGET cannot be anti-aliased, and its pool must be D3DPOOL_DEFAULT. If you want to use a render target as a texture and still have anti-aliasing, first render the scene to a surface created with CreateRenderTarget (or to the back buffer), then copy it into the texture with IDirect3DDevice9::StretchRect.

7. D3DX provides ID3DXRenderToSurface, which simplifies using render targets. Note that its BeginScene/EndScene pair cannot be nested with IDirect3DDevice9's functions of the same name, because internally it still calls IDirect3DDevice9 (you can see exactly which calls it makes with PIX). Also, this interface still cannot anti-alias, and it saves/restores a pile of state every time, which never feels great.

8. A render-target texture cannot be both an input and an output at the same time; some cards may only give a warning, while others produce errors, black screens, crashes, or other unpredictable behaviour... Depth-stencil textures (see hardware shadow maps) have the same problem: after using one, clear it with SetTexture(n, NULL), otherwise ATI cards can give black or corrupted screens and wrong depth. Even if you aren't actually sampling it, as long as it is still referenced by a register the card treats it as in use, and then it can't serve as the depth-stencil buffer.

9. If you want to save a render-target texture to a file, you can't save it directly. You need to create an offscreen surface, copy into it, then save. Nvidia cards don't seem to support DXT1 offscreen surfaces, though; you can create a texture and use its level-0 surface instead.

10. On Nvidia cards, once anti-aliasing is enabled it seems every render-target texture has to be anti-aliased too, otherwise the depth test fails -_-

11. On Intel graphics, if a render-target texture has no depth buffer set, all drawing may fail the depth test; you need to disable the depth test before drawing.

Master Ip Won~~~

I watched Ip Man 2 today. Even though his opponent broke the rules by throwing punches after the bell, and he was forbidden from using his kicks - under all of those unfair conditions he could still show his determination, and that is what lets other people see your worth.

Sunday, May 16, 2010

I Just Want to Focus on Writing My Shaders...

When my work is entangled with other people's, I get pushed to handle this or that first, or else someone's screen won't look right yet. I know saying this may sound irresponsible - we are an engine under development, it's normal that the picture isn't quite right at the moment, and I will deal with it as fast as I can... A lot of the engine's architecture is still being decided, and I really want to finish the low-level parts early, otherwise other people writing shaders on our engine end up having to touch the underlying structures... Sometimes I just want to hide somewhere and concentrate on writing shaders. Lately, supporting lighting has meant endlessly splitting shaders apart, and it's wearing me out... will every new effect from now on require building 14 shaders?

Sunday, May 9, 2010

John Carmack GDC 2010 Lifetime Award


John Carmack is a legendary figure in PC game history. As his technology advanced, PC gaming hardware specs evolved along with it, and many rendering terms were even named after his achievements.

To learn about J.C. and id Software, you can read their biography Masters of Doom, or read the introduction on the Monkey Potion blog.

Masters of Doom: the legendary heroes of the game industry, John Carmack & John Romero
http://blog.monkeypotion.net/book/gamedevbook/masters-of-doom-2

Rage PC Games Interview - John Carmack Interview

Rage at IGN.com

I don't need to say much about John Carmack's legend; the fact that he received the Lifetime Achievement Award at GDC 2010 says enough.

This time he unveiled Rage, developed on his new engine, id Tech 5, which uses MegaTexture. Every generation of id's engines has introduced something new: tech 1 used BSP and PVS for its scene structure, tech 2 added 3D hardware acceleration and lightmaps, tech 3 brought fixed-function shader scripts and dynamic shadows, and tech 4 supported shader models, provided per-pixel lighting and bump mapping, and adopted skeletal animation, and so on.
http://en.wikipedia.org/wiki/Id_Tech

Saturday, May 8, 2010

Accepted by a Journal~~~

After two years of waiting, the thesis from my master's studies has finally been accepted by an SCI-level journal~~~

Alan Wake -Building the Technology- Trailer


Alan Wake, an Xbox 360 exclusive released in March 2010, was developed by Remedy Entertainment, a studio based in Finland. Founded in 1995, the team has 15 years of development experience; their recent works are the Max Payne series.
http://www.remedygames.com/

The engine for this title was developed in-house, because the engines on the market could not satisfy their lighting and technical requirements.

Tuesday, May 4, 2010

Did I Get My Shader Architecture Wrong?...

Lately, going to work feels less and less like writing shader code and more like building shader code. I really didn't want to split my shaders into many pieces, but with the feature requirements growing, not splitting is no longer an option... these days I split them into seven or eight pieces and rebuild every day...

I also think the script format I defined around the shaders may not have been very thorough. Today the person building the related tools complained to me: "I told you to define it more completely from the start; now the whole tool UI may have to be changed." I know that was him being quite harsh with me, but back then it was my first time doing this and I didn't have the experience to make it that rigorous... Well~~~ I hope after this prototype we can find all the problem points and fix them in one go @@~

Friday, April 30, 2010

People Good at Math Are More Likely to Earn High Salaries

Over the past year, the five highest-paying mid-level jobs on average were mathematician, actuary, software engineer, computer systems analyst, and statistician, with average annual salaries above 730,000 US dollars. To get into these fields, however, excellent math ability is a must.

The Wall Street Journal noted in particular that while the financial industry has been laying people off and cutting pay, mathematicians and actuaries have nothing to fear, because right now companies need them to precisely estimate losses and calculate risk and return, so they sit securely among the highly paid.

A good mathematical mind going into finance, software, engineering, and similar fields is a guarantee of a high salary in the future.

http://tw.news.yahoo.com/article/url/d/a/100430/122/24tno.html

.....Isn't 730,000 US dollars a bit much? I suspect something is wrong with that number.

The Boss Treated Us to Lunch

Today the boss said the product we made has passed release certification in North America, so he treated us to lunch at noon. I'm so full~~ I actually only helped on this product for two weeks, so my personal contribution was limited; this result is thanks to my colleagues' hard work. Well done, everyone!!

Shaders using a register constant array index

Today I wondered whether using a register constant array index influences performance. I designed my experiment around a three-point-light shading shader:
  1. The light color, intensity, and position are each set into the shader through three separate floating-point constants.
  2. The color, intensity, and position are passed as arrays of size three for the shader computation.
In the experiment, both versions run at the same frame rate of 2725 FPS. So in my experiment, using a register constant array index has no performance penalty.
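For reference, a minimal D3D9 sketch of the array variant from the CPU side; the register index c0 and the layout (xyz plus one float of padding per light) are assumptions for the example.

#include <d3d9.h>

void UploadLightPositionArray(IDirect3DDevice9* device,
                              const float lightPositions[3][4]) // xyz + padding per light
{
    // Three float4 registers starting at c0: lightPos[0], lightPos[1], lightPos[2],
    // so the shader can index them as an array.
    device->SetPixelShaderConstantF(0, &lightPositions[0][0], 3);
}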

Thursday, April 29, 2010

An experiment with the if-condition instruction in shader programming

Today I ran an experiment on the if-else syntax in shader code and how it influences application performance.
  1. The first case uses three shaders - one point light, two point lights, and three point lights - applied to three models with different surfaces.
  2. The second case uses a single shader containing three if-condition instructions to select among the three light types, applied to the same three models as above.
The results show the first case runs at 2639 FPS and the second at 1762 FPS. Although the second case shares one shader and avoids resetting the shader on the device, the if instructions hurt performance, and they hurt more than resetting the shader does. So we should avoid if instructions wherever possible.

I Really Want to Buy ShaderX7

Lately I've been researching how to plan the lighting system and how to implement the shadow map algorithm, and this book happens to cover exactly the topics I want to explore. If anyone has bought it, could I borrow it to flip through?... I'm a bit broke lately.

Sunday, April 25, 2010

Shadow map development (I)

A shadow map can help us generate a full-screen shadow result and achieve a natural self-shadowing effect. But its weak points are hard shadows and resolution problems.
First, I want to use this DX9 sample to describe the shadow map implementation. Shadow mapping is a multi-pass technique, so I describe it pass by pass.
  1. In the first pass, we generate the shadow map. Store the current render target and depth-stencil, set the render target to the shadow map texture, transform from view space into the light's view space (including its projection space), write the depth value into the render target, and finally restore the previous render target and depth-stencil.
  2. In the final pass, render the scene. Use the camera's view and projection spaces, set the shadow map texture, and generate the shadow map matrix, which transforms from view space into light space (a sketch of building this matrix follows the list). In the pixel shader we can then sample our shadow map texel in shadow map space, use PCF (percentage-closer filtering) to blur it, and combine that texel with the diffuse color to produce the output color.
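Here is a minimal D3DX sketch of building that shadow map matrix: light view times light projection times a scale-bias that maps clip space into texture space. The light parameters and FOV are placeholders, and if your pixel positions are in camera view space you would multiply by the inverse camera view matrix first.

#include <d3dx9.h>

D3DXMATRIX BuildShadowMapMatrix(const D3DXVECTOR3& lightPos,
                                const D3DXVECTOR3& lightTarget)
{
    D3DXVECTOR3 up(0.0f, 1.0f, 0.0f);

    D3DXMATRIX lightView;
    D3DXMatrixLookAtLH(&lightView, &lightPos, &lightTarget, &up);

    D3DXMATRIX lightProj;
    D3DXMatrixPerspectiveFovLH(&lightProj, D3DX_PI / 4.0f, 1.0f, 1.0f, 1000.0f);

    // Map x,y from clip space [-1,1] into texture space [0,1] and flip y.
    D3DXMATRIX texScaleBias(0.5f,  0.0f, 0.0f, 0.0f,
                            0.0f, -0.5f, 0.0f, 0.0f,
                            0.0f,  0.0f, 1.0f, 0.0f,
                            0.5f,  0.5f, 0.0f, 1.0f);

    return lightView * lightProj * texScaleBias;
}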

NVemulate

NVemulate allows you to emulate the functionality of various GPUs (very slowly) in software. In addition, you can use it to control GLSL Support and Open GL 3.0 Support.

http://developer.nvidia.com/object/nvemulate.html

Dynamic Branching in Shader

Static branching vs. dynamic branching:
  • The static kind switches a code path on or off based on a Boolean shader constant (a small CPU-side sketch follows below). Between draw calls you can decide which features you want to support and set the Boolean flags accordingly.
  • The dynamic kind keeps the comparison condition in a variable and evaluates it for each vertex or each pixel at run time (not at compile time or between two draw calls). The performance hit is the cost of the branch plus the cost of the instructions on the taken side of the branch. It is implemented in shader model 3.0 or higher.
http://msdn.microsoft.com/en-us/library/bb944006%28VS.85%29.aspx
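A minimal D3D9 sketch of driving such a static branch from the application: a Boolean constant register flips a feature path between draw calls. The register index b0 and the fog example are assumptions for illustration.

#include <d3d9.h>

void EnableFogPath(IDirect3DDevice9* device, bool enableFog)
{
    BOOL flag = enableFog ? TRUE : FALSE;
    // The pixel shader reads this as a bool constant, e.g. "if (bEnableFog) ...".
    device->SetPixelShaderConstantB(0, &flag, 1);
}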

That thread discusses a case where GLSL on an Nvidia GPU, running a double loop that does an 8x8 branching computation, drops to 2 fps. After unrolling the loop it goes back to 30 fps.
So the first guess was the GPU's branching ability, but it turned out that using a constant register index inside the array index was the real performance killer; storing the array contents in a texture, or switching to a uniform array of vec3's, both fixed it.
http://www.gamedev.net/community/forums/topic.asp?topic_id=559196


This thread discusses how using an if condition may end up executing both branches, because of how the compiler handles it; that costs more than manually splitting the shader code so the flow follows a single branch.
(result = cmp( condition, result_a, result_b ); the compiler may emit the cmp asm instruction instead of an if-else)
http://www.gamedev.net/community/forums/topic.asp?topic_id=474592

Use a static branch rather than dynamic branching whenever you can. A static branch requires the condition to be a Boolean constant; avoid using a variable, otherwise it becomes dynamic branching. Using ? : can compile to a single instruction, instead of the extra instructions an if-else produces.
http://www.microsoft.com/downloads/en/confirmation.aspx?familyId=74db343e-e8ff-44e9-a43e-6f1615d9fce0&displayLang=en

When the asm shows no dynamic branching and uses the cmp instruction instead, you can try the following:
  • When texturing inside a branch, you have to use tex2Dlod rather than tex2D
  • Pass D3DXSHADER_PREFER_FLOW_CONTROL when you call D3DXCompileShader... and of course specify ps_3_0

Sunday, April 11, 2010

The first article

  This is my first article. In this space I want to record the programming experience I pick up, keeping a trace of my progress, little by little~~~