Posts

Showing posts from 2010

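This year's engine development is finished

This week, the engine work planned for this year is being reviewed and signed off item by item. This system went from nothing when I joined the company to having every planned technique implemented, and I went from a fresh graduate who knew nothing to someone who can gradually work independently. I really did grow a lot through this process.

But I can't see who this engine is for, or which project it can help. That is what discourages me... I worked overtime three or four days a week for it. All I wanted was to show Taiwan our technical ability and to ride this ship out to challenge the world... but the ship never managed to set sail. I feel rather powerless, and my passion is being put out bit by bit. Now anything is fine as long as it exists; I can no longer push myself to do better, and I no longer push my colleagues either. As long as everyone is happy, that's enough...

Whatever next year's new goal is, whether we build another new ship or stop building ships at all, and whatever the outcome, I have still made myself into a shipwright. Thank you to the teammates who worked hard alongside me.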

Cascaded shadow map implementation done

Image
This week I finished the cascaded shadow map implementation. I use two splits to store the shadow map. The fine split gives a sharp result for nearby objects; objects far enough from the viewer are written to the rough texture instead. The fine texture is about 1024x1024 and the rough texture can be 512x512. Beyond the far split distance, nothing is written to either texture.

I provide splitlambda to adjust how resolution is distributed between the fine and rough textures, and splitscale to handle the case where texels are close to the near plane.
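A minimal sketch of how the split distances could be derived, assuming splitlambda plays the role of the blend weight in the "practical split scheme" used by parallel-split shadow maps (a blend between logarithmic and uniform splits). The function name and parameters are illustrative and are not taken from the engine.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Returns the far distance of each cascade for a view range [nearZ, farZ].
// lambda = 1 gives purely logarithmic splits, lambda = 0 purely uniform ones.
// Assumption: this is what splitlambda controls; the engine may differ.
std::vector<float> computeSplitDistances(float nearZ, float farZ,
                                         int cascadeCount, float lambda) {
    assert(nearZ > 0.0f && farZ > nearZ && cascadeCount > 0);
    std::vector<float> splits;
    for (int i = 1; i <= cascadeCount; ++i) {
        float t = static_cast<float>(i) / static_cast<float>(cascadeCount);
        float logSplit = nearZ * std::pow(farZ / nearZ, t);  // logarithmic scheme
        float uniSplit = nearZ + (farZ - nearZ) * t;         // uniform scheme
        splits.push_back(lambda * logSplit + (1.0f - lambda) * uniSplit);
    }
    return splits;
}
```

With two cascades, the first returned value would be the boundary between the fine and the rough texture. A larger lambda pulls that boundary toward the camera, so the fine texture covers a smaller range at higher effective resolution.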


Perspective view - shows a slight artifact

Orthogonal view - reduces the artifact

web plugin default resource folder

IE 8.0
your desktop

Google chrome
C:\Documents and Settings\Administrator\Local Settings\Application Data\Google\Chrome\Application\8.0.552.215

Mozilla  Firefox
C:\Program Files\Mozilla Firefox

The Windows program c:\windows\regsvr32.exe is used to register and unregister ActiveX DLLs and OCXs. (An OCX file is merely a renamed DLL file.) Run regsvr32 as follows:
Register: regsvr32 dllFileName.dll
Unregister: regsvr32 /u dllFileName.dll

For an NPAPI plugin in a Chrome extension, the DLL is listed in the extension manifest:
"plugins": [ { "path": "your_npapi_plugin.dll" } ],

and accessed from the page script:
<script>var plugin = document.getElementById("MyNPAPIPluginId"); ...</script>

http://stackoverflow.com/questions/9392536/developing-chrome-extensions-using-npapi-in-c
http://www.firebreath.org/display/documentation/Deploying+and+updating+your+plugin

tex2D vs. tex2Dproj

texCoord( texX, texY, texZ, texW ) is the texture coordinate after the position has been transformed by the world, view, projection, and texture matrices.


We can use texCoord to fetch the texture in two ways:
1.  float4 color = tex2D( sampler, float2( texX / texW, texY / texW ) );
2.  float4 color = tex2Dproj( sampler, float4( texX, texY, texZ, texW ) );

Both methods give the same result, because tex2Dproj performs the divide by w itself before the lookup.
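The divide can be checked on the CPU. This is a minimal C++ sketch, not shader code; Float2, Float4 and projectedUV are names made up for the illustration. It only computes the coordinate that both fetches above end up sampling.

```cpp
#include <cassert>

struct Float2 { float x, y; };
struct Float4 { float x, y, z, w; };

// The 2D coordinate that a projective lookup samples at: xy divided by w.
// Method 1 does this divide by hand, method 2 leaves it to tex2Dproj.
Float2 projectedUV(const Float4& texCoord) {
    assert(texCoord.w != 0.0f);  // the divide is undefined for w == 0
    return { texCoord.x / texCoord.w, texCoord.y / texCoord.w };
}
```

For example, a texCoord of (2, 4, 0, 4) samples the texture at (0.5, 1.0) with either method.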


http://www.gamedev.net/community/forums/topic.asp?topic_id=408894
http://bbs.gameres.com/showthread.asp?threadid=104316

Cascaded Shadow Map

The d3d example in browsers - IE

Image

An OpenGL Sample as Firefox Plugin

Volume Texel

Image
In the OGRE engine, this needs a 3D texture and is implemented with the fixed-function pipeline.

material Examples/VTDarkStuff
{
    technique
    {
        pass
        {
            diffuse 0.0 0.0 0.0
            ambient 0.1 0.1 0.15
            cull_hardware none
            lighting on
        }

    }
}

Volumetric Fog

MRT format

I compared render target formats in deferred lighting. D3DFMT_A8R8G8B8 is faster than D3DFMT_A16B16G16R16F, so prefer it when precision is not an issue. In my experiment the result was 530 vs. 487 FPS.

PS3 Cell Processor

Image

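Thankful... broke through another bottleneck

I can be a pessimistic person sometimes. When I hit a bug I blame myself for being bad at math and bad at English, and start to doubt whether I am cut out for this kind of work... Things that others may understand quickly take me more time than them...

But today I took work home and solved a bug that had cost me nearly a week, and gained another experience point. I don't know whether I just got lucky again~~~ It proves I can keep going. Keep it up!!! Tomorrow overseas clients are coming for an exchange, and I have been told I must speak up and ask questions! So I will have to bring out my poor spoken English in front of everyone. I hope it passes; I really need to find time for proper lessons...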

Deferred lighting

Image
Today I finished debugging my deferred lighting. Some notes from the experience:
In the geometry pass, I prepare 4 render targets:
- depth map (view position and depth; actually we can try storing only the depth value) - removed, we can just save clip space depth (z/w) instead
- normal map (view space normal) + clip space depth (z/w)
- diffuse map (diffuse color)
- specular map (specular color and shininess)
Draw lights:
- First, draw the ambient light: fetch the diffuse texture and combine it with the ambient light, fetch the depth value from the depth map, and write to the frame buffer and depth buffer. If you have a z-prepass, you needn't rewrite depth to the z buffer.
- Second, for the sun light, draw a full-screen rectangle (in clip space coordinates) and compute the directional lighting, multiplying the diffuse and specular maps with the lighting formula. Enable additive alpha blending (one plus one) and disable depth write.
- Third, for an omni light, draw the light volume (a sphere) and use the point lighting formula. Enable additive alpha blending (one plus one), use front-face…

D24S8 depth sampling wrong

Deferred Rendering

Mac OS VMware

Image
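To evaluate whether I want to buy a MacBook in the future, I decided to install Mac OS on my PC first and try it out. The most convenient way is through VMware; detailed installation steps are at the link below:
http://junclj.blogspot.com/2010/02/vmware-workstation-70mac-os-x-snow.html

Note that installers that need to run on the Mac side, such as the audio driver, have to be placed in the VMware shared folder configured on the PC side.
In the end the network worked without any configuration, which really surprised me!! Music and video play as long as the file format is supported. The pity is that VM performance is limited, so playback still can't keep perfectly in sync... Below is a screenshot of the successful setup: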

Game Rendering Papers

Dive Into Python 3

http://diveintopython3.org/

The web version of Dive Into Python 3

reference to pointer

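// Version 1: the callee allocates. The pointer is passed by reference so that
// the caller's pointer is updated. (sFOO is assumed to have a char array member foo.)
void GetData( sFOO *&fooIn )
{
    const char *pCH = "ABC";
    sFOO *lPoo = new sFOO();

    memcpy( lPoo->foo, pCH, strlen(pCH) + 1 );  // + 1 also copies the terminating '\0'

    fooIn = lPoo;   // the caller now owns the object and must delete it
}


int _tmain(int argc, _TCHAR* argv[])
{
    sFOO  *lpFOO = XE_NULL;

    GetData( lpFOO );
    delete lpFOO;
    return 0;
}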

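// Version 2: the caller owns the storage. The callee fills a local object
// and copies it into the object the caller's pointer refers to.
void GetData( sFOO *&fooIn )
{
    const char *pCH = "ABC";
    sFOO lPoo;// = new sFOO();

    memcpy( lPoo.foo, pCH, strlen(pCH) + 1 );  // + 1 also copies the terminating '\0'

    *fooIn = lPoo;  // copies into the caller's object; fooIn must not be null
}

int _tmain(int argc, _TCHAR* argv[])
{
    sFOO    Foo;
    sFOO  *lpFOO = &Foo; //= XE_NULL;

    GetData( lpFOO );
    return 0;
}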

The Mono runtime

The Mono runtime engine provides a Just-in-Time compiler (JIT) and an Ahead-of-Time compiler (AOT).

Mono has both an optimizing just-in-time (JIT) runtime and an interpreter runtime. The interpreter runtime is far less complex and is primarily used in the early stages before a JIT version for that architecture is constructed. The interpreter is not supported on architectures where the JIT has been ported.

The idea is to allow developers to pre-compile their code to native code to reduce startup time, and the working set that is used at runtime in the just-in-time compiler.

When an assembly (a Mono/.NET executable) is installed in the system, it is then possible to pre-compile the code, and have the JIT compiler tune the generated code to the particular CPU on which the software is installed.

The code produced by Mono's ahead-of-time compiler is Position Independent Code (PIC) which tends to be a bit slower than regular JITed code, but what you lose in perfor…

Computer Language Benchmarks Game

Post-Build copy DLL to directory

First, write our batch file, build.bat:
XCOPY ..\..\folder\filename %1 /S /Y
@pause
Put it next to the project file. Then, in the MSVC project properties, enter the following in the Post-Build Event:
call ./build.bat $(OutDir)
This copies our .dll next to the executable.

Batch tutorial
http://bbs.nsysu.edu.tw/txtVersion/boards/msdos/M.1078700757.A.html

Batch commands
http://ca95.pixnet.net/blog/post/3922827

Creating a Batch File to copy a directory
http://en.kioskea.net/forum/affich-30405-creating-a-batch-file-to-copy-a-directory

Pre-build Event/Post-build Event Command Line Dialog Box
http://msdn.microsoft.com/en-us/library/42x5kfw4%28VS.80%29.aspx

D3D instancing

Release Bug...

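I think this is already the third time I have hit this: everything is fine in debug, but the release build hits unpredictable errors and then the program won't close... For now my only guess is a memory overwrite. I hope to solve it soon @@

It turned out to be the optimization options in the dll and exe project settings (fiber-safe optimizations, whole program optimization). In the projects I have at hand, I can't give the exe, dll, and slib the same optimization parameters. Some day I really need to understand what those options mean.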

Template in .cpp file

- The compiler uses template classes to create types by substituting template parameters; this process is called instantiation.

- A type that is created from a template class is called a specialization.

//---------------------------------------------------------------
// in main.cpp
#include "temp.h"

int main()
{
    Foo<int> A;

    Foo<int> B = A;   // uses the copy constructor defined inline in temp.h


    return 0;
}

// in temp.h
template < typename T >
class  Foo
{
public:
    Foo() ;
    ~Foo() ;

    Foo( const Foo &rhs )
    {
        int test = 0 ;
    }
};

// #include "temp.cpp"   // inclusion-model alternative; not needed with the explicit instantiation below

// in temp.cpp
#include "temp.h"

template < typename T >
Foo<T>::Foo()
{
   int test = 0;
}

template < typename T >
Foo<T>::~Foo()
{
    int test = 0;
}

template class Foo<int> ;// explicit instantiation
//---------------------------------------------------------------


With this approach, we don't have huge headers, and hence the build time will drop. Also, the header files will be "cleaner" and more readable. However, we don't have the benefits of lazy …

Shader Model 5.0

With Shader Model 5, Microsoft applies certain concepts of object-oriented programming to its shader language, HLSL. Unlike preceding versions, which introduced new capabilities (Dynamic Branching, integer support, etc.)

Increase in maximum texture size from 4K x 4K to 16K x 16K and the possibility of limiting the number of mipmaps loaded in VRAM. There’s also the possibility of changing the depth value of a pixel without disabling functionality like early Z checking, support for double-precision floating-point types, scatter memory writes, etc.


Reference: http://www.tomshardware.com/reviews/opengl-directx,2019-9.html
GDC 2009, http://cmpmedia.vo.llnwd.net/o1/vault/gdc09/slides/100_Handout%206.pdf

Skybox Render

Skyboxes are often frame-buffer-bandwidth bound. There are two ways to optimize them:
(1) Render them last, reading (but not writing) depth, and allow the early-z optimizations along with regular depth buffering to save bandwidth.
(2) Render the skybox first, and disable all depth reads and writes.
Which option will save you more bandwidth is a function of the target hardware.

If a large portion of the skybox is obscured, the first technique will likely be better; otherwise, the second one may save more bandwidth.

reference: http://http.developer.nvidia.com/GPUGems/gpugems_ch28.html

GPU Gems

Normal Maps

Image
Bump map
Uses grey texels to record bump height, which is then converted to a normal. Bump maps require additional information describing how the bump intensity range maps to global space distance units in order to be converted to a normal map.

Bump map elevation

Object space normal map
Cheaper to compute than a tangent space normal map, but it only supports rigid meshes, can't support deformed meshes, and isn't suited to tiling or symmetric models.

Tangent space normal map
A little more expensive. It supports deformed meshes and everything that object space normal maps and bump maps can do, so it is the more flexible solution for an artist pipeline.

Left: tangent space. Right: bump. Both give the same result on the cube.
reference: Understanding Normal Maps, http://www.pixologic.com/docs/index.php/Understanding_Normal_Maps#Object_Space_Map_Uses
http://tech-artists.org/wiki/Normal_mapping

Calculating tangent space:
http://jerome.jouvie.free.fr/opengl-tutorials/Lesson8.php
http://www.terathon.com/c…

Using the Intel(R) MKL Memory Management

http://software.intel.com/sites/products/documentation/hpc/mkl/win/index.htm

Intel MKL has memory management software that controls memory buffers for the use by the library functions.

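Hello, Mr. Chairman

Today, for the first time, the chairman, who shares my birthday, remembered my name. The first time I was asked a question in the meeting I stammered... everything went dark before my eyes, which was really embarrassing. Luckily I gradually became fluent afterwards. I really do need more practice giving presentations...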

Detail map

Shader Special Concern Point

Shader Fog: http://randomchaosuk.blogspot.com/2007/07/shader-fog.html
In Shader 3.0 or higher, DirectX removed fog states: http://www.3dvia.com/forums/topic/how-to-use-fog-in-shader
SM3 Fog: http://xna-uk.net/blogs/randomchaos/archive/2007/10/15/generic-xna-sm3-fog.aspx
Fog Formulas: http://msdn.microsoft.com/en-us/library/bb324452%28VS.85%29.aspx

Texture Stage and Sampler States
A pixel shader completely replaces the pixel-blending functionality specified by the multi-texture blender including operations previously defined by the texture stage states. Texture sampling and filtering operations which were controlled by the standard texture stage states for minification, magnification, mip filtering, and the wrap addressing modes, can be initialized in shaders. The application is free to change these states without requiring the regeneration of the currently bound shader. Setting state can be made even easier if your shaders are designed within an effect.

Scene Graph and Hierarchical Clustering Structure

Game Engines List

Exposure Correction

Depth Bias

Image
To solve a Z-fighting issue, use depth bias.
An application can help ensure that coplanar polygons are rendered properly by adding a bias to the z-values that the system uses when rendering the sets of coplanar polygons. To add a z-bias to a set of polygons, call the IDirect3DDevice9::SetRenderState method just before rendering them, setting the State parameter to D3DRS_DEPTHBIAS, and the Value parameter to a value between 0-16 inclusive. A higher z-bias value increases the likelihood that the polygons you render will be visible when displayed with other coplanar polygons.
Offset = m * D3DRS_SLOPESCALEDEPTHBIAS + D3DRS_DEPTHBIAS
where m is the maximum depth slope of the triangle being rendered:
m = max(abs(delta z / delta x), abs(delta z / delta y))
reference: http://msdn.microsoft.com/en-us/library/bb205599%28VS.85%29.aspx

Z Slope Scale = 0.0, Depth Bias = 0.0

Z Slope Scale = 0.1, Depth Bi…
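The offset formula from this post can be written out as a small CPU-side function. This is only a sketch of the arithmetic, not of what the driver does internally; maxDepthSlope and depthOffset are illustrative names, and the depth deltas per pixel are passed in directly.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// m: maximum depth slope of the triangle, from the depth change per pixel in x and y.
float maxDepthSlope(float dzdx, float dzdy) {
    return std::max(std::fabs(dzdx), std::fabs(dzdy));
}

// Offset = m * slopeScaleDepthBias + depthBias
// (the two render state values D3DRS_SLOPESCALEDEPTHBIAS and D3DRS_DEPTHBIAS).
float depthOffset(float dzdx, float dzdy,
                  float slopeScaleDepthBias, float depthBias) {
    assert(std::isfinite(dzdx) && std::isfinite(dzdy));
    return maxDepthSlope(dzdx, dzdy) * slopeScaleDepthBias + depthBias;
}
```

A polygon facing the camera has a slope near zero, so only the constant bias applies. A polygon seen at a grazing angle has a large slope, which is where the slope scale term does most of the work.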

Occlusion Culling Using Direct3D 9.0

Image
In the view frustum culling scenario, the objects that pass culling are shown in the figure below:
But in the actual view, shown in the figure below, we only need to render three objects: the red, purple, and blue ones. The green and yellow ones are hidden.
In this situation we can use GPU hardware support. Direct3D provides an occlusion query that counts the number of visible pixels. If the count is zero, the object is fully occluded; if it is greater than zero, some of its pixels are visible to the viewer.

The IDirect3DQuery9 process is presented below:
1. Render every object's bounding mesh.
2. For every object:
   - Begin query
   - Re-render the bounding mesh
   - End query
   - Retrieve the occlusion query data. If the number of visible pixels is greater than zero, the object should be rendered. Otherwise, the object should be culled.

In the first step, we need to decide on the bounding mesh. A bounding box? A sphere?... The most suitable choice is a low-vertex-count mesh instead of a bounding box or sphere.
Bounding sphere may h…
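The two steps of the query process can be mimicked on the CPU to see why the count reaches zero for hidden objects. This is a toy sketch with a 1D depth buffer, not Direct3D code: Span, renderDepth and visiblePixels are hypothetical names standing in for the bounding mesh, the first depth pass, and the Begin/End query pair.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical 1D stand-in for a bounding mesh: covers pixels [x0, x1) at one depth.
struct Span { int x0; int x1; float depth; };

// Step 1: render every bounding span into the depth buffer (nearest depth wins).
std::vector<float> renderDepth(const std::vector<Span>& spans, int width) {
    assert(width >= 0);
    std::vector<float> depth(static_cast<std::size_t>(width),
                             std::numeric_limits<float>::max());
    for (const Span& s : spans) {
        assert(0 <= s.x0 && s.x0 <= s.x1 && s.x1 <= width);
        for (int x = s.x0; x < s.x1; ++x) {
            float& d = depth[static_cast<std::size_t>(x)];
            if (s.depth < d) d = s.depth;
        }
    }
    return depth;
}

// Step 2: the "query". Re-render one span with a less-or-equal depth test and
// count the pixels that pass, without writing depth.
int visiblePixels(const Span& s, const std::vector<float>& depth) {
    assert(0 <= s.x0 && s.x1 <= static_cast<int>(depth.size()));
    int count = 0;
    for (int x = s.x0; x < s.x1; ++x)
        if (s.depth <= depth[static_cast<std::size_t>(x)]) ++count;
    return count;
}
```

An object whose span returns zero is completely behind nearer spans and can be skipped; any positive count means the real mesh has to be drawn.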

Shadow Application In A Soccer Game

Image
You may wonder how so many shadows can be shown in one frame in this snapshot.

There is a trick: the gorgeous shadow result doesn't appear all the time. In ordinary gameplay, it only looks like the snapshot below.
This snapshot has no self-shadowing. The shadow approach looks like planar shadows.

In replay clips, planar shadows and a shadow map are used together. In the planar shadow phase, all shadows are rendered to one render target, and then the shadow map is prepared in another render target (of size 1024x2048; I still don't understand why). In the rendering pass, the planar shadow is drawn by mapping its render target onto a rectangle. For the shadow map, the shadow texel is fetched in each object's pixel shader.

Order-Independent Transparency

Current shadow map method

1. Switch render target. Keep the current pass's color surface and depth surface, then set the shadow map's color surface and depth surface so that the frame buffer data is rendered into the screen space texture, the shadow map.
2. We use an R32 color map of size 1024x1024 to store the shadow texture, because DX9 does not allow a depth surface to be used directly as a texture. Graphics card vendors do support fetching a depth texture, but this may produce warnings in the DX9 runtime (Nvidia: hardware shadow map, PCF 2x2; ATI: Fetch4). If we can fetch depth as a texture, we can disable color writes and use depth bias to solve depth fighting, which reduces the cost of the shadow map pass considerably.
3. In the second rendering pass, fetch the shadow map and apply PCF 2x2 to soften the aliasing.
ftp://download.nvidia.com/developer/presentations/2004/GPU_Jackpot/Shadow_Mapping.pdf

http://msdn.microsoft.com/en-us/library/ee416324%28VS.85%29.aspx

Matrix multiplication in HLSL

In HLSL, we can use float constants to pass values into shader code. One thing confused me: do the row-major and column-major conventions give the same result?

When we write a matrix to the registers with setConstantF, the shader assembly stores it as vectors. At run time, these vectors are combined with your variable using the dot product. So when you use mul( pos, WorldMtx );
WorldMtx is:
{
_00, _01, _02, _03,
_10, _11, _12, _13,
_20, _21, _22, _23,
_30, _31, _32, _33
}
WorldMtx needs to be transposed in software, and the computation is:
Output.x = dot( vector(pos.xyz, 1.0f), vector(_00,_10,_20,_30) );
Output.y = dot( vector(pos.xyz, 1.0f), vector(_01,_11,_21,_31) );
Output.z = dot( vector(pos.xyz, 1.0f), vector(_02,_12,_22,_32) );

Or you can use mul( WorldMtx, pos ); and not transpose WorldMtx. The result is:
Output.x = mul( vector(_00,_01,_02,_03), pos.x ) + mul(_30, 1.0f);
Output.y = mul( vector(_10,_11,_12,_13), pos.y ) + mul(_31, 1.0f);
Output.z = mul( vector(_20,_21,_22,_23), pos.z ) + …
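The relation between the two forms can be checked on the CPU. The sketch below is plain C++, not HLSL; Vec4, Mat4 and the function names are made up for the illustration. It relies on the documented behaviour that mul(v, M) treats v as a row vector and mul(M, v) treats it as a column vector, so mul(v, M) equals mul(transpose(M), v).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // m[row][col]

// Counterpart of mul(v, M): v is a row vector, out[c] = dot(v, column c of M).
Vec4 mulRowVector(const Vec4& v, const Mat4& m) {
    Vec4 out{};
    for (std::size_t c = 0; c < 4; ++c)
        for (std::size_t r = 0; r < 4; ++r)
            out[c] += v[r] * m[r][c];
    return out;
}

// Counterpart of mul(M, v): v is a column vector, out[r] = dot(row r of M, v).
Vec4 mulColumnVector(const Mat4& m, const Vec4& v) {
    Vec4 out{};
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c)
            out[r] += m[r][c] * v[c];
    return out;
}

Mat4 transpose(const Mat4& m) {
    Mat4 t{};
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c)
            t[c][r] = m[r][c];
    return t;
}

// Sanity check: the row-vector product equals the column-vector product
// with the transposed matrix.
void checkTransposeRule(const Vec4& v, const Mat4& m) {
    assert(mulRowVector(v, m) == mulColumnVector(transpose(m), v));
}
```

With a translation stored in the last row (_30, _31, _32), the row-vector form moves the point as expected, while the column-vector form with the same untransposed matrix puts the translation terms into w instead. That is why exactly one transpose is needed somewhere, either in software or through the choice of mul argument order.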

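What a dark meeting

Yesterday's meeting left me rather discouraged afterwards. Maybe this is just how society works... someone is judged unsuitable and is then removed from your side. If the ability you show day to day can't earn trust, and you don't keep recharging yourself with new knowledge, people will find fault with you. Listening from the side, I felt I should take it as a warning too: I need to work faster and keep absorbing new knowledge, or I could be replaced by newer blood at any time...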

SSAO - Screen Space Ambient Occlusion

Hardware Occlusion Queries

The GPU provides a query mode to test whether any pixels of an object would be drawn. In the previous frame, render only the model's bounding box and query whether it is actually visible in the view. Then render only the objects whose bounding boxes passed the query.
But sometimes this method adds draw calls, and the results arrive some frames late, so it has overhead.
http://www.gamedev.net/community/forums/topic.asp?topic_id=377484

http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter06.html

http://cggmwww.csie.nctu.edu.tw/~danki/myweb/projects/hom/index.html

Fast and Simple Occlusion Culling using Hardware-Based.pdf

Occlusion Culling Using DirectX 9
http://www.gamedev.net/reference/programming/features/occlusionculling/

Website introduce shader

Shadow Mapping and Shadow Volumes

http://www.devmaster.net/articles/shadow_techniques/

Abstract
In recent years, both Williams’ original Z-buffer shadow mapping algorithm [Williams 1978] and Crow’s shadow volumes [Crow 1977] have seen many variations, additions and enhancements, greatly increasing the visual quality and efficiency of renderings using these techniques. Additionally, the fast evolution of commodity graphics hardware allows for a nearly complete mapping of these algorithms to such devices (depending on the GPU’s capabilities) which results in real-time display rates as seen in the real-time version of Pixar’s Luxo Jr and the use of hardware shadow maps therein. In this article, we describe the major contributions since Williams’ and Crow’s original publications in 1978 and 1977 respectively, briefly present both the shadow mapping (which is computed in image space) and the shadow volume algorithms, present more sophisticated approaches to shadow mapping which are better suited to high q…

Shadow map save as texture

http://www.gamedev.net/community/forums/topic.asp?topic_id=180931
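The shadow map produced in the first pass may be in R32F format; it should be converted to ARGB8 before it can be saved.

http://www.gamedev.net/community/forums/topic.asp?topic_id=534392&whichpage=1&#3453499
This thread says DX9 has no way to get a depth texture directly through a surface (the Xbox 360 apparently can), except with Nvidia's PCF or ATI's Fetch4, but both methods cause the D3D debug runtime to report an invalid state.

http://www.gamedev.net/community/forums/topic.asp?topic_id=566495
This one says OpenGL can get a depth texture through an FBO, but D3D only supports this from 10.0 onward. It concludes that if you want to work across different APIs, you should still use an R32F color texture.

For now the simplest and cheapest method is still an R32F color map.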

http://developer.nvidia.com/object/hwshadowmap_paper.html
Hardware Shadow mapping - Nvidia

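Notes on render targets

Image
Quoted from: http://blog.csdn.net/xoyojank/archive/2009/02/11/3876843.aspx

1. Setting a RenderTarget causes the viewport to become the same size as the RenderTarget.
2. The anti-aliasing type must be the same as that of the DepthStencilBuffer.
3. The RenderTarget format must be compatible with the DepthStencilBuffer format; this can be checked with IDirect3D9::CheckDepthStencilMatch.
4. The DepthStencilBuffer must be at least as large as the RenderTarget.
5. Render target 0 of IDirect3DDevice9::SetRenderTarget cannot be NULL.
6. A Texture with Usage D3DUSAGE_RENDERTARGET cannot be anti-aliased, and its Pool must be D3DPOOL_DEFAULT. If you want to use a RenderTarget as a texture and also want anti-aliasing, first render the scene to a Surface created with CreateRenderTarget (or to the BackBuffer), then copy it to the texture with IDirect3DDevice9::StretchRect.
7. D3DX provides ID3DXRenderToSurface, which simplifies the use of RenderTargets. Note that its BeginScene/EndScene pair cannot be nested with the IDirect3DDevice9 functions of the same name, because internally it still calls the IDirect3DDevice9 ones; PIX shows which calls it makes. This interface still cannot do anti-aliasing, and it saves/restores a pile of state every time, which never feels quite right.
8. An RTT cannot be used as an input and as an output target at the same time. Some graphics cards may only give a warning, while on others unpredictable things happen, such as errors, a black screen, or a crash... Depth stencil textures (see Hardware shadow map) have the same problem: after use, clear them with SetTexture(n, NULL), otherwise ATI cards will show a black screen, a garbled screen, or depth errors. Even if you are not using the texture, as long as it is referenced by a register the card still treats it as in use, and it cannot then serve as the depth stencil buffer.
9. If you want to save an RTT to a file, you cannot SaveToTexture directly. …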

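Master Ip won~~~

Today I watched Ip Man 2. Even though his opponent fouled by sneaking in punches after the bell, and he was barred from using kicks, in all that unfairness he could still show his determination. That is how you make others understand your worth.

I really just want to focus on writing my shaders...

When my work is tangled up with someone else's, I get pushed to go fix this or that first, otherwise someone's screen won't look right. I know saying this may sound a bit irresponsible, but we are an engine under development, and it's normal for the picture not to be quite right at this stage; I will deal with it as quickly as I can... Much of the engine architecture is still being defined, and I really want to finish the low-level layer early, otherwise other people writing shaders with our engine will have to touch the low-level structure... Sometimes I just want to hide away and focus on writing shaders. Lately, to support lighting, I keep splitting shaders apart and it is wearing me out... Will I have to build 14 shaders for every effect we support from now on?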

John Carmack GDC 2010 Lifetime Award

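John Carmack is a legendary figure in PC game history. As his technology advanced, PC game hardware evolved with it, and many rendering terms were even named after his team's achievements.

For the story of J.C. and id Software, you can read the book Masters of Doom, or the article introducing it on the Monkey Potion (猴子靈藥) blog.

Masters of Doom: legendary heroes of the game industry, John Carmack & John Romero
http://blog.monkeypotion.net/book/gamedevbook/masters-of-doom-2

Rage PC Games Interview - John Carmack Interview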

Rage at IGN.com

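I hardly need to say more about the legend of John Carmack; his Lifetime Achievement Award at GDC 2010 this year says it all.

This time he showed Rage, developed on his new engine id Tech 5, which uses MegaTexture. Every generation of id's engines brought new technology: id Tech 1 used BSP and PVS for its scene graph, id Tech 2 supported 3D hardware acceleration and lightmaps, id Tech 3 had fixed-function shader scripts and dynamic shadows, and id Tech 4 supported the shader model, provided per-pixel lighting and bump mapping, and adopted skeletal animation, among others.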
http://en.wikipedia.org/wiki/Id_Tech

Accepted by a journal~~~

After waiting two years, the paper from my master's studies has finally been accepted by an SCI-indexed journal~~~

Alan Wake -Building the Technology- Trailer

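Alan Wake, an Xbox 360 exclusive released in March 2010, was developed by Remedy Entertainment. The team is based in Finland and was founded in 1995, so it has 15 years of development experience; its recent works are the Max Payne series.
http://www.remedygames.com/

The engine for this title was developed in-house, because engines on the market could not meet its lighting and technical requirements.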

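Did I get my shader architecture wrong?...

Lately, every day at work feels less like writing shader code and more like building shader code. I really don't want to split the shaders into many variants, but as feature requirements grow, splitting has become unavoidable... Now I have to split out seven or eight variants every day and rebuild them...

I also feel the shader-related script format I defined may not be thorough enough. Today the person making the related tool complained to me: "I told you to define it more completely at the start; now the whole tool interface may have to change..." I know that was already a harsh tone for him, but it was my first time doing this, and I didn't have the experience to make it that rigorous... Oh well~~~ I hope we can find all the problems after this prototype and fix them in one go @@~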

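People who are good at math are more likely to earn high salaries

Over the past year, the five highest average salaries for mid-level staff went to mathematicians, actuaries, software engineers, computer systems analysts, and statisticians, whose average annual salaries can exceed US$730,000. But excellent math skills are a must for entering these fields. The Wall Street Journal notes in particular that while the financial industry is laying off staff and cutting pay, mathematicians and actuaries have nothing to fear, because companies need them right now to assess losses accurately and calculate risk and return, so they stay firmly in the high-salary group. A good head for math when entering finance, software, engineering, and similar fields is the guarantee of a high salary in the future.
http://tw.news.yahoo.com/article/url/d/a/100430/122/24tno.html
.....US$730,000 seems a bit too much; I suspect the number is wrong.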

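The boss treated us to a meal

Today the boss said the product we made has passed release certification in North America, so he treated us to lunch. I ate so much~~ I actually only helped on this product for two weeks, so my personal contribution is limited. This result is thanks to my colleagues' hard work. Well done, everyone!!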

Shader use register constant array index

Today I wondered whether indexing a constant register array affects performance. My experiment uses a shader that shades with three point lights, set up in two ways:
1. The color, intensity, and position of each light are set into the shader as separate float constants.
2. Color, intensity, and position arrays of size three are used for the shader computation.
In the experiment, both versions ran at the same frame rate, 2725 FPS. So in my experiment, indexing a constant register array has no performance penalty.

The experiment of shader programming with if-condition instruction

Today I ran an experiment on how if-else statements in shader code affect my application's performance.
In the first setup there are three shaders: one point light, two point lights, and three point lights. We apply them to three different surface models. In the second setup, a single shader uses three if conditions to handle the three light configurations, and is applied to the same three surface models.
The results show the first setup runs at 2639 FPS and the second at 1762 FPS. Although the second setup uses the same shader for every model and so avoids resetting the shader on the device, the if instructions cost more than switching shaders does. So the best we can do is avoid if instructions wherever possible.

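I really want to buy ShaderX7

Lately I have been studying how to plan a lighting system and implementing shadow map algorithms. This book happens to cover the topics I want to explore. If anyone has bought it, could I borrow it to flip through?... I'm a bit broke lately.

Shadow map development (I)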

Image
Shadow maps can generate shadows for the full screen and achieve a natural self-shadowing effect. Their weaknesses are hard shadow edges and limited resolution.
First, I want to use this DX9 sample to describe a shadow map implementation. Shadow mapping is a multi-pass technique, so I describe it pass by pass.
In the first pass, we generate the shadow map. Store the current render target and DepthStencil, set the render target to the shadow map texture, transform from view space to the light's view space (including its projection), write the depth value to the render target, and finally restore the previous render target and DepthStencil.
In the final pass, render the scene. Use the camera's view and projection spaces, set the shadow map texture, and build the shadow map matrix, which transforms from view space to light space. In the pixel shader, we can then look up our shadow map texel in shadow map space, and use PCF (percentage-closer filtering) to blur…
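The depth comparison and the 2x2 PCF step can be sketched on the CPU. This is a simplified stand-in for the pixel shader, not the DX9 sample's code: ShadowMap and pcf2x2 are hypothetical names, the four comparison results are averaged with equal weights (a shader would often weight them bilinearly), and the bias parameter is an added assumption to avoid self-shadowing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A CPU-side shadow map: depth as seen from the light, size x size texels, row-major.
struct ShadowMap {
    int size;
    std::vector<float> depth;

    // Clamp addressing at the border.
    float at(int x, int y) const {
        x = x < 0 ? 0 : (x >= size ? size - 1 : x);
        y = y < 0 ? 0 : (y >= size ? size - 1 : y);
        return depth[static_cast<std::size_t>(y) * static_cast<std::size_t>(size)
                     + static_cast<std::size_t>(x)];
    }
};

// 2x2 percentage-closer filtering: compare the receiver depth against four
// neighbouring texels and average the results (1 = fully lit, 0 = fully shadowed).
float pcf2x2(const ShadowMap& map, int x, int y, float receiverDepth, float bias) {
    assert(map.size > 0);
    assert(map.depth.size() ==
           static_cast<std::size_t>(map.size) * static_cast<std::size_t>(map.size));
    float lit = 0.0f;
    for (int dy = 0; dy < 2; ++dy)
        for (int dx = 0; dx < 2; ++dx)
            if (receiverDepth - bias <= map.at(x + dx, y + dy)) lit += 1.0f;
    return lit * 0.25f;
}
```

Because the comparison is done per texel and only the results are averaged, a pixel on a shadow edge gets an intermediate value such as 0.5 instead of a hard 0 or 1, which is what softens the aliasing.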

NVemulate

NVemulate allows you to emulate the functionality of various GPUs (very slowly) in software. In addition, you can use it to control GLSL Support and Open GL 3.0 Support.

http://developer.nvidia.com/object/nvemulate.html

Dynamic Branching in Shader

Static branching vs. dynamic branching:
- Static branching switches a code path on or off based on a Boolean shader constant. Between draw calls, you can decide which features you want to support and set the Boolean flags accordingly.
- Dynamic branching: the comparison condition resides in a variable, so the comparison is done for each vertex or each pixel at run time (not at compile time or between two draw calls). The performance hit is the cost of the branch plus the cost of the instructions on the side of the branch taken. Implemented in shader model 3.0 or higher.
http://msdn.microsoft.com/en-us/library/bb944006%28VS.85%29.aspx

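This thread discusses a GLSL shader on an Nvidia GPU with a doubly nested loop performing 8x8 branching operations, where performance dropped to 2 fps. After unrolling the loops, it went back up to 30 fps.
The initial guess was therefore about the GPU's branching ability, but it later turned out that using a constant register index for array indexing was the key factor. Storing the array contents in a texture, or switching to a uniform array of vec3's, both improved things.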
http://www.gamedev.net/community/forums/topic.asp?topic_id=559196


This thread discusses how using an if condition may cause both branches to be executed, because the compiler determines…

The first article

This is my first post. In this space I want to record the programming experience I pick up, leaving a trail of my progress bit by bit~~~