Bring AAA graphics to mobile platforms
Hardware: ImgTec (Imagination Technologies) PowerVR SGX GPU
Software: how to bring console graphics to mobile platforms.
Shaders, RTT, depth textures, MSAA
Architecture
- tile-based deferred rendering GPU (SGX; ARM Mali is also tile-based)
Tile-based
- split the screen into tiles of 16x16 or 32x32 pixels
GPU fits an entire tile on chip, so it needs no frame buffer memory while rendering a tile
Process all draw calls for one tile
- repeat for each tile
Each tile is written to memory once it is finished
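To get a feel for the numbers, here is a small Python sketch that counts how many tiles cover a screen. The 1024x768 resolution and the helper name tile_grid are assumptions for illustration; only the 16x16 and 32x32 tile sizes come from the notes.

```python
import math

def tile_grid(width, height, tile_size):
    """Return (columns, rows, total) of tiles needed to cover the screen."""
    cols = math.ceil(width / tile_size)   # a partial tile at the edge still counts
    rows = math.ceil(height / tile_size)
    return cols, rows, cols * rows

# Assumed example resolution: 1024x768
for size in (16, 32):
    cols, rows, total = tile_grid(1024, 768, size)
    print(f"{size}x{size} tiles: {cols} x {rows} = {total}")
# 16x16 tiles: 64 x 48 = 3072
# 32x32 tiles: 32 x 24 = 768
```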
Distribute vertex primitives to GPU cores
- split vertices
Vertex pre-shader: fetches input data (attributes and uniforms)
Vertex shader: runs on the Universal Scalable Shader Engine (USSE), the same unit that runs pixel shaders
- multi-threaded
Tiling:
Optimizes vertex shader output
Bins resulting primitives into tile data
Parameter buffer
- stored in system memory
- don't overflow the buffer! (an overflow forces a flush)
- the parameter buffer is then read back (tile data from vertex processing)
- distributed to all cores
- one tile at a time
- a tile is processed in full on one core
- process all tiles until finished
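The binning step can be sketched on the CPU as follows. This is a simplified illustration, not how the hardware does it: it bins each triangle by its bounding box (real tilers can be more precise), and the names bin_triangle and build_tile_lists plus the 64x64 example screen are assumptions. The resulting per-tile lists play the role of the parameter buffer.

```python
def bin_triangle(tri, tile_size, cols, rows):
    """Return the (col, row) tiles overlapped by the triangle's bounding box."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    # Clamp the covered tile range to the screen.
    c0 = max(int(min(xs)) // tile_size, 0)
    c1 = min(int(max(xs)) // tile_size, cols - 1)
    r0 = max(int(min(ys)) // tile_size, 0)
    r1 = min(int(max(ys)) // tile_size, rows - 1)
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

def build_tile_lists(triangles, tile_size, cols, rows):
    """Per-tile lists of triangle indices, a stand-in for the parameter buffer."""
    tiles = {}
    for index, tri in enumerate(triangles):
        for tile in bin_triangle(tri, tile_size, cols, rows):
            tiles.setdefault(tile, []).append(index)
    return tiles

# Assumed example: a 64x64 screen split into 4x4 tiles of 16x16 pixels.
triangles = [
    ((2, 2), (10, 3), (5, 12)),        # fits inside tile (0, 0)
    ((10, 10), (40, 12), (20, 30)),    # spans several tiles
]
tile_lists = build_tile_lists(triangles, 16, 4, 4)

# Pixel processing then walks one tile at a time and only sees its own list.
for tile in sorted(tile_lists):
    print(tile, tile_lists[tile])
```

The more primitives a frame has, the larger these lists grow, which is why the notes warn about overflowing the parameter buffer.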
Pixel setup
- receives tile commands
- fetches vertex shader output
- triangle rasterization
- hidden surface removal: depth and stencil tests
Pixel pre-shader
- fills in interpolator and uniform data
- kicks off non-dependent texture reads
Pixel shader
- multi-threaded ALUs
- each thread can be vertex or pixel
- multiple USSEs in each GPU core
Pixel backend
- triggered when all pixels in the tile are done
- performs data conversions, MSAA
- writes the finished tile to memory
- without dynamic flow-control
- with dynamic flow-control
Alpha-blending
- not done by separate specialized GPU hardware
Mobile is the new PC
- wide feature and performance range
- scalable graphics
- user graphics settings
- low/med/high/ultra
- render buffer size scaling
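Render buffer size scaling could look roughly like the sketch below. The scale factors per quality level and the 2048x1536 native resolution are made-up values for illustration; the notes only list the level names.

```python
# Hypothetical scale factors; the notes name the levels but give no numbers.
QUALITY_SCALE = {"low": 0.5, "med": 0.75, "high": 1.0, "ultra": 1.0}

def scaled_render_size(native_w, native_h, quality):
    """Return the render buffer size for a user graphics setting."""
    scale = QUALITY_SCALE[quality]
    # Never go below one pixel in either dimension.
    return max(1, int(native_w * scale)), max(1, int(native_h * scale))

# Assumed native resolution: 2048x1536
for level in ("low", "med", "high", "ultra"):
    print(level, scaled_render_size(2048, 1536, level))
```

The scene is rendered at the reduced size and stretched to the screen, so the pixel shading cost drops with the square of the scale factor.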
Render target is on die
- MSAA is cheap and uses less memory
- only the resolved data is stored in RAM
- costs 0-5 ms
- be wary of buffer restores (color or depth)
- see the shadow use case
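A quick back-of-the-envelope calculation shows why keeping only the resolved data in RAM saves memory. The 1024x768 RGBA8 target and 4x MSAA are assumed example values, and the "all samples in RAM" figure is just a comparison point for a renderer that would store every sample off chip.

```python
def color_buffer_bytes(width, height, bytes_per_pixel=4, samples=1):
    """Size of a color buffer in bytes."""
    return width * height * bytes_per_pixel * samples

# Assumed example: 1024x768, RGBA8, 4x MSAA
resolved = color_buffer_bytes(1024, 768)                 # what a tile-based GPU writes out
all_samples = color_buffer_bytes(1024, 768, samples=4)   # if every sample lived in RAM

print(resolved // 1024, "KiB resolved")          # 3072 KiB
print(all_samples // 1024, "KiB with 4 samples") # 12288 KiB
```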
No bandwidth cost for alpha blending
Free hidden surface removal
Mobile vs Console
- very large CPU overhead for the OpenGL ES API
- CPU usage maxes out at 100-300 draw calls
- avoid too much data per scene
- shader patching
- reduce render state changes
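One common way to reduce render state changes is to sort draw calls by their state before submitting them. The sketch below only counts the changes; the draw list, the state names, and count_state_changes are hypothetical, and sorting like this is only valid where draw order does not matter (for example opaque geometry).

```python
def count_state_changes(draws):
    """Count how often the state differs from the previous draw call."""
    changes = 0
    current = None
    for state, _mesh in draws:
        if state != current:
            changes += 1
            current = state
    return changes

# Hypothetical frame: (state key, mesh name)
draws = [
    ("shaderA", "rock"),
    ("shaderB", "tree"),
    ("shaderA", "wall"),
    ("shaderB", "bush"),
]
sorted_draws = sorted(draws, key=lambda d: d[0])  # stable sort, groups equal states

print(count_state_changes(draws))         # 4
print(count_state_changes(sorted_draws))  # 2
```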
Alpha test / discard
- conditional z write is very slow
Render buffer management
- each RT is a whole new scene
- avoid switching RTs back and forth
- can cause a new restore
- and a new resolve
- avoid buffer restores
- clear everything: color, depth, stencil
- a clear just sets some dirty bits in a register
- avoid buffer resolves
- use the discard extension (GL_EXT_discard_framebuffer)
Avoid unnecessarily different FBO combinations
- don't let the driver think it needs to start resolving and restoring any buffers!
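The cost of these rules can be illustrated with a toy model. It assumes that every buffer not cleared at the start of a pass must be restored from RAM, and every buffer not discarded at the end must be resolved to RAM. Real drivers are more subtle, and the pass lists and names below are hypothetical.

```python
from collections import namedtuple

ALL = frozenset({"color", "depth", "stencil"})
Pass = namedtuple("Pass", ["target", "cleared", "discarded"])

def count_traffic(passes):
    """Return (restores, resolves) counted per buffer under the toy model."""
    restores = sum(len(ALL - p.cleared) for p in passes)    # not cleared -> restore
    resolves = sum(len(ALL - p.discarded) for p in passes)  # not discarded -> resolve
    return restores, resolves

keep_color = {"depth", "stencil"}  # discard everything except color

# Switching back and forth: scene -> shadow -> scene again
ping_pong = [
    Pass("scene", ALL, frozenset()),   # everything must survive the switch
    Pass("shadow", ALL, keep_color),
    Pass("scene", frozenset(), keep_color),  # cannot clear, so all buffers restore
]

# Each target rendered once, fully cleared, unused buffers discarded
one_visit_each = [
    Pass("shadow", ALL, keep_color),
    Pass("scene", ALL, keep_color),
]

print(count_traffic(ping_pong))       # (3, 5)
print(count_traffic(one_visit_each))  # (0, 2)
```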
Texture lookups
- let the pre-shader queue them up ahead of time
- don't compute texture coordinates with math
Don't use .zw components for texture coordinates
Mobile shader material system: the original is too complicated
Mobile material shaders
- separate shaders for mobile
- lots of #ifdef
Shader offline processing
- run the C preprocessor offline
- reduces in-game compile time
- eliminates duplicates offline
Shader compiling
- compile all shaders at startup
- avoids hitching at runtime
- compile on the GL thread while loading on the game thread
- compiling is not enough
- must issue dummy draw calls
- remember how certain states affect shaders
- avoid shader....??
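Eliminating duplicates offline can be done by hashing the preprocessed shader text, since many #ifdef combinations collapse to the same source once the preprocessor has run. This is a minimal sketch under that assumption; the shader snippets, material names, and dedupe_shaders are hypothetical, and it uses plain whitespace normalization rather than a real GLSL parser.

```python
import hashlib

def dedupe_shaders(preprocessed):
    """Return (unique sources by hash, shader name -> hash)."""
    unique = {}
    name_to_key = {}
    for name, source in preprocessed.items():
        # Drop blank lines and surrounding whitespace so formatting
        # differences do not defeat the comparison.
        lines = [line.strip() for line in source.splitlines() if line.strip()]
        normalized = "\n".join(lines)
        key = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
        unique.setdefault(key, normalized)
        name_to_key[name] = key
    return unique, name_to_key

# Hypothetical output of the offline preprocessor for three materials
preprocessed = {
    "rock": "void main() {\n  gl_FragColor = vec4(1.0);\n}\n",
    "wall": "void main() {\n\n    gl_FragColor = vec4(1.0);\n}\n",
    "tree": "void main() {\n  gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);\n}\n",
}
unique, name_to_key = dedupe_shaders(preprocessed)
print(len(preprocessed), "materials,", len(unique), "shaders to compile")
```

At startup only the unique sources need to be compiled, and each material looks up its program through the hash.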
God Rays on mobile
- fewer texture fetches
Optimize for mobile
- move all math to the VS
- pass data down through interpolators
- split the radial filter into 4 draw calls: 4 x 8 = 32 texture lookups ???
- from 30 ms to 5 ms
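One possible reading of the split is that a 32-tap radial filter is partitioned into 4 passes of 8 taps, each pass adding its share into an accumulation target, presumably so that the 8 sample coordinates of a pass can be computed in the vertex shader and passed down through interpolators. The sketch below checks on the CPU that such a split gives the same result as a single 32-tap pass. The image function, the coordinates, and all names are assumptions, not the talk's actual shader.

```python
import math

def sample(image_fn, pixel, light, t):
    """Sample the image at fraction t along the ray from pixel to light."""
    x = pixel[0] + (light[0] - pixel[0]) * t
    y = pixel[1] + (light[1] - pixel[1]) * t
    return image_fn(x, y)

def radial_filter(image_fn, pixel, light, taps=32):
    """Single pass: average all taps along the ray."""
    total = sum(sample(image_fn, pixel, light, i / taps) for i in range(taps))
    return total / taps

def radial_filter_split(image_fn, pixel, light, passes=4, taps_per_pass=8):
    """Same filter as several passes, each blended additively."""
    taps = passes * taps_per_pass
    accum = 0.0
    for p in range(passes):                      # one draw call per pass
        start = p * taps_per_pass
        partial = sum(
            sample(image_fn, pixel, light, (start + i) / taps)
            for i in range(taps_per_pass)
        )
        accum += partial / taps                  # additive blend
    return accum

def bright_disc(x, y):
    """Hypothetical light source: a bright disc around (0.5, 0.5)."""
    return 1.0 if (x - 0.5) ** 2 + (y - 0.5) ** 2 < 0.04 else 0.0

full = radial_filter(bright_disc, (0.1, 0.1), (0.5, 0.5))
split = radial_filter_split(bright_disc, (0.1, 0.1), (0.5, 0.5))
print(math.isclose(full, split))
```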
God rays
- 1st pass
- 2nd pass
- 3rd pass
- 4th pass
- 5th pass
- 6th pass
Character shadows
Port from Xbox
- projected, modulated dynamic shadows
- compare scene depth and character depth
- draw character on top (no self-shadowing)
Shadow optimizations
- shadow depth
- avoids RT switch (resolve & restore)
- resolve sceneDepth just before shadows*
- write out tile depth to RAM to read as a texture
- use glDiscardFramebufferEXT to avoid the resolve
- encode depth in F16 / RGBA8 color
- draw a screen-space quad instead of a cube
- avoid dependent texture lookups
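Encoding depth in an RGBA8 color means spreading one high-precision value over four 8-bit channels. A shader would typically do this with fract and dot products; the integer version below shows the same idea on the CPU. It is a sketch assuming depth is normalized to the range 0 to 1, and the function names are hypothetical.

```python
def pack_depth_rgba8(depth):
    """Pack a normalized depth value into four 8-bit channels (r, g, b, a)."""
    # Clamp so that 1.0 does not overflow the 32-bit fixed-point range.
    value = min(max(depth, 0.0), 1.0 - 2.0 ** -32)
    fixed = int(value * 2 ** 32)          # 32-bit fixed point
    return (
        (fixed >> 24) & 0xFF,             # most significant byte in red
        (fixed >> 16) & 0xFF,
        (fixed >> 8) & 0xFF,
        fixed & 0xFF,                     # least significant byte in alpha
    )

def unpack_depth_rgba8(rgba):
    """Rebuild the depth value from the four channels."""
    r, g, b, a = rgba
    fixed = (r << 24) | (g << 16) | (b << 8) | a
    return fixed / 2 ** 32

print(pack_depth_rgba8(0.5))                         # (128, 0, 0, 0)
print(unpack_depth_rgba8(pack_depth_rgba8(0.5)))     # 0.5
```

Compared with a single 8-bit channel, which only distinguishes 256 depth levels, the four channels together keep the comparison between scene depth and character depth stable.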
Tool tips
- use an OpenGL ES wrapper on PC
- almost WYSIWYG
- debug in Visual Studio
- Apple Xcode GL debugger, iOS 5!
- full capture of one frame
- shows what is used by each draw call
Next-Generation
- ImgTec "Rogue" (6xxx series):
20x on graphics
- 100+ GFLOPS
- DX10, OpenGL ES "Halti"