Bring AAA graphics to mobile platforms
Hardware: ImgTex SGX GPU
Software: how to bring console graphics to mobile platform.
Shaders, RTT, depth texture , MSAA
Architecture
- tile-based deferred rendering GPU (ARM Mali, SGX)
Tiled -based
- split the screen into tiles, 16x16 or 32x32 pixels
GPU fits an entire tile on chip - doesn't have frame buffer memory
Process all draw calls for one tile
 - repeat each tile
Each tile is written to memory as all finished
  vertex primitives to GPU cores
     - split vertex
vertex preshader:  fetch input data ( attribute and uniforms)
vertex shader: Universal scalable shader engine, the same for the pixel 
             - mutli-thread
Tiling:
   Optimizes vertex shader output
    Bins resulting primitives into tile data
parameter buffer
- stored in sys. memory
- don't want to overflow buffer!! ( will need to flush )
- reads parameter buffer ( reading tile data from vertex processing)
-distribute  to all cores
   - one tile at a  time
  - a tile is in full one core
  - process all tiles until finish
Pixel setup
- receive tile commands 
- fetch vertex shader
-triangle rasterization
- Hidden surface removal- depth, stencil test
Pixel pre shader
- fills in interpolator and uniform data
- kicks off non-depdendent data
Pixel shader
- multi thread ALUs
-each thread can be vertex or pixel;
- multiple USSEs in each GPU core
Pixel backend
 - trigger when all pixel in tiles are done
- performs data conversions, MSAA
- write finish.
- without dynamic flow-control
- with dynamic flow-control
Alpha-blending
- not separate specialize GPU
Mobile is the new PC
- wide feature and performance range
-scalable graphics
-user graphics setting
     - low/med/high/ultra
     - render buffer size scaling
Render target is on die
 - MSAA is cheap and use less memory
  - only data in RAM
 - Have 0-5 ms cost
  - Be wary of buffer restores ( color or depth)
 - see usage case for shadows
No bandwidth cost for alpha blending
Free hidden surface removal
Mobile vs Console
 - very large CPU overhead for OpenGLES API
     - max CPU usage at 100-300 draw calls
- avoid too mush data per scene
- shader patching
   - reduce render state change
Alpha test / discard
 - conditional z write is very slow
Render buffer management
 - each RT is a whole new scene
 - avoid switchRT back and forth
 - can cause a new restore 
 -                     new resolve
- avoid buffer restore
  - clear everything, color depth stencil
 - a clear just set some dirty bits in register
  - avoid buffer resolve
 - use discard extension discard_framebuffer
unnecessary different FBO combos
  - don't let driver think it needs to start resolving and restoring any buffers~~!!!
texture lookups
 - let pre shader queue then up ahead of time
- don't compute texture coordinate with math
don't use .zw components for texture coordinate
mobile shader material system: original is too complicated
Mobile material shaders
- separate shader by mobile
- Lots of #ifdef
Shader Offline processing
- Run C pre-processor offline
- reduce ingame copile time
- eliminate duplicates at offline time
Shader compiling
- compile all shaders at startup
- avoids hitching at runtime
- compile on GL thread, while loading on game thread
- compiling is no enough
- must issue dummy draw calls
- how certains states affect shaders
- avoid shader....??
- separate shader by mobile
- Lots of #ifdef
Shader Offline processing
- Run C pre-processor offline
- reduce ingame copile time
- eliminate duplicates at offline time
Shader compiling
- compile all shaders at startup
- avoids hitching at runtime
- compile on GL thread, while loading on game thread
- compiling is no enough
- must issue dummy draw calls
- how certains states affect shaders
- avoid shader....??
God Rays on mobile
- fewer texture fetch
optimize for mobile
- move all math to VS
- pass down data through interpolation
- split radial filter into 4 draw calls: 4x 8 = 32 texture lookups ???
- from 30ms to 5 ms
God rays
- 1st pass
- 2nd pass
-3rd
- 4th
- 5th
-6th
Character shadows
port from xbox
- projected , modulate dynamic shadow
- compare scene depth and character depth
- draw character on top ( no self-shader)
shadow optis
- shadow depth
- avoids RT switch ( resolve & restore)
- Resolve sceneDepth just before shadows*
- write out tile depth to RAM to read as texture
- use glDiscardFrameBuffer to avoid resolve
- encode depth in F16 / RGBA8 color
- Draw screen-space quad instead of cube
- avoid dependent texture lookup
Tool tips
- Use an OpenGL ES wrapper on PC
- Almost WYSIWYG
- debug on visual studio
- Apple Xcode GL debugger, iOS 5!
- full capture of one frame
- show each drawcall used by each draw call
Next-Generation
- ImgTex "Rogue" (6xxx series):
20x on graphics
- 100+ GFLOPS
- DX10, OGLES Halti
- fewer texture fetch
optimize for mobile
- move all math to VS
- pass down data through interpolation
- split radial filter into 4 draw calls: 4x 8 = 32 texture lookups ???
- from 30ms to 5 ms
God rays
- 1st pass
- 2nd pass
-3rd
- 4th
- 5th
-6th
Character shadows
port from xbox
- projected , modulate dynamic shadow
- compare scene depth and character depth
- draw character on top ( no self-shader)
shadow optis
- shadow depth
- avoids RT switch ( resolve & restore)
- Resolve sceneDepth just before shadows*
- write out tile depth to RAM to read as texture
- use glDiscardFrameBuffer to avoid resolve
- encode depth in F16 / RGBA8 color
- Draw screen-space quad instead of cube
- avoid dependent texture lookup
Tool tips
- Use an OpenGL ES wrapper on PC
- Almost WYSIWYG
- debug on visual studio
- Apple Xcode GL debugger, iOS 5!
- full capture of one frame
- show each drawcall used by each draw call
Next-Generation
- ImgTex "Rogue" (6xxx series):
20x on graphics
- 100+ GFLOPS
- DX10, OGLES Halti














 
 
 
Comments
Post a Comment