2012年3月9日 星期五

Bring AAA graphics to mobile platforms




Hardware: ImgTex SGX GPU
Software: how to bring console graphics to mobile platform.

Shaders, RTT, depth texture , MSAA

Architecture
- tile-based deferred rendering GPU (ARM Mali, SGX)

Tiled -based
- split the screen into tiles, 16x16 or 32x32 pixels

GPU fits an entire tile on chip - doesn't have frame buffer memory

Process all draw calls for one tile
 - repeat each tile

Each tile is written to memory as all finished

Vertex frontend
  vertex primitives to GPU cores
     - split vertex

vertex preshader:  fetch input data ( attribute and uniforms)
vertex shader: Universal scalable shader engine, the same for the pixel 
             - mutli-thread

Tiling:
   Optimizes vertex shader output
    Bins resulting primitives into tile data

parameter buffer
- stored in sys. memory
- don't want to overflow buffer!! ( will need to flush )

Pixel frontend


- reads parameter buffer ( reading tile data from vertex processing)
-distribute  to all cores
   - one tile at a  time
  - a tile is in full one core
  - process all tiles until finish


Pixel setup
- receive tile commands 
- fetch vertex shader
-triangle rasterization
- Hidden surface removal- depth, stencil test

Pixel pre shader
- fills in interpolator and uniform data
- kicks off non-depdendent data
Pixel shader
- multi thread ALUs
-each thread can be vertex or pixel;
- multiple USSEs in each GPU core

Pixel backend
 - trigger when all pixel in tiles are done
- performs data conversions, MSAA
- write finish.

Shader Unit Caveats


- without dynamic flow-control
- with dynamic flow-control

Alpha-blending
- not separate specialize GPU

Mobile is the new PC
- wide feature and performance range
-scalable graphics
-user graphics setting
     - low/med/high/ultra
     - render buffer size scaling

Render target is on die
 - MSAA is cheap and use less memory
  - only data in RAM
 - Have 0-5 ms cost
  - Be wary of buffer restores ( color or depth)

 - see usage case for shadows

No bandwidth cost for alpha blending

Free hidden surface removal

Mobile vs Console
 - very large CPU overhead for OpenGLES API
     - max CPU usage at 100-300 draw calls

- avoid too mush data per scene

- shader patching
   - reduce render state change


Alpha test / discard
 - conditional z write is very slow

Render buffer management
 - each RT is a whole new scene
 - avoid switchRT back and forth
 - can cause a new restore 
 -                     new resolve

- avoid buffer restore
  - clear everything, color depth stencil
 - a clear just set some dirty bits in register

  - avoid buffer resolve
 - use discard extension discard_framebuffer

unnecessary different FBO combos
  - don't let driver think it needs to start resolving and restoring any buffers~~!!!

texture lookups
   - don't perform texture lookups in pixel shader
 - let pre shader queue then up ahead of time
- don't compute texture coordinate with math

don't use .zw components for texture coordinate

mobile shader material system: original is too complicated

Mobile material shaders
  - separate shader by mobile
 - Lots of #ifdef

Shader Offline processing
- Run C pre-processor offline
 - reduce ingame copile time
 - eliminate duplicates at offline time

Shader compiling
 - compile all shaders at startup
 - avoids hitching at runtime
- compile on GL thread, while loading on game thread
 - compiling is no enough
   - must issue dummy draw calls
   - how certains states affect shaders
 - avoid shader....??


God Rays on mobile
 - fewer texture fetch

optimize for mobile
  - move all math to VS
  - pass down data through interpolation
-  split radial filter into 4 draw calls: 4x 8 = 32 texture lookups  ???
 - from 30ms to 5 ms

God rays
 - 1st pass
  - 2nd pass
  -3rd
  - 4th
  - 5th
  -6th







Character shadows
  port from xbox
 - projected , modulate dynamic shadow
 - compare scene depth and character depth
  - draw character on top ( no self-shader)

shadow optis
 - shadow depth
    - avoids RT switch ( resolve & restore)
 - Resolve sceneDepth just before shadows*
    - write out tile depth to RAM to read as texture
    - use glDiscardFrameBuffer to avoid resolve
    -  encode depth in F16 / RGBA8 color
  - Draw screen-space quad instead of cube
      - avoid dependent texture lookup

Tool tips
   - Use an OpenGL ES wrapper on PC
         - Almost WYSIWYG
         - debug on visual studio
   - Apple Xcode GL debugger, iOS 5!
         - full capture of one frame
         -  show each drawcall used by each draw call

Next-Generation
  - ImgTex "Rogue" (6xxx series):
                   20x on graphics
            - 100+ GFLOPS
            - DX10, OGLES Halti


沒有留言:

張貼留言