Skip to main content

Bring AAA graphics to mobile platforms

Hardware: ImgTex SGX GPU
Software: how to bring console graphics to mobile platform.

Shaders, RTT, depth texture , MSAA

- tile-based deferred rendering GPU (ARM Mali, SGX)

Tiled -based
- split the screen into tiles, 16x16 or 32x32 pixels

GPU fits an entire tile on chip - doesn't have frame buffer memory

Process all draw calls for one tile
 - repeat each tile

Each tile is written to memory as all finished

Vertex frontend
  vertex primitives to GPU cores
     - split vertex

vertex preshader:  fetch input data ( attribute and uniforms)
vertex shader: Universal scalable shader engine, the same for the pixel 
             - mutli-thread

   Optimizes vertex shader output
    Bins resulting primitives into tile data

parameter buffer
- stored in sys. memory
- don't want to overflow buffer!! ( will need to flush )

Pixel frontend

- reads parameter buffer ( reading tile data from vertex processing)
-distribute  to all cores
   - one tile at a  time
  - a tile is in full one core
  - process all tiles until finish

Pixel setup
- receive tile commands 
- fetch vertex shader
-triangle rasterization
- Hidden surface removal- depth, stencil test

Pixel pre shader
- fills in interpolator and uniform data
- kicks off non-depdendent data
Pixel shader
- multi thread ALUs
-each thread can be vertex or pixel;
- multiple USSEs in each GPU core

Pixel backend
 - trigger when all pixel in tiles are done
- performs data conversions, MSAA
- write finish.

Shader Unit Caveats

- without dynamic flow-control
- with dynamic flow-control

- not separate specialize GPU

Mobile is the new PC
- wide feature and performance range
-scalable graphics
-user graphics setting
     - low/med/high/ultra
     - render buffer size scaling

Render target is on die
 - MSAA is cheap and use less memory
  - only data in RAM
 - Have 0-5 ms cost
  - Be wary of buffer restores ( color or depth)

 - see usage case for shadows

No bandwidth cost for alpha blending

Free hidden surface removal

Mobile vs Console
 - very large CPU overhead for OpenGLES API
     - max CPU usage at 100-300 draw calls

- avoid too mush data per scene

- shader patching
   - reduce render state change

Alpha test / discard
 - conditional z write is very slow

Render buffer management
 - each RT is a whole new scene
 - avoid switchRT back and forth
 - can cause a new restore 
 -                     new resolve

- avoid buffer restore
  - clear everything, color depth stencil
 - a clear just set some dirty bits in register

  - avoid buffer resolve
 - use discard extension discard_framebuffer

unnecessary different FBO combos
  - don't let driver think it needs to start resolving and restoring any buffers~~!!!

texture lookups
   - don't perform texture lookups in pixel shader
 - let pre shader queue then up ahead of time
- don't compute texture coordinate with math

don't use .zw components for texture coordinate

mobile shader material system: original is too complicated

Mobile material shaders
  - separate shader by mobile
 - Lots of #ifdef

Shader Offline processing
- Run C pre-processor offline
 - reduce ingame copile time
 - eliminate duplicates at offline time

Shader compiling
 - compile all shaders at startup
 - avoids hitching at runtime
- compile on GL thread, while loading on game thread
 - compiling is no enough
   - must issue dummy draw calls
   - how certains states affect shaders
 - avoid shader....??

God Rays on mobile
 - fewer texture fetch

optimize for mobile
  - move all math to VS
  - pass down data through interpolation
-  split radial filter into 4 draw calls: 4x 8 = 32 texture lookups  ???
 - from 30ms to 5 ms

God rays
 - 1st pass
  - 2nd pass
  - 4th
  - 5th

Character shadows
  port from xbox
 - projected , modulate dynamic shadow
 - compare scene depth and character depth
  - draw character on top ( no self-shader)

shadow optis
 - shadow depth
    - avoids RT switch ( resolve & restore)
 - Resolve sceneDepth just before shadows*
    - write out tile depth to RAM to read as texture
    - use glDiscardFrameBuffer to avoid resolve
    -  encode depth in F16 / RGBA8 color
  - Draw screen-space quad instead of cube
      - avoid dependent texture lookup

Tool tips
   - Use an OpenGL ES wrapper on PC
         - Almost WYSIWYG
         - debug on visual studio
   - Apple Xcode GL debugger, iOS 5!
         - full capture of one frame
         -  show each drawcall used by each draw call

  - ImgTex "Rogue" (6xxx series):
                   20x on graphics
            - 100+ GFLOPS
            - DX10, OGLES Halti


Popular posts from this blog

tex2D vs. tex2Dproj

texCoord(  texX, texY, texZ , texW ) means texture coordinate after transforming from the texture matrix [ position goes through world, view, projection, and texture ] . We can use texCoord to fetch texture: 1.  float4 color = tex2D( sampler,  float2( texX / texW, texY/texW ); 2.  float4 color = tex2Dporj( sampler, float4( texX, texY, texZ, texW ) ); The top-two methods will have the same result. Because tex2Dproj operator supports divide w in its interface.

Fast subsurface scattering

Fig.1 - Fast Subsurface scattering of Stanford Bunny Based on the implementation of three.js . It provides a cheap, fast, and convincing approach to do ray-tracing in translucent surfaces. It refers the sharing in GDC 2011  [1], and the approach is used by Frostbite 2 and Unity engines [1][2][3]. Traditionally, when a ray intersects with surfaces, it needs to calculate the bouncing result after intersections. Materials can be divided into three types roughly.  Opaque , lights can't go through its geometry and the ray will be bounced back.  Transparency , the ray passes and allow it through the surface totally, it probably would loose a little energy after leaving.  Translucency , the ray after entering the surface will be bounced internally like below Fig. 2. Fig.2 - BSSRDF [1] In the case of translucency, we have several subsurface scattering approaches to solve our problem. When a light is traveling inside the shape, that needs to consider the diffuse value influe

Bringing Large Scale Console Game to iOS

Y8ObbsOsNQEhMCEhMCEt The Bard's Tale Why? Device are fast enough Market segment is not as crowded Hugh potential user base Rich back data log of content Potentially very low cost Session1: port process - application framework  - development workflow Session 2 port-port - performance/memory opt. OpenGL Cocoa Touch App Bring it all together Workflow: Data deployment OpenAL limit 32 active sources, DirectSound 256 channels on XBOX -Sluggish -5.8G -> 2Glimited on iOS 60Hz?    - terriable for battery life 30 Hz   - Game may depend on 60Hz Hybrid    -   60hz, functionsliity intact   - 30hz : low GPU/ save energy    -60hz, at 5th device is optional VFP?? Candidate for NEON SIMD SGX GPUs: a few stats Render opt. -minimal vertex format size -texture size and mipmapping Alpha test and SGX - Fragment discard expensive -huge impact on fill rate -use alpha blend at all costs Eliminating Alpha blend