Bring AAA graphics to mobile platforms

- March 08, 2012

Hardware: ImgTex SGX GPU

Software: how to bring console graphics to mobile platform.

Shaders, RTT, depth texture , MSAA

Architecture

- tile-based deferred rendering GPU (ARM Mali, SGX)

Tiled -based

- split the screen into tiles, 16x16 or 32x32 pixels

GPU fits an entire tile on chip - doesn't have frame buffer memory

Process all draw calls for one tile

- repeat each tile

Each tile is written to memory as all finished

Vertex frontend

vertex primitives to GPU cores

- split vertex

vertex preshader: fetch input data ( attribute and uniforms)

vertex shader: Universal scalable shader engine, the same for the pixel

- mutli-thread

Tiling:

Optimizes vertex shader output

Bins resulting primitives into tile data

parameter buffer

- stored in sys. memory

- don't want to overflow buffer!! ( will need to flush )

Pixel frontend

- reads parameter buffer ( reading tile data from vertex processing)

-distribute to all cores

- one tile at a time

- a tile is in full one core

- process all tiles until finish

Pixel setup

- receive tile commands

- fetch vertex shader

-triangle rasterization

- Hidden surface removal- depth, stencil test

Pixel pre shader

- fills in interpolator and uniform data

- kicks off non-depdendent data

Pixel shader

- multi thread ALUs

-each thread can be vertex or pixel;

- multiple USSEs in each GPU core

Pixel backend

- trigger when all pixel in tiles are done

- performs data conversions, MSAA

- write finish.

Shader Unit Caveats

- without dynamic flow-control

- with dynamic flow-control

Alpha-blending

- not separate specialize GPU

Mobile is the new PC

- wide feature and performance range

-scalable graphics

-user graphics setting

- low/med/high/ultra

- render buffer size scaling

Render target is on die

- MSAA is cheap and use less memory

- only data in RAM

- Have 0-5 ms cost

- Be wary of buffer restores ( color or depth)

- see usage case for shadows

No bandwidth cost for alpha blending

Free hidden surface removal

Mobile vs Console

- very large CPU overhead for OpenGLES API

- max CPU usage at 100-300 draw calls

- avoid too mush data per scene

- shader patching

- reduce render state change

Alpha test / discard

- conditional z write is very slow

Render buffer management

- each RT is a whole new scene

- avoid switchRT back and forth

- can cause a new restore

- new resolve

- avoid buffer restore

- clear everything, color depth stencil

- a clear just set some dirty bits in register

- avoid buffer resolve

- use discard extension discard_framebuffer

unnecessary different FBO combos

- don't let driver think it needs to start resolving and restoring any buffers~~!!!

texture lookups

- don't perform texture lookups in pixel shader

- let pre shader queue then up ahead of time

- don't compute texture coordinate with math

don't use .zw components for texture coordinate

mobile shader material system: original is too complicated

Mobile material shaders
- separate shader by mobile
- Lots of #ifdef

Shader Offline processing
- Run C pre-processor offline
- reduce ingame copile time
- eliminate duplicates at offline time

Shader compiling
- compile all shaders at startup
- avoids hitching at runtime
- compile on GL thread, while loading on game thread
- compiling is no enough
- must issue dummy draw calls
- how certains states affect shaders
- avoid shader....??

God Rays on mobile
- fewer texture fetch

optimize for mobile
- move all math to VS
- pass down data through interpolation
- split radial filter into 4 draw calls: 4x 8 = 32 texture lookups ???
- from 30ms to 5 ms

God rays
- 1st pass
- 2nd pass
-3rd
- 4th
- 5th
-6th

Character shadows
port from xbox
- projected , modulate dynamic shadow
- compare scene depth and character depth
- draw character on top ( no self-shader)

shadow optis
- shadow depth
- avoids RT switch ( resolve & restore)
- Resolve sceneDepth just before shadows*
- write out tile depth to RAM to read as texture
- use glDiscardFrameBuffer to avoid resolve
- encode depth in F16 / RGBA8 color
- Draw screen-space quad instead of cube
- avoid dependent texture lookup

Tool tips
- Use an OpenGL ES wrapper on PC
- Almost WYSIWYG
- debug on visual studio
- Apple Xcode GL debugger, iOS 5!
- full capture of one frame
- show each drawcall used by each draw call

Next-Generation
- ImgTex "Rogue" (6xxx series):
20x on graphics
- 100+ GFLOPS
- DX10, OGLES Halti

Search This Blog

Chromatic Coder

Bring AAA graphics to mobile platforms

Comments

Post a Comment

Popular posts from this blog

Drawing textured cube with Vulkan on Android

Fast subsurface scattering

glTF loader for Android Vulkan