Optimization

From DeSmuME
Revision as of 14:24, 5 March 2014 by AlvinWong (Talk | contribs)

Windows

Here is how to make a profile-guided build (please note that this is not available in the Windows 'Express' editions of MSVC):


Right-click the project and pick Profile Guided Optimization > Instrument. Once the build is done, go back to the menu and pick "Run Instrumented/Optimized Application". You can repeat this as many times as you need in order to do the profiling runs.


The idea here is to exercise every different part of the emulator at least once. Once is all it takes. But be aware that exercising each code path in the GPU, for example, would take lots of 2D games. (One day I would like to make a list of games to run in order to cover each part, but that hasn't been done yet.)


When you are done, go back to the menu and pick "Optimize". The standard location for the output .exe will contain the profile-guided build. This build should run a few fps faster.


Here is a list of steps to follow as use cases while crafting a profile-guided build.

  • run with --num-procs=1 at least once
  • Run SPP menus and ingame (tests arm cpu, 2d)
  • run with and without sound; toggle streaming modes and methods and all interpolation options
  • save and load
  • use each tool
  • use with/without advanced bus timing
  • toggle both gpus
  • make sure hud shows some stuff (especially fps and frame counter)
  • load a movie
  • run hulk (tests thumb cpu, 3d)
  • switch to each 3d core (toggle the soft rasterizer interpolation option)
  • scale and rotate window each way. use the screen separations. use resize filters.
  • load ff4, show worldmap and menu (more 3d and 2d tests)
  • run a starfy underground level (eg 5-1) to make sure you test sound capture. also pause it to get some edge marking
  • load pokemon white title to get some video capture going
  • dump an avi


It is generally best to keep savestates sprinkled around games in each game mode so that you can quickly exercise them all. Only a frame or two is necessary to exercise the mode.


When you're done, you may want to pack the executable with UPX to cut its size down: just use "upx -9 desmume.exe". Be aware that this can set off overzealous virus scanners, which sometimes flag UPX-packed executables.

Linux

GCC is also capable of several optimizations. In general there are three types: generic, machine-specific and profile-guided. I am going to describe some optimizations that I tried out and that worked fine for me.

Building with specific flags

To build with specific flags you have to set the environment variables CFLAGS, CXXFLAGS and LDFLAGS while executing ./configure, e.g.:

CFLAGS="-O3" CXXFLAGS=$CFLAGS LDFLAGS=$CFLAGS ./configure
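The same idea can be wrapped in a small shell function so the flag string is typed only once. This is a minimal sketch; the name configure_with_flags is made up for illustration and is not part of DeSmuME.

```shell
# Hypothetical helper (not part of DeSmuME): run a command with the same
# flag string passed as CFLAGS, CXXFLAGS and LDFLAGS.
configure_with_flags() {
    flags=$1
    shift
    # The assignments apply only to the command that follows them.
    CFLAGS="$flags" CXXFLAGS="$flags" LDFLAGS="$flags" "$@"
}
```

From the source directory it could be called as configure_with_flags "-O3 -funroll-loops" ./configure. Quoting the flag string keeps several flags together as one value.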

Generic flags

-O2 or -O3

Level of general optimization. -O3 optimizes more aggressively than -O2. Test both yourself to see whether -O3 provides a speed boost. Use -O2 instead if you think that some optimizations break compatibility.

-ffast-math

Improves math performance but may break compliance with common floating-point standards such as IEEE 754. Note that this option may have adverse effects, since calculation results can differ slightly.

-funroll-loops

Unrolls loops. It may improve performance, but if the generated code grows too large to fit in the CPU's instruction cache, it may slow things down instead.

-flto and -fuse-linker-plugin

Enables link-time optimization. On my computer it provides a huge speed boost. However, it increases compilation time by *a lot*. You can specify -flto=n, where n is the number of parallel jobs used to perform the optimization. Setting n to the number of cores or threads of your CPU shortens the compilation time.
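As a sketch of picking n automatically, the job count can be taken from the number of online CPUs. The variable names njobs and LTO_FLAGS are only illustrative, and -O3 is an assumed base level.

```shell
# Number of online CPUs; fall back to 1 if neither tool is available.
njobs=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Flag string for a link-time-optimized build.
LTO_FLAGS="-O3 -flto=$njobs -fuse-linker-plugin"
echo "$LTO_FLAGS"
```

The resulting string would then be used for CFLAGS, CXXFLAGS and LDFLAGS alike, since the LTO flags are needed at link time as well as at compile time.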

Machine specific flags

To make GCC produce optimized code for your machine (e.g. using SSE units for calculations) you may set the -march=xxx option. Find the right choice for your CPU model in this list: [[1]] (Alvin: dead link?).

-march=native and -mtune=native

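Tells GCC to optimize for the CPU of the machine doing the build. On x86, -march=native already implies -mtune=native, so specifying -mtune=native as well is harmless but should not add further optimization. Note that a build made this way may not run on older or different CPUs.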
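To see what "native" resolves to on a given machine, GCC can print the target options it would use. A small sketch; the function name show_native_flags is an assumption, and the compiler defaults to gcc unless CC is set.

```shell
show_native_flags() {
    # -Q --help=target lists the target options GCC would use;
    # keep only the -march= and -mtune= lines.
    "${CC:-gcc}" -march=native -Q --help=target | grep -E -- '-m(arch|tune)='
}
```

Calling show_native_flags prints the selected CPU names, which can also be copied into an explicit -march=xxx setting.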

-minline-all-stringops

Makes GCC inline string operations such as memcpy and memset more often. This may increase the performance of array operations like copying on x86 or AMD64 (Intel 64), at the cost of larger code.

Profile guided build

  • Use the -fprofile-generate flag first to build an executable that collects information about execution
  • Run some tests (like loading some game states - the more different things are tested, the better the result). Basically what is in the above #Windows section.
  • Use -fprofile-use flag to build the executable by using the collected information for optimization
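
The three steps above can be sketched with two small shell helpers. The names BASE_FLAGS, pgo_flags and pgo_configure are made up for illustration, and -O2 is only an assumed base level.

```shell
# Assumed base flags; any of the flags on this page could be added.
BASE_FLAGS="-O2"

# Print the flag string for one PGO phase: "generate" or "use".
pgo_flags() {
    case $1 in
        generate) echo "$BASE_FLAGS -fprofile-generate" ;;
        use)      echo "$BASE_FLAGS -fprofile-use" ;;
        *)        echo "usage: pgo_flags generate|use" >&2; return 1 ;;
    esac
}

# Run a command (typically ./configure) with the flags for that phase.
pgo_configure() {
    phase_flags=$(pgo_flags "$1") || return 1
    shift
    CFLAGS="$phase_flags" CXXFLAGS="$phase_flags" LDFLAGS="$phase_flags" "$@"
}
```

A session might run pgo_configure generate ./configure followed by make, then exercise the emulator, then run pgo_configure use ./configure and rebuild everything. The profile data (.gcda files) written by the instrumented executable must still be present for the second build.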

Example

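Erlenmayr: I personally use the following flags for CFLAGS, CXXFLAGS and LDFLAGS and they work fine for me. (I am mostly running Chinatown Wars at the moment.) My CPU is an Intel Core 2 Duo running Ubuntu 9.10 Karmic (amd64 port). Note for newer compilers: -combine was removed in GCC 4.6, and -pg only adds gprof instrumentation, which -fprofile-generate does not require.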

  • building program version to generate profile (this will be slower)
CFLAGS="-O3 -march=core2 -minline-all-stringops -funroll-loops -ffast-math -combine -fprofile-generate -pg"
  • building the final program
CFLAGS="-O3 -march=core2 -minline-all-stringops -funroll-loops -ffast-math -fprofile-use -combine"

Alvin: I use this:

CFLAGS="-O3 -flto=4 -fuse-linker-plugin -funroll-loops -march=native -mfpmath=sse -msse -msse2" CXXFLAGS=$CFLAGS LDFLAGS=$CFLAGS ./configure --enable-hud