Optimization

From DeSmuME
Revision as of 05:59, 4 March 2014


Windows

Here is how to make a profiler-guided build (Please note that this is not available in the Windows 'Express' Editions of MSVC):


Right-click the project and pick Profile Guided Optimization > Instrument. Once it is done, go back to the menu and pick "Run Instrumented/Optimized Application". You can repeat this as many times as you need in order to do the profiling runs.


The idea here is to exercise every part of the emulator at least once; once is all it takes. Be aware, though, that exercising each code path in the GPU, for example, would take lots of 2D games. (One day I would like to make a list of games that covers each part, but that hasn't been done yet.)


When you are done, go back to the menu and pick "Optimize". The standard location for the output .exe will contain the profiler-guided build. This build should run a few fps faster.


Here is a list of steps to follow as use cases while crafting a profiler-guided-optimized build.

  • run with --num-procs=1 at least once
  • Run SPP menus and ingame (tests arm cpu, 2d)
  • run with and without sound; toggle streaming modes and methods and all interpolation options
  • save and load
  • use each tool
  • use with/without advanced bus timing
  • toggle both gpus
  • make sure hud shows some stuff (especially fps and frame counter)
  • load a movie
  • run hulk (tests thumb cpu, 3d)
  • switch to each 3d core (toggle the soft rasterizer interpolation option)
  • scale and rotate window each way. use the screen separations. use resize filters.
  • load ff4, show worldmap and menu (more 3d and 2d tests)
  • run a starfy underground level (eg 5-1) to make sure you test sound capture. also pause it to get some edge marking
  • load pokemon white title to get some video capture going
  • dump an avi


It is generally best to keep savestates sprinkled around games in each game mode so that you can quickly exercise them all. Only a frame or two is necessary to exercise the mode.


When you're done, you may want to compress the executable with UPX to cut its size down. Be aware that packed executables can set off false alarms in some virus scanners. Just use "upx -9 desmume.exe".

Linux

GCC is also capable of several optimizations. In general there are three types: generic, machine-specific, and profile-guided. I am going to describe some optimizations that I tried out and that worked fine for me.

Building with specific flags

To build with specific flags you have to set the environment variables CFLAGS, CXXFLAGS and LDFLAGS while executing ./configure, e.g.:

CFLAGS="-O3" CXXFLAGS=$CFLAGS LDFLAGS=$CFLAGS ./configure
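
An equivalent way to write the same thing, sketched here under the assumption of a POSIX shell, is to set and export the three variables first. This avoids relying on one assignment on the command line seeing the value of the one before it, and keeps the flags available to a later make in the same shell. The guard around ./configure is only there so the snippet does nothing harmful when run outside the source tree.

```shell
# Set the flags once; CXXFLAGS and LDFLAGS reuse the value of CFLAGS.
CFLAGS="-O3"
CXXFLAGS="$CFLAGS"
LDFLAGS="$CFLAGS"

# Export them so ./configure (and any later make) inherits them.
export CFLAGS CXXFLAGS LDFLAGS

# Run configure only if it exists in the current directory.
if [ -x ./configure ]; then
    ./configure
fi
```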

Generic flags

-O2 or -O3

Level of general optimization. -O3 enables more aggressive optimizations than -O2. Test both yourself to see whether -O3 provides a speed boost. Use -O2 instead if you suspect that some optimizations break compatibility.

-ffast-math

Improves math performance but may break compatibility with common floating-point standards.

-funroll-loops

Unrolls loops, may increase execution speed.

-combine

Treats C sources (not C++) as if they were a single file. Don't use this if your compiler supports link-time optimization.

-flto and -fuse-linker-plugin

Enables link-time optimization (LTO). On my computer it provides a huge speed boost, but it increases compilation time by a lot. You can specify -flto=n, where n is the number of parallel jobs used to perform the optimization; a reasonable value is the number of cores or threads of your CPU. Do not use -combine together with these flags, because link-time optimization already does what -combine does.
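
As a rough sketch of picking n automatically, the snippet below asks the system for the number of online processors and falls back to 1 if that fails. The variable name lto_jobs is made up for this example, and getconf _NPROCESSORS_ONLN is assumed to be available (it is on typical Linux systems).

```shell
# Number of online processors; fall back to 1 if getconf cannot tell us.
lto_jobs=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Build the flag string with one LTO job per processor.
CFLAGS="-O3 -flto=$lto_jobs -fuse-linker-plugin"
CXXFLAGS="$CFLAGS"
LDFLAGS="$CFLAGS"
export CFLAGS CXXFLAGS LDFLAGS

echo "Using flags: $CFLAGS"
```

After this, run ./configure as usual in the same shell.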

Machine specific flags

To make GCC produce optimized code for your machine (e.g. using SSE units for calculations) you may set the -march=xxx option. Find the right choice for your CPU model in this list: http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html (Alvin: dead link?).
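
If you would rather not look the CPU up by hand, -march=native (used in the example at the end of this page) lets GCC detect it. The following is only a sketch for checking what GCC picks on your machine; the function name show_native_march is invented for this example, and the exact output format depends on the GCC version.

```shell
# Print the -march value GCC selects for the local CPU, if it can be determined.
show_native_march() {
    if command -v gcc >/dev/null 2>&1; then
        # -Q --help=target lists the target options together with their current values.
        march_line=$(gcc -march=native -Q --help=target 2>/dev/null | grep -- '-march=' | head -n 1)
        echo "${march_line:-could not determine -march}"
    else
        echo "gcc not found"
    fi
}

show_native_march
```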

-minline-all-stringops may increase performance of array operations such as copying on machines with multi-byte words, like x86 or AMD64 (Intel 64).

Profile guided build

  • Use the -fprofile-generate flag first to build an executable that collects information about its execution
  • Run some tests (like loading some game states; the more different things are tested, the better the result). Basically, do what is described in the Windows section above.
  • Use the -fprofile-use flag to rebuild the executable, using the collected information for optimization
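
The three steps above can be sketched as a small shell helper. This is an illustration only: the names BASE_FLAGS and build_with are invented here, the base flags are just an example, and it assumes that rebuilding leaves the collected profile data (the .gcda files written during the test runs) in place.

```shell
# Flags shared by both builds (example values only).
BASE_FLAGS="-O3 -funroll-loops"

# Step 1 uses the instrumented flags, step 3 the profile-consuming flags.
PGO_GENERATE_FLAGS="$BASE_FLAGS -fprofile-generate"
PGO_USE_FLAGS="$BASE_FLAGS -fprofile-use"

# Configure and build with the given flag string (hypothetical helper).
build_with() {
    CFLAGS="$1" CXXFLAGS="$1" LDFLAGS="$1" ./configure && make
}

# Intended usage, run from the source directory:
#   build_with "$PGO_GENERATE_FLAGS"   # step 1: instrumented build (slower)
#   ...run the emulator and exercise it...  # step 2: collect profile data
#   build_with "$PGO_USE_FLAGS"        # step 3: optimized build
echo "generate: $PGO_GENERATE_FLAGS"
echo "use:      $PGO_USE_FLAGS"
```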

Example

Erlenmayr: I personally use the following flags for CFLAGS, CXXFLAGS and LDFLAGS and they work fine for me. (I am mostly running Chinatown Wars at the moment.) My CPU is an Intel Core 2 Duo, running Ubuntu 9.10 Karmic (amd64 port).

  • building program version to generate profile (this will be slower)
CFLAGS="-O3 -march=core2 -minline-all-stringops -funroll-loops -ffast-math -combine -fprofile-generate -pg"
  • building the final program
CFLAGS="-O3 -march=core2 -minline-all-stringops -funroll-loops -ffast-math -fprofile-use -combine"

Alvin: I use this:

CFLAGS="-O3 -flto=4 -fuse-linker-plugin -funroll-loops -march=native -mfpmath=sse -msse -msse2" CXXFLAGS=$CFLAGS LDFLAGS=$CFLAGS ./configure --enable-hud