Optimization

From DeSmuME
Latest revision as of 14:53, 5 March 2014


Windows

Here is how to make a profiler-guided build (Please note that this is not available in the Windows 'Express' Editions of MSVC):


Right-click the project and pick profiler-guided optimization > instrument. Once it is done, go back to the menu and pick "run instrumented/optimized application". You can repeat this as many times as you need in order to do the profiling runs.


The idea here is to exercise every part of the emulator at least once. Once is all it takes. But be aware that exercising each code path in the GPU, for example, would take lots of 2D games. (One day I would like to make a list of games to run in order to cover each part, but that hasn't been done yet.)


When you are done, go back to the menu and pick "optimize". The standard location for the output .exe will contain the profiler-guided build. This build should run a few fps faster.


Here is the list of steps to follow as use cases while crafting a profiler-guided-optimized build.

  • run with --num-procs=1 at least once
  • Run SPP menus and ingame (tests arm cpu, 2d)
  • run with and without sound; toggle streaming modes and methods and all interpolation options
  • save and load
  • use each tool
  • use with/without advanced bus timing
  • toggle both gpus
  • make sure hud shows some stuff (especially fps and frame counter)
  • load a movie
  • run hulk (tests thumb cpu, 3d)
  • switch to each 3d core (toggle the soft rasterizer interpolation option)
  • scale and rotate window each way. use the screen separations. use resize filters.
  • load ff4, show worldmap and menu (more 3d and 2d tests)
  • run a starfy underground level (eg 5-1) to make sure you test sound capture. also pause it to get some edge marking
  • load pokemon white title to get some video capture going
  • dump an avi


It is generally best to keep savestates sprinkled around games in each game mode so that you can quickly exercise them all. Only a frame or two is necessary to exercise the mode.


When you're done, you may want to upx the executable to cut its size down, although a packed executable can set off overzealous virus scanners. Just use "upx -9 desmume.exe".

Linux

GCC is also capable of doing some optimizations. In general there are three types: generic, machine-specific and profile-guided optimization. I am going to describe some optimizations that I tried out and that worked fine for me.

Building with specific flags

To build with specific flags you have to set the environment variables CFLAGS, CXXFLAGS and LDFLAGS while executing ./configure, e.g.:

CFLAGS="-O3" CXXFLAGS=$CFLAGS LDFLAGS=$CFLAGS ./configure
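The same setup can also be written with exported variables, which keeps the value in one place when several commands follow. This is only a sketch: OPT_FLAGS is a name made up for illustration, and the guard around ./configure is there so the snippet does nothing when it is run outside a source tree.

```shell
#!/bin/sh
# OPT_FLAGS is a hypothetical helper variable; configure does not read it.
OPT_FLAGS="-O3"

# Export the three variables that ./configure picks up.
export CFLAGS="$OPT_FLAGS"
export CXXFLAGS="$OPT_FLAGS"
export LDFLAGS="$OPT_FLAGS"

# Run configure only when started from a directory that contains it.
if [ -x ./configure ]; then
    ./configure
else
    echo "no ./configure here; flags would be: $CFLAGS"
fi
```

Changing OPT_FLAGS then changes all three variables at once.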

Generic flags

-O2 or -O3

Level of general optimizations. -O3 optimizes more aggressively than -O2. Test both yourself to see whether -O3 provides a speed boost. Use -O2 instead if you think that some optimizations break compatibility.

-ffast-math

Improves math performance but may break compatibility with common floating-point standards. Note that this option may have adverse effects.

-funroll-loops

Unrolls loops. This may improve performance, but if the generated code grows larger than the CPU cache, it may slow things down instead.

-flto and -fuse-linker-plugin

Enables link-time optimization. On my computer it provides a huge speed boost. However it increases compilation time by *a lot*. You can specify -flto=n where n is the number of parallel jobs to perform the optimization. You can substitute it with the number of cores or threads of your CPU to get a faster compilation time.
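A small sketch of filling in n automatically, assuming getconf is available on the build machine and reports the number of online processors; JOBS is a name chosen for illustration, and the fallback to 1 is an assumption for systems where the query fails.

```shell
#!/bin/sh
# Ask the system for the number of online processors; fall back to 1.
JOBS=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Build the flag string with one LTO job per processor.
CFLAGS="-O3 -flto=$JOBS -fuse-linker-plugin"
echo "$CFLAGS"
```

The resulting string can be passed to ./configure in the same way as the other examples on this page.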

Machine specific flags

To make GCC produce optimized code for your machine (e.g. using SSE units for calculations) you may set the -march=xxx option. Find the right choice for your CPU model in this list: http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html (Alvin: dead link?).

-march=native and -mtune=native

Tells GCC to optimize for the CPU of the machine you are building on. -march=native implies -mtune=native, so adding -mtune=native on top of it is not needed.

Note that code compiled with -march=native may not run on other CPUs.
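To see which architecture "native" resolves to on the build machine, GCC can be asked to print its target options. The sketch below wraps that query in a function; show_native_march is a hypothetical name, and the exact output format depends on the GCC version.

```shell
#!/bin/sh
# Hypothetical helper: print the -march value that "native" resolves to.
show_native_march() {
    if command -v gcc >/dev/null 2>&1; then
        # -Q --help=target lists the target options with their current values.
        gcc -march=native -Q --help=target | grep -- '-march='
    else
        echo "gcc not found"
    fi
}

show_native_march
```

If the binary has to run on other machines, the value printed here can help when picking a more conservative -march setting by hand.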

-minline-all-stringops

This may increase performance of array operations such as copying on machines with multi-byte words, like x86 or AMD64 (Intel64), at the cost of larger code.

Profile guided build

  • Use the -fprofile-generate flag first to build an executable that collects information about execution
  • Run some tests (like loading some game states; the more different things are tested, the better the result). Basically, do what is listed in the Windows section above.
  • Use the -fprofile-use flag to build the executable again, using the collected information for optimization
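The three steps above can be sketched as a pair of shell helpers. The function names pgo_flags and pgo_build are made up for illustration, and the sketch assumes that "make clean" leaves the collected profile data (the .gcda files) in place; if it does not in your tree, remove the object files some other way before the second build.

```shell
#!/bin/sh
# Hypothetical helper: print the flag set for one phase of the build.
pgo_flags() {
    case "$1" in
        generate) echo "-O3 -fprofile-generate" ;;
        use)      echo "-O3 -fprofile-use" ;;
        *)        echo "usage: pgo_flags generate|use" >&2; return 1 ;;
    esac
}

# Hypothetical helper: configure and rebuild for the given phase.
pgo_build() {
    flags=$(pgo_flags "$1") || return 1
    CFLAGS="$flags" CXXFLAGS="$flags" LDFLAGS="$flags" ./configure \
        && make clean && make
}

# Intended order (left commented out so the sketch is safe to source):
#   pgo_build generate   # 1. instrumented build
#   (run the emulator through the test list to collect profile data)
#   pgo_build use        # 2. optimized build using the collected data
```

Any other flags you normally build with can be added to both phases in pgo_flags.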

Example

Erlenmayr: I personally use the following flags for CFLAGS, CXXFLAGS and LDFLAGS and they work fine for me. (I am mostly running Chinatown Wars at the moment.) My CPU is an Intel Core 2 Duo, running Ubuntu 9.10 Karmic (amd64 port).

  • building program version to generate profile (this will be slower)
CFLAGS="-O3 -march=core2 -minline-all-stringops -funroll-loops -ffast-math -combine -fprofile-generate -pg"
  • building the final program
CFLAGS="-O3 -march=core2 -minline-all-stringops -funroll-loops -ffast-math -fprofile-use -combine"

Alvin: I use this:

CFLAGS="-O3 -flto=4 -fuse-linker-plugin -funroll-loops -march=native -minline-all-stringops" CXXFLAGS=$CFLAGS LDFLAGS=$CFLAGS ./configure --enable-hud