Here is how to make a profile-guided build (please note that this is not available in the 'Express' editions of MSVC):
Right-click the project and pick Profile Guided Optimization > Instrument. Once the build is done, go back to the same menu and pick "Run Instrumented/Optimized Application". You can repeat this as many times as you need for your profiling runs.
The idea here is to exercise every part of the emulator at least once; once is all it takes. Be aware, though, that exercising each code path in the GPU, for example, would take lots of 2D games. (One day I would like to make a list of games that together cover each part, but that hasn't been done yet.)
When you are done, go back to the menu and pick "Optimize". The standard location for the output .exe will then contain the profile-guided build. This build should run a few fps faster.
Here is the list of steps to follow as use cases while crafting a profile-guided-optimized build.
- Run SPP menus and ingame (tests arm cpu, 2d)
- run with and without sound; toggle adpcm caching and each interpolation mode
- save and load
- use each tool
- toggle both gpus
- make sure hud shows some stuff (especially fps and frame counter)
- load a movie
- run hulk (tests thumb cpu, 3d)
- switch to each 3d core (toggle the soft rasterizer interpolation option)
- scale and rotate window each way. use the screen separations.
- load ff4, show worldmap and menu (more 3d and 2d tests)
- dump an avi
It is generally best to keep savestates sprinkled around games in each game mode so that you can quickly exercise them all. Only a frame or two is necessary to exercise the mode.
When you're done, you may want to UPX the executable to cut its size down (and set off overzealous virus scanners, which sometimes flag packed executables). Just use "upx -9 desmume.exe".
GCC is also capable of doing some optimizations. In general there are three types: generic, machine-specific and profile-guided optimization. I am going to describe some optimizations that I tried out and that worked fine for me.
Building with specific flags
To build with specific flags you have to set the environment variables CFLAGS, CXXFLAGS and LDFLAGS while executing "./configure", e.g.:
- CFLAGS="-O3" CXXFLAGS="-O3" LDFLAGS="-O3" ./configure
- -O2 or -O3: Level of general optimization. -O3 optimizes more aggressively. Use -O2 instead if you think that some optimizations break compatibility.
- -ffast-math: Improves math performance but may break compatibility with common floating-point standards
- -funroll-loops: Unrolls loops
- -combine: Treats C sources (not C++) as if they were a single file (this flag was removed in GCC 4.6, so drop it with newer compilers)
Machine specific flags
To make GCC produce optimized code for your machine (e.g. using SSE units for calculations) you may set the -march=xxx option. Find the right choice for your CPU model in this list: [].
-minline-all-stringops will increase performance of array operations like copying on machines with multi-byte words, like x86 or AMD64 (Intel64).
Profile guided build
- Use the -fprofile-generate flag first to build an executable that collects information about its execution
- Run some tests (like loading some game states - the more different things are tested, the better the result)
- Use -fprofile-use flag to build the executable by using the collected information for optimization
I personally use the following flags for CFLAGS, CXXFLAGS and LDFLAGS and they work fine for me. (I am mostly running Chinatown Wars at the moment.) My CPU is an Intel Core 2 Duo and I am running Ubuntu 9.10 Karmic (amd64 port).
- building the program version that generates the profile (this will be slower)
- CFLAGS="-O3 -march=core2 -minline-all-stringops -funroll-loops -ffast-math -combine -fprofile-generate -pg"
- building the final program
- CFLAGS="-O3 -march=core2 -minline-all-stringops -funroll-loops -ffast-math -fprofile-use -combine"
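Put together, the two passes might be scripted roughly like this. It is a sketch under assumptions, not a tested build recipe: the helper names run and build_pass are made up, the script only prints the commands unless RUN=1 is set, and it assumes that "make clean" in your tree leaves the collected *.gcda profile files in place (check this before relying on it).

```shell
#!/bin/sh
# Dry-run sketch of the two-pass build using the flags listed above.
# Commands are printed by default; set RUN=1 to actually execute them.

COMMON="-O3 -march=core2 -minline-all-stringops -funroll-loops -ffast-math -combine"

# Hypothetical helper: print a command, and run it only when RUN=1.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Hypothetical helper: configure and rebuild with COMMON plus the pass flags.
build_pass() {
    flags="$COMMON $1"
    run env CFLAGS="$flags" CXXFLAGS="$flags" LDFLAGS="$flags" ./configure
    run make clean   # assumption: this does not delete *.gcda files
    run make
}

case "${1:-}" in
    generate)
        # -pg is taken from the flag list above; it adds gprof output and
        # is not what -fprofile-use reads.
        build_pass "-fprofile-generate -pg"
        ;;
    use)
        build_pass "-fprofile-use"
        ;;
    *)
        echo "usage: $0 generate|use"
        ;;
esac
```

Saved as, say, pgo.sh (a hypothetical name), you would call it with "generate", run the emulator through your savestates, and then call it again with "use".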