Here is how to make a profile-guided build (please note that this is not available in the Windows 'Express' editions of MSVC):
Right-click the project and pick Profile Guided Optimization > Instrument. Once it is done, go back to the menu and pick "Run Instrumented/Optimized Application". You can repeat this as many times as you need in order to do the profiling runs.
The idea here is to exercise every different part of the emulator at least once. Once is all it takes. But be aware that exercising each code path in the GPU, for example, would take lots of 2D games. (One day I would like to make a list of games to run in order to cover each part, but that hasn't been done yet.)
When you are done, go back to the menu and pick "optimize". The standard location for the output .exe will contain the profiler-guided build. This build should run a few fps faster.
Here is a list of things to exercise during the profiling runs while crafting a profile-guided-optimized build.
- run with --num-procs=1 at least once
- Run SPP menus and ingame (tests arm cpu, 2d)
- run with and without sound; toggle streaming modes and methods and all interpolation options
- save and load
- use each tool
- use with/without advanced bus timing
- toggle both gpus
- make sure hud shows some stuff (especially fps and frame counter)
- load a movie
- run hulk (tests thumb cpu, 3d)
- switch to each 3d core (toggle the soft rasterizer interpolation option)
- scale and rotate window each way. use the screen separations. use resize filters.
- load ff4, show worldmap and menu (more 3d and 2d tests)
- run a starfy underground level (eg 5-1) to make sure you test sound capture. also pause it to get some edge marking
- load pokemon white title to get some video capture going
- dump an avi
It is generally best to keep savestates sprinkled around games in each game mode so that you can quickly exercise them all. Only a frame or two is necessary to exercise the mode.
When you're done, you may want to compress the executable with UPX to cut its size down (be aware that this can set off overzealous virus scanners). Just use "upx -9 desmume.exe".
GCC is also capable of doing some optimizations. In general there are three types: generic, machine-specific and profile-guided optimization. I am going to describe some optimizations that I tried out and that worked fine for me.
Building with specific flags
To build with specific flags you have to set the environment variables CFLAGS, CXXFLAGS and LDFLAGS while executing ./configure:
CFLAGS="-O3" CXXFLAGS=$CFLAGS LDFLAGS=$CFLAGS ./configure
-O2 and -O3
Level of general optimization. -O3 optimizes more aggressively than -O2. Test both yourself to see whether -O3 provides a speed boost. Use -O2 instead if you suspect that some optimizations break compatibility.
-ffast-math
Improves math performance but may break compliance with common floating-point standards. Note that this option may have adverse effects.
-funroll-loops
Unrolls loops. It may improve performance, but if the generated code turns out to be larger than the CPU cache, it may slow things down.
-flto
Enables link-time optimization. On my computer it provides a huge speed boost. However, it increases compilation time by *a lot*.
You can specify -flto=n, where n is the number of parallel jobs used to perform the optimization. You can substitute it with the number of cores or threads of your CPU to get a faster compilation time.
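As a rough sketch of the above, the job count for -flto can be taken from the number of CPUs the system reports instead of being typed in by hand. The variable name JOBS is made up for this example, and the choice of -O3 is only an assumption; nproc comes from GNU coreutils, with getconf as a fallback where it is missing.

```shell
# Number of online CPUs; JOBS is just an illustrative variable name.
JOBS=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)

# Use the same flags for compiling and linking, as in the ./configure line above.
CFLAGS="-O3 -flto=$JOBS"
CXXFLAGS=$CFLAGS
LDFLAGS=$CFLAGS
export CFLAGS CXXFLAGS LDFLAGS

echo "CFLAGS=$CFLAGS"
```

With the variables exported like this, running ./configure followed by make in the source tree should pick them up.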
Machine specific flags
To make GCC produce optimized code for your machine (e.g. using SSE units for calculations) you may set the -march=xxx option. Find the right choice for your CPU model in this list: [] (Alvin: dead link?).
-march=native and -mtune=native
Tells GCC to detect the CPU of the build machine and optimize for it. On x86, -march=native already implies -mtune=native, so adding -mtune=native as well is harmless but normally makes no further difference. Note that code compiled with -march=native may not run on other CPUs.
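If you are curious which CPU type -march=native resolves to on your machine, GCC can be asked to print its target settings. The following is only a sketch: it assumes a gcc binary on the PATH that accepts -Q --help=target, the variable names are invented for the example, and the exact output format differs between GCC versions.

```shell
# Ask GCC which -march value "native" resolves to on this machine.
MARCH_LINE=""
if command -v gcc >/dev/null 2>&1; then
    MARCH_LINE=$(gcc -march=native -Q --help=target 2>/dev/null | grep -e '-march=' | head -n 1)
fi

# Fall back to a message if gcc is missing or does not support the query.
RESULT=${MARCH_LINE:-"could not query gcc for -march=native"}
echo "$RESULT"
```

The reported name can then be passed explicitly as -march=xxx if the build has to match a specific CPU model instead of the build machine.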
-minline-all-stringops
This will increase the performance of array operations such as copying on machines with multi-byte words, like x86 or AMD64 (Intel 64).
Profile guided build
- Use the -fprofile-generate flag first to build an executable that collects information about its execution.
- Run some tests (such as loading some game states; the more different things are tested, the better the result). Basically, do what is described in the Windows section above.
- Use the -fprofile-use flag to rebuild the executable, using the collected information for optimization.
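The three steps above can be sketched as a small script. This is an outline under assumptions rather than a tested recipe: build_with is a hypothetical helper, -O3 is just a placeholder for whatever base flags you prefer, and the script only prints the plan instead of starting a build.

```shell
# Base flags shared by both builds (an assumption; use your own).
BASE_FLAGS="-O3"
GEN_FLAGS="$BASE_FLAGS -fprofile-generate"
USE_FLAGS="$BASE_FLAGS -fprofile-use"

# Hypothetical helper: configure and build the source tree with the given flags.
# It is defined here but not called, so nothing is built when this sketch runs.
build_with() {
    CFLAGS="$1" CXXFLAGS="$1" LDFLAGS="$1" ./configure && make
}

echo "step 1: build_with \"$GEN_FLAGS\""
echo "step 2: run the instrumented executable through the test scenarios"
echo "step 3: build_with \"$USE_FLAGS\""
```

Between step 2 and step 3 the object files need to be rebuilt, while the profile data files written during the test runs (*.gcda) have to stay in place so that -fprofile-use can find them.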
Erlenmayr: I personally use the following flags for CFLAGS, CXXFLAGS and LDFLAGS and they work fine for me. (I am mostly running Chinatown Wars at the moment.) My CPU is an Intel Core 2 Duo, running Ubuntu 9.10 Karmic (amd64 port).
- building the program version that generates the profile (this will be slower)
CFLAGS="-O3 -march=core2 -minline-all-stringops -funroll-loops -ffast-math -combine -fprofile-generate -pg"
- building the final program
CFLAGS="-O3 -march=core2 -minline-all-stringops -funroll-loops -ffast-math -fprofile-use -combine"
Alvin: I use this:
CFLAGS="-O3 -flto=4 -fuse-linker-plugin -funroll-loops -march=native -minline-all-stringops" CXXFLAGS=$CFLAGS LDFLAGS=$CFLAGS ./configure --enable-hud