Fine tuning of compiler options to increase application performance

Posted on : 21-03-2011 | By : Alexander Permyakov | In : Uncategorized

Performance is essential for video analytics applications, since the algorithms are usually computationally heavy and such systems are expected to work in almost real time. On the one hand, performance can be increased by improving or changing the algorithms. This is the main approach, since it can increase performance dramatically. On the other hand, performance can be increased a little more in a relatively simple way: by using a good compiler and tuning the compiler options. Let's see how this can be done in real programs.

For the first example I used the LAME encoder (http://lame.sourceforge.net/). Why LAME? First, because it is open source, so I can recompile it with different compilers and options. Second, because performance is simple to measure: it is the time required to re-encode an mp3 file. Third, it gives well-reproducible results, which makes it easier to understand how different compiler options affect speed.

The tests were performed on computers with different CPUs running Windows.

Intel Pentium 4 3 GHz
Intel Core 2 Duo 2.8 GHz
AMD Athlon II X4 (635) 2.9 GHz overclocked to 3.3 GHz
Intel Core i5 (2500) 3.3 GHz

Compilation was done with Visual Studio 9 and GCC4.5.1 (MinGW).

Encoding time was measured 10 times, and the average value is given in the table.
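A minimal Python sketch of such a measurement, assuming the program under test is launched as an external command. The helper name `average_runtime` is hypothetical and is not part of the original setup.

```python
import subprocess
import sys
import time
from statistics import mean


def average_runtime(cmd, runs=10):
    """Run `cmd` `runs` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the program exits with an error.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return mean(timings)


if __name__ == "__main__":
    # Demo: time a trivial Python process; substitute the encoder command line.
    print(f"{average_runtime([sys.executable, '-c', 'pass'], runs=3):.6f} sec")
```

Passing the LAME command line actually being benchmarked instead of the demo command gives the per-build average. Wall-clock time here includes process start-up, which is small compared with a multi-second encode.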

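As the baseline (0.00%) I used safe options (-O3 -march=prescott -fomit-frame-pointer -mfpmath=sse) that work on most modern AMD and Intel CPUs. The -march=core2 option may use SSSE3 instructions, so the resulting code may fail to run on AMD and Intel Pentium 4 family CPUs. In the tables, the % column is the difference in encoding time relative to the row marked 0.00%; negative values mean faster code.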
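The % column appears to be the relative difference in time from the 0.00% row, (t - t_base) / t_base * 100. A small Python check of that formula against three rows of the Pentium 4 table (the function name `relative_change` is hypothetical):

```python
def relative_change(time_s, baseline_s):
    """Difference of time_s from baseline_s in percent; negative means faster."""
    return (time_s - baseline_s) / baseline_s * 100.0


# Average times (seconds) from the Intel Pentium 4 table; the baseline is the
# -O3 -march=prescott -fomit-frame-pointer -mfpmath=sse build.
BASELINE = 14.328020
rows = {
    "prescott + -ffast-math + PGO": 13.206153,
    "generic (no -march)": 14.621770,
    "Visual Studio 9": 14.646769,
}

for name, seconds in rows.items():
    print(f"{name}: {relative_change(seconds, BASELINE):+.2f} %")
```

The printed values reproduce the -7.83 %, 2.05 % and 2.22 % entries of the table.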

Intel Pentium 4 3 GHz

Compiler	Compiler options	Average Time	%
GCC4.5.1	-O3 -march=prescott -fomit-frame-pointer -mfpmath=sse -ffast-math -fprofile-use	13.206153 sec	-7.83 %
GCC4.5.1	-O3 -march=prescott -fomit-frame-pointer -mfpmath=sse -fprofile-use	13.537400 sec	-5.52 %
GCC4.5.1	-O3 -march=prescott -fomit-frame-pointer -mfpmath=sse -ffast-math	13.999892 sec	-2.29 %
GCC4.5.1	-O3 -march=prescott -fomit-frame-pointer -mfpmath=sse	14.328020 sec	0.00 %
GCC4.5.1	-O3 -fomit-frame-pointer -mfpmath=sse	14.621770 sec	2.05 %
Visual_Studio_9	/GS- /fp:fast /O2	14.646769 sec	2.22 %

Optimizing for the prescott architecture gives a 2% speed increase.
-ffast-math gives 2% more.
Profile-guided optimization gives a 5% speed increase.
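With GCC, profile-guided optimization is a two-pass build: a binary instrumented with -fprofile-generate is run on representative input, and the collected profile is then used by a second compilation with -fprofile-use. A Python sketch of that command sequence; the source file name, output name and training arguments are hypothetical, and a real LAME build would also need its own include paths and libraries.

```python
import subprocess


def pgo_build_steps(sources, output, flags, training_args):
    """Return the command lines of a two-pass GCC profile-guided build.

    1. compile with -fprofile-generate (instrumented binary),
    2. run that binary on representative input so it writes .gcda profile files,
    3. recompile the same sources with the same flags plus -fprofile-use.
    """
    instrumented = ["gcc", *flags, "-fprofile-generate", *sources, "-o", output]
    training_run = [f"./{output}", *training_args]
    optimized = ["gcc", *flags, "-fprofile-use", *sources, "-o", output]
    return [instrumented, training_run, optimized]


def run_steps(steps):
    """Execute the steps in order, stopping at the first failure."""
    for cmd in steps:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names; this only prints the commands instead of running them.
    flags = ["-O3", "-march=prescott", "-fomit-frame-pointer", "-mfpmath=sse"]
    for cmd in pgo_build_steps(["encoder.c"], "encoder", flags, ["sample.mp3", "out.mp3"]):
        print(" ".join(cmd))
```

Keeping the flags identical in both compilations matters, since the profile is matched against the code the second pass generates.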

Intel Core 2 Duo 2.8 GHz

Compiler	Compiler options	Average Time	%
GCC4.5.1	-O3 -march=core2 -fomit-frame-pointer -mfpmath=sse -ffast-math -fprofile-use	7.818235 sec	-6.65 %
GCC4.5.1	-O3 -march=core2 -fomit-frame-pointer -mfpmath=sse -fprofile-use	7.824039 sec	-6.58 %
GCC4.5.1	-O3 -march=prescott -fomit-frame-pointer -mfpmath=sse -ffast-math -fprofile-use	7.893243 sec	-5.75 %
GCC4.5.1	-O3 -march=prescott -fomit-frame-pointer -mfpmath=sse -fprofile-use	7.976644 sec	-4.75 %
GCC4.5.1	-O3 -march=core2 -fomit-frame-pointer -mfpmath=sse -ffast-math	8.234858 sec	-1.67 %
GCC4.5.1	-O3 -march=core2 -fomit-frame-pointer -mfpmath=sse	8.374867 sec	0.00 %
GCC4.5.1	-O3 -march=prescott -fomit-frame-pointer -mfpmath=sse -ffast-math	8.415269 sec	0.48 %
GCC4.5.1	-O3 -march=prescott -fomit-frame-pointer -mfpmath=sse	8.423270 sec	0.58 %
Visual_Studio_9	/GS- /fp:fast /O2	8.814092 sec	5.24 %
GCC4.5.1	-O3 -fomit-frame-pointer -mfpmath=sse	9.224519 sec	10.15 %

Optimizing for the core2 architecture gives a 10% speed increase.
-ffast-math gives only a 1% increase.
Profile-guided optimization gives a 6% increase.
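The Core 2 Duo table covers every combination of two -march values with and without -ffast-math and profile-guided optimization, plus a generic build. A Python sketch that enumerates these GCC option sets; `option_matrix` is a hypothetical helper, and the -fprofile-use variants additionally assume a prior -fprofile-generate training run.

```python
from itertools import product

BASE = ["-O3", "-fomit-frame-pointer", "-mfpmath=sse"]


def option_matrix(marches):
    """Enumerate GCC option sets: each -march value, with/without -ffast-math and PGO."""
    configs = []
    for march, fast_math, pgo in product(marches, (False, True), (False, True)):
        # Same flag order as in the tables: -O3 -march=... -fomit-frame-pointer -mfpmath=sse
        flags = [BASE[0], f"-march={march}", *BASE[1:]]
        if fast_math:
            flags.append("-ffast-math")
        if pgo:
            flags.append("-fprofile-use")
        configs.append(flags)
    configs.append(list(BASE))  # generic build without -march
    return configs


for flags in option_matrix(["core2", "prescott"]):
    print(" ".join(flags))
```

This yields the nine GCC rows of the table; the Visual Studio row is built separately.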

Intel Core i5 (2500) 3.3 GHz

GCC4.5.1 has no dedicated -march option for Core i3/i5/i7 CPUs; -march=core2 can be used for them.
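Since a wrong -march value can produce code that does not run, the choice should follow the instruction sets the CPU actually supports. A heuristic Python sketch for Linux, assuming the flag names reported in /proc/cpuinfo (where SSE3 is listed as `pni`); the mapping and the helper names are assumptions and cover only the -march values used in this post.

```python
import os


def parse_cpuinfo(lines):
    """Return (vendor, flags) for the first CPU in /proc/cpuinfo-style lines."""
    vendor, flags = "", set()
    for line in lines:
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "vendor_id" and not vendor:
            vendor = value.strip()
        elif key == "flags" and not flags:
            flags = set(value.split())
    return vendor, flags


def pick_march(vendor, flags):
    """Heuristically pick one of the -march values used here (None = generic build)."""
    if vendor == "AuthenticAMD" and "sse4a" in flags:
        return "amdfam10"
    if vendor == "GenuineIntel" and "ssse3" in flags:
        return "core2"  # also used for Core i3/i5/i7 with GCC4.5.1
    if "pni" in flags:  # Linux reports SSE3 as "pni"
        return "prescott"
    return None


if __name__ == "__main__":
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            march = pick_march(*parse_cpuinfo(f))
        print(f"-march={march}" if march else "no -march (generic build)")
```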

Compiler	Compiler options	Average Time	%
GCC4.5.1	-O3 -march=core2 -fomit-frame-pointer -mfpmath=sse -ffast-math -fprofile-use	4.059390 sec	-8.52 %
GCC4.5.1	-O3 -march=prescott -fomit-frame-pointer -mfpmath=sse -ffast-math -fprofile-use	4.093767 sec	-7.75 %
GCC4.5.1	-O3 -march=core2 -fomit-frame-pointer -mfpmath=sse -ffast-math	4.156268 sec	-6.34 %
GCC4.5.1	-O3 -march=prescott -fomit-frame-pointer -mfpmath=sse -ffast-math	4.200015 sec	-5.35 %
GCC4.5.1	-O3 -march=core2 -fomit-frame-pointer -mfpmath=sse -fprofile-use	4.253143 sec	-4.15 %
GCC4.5.1	-O3 -march=prescott -fomit-frame-pointer -mfpmath=sse -fprofile-use	4.321892 sec	-2.61 %
GCC4.5.1	-O3 -march=prescott -fomit-frame-pointer -mfpmath=sse	4.437519 sec	0.00 %
GCC4.5.1	-O3 -march=core2 -fomit-frame-pointer -mfpmath=sse	4.468770 sec	0.70 %
Visual_Studio_9	/GS- /fp:fast /O2	4.737522 sec	6.76 %
GCC4.5.1	-O3 -fomit-frame-pointer -mfpmath=sse	4.815647 sec	8.52 %

Optimizing for the core2 architecture gives a 9% speed increase.
-ffast-math gives a 6% increase.
Profile-guided optimization gives a 4% increase.

AMD Athlon II X4 (635) 2.9 GHz overclocked to 3.3 GHz

Compiler	Compiler options	Average Time	%
GCC4.5.1	-O3 -march=amdfam10 -fomit-frame-pointer -mfpmath=sse -ffast-math -fprofile-use	6.078386 sec	-6.14 %
GCC4.5.1	-O3 -march=amdfam10 -fomit-frame-pointer -mfpmath=sse -ffast-math	6.170114 sec	-4.73 %
GCC4.5.1	-O3 -march=prescott -fomit-frame-pointer -mfpmath=sse -ffast-math -fprofile-use	6.308954 sec	-2.58 %
GCC4.5.1	-O3 -march=amdfam10 -fomit-frame-pointer -mfpmath=sse -fprofile-use	6.388826 sec	-1.35 %
GCC4.5.1	-O3 -march=amdfam10 -fomit-frame-pointer -mfpmath=sse	6.476186 sec	0.00 %
GCC4.5.1	-O3 -march=prescott -fomit-frame-pointer -mfpmath=sse -fprofile-use	6.527979 sec	0.80 %
Visual_Studio_9	/GS- /fp:fast /O2	6.942938 sec	7.21 %
GCC4.5.1	-O3 -march=prescott -fomit-frame-pointer -mfpmath=sse -ffast-math	7.293316 sec	12.62 %
GCC4.5.1	-O3 -march=prescott -fomit-frame-pointer -mfpmath=sse	7.372564 sec	13.84 %
GCC4.5.1	-O3 -fomit-frame-pointer -mfpmath=sse	7.661477 sec	18.30 %

Optimizing for the amdfam10 architecture gives an 18% speed increase.
-ffast-math gives 5%.
Profile-guided optimization gives only 1%.

Total results

Optimizing for the particular architecture together with profile-guided optimization may give up to a 20% speed increase.
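The "up to 20%" figure can be checked by comparing, for each CPU, the generic GCC build (-O3 -fomit-frame-pointer -mfpmath=sse) with the fastest row, which in every table also includes -ffast-math. A small Python computation of the share of run time saved, using the averages from the tables above (`time_reduction` is a hypothetical helper):

```python
# (generic build, fastest build) average times in seconds, taken from the tables:
# generic = -O3 -fomit-frame-pointer -mfpmath=sse,
# fastest = best -march + -ffast-math + -fprofile-use row.
TIMES = {
    "Intel Pentium 4": (14.621770, 13.206153),
    "Intel Core 2 Duo": (9.224519, 7.818235),
    "Intel Core i5 (2500)": (4.815647, 4.059390),
    "AMD Athlon II X4 (635)": (7.661477, 6.078386),
}


def time_reduction(generic_s, tuned_s):
    """Percentage of the generic build's run time saved by the tuned build."""
    return (generic_s - tuned_s) / generic_s * 100.0


for cpu, (generic_s, tuned_s) in TIMES.items():
    print(f"{cpu}: {time_reduction(generic_s, tuned_s):.1f} % less time")
```

The largest saving, about 20.7%, is on the AMD CPU; on the Pentium 4 it stays under 10%.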

As I already said, LAME is a simple example. Let's see how these options affect a real video analytics application.

For the second example I used the performance-critical part of a real video analytics application (myAudience). It uses the Boost, OpenCV and FFmpeg libraries and runs in several threads. Compared with the LAME encoder, measuring performance for this application was not so simple. Moreover, because measurements are less accurate in a dynamic multithreaded environment, the results were less reproducible. So I have prepared just one table, which shows the overall results as I understand them.
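When run-to-run noise is large, as in a multithreaded application, reporting the spread of the measurements alongside the mean helps to decide whether a difference of a few percent is real. A minimal Python sketch, not the procedure used for the table below; the rough Welch-style criterion and the threshold k=2 are assumptions.

```python
from statistics import mean, stdev


def summarize(timings):
    """Return (mean, sample standard deviation, coefficient of variation in %)."""
    m = mean(timings)
    s = stdev(timings)  # needs at least two measurements
    return m, s, s / m * 100.0


def clearly_different(a, b, k=2.0):
    """Rough Welch-style check: do the means differ by more than k standard
    errors of the difference of means?"""
    ma, sa, _ = summarize(a)
    mb, sb, _ = summarize(b)
    std_err = (sa ** 2 / len(a) + sb ** 2 / len(b)) ** 0.5
    return abs(ma - mb) > k * std_err
```

If two builds are not clearly different by such a check, more runs are needed before ranking them.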

Compilation was done with GCC4.5.2 and GCC4.1.2 on CentOS 5.5.

Intel Core i5 (2500) 3.3 GHz

Compiler	Compiler options	Average Time	%
GCC4.5.2	-O3 -march=core2 -fomit-frame-pointer -mfpmath=sse -ffast-math -fprofile-use	19591.16	-4.90 %
GCC4.5.2	-O3 -march=core2 -fomit-frame-pointer -mfpmath=sse -fprofile-use	19873.74	-3.53 %
GCC4.5.2	-O3 -march=core2 -fomit-frame-pointer -mfpmath=sse -ffast-math	20010.55	-2.86 %
GCC4.5.2	-O3 -march=core2 -fomit-frame-pointer -mfpmath=sse	20410.36	-2.09 %
GCC4.5.2	-O3 -march=prescott -fomit-frame-pointer -mfpmath=sse -ffast-math	20410.36	-0.92 %
GCC4.1.2	-O3 -march=core2 -fomit-frame-pointer -mfpmath=sse -ffast-math	20532.91	-0.33 %
GCC4.5.2	-O3 -march=prescott -fomit-frame-pointer -mfpmath=sse	20600.55	0.00 %
GCC4.1.2	-O3 -march=core2 -fomit-frame-pointer -mfpmath=sse	20816.26	1.05 %
GCC4.1.2	-O3 -march=prescott -fomit-frame-pointer -mfpmath=sse -ffast-math	21962.44	6.61 %
GCC4.1.2	-O3 -march=prescott -fomit-frame-pointer -mfpmath=sse	22221.88	7.87 %

What can we conclude from this? A few things:

GCC4.5.2 is a little faster than GCC4.1.2; in addition, it allows the use of profile-guided optimization and of the “amdfam10” and “atom” architecture options.
Profile-guided optimization gives about a 4% speed increase.
-ffast-math gives about a 2% speed increase.

As you can see, tuning compiler options gives a real performance improvement: sometimes not a huge one, but almost free, so it is worth keeping in mind.

Rhonda Software

Comments (2)

tArKi said on 25-03-2011
	I played around with this kind of options a long time ago. Now, thanks to your post, I’m going to give it a new try. Best regards 😉

Shervin Emami said on 08-06-2011
	Nice post, some of those settings made a big difference, and it’s good to see that newer versions of GCC are faster than VS2008 🙂
