2011-03-31, 12:21  #1
May 2009
Russia, Moscow
2×3^{2}×149 Posts 
GMP 5.0.1 vs GMP 4.1.4 benchmarking
I compared two GMP-ECM 6.3 builds under Linux, one compiled with GMP 5.0.1 and the other with GMP 4.1.4.
I got several strange results. Overall, GMP 5.0.1 is better by 5-15%, but with B1=11e6 in some ranges (tested 100-300 digits) 4.1.4 was better. Some examples follow. Code:
1. C121 from near-repdigits

GMP-ECM 6.3 [configured with GMP 4.1.4 and --enable-asm-redc] [ECM]
Input number is 1800485013924273616277080302416213714297702488568072032612888194660755338496630976045963259724803581322873645120627538429 (121 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=334640802
Step 1 took 36869ms
Step 2 took 19737ms

GMP-ECM 6.3 [configured with GMP 5.0.1 and --enable-asm-redc] [ECM]
Input number is 1800485013924273616277080302416213714297702488568072032612888194660755338496630976045963259724803581322873645120627538429 (121 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=2340904304
Step 1 took 35097ms
Step 2 took 33626ms

GMP 5.0.1 is significantly slower again on step 2.

2. C156 from aliquot seq 283752:i7004

GMP-ECM 6.3 [configured with GMP 4.1.4 and --enable-asm-redc] [ECM]
Input number is 150334450606011724019777200211010468220565590046299234402254345532711750018652367487259651931850319063498312781804011647293058067263942651704486104870980321 (156 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=4153245810
Step 1 took 55526ms
Step 2 took 26975ms

GMP-ECM 6.3 [configured with GMP 5.0.1 and --enable-asm-redc] [ECM]
Input number is 150334450606011724019777200211010468220565590046299234402254345532711750018652367487259651931850319063498312781804011647293058067263942651704486104870980321 (156 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=2955949299
Step 1 took 57614ms
Step 2 took 39257ms

Again step 2 with GMP 5.0.1 is much slower.

3. C209 from near-repdigits

GMP-ECM 6.3 [configured with GMP 4.1.4 and --enable-asm-redc] [ECM]
Input number is 99999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999899999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 (209 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=2560444052
Step 1 took 75055ms
Step 2 took 36402ms

GMP-ECM 6.3 [configured with GMP 5.0.1 and --enable-asm-redc] [ECM]
Input number is 99999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999899999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 (209 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=3908589128
Step 1 took 76103ms
Step 2 took 46634ms

Step 2 with GMP 5.0.1 is slower by 10 sec.

With B1=3e6 all is OK: 5.0.1 is slightly better than 4.1.4

1. C121
Step 1 took 9562ms
Step 2 took 4803ms
vs.
Step 1 took 10009ms
Step 2 took 6219ms

2. C156
Step 1 took 15440ms
Step 2 took 6315ms
vs.
Step 1 took 15102ms
Step 2 took 8532ms

3. C209
Step 1 took 20846ms
Step 2 took 8188ms
vs.
Step 1 took 20306ms
Step 2 took 11598ms

Compile options: --enable-openmp --with-gmp=/usr/local/ --enable-shellcmd --enable-sse2 --enable-asm-redc
Test system: Xeon E5620 2.40GHz, CentOS 5.5 x86_64 on 2.6.18 kernel
Last fiddled with by unconnected on 2011-03-31 at 12:23
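To put the B1=11e6 figures above on a common scale, here is a rough sketch that turns the reported timings into percentage changes between the two builds. The timings are copied from the runs in this post. The helper names (`parse_steps`, `relative_change`, `summarize`) are made up for illustration and are not part of GMP-ECM. Note that each run used a different sigma, so this is only an indicative comparison.

```python
import re

# Timings copied from the B1=11e6 runs quoted in the post (milliseconds).
# Key: composite label; value: {gmp_version: (step1_ms, step2_ms)}.
TIMINGS_MS = {
    "C121": {"4.1.4": (36869, 19737), "5.0.1": (35097, 33626)},
    "C156": {"4.1.4": (55526, 26975), "5.0.1": (57614, 39257)},
    "C209": {"4.1.4": (75055, 36402), "5.0.1": (76103, 46634)},
}

# Matches lines such as "Step 1 took 36869ms" in GMP-ECM output.
STEP_RE = re.compile(r"Step (\d) took (\d+)ms")


def parse_steps(log_text):
    """Return {step_number: milliseconds} found in GMP-ECM style output."""
    return {int(step): int(ms) for step, ms in STEP_RE.findall(log_text)}


def relative_change(old_ms, new_ms):
    """Percentage change from old to new; positive means new is slower."""
    return 100.0 * (new_ms - old_ms) / old_ms


def summarize(timings):
    """Return (label, step1_change_pct, step2_change_pct) per composite."""
    rows = []
    for label, by_version in timings.items():
        old1, old2 = by_version["4.1.4"]
        new1, new2 = by_version["5.0.1"]
        rows.append((label,
                     relative_change(old1, new1),
                     relative_change(old2, new2)))
    return rows


if __name__ == "__main__":
    for label, d1, d2 in summarize(TIMINGS_MS):
        print(f"{label}: step 1 {d1:+.1f}%, step 2 {d2:+.1f}% (5.0.1 vs 4.1.4)")
```

`parse_steps` can be fed the raw text of a saved run instead of typing the numbers in by hand.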
2011-04-01, 02:04  #2
Sep 2008
Krefeld, Germany
E6_{16} Posts 
That's exactly what I figured out some time ago. Especially on step 2, GMP 4.x is a lot faster, and I have no idea why.
The fastest combination for my Phenom 2 1090T is GMP 4.3.2 combined with GMP-ECM 6.3, all compiled with -march=barcelona and, of course, linked statically. For large numbers (more than about 400 digits), linking against gwnum gave a huge speedup. Table attached: All times in ms, measured on Phenom 2, 3.6 GHz, Linux kernel 2.6.35, 64 bit
Last fiddled with by Syd on 2011-04-01 at 02:10
2011-04-01, 12:40  #3
May 2009
Russia, Moscow
2×3^{2}×149 Posts 
Quote:
I decided to recompile the binaries from scratch, and I have some more questions. Why is ecm-params.h.athlon64 used instead of ecm-params.h.core2? Why were SSE2 instructions not used in the NTT code? Code:
config.status: linking ecm-params.h.athlon64 to ecm-params.h
config.status: linking mul_fft-params.h.athlon64 to mul_fft-params.h
config.status: executing depfiles commands
config.status: executing libtool commands
configure: Configuration:
configure: Build for host type x86_64-unknown-linux-gnu
configure: CC=gcc -std=gnu99, CFLAGS=-W -Wall -Wundef -O2 -pedantic -m64 -mtune=core2 -march=core2
configure: Linking GMP with /usr/local//lib/libgmp.a
configure: Using asm redc code from directory x86_64
configure: Not using SSE2 instructions in NTT code
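When comparing builds, it can help to pull the relevant choices out of the configure output automatically. The following is a small sketch, assuming the log has the same line shapes as the output quoted above. The function name `summarize_configure` is hypothetical, and the positive message "Using SSE2 instructions in NTT code" is an assumption about what configure prints when SSE2 is enabled.

```python
import re

# Sample lines modeled on the configure output quoted in the post.
SAMPLE_LOG = """\
config.status: linking ecm-params.h.athlon64 to ecm-params.h
config.status: linking mul_fft-params.h.athlon64 to mul_fft-params.h
configure: Linking GMP with /usr/local//lib/libgmp.a
configure: Using asm redc code from directory x86_64
configure: Not using SSE2 instructions in NTT code
"""


def summarize_configure(log_text):
    """Extract the tuning file suffix, GMP library path and SSE2/NTT status."""
    summary = {"params": None, "gmp_lib": None, "sse2_ntt": None}
    for line in log_text.splitlines():
        # Which ecm-params.h.<cpu> file was selected.
        m = re.search(r"linking ecm-params\.h\.(\S+) to ecm-params\.h", line)
        if m:
            summary["params"] = m.group(1)
        # Which libgmp the build links against.
        m = re.search(r"Linking GMP with (\S+)", line)
        if m:
            summary["gmp_lib"] = m.group(1)
        # SSE2 status for the NTT code (positive wording is an assumption).
        if "Not using SSE2 instructions in NTT code" in line:
            summary["sse2_ntt"] = False
        elif "Using SSE2 instructions in NTT code" in line:
            summary["sse2_ntt"] = True
    return summary


if __name__ == "__main__":
    print(summarize_configure(SAMPLE_LOG))
```

Running this on the logs of two builds and comparing the resulting dictionaries makes it easy to see whether they differ in anything besides the GMP version.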

2011-04-01, 16:12  #4
Tribal Bullet
Oct 2004
3×1,181 Posts 

2011-04-02, 16:48  #5
Einyen
Dec 2003
Denmark
2·3·13·41 Posts 
How are you compiling GMP 4.3.2 for 64-bit?
I get this error: Quote:
./configure CC=gcc CFLAGS="-O2 -pedantic -m64 -std=gnu99 -mtune=core2 -march=core2" ABI=64 --build=x86_64-w64-mingw32
I also tried just:
./configure ABI=64
and variations. I read on the GMP website: "Gcc 4.3.2 miscompiles GMP on 64-bit machines", but I'm using gcc 4.6.0.
Last fiddled with by ATH on 2011-04-02 at 16:49

2011-04-03, 16:16  #6
Einyen
Dec 2003
Denmark
2·3·13·41 Posts 
Here is my 32-bit test of GMP 4.3.2 vs 5.0.1 and MPIR: gmp4test.html
I can't see the effect you describe. On a Core 2 the GMP 4.3.2 binary is a lot slower than both GMP 5.0.1 and MPIR 2.3.0/2.2.1. On a Pentium 4 it's only slightly slower than GMP 5.0.1 and faster than MPIR. If you have a link to GMP 4.1.4, I'm willing to test it.