tcatm's 4-way SSE2 for Linux 32/64-bit is in 0.3.10

7 messages Satoshi Nakamoto August 15, 2010 — August 28, 2010
Satoshi Nakamoto August 15, 2010 Source · Permalink

0.3.10 has tcatm’s 4-way SSE2 as an option switch.

Use the switch “-4way” to turn it on.  Without the switch you get Crypto++ ASM SHA-256.

I could only get this working with Linux.

Download: Get 0.3.10 from http://bitcointalk.org/index.php?topic=827.0

Please report back your CPU and results!  I think it’s pretty clear that Core 2 and lower are slower, i5 faster.  I don’t think we’ve heard any i7 results yet.  We need to know about the different models of AMD or other less common CPUs.

Satoshi Nakamoto August 16, 2010 Source · Permalink

Quote from: jgarzik on August 16, 2010, 03:35:28 AM
Code:
cpu family : 6
model : 26
model name : Genuine Intel(R) CPU             000  @ 3.20GHz
stepping : 4

cpu family 6 model 26 stepping 4 is an Intel Core i7. That’s a 23% speedup with -4way, 63% total speedup with -4way + hyperthreading. 33% faster with hyperthreading than without it.
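
The three percentages fit together if the two speedups are treated as multiplying ratios rather than added percentages. A minimal sketch of that arithmetic follows; the function name is hypothetical and the only inputs are the 23% and 33% figures quoted above.

```cpp
// Combine two speedups given as percentages.
// Speedups multiply as ratios: +23% is a factor of 1.23, +33% is 1.33.
// Hypothetical helper for illustration only.
constexpr double CombinedSpeedupPercent(double first_percent, double second_percent) {
    return ((1.0 + first_percent / 100.0) * (1.0 + second_percent / 100.0) - 1.0) * 100.0;
}

// 1.23 * 1.33 = 1.6359, i.e. roughly the 63% total quoted above
// (the source percentages are themselves rounded).
constexpr double kTotalPercent = CombinedSpeedupPercent(23.0, 33.0);
```

The product comes out slightly above 63% because the inputs were already rounded to whole percentages.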

Satoshi Nakamoto August 16, 2010 Source · Permalink

I wrapped sha256.cpp in
#ifdef FOURWAYSSE2
#endif // FOURWAYSSE2

try it now.
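
For readers unfamiliar with this kind of guard, here is a minimal sketch of how wrapping a file in #ifdef FOURWAYSSE2 behaves. The two function names are hypothetical stand-ins, not the actual contents of sha256.cpp; only the macro name comes from the post.

```cpp
// Hypothetical stand-in for a guarded source file. Everything between the
// guards is compiled only when the build defines FOURWAYSSE2, for example
// by passing -DFOURWAYSSE2 to the compiler.
#ifdef FOURWAYSSE2
constexpr bool FourWayCompiledIn() { return true; }
#endif // FOURWAYSSE2

// Code outside the guard must not assume the guarded symbols exist, so it
// checks the same macro and falls back when they are absent.
constexpr bool FourWayAvailable() {
#ifdef FOURWAYSSE2
    return FourWayCompiledIn();
#else
    return false;
#endif
}
```

Without the define, the guarded file contributes nothing to the build, which is presumably why this helps on platforms where the SSE2 code does not compile.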

Satoshi Nakamoto August 19, 2010 Source · Permalink

Quote from: Ground Loop on August 18, 2010, 11:14:26 PM
Any non-Mac i5 love? Windows i5 64-bit got slower here.

That’s the first I’ve heard anyone say i5 was slower.  Everyone else has said 4way was faster on i5.  More so with hyperthreading enabled.

Quote from: nelisky on August 18, 2010, 11:02:25 PM
And i5, at least on my macbookpro

Good, so I take it that’s a confirmation that it’s working on Mac as well?

Laszlo told me he did compile in the -4way stuff on Mac, so the -4way switch is also available to try on Mac.  I don’t think makefile.osx on SVN has it yet, just the built version.

Satoshi Nakamoto August 22, 2010 Source · Permalink

Thanks for clearing that up.  I read the link someone posted about AMD making that change around 2007, but I didn’t know what the story was for Intel.

There’s no hope for Core/Core2 then.  They only have half the SSE2 hardware.

Strange that Intel has 3 128bit units, but AMD with 2 128bit units is the faster one.

Satoshi Nakamoto August 24, 2010 Source · Permalink

Quote from: ArtForz on August 21, 2010, 04:56:31 PM
  • AMD K10: 2 128bit units
  • intel nehalem: 3 128bit units

This probably explains why hyperthreading increases performance with -4way.  If three SSE2 units is excessive, then hyperthreading would help keep them all busy.

Satoshi Nakamoto August 28, 2010 Source · Permalink

The simplification is intentional.  There will only be more than one thash[7]=0 in one out of 134,217,728 cases.  It only makes it 0.0000007% slower.
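
The two figures can be cross-checked with a little arithmetic. The sketch below rests on two assumptions that the post does not state: that thash[7] is a 32-bit word, and that hashes are scanned in batches of 32. Those values are chosen only because they reproduce the quoted 134,217,728 (which is 2^27, or 2^32 / 32). All names are hypothetical.

```cpp
#include <cstdint>

// Assumption: hashes are scanned in batches of this size. The post does not
// give the batch size; 32 is used because it reproduces the quoted figure.
constexpr std::uint64_t kAssumedBatchSize = 32;

// Assumption: thash[7] is a 32-bit word, so any one hash has it equal to
// zero with probability 1 / 2^32.
constexpr std::uint64_t kWordValues = std::uint64_t{1} << 32;

// Approximate "one out of N" odds that a batch holds a second zero word.
constexpr std::uint64_t OneInN() { return kWordValues / kAssumedBatchSize; }

// The same odds expressed as a percentage of cases.
constexpr double AffectedPercent() { return 100.0 / static_cast<double>(OneInN()); }
```

Under these assumptions OneInN() is 134,217,728 and AffectedPercent() is about 0.00000075, in line with the 0.0000007% quoted above.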