tcatm's 4-way SSE2 for Linux 32/64-bit 0.3.9 rc2
I hope someone can test it on an i5 or AMD CPU to check that I built it right. I don’t have either to test with.
I’m also curious whether it performs much worse on 32-bit Linux than on 64-bit.
I just uploaded a quick build so testers can check whether I built it right (I don’t have an i5 or AMD). If it checks out, I’ll put together the full package and do all the release stuff.
Quote from: tcatm on August 16, 2010, 12:43:39 AM
"I propose to compile sha256.cpp with -O3 -march=amdfamk10 (will work on 32bit and 64bit) as only CPUs supporting this instruction set (AMD Phenom, Intel i5 and newer) benefit from -4way and it’ll improve performance by ~9%."
GCC 4.3.3 doesn’t support -march=amdfamk10. I get:
sha256.cpp:1: error: bad value (amdfamk10) for -march= switch
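For readers wondering what "4-way" means here, a minimal sketch follows. It is not tcatm's actual code. An SSE2 register is 128 bits wide, so it holds four 32-bit words. A 4-way SHA-256 can therefore run four independent hashes in lockstep, one per lane. The C below mimics that layout with plain uint32_t[4] rows instead of SSE2 intrinsics, so it runs on any compiler.

Some parts are hypothetical:
- The names sha256_4way_block and sha256_4way_short are made up for illustration.
- It only handles messages shorter than 56 bytes, which fit in a single padded block.
- Whether GCC turns the inner lane loops into SSE2 instructions at -O3 depends on the compiler version and flags, so treat that as an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration: each uint32_t[LANES] row stands in for one
   128-bit SSE2 register holding four 32-bit words, one per message. */
#define LANES 4

static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

static uint32_t rotr(uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }

/* Compress one 64-byte block per lane, running the four lanes in lockstep:
   every step applies the same operation to all four lanes. */
static void sha256_4way_block(uint32_t h[8][LANES], uint8_t blocks[LANES][64]) {
    uint32_t w[64][LANES], v[8][LANES];
    for (int t = 0; t < 16; t++)          /* big-endian load of the message words */
        for (int l = 0; l < LANES; l++)
            w[t][l] = (uint32_t)blocks[l][4 * t] << 24 | (uint32_t)blocks[l][4 * t + 1] << 16 |
                      (uint32_t)blocks[l][4 * t + 2] << 8 | blocks[l][4 * t + 3];
    for (int t = 16; t < 64; t++)         /* message schedule expansion */
        for (int l = 0; l < LANES; l++) {
            uint32_t x = w[t - 15][l], y = w[t - 2][l];
            uint32_t s0 = rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3);
            uint32_t s1 = rotr(y, 17) ^ rotr(y, 19) ^ (y >> 10);
            w[t][l] = w[t - 16][l] + s0 + w[t - 7][l] + s1;
        }
    memcpy(v, h, sizeof v);
    for (int t = 0; t < 64; t++)          /* 64 rounds, four lanes per round */
        for (int l = 0; l < LANES; l++) {
            uint32_t a = v[0][l], b = v[1][l], c = v[2][l], d = v[3][l];
            uint32_t e = v[4][l], f = v[5][l], g = v[6][l], hh = v[7][l];
            uint32_t t1 = hh + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) + ((e & f) ^ (~e & g)) + K[t] + w[t][l];
            uint32_t t2 = (rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) + ((a & b) ^ (a & c) ^ (b & c));
            v[7][l] = g; v[6][l] = f; v[5][l] = e; v[4][l] = d + t1;
            v[3][l] = c; v[2][l] = b; v[1][l] = a; v[0][l] = t1 + t2;
        }
    for (int i = 0; i < 8; i++)           /* add the compressed block back into the state */
        for (int l = 0; l < LANES; l++)
            h[i][l] += v[i][l];
}

/* Hash four short messages at once; hex_out[l] gets the lowercase hex digest of msgs[l]. */
void sha256_4way_short(const char *msgs[LANES], char hex_out[LANES][65]) {
    static const uint32_t H0[8] = {0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                                   0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};
    static const char hexdigits[] = "0123456789abcdef";
    uint8_t blocks[LANES][64];
    uint32_t h[8][LANES];
    for (int l = 0; l < LANES; l++) {     /* standard padding: 0x80, zeros, 64-bit bit length */
        size_t len = strlen(msgs[l]);
        assert(len < 56);                 /* longer messages would need a second block */
        uint64_t bits = (uint64_t)len * 8;
        memset(blocks[l], 0, 64);
        memcpy(blocks[l], msgs[l], len);
        blocks[l][len] = 0x80;
        for (int i = 0; i < 8; i++)
            blocks[l][63 - i] = (uint8_t)(bits >> (8 * i));
    }
    for (int i = 0; i < 8; i++)           /* every lane starts from the same initial state */
        for (int l = 0; l < LANES; l++)
            h[i][l] = H0[i];
    sha256_4way_block(h, blocks);
    for (int l = 0; l < LANES; l++) {
        for (int i = 0; i < 8; i++)
            for (int n = 0; n < 8; n++)
                hex_out[l][8 * i + n] = hexdigits[(h[i][l] >> (28 - 4 * n)) & 0xf];
        hex_out[l][64] = '\0';
    }
}
```

For example, suppose you pass the messages {"abc", "", "hello world", "The quick brown fox jumps over the lazy dog"}. Then hex_out[0] holds ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad, which is the standard SHA-256 test vector for "abc". The other three lanes are hashed by the same instructions at the same time.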
Quote from: NewLibertyStandard on August 16, 2010, 01:49:01 AM
"With 4way, I get significantly better performance when I have all my virtual cores enabled. I think I get about the same amount of hashes when hyper threading is turned off with or without 4way."
Hey, you may be onto something!
Hyperthreading didn’t help before because all the work was in the arithmetic and logic units, which the two hyperthreads on a core share.
tcatm’s SSE2 code must be a mix of normal x86 instructions and SSE2 instructions, so while one hyperthread is executing x86 code, the other can be executing SSE2 code.
How much of an improvement do you get with hyperthreading?
Some numbers? What CPU is that?
Quote from: Vasiliev on August 16, 2010, 03:17:07 AM
"try -march=amdfam10"
That works.
That’s strange… are we sure that’s the same thing? tcatm, try amdfam10 and make sure you get the same speed measurement.