rieMiner - Solo + pooled Riecoin mining

Riecoin mining software & pools
Pttn
Posts: 131
Joined: 24 Aug 2018, 13:37

Re: rieMiner - Solo + pooled Riecoin mining

Post by Pttn » 22 Nov 2018, 10:17

Rockhawk wrote:
21 Nov 2018, 23:57
On the offsets, I realised there was a small optimization if the difference between the offsets was a constant, so I've found 2 groups of 4 offsets with constant difference that work.
Any possible explanation on why/how it would be better to use such offsets?
rieMiner - Riecoin solo + pooled miner
Personal Riecoin page (links, download,...)
freebitco.in - earn up to $200 in BTC each hour!

Rockhawk
Posts: 48
Joined: 29 Oct 2018, 21:12

Re: rieMiner - Solo + pooled Riecoin mining

Post by Rockhawk » 22 Nov 2018, 15:25

Pttn wrote:
22 Nov 2018, 10:17
Any possible explanation on why/how it would be better to use such offsets?
Yes, sorry. The maths behind the first part of the sieving is to solve k.n# + X + {tuple constellation offsets} = 0 mod p for each prime p with n < p < Sieve max, where n# is the primorial and X = (2^265+S).2^(D-265) + n# - ((2^265+S).2^(D-265) % n#) + offset, where S is the hash and D is the difficulty.

X is in that form so that X = offset mod n#, and offset is chosen so that offset + each of the tuple constellation offsets is not divisible by any of the primes in the primorial. Therefore, k.n# + X + {tuple constellation offsets} are guaranteed not to be divisible by any primes in the primorial.
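A minimal Python sketch of this construction, for illustration only. The function names, the toy primorial 13# = 30030, the constellation pattern (0, 4, 6, 10, 12, 16) and the stand-in hash value are assumptions chosen to keep the numbers small; only the formula for X, the difficulty 1359 and the offset 16057 appear in this thread.

```python
from math import gcd

def primorial(n):
    """Return n#, the product of all primes <= n (trial division, toy sizes only)."""
    result = 1
    for c in range(2, n + 1):
        if all(c % d for d in range(2, int(c ** 0.5) + 1)):
            result *= c
    return result

def build_x(s, d, n, offset):
    """X = T + n# - (T % n#) + offset, with T = (2^265 + S) * 2^(D - 265)."""
    t = ((1 << 265) + s) << (d - 265)
    p = primorial(n)
    return t + p - (t % p) + offset

TUPLE_OFFSETS = (0, 4, 6, 10, 12, 16)  # assumed constellation pattern
S = 0x1234567890ABCDEF                 # stand-in for the block hash (assumption)
D = 1359                               # difficulty taken from the log in this thread
N = 13                                 # toy primorial: 13# = 30030
X = build_x(S, D, N, 16057)

# X is congruent to the offset modulo n#, so no tuple member shares a factor with n#.
print(X % primorial(N) == 16057)
print(all(gcd(X + o, primorial(N)) == 1 for o in TUPLE_OFFSETS))
```

Both checks should print True: the first because T + n# - (T % n#) is a multiple of n#, the second because 16057 plus each pattern offset has no prime factor up to 13.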

To solve k.n# + X = 0 mod p, you need to find n#^-1 mod p (this is relatively expensive, but is precomputed in the init method) and X mod p (this is fairly expensive, but is now made a bit cheaper by precomputing 2^64/p so that it can be calculated using only multiplications). Finally, you multiply them to get k_1 = -X.n#^-1 mod p, which gives you the first k for which k.n# + X is divisible by p. (We record this k_1, and then in the sieving step we remove all the candidates with k = k_1 + m.p, for all integers m such that k < sieve max.)
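The steps in this paragraph can be sketched in Python as follows. This is an illustration under stated assumptions, not rieMiner's code: the function names, the 32-bit limb loop, the restriction to p < 2^32 and the toy values are all hypothetical, and Python's big integers stand in for the 64x64-bit multiply a real implementation would use.

```python
def mod_u64(x, p, recip):
    """x mod p for 0 <= x < 2^64, using recip = floor(2^64 / p) instead of a division."""
    q = (x * recip) >> 64        # estimate of x // p, too small by at most 1
    r = x - q * p
    return r - p if r >= p else r

def mod_big(x, p, recip):
    """Reduce a large x modulo p < 2^32 by feeding 32-bit limbs through mod_u64."""
    limbs = []
    while x:
        limbs.append(x & 0xFFFFFFFF)
        x >>= 32
    r = 0
    for limb in reversed(limbs):  # most significant limb first
        r = mod_u64((r << 32) | limb, p, recip)
    return r

def first_k(x, primorial, p):
    """Smallest k >= 0 with k * primorial + x divisible by p (p must not divide primorial)."""
    recip = (1 << 64) // p        # a real miner would precompute this per prime
    inv = pow(primorial, -1, p)   # n#^-1 mod p, likewise precomputed (Python 3.8+)
    return (-mod_big(x, p, recip) * inv) % p

def sieve(x, primorial, primes, size):
    """Mark every k < size for which k * primorial + x has a factor in primes."""
    composite = [False] * size
    for p in primes:
        for k in range(first_k(x, primorial, p), size, p):
            composite[k] = True
    return composite

PRIMORIAL = 30030                 # toy value: 13#
X = (1 << 300) // PRIMORIAL * PRIMORIAL + 16057
flags = sieve(X, PRIMORIAL, [17, 19, 23, 29, 31], 64)
survivors = [k for k, c in enumerate(flags) if not c]
```

The sketch only eliminates candidates for the first element of the tuple; `survivors` holds the k values that none of the listed primes removed.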

The above gives you the k to eliminate for the first element in the tuple. For each subsequent element you need to compute k_n = -(X + tupleOffset_n).n#^-1 mod p, but this is just k_n = k_1 - (tupleOffset_n).n#^-1 mod p, and all the offsets are small, so you can calculate it using shifts and adds of n#^-1 instead of having to do another full multiply mod p.
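As a rough illustration of the shift-and-add idea, assuming the constellation pattern (0, 4, 6, 10, 12, 16), whose gaps are only 4 and 2. The names `tuple_ks` and `TUPLE_STEPS` and the toy numbers are hypothetical; the actual miner may organise this differently.

```python
TUPLE_STEPS = (4, 2, 4, 2, 4)   # gaps of the assumed pattern (0, 4, 6, 10, 12, 16)

def tuple_ks(k1, inv, p):
    """Return the k to eliminate for each tuple element, given k_1 and inv = n#^-1 mod p."""
    inv2 = inv << 1             # 2 * inv mod p: one shift and a conditional subtract
    if inv2 >= p:
        inv2 -= p
    inv4 = inv2 << 1            # 4 * inv mod p, same trick again
    if inv4 >= p:
        inv4 -= p
    ks, k = [k1], k1
    for step in TUPLE_STEPS:
        k -= inv4 if step == 4 else inv2   # k_n = k_(n-1) - step * inv (mod p)
        if k < 0:
            k += p
        ks.append(k)
    return ks

PRIMORIAL = 30030                       # toy value: 13#
X = 123456789 * PRIMORIAL + 16057       # any X congruent to the offset mod n#
P = 1000003                             # a sieving prime
INV = pow(PRIMORIAL, -1, P)
K1 = (-X * INV) % P                     # the one full multiply mod p
KS = tuple_ks(K1, INV, P)
```

Each entry of `KS` satisfies KS[i].n# + X + offset_i = 0 mod p, and no further multiplication mod p was needed after computing `K1`.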

That gives you the computation for the first tuple. Now consider a second tuple, with its k_1 at some offset from the previous tuple's k_1. We need to do a full multiply mod p to find (offset difference).n#^-1 in order to subtract that to get the new k_1.

However, if we're going to consider more than two tuples, and we can find a set of tuples with fixed difference between them, we can store the result of (offset difference).n#^-1 and just subtract it again to move between the subsequent tuples.
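A small sketch of that saving, with hypothetical names and toy values. The difference 2310 used here is made up for the arithmetic and is not one of the offset groups found for rieMiner.

```python
def first_ks_for_tuples(k1, inv, p, difference, count):
    """k_1 for `count` tuples whose base offsets are `difference` apart."""
    step = (difference % p) * inv % p   # the single full multiply mod p, stored for reuse
    ks, k = [k1], k1
    for _ in range(count - 1):
        k -= step                       # moving to the next tuple is one subtraction
        if k < 0:
            k += p
        ks.append(k)
    return ks

PRIMORIAL = 30030                       # toy value: 13#
X = 123456789 * PRIMORIAL + 16057       # X for the first tuple
P = 1000003                             # a sieving prime
DIFFERENCE = 2310                       # hypothetical constant offset difference
INV = pow(PRIMORIAL, -1, P)
K1 = (-X * INV) % P
KS = first_ks_for_tuples(K1, INV, P, DIFFERENCE, 4)
```

With a group of 4 offsets, this does one multiply mod p and three subtractions per prime, where unrelated offsets would need three multiplies.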

Hope that maths makes sense!

Rockhawk
Posts: 48
Joined: 29 Oct 2018, 21:12

Re: rieMiner - Solo + pooled Riecoin mining

Post by Rockhawk » 23 Nov 2018, 12:33

I made some tweaks to ensure that we were taking advantage of the constant difference offsets as much as possible, and submitted the pull request. I'll be away for the next few days so won't be able to get back to you immediately if you have comments/questions.

After this, my next thoughts for improvements are:
- Look at reducing the amount of precomputed data without impacting performance too much. Possibly using AVX/AVX2 instructions will allow any performance loss to be regained.
- Look at options for an AVX2/AVX512 implementation of the primality test - I have an AVX512 Fermat test implementation (not integrated with rieMiner) that is faster than GMP, but obviously most machines don't have AVX512 so I don't want to add this unless I can get a perf improvement with the more generally available AVX2.
- Look at an OpenCL implementation of the primality test for mixed CPU/GPU mining (see the OpenCL branch in my repo for a first go at this, although it needs merging to pick up all the recent changes - the sieve change should help considerably).

Pttn
Posts: 131
Joined: 24 Aug 2018, 13:37

Re: rieMiner - Solo + pooled Riecoin mining

Post by Pttn » 24 Nov 2018, 02:51

Great job as usual, merged, thank you so much! And thank you for the explanation. In practice, there does indeed seem to be a very minor but measurable performance increase when using such offsets instead of the previous {16057, 19417, 43777, 1091257, 1615837, 1954357, 2822707, 2839927}.

As far as I am concerned, you can use AVX as you like; I will still merge such code if there is a performance increase. Anything that works on Sandy Bridge (2011) or more recent is good to merge; it does not make much sense to mine on anything older than that anyway.

Also, I plan to make a separate branch without any assembly optimization, precomputed data, or external code, which will only be updated with every stable release. That way it would even be possible to mine on older computers and different architectures.

Now there will be a freeze period during which I will not add any new features or merge anything. I will only do a full code review, refactoring, and testing until ~December 2, the date on which I will release the first stable version. Note that I might be absent for very long periods from mid-December to the end of January.

Pttn
Posts: 131
Joined: 24 Aug 2018, 13:37

Re: rieMiner - Solo + pooled Riecoin mining

Post by Pttn » 24 Nov 2018, 19:18

While correcting a bug:

Code:

[0000:00:00] Started mining at block 992895, difficulty 1359
Block timing: 32908806, 27754880, 100272744  Tests out: 0, 0
[0000:00:10] 5-tuple found
[0000:00:10] (1-3t/s) = (479.7 19.71 1.160) ; (2-6t) = (119 7 1 1 0) | 4.94 h
[0000:00:20] (1-3t/s) = (411.1 18.07 0.686) ; (2-6t) = (290 11 1 1 0) | 4.11 h
Sieve 1 Time: 22752139
Sieve 0 Time: 24717766
Min work outstanding during sieving: 0
Work target before starting next block now: 1536
Block timing: 36647619, 47469905, 298297726  Tests out: 1087, 0
Sieve 2 Time: 24808618
[0000:00:30] (1-3t/s) = (396.6 16.58 0.614) ; (2-6t) = (432 16 1 1 0) | 5.49 h
[0000:00:35] 4-tuple found
[0000:00:40] (1-3t/s) = (377.6 15.74 0.526) ; (2-6t) = (568 19 2 1 0) | 5.84 h
[0000:00:50] 6-tuple found, this is a block!
Sent: {"method": "submitblock", "params": ["00000020cc859e93e78e17234f8c7ffa51e67da737a26095237bac43f11dd1e3e7ece0888a4880aa5ddf42d9faaa713b7d370ce4c99ca2a894b65842a5c1716de4a51cd5004f0502a376f95b00000000f5aaaf9f81e899f0884f59429b85298daedcbe7204f91dfe7a307e50a4e97f270101000000010000000000000000000000000000000000000000000000000000000000000000ffffffff100380260f7269654d696e65726d01d434ffffffff0100f90295000000001976a914a0524c39408c405357010881226da2c6c467e72488ac00000000"], "id": 0}
Submission accepted :D !
[0000:00:50] Blockheight = 992896, average 45.8 s, difficulty = 1359
Sieve 2 Time: 15959469
Achieved: find a block in less than 1 minute :D ! I still have to get such luck for the Superblock...

At some point, a bug was introduced which made rieMiner continue mining (weirdly, with only half of the threads) if a disconnection occurred. This is now fixed, and a Debug Mode option was added to avoid having to comment/uncomment messages. Do not forget to test this case when introducing threading changes.

Also Rockhawk, do your optimizations need at least 2 threads? I just tested and nothing happens if I use only 1 Thread.
This is OK as we can always use 2 Threads instead, even with a 1 thread CPU, but still.

Maybe I should make a checklist in the contributing section. I will extend testing to ~December 9.

Rockhawk
Posts: 48
Joined: 29 Oct 2018, 21:12

Re: rieMiner - Solo + pooled Riecoin mining

Post by Rockhawk » 26 Nov 2018, 17:24

Pttn wrote:
24 Nov 2018, 19:18

Code:

[0000:00:50] 6-tuple found, this is a block!
Achieved: find a block in less than 1 minute :D ! I still have to get such luck for the Superblock...
Impressive!
Pttn wrote:
24 Nov 2018, 19:18
At some point, a bug was introduced which made rieMiner continue mining (weirdly, with only half of the threads) if a disconnection occurred. This is now fixed, and a Debug Mode option was added to avoid having to comment/uncomment messages. Do not forget to test this case when introducing threading changes.
I'm not quite sure when this was broken. I had tested disconnection a while back but not with the most recent changes - I'll remember to check that in future when making significant changes!
Pttn wrote:
24 Nov 2018, 19:18
Also Rockhawk, do your optimizations need at least 2 threads? I just tested and nothing happens if I use only 1 Thread.
This is OK as we can always use 2 Threads instead, even with a 1 thread CPU, but still.
Ah, yes - now that the sieve thread generates verification tests directly rather than going via the "main" thread, there need to be 2 threads otherwise the work queue can fill up before the sieve job finishes and it will just lock up, as you say. I'll consider whether this could be handled better, but easiest for now will be to make the minimum threads 2 and not allow sieve workers to be set higher than threads minus 1. On the plus side the "main" thread is now almost completely idle - previously it was doing real work a significant fraction of the time, which meant CPU use would be higher than the number of threads you specified if you didn't want to use all the CPU threads you have.
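The "easiest for now" rule described here could look roughly like the sketch below. The function name and its two parameters are hypothetical and do not refer to rieMiner's actual option handling.

```python
def clamp_thread_settings(threads, sieve_workers):
    """Enforce threads >= 2 and 1 <= sieve_workers <= threads - 1."""
    threads = max(2, threads)
    # Keep at least one thread free to drain the verification queue,
    # so a sieve worker can never block forever on a full queue.
    sieve_workers = max(1, min(sieve_workers, threads - 1))
    return threads, sieve_workers

print(clamp_thread_settings(1, 1))   # (2, 1)
print(clamp_thread_settings(8, 8))   # (8, 7)
```

Clamping silently is one possible design; rejecting the configuration with an error message would be an equally reasonable alternative.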

On AVX etc., my idea here would be to make basic AVX a minimum, but anything introduced after Sandy Bridge would first be checked for via CPUID flags, with fallback implementations in place. I also like the idea of having a pure C++ branch that would work on any architecture.

Finally, thank you for the generous, mathematical constant related donation!

Rockhawk
Posts: 48
Joined: 29 Oct 2018, 21:12

Re: rieMiner - Solo + pooled Riecoin mining

Post by Rockhawk » 01 Dec 2018, 20:53

I have completed a new AVX implementation of the remainder calculation.

Sieving to 2^32, I see a ~5% performance improvement in the remainder calculation, giving a barely noticeable improvement to overall performance, but this uses ~4.5GB less RAM. Sieving beyond 2^32 is slightly slower than before but the memory saving makes it possible to go to 2^33 with 16GB RAM.

The code autodetects whether you have AVX and falls back to the non-AVX implementation if the CPU doesn't support it. This is mostly to prove that the system works, as I'm considering AVX2-specific changes in future.

It would be possible to make the sieving slightly faster by removing the memory saving, but I believe you're generally better off using this reduced memory usage version and sieving further to get a better ratio.

Pttn, the feature-reduce-memuse branch has these changes in and is ready to merge from my point of view, but I haven't submitted a pull request as I know you're stabilizing master ready for a first release, and this is quite a big change. Let me know what you want me to do - maybe it's time to branch the stable version off and then this can be merged to master? Or you could create a dev branch for new work.

Pttn
Posts: 131
Joined: 24 Aug 2018, 13:37

Re: rieMiner - Solo + pooled Riecoin mining

Post by Pttn » 02 Dec 2018, 14:47

Thank you a lot!

It is a shame that this memory saving comes with a significant loss of performance, but you are right, we can set Sieve much higher to compensate. For Difficulty 1600, I got better results with Sieve 2^33 over Sieve 2^32, which was itself significantly better than Sieve 2^31 (I could barely run more than 2^32 before). For current Difficulties, such a high Sieve will only give very minor improvements, or even decrease the performance (probably because there are a lot of CPU usage blips, each of which is huge).

With Sieve 2^34, these blips are just catastrophic for Difficulty 1400, but there might still be a slight performance increase for 1600 (not tested enough). On the other hand, this setting seems good for Difficulty 1800, though more testing is also needed. As I aim for the long term, focusing on Difficulties 1600 and higher in anticipation of Riecoin getting decent mining power again, this is fine. 32 GiB of RAM is not enough for 2^35.

For equal Sieves, there seems to be a negligible improvement thanks to AVX, but I can now set Sieve to 2^31 as the default again and run this on 8 GiB computers. This reason alone is enough for me to include your code in the Stable Release, so I merged it. I will again extend the testing period to December 16, but now it is final: rieMiner 0.9 will be released on this date. One minor thing though: what is the purpose of -L/usr/local/lib in the Makefile?

You and everyone else are welcome to test rieMiner in various situations and report/fix bugs. Once, I got a Segmentation Fault some time after the miner restarted itself because I closed the wallet, but I do not know if this is reproducible or really related to the restart... These bugs are annoying to debug! This probably comes from my fix for the activity while disconnected; I will try mining with GDB and hope to catch this bug... Otherwise, it will haunt me...

I prefer to keep things simple and not create a Develop branch. It is enough to test that a new feature is good and does not introduce bugs, and then add it directly.

I wonder if rieMiner is as efficient as the software used to beat world records. With my 2700X, I would need a few years to beat the current record (Difficulty ~3450), but at least not centuries. Any idea what the record holders are using, and whether their software performs much better than rieMiner? Did you even develop some of it, as you seem to have so much knowledge in this field? If these people could use rieMiner's Benchmark Mode for new records, we could gain some exposure!

What will your ISPC branch provide? If it completes the AVX/AVX2/AVX512 implementation, I can gladly wait up to Wednesday for a final merge. Unfortunately, I do not have any way to test AVX512, so I will assume that you did enough testing.

Rockhawk
Posts: 48
Joined: 29 Oct 2018, 21:12

Re: rieMiner - Solo + pooled Riecoin mining

Post by Rockhawk » 03 Dec 2018, 00:55

Pttn wrote:
02 Dec 2018, 14:47
It is a shame that this memory saving comes with a significant loss of performance, but you are right, we can set Sieve much higher to compensate. For Difficulty 1600, I got better results with Sieve 2^33 over Sieve 2^32, which was itself significantly better than Sieve 2^31 (I could barely run more than 2^32 before). For current Difficulties, such a high Sieve will only give very minor improvements, or even decrease the performance (probably because there are a lot of CPU usage blips, each of which is huge).

With Sieve 2^34, these blips are just catastrophic for Difficulty 1400, but there might still be a slight performance increase for 1600 (not tested enough). On the other hand, this setting seems good for Difficulty 1800, though more testing is also needed. As I aim for the long term, focusing on Difficulties 1600 and higher in anticipation of Riecoin getting decent mining power again, this is fine. 32 GiB of RAM is not enough for 2^35.
I've only tested to 2^33 and didn't notice CPU blips, although I might have missed them. Do you have enough sieve workers to keep the queue full? Or do you just mean when the block switches when doing actual mining?
Pttn wrote:
02 Dec 2018, 14:47
For equal Sieves, there seems to be a negligible improvement thanks to AVX, but I can now set Sieve to 2^31 as the default again and run this on 8 GiB computers. This reason alone is enough for me to include your code in the Stable Release, so I merged it. I will again extend the testing period to December 16, but now it is final: rieMiner 0.9 will be released on this date. One minor thing though: what is the purpose of -L/usr/local/lib in the Makefile?
Great! Yes the main benefit is reducing memory usage with no perf loss, up to 2^32 at least.

The -L/usr/local/lib was committed by mistake, but it might actually help other people too. I have a natively compiled version of GMP in /usr/local/lib, which speeds things up a bit compared to the default Ubuntu one.
Pttn wrote:
02 Dec 2018, 14:47
I wonder if rieMiner is as efficient as the software used to beat world records. With my 2700X, I would need a few years to beat the current record (Difficulty ~3450), but at least not centuries. Any idea what the record holders are using, and whether their software performs much better than rieMiner? Did you even develop some of it, as you seem to have so much knowledge in this field? If these people could use rieMiner's Benchmark Mode for new records, we could gain some exposure!
Nearly 20 years ago I wrote "APSieve" which is basically a generalised version of the sieving part of the riecoin miner. I was surprised to see that some relatively recent k-tuplet records still credit it, including the top 6-tuplet! I don't think I even have the source for that any more, and I'm certain that the code in rieMiner is faster. That said, being able to sieve more deeply before testing would be necessary to be competitive for a record. This can be done with less memory when you only have one "block" - rieMiner is structured a bit differently because it needs to start testing primes within seconds not days or weeks!

I actually found the riecoin project from the k-tuplet page as I was thinking of developing a new sieve program!
Pttn wrote:
02 Dec 2018, 14:47
What will your ISPC branch provide? If it completes the AVX/AVX2/AVX512 implementation, I can gladly wait up to Wednesday for a final merge. Unfortunately, I do not have any way to test AVX512, so I will assume that you did enough testing.
The ISPC branch gives AVX2 and AVX512 versions of the primality test; these do 16 tests at a time using an SPMD (single program, multiple data) approach. The test is written in ISPC, which is a C-like language for creating SPMD programs for x64, much like OpenCL or CUDA is for the GPU. Unfortunately you'll need the trunk version of ISPC and LLVM to compile the ISPC code, as there was a bug that significantly affected performance that was fixed after I reported it a couple of weeks ago. I'm committing the generated assembler so that you don't have to install ISPC to build the miner, and I've also made a couple of hand edits to the output to improve the performance of the innermost loop.

The AVX2 version is a little faster than the standard GMP implementation, and the AVX512 one is significantly faster - although obviously AVX512 isn't available on most CPUs yet. I've been testing on an EC2 instance which only costs a few cents an hour which is fine for development. I haven't tested AVX512 on Windows, but it should work(!)
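For readers unfamiliar with the test being vectorised, here is a plain Python sketch of a batched Fermat test. It only shows the shape of the computation: the base 2, the name `fermat_batch` and the list-based "lanes" are assumptions, and a real SPMD version would run the 16 modular exponentiations in lockstep on multi-word integers rather than one after another.

```python
LANES = 16   # tests done together, as described for the ISPC branch

def fermat_batch(candidates):
    """Base-2 Fermat test on a batch: n passes when 2^(n-1) = 1 mod n."""
    assert len(candidates) <= LANES
    return [n > 2 and pow(2, n - 1, n) == 1 for n in candidates]

results = fermat_batch([97, 91, 16057, 341])
```

A pass only means "probably prime": 341 = 11 x 31 is composite yet passes the base-2 test, while 91 = 7 x 13 is correctly rejected.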

I also added an AVX2 version of the remainder calculation to that branch. The branch is close to ready but I'd prefer to do some more testing before merge. I think it could be good to go by Wednesday depending how busy I am this week.

tgspring
Posts: 7
Joined: 17 Sep 2018, 01:15

Re: rieMiner - Solo + pooled Riecoin mining

Post by tgspring » 03 Dec 2018, 02:43

@Pttn
Is there anything wrong with your 0.9RC3 when running make under MSYS2?
