More BitCoin questions

8 messages: Satoshi Nakamoto, Mike Hearn. December 27, 2010 to January 10, 2011
Mike Hearn December 27, 2010

Happy Christmas Satoshi, assuming you celebrate it wherever you are in the world :-)

I have been working on a Java implementation of the simplified payment verification, with an eye to building a client that runs on Android phones. So I’ve been thinking a lot about storage requirements and the scalability of BitCoin, which led to some questions that the paper did not answer (maybe there could be a new version of the paper at some point, as I think aspects of it are now out of date).

Specifically, BitCoin has a variety of magic numbers and neither the code nor the paper explain where they came from. For example, the fact that inflation ceases when 21 million coins have been issued. This number must have been arrived at somehow, but I can’t see how.

Another is the 10 minute block target. I understand this was chosen to allow transactions to propagate through the network. However existing large P2P networks like BGP can propagate new data worldwide in <1 minute.

The final number I’m interested in is the 500kb limit on block sizes. According to Wikipedia, Visa alone processed 62 billion transactions in 2009. Dividing through we get an average of 2000 transactions per second, so peak rate is probably around double that at 4000 transactions/sec. With a ten minute block target, at peak a block might need to contain 2.4 million transactions, which just won’t fit into 500kb. Is this 500kb a temporary limitation that will be slowly removed over time from the official client or something more fundamental?
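The arithmetic in this paragraph can be checked with a few lines of Python. This is only an illustrative sketch: the 62 billion figure and the 4000 transactions/sec peak are the numbers quoted above, while reading "500kb" as 500,000 bytes and all variable names are assumptions made for the example.

```python
# Back-of-the-envelope check of the Visa comparison.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
VISA_TX_2009 = 62_000_000_000      # figure quoted in the message
BLOCK_INTERVAL_S = 10 * 60         # ten minute block target
MAX_BLOCK_BYTES = 500_000          # assumption: "500kb" read as 500,000 bytes

average_tps = VISA_TX_2009 / SECONDS_PER_YEAR
peak_tps = 4000                    # the message's rounded estimate (about 2x average)
tx_per_peak_block = peak_tps * BLOCK_INTERVAL_S
bytes_per_tx_budget = MAX_BLOCK_BYTES / tx_per_peak_block

print(f"average rate: {average_tps:.0f} tx/s")
print(f"transactions per peak block: {tx_per_peak_block}")
print(f"byte budget per transaction: {bytes_per_tx_budget:.2f}")
```

The average comes out a little under 2000 transactions per second, and a ten minute block at the assumed peak rate holds 2.4 million transactions, which leaves well under one byte per transaction inside a 500,000-byte block.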

Satoshi Nakamoto December 29, 2010

> I have been working on a Java implementation of the simplified payment verification, with an eye to building a client that runs on Android phones. So I’ve been thinking a lot about storage requirements and the scalability of BitCoin, which led to some questions that the paper did not answer (maybe there could be a new version of the paper at some point, as I think aspects of it are now out of date).

The simplified payment verification in the paper imagined you would receive transactions directly, as with sending to IP address which nobody uses, or a node would index all transactions by public key and you could download them like downloading mail from a mail server.

Instead, I think client-only nodes should receive full blocks so they can scan them for their own transactions. They don’t need to store them or index them. For the initial download, they only need to download headers, since there couldn’t be any payments before the first time the program was run (a header download command was added in 0.3.18). From then on, they download full blocks (but only store the headers).
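A minimal sketch of that flow, in Python for brevity. All class and field names here (`Header`, `Tx`, `Block`, `ClientOnlyNode`) are hypothetical and do not correspond to the Bitcoin wire format or to the fClient patch. The sketch omits proof-of-work and chain validation entirely and only shows what is kept and what is thrown away.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Header:
    block_hash: str
    prev_hash: str


@dataclass(frozen=True)
class Tx:
    txid: str
    outputs: tuple  # tuple of (recipient_key, amount) pairs (hypothetical layout)


@dataclass(frozen=True)
class Block:
    header: Header
    txs: tuple


@dataclass
class ClientOnlyNode:
    my_keys: set
    headers: list = field(default_factory=list)
    received: list = field(default_factory=list)

    def sync_headers(self, headers):
        """Initial download: headers only, since no payment to this
        client can predate the first run of the program."""
        self.headers.extend(headers)

    def on_block(self, block):
        """After the initial sync: scan the full block for our own
        transactions, then keep only the header."""
        for tx in block.txs:
            for key, amount in tx.outputs:
                if key in self.my_keys:
                    self.received.append((tx.txid, key, amount))
        self.headers.append(block.header)  # the block body is discarded
```

For example, a node created with `ClientOnlyNode(my_keys={"alice"})` that is handed a block containing one output to `"alice"` ends up with one entry in `received` and one additional stored header, and nothing else from the block.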

Code for client-only mode is mostly implemented. There’s a feature branch on github with it, also I’m attaching the patch to this message.

Here’s some more about it:

“Here’s my client-mode implementation so far. Client-only mode only records block headers and doesn’t use the tx index. It can’t generate, but it can still send and receive transactions. It’s not fully finished for use by end-users, but it doesn’t matter because it’s a complete no-op if fClient is not enabled. At this point it’s mainly documentation showing the cut-lines for client-only re-implementers.

With fClient=true, I’ve only tested the header-only initial download.

A little background. CBlockIndex contains all the information of the block header, so to operate with headers only, I just maintain the CBlockIndex structure as usual. The nFile/nBlockPos are null, since the full block is not recorded on disk.

The code to gracefully switch between client-mode on/off without deleting blk*.dat in between is not implemented yet. It would mostly be a matter of having non-client LoadBlockIndex ignore block index entries with null block pos. That would make it re-download those as full blocks. Switching back to client-mode is no problem, it doesn’t mind if the full blocks are there.

If the initial block download becomes too long, we’ll want client mode as an option so new users can get running quickly. With graceful switch-off of client mode, they can later turn off client mode and have it download the full blocks if they want to start generating. They should rather just use a getwork miner to join a pool instead.

Client-only re-implementations would not need to implement EvalScript at all, or at most just implement the five ops used by the standard transaction templates.”

> Specifically, BitCoin has a variety of magic numbers and neither the code nor the paper explain where they came from. For example, the fact that inflation ceases when 21 million coins have been issued. This number must have been arrived at somehow, but I can’t see how.

Educated guess, and the maths work out to round numbers. I wanted something that would be not too low if it was very popular and not too high if it wasn’t.

> Another is the 10 minute block target. I understand this was chosen to allow transactions to propagate through the network. However existing large P2P networks like BGP can propagate new data worldwide in <1 minute.

If propagation is 1 minute, then 10 minutes was a good guess. Then nodes are only losing 10% of their work (1 minute/10 minutes). If the CPU time wasted by latency was a more significant share, there may be weaknesses I haven’t thought of. An attacker would not be affected by latency, since he’s chaining his own blocks, so he would have an advantage. The chain would temporarily fork more often due to latency.
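The 10% figure is just the ratio of propagation delay to block interval. A rough sketch of that approximation in Python (the function name is made up for illustration, and the linear model is the simple one used in the message, not a full analysis of forks):

```python
def wasted_work_fraction(propagation_s, block_interval_s):
    """Approximate share of honest work lost to latency: the time spent
    working on a stale tip divided by the average block interval."""
    return propagation_s / block_interval_s


# One minute of propagation against different block targets.
for interval_min in (10, 5, 2):
    share = wasted_work_fraction(60, interval_min * 60)
    print(f"{interval_min:>2} minute blocks: {share:.0%} of work wasted")
```

Under this model, shortening the block target with the same one minute propagation time raises the wasted share proportionally, which is the share an attacker chaining his own blocks would not lose.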

> The final number I’m interested in is the 500kb limit on block sizes. According to Wikipedia, Visa alone processed 62 billion transactions in 2009. Dividing through we get an average of 2000 transactions per second, so peak rate is probably around double that at 4000 transactions/sec. With a ten minute block target, at peak a block might need to contain 2.4 million transactions, which just won’t fit into 500kb. Is this 500kb a temporary limitation that will be slowly removed over time from the official client or something more fundamental?

A higher limit can be phased in once we have actual use closer to the limit and make sure it’s working OK.

Eventually when we have client-only implementations, the block chain size won’t matter much. Until then, while all users still have to download the entire block chain to start, it’s nice if we can keep it down to a reasonable size.

With very high transaction volume, network nodes would consolidate and there would be more pooled mining and GPU farms, and users would run client-only. With dev work on optimising and parallelising, it can keep scaling up.

Whatever the current capacity of the software is, it automatically grows at the rate of Moore’s Law, about 60% per year.
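To get a feel for what 60% per year compounds to, here is a small Python sketch. The 60% rate is the figure given in the message; the function names and the example factors are assumptions chosen only for illustration.

```python
import math


def capacity_after(years, initial=1.0, annual_growth=0.60):
    """Capacity after compounding at the given annual growth rate."""
    return initial * (1 + annual_growth) ** years


def years_to_grow(factor, annual_growth=0.60):
    """Years of compounding needed to multiply capacity by `factor`."""
    return math.log(factor) / math.log(1 + annual_growth)


print(f"after 5 years: {capacity_after(5):.1f}x")
print(f"years to reach 10x: {years_to_grow(10):.1f}")
print(f"years to reach 1000x: {years_to_grow(1000):.1f}")
```

At that rate, capacity grows roughly tenfold every five years.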

Mike Hearn December 30, 2010

Thanks for the info.

I reached the same conclusions about client only nodes and this is what I’ve been implementing. I’m nearly there … I have block chain download, parsing and verification of the blocks/transactions done, with creation of spend transactions almost done.

v1 will basically do as you propose, with the possible optimization of storing only the blocks needed to form the block locator (with the exponential thinning). As Android provides local storage that is private to the app, you don’t need to store the entire block chain to be able to accept new blocks … just enough to ensure you can always stay on the longest chain.
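A sketch of a block locator with exponential thinning, expressed over block heights rather than hashes. The shape (a dense run of recent blocks, then step sizes that double back to the genesis block) follows the idea described above; the function name and the choice of 10 dense entries are assumptions for illustration and may not match either implementation exactly.

```python
def locator_heights(tip_height, dense=10):
    """Heights of the blocks a client would need to keep in order to
    build a block locator: the most recent `dense` blocks, then blocks
    at exponentially growing distances, ending with genesis (height 0)."""
    heights = []
    step = 1
    h = tip_height
    while h > 0:
        heights.append(h)
        if len(heights) >= dense:
            step *= 2  # thin out exponentially once past the dense run
        h -= step
    heights.append(0)  # always include the genesis block
    return heights


print(locator_heights(100))
# [100, 99, 98, 97, 96, 95, 94, 93, 92, 91, 89, 85, 77, 61, 29, 0]
```

The list grows only logarithmically with chain length, so the storage needed to stay on the longest chain remains small even as the chain gets long.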

By the way, your code is easy to read and has been an invaluable reference. So thanks for that.

In v2 I’m thinking of showing transactions before they are integrated into the block chain by running secure/locked down relay nodes that send messages to the phones when a transaction is accepted into the memory pool. Android provides a secure, low power back channel to every phone. Messages are stored server side if the device is offline and apps are automatically started on the phone to handle incoming messages.

So as long as the relay nodes are unhacked, this system should give enough trust that low value transactions can be shown in the UI immediately. It introduces some centralization/single points of failure, but if the relay mechanism dies or is hacked, the damage only lasts for 10 minutes until the new blocks are downloaded.

> Client-only re-implementations would not need to implement EvalScript at all, or at most just implement the five ops used by the standard transaction templates.”

Indeed, there’s no point in client-only implementations implementing EvalScript because they can’t verify transactions aren’t being double spent without storing and indexing the entire block chain. My code parses the scripts and then relies on them having a standard structure, but doesn’t actually run them.

> Educated guess, and the maths work out to round numbers. I wanted something that would be not too low if it was very popular and not too high if it wasn’t.

It’d be interesting to see the working for this. In some sense the number of coins is arbitrary as the nanocoin representation means the issuance is so huge it’s practically infinite.

> A higher limit can be phased in once we have actual use closer to the limit and make sure it’s working OK.

It’d be worth implementing some kind of more robust auto update mechanism, or a schedule for the phase in of this, if only because when people evaluate “is BitCoin worth my time and effort” a solid plan for scaling up is good to have written down.

I’m not worried about the physical capabilities of the hardware, but more protocol ossification as the app is reimplemented and nodes which don’t auto-update themselves increase in number. Client only reimplementations pose no problems of course, but other systems like SMTP have proven impossible to globally upgrade despite having extension mechanisms built in … just too many implementations and too many installations.

Satoshi Nakamoto January 7, 2011

> I reached the same conclusions about client only nodes and this is what I’ve been implementing. I’m nearly there … I have block chain download, parsing and verification of the blocks/transactions done, with creation of spend transactions almost done.

That’s great! The first client-only implementation will really start to move things to the next step. Is it going to be open source, or Google proprietary?

Mike Hearn January 7, 2011

> That’s great! The first client-only implementation will really start to move things to the next step. Is it going to be open source, or Google proprietary?

Open source. It has to be - I am developing it as a personal project in my spare time and Google’s policy is that this is only allowed if you open source the results. But I would have done that anyway.

I managed to spend my first coins on the testnet with my app a few days ago, hopefully will get another chance to make progress this weekend. Probably will have something to show publicly sometime in Feb, touch wood.

Satoshi Nakamoto January 10, 2011

> Open source.

Perfect. Once your code shows how to simplify it down, other authors can follow your lead. Client is a less daunting challenge than full implementation. If it’s within reach of more developers, they’ll come up with more polished UI and other things I didn’t think of. I expect the original software will become the industrial old thing used by GPU farms and pool servers.

BTW, later a good feature for a client version is to keep your private keys encrypted and you give your password each time you send.

> I managed to spend my first coins on the testnet with my app a few days ago, hopefully will get another chance to make progress this weekend. Probably will have something to show publicly sometime in Feb, touch wood.

Great, keep me updated.

>> I wanted something that would be not too low if it was very popular and not too high if it wasn’t.

> It’d be interesting to see the working for this. In some sense the number of coins is arbitrary as the nanocoin representation means the issuance is so huge it’s practically infinite.

It works out to an even 10 minutes per block: 21000000 / (50 BTC * 24hrs * 365days * 4years * 2) = 5.99 blocks/hour

I fudged it to 364.58333 days/year. The halving of 50 BTC to 25 BTC is after 210000 blocks or around 3.9954 years, which is approximate anyway based on the retargeting mechanism’s best effort.
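The numbers above can be reproduced directly. The following Python sketch evaluates the formula from the message for both 365 and 364.58333 days per year and checks the halving period; the function and constant names are made up for the example. The factor of 2 in the formula corresponds to the sum of the halving series 50 + 25 + 12.5 + … per block slot, which converges to 100.

```python
INITIAL_SUBSIDY = 50          # BTC per block before the first halving
BLOCKS_PER_HALVING = 210_000
TOTAL_COINS = 21_000_000


def blocks_per_hour(days_per_year):
    """The formula from the message: total coins divided by
    (subsidy * hours/day * days/year * years per halving * 2)."""
    return TOTAL_COINS / (INITIAL_SUBSIDY * 24 * days_per_year * 4 * 2)


print(f"{blocks_per_hour(365):.2f} blocks/hour with 365 days/year")
print(f"{blocks_per_hour(364.58333):.2f} blocks/hour with 364.58333 days/year")

# Halving period at exactly 6 blocks per hour, in 365-day years.
halving_years = BLOCKS_PER_HALVING / 6 / 24 / 365
print(f"halving after about {halving_years:.4f} years")

# Total issuance: 210000 blocks per period, subsidy halving each period.
print(BLOCKS_PER_HALVING * INITIAL_SUBSIDY * 2)
```

With 365 days per year the result is about 5.99 blocks per hour, and with the fudged 364.58333 it comes out at 6 blocks per hour, i.e. one block every 10 minutes.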

I thought about 100 BTC and 42 million, but 42 million seemed high.

I wanted typical amounts to be in a familiar range. If you’re tossing around 100000 units, it doesn’t feel scarce. The brain is better able to work with numbers from 0.01 to 1000.

If it gets really big, the decimal can move two places and cents become the new coins.

Mike Hearn January 10, 2011

Ah, of course, that makes sense.

By the way, if you didn’t see it already, there’s a discussion on the security of secp256k1 on the forum:

http://www.bitcoin.org/smf/index.php?topic=2699.0

Hal (I presume this is Hal Finney) seems to think the curve is at higher risk of attack than random curves. I guess you chose secp256k1 for the mentioned performance improvement?

Satoshi Nakamoto January 10, 2011

> By the way, if you didn’t see it already, there’s a discussion on the security of secp256k1 on the forum:

> http://www.bitcoin.org/smf/index.php?topic=2699.0

> Hal (I presume this is Hal Finney)

Yes, it’s him. He was supportive on the Cryptography list and ran one of the first nodes.

> seems to think the curve is at higher risk of attack than random curves. I guess you chose secp256k1 for the mentioned performance improvement?

I must admit, this project was 2 years of development before release, and I could only spend so much time on each of the many issues. I found guidance on the recommended size for SHA and RSA, but nothing on ECDSA which was relatively new. I took the recommended key size for RSA and converted to equivalent key size for ECDSA, but then increased it so the whole app could be said to be 256-bit security. I didn’t find anything to recommend a curve type so I just… picked one. Hopefully there is enough key size to make up for any deficiency.

At the time, I was concerned whether the bandwidth and storage sizes would be practical even with ECDSA. RSA’s huge keys were out of the question. Storage and bandwidth seemed tighter back then. I felt the size was either only just becoming practical, or would be soon. When I presented it, I was surprised nobody else was concerned about size, though I was also surprised how many issues they argued, and more surprised that every single one was something I had thought of and solved.

As it turns out, ECDSA verification time may be the greater bottleneck. (In my tests, OpenSSL was taking 3.5ms per ECDSA verify, or about 285 verifies per second) Client versions bypass the problem.
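A quick Python sketch of what 3.5ms per verify implies. The 3.5ms figure is from the message and the 2.4 million transaction block is the peak estimate from earlier in the thread; treating each transaction as exactly one signature verified on a single core is a simplifying assumption, and the names are illustrative.

```python
VERIFY_MS = 3.5  # measured OpenSSL ECDSA verify time quoted in the message

verifies_per_second = 1000 / VERIFY_MS


def seconds_to_verify(n_signatures, verify_ms=VERIFY_MS):
    """Single-core time to verify n signatures, one after another."""
    return n_signatures * verify_ms / 1000


print(f"{verifies_per_second:.1f} verifies per second")

# Assumption: one signature per transaction, 2.4 million transactions.
block_seconds = seconds_to_verify(2_400_000)
print(f"{block_seconds / 60:.0f} minutes to verify a 2.4M-transaction block")
```

Under those assumptions, a single core would need far longer than the 10 minute block interval to verify a block at the Visa-scale peak rate, which is consistent with verification being the greater bottleneck for full nodes.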

As things have evolved, the number of people who need to run full nodes is less than I originally imagined. The network would be fine with a small number of nodes if processing load becomes heavy.