2018-04-02

Understanding Bitcoin Transactions

transactions? ...all I wanted to do was send some bitcoin

How hard could it be to send some bitcoin from a JavaScript library? I decided to try on the "testnet". I downloaded Bitcoin Core, configured it for the testnet, waited a few hours for it to sync the testnet blockchain, created an "address", and found a "testnet faucet" to send my new address some testnet bitcoin.

Once the Bitcoin Core GUI client (Bitcoin QT) was showing a balance in my "account", I started looking through the bitcoinjs-lib JavaScript library. I hit CMD+F for "send" but couldn't find anything, so I scrolled through the examples and saw a bunch of things I'd never heard of and a builder design pattern for something called a "transaction". The builder pattern made me nervous because now it felt like I was crafting something complicated. Instead I told myself: "it's probably like a database transaction for a SQL insert" (which is partially true since mining "commits" the transaction). But after a few hours of failure, error messages that didn't make sense, and no answers from web searches, I gave up and finally acknowledged I couldn't make sense of it. Whatever mental model I had for how Bitcoin works was woefully inadequate or probably just wrong.

I started slowly reading through the documentation and Andreas Antonopoulos' Mastering Bitcoin (and lots of YouTube).

I quickly realized that there are different kinds of addresses and my wallet had created a SegWit address which required special treatment from the JavaScript library - hence the builder pattern. But that was just the beginning of my misconceptions.

The protocol has spent 9 years evolving in response to real-world demands (transactions per second, ease of use), anonymity concerns, centralization fears, and security requirements, so what you see in the wild today is slightly different from, but backwards compatible with (which is incredible), the original v0.1 2009 release. First I was struggling with the impedance mismatch between my intuition and reality. Then, after reading a random website, Reddit comment or Twitter thread explaining a concept, I'd find newer documentation or articles explaining why things no longer work that way.

studying bitcoin:

0. have mental model
  1. see new info that breaks mental model
    2. intuit why that is
      3. ahh, OK, that makes sense
        4. adjust mental model
          5. GOTO step 1

Some of my misconceptions:

I've since developed deep respect for wallet software. Modern wallets abstract the details (maybe too many details) of an evolving, backwards compatible protocol into something that looks like a bank account.

Many addresses (many keypairs)

I did read the whitepaper many years ago but I guess my brain synthesized it as "public-key cryptography that solves consensus with untrusted parties using economic incentives". Part of this misconception was that people might use the public key they were already using for PGP (not possible) or their SSH keypairs (also not possible). You could indeed use Bitcoin with a single keypair (assuming it was an ECDSA key on Bitcoin's secp256k1 curve), but then everything would be trivially traceable. Anyone using modern wallet software will have tens or hundreds of keypairs because the best practice (an attempt at anonymity) is to generate a new keypair per transaction. Your Bitcoin addresses are derived from your public keys: the key is hashed with SHA-256 and then RIPEMD-160 (plus a few extra details, such as a version byte and a checksum) and then Base58 encoded.
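As a rough illustration of that last encoding step, here is a minimal Base58Check sketch for Node.js. It assumes you already have the 20-byte public key hash (RIPEMD-160 of SHA-256 of the public key) and uses version byte 0x00, which is the mainnet prefix for pay-to-public-key-hash addresses. The function names are my own, not from any library.

```javascript
const { createHash } = require('crypto');

const BASE58_ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';

const sha256 = (buf) => createHash('sha256').update(buf).digest();

// Encode a byte buffer as Base58 (big-endian), one '1' per leading zero byte.
function base58Encode(buf) {
  let n = buf.length ? BigInt('0x' + buf.toString('hex')) : 0n;
  let out = '';
  while (n > 0n) {
    out = BASE58_ALPHABET[Number(n % 58n)] + out;
    n /= 58n;
  }
  for (const byte of buf) {
    if (byte !== 0) break;
    out = '1' + out;
  }
  return out;
}

// Base58Check: version byte + payload + first 4 bytes of a double SHA-256.
function base58Check(version, payload) {
  const body = Buffer.concat([Buffer.from([version]), payload]);
  const checksum = sha256(sha256(body)).subarray(0, 4);
  return base58Encode(Buffer.concat([body, checksum]));
}

// Example: an all-zero public key hash (nobody is known to hold a key for it).
const zeroHash = Buffer.alloc(20);
console.log(base58Check(0x00, zeroHash));
```

Testnet addresses use a different version byte, which is why they look different from mainnet ones.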

When someone "sends" bitcoin to you (one of your addresses), your "wallet" software will show a total balance, but this is merely the bank account abstraction. There may be one, tens or hundreds of individual "unspent transaction outputs" (UTXOs), all belonging to different keypairs, that add up to that total balance. If you're concerned this has the potential to create a lot of leaf nodes, you're correct. Do a web search for "utxo chart" to see how quickly UTXOs are being created.

Transactions, not account balances

To move a bitcoin, you create something called a transaction.

You'll see transactions represented as a hex string, but when deserialized, they look something like:

{
  inputs: [
    { id, index, script }
  ],
  outputs: [
    { script, value }
  ]
}

Note: Any Bitcoin implementation may deserialize the raw hex transaction slightly differently (and in fact most online decoders will show them differently). At time of writing, for SegWit, https://btc.com/tools/tx/decode was the most useful. If that link fails, just web search for "bitcoin raw transaction decode".
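To make the hex form less mysterious, here is a minimal sketch of going the other way: serializing the object above into hex using the original (non-SegWit) wire layout. The id, index, script and value fields come from the structure above. The version, sequence and locktime defaults are my assumptions, the function names are made up, scripts are assumed to already be hex strings, and values are in satoshis.

```javascript
// Little-endian hex encoding of a non-negative integer.
function toLittleEndianHex(value, byteLength) {
  let n = BigInt(value);
  let hex = '';
  for (let i = 0; i < byteLength; i++) {
    hex += (n & 0xffn).toString(16).padStart(2, '0');
    n >>= 8n;
  }
  return hex;
}

// Bitcoin's variable-length integer, used for counts and script lengths.
function varInt(n) {
  if (n < 0xfd) return toLittleEndianHex(n, 1);
  if (n <= 0xffff) return 'fd' + toLittleEndianHex(n, 2);
  if (n <= 0xffffffff) return 'fe' + toLittleEndianHex(n, 4);
  return 'ff' + toLittleEndianHex(n, 8);
}

// Transaction ids are displayed byte-reversed relative to the wire format.
const reverseHex = (hex) => Buffer.from(hex, 'hex').reverse().toString('hex');

function serializeLegacyTx(tx) {
  let hex = toLittleEndianHex(tx.version ?? 1, 4);
  hex += varInt(tx.inputs.length);
  for (const input of tx.inputs) {
    hex += reverseHex(input.id);                     // which transaction
    hex += toLittleEndianHex(input.index, 4);        // which of its outputs
    hex += varInt(input.script.length / 2) + input.script;
    hex += toLittleEndianHex(input.sequence ?? 0xffffffff, 4);
  }
  hex += varInt(tx.outputs.length);
  for (const output of tx.outputs) {
    hex += toLittleEndianHex(output.value, 8);       // amount in satoshis
    hex += varInt(output.script.length / 2) + output.script;
  }
  return hex + toLittleEndianHex(tx.locktime ?? 0, 4);
}

// Made-up example: one unsigned input, one output paying 5000 satoshis.
const exampleTx = {
  inputs: [{ id: '00'.repeat(31) + 'ff', index: 1, script: '' }],
  outputs: [{ script: '76a914' + '11'.repeat(20) + '88ac', value: 5000 }],
};
console.log(serializeLegacyTx(exampleTx));
```

The input script is empty here because the transaction is unsigned. A real transaction would carry the signature and public key there.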

You can think of Bitcoin as a graph of these transactions. The leaf nodes, also known as "UTXOs" or "unspent transaction outputs", are the amounts available to spend.

To spend a UTXO (leaf node), you create a transaction (node) and add an input (edge) referencing the UTXO (parent node). That input carries an input script that satisfies the output script of the UTXO, including a signature that proves you had the right to spend it.

Wallet software will find all leaf nodes belonging to any of your public keys and sum them up into an "account balance".

Here's roughly what this graph of transactions looks like:

Note: each vertex (node, or circle) is either an input and output for a transaction or just an output, in which case, it's unspent. The rectangles demarcate the transactions. The names within the vertices are akin to public keys. Originally they were public keys in the clear, but now they're a hash of a public key.

Sidebar: Most wallets nudge you towards generating a new public and private key per transaction. This means your wallet may contain many keypairs, which is why you must back up your wallet regularly (new transactions may create new keys for change transactions), unless you use a deterministic wallet that derives its keys from a seed.

To build account balances for the image above, we'd traverse the graph until we found all leaf nodes.

Note: most clients keep a separate data structure for those leaf nodes called the "UTXO set"
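Here is a toy sketch of that idea: walk the transactions in order, drop every output that a later input spends, and sum what remains per owner. The ledger is invented for illustration (tx2 and Bob are hypothetical), amounts are in satoshis, and a plain owner name stands in for the output script.

```javascript
// A made-up, already-ordered ledger. Each input points at (id, index) of an earlier output.
const ledger = [
  { id: 'tx0', inputs: [], outputs: [{ owner: 'satoshi', value: 2_500_000_000 }] },
  {
    id: 'tx1',
    inputs: [{ id: 'tx0', index: 0 }],
    outputs: [
      { owner: 'alice', value: 2_000_000_000 },
      { owner: 'satoshi', value: 490_000_000 }, // change
    ],
  },
  {
    id: 'tx2',
    inputs: [{ id: 'tx1', index: 0 }],
    outputs: [
      { owner: 'bob', value: 1_200_000_000 },
      { owner: 'alice', value: 790_000_000 }, // change
    ],
  },
];

// Build the set of leaf nodes: outputs that no later input has consumed.
function buildUtxoSet(transactions) {
  const utxos = new Map(); // "txid:index" -> output
  for (const tx of transactions) {
    for (const input of tx.inputs) utxos.delete(`${input.id}:${input.index}`);
    tx.outputs.forEach((output, index) => utxos.set(`${tx.id}:${index}`, output));
  }
  return utxos;
}

// The "account balance" abstraction: sum the leaves per owner.
function balancesFrom(utxos) {
  const totals = {};
  for (const { owner, value } of utxos.values()) {
    totals[owner] = (totals[owner] || 0) + value;
  }
  return totals;
}

console.log(balancesFrom(buildUtxoSet(ledger)));
```

Notice that the totals add up to less than the original 25 bitcoin. The difference is what was left to the miners.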

Why do we have to leave some bitcoin to the miners?

If you don't leave any, they won't include your transaction when creating the next block in the blockchain and therefore your transaction won't ever get committed. It will just float around in the mempool for days. The miners prioritize which transactions to include in every block based on which transactions are the most profitable to include. How much fee should you include? It differs based on how much demand there is in the network. Do a web search for "calculate bitcoin fee". Most wallet software will calculate this (and hide the fact that it exists) for you. What should you do if you sent a transaction without a fee? Wait it out or "double spend" it again with a bigger fee (using a partially supported protocol called RBF, replace-by-fee).

If Satoshi only wanted to send 20 of their 25 bitcoins to Alice, why did they send 4.9 to themselves?

The 4.9 amount for Satoshi in transaction 1 is a Bitcoin idiom known as a "change transaction". You can't use part of a UTXO, it's all or nothing. So if you have a UTXO of 25 bitcoin but you only want to send 20 to Alice, you'll create 1 transaction with 2 outputs: 1 output to Alice for 20 bitcoin, and another output for yourself for 4.9. Wait, why not 5 for yourself? Because you need to leave a little for the miner (which does not require a line item in your transaction). Instead of the account balance analogy, use the coin analogy where each transaction "melts" the original inputs (coins) into new outputs (coins).
Note: wallet software abstracts change transactions away from you, but remember to back up your wallet after every send or use a deterministic wallet.
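The arithmetic behind the 4.9 can be sketched as follows. The function and field names are hypothetical, and amounts are in satoshis (1 bitcoin = 100,000,000 satoshis) so everything stays in integers. The key point is that the fee never appears as an output: it is whatever the inputs hold beyond what the outputs claim.

```javascript
const SATS_PER_BTC = 100_000_000;

// Split a fully spent input amount into a payment output and a change output.
function buildOutputs({ inputValue, sendValue, fee, recipient, changeOwner }) {
  const change = inputValue - sendValue - fee;
  if (change < 0) throw new Error('inputs do not cover amount plus fee');
  const outputs = [{ to: recipient, value: sendValue }];
  if (change > 0) outputs.push({ to: changeOwner, value: change });
  return outputs;
}

// The miner's fee is implicit: total inputs minus total outputs.
function impliedFee(inputValues, outputs) {
  const totalIn = inputValues.reduce((sum, v) => sum + v, 0);
  const totalOut = outputs.reduce((sum, o) => sum + o.value, 0);
  return totalIn - totalOut;
}

const satoshiOutputs = buildOutputs({
  inputValue: 25 * SATS_PER_BTC,
  sendValue: 20 * SATS_PER_BTC,
  fee: 10_000_000, // 0.1 bitcoin, chosen to match the example above
  recipient: 'alice',
  changeOwner: 'satoshi',
});
console.log(satoshiOutputs);
console.log(impliedFee([25 * SATS_PER_BTC], satoshiOutputs));
```

If you forget the change output entirely, the whole remainder becomes the fee.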

How did we find which leaf nodes (UTXOs) belong to us?

Parse the transaction, and within the outputs, look at the script field and find either one of our many public keys or a hash of one of our public keys. See how Bitcoin Core does it. Notice how it's not as simple as looking for a public key (the majority of transactions specify a hash of a public key in the script field).

...wait a second, why is there more than 1 way to do it? Why isn't there a standard? Why is there a "script" in the first place? Shouldn't there just be a public key of the recipient? These were the nagging questions (assumptions?) in the back of my head as I stumbled through the tutorials and documentation.

Script

Scripts control how bitcoin is transferred. Every transaction's outputs and inputs need some code from the scripting language on how to enforce the transfer unless you don't care who gets the bitcoin. Fortunately, the Bitcoin miners ignore all but a few scripts, so practically speaking there are only a few things you can put in there.

What was non-intuitive to me was why there was a scripting language in the first place, why wasn't a cryptographic signature sufficient? The short answer is flexibility and extensibility. e.g. multi sig, funds available after a certain time.

In the majority of non-SegWit Bitcoin transactions, you'll see a script in the output that looks like:

OP_DUP OP_HASH160 <pubKeyHash> OP_EQUALVERIFY OP_CHECKSIG

which means: whoever wants to redeem this bitcoin must create a transaction with a script in the input that contains a public key that hashes to this hardcoded hash, plus a signature, made with the matching private key, that the public key can verify over a hash of the spending transaction. That input script will look like:

<sig> <pubKey>

When Bitcoin starts processing/verifying a transaction it will first execute the input script, then it will execute the output script that the input refers to.

So, all together, the non-SegWit execution stack will look like:

<sig> <pubKey> OP_DUP OP_HASH160 <pubKeyHash> OP_EQUALVERIFY OP_CHECKSIG

See Andreas Antonopoulos annotate this process on a whiteboard here.
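To get a feel for how the non-SegWit script above is evaluated, here is a toy stack machine that handles only the four opcodes used in this post. Two loud assumptions: toyHash uses SHA-256 alone as a stand-in for the real HASH160 (RIPEMD-160 of SHA-256), and toyCheckSig is a fake check that replaces real ECDSA verification. Neither is safe or compatible with Bitcoin; they only keep the sketch dependency-free.

```javascript
const { createHash } = require('crypto');

// ASSUMPTION: stand-in for HASH160, not the real thing.
const toyHash = (hex) =>
  createHash('sha256').update(Buffer.from(hex, 'hex')).digest('hex');

// ASSUMPTION: a "signature" is valid if it is literally `sig(<pubKey>)`.
const toySign = (pubKey) => `sig(${pubKey})`;
const toyCheckSig = (sig, pubKey) => sig === toySign(pubKey);

// Run the input script followed by the output script on one shared stack.
function runScript(tokens) {
  const stack = [];
  for (const token of tokens) {
    switch (token) {
      case 'OP_DUP':
        stack.push(stack[stack.length - 1]);
        break;
      case 'OP_HASH160':
        stack.push(toyHash(stack.pop()));
        break;
      case 'OP_EQUALVERIFY': {
        const a = stack.pop();
        const b = stack.pop();
        if (a !== b) return false; // hash mismatch: abort immediately
        break;
      }
      case 'OP_CHECKSIG': {
        const pubKey = stack.pop();
        const sig = stack.pop();
        stack.push(toyCheckSig(sig, pubKey));
        break;
      }
      default:
        stack.push(token); // anything else is data pushed onto the stack
    }
  }
  return stack.length > 0 && stack[stack.length - 1] === true;
}

const pubKey = '02' + 'ab'.repeat(32); // made-up key bytes
const outputScript = ['OP_DUP', 'OP_HASH160', toyHash(pubKey), 'OP_EQUALVERIFY', 'OP_CHECKSIG'];
const inputScript = [toySign(pubKey), pubKey];
console.log(runScript([...inputScript, ...outputScript]));
```

The spend is accepted only if the script finishes with a true value on top of the stack.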

SegWit

But there's something new (activated August 24, 2017) in the Bitcoin protocol called SegWit. It's a different way of structuring a transaction but, amazingly, still compatible with older Bitcoin nodes.

In SegWit transactions of the P2SH-wrapped kind, you'll see a script in the output that looks like:

OP_HASH160 <hash160ofAnyScriptButProbablyPublicKey> OP_EQUAL

wait... wha?? There's no OP_CHECKSIG! Doesn't that mean that I just need to know their public key (and then hash it) to redeem this? Yes! Unless the Bitcoin node that's doing the verification has SegWit support (which is practically everyone), in which case it expects the redeeming transaction to have a witness field in the input and then automatically knows to OP_CHECKSIG. (I think this explains why SegWit was hardcoded not to activate until 95% of the miners had signaled support for it.)

This is how SegWit maintains backwards compatibility with older nodes. The older nodes don't know there's a world beyond the script field so they just run the script only (which, again, doesn't include an OP_CHECKSIG).

So OP_CHECKSIG in SegWit transactions appears to be implied when the Bitcoin node detects a SegWit transaction.

Also see Peter Rizun discuss the incentives around attacking SegWit.

All together, the SegWit execution stack will look like:

<someScriptButProbablyPublicKey> OP_HASH160 <hash160ofAnyScriptButProbablyPublicKey> OP_EQUAL <sig> <pubKey> OP_CHECKSIG

Some Standard Bitcoin scripts

the bolded scripts are what was mentioned in this post

In conclusion, how do we "send" bitcoin?

Someone who wants to send Bitcoin must:

  1. Find UTXOs with an output script they can satisfy (unlock)
  2. Create a transaction with inputs to satisfy the output of the UTXO they're spending
  3. Serialize the transaction into hex
  4. Broadcast the transaction (paste the hex into something like: https://blockchain.info/pushtx)
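Steps 1 and 2 hide a choice: which of your UTXOs do you spend? Below is a minimal sketch of one naive strategy, largest-first, assuming a flat fee passed in by the caller. The selectCoins name and the UTXO shape are made up for illustration, and I'm not claiming this is the strategy any particular wallet uses.

```javascript
// Pick the largest UTXOs first until they cover the payment plus the fee.
// Returns null when the wallet cannot afford the payment.
function selectCoins(utxos, target, fee) {
  const needed = target + fee;
  const largestFirst = [...utxos].sort((a, b) => b.value - a.value);
  const selected = [];
  let total = 0;
  for (const utxo of largestFirst) {
    if (total >= needed) break;
    selected.push(utxo);
    total += utxo.value;
  }
  if (total < needed) return null;
  // Whatever is left over must come back to us as a change output.
  return { selected, change: total - needed };
}

// Made-up wallet contents; values are in satoshis.
const myUtxos = [
  { id: 'aa', index: 0, value: 5 },
  { id: 'bb', index: 1, value: 30 },
  { id: 'cc', index: 0, value: 10 },
];
console.log(selectCoins(myUtxos, 32, 1));
```

Each selected UTXO becomes one input of the new transaction, and each input needs its own input script.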

Other resources
