So here's some post-mortem, I think an event like this needs some examination.
First, huge thanks to @undeath for not only reading his logs, but actually switching his brain on and noticing this; as he expressed himself, it's pretty surprising that no one else noticed since it went into a release in August last year.
Second, a big apology from me for introducing the bug. I guess I'll start by explaining what I remember about that code edit.
The change is from this highlighted line in the commit, down to the return at the end of that function:
(if that doesn't auto-link it's line 507 of taker.py in that huge commit diff).
The commit as a whole is replacing the preexisting Joinmarket custom bitcoin backend with the python-bitcointx package (a fork of python-bitcoinlib for those who remember that).
This changed all code relating to bitcoin transactions, and that was the purpose of these edits.
If you look carefully, however, you will notice there that in addition to syntax changes for new bitcoin transaction code, there is one additional change: instead of having a variable `tx`, which is in the return, and a separate `self.latest-tx` variable, there is, after the edit, only `self.latest_tx`.
My memory is hazy, but I believe my thought process was just 'oh that's an unnecessary duplication; probably from earlier refactoring; get rid of it'.
(undeath told me he thought, in review of the PR, it looked like a good code cleanup at the time too; but more on review in a moment).
So this was the error - as a result, the "local copy" with watermarks is sent instead of the un-watermarked copy.
A few notes on testing:
Joinmarket actually has a pretty large test suite (albeit CI testing has stopped working a while ago sadly, and no one has yet got around to setting it up again), which we run all the time.
But it has big limitations (4/n)
First, some of it is old and not updated much for long periods (it was a *huge* amount of work to originally create it, for one person (me)). It definitely needs attention.
But: most of it is focused on a primary safety goal: ensure the coins move from the intended place to the other intended place, and are not lost or given away.
So the tests are much more about *functionality* and much less about *privacy* even though the latter is the distinguishing element of what this application is.
Thus, while the tests on that huge PR succeeded in checking that all bitcoin transactions and other features *functioned* as intended (for example i could do entire tumbler runs on a simulated regtest-based joinmarket test pit), those same tests told me absolutely nothing about the information being passed from the taker to the maker.
It could only have been noticed by-eye, by reading logs.
I think this is really tricky, in general. How can tests ensure that information is not leaked?
I'll leave that question hanging, interested to hear opinions/thoughts.
But on the question of the PR itself:
I think this is a partial cause of the issue. Ideally we would make only small, focused commits that probably should be merged separately.
It's really hard sometimes to do that. I was just looking at the lnd codebase yesterday and noticed a similarly huge refactoring PR consisting of thousands of lines of code.
Notice this comment I made at the time: https://github.com/JoinMarket-Org/joinmarket-clientserver/pull/536#issuecomment-599068162
quote: "I realise that code review is a bit daunting but you could always just review a small part"
- while there might be rational reasons to make such a large pull request, it's really setting a high bar to expect people to read all that code.
All the same, people did - and there is no guarantee they would magically have found this bug if the edit were smaller; it's just, a lot more likely, I think.
A final, fairly minor observation about this case relates to the strange '0xdeadbeef' string.
I remember way back in Dec2014/Jan2015, which is when @belcher wrote the first version of the code, and when I first read it, that this was slightly confusing, it is of course in some vague sense 'hacky', but functionally it worked fine through many different versions of the code. The patch by undeath removes it; we agreed offline that it's best to do so to make the code easier to reason about.
Making code clean and easy to reason about is always a priority, but the inertia/conservatism of not wanting to change security-critical code counteracts it. That tension is in play here.
So in conclusion, I would say lessons learned are:
(1) *strongly* avoid PRs that have a very huge code change
(2) be stricter in avoiding 'pollution' of changes with other unrelated changes
(3) investigate how testing can catch privacy failures (I don't really know at the moment).
@waxwing organising commits takes a lot of my attention and time tbh...it's surprisingly hard to get right (at least, to my eyes) and seems to matter a lot to the success of the patch
@waxwing when i make small focused commits, i make sure each one is hygienic (builds cleanly, tests pass) and ideally is stand-alone. it takes much more time.
at the same time, doing so often attracts "can you squash your commits" comments or scares off reviewers (too many commits looks like more work, and sometimes is indeed more annoying to review rather than easier).
i find it hard to get the balance right
The social network of the future: No ads, no corporate surveillance, ethical design, and decentralization! Own your data with Mastodon!