If you’re writing software that works across more than one computer, you often run into situations where you want to sync things. I certainly did, and I came up with a way of thinking about sync that I want to share.
Sync Can Be Easy
Some scenarios are actually easy to sync. For example, if you have associations between any two classes of things, like Contacts and Labels. Imagine a spreadsheet where Contacts are rows and Labels are columns. In each cell, you store a 0 if there is no association, or a 1 if there is.
Let’s also assume that at some point the spreadsheets on both computers have been synchronized. Since that point, they have diverged: some 0s have changed to 1s, and some 1s have changed to 0s. New rows or columns can be thought of as putting 1s where “virtual” 0s were previously in the synced spreadsheet, and deleted rows or columns are thought of a putting 0s in the corresponding cells.
Now, to sync, all you have to do is find out the cells that have changed on either side since the sync, and use the new values. You can actually represent all the cells as bits in a long string and do the sync in just four bitwise operations on all the bits in parallel:
sync = ((original xor versionA) or (original xor versionB)) xor original
And just like that, all the new changes have been incorporated. I’m sure this has been rediscovered again and again by various people, but since I haven’t seen it described anywhere, I’m writing it up here and calling it the Binary Sync.
The binary sync is easy to implement and understand. It works because whenever there is a change from the original, it either happens in one or both copies, there are no other possibilities. It also works for syncing N devices at the same time. However, to use this sync, each device needs to store a copy of the last sync result that was performed.
Although the sync produces a useful result, there is a caveat. The result depends on when you sync. If since the last sync, one device makes a change and reverts it, and the other makes the same change but don’t revert it, then the sync reflects the change. However, if there was an intermediate sync after the change was made, then the sync reflects the reversion. If you want full consistency among N participants, you have to have a consistency protocol in conjunction with the binary sync protocol.
More Complicated Sync
The binary sync is easy to implement and understand, and it works for a great variety of cases to produce useful results. As I mentioned earlier, associations between members of sets X and Y are a prime candidate for this type of sync. This means it’s also good for symmetric graphs (which can be represented as tuples) or directed graphs (which can be represented as triples). Triples can represent predicates in RDF, and many other things. Any time you “add and remove connections” between things, you can use this sync.
Another benefit of the binary sync is theoretical: by expressing non-binary sync in terms of binary sync, you can see the trade-offs in various syncing strategies! For example, operational transformation and modern multi-client sync strategies are designed to handle things like trees where children may be re-ordered besides simply added and removed. All the changes between nodes and edges can be done with the binary sync, but the changes in ordering between edges of a node must be done in some other, separate algorithm.
Even this other algorithm, can sort of be analyzed in terms of the binary sync. For example you can express it as storing an ordinal along with each triple, making it a quadruple. Now two different devices do two different permutations on the order of children of a certain node, changing fourth member of the quadruple of some/all of them. You can do a binary sync and then apply some operation to “flatten” or “project” the quadruples into some canonical ordering.
Once you start wanting to synchronize multiple clients editing a single document, you may want to consider simply storing the operations that were done, in order, and then sync will consist of trying to do as many operations as possible until you get stuck. That’s the approach we take at Qbix for arbitrary multi-user “streams” and “messages”.
But if all you want to do is synchronize some associations between contacts and labels, consider binary sync