ergonomic 64bit uid in 11 characters
Assume we encode millisecond resolution timestamps with each uid. Due to the birthday paradox, generating just a couple thousand items per millisecond would already result in a 50% chance of collisions. Twitter had an average of 5700 TPS back in 2013. UUIDs with 128 bit were introduced to allow basically collision free universally unique ids across distributed systems. However, the extra space is not needed when care is taken in the design and usage. E.g. Twitter introduced their Snowflake id that splits load among multiple data centers (10 bit), with each appending a running sequence (12 bit).
As for performance, 64-bit ids are hard to beat, as they're the bloodline of todays computer architectures.
Similarly, basically any database has a native int64 type.
In terms of compactness, 128-bit numbers and string representations are a mouthful to read and cumbersome to select to copy.
The improvements in efficiency of cutting information in half is significant (from transmitting bytes over fiber, to copy-pasting string rep to a friend).
Using
64 bitoffers a great tradeoff between efficiency (performance, space, ergonomics) and information density, for both centralized and distributed systems.
Let's take a look at how many characters we need to encode 64 bits in different bases.
log_16(2^64) = 16.000 -> 16
...
log_32(2^64) = 12.800 -> 13
...
log_41(2^64) ≈ 11.945 -> 12
...
log_56(2^64) ≈ 11.020 -> 12
log_57(2^64) ≈ 10.972 -> 11
log_58(2^64) ≈ 10.925 -> 11
log_59(2^64) ≈ 10.879 -> 11
...
log_64(2^64) ≈ 10.666 -> 11
...
log_85(2^64) ≈ 9.985 -> 10
- Using
hexencoding, we need of 16 characters to encode the64 / 4nibbles. - Moving up,
base32requires 13 characters. - We get the next improvement at
base41with 12 characters. As there's no standard alphabet for this, and there's better alternatives coming, let's move on. - With
base57we're getting the first 11 character encoding. This is very close to the well standardizedbase58. - Now
base58which has a confusion-free alphanumeric alphabet, popularized by bitcoin and wallet addresses. - Anything larger than this up to
base64turns out also doesn't add any value. Only atbase85we could safe another character, however there's no good (i.e. url-safe, readable, standardized) set of symbols.
Base58is a great choice, with an almost perfect information density for the given alphabet and number of characters. It offers a great tradeoff between ergonomics and compactness.
We want to encode time. What resolution should we aim for? Time since unix epoch is usually counted in either seconds or milliseconds. UUIDv7 and co. suggest milliseconds or even finer. Snowflake and many derivatives use milliseconds. On the other hand, Sonyflake uses 10 ms discretization.
Ultimately, while a coarser discretization like 10 ms allows for significantly longer lifetimes (e.g. 69.7 vs. 697 years at 41 bits), we feel like this doesn't outweigh the lack of resolution for use cases that benefit from it.
Real-time applications benefit from full millisecond-resolution timestamps, e.g. for logging or tracing.
Let's look at the end of the range that we can represent with different number of bits.
| Bits | Max range (ms) | Human range | Last timestamp (UTC, from 1970‑01‑01) |
|---|---|---|---|
| 16 | 65,535 | 0 years, 0 days, 00:01:05.535 | 1970-01-01T00:01:05.535Z |
| 24 | 16,777,215 | 0 years, 0 days, 04:39:37.215 | 1970-01-01T04:39:37.215Z |
| 32 | 4,294,967,295 | 0 years, 49 days, 17:02:47.295 | 1970-02-19T17:02:47.295Z |
| 35 | 34,359,738,367 | 1 years, 32 days, 16:22:18.367 | 1971-02-02T16:22:18.367Z |
| 41 | 2,199,023,255,551 | 69 years, 249 days, 15:47:35.551 | 2039-09-07T15:47:35.551Z |
| 42 | 4,398,046,511,103 | 139 years, 134 days, 07:35:11.103 | 2109-05-15T07:35:11.103Z |
| 44 | 17,592,186,044,415 | 557 years, 173 days, 06:20:44.415 | 2527-06-23T06:20:44.415Z |
| 48 | 281,474,976,710,655 | ≈8,920 years | — |
| 64 | 18,446,744,073,709,551,615 | ≈584,554,049 years | — |
The design goal is for entries to last at least 100 years. While 41 bit is close, adding the extra bit gets us comfortably beyond 100 years into the future. This also leaves the 22 bits for randomness or machine / sequence fields, similar to Snowflake and others.
42 bitmillisecond resolution timestamp lasts us easily over 100 years, forming a practical sweetspot.
The name is a combination of the roman numeral
XIfor the number eleven, andIDfor unique identifier.
We don't expect the timestamp to be used much in retrospect, especially for large-scale modern-tech systems. So choosing a custom epoch beyond unix epoch adds multiple decades of lifetime basically for free. However, we do want to accomodate the use case, and some time around the introduction of Snowflake feels like a reasonable cutoff.
To lean into the 11 and
XItheme, we chose the11th of November '11, at 11:11:11 o'clock and 111 msas epoch forxid.
75mc2sQdSji
AiD2n4c2d9p
Uve9KXZjpY5
Ca3Z7BAVbp9
Wjg3reUoieQ
fCN4sYuF64d
deGzSwfGZSo
cfUS2bVCP31
GWmu4gA7UqE
iYsvKPYXFo7
1Ag98BdKNx7
over sequential time points
5Gg7kK8XiNh -> 2025-08-05T18:27:15.704
5Gg7kKAiLEN -> 2025-08-05T18:27:15.728
5Gg7kKCEzHz -> 2025-08-05T18:27:15.744
5Gg7kKDeKyY -> 2025-08-05T18:27:15.760
5Gg7kKF2aNq -> 2025-08-05T18:27:15.775
5Gg7kKGZjLp -> 2025-08-05T18:27:15.791
5Gg7kKHvHCV -> 2025-08-05T18:27:15.806
5Gg7kKKTBZU -> 2025-08-05T18:27:15.822
5Gg7kKLrqbW -> 2025-08-05T18:27:15.837
5Gg7kKNSBXc -> 2025-08-05T18:27:15.854
5Gg7kKPegwg -> 2025-08-05T18:27:15.868
at single time point
5Gg7ttomy2c -> 2025-08-05T18:32:27.225
5Gg7ttojr4N -> 2025-08-05T18:32:27.225
5Gg7ttohTDi -> 2025-08-05T18:32:27.225
5Gg7ttojASV -> 2025-08-05T18:32:27.225
5Gg7ttoktTR -> 2025-08-05T18:32:27.225
5Gg7ttoibpF -> 2025-08-05T18:32:27.225
5Gg7ttohZFU -> 2025-08-05T18:32:27.225
5Gg7ttoieuz -> 2025-08-05T18:32:27.225
5Gg7ttokCg7 -> 2025-08-05T18:32:27.225
5Gg7ttojjRu -> 2025-08-05T18:32:27.225
5Gg7ttok3LG -> 2025-08-05T18:32:27.225
- David Blackman and Sebastiano Vigna for their work on fast PRNGs
- Ex-Twitter team and their fantastic engineering blog posts