Distributed Seal Storage with Logarithmic Retention
Problem Statement
Users with older seals must aggregate more proofs of non-inclusion into their account history proof. While aggregating one Merkle path per month requires minimal computational effort, this creates a potential challenge for zk-based cryptocurrencies as stores of value. Users shouldn't risk losing their savings if they become temporarily inactive.
Proposed Solution: Distributed Data Availability
Full nodes can implement a logarithmically decreasing storage strategy:
- Generate a random ID for each node
- Store 100% of seals from the current month
- Store a decreasing percentage of seals from previous months, based on proximity to the node's ID
- Proximity is determined by the number of leading zeros in the XOR of the seal and the node's ID
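The storage rule above can be sketched as follows. This is an illustrative sketch, not part of the proposal: it assumes 256-bit IDs and a reduction factor of 2, so a seal from `m` months ago is kept when the XOR distance between the seal and the node's ID has at least `m` leading zero bits (which a random seal satisfies with probability `2**-m`).

```python
def leading_zero_bits(x: int, width: int = 256) -> int:
    """Number of leading zero bits in a `width`-bit integer."""
    return width if x == 0 else width - x.bit_length()

def should_store(node_id: bytes, seal_id: bytes, months_ago: int) -> bool:
    """Keep a seal from `months_ago` months back iff the XOR distance
    to the node's ID has at least `months_ago` leading zero bits.
    A random seal passes with probability 2**-months_ago (R = 2)."""
    distance = int.from_bytes(node_id, "big") ^ int.from_bytes(seal_id, "big")
    return leading_zero_bits(distance) >= months_ago
```

Seals from the current month (`months_ago = 0`) are always stored; each additional month halves the expected fraction of seals a node keeps.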
Storage Distribution by Month
| Months ago | % of seals stored | Nodes needed for 99% availability |
|---|---|---|
| 0 | 100% | 1 |
| 1 | 50% | 8 |
| 2 | 25% | 18 |
| 3 | 12.5% | 36 |
| 4 | 6.25% | 73 |
| 5 | 3.125% | 147 |
| 6 | 1.56% | 294 |
| 12 | 0.024% | 18861 |
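The right-hand column of the table can be estimated as follows. This is a sketch under the assumption that each node independently stores a given seal with probability `r**-months_ago`, so `N` nodes all miss it with probability `(1 - r**-months_ago)**N`; the function name and signature are illustrative.

```python
import math

def nodes_for_availability(months_ago: int, target: float = 0.99, r: float = 2.0) -> int:
    """Smallest N such that, among N independent nodes each storing a
    given seal with probability r**-months_ago, at least one node holds
    the seal with probability >= target."""
    p = r ** -months_ago          # fraction of nodes storing the seal
    if p >= 1.0:
        return 1
    return math.ceil(math.log(1 - target) / math.log(1 - p))
```

This reproduces the 12-month row (18,861); the earlier rows of the table sit a node or two higher, which suggests they include a small safety margin.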
Mathematical Model
The number of nodes required can be expressed as:

$$N = \frac{\ln(1 - P)}{\ln(1 - R^{-M})}$$

Where:
- $P$ is the target probability of finding a seal from $M$ months ago
- $N$ is the required number of nodes
- $R$ is the reduction factor (rate of falloff) for seal storage each month

For example, with $P = 0.99$ and $R = 2$, roughly 18,861 nodes are needed to find a seal from $M = 12$ months ago, as shown in the table above.
Storage Overhead Analysis
The total storage requirement for each node depends on the reduction factor $R$. With $R = 2$, a node stores $1 + \tfrac{1}{2} + \tfrac{1}{4} + \dots < 2$ months' worth of seals. This can be generalized as:

$$S = \sum_{m=0}^{M} R^{-m} < \frac{R}{R - 1}$$

Where:
- $S$ is the storage overhead factor, relative to storing one month of seals
- $R$ and $M$ are as defined above
Query Efficiency
The system can be enhanced with a structured network topology (like a k-bucket DHT), allowing:
- Logarithmic query complexity ($O(\log_k(n))$ roundtrips)
- Fast retrieval of historical seals (~300 ms, based on Mainline DHT experience)
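To make the $O(\log_k(n))$ claim concrete, here is a rough hop-count estimate. This is illustrative only: `k` stands for the DHT's branching factor, which the proposal does not fix.

```python
import math

def expected_hops(n_nodes: int, k: int) -> int:
    """Rough upper bound on lookup roundtrips in a k-ary DHT topology:
    each hop narrows the search space by a factor of k."""
    return max(1, math.ceil(math.log(n_nodes) / math.log(k)))
```

For a network of ~19,000 nodes and `k = 20`, this gives about 4 roundtrips per lookup.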
Conclusion
This approach enables:
- 99% data availability for seals up to 12 months old, for a network size comparable to the current Bitcoin node count.
- Only double the storage requirement per node
- Fast, logarithmic-time queries for historical data
This pattern could potentially be applied to Bitcoin's UTXO set once ZeroSync enables historical data pruning, reducing the cost of running a fully validating node.