DHTs for data availability of seals in monthly roots #1

@Nuhvi

Description

Distributed Seal Storage with Logarithmic Retention

Problem Statement

Users with older seals must aggregate more proofs of non-inclusion into their account history proof. While aggregating one Merkle path per month requires minimal computational effort, this creates a potential challenge for zk-based cryptocurrencies as stores of value. Users shouldn't risk losing their savings if they become temporarily inactive.

Proposed Solution: Distributed Data Availability

Full nodes can implement a logarithmically decreasing storage strategy:

  1. Generate a random ID for each node
  2. Store 100% of seals from the current month
  3. Store a decreasing percentage of seals from previous months, based on proximity to the node's ID
  4. Proximity is determined by the number of leading zeros in the XOR of the seal and the node's ID
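As a sketch, the proximity rule in steps 3–4 could look like the following (the function names and the one-bit-per-month threshold, which corresponds to the $R = 2$ falloff used below, are illustrative assumptions, not a reference implementation):

```python
def shared_prefix_bits(a: bytes, b: bytes) -> int:
    """Count the leading zero bits of XOR(a, b) for equal-length IDs.

    More shared leading bits means the two IDs are closer in XOR space."""
    bits = 0
    for x, y in zip(a, b):
        d = x ^ y
        if d == 0:
            bits += 8
            continue
        bits += 8 - d.bit_length()  # leading zeros within this byte
        break
    return bits

def stores_seal(node_id: bytes, seal_id: bytes, months_ago: int) -> bool:
    """With R = 2, a node keeps a seal from M months ago only if the seal
    shares at least M leading bits with the node's ID, which happens for a
    random seal with probability 1/2^M."""
    return shared_prefix_bits(node_id, seal_id) >= months_ago

print(shared_prefix_bits(bytes([0b00010000]), bytes([0b00000000])))  # 3
```

Because a uniformly random seal ID matches at least $M$ leading bits of a node's ID with probability $2^{-M}$, this rule yields exactly the 100%, 50%, 25%, … schedule in the table below.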

Storage Distribution by Month

| Months ago | % of seals stored | Nodes needed for 99% availability |
|-----------:|------------------:|----------------------------------:|
| 0          | 100%              | 1                                 |
| 1          | 50%               | 8                                 |
| 2          | 25%               | 18                                |
| 3          | 12.5%             | 36                                |
| 4          | 6.25%             | 73                                |
| 5          | 3.125%            | 147                               |
| 6          | 1.56%             | 294                               |
| 12         | 0.024%            | 18861                             |

Mathematical Model

The number of nodes required can be expressed as:

$$ N = \left\lceil \frac{\log(1 - P)}{\log\left(1 - \frac{1}{R^M}\right)} \right\rceil $$

Where:

  • $P$ is the target probability of finding a seal from $M$ months ago
  • $N$ is the required number of nodes
  • $R$ is the reduction factor (rate of falloff) for seal storage each month

For example, with $R = 2$, we need $N = 18{,}861$ nodes to achieve a $P = 99\%$ chance of finding a seal from $M = 12$ months ago.
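The formula is easy to check numerically. A minimal sketch (the function name is illustrative):

```python
import math

def nodes_for_availability(P: float, R: float, M: int) -> int:
    """Nodes needed so that at least one of them stores a given seal from
    M months ago with probability P, when each node independently stores
    it with probability 1/R^M."""
    p = 1.0 / R**M
    if p >= 1.0:
        return 1  # every node stores current-month seals
    return math.ceil(math.log(1 - P) / math.log(1 - p))

print(nodes_for_availability(0.99, 2, 12))  # 18861
```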

Storage Overhead Analysis

The total storage requirement for each node depends on the reduction factor $R$. For $R = 2$, the storage overhead asymptotically approaches twice the average number of seals published per month.

This can be generalized as:

$$ S = \frac{1 - R^{-M}}{1 - \frac{1}{R}}, \quad R \neq 1 $$

Where:

  • $S$ is the storage overhead factor
  • $R$ and $M$ are as defined above
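This is the geometric sum $\sum_{m=0}^{M-1} R^{-m}$, measured in units of one month's worth of seals. A quick numeric check (the function name is illustrative):

```python
def storage_overhead(R: float, M: int) -> float:
    """Total storage, in units of one month's seals, for a node that keeps
    a fraction R^-m of the seals from m months ago, for m = 0..M-1.
    Equals the geometric sum (1 - R^-M) / (1 - 1/R)."""
    return (1 - R**-M) / (1 - 1 / R)

print(round(storage_overhead(2, 12), 4))  # 1.9995, approaching 2 as M grows
```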

Query Efficiency

The system can be enhanced with a structured network topology (like a k-bucket DHT) allowing:

  • Logarithmic query complexity ($O(\log_k(n))$ roundtrips)
  • Fast retrieval of historical seals (~300ms based on Mainline DHT experience)
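As a rough back-of-the-envelope check of the hop count, assuming a Kademlia-style routing table with Mainline's bucket size $k = 20$ and the 12-month node count from the table above:

```python
import math

# Rough roundtrip estimate for a k-bucket DHT: O(log_k(n)) hops.
# k = 20 is Mainline DHT's bucket size; n is the node count needed for
# 99% availability of 12-month-old seals.
n, k = 18861, 20
hops = math.ceil(math.log(n, k))
print(hops)  # 4
```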

Conclusion

This approach enables:

  1. 99% data availability for seals up to 12 months old, for a network size comparable to the current Bitcoin node count.
  2. Only double the storage requirement per node
  3. Fast, logarithmic-time queries for historical data

This pattern could potentially be applied to Bitcoin's UTXO set once ZeroSync enables historical data pruning, reducing the cost of running a fully validating node.
