DHTs for data availability of seals in monthly roots #1

@Nuhvi

Description

Distributed Seal Storage with Logarithmic Retention

Problem Statement

Users with older seals must aggregate more proofs of non-inclusion into their account history proof. While aggregating one Merkle path per month requires minimal computational effort, this creates a potential challenge for zk-based cryptocurrencies as stores of value. Users shouldn't risk losing their savings if they become temporarily inactive.

Proposed Solution: Distributed Data Availability

Full nodes can implement a logarithmically decreasing storage strategy:

  1. Generate a random ID for each node
  2. Store 100% of seals from the current month
  3. Store a decreasing percentage of seals from previous months, based on proximity to the node's ID
  4. Proximity is determined by the number of leading zeros in the XOR of the seal and the node's ID
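As a sketch, the proximity rule in steps 3–4 could look like the following (the function names and the one-bit-per-month threshold, which corresponds to the $R = 2$ falloff used below, are illustrative assumptions, not a reference implementation):

```python
def shared_prefix_bits(a: bytes, b: bytes) -> int:
    """Count the leading zero bits of XOR(a, b) for equal-length IDs.

    More shared leading bits means the two IDs are closer in XOR space."""
    bits = 0
    for x, y in zip(a, b):
        d = x ^ y
        if d == 0:
            bits += 8
            continue
        bits += 8 - d.bit_length()  # leading zeros within this byte
        break
    return bits

def stores_seal(node_id: bytes, seal_id: bytes, months_ago: int) -> bool:
    """With R = 2, a node keeps a seal from M months ago only if the seal
    shares at least M leading bits with the node's ID, which happens for a
    random seal with probability 1/2^M."""
    return shared_prefix_bits(node_id, seal_id) >= months_ago

print(shared_prefix_bits(bytes([0b00010000]), bytes([0b00000000])))  # 3
```

Because a uniformly random seal ID matches at least $M$ leading bits of a node's ID with probability $2^{-M}$, this rule yields exactly the 100%, 50%, 25%, … schedule in the table below.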

Storage Distribution by Month

| Months ago | % of seals stored | Nodes needed for 99% availability |
|-----------:|------------------:|----------------------------------:|
| 0          | 100%              | 1                                 |
| 1          | 50%               | 8                                 |
| 2          | 25%               | 18                                |
| 3          | 12.5%             | 36                                |
| 4          | 6.25%             | 73                                |
| 5          | 3.125%            | 147                               |
| 6          | 1.56%             | 294                               |
| 12         | 0.024%            | 18861                             |

Mathematical Model

The number of nodes required can be expressed as:

$$ N = \left\lceil \frac{\log(1 - P)}{\log\left(1 - \frac{1}{R^M}\right)} \right\rceil $$

Where:

  • $P$ is the target probability of finding a seal from $M$ months ago
  • $N$ is the required number of nodes
  • $R$ is the reduction factor (rate of falloff) for seal storage each month

For example, with $R = 2$, we need $N = 18{,}861$ nodes to achieve a $P = 99\%$ chance of finding a seal from $M = 12$ months ago.
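The formula is easy to check numerically. A minimal sketch (the function name is illustrative):

```python
import math

def nodes_for_availability(P: float, R: float, M: int) -> int:
    """Nodes needed so that at least one of them stores a given seal from
    M months ago with probability P, when each node independently stores
    it with probability 1/R^M."""
    p = 1.0 / R**M
    if p >= 1.0:
        return 1  # every node stores current-month seals
    return math.ceil(math.log(1 - P) / math.log(1 - p))

print(nodes_for_availability(0.99, 2, 12))  # 18861
```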

Storage Overhead Analysis

The total storage requirement for each node depends on the reduction factor $R$. For $R = 2$, the storage overhead asymptotically approaches twice the average number of seals published per month.

This can be generalized as:

$$ S = \frac{1 - R^{-M}}{1 - \frac{1}{R}}, \quad R \neq 1 $$

Where:

  • $S$ is the storage overhead factor
  • $R$ and $M$ are as defined above
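This is the geometric sum $\sum_{m=0}^{M-1} R^{-m}$, measured in units of one month's worth of seals. A quick numeric check (the function name is illustrative):

```python
def storage_overhead(R: float, M: int) -> float:
    """Total storage, in units of one month's seals, for a node that keeps
    a fraction R^-m of the seals from m months ago, for m = 0..M-1.
    Equals the geometric sum (1 - R^-M) / (1 - 1/R)."""
    return (1 - R**-M) / (1 - 1 / R)

print(round(storage_overhead(2, 12), 4))  # 1.9995, approaching 2 as M grows
```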

Query Efficiency

The system can be enhanced with a structured network topology (like a k-bucket DHT) allowing:

  • Logarithmic query complexity ($O(\log_k(n))$ roundtrips)
  • Fast retrieval of historical seals (~300ms based on Mainline DHT experience)
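As a rough back-of-the-envelope check of the hop count, assuming a Kademlia-style routing table with Mainline's bucket size $k = 20$ and the 12-month node count from the table above:

```python
import math

# Rough roundtrip estimate for a k-bucket DHT: O(log_k(n)) hops.
# k = 20 is Mainline DHT's bucket size; n is the node count needed for
# 99% availability of 12-month-old seals.
n, k = 18861, 20
hops = math.ceil(math.log(n, k))
print(hops)  # 4
```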

Conclusion

This approach enables:

  1. 99% data availability for seals up to 12 months old, for a network size comparable to the current Bitcoin node count.
  2. Only double the storage requirement per node
  3. Fast, logarithmic-time queries for historical data

This pattern could potentially be applied to Bitcoin's UTXO set once ZeroSync enables historical data pruning, reducing the cost of running a fully validating node.
