
Average imputed earnings percentiles with national values #312

Open
vahid-ahmadi wants to merge 1 commit into main from fix/average-missing-percentiles-68


@vahid-ahmadi
Collaborator

Summary

  • Missing ASHE percentiles (91–98, 100) were previously imputed by ratio scaling alone. This assumes every area has the same income tail shape as the national distribution, which makes estimates unstable for areas with unusual tails (e.g. wealthy London boroughs get inflated upper percentiles, while poorer areas get deflated ones)
  • Imputation now blends the ratio-scaled estimate with the national reference value (50/50 by default), shrinking extreme local estimates toward the national value
  • Extracts the duplicated fill_missing_percentiles() and reference_values from both the LA and constituency scripts into a shared earnings_percentiles.py module
  • The national_weight parameter controls the blend: 0.0 = old behaviour (pure ratio scaling), 0.5 = equal blend (default), 1.0 = pure national value
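
As a rough illustration of the blend described above, here is a minimal sketch. It is not the code in earnings_percentiles.py: the helper name blend_with_national and all earnings figures are hypothetical, and it assumes the ratio-scaled estimate is the local value at a known percentile multiplied by the national ratio between the missing and known percentiles.

```python
def blend_with_national(
    ratio_scaled: float, national: float, national_weight: float = 0.5
) -> float:
    """Weighted average of a ratio-scaled local estimate and the national value.

    Hypothetical helper for illustration only; the PR's actual implementation
    may differ in structure and naming.
    """
    if not 0.0 <= national_weight <= 1.0:
        raise ValueError("national_weight must be between 0 and 1")
    return (1.0 - national_weight) * ratio_scaled + national_weight * national


# Made-up figures: national p90 and p95, and a wealthy area's known p90.
national_p90 = 60_000.0
national_p95 = 75_000.0
local_p90 = 90_000.0

# Old behaviour: carry the national p95/p90 ratio over to the local p90.
ratio_scaled_p95 = local_p90 * national_p95 / national_p90

# New default: pull the ratio-scaled value halfway toward the national p95.
blended_p95 = blend_with_national(ratio_scaled_p95, national_p95)

print(ratio_scaled_p95)  # 112500.0
print(blended_p95)       # 93750.0
```

With national_weight=0.0 the helper returns the ratio-scaled value unchanged, and with 1.0 it returns the national value, matching the endpoints listed in the summary.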

Test plan

  • 8 unit tests covering: known values preserved, missing values filled, all-NaN unchanged, blending pulls toward national, weight=0 matches old behaviour, weight=1 gives national values, monotonicity, downward extrapolation
  • CI passes (ruff, pytest)
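
For readers who want a feel for the properties listed in the test plan, the sketch below checks several of them against a toy fill function. Everything here is an assumption made for illustration: fill_missing, REFERENCE, and the figures are hypothetical stand-ins, not the PR's fill_missing_percentiles() or real ASHE values, and the toy version only extrapolates upward from the highest known percentile.

```python
import math

# Hypothetical national reference values (not real ASHE figures).
REFERENCE = {90: 60_000.0, 95: 75_000.0, 100: 120_000.0}


def fill_missing(local: dict, national_weight: float = 0.5) -> dict:
    """Toy stand-in: fill NaN percentiles by ratio scaling, then blending."""
    known = {p: v for p, v in local.items() if not math.isnan(v)}
    if not known:
        return dict(local)  # all-NaN input is returned unchanged
    anchor = max(known)  # highest percentile with an observed value
    filled = {}
    for p, v in local.items():
        if not math.isnan(v):
            filled[p] = v  # known values are preserved as-is
            continue
        ratio_scaled = known[anchor] * REFERENCE[p] / REFERENCE[anchor]
        filled[p] = (
            (1.0 - national_weight) * ratio_scaled
            + national_weight * REFERENCE[p]
        )
    return filled


local = {90: 90_000.0, 95: math.nan, 100: math.nan}

default = fill_missing(local)
assert default[90] == 90_000.0                            # known value preserved
assert not any(math.isnan(v) for v in default.values())   # missing values filled

pure_ratio = fill_missing(local, national_weight=0.0)
pure_national = fill_missing(local, national_weight=1.0)
assert pure_ratio[95] == 112_500.0                # weight=0: pure ratio scaling
assert pure_national[95] == REFERENCE[95]         # weight=1: national value
assert pure_national[95] < default[95] < pure_ratio[95]  # blend lies in between

all_nan = fill_missing({90: math.nan, 95: math.nan, 100: math.nan})
assert all(math.isnan(v) for v in all_nan.values())  # all-NaN unchanged

# Monotone for these inputs at the default weight; this toy version does not
# guarantee it in general (e.g. weight=1 with a local p90 above national p95).
assert default[90] <= default[95] <= default[100]
```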

Closes #68

🤖 Generated with Claude Code

Missing ASHE percentiles (91-98, 100) were imputed using ratio scaling
from national reference values alone. This causes instability for areas
with unusual income tails. Now blends the ratio estimate with the
national reference value (50/50 by default) to shrink extreme estimates.

- Extract shared fill_missing_percentiles into earnings_percentiles.py
- Add national_weight parameter (0=old behaviour, 0.5=default blend)
- Remove duplicated code from LA and constituency target scripts
- Add 8 unit tests covering preservation, blending, monotonicity

Closes #68

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Investigate averaging for missing percentiles in local area earnings
