
Add extend_single_year_dataset for fast dataset year projection #7700

Open

anth-volk wants to merge 10 commits into main from add-extend-single-year-dataset

Conversation

@anth-volk
Contributor

@anth-volk anth-volk commented Mar 4, 2026

Fixes #7699

Why this is needed

The API v2 alpha and the policyengine Python package require entity-level Pandas HDFStore datasets (one table per entity: person, household, tax_unit, etc.) to run microsimulations. The current US data pipeline (policyengine-us-data) publishes variable-centric h5py files (variable/year → array), so converting between the two formats currently requires routing every variable through sim.calculate() via create_datasets() — a process that takes over an hour per state and doesn't scale to the 500+ geographic datasets we need to serve.

The UK avoids this entirely: policyengine-uk-data publishes entity-level HDFStore files directly, and policyengine-uk has extend_single_year_dataset() which projects a single base-year dataset to multiple years via simple multiplicative uprating on DataFrames — no simulation engine involved. This PR brings the same capability to the US.

How it works

Dataset schema classes (dataset_schema.py)

USSingleYearDataset holds six entity DataFrames (person, household, tax_unit, spm_unit, family, marital_unit) plus a time_period. It can load from / save to Pandas HDFStore files, and provides .copy() for deep-copying all DataFrames.

USMultiYearDataset wraps a dict[int, USSingleYearDataset] keyed by year. Its .load() returns data in {variable: {year: array}} format (time_period_arrays), which is what policyengine-core's Microsimulation expects for multi-year datasets.
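The shape of the `time_period_arrays` format can be sketched as follows. This is an illustrative stand-in, not the actual class internals — `to_time_period_arrays` is a hypothetical helper showing only the `{variable: {year: array}}` flattening described above:

```python
import numpy as np
import pandas as pd


def to_time_period_arrays(datasets: dict) -> dict:
    """Flatten {year: {entity: DataFrame}} into {variable: {year: array}}."""
    out: dict = {}
    for year, entities in datasets.items():
        for entity_df in entities.values():
            for col in entity_df.columns:
                out.setdefault(col, {})[year] = entity_df[col].to_numpy()
    return out


person_2024 = pd.DataFrame(
    {"age": [30, 45], "employment_income": [40_000.0, 55_000.0]}
)
arrays = to_time_period_arrays({2024: {"person": person_2024}})
# arrays["employment_income"] maps year -> NumPy array, one entry per person
```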

Uprating logic (economic_assumptions.py)

extend_single_year_dataset(dataset, end_year=2035) takes a single base-year dataset and produces a multi-year dataset by:

  1. Copying the base-year DataFrames for each year from base_year through end_year
  2. Applying multiplicative uprating year-over-year: for each variable column, it looks up system.variables[var].uprating to get a dotted parameter path (e.g. "calibration.gov.irs.soi.employment_income"), resolves it against system.parameters, and computes factor = param(current_year) / param(previous_year). The column values are then multiplied by that factor.
  3. Carrying forward variables without an uprating parameter unchanged (e.g. age, entity IDs).

This is the same approach used by policyengine-uk. The uprating mapping is derived entirely from system.variables at runtime — the 62 variables with explicit uprating = "..." and the 108 variables assigned via default_uprating.py are all picked up automatically. No separate list to maintain.
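The year-over-year factor computation can be sketched like this. The `parameter_values` dict stands in for the resolved uprating parameter evaluated per year; the real code resolves a dotted path against `system.parameters`:

```python
import pandas as pd

# Stand-in for a resolved uprating parameter's value by year.
parameter_values = {2024: 1.00, 2025: 1.03, 2026: 1.061}


def uprate_column(values: pd.Series, current_year: int) -> pd.Series:
    # factor = param(current_year) / param(previous_year)
    factor = parameter_values[current_year] / parameter_values[current_year - 1]
    return values * factor


income_2024 = pd.Series([40_000.0, 55_000.0])
income_2025 = uprate_column(income_2024, 2025)  # scaled by 1.03
income_2026 = uprate_column(income_2025, 2026)  # compounds from 2025, not from base
```

Note that chaining from the previous year rather than the base year is what the "year 2 chaining" unit test below verifies.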

Dual-path loading (system.py)

Microsimulation.__init__ now auto-detects dataset format before calling super().__init__():

  • HDFStore format (entity names like person, household as top-level HDF5 keys): loads as USSingleYearDataset, extends via extend_single_year_dataset(), and passes the resulting USMultiYearDataset to policyengine-core.
  • Legacy h5py format (variable names as top-level keys): falls through to the existing CoreMicrosimulation code path, unchanged.

Format detection (_is_hdfstore_format) inspects the top-level HDF5 keys — entity names indicate HDFStore, variable names indicate h5py.
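The detection rule itself (ignoring the HDF5 I/O) amounts to a set intersection on top-level key roots. This sketch assumes a `US_ENTITIES` constant; names are illustrative:

```python
# Entity names assumed to mark the entity-table (HDFStore) layout.
US_ENTITIES = {"person", "household", "tax_unit", "spm_unit", "family", "marital_unit"}


def looks_like_entity_format(top_level_keys: set) -> bool:
    # Keys like "/person/table" reduce to "person"; legacy variable-centric
    # files have keys like "employment_income" instead.
    roots = {key.strip("/").split("/")[0] for key in top_level_keys}
    return bool(roots & US_ENTITIES)


looks_like_entity_format({"/person", "/household"})     # entity layout
looks_like_entity_format({"employment_income", "age"})  # legacy layout
```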

How we verify correctness

Unit tests (22 tests, ~0.3s)

The test suite in tests/microsimulation/data/ uses mock system objects (mock parameters, mock variables) to avoid loading the full tax-benefit system, keeping tests fast and deterministic. Coverage includes:

  • _resolve_parameter (3 tests): valid dotted path, invalid path, partially valid path
  • _apply_single_year_uprating (7 tests): correct multiplicative scaling, non-uprated variables unchanged, household entity uprating, unknown columns passed through, unresolvable uprating path, division-by-zero guard (previous param value = 0), zero base values preserved
  • extend_single_year_dataset (12 tests): correct year count, single-year edge case, default end year (2035), base year values unchanged, year 1 uprating, year 2 chaining (verifies uprating compounds from year N to N+1 to N+2, not from base), non-uprated variable identical across all years, row counts preserved, time_period correctness per year, return type, input dataset immutability, multi-entity uprating (person + household)

Roundtrip validation (policyengine-us-data PR #568)

A separate one-off validation script in -us-data reads an existing h5py state dataset (e.g. NV.h5), converts it to HDFStore using the same splitting logic, and compares all ~183 variables between the two formats. This passed 183/183 on the Nevada dataset.

Depends on

Test plan

  • make test-other passes (runs the 22 unit tests via pytest)
  • Load an HDFStore file via Microsimulation(dataset="path/to/STATE.hdfstore.h5") — verify it loads and extends correctly
  • Load a legacy h5py file via Microsimulation(dataset="path/to/STATE.h5") — verify existing path still works
  • Verify uprated variables (e.g. employment_income) grow year-over-year
  • Verify non-uprated variables (e.g. age) are carried forward unchanged

🤖 Generated with Claude Code

@PavelMakarchuk
Collaborator

PR Review

🔴 Critical (Must Fix)

1. USMultiYearDataset.__init__ uses if/if instead of if/elif — double-processing bug
dataset_schema.py:175-201

If both datasets and file_path are provided, both branches execute and file_path silently overwrites self.datasets. This should be elif. Also, if neither is provided, self.datasets is never set, causing an AttributeError on line 204.
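The requested constructor pattern can be sketched as follows; `MultiYearDatasetSketch` and `_load` are illustrative names, not the actual class internals:

```python
class MultiYearDatasetSketch:
    def __init__(self, datasets=None, file_path=None):
        if datasets is not None and file_path is not None:
            raise ValueError("Pass either datasets or file_path, not both")
        if datasets is not None:
            self.datasets = datasets
        elif file_path is not None:
            # Hypothetical loader standing in for the HDFStore read.
            self.datasets = self._load(file_path)
        else:
            raise ValueError("One of datasets or file_path is required")

    def _load(self, file_path):
        raise NotImplementedError
```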

2. _is_hdfstore_format may not work correctly with actual HDFStore files
system.py:218-239

HDFStore (PyTables) files accessed via h5py expose a different key structure than pd.HDFStore.keys(). Consider using pd.HDFStore directly for detection:

with pd.HDFStore(file_path, mode="r") as store:
    return bool(entity_names & {k.strip("/") for k in store.keys()})

3. No handling of USMultiYearDataset passed directly to Microsimulation
system.py:287-308

The dual-path detection handles str and USSingleYearDataset but not USMultiYearDataset. If a caller passes an already-extended multi-year dataset, it falls through to super().__init__() unhandled.


🟡 Should Address

4. validate_file_path validates with h5py but loads with pd.HDFStore
dataset_schema.py:45-68 vs 84-94 — Using different libraries for validation vs loading could cause mismatches. Use the same library for both.

5. _resolve_dataset_path returns None silently for non-HF, non-existent paths
system.py:199-215 — A typo'd path like "data/staet.h5" returns None, skips HDFStore check, and passes the string to super().__init__() producing a confusing error. Consider raising FileNotFoundError early.

6. Test mocking strategy is fragile
test_extend_single_year_dataset.py:736-760 — Direct sys.modules manipulation is thread-unsafe and can leak state. Use unittest.mock.patch.dict("sys.modules", ...) instead.
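The suggested fix in miniature — `patch.dict` restores `sys.modules` on exit even if the test body raises, avoiding the state leaks manual save/restore can cause:

```python
import sys
from unittest.mock import MagicMock, patch

fake_module = MagicMock()
with patch.dict(sys.modules, {"policyengine_us.system": fake_module}):
    # Inside the context, imports of policyengine_us.system resolve to the mock.
    assert sys.modules["policyengine_us.system"] is fake_module
# On exit, the original sys.modules state is restored automatically.
```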

7. No tests for file I/O paths
The save() / load() / file-based __init__ for both USSingleYearDataset and USMultiYearDataset are untested — these are the paths used in production.

8. USSingleYearDataset.load() may produce duplicate keys across entities
dataset_schema.py:142-147 — If two entities share a column name, the second silently overwrites the first in the returned dict.
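The hazard can be sketched with a hypothetical merge helper — without an explicit check, the last entity to define a column wins silently:

```python
import pandas as pd


def merge_entity_columns(entities: dict) -> dict:
    """Merge entity DataFrames into one flat {column: Series} dict,
    raising instead of silently overwriting on a name collision."""
    merged: dict = {}
    for name, df in entities.items():
        for col in df.columns:
            if col in merged:
                raise ValueError(f"Duplicate column {col!r} (also in {name!r})")
            merged[col] = df[col]
    return merged
```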


🟢 Suggestions

  • Changelog fragment is very long — consider shortening to "Add extend_single_year_dataset for fast multi-year dataset projection"
  • Consider adding __repr__ to dataset classes for easier debugging

Validation Summary

| Check | Result |
|---|---|
| Code Patterns | 3 critical issues |
| Test Coverage | 2 gaps (no file I/O tests, fragile mocking) |
| CI Status | No checks found |
| Architecture | Sound — mirrors policyengine-uk approach |
| Documentation | PR description is excellent |

Recommendation: Address the if/elif bug and HDFStore detection before merge. Core approach is solid.

To auto-fix issues: /fix-pr 7700

@anth-volk force-pushed the add-extend-single-year-dataset branch from 1677593 to 4c98a8e on March 17, 2026 at 18:22
@anth-volk
Contributor Author

Review fixes applied

All 8 review items have been addressed in commit 4c98a8e. Here's what was done and how to verify each:

Critical fixes

1. USMultiYearDataset.__init__ if/if bug (dataset_schema.py)

  • Fix: Changed if/if to if/elif/else. Added explicit guard rejecting both args and raising ValueError when neither is provided.
  • Tests: TestUSMultiYearDatasetInit::test_given_neither_arg_then_raises_value_error, test_given_both_args_then_raises_value_error

2. _is_hdfstore_format uses h5py for PyTables files (system.py)

  • Fix: Replaced h5py.File with pd.HDFStore(file_path, mode="r"). Uses k.strip("/").split("/")[0] to handle both single-year and multi-year key formats.
  • Tests: TestIsHdfstoreFormat::test_entity_level_file_returns_true, test_variable_level_file_returns_false, test_nonexistent_file_returns_false

3. No USMultiYearDataset handling in Microsimulation.__init__ (system.py)

  • Fix: Added elif isinstance(dataset, USMultiYearDataset): pass branch so already-extended datasets are explicitly handled.
  • Verification: Visual inspection — the branch is a no-op passthrough. Full integration testing requires the tax-benefit system to load.

Should-fix items

4. validate_file_path uses h5py but __init__ loads with pd.HDFStore (dataset_schema.py)

  • Fix: Replaced h5py.File validation with pd.HDFStore(file_path, mode="r"). Removed import h5py from the module entirely.
  • Tests: Covered by TestFileIORoundtrips::test_single_year_save_and_load_roundtrip (validate runs during __init__).

5. _resolve_dataset_path returns None silently (system.py)

  • Fix: Changed return None to raise FileNotFoundError(f"Dataset file not found: {dataset_str}").
  • Tests: TestResolveDatasetPath::test_nonexistent_path_raises_file_not_found, test_existing_path_returns_path

6. Test mocking strategy is fragile (test_extend_single_year_dataset.py)

  • Fix: Replaced manual sys.modules save/restore with patch.dict(_sys.modules, ...) context manager.
  • Verification: All 22 original tests still pass — the refactored helper is used by every TestExtendSingleYearDataset test.

7. No tests for file I/O paths (test_extend_single_year_dataset.py)

  • Fix: Added TestFileIORoundtrips class with 3 tests.
  • Tests: test_single_year_save_and_load_roundtrip, test_multi_year_save_and_load_roundtrip, test_multi_year_load_returns_time_period_arrays

8. USSingleYearDataset.load() duplicate keys across entities (dataset_schema.py)

  • Fix: Added duplicate column detection — raises ValueError if a column name appears in multiple entity DataFrames.
  • Tests: TestSingleYearDatasetLoad::test_load_raises_on_duplicate_column_names, test_load_returns_all_entity_columns

Summary

Total new tests added: 12 (34 total, up from 22). All pass in ~2s.

pytest policyengine_us/tests/microsimulation/data/test_extend_single_year_dataset.py -v

@PavelMakarchuk
Collaborator

PR Review (Updated)

Previous review findings were mostly addressed in the "Fix review items" commit. This is a re-review of the current state.

🔴 Critical (Must Fix)

1. Missing tables (pytables) dependency — CI is failing
pyproject.toml — pd.HDFStore requires the tables package. 5 tests fail in CI with ImportError: Import pytables failed. Add tables to dependencies.

2. Bare except Exception in _is_hdfstore_format
system.py — Silently swallows all errors (permission, memory, corruption) and returns False, making the file appear to be legacy format. Narrow to except (OSError, IOError, KeyError, ValueError) or at minimum add debug logging.

3. USSingleYearDataset.__init__ opens HDFStore in write mode
dataset_schema.py:~66 — pd.HDFStore(file_path) defaults to mode='a' (read-write). Should be mode='r' since it's only reading. validate_file_path correctly uses mode='r', but the constructor doesn't. Could fail on read-only filesystems.


🟡 Should Address

4. dataset read from kwargs twice in Microsimulation.__init__
system.py:~254-278 — First read checks for cps_2023, second read does HDFStore detection. If dataset is passed positionally via *args, the HDFStore detection is silently skipped. Consolidate into a single dataset resolution block.

5. _resolve_dataset_path return type inconsistency
system.py — The function raises FileNotFoundError for non-existent paths (never returns None), but the caller checks if local_path is not None. This dead check is misleading.

6. _apply_uprating imports full system at call time
economic_assumptions.py — The deferred from policyengine_us.system import system loads the full tax-benefit system on first call. Consider accepting system as an optional parameter to make the dependency explicit and eliminate the fragile sys.modules patching in tests.
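The suggested injection pattern, sketched with a hypothetical `uprating_factor` API (the real `system` object's interface is not shown here):

```python
def apply_uprating_factor(value: float, variable: str, year: int, system=None) -> float:
    if system is None:
        # Deferred import of the real tax-benefit system (assumed path);
        # only hit when no system is injected.
        from policyengine_us.system import system
    # `uprating_factor` is a hypothetical method standing in for the
    # parameter-tree lookup.
    return value * system.uprating_factor(variable, year)


class FakeSystem:
    """Lightweight stand-in a test can inject instead of patching sys.modules."""

    def uprating_factor(self, variable, year):
        return 1.05


result = apply_uprating_factor(100.0, "employment_income", 2025, system=FakeSystem())
```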

7. Entity constant duplication
dataset_schema.py defines US_ENTITIES; system.py:_is_hdfstore_format redefines the same set inline. Import and reuse US_ENTITIES as single source of truth.

8. No validation for end_year >= start_year
economic_assumptions.py — If end_year < start_year, range() returns empty and only the base year is returned silently. Add a ValueError.
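A minimal sketch of the silent failure and the requested guard, assuming the projection years are built with `range()` as described:

```python
def projection_years(start_year: int, end_year: int) -> list:
    if end_year < start_year:
        raise ValueError(
            f"end_year ({end_year}) must be >= start_year ({start_year})"
        )
    return list(range(start_year, end_year + 1))

# Without the guard, range(start_year + 1, end_year + 1) is simply empty
# when end_year < start_year, and only the base year survives, no error raised.
```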

9. USMultiYearDataset.load() inconsistent duplicate-column handling
USSingleYearDataset.load() raises on duplicate columns across entities, but USMultiYearDataset.load() silently overwrites. Behavior should be consistent.

10. validate() method defined but never called
dataset_schema.py — The NaN validation method is unused. Either integrate into the loading flow or remove.

11. 14 unrelated whitespace-only changes
Removing blank lines after def formula across 14 unrelated files inflates the diff. Consider a separate formatting PR.


🟢 Suggestions

  1. end_year=2035 default is a magic number — extract to a named constant
  2. No integration test with the real tax-benefit system — all 22 tests use mocks
  3. No test for hf:// URL path in _resolve_dataset_path
  4. Changelog fragment is verbose — consider shortening

Validation Summary

| Check | Result |
|---|---|
| CI Status | ❌ FAILING (missing pytables dep, 5 tests) |
| Previous Review Items | ✅ Most fixed |
| Code Patterns | 3 critical, 8 should-address |
| Test Coverage | 22 unit tests, good mocks, missing integration + edge cases |
| Unrelated Changes | 14 whitespace-only files |

Next Steps

To auto-fix issues: /fix-pr 7700

Or address manually and re-request review.

@anth-volk
Contributor Author

Re-review fixes (commit a820388)

All 11 findings from the re-review have been addressed, plus 3 additional issues found during a follow-up code review.

Original review findings

| # | Finding | Resolution |
|---|---|---|
| 1 | Missing tables (pytables) dependency | Added tables>=3.9 to runtime dependencies in pyproject.toml |
| 2 | Bare except Exception in _is_hdfstore_format | Narrowed to except (OSError, IOError, KeyError, ValueError) |
| 3 | USSingleYearDataset.__init__ opens HDFStore in write mode | Changed to mode="r" |
| 4 | dataset read from kwargs twice in Microsimulation.__init__ | Consolidated into a single read |
| 5 | _resolve_dataset_path return type — dead None check | Removed the dead local_path is not None guard |
| 6 | _apply_uprating imports full system at call time | Added system=None parameter to both extend_single_year_dataset and _apply_uprating; tests now pass system directly instead of patching sys.modules |
| 7 | Entity constant duplication | _is_hdfstore_format now imports and reuses US_ENTITIES from dataset_schema.py |
| 8 | No validation for end_year >= start_year | Added ValueError guard + new test |
| 9 | USMultiYearDataset.load() inconsistent duplicate-column handling | Added duplicate-column detection per year, matching USSingleYearDataset.load() behavior + new test |
| 10 | validate() method defined but never called | Removed dead code |
| 11 | Whitespace-only changes in unrelated files | No action — these are from make format (ruff), which is required by project guidelines |

Also extracted DEFAULT_END_YEAR = 2035 constant (green suggestion).

Additional issues found (pre-existing, not regressions)

These three issues were present in the original PR code and were caught by a follow-up code review. They predate the re-review:

| # | Finding | Resolution |
|---|---|---|
| A | USSingleYearDataset.__init__ crashes on files missing optional entities — save() skips empty DataFrames but __init__ unconditionally reads all 6, causing KeyError on roundtrip with minimal datasets | spm_unit, family, marital_unit now fall back to pd.DataFrame() when absent from the HDF5 file |
| B | USMultiYearDataset.__init__ also opens HDFStore in append mode (same issue as finding #3 but in the multi-year class) | Changed to mode="r" |
| C | validate_file_path also has bare except Exception (same issue as finding #2) | Narrowed to except (OSError, IOError, KeyError, ValueError) |

anth-volk and others added 8 commits March 20, 2026 22:12
Adds USSingleYearDataset and USMultiYearDataset schema classes,
extend_single_year_dataset() with multiplicative uprating from the
parameter tree, and dual-path loading in Microsimulation that
auto-detects entity-level HDFStore files and extends them without
routing through the simulation engine.

Legacy h5py files continue to work via the existing code path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
22 tests covering _resolve_parameter, _apply_single_year_uprating,
and end-to-end extend_single_year_dataset. Uses mock system objects
to avoid loading the full tax-benefit system (~0.3s total runtime).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix USMultiYearDataset.__init__ if/if bug (use if/elif/else, reject
  both or neither args)
- Fix validate_file_path to use pd.HDFStore instead of h5py
- Fix USSingleYearDataset.load() to detect duplicate column names
- Fix _is_hdfstore_format to use pd.HDFStore instead of h5py
- Fix _resolve_dataset_path to raise FileNotFoundError instead of
  returning None silently
- Add explicit USMultiYearDataset branch in Microsimulation.__init__
- Refactor test mocking to use patch.dict for thread safety
- Add 12 new tests: init validation, duplicate keys, format detection,
  path resolution, and file I/O roundtrips

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add tables>=3.9 runtime dependency for pd.HDFStore (finding #1)
- Narrow bare except Exception to specific types in _is_hdfstore_format,
  validate_file_path (findings #2, reviewer #3)
- Open HDFStore in mode="r" in USSingleYearDataset and USMultiYearDataset
  constructors (findings #3, reviewer #2)
- Make optional entities (spm_unit, family, marital_unit) fall back to
  empty DataFrame when absent from HDF5 file (reviewer #1)
- Consolidate duplicate kwargs.get("dataset") in Microsimulation.__init__
  and remove dead None check (findings #4, #5)
- Accept system=None in extend_single_year_dataset and _apply_uprating to
  allow direct injection, eliminating sys.modules patching in tests (#6)
- Import and reuse US_ENTITIES instead of inline duplication (#7)
- Add end_year >= start_year validation in extend_single_year_dataset (#8)
- Add duplicate-column detection in USMultiYearDataset.load() (#9)
- Remove unused validate() method (#10)
- Extract DEFAULT_END_YEAR constant (green suggestion)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@anth-volk force-pushed the add-extend-single-year-dataset branch from a820388 to 2022c2a on March 20, 2026 at 21:16
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@PavelMakarchuk
Collaborator

Program Review: PR #7700 — Add extend_single_year_dataset for fast dataset year projection

PR Type

Infrastructure — HDFStore dataset support and multiplicative uprating for year projection

CI Status

  • Quick Feedback (Selective Tests + Coverage): FAILED — transient runner shutdown (The runner has received a shutdown signal), not a test failure. Re-run should fix.
  • All other checks: PASSING

Critical (Must Fix)

  1. USSingleYearDataset.save() uses HDFStore append mode by default (dataset_schema.py:135). Opens with pd.HDFStore(file_path) which defaults to mode="a". If the file already exists with stale data, save will append rather than overwrite, causing data corruption. USMultiYearDataset.save() correctly calls Path(file_path).unlink(missing_ok=True) before writing, but USSingleYearDataset.save() does not. Fix: add mode="w" or unlink() before write.
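The suggested unlink-before-write fix can be sketched as follows; `writer` is a stand-in for the actual HDFStore write step:

```python
import os
import tempfile
from pathlib import Path


def safe_save(file_path: str, writer) -> None:
    # Drop any stale file first (mirroring what USMultiYearDataset.save()
    # is described as doing), so a re-save cannot append to old keys.
    Path(file_path).unlink(missing_ok=True)
    writer(file_path)


path = os.path.join(tempfile.mkdtemp(), "demo.h5")
Path(path).write_text("stale contents")
safe_save(path, lambda p: Path(p).write_text("fresh contents"))
```

Opening with `pd.HDFStore(file_path, mode="w")` would achieve the same truncation without the explicit unlink.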

  2. Zero integration tests for Microsimulation.__init__ dual-path loading (system.py). The most important user-facing code path — where HDFStore format is detected and auto-extended — has no test coverage at all. All four branches (HDFStore string path, legacy h5py string path, USSingleYearDataset object, USMultiYearDataset object) are untested. This is the primary entry point for users and a regression risk.

Should Address

  1. Hard-coded time_period=2024 default (dataset_schema.py:82). Constructor default couples the schema class to a specific year. Consider using a CURRENT_YEAR constant or requiring explicit specification.

  2. Hard-coded DEFAULT_END_YEAR = 2035 (economic_assumptions.py:6). Named constant but not derived from any base year. Consider CURRENT_YEAR + 11 or add a comment explaining the choice of 2035.

  3. Early FileNotFoundError behavior change (system.py, new HDFStore detection block). When dataset is a string pointing to a non-existent local file, the new _resolve_dataset_path raises FileNotFoundError before reaching super().__init__(). Previously, invalid strings would pass through to the parent class. This silent behavior change should be documented or handled with a fallback.

  4. No test for hf:// URL path in _resolve_dataset_path. The HuggingFace download branch is completely untested. Add a mocked test to verify the download flow without network access.

  5. Missing __init__.py for test discovery (policyengine_us/tests/microsimulation/data/). The fixtures subdirectory has __init__.py but the parent data/ directory may not. Verify this exists; if missing, pytest may not discover the new tests.

  6. Validation only checks 3 of 6 entity keys (dataset_schema.py:59). validate_file_path checks person, household, tax_unit but not spm_unit, family, marital_unit. Add a named constant (e.g., REQUIRED_ENTITIES) and a comment clarifying this is intentional.

  7. Lazy import shadows parameter name (economic_assumptions.py:305-306). from policyengine_us.system import system inside _apply_uprating shadows the system function parameter. Rename the import target to avoid confusion (e.g., import system as _default_system).

  8. No backward compatibility test for legacy h5py format. When _is_hdfstore_format() returns False, code should fall through to existing behavior. No test confirms legacy datasets still load correctly through the new code path.

Suggestions

  1. Dead elif pass branch (system.py:538-539). elif isinstance(dataset, USMultiYearDataset): pass is a no-op. Either remove it or add kwargs["dataset"] = dataset for explicitness.

  2. Hard-coded BASE_YEAR = 2024 in test fixtures (economic_assumptions_fixtures.py). Acceptable for deterministic tests but will need updating when the project base year changes. Consider a comment noting this dependency.

  3. Missing edge case tests for USSingleYearDataset. Constructor validation errors (missing required DataFrames, non-.h5 extension, raise_exception=False path) and USMultiYearDataset edge cases (empty datasets list, duplicate years, get_year with nonexistent year) are untested.

  4. No test for negative income values during uprating. Per CLAUDE.md guidance on negative earnings being a known gotcha, a test confirming sign preservation through multiplicative uprating would be valuable.

  5. Cosmetic formatting changes (15 files) are clean auto-formatting from make format / ruff. No issues.

  6. tables>=3.9 dependency adds nontrivial transitive dependencies (blosc2, numexpr, etc.). Verify this is installed in all CI environments; if the selective test runner uses a cached environment, HDFStore imports may fail.

Validation Summary

| Check | Result |
|---|---|
| Code Patterns | 2 critical, 4 should-address, 2 suggestions |
| Test Coverage | 22 unit tests passing; 0 integration tests for primary entry point |
| CI Status | Transient runner failure (not code-related) |
| Security | No issues found |
| Changelog | Present and correctly formatted |
| Formatting | All cosmetic changes consistent with make format |

Review Severity: COMMENT

Rationale: The append-mode bug in USSingleYearDataset.save() is a data corruption risk but is limited to the save path (not the primary read/extend flow). The zero integration test coverage for the Microsimulation.__init__ changes is a significant gap but does not block merge if the author commits to adding them as a follow-up. No hard blockers that would warrant REQUEST_CHANGES, but these issues should not be ignored.

Next Steps

To auto-fix issues: /fix-pr 7700

- Add name, label, file_path properties to USSingleYearDataset and
  USMultiYearDataset for policyengine-core Simulation compatibility
- Fix USSingleYearDataset.save() append-mode bug (unlink before write)
- Extract _REQUIRED_ENTITIES and _DEFAULT_TIME_PERIOD constants
- Fix import shadowing in _apply_uprating (system -> _system)
- Remove dead elif-pass branch, add core-override documentation comment
- Create missing __init__.py files for test discovery
- Add 23 new tests: constructor edge cases, hf:// URL mock, legacy h5py
  compat, Microsimulation dataset routing integration, save regression

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov

codecov bot commented Mar 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 55.83%. Comparing base (2a9fe46) to head (7e4218b).
⚠️ Report is 58 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##              main    #7700       +/-   ##
============================================
- Coverage   100.00%   55.83%   -44.17%     
============================================
  Files            1        7        +6     
  Lines           34      120       +86     
  Branches         0        1        +1     
============================================
+ Hits            34       67       +33     
- Misses           0       53       +53     
| Flag | Coverage | Δ |
|---|---|---|
| unittests | 55.83% <ø> | -44.17% ⬇️ |

Flags with carried forward coverage won't be shown.

@anth-volk
Contributor Author

@PavelMakarchuk made some updates and incorporated some of the review comments. I believe codecov fails like this because I made a couple of minor edits to the US-specific Microsimulation class file, and that file is not fully tested. I would prefer not to add tests there, as they did not exist prior to these changes either, and would require significant mocking.

@PavelMakarchuk
Collaborator

The only remaining thing I worry about is the hard-coded:

BASE_YEAR = 2024
END_YEAR_DEFAULT = 2035
END_YEAR_SHORT = 2026

@anth-volk
Contributor Author

I chose to hard-code these because we have at times explicitly decided to generate up until a particular year, but not afterward. At the very least, I think the start year should be fixed, but if you think the end should just automatically be, e.g., 10 after current, I can adjust. This would mean that on January 1, 2027, we will automatically calculate 2037 uprating.

@PavelMakarchuk
Collaborator

I chose to hard-code these because we have at times explicitly decided to generate up until a particular year, but not afterward. At the very least, I think the start year should be fixed, but if you think the end should just automatically be, e.g., 10 after current, I can adjust. This would mean that on January 1, 2027, we will automatically calculate 2037 uprating.

I think it should be based on our CPI projections as well, but yes, automatic would be great, since I anticipate us forgetting about this by the end of 2026.

@anth-volk
Contributor Author

Can you elaborate on basing it on the CPI projections? Do you have an envisioned method, or do you want me to propose one?

@PavelMakarchuk
Collaborator

Can you elaborate on basing it on the CPI projections? Do you have an envisioned method, or do you want me to propose one?

We have clear CPI projections which we track here - those are updated quarterly and we will need to extend those annually with a clean updating cadence

@anth-volk
Contributor Author

Which of the following are you saying:

  • We need to generally support uprating out to 2100
  • We need USSingleYear datasets out to 2100
  • We need USSingleYear datasets to automatically extend out to whatever year we update the CPI-U to (for now, 2035)
  • Something else?

@PavelMakarchuk
Collaborator

Which of the following are you saying:

  • We need to generally support uprating out to 2100
  • We need USSingleYear datasets out to 2100
  • We need USSingleYear datasets to automatically extend out to whatever year we update the CPI-U to (for now, 2035)
  • Something else?

We need USSingleYear datasets to automatically extend out to whatever year we update the CPI-U to (for now, 2035)
