comp-physics/cfd-nn

NN-CFD: Neural Network Turbulence Closures for Incompressible Flow

A high-performance C++ solver for incompressible turbulent flow with pluggable turbulence closures ranging from classical algebraic models to advanced transport equations and data-driven neural networks. Features a fractional-step projection method with multiple Poisson solvers, pure C++ NN inference, and comprehensive GPU acceleration via OpenMP target offload.

Features

  • Fractional-step projection method for incompressible Navier-Stokes
    • Explicit time integration (Euler, RK2, RK3) with adaptive CFL-based time stepping
    • Directional CFL constraints for stretched grids
    • Multiple Poisson solvers with automatic selection (FFT, Multigrid, HYPRE)
    • Pressure projection for divergence-free velocity
  • Staggered MAC grid with second-order central finite differences
  • DNS capability with trip forcing, velocity filter, and turbulence diagnostics
  • Recycling inflow BC for spatially-developing turbulent flows (Lund et al. 1998)
  • 10 turbulence closures: algebraic, transport, EARSM, and neural network models
  • Pure C++ NN inference - no Python/TensorFlow at runtime
  • GPU acceleration via OpenMP target directives for NVIDIA and AMD GPUs

Quick Start

Build the Solver

mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j4

Run Examples

# Laminar channel flow (Poiseuille, analytical validation)
./channel --Nx 32 --Ny 64 --nu 0.01 --adaptive_dt --max_steps 10000

# Turbulent channel with SST k-omega
./channel --Nx 64 --Ny 128 --Re 5000 --model sst --adaptive_dt

# Neural network turbulence model
./channel --model nn_tbnn --nn_preset tbnn_channel_caseholdout --adaptive_dt

# 3D Taylor-Green vortex
./taylor_green_3d --Nx 64 --Ny 64 --Nz 64 --Re 100 --max_steps 1000

Governing Equations

The solver implements the incompressible Reynolds-Averaged Navier-Stokes (RANS) equations:

Momentum Equation

$$\frac{\partial \bar{u}_i}{\partial t} + \bar{u}_j \frac{\partial \bar{u}_i}{\partial x_j} = -\frac{1}{\rho} \frac{\partial \bar{p}}{\partial x_i} + \frac{\partial}{\partial x_j}\left[(\nu + \nu_t) \frac{\partial \bar{u}_i}{\partial x_j}\right] + f_i$$

Continuity Equation (Incompressibility)

$$\frac{\partial \bar{u}_i}{\partial x_i} = 0$$

Variables:

Symbol Description
$\bar{u}_i$ Mean velocity components (u, v, w)
$\bar{p}$ Mean pressure
$\nu$ Kinematic viscosity
$\nu_t$ Turbulent eddy viscosity (from closure model)
$f_i$ Body force (e.g., pressure gradient driving force)
$\rho$ Density (constant for incompressible flow)

Numerical Methods

Fractional-Step Projection Method

The solver uses a three-step fractional-step method (Chorin 1968) to decouple pressure and velocity:

Step 1: Provisional Velocity (explicit momentum without pressure)

$$\frac{\mathbf{u}^* - \mathbf{u}^n}{\Delta t} = -(\mathbf{u}^n \cdot \nabla)\mathbf{u}^n + \nabla \cdot [(\nu + \nu_t) \nabla \mathbf{u}^n] + \mathbf{f}$$

Step 2: Pressure Poisson Equation (enforce incompressibility)

$$\nabla^2 p' = \frac{1}{\Delta t} \nabla \cdot \mathbf{u}^*$$

Step 3: Velocity Correction (project onto divergence-free space)

$$\mathbf{u}^{n+1} = \mathbf{u}^* - \Delta t \nabla p'$$

This yields $\nabla \cdot \mathbf{u}^{n+1} = 0$ up to the accuracy of the Poisson solve (machine precision when the Poisson equation is solved exactly).
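
A short derivation of this property, using only the three steps above: taking the divergence of Step 3 and substituting Step 2 gives

$$\nabla \cdot \mathbf{u}^{n+1} = \nabla \cdot \mathbf{u}^* - \Delta t \, \nabla^2 p' = \nabla \cdot \mathbf{u}^* - \Delta t \cdot \frac{1}{\Delta t} \nabla \cdot \mathbf{u}^* = 0$$

In the discrete setting the same cancellation relies on the discrete Laplacian being the composition of the discrete divergence and gradient operators.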

Spatial Discretization

All spatial derivatives use second-order central finite differences on a staggered Marker-and-Cell (MAC) grid:

Variable Grid Location
u-velocity x-faces (staggered in x)
v-velocity y-faces (staggered in y)
w-velocity z-faces (staggered in z)
Pressure, scalars Cell centers

Gradient (central difference):

$$\left.\frac{\partial u}{\partial x}\right|_{i,j} = \frac{u_{i+1,j} - u_{i-1,j}}{2\Delta x}$$

Laplacian (5-point stencil in 2D, 7-point in 3D):

$$\left.\nabla^2 u\right|_{i,j} = \frac{u_{i+1,j} - 2u_{i,j} + u_{i-1,j}}{\Delta x^2} + \frac{u_{i,j+1} - 2u_{i,j} + u_{i,j-1}}{\Delta y^2}$$

Convective Schemes (selectable via convective_scheme config):

  • Central differences (default): Second-order accurate, lower numerical dissipation
  • First-order upwind: More stable at high Reynolds numbers, increased numerical dissipation

Variable Viscosity Diffusion:

$$\nabla \cdot [(\nu + \nu_t) \nabla u] = \frac{1}{\Delta x^2}\left[\nu_{e}(u_{i+1,j} - u_{i,j}) - \nu_{w}(u_{i,j} - u_{i-1,j})\right] + \ldots$$

where $\nu_e, \nu_w$ are face-averaged effective viscosities.
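
The variable-viscosity stencil can be sketched in one dimension as follows. This is a minimal illustration, not the solver's implementation: the function name `diffusion_x` is hypothetical, and the arithmetic face average is an assumption (the source only says the face values are "face-averaged").

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// x-direction part of div[(nu + nu_t) grad u] on a uniform 1D grid.
// nu_eff holds the cell-centered effective viscosity nu + nu_t.
// Face values nu_e, nu_w are arithmetic averages of the two adjacent cells
// (an assumption; the solver may use a different face average).
std::vector<double> diffusion_x(const std::vector<double>& u,
                                const std::vector<double>& nu_eff,
                                double dx) {
    assert(u.size() == nu_eff.size() && u.size() >= 3 && dx > 0.0);
    std::vector<double> out(u.size(), 0.0);  // boundary entries are left at zero
    for (std::size_t i = 1; i + 1 < u.size(); ++i) {
        const double nu_e = 0.5 * (nu_eff[i] + nu_eff[i + 1]);  // east face
        const double nu_w = 0.5 * (nu_eff[i] + nu_eff[i - 1]);  // west face
        out[i] = (nu_e * (u[i + 1] - u[i]) - nu_w * (u[i] - u[i - 1])) / (dx * dx);
    }
    return out;
}
```

With constant viscosity the expression reduces to the standard three-point Laplacian scaled by the viscosity, which makes a convenient sanity check.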

Time Integration

Explicit integrators (Euler, RK2, RK3) with adaptive time stepping.

Directional CFL

For stretched grids (common in wall-bounded DNS), the solver supports separate CFL numbers for different directions to avoid unnecessarily small time steps:

$$\Delta t = \text{dt\_safety} \cdot \min\left(\frac{\text{CFL}_{xz} \cdot \Delta x}{|u|_{\max}},\; \frac{\text{CFL}_{\max}}{(|v|/\Delta y)_{\max}},\; \frac{\text{CFL}_{xz} \cdot \Delta z}{|w|_{\max}},\; \Delta t_{\text{diff}}\right)$$

Parameter Default Purpose
CFL_max 0.5 CFL for y-direction (strict on stretched grids)
CFL_xz -1 CFL for x/z directions (-1 = use CFL_max)
dt_safety 1.0 Safety multiplier on computed dt

The y-direction uses CFL_max with per-cell $|v|/\Delta y$ to account for variable grid spacing. For DNS, typical values are CFL_max = 0.15, CFL_xz = 0.30, dt_safety = 0.85.
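
The time-step formula above can be sketched as a small function. The struct and function names are hypothetical, the inputs are assumed to be grid-wide reductions computed elsewhere, and the guard against zero velocity is an addition for the sketch.

```cpp
#include <algorithm>
#include <cassert>

// Reductions over the grid; all names here are illustrative.
struct CflInputs {
    double u_max, w_max;   // max |u| and max |w|
    double v_over_dy_max;  // max over cells of |v| / dy
    double dx, dz;         // uniform spacings in x and z
    double dt_diff;        // diffusive time-step limit
};

double directional_dt(const CflInputs& in, double cfl_max, double cfl_xz,
                      double dt_safety) {
    assert(cfl_max > 0.0 && dt_safety > 0.0);
    if (cfl_xz < 0.0) cfl_xz = cfl_max;  // -1 means "use CFL_max"
    const double tiny = 1e-30;           // avoids division by zero for a fluid at rest
    const double dt_x = cfl_xz * in.dx / std::max(in.u_max, tiny);
    const double dt_y = cfl_max / std::max(in.v_over_dy_max, tiny);
    const double dt_z = cfl_xz * in.dz / std::max(in.w_max, tiny);
    return dt_safety * std::min({dt_x, dt_y, dt_z, in.dt_diff});
}
```

With the DNS values quoted above (CFL_max = 0.15, CFL_xz = 0.30, dt_safety = 0.85), the wall-normal limit typically controls the step when the near-wall spacing is much smaller than the streamwise spacing.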

See docs/DNS_CHANNEL_GUIDE.md for details on directional CFL tuning.

  • Steady-state convergence: Iterate until $\|\mathbf{u}^{n+1} - \mathbf{u}^n\|_\infty < \text{tol}$

Boundary Conditions

Velocity Boundary Conditions

Type Description Implementation
Periodic Flow wraps around at boundaries Ghost cells copy from opposite boundary
No-slip (Wall) Zero velocity at solid surfaces $u = v = w = 0$ at wall
Inflow Prescribed velocity profile User-defined function callbacks
Outflow Convective/zero-gradient outflow Extrapolation from interior
Recycling Inflow Turbulent inflow from downstream recycle plane Lund et al. (1998) with fringe blending

Recycling Inflow: Generates realistic turbulent inlet data by extracting, shifting, and blending a downstream velocity plane back to the inlet. Includes mass flux correction and divergence correction for clean pressure solves. See docs/RECYCLING_INFLOW_GUIDE.md for details.

Pressure (Poisson) Boundary Conditions

The pressure Poisson equation supports three BC types:

Type Description Formula
Periodic Pressure wraps around $p(\text{ghost}) = p(\text{periodic partner})$
Neumann Zero normal gradient $\partial p / \partial n = 0 \Rightarrow p(\text{ghost}) = p(\text{interior})$
Dirichlet Fixed pressure value $p(\text{ghost}) = 2 p_{\text{bc}} - p(\text{interior})$
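
The three ghost-cell formulas in the table can be sketched as a single function. The enum and function names are hypothetical and chosen for illustration only.

```cpp
#include <cassert>

enum class PressureBC { Periodic, Neumann, Dirichlet };

// Ghost value for one boundary cell. `interior` is the first cell inside the
// boundary, `partner` the matching cell on the opposite side (periodic only),
// and `p_bc` the prescribed boundary value (Dirichlet only).
double ghost_pressure(PressureBC bc, double interior, double partner, double p_bc) {
    switch (bc) {
        case PressureBC::Periodic:  return partner;
        case PressureBC::Neumann:   return interior;               // dp/dn = 0
        case PressureBC::Dirichlet: return 2.0 * p_bc - interior;  // face average equals p_bc
    }
    assert(false && "unknown boundary type");
    return 0.0;
}
```

The Dirichlet formula places the prescribed value on the cell face: the average of the ghost and interior values equals p_bc.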

Standard BC Configurations:

Configuration x-direction y-direction z-direction Use Case
channel() Periodic Neumann Periodic Channel flow
duct() Periodic Neumann Neumann Square duct
cavity() Neumann Neumann Neumann Lid-driven cavity
all_periodic() Periodic Periodic Periodic Periodic box

Gauge Fixing

For problems with all Neumann or periodic pressure boundaries (no Dirichlet BC), the pressure is determined only up to an additive constant. The solver automatically:

  1. Detects this condition via has_nullspace() check
  2. Subtracts the mean pressure after each solve to fix the gauge
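
A minimal sketch of the mean-subtraction step, assuming equal cell volumes (on a stretched grid a volume-weighted mean would be the natural choice). The function name `subtract_mean` is hypothetical.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Remove the mean so that the pressure field has zero average.
// Assumes equal cell volumes; this is a simplification for the sketch.
void subtract_mean(std::vector<double>& p) {
    assert(!p.empty());
    const double mean =
        std::accumulate(p.begin(), p.end(), 0.0) / static_cast<double>(p.size());
    for (double& v : p) v -= mean;
}
```

Because only the pressure gradient enters the velocity correction, shifting the pressure by a constant does not change the projected velocity.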

Poisson Solvers

The solver provides 6 Poisson solver options with automatic selection based on grid configuration and boundary conditions:

Automatic Solver Selection Priority

FFT (3D) → FFT2D (2D) → FFT1D (3D partial-periodic) → HYPRE → Multigrid

Available Solvers

| Solver | Complexity | Best For | Requirements |
|--------|------------|----------|--------------|
| FFT | O(N log N) | 3D channel flows | Periodic x AND z, uniform grid |
| FFT2D | O(N log N) | 2D channel flows | 2D mesh, periodic x |
| FFT1D | O(N log N) + 2D solve | 3D duct flows | Periodic x OR z (one only) |
| HYPRE PFMG | O(N) | Stretched grids, GPU | USE_HYPRE build flag |
| Multigrid | O(N) | General fallback, stretched grids | Always available |
| SOR | O(N²) | Testing/debugging | Always available |

Geometric Multigrid (V-Cycle)

The native multigrid solver (the always-available fallback) implements a geometric multigrid V-cycle:

  1. Pre-smooth: Apply smoothing iterations on fine grid (Chebyshev or Jacobi)
  2. Restrict: Compute residual and transfer to coarse grid (full weighting)
  3. Recurse: Solve on coarse grid (recursively)
  4. Prolongate: Interpolate correction back to fine grid (bilinear)
  5. Post-smooth: Apply smoothing iterations

Features:

  • O(N) complexity (optimal)
  • 5-15 V-cycles to convergence (vs 1000-10000 SOR iterations)
  • Semi-coarsening for stretched y-grids: coarsens x/z only, uses y-line Thomas smoother
  • PCG coarse solver with breakdown restart and convergence check throttling
  • CUDA Graph optimization: Entire V-cycle captured as single GPU graph (NVHPC compilers)
  • Chebyshev polynomial smoother with Gershgorin eigenvalue bounds

See docs/POISSON_SOLVER_GUIDE.md for the full guide including semi-coarsening details.

Convergence Criteria (any triggers exit):

  • tol_rhs: RHS-relative $\|r\|/\|b\| < \epsilon$ (recommended for projection)
  • tol_rel: Initial-residual relative $\|r\|/\|r_0\| < \epsilon$
  • tol_abs: Absolute $\|r\|_\infty < \epsilon$

FFT-Based Solvers

For problems with periodic boundaries, FFT solvers provide spectral accuracy:

  • FFT (3D): 2D FFT in x-z + batched tridiagonal solve in y (cuSPARSE)
  • FFT2D: 1D FFT in x + batched tridiagonal in y
  • FFT1D: 1D FFT in periodic direction + 2D Helmholtz solve per mode
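
After the Fourier transform in the periodic directions, each mode leaves an independent tridiagonal system in y. The solver delegates these to a batched GPU routine; the sketch below shows the underlying Thomas algorithm for a single system on the CPU. The function name and storage layout are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for i = 0..n-1
// (a[0] and c[n-1] are ignored). No pivoting: assumes a (weakly) diagonally
// dominant system, as arises from the discrete Laplacian.
std::vector<double> thomas_solve(const std::vector<double>& a,
                                 const std::vector<double>& b,
                                 std::vector<double> c,
                                 std::vector<double> d) {
    const std::size_t n = b.size();
    assert(n > 0 && a.size() == n && c.size() == n && d.size() == n);
    c[0] /= b[0];
    d[0] /= b[0];
    for (std::size_t i = 1; i < n; ++i) {  // forward elimination
        const double m = b[i] - a[i] * c[i - 1];
        c[i] /= m;
        d[i] = (d[i] - a[i] * d[i - 1]) / m;
    }
    for (std::size_t i = n - 1; i-- > 0;)  // back substitution
        d[i] -= c[i] * d[i + 1];
    return d;  // d now holds the solution x
}
```

The cost is linear in the number of unknowns per system, so the FFT dominates the overall O(N log N) complexity.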

HYPRE PFMG

GPU-accelerated parallel semicoarsening multigrid from HYPRE:

  • Supports uniform AND stretched grids
  • Entire solve runs on GPU via CUDA backend
  • Automatic download and build via CMake FetchContent

See docs/POISSON_SOLVER_GUIDE.md for detailed documentation.


Turbulence Closures

The solver supports 10 turbulence closure options:

Summary Table

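| Model | Type | Equations | Anisotropic | GPU |
|-------|------|-----------|-------------|-----|
| none | Direct | 0 | N/A | Yes |
| baseline | Algebraic | 0 | No | Yes |
| gep | Algebraic | 0 | No | Yes |
| komega | Transport | 2 (k, ω) | No | Yes |
| sst | Transport | 2 (k, ω) | No | Yes |
| earsm_wj | EARSM | 2 (k, ω) | Yes | Yes |
| earsm_gs | EARSM | 2 (k, ω) | Yes | Yes |
| earsm_pope | EARSM | 2 (k, ω) | Yes | Yes |
| nn_mlp | Neural Net | 0 | No | Yes |
| nn_tbnn | Neural Net | 0 | Yes | Yes |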

Algebraic Models (Zero-Equation)

1. Mixing Length Model (baseline)

Classical model with van Driest wall damping:

$$\nu_t = (\kappa y)^2 |\mathbf{S}| \left(1 - e^{-y^+/A^+}\right)^2$$

  • $\kappa = 0.41$ (von Kármán constant)
  • $A^+ \approx 26$ (van Driest damping constant)
  • $|\mathbf{S}| = \sqrt{2S_{ij}S_{ij}}$ (strain rate magnitude)
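
The mixing-length formula can be evaluated pointwise as in the sketch below. The function name and argument layout are hypothetical; the constants default to the values listed above.

```cpp
#include <cassert>
#include <cmath>

// nu_t = (kappa*y)^2 * |S| * (1 - exp(-y+/A+))^2
// y: wall distance, y_plus: wall distance in viscous units, S_mag: |S|.
double mixing_length_nu_t(double y, double y_plus, double S_mag,
                          double kappa = 0.41, double A_plus = 26.0) {
    assert(y >= 0.0 && y_plus >= 0.0 && S_mag >= 0.0);
    const double damping = 1.0 - std::exp(-y_plus / A_plus);  // van Driest factor
    const double l_mix = kappa * y;                           // undamped mixing length
    return l_mix * l_mix * S_mag * damping * damping;
}
```

The damping factor drives the eddy viscosity to zero at the wall and approaches one far from it, where the model reduces to the undamped Prandtl mixing length.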

2. GEP Model (gep)

Symbolic regression formula discovered by genetic algorithms (Weatheritt & Sandberg 2016):

$$\nu_t = f_{\text{GEP}}(S_{ij}, \Omega_{ij}, y, \text{Re}_\tau, \ldots)$$

Transport Equation Models (Two-Equation)

3. SST k-ω (sst)

Menter's Shear Stress Transport model (1994):

k-equation: $$\frac{\partial k}{\partial t} + \bar{u}_j \frac{\partial k}{\partial x_j} = P_k - \beta^* k \omega + \nabla \cdot [(\nu + \sigma_k \nu_t) \nabla k]$$

ω-equation (with cross-diffusion): $$\frac{\partial \omega}{\partial t} + \bar{u}_j \frac{\partial \omega}{\partial x_j} = \alpha \frac{\omega}{k} P_k - \beta \omega^2 + \nabla \cdot [(\nu + \sigma_\omega \nu_t) \nabla \omega] + CD_\omega$$

Eddy viscosity: $$\nu_t = \frac{a_1 k}{\max(a_1 \omega, S F_2)}$$

  • Blending functions F₁, F₂ for k-ε/k-ω transition
  • Production limiter for numerical stability
  • Wall boundary conditions: k = 0, ω = ω_wall(y)

4. Standard k-ω (komega)

Wilcox (1988) formulation without blending:

$$\nu_t = \frac{k}{\omega}$$

EARSM Models (Explicit Algebraic Reynolds Stress)

EARSM models predict the full Reynolds stress anisotropy tensor using a tensor basis expansion:

$$b_{ij} = \sum_{n=1}^{N} G_n(\eta, \xi) \, T_{ij}^{(n)}(\mathbf{S}, \mathbf{\Omega})$$

where:

  • $b_{ij}$ = anisotropy tensor (traceless)
  • $T_{ij}^{(n)}$ = integrity basis tensors
  • $G_n$ = scalar coefficient functions
  • $\eta = Sk/\epsilon$, $\xi = \Omega k/\epsilon$ = normalized invariants

Combined with SST k-ω transport for k and ω evolution.

5. Wallin-Johansson EARSM (earsm_wj)

Most sophisticated variant with cubic implicit equation for realizability.

6. Gatski-Speziale EARSM (earsm_gs)

Quadratic model without implicit solve.

7. Pope Quadratic EARSM (earsm_pope)

Classical weak-equilibrium model using first 3 basis tensors.

Re_t-Based Blending:

EARSM models use smooth blending between linear Boussinesq (laminar) and full nonlinear (turbulent):

$$\alpha(\text{Re}_t) = \frac{1}{2}\left(1 + \tanh\left(\frac{\text{Re}_t - \text{Re}_{t,\text{center}}}{\text{Re}_{t,\text{width}}}\right)\right)$$

where $\text{Re}_t = k/(\nu\omega)$. Default transition: center at Re_t = 10, width = 5.
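
A sketch of the blending weight, using the default center and width quoted above. The function names are hypothetical, and the way the weight combines the linear and nonlinear contributions in `blended` is an assumption for illustration.

```cpp
#include <cassert>
#include <cmath>

// alpha = 0 -> linear Boussinesq, alpha = 1 -> full nonlinear EARSM.
double earsm_blend(double k, double omega, double nu,
                   double re_t_center = 10.0, double re_t_width = 5.0) {
    assert(omega > 0.0 && nu > 0.0 && re_t_width > 0.0);
    const double re_t = k / (nu * omega);  // turbulent Reynolds number
    return 0.5 * (1.0 + std::tanh((re_t - re_t_center) / re_t_width));
}

// Hypothetical use: blend one anisotropy component between the two limits.
double blended(double b_linear, double b_nonlinear, double alpha) {
    return (1.0 - alpha) * b_linear + alpha * b_nonlinear;
}
```

At Re_t equal to the center value the weight is exactly 0.5, and it saturates toward 0 or 1 a few widths away on either side.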

Neural Network Models

8. MLP (nn_mlp)

Multi-layer perceptron for scalar eddy viscosity:

$$\nu_t = \text{NN}_{\text{MLP}}(\lambda_1, \ldots, \lambda_5, y/\delta)$$

Inputs (invariants of strain and rotation tensors):

  • $\lambda_1 = S_{ij}S_{ij}$, $\lambda_2 = \Omega_{ij}\Omega_{ij}$, $\lambda_3 = S_{ij}S_{jk}S_{ki}$, ...
  • $y/\delta$ = normalized wall distance

Architecture: 6 → 32 → 32 → 1 (ReLU activations)
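
Inference for a network of this shape needs only dense matrix-vector products and ReLU, which is why it can run without a Python runtime. The sketch below is an illustration under assumptions: the `Dense` struct, the row-major weight layout, and the linear output layer are not taken from the solver's source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One dense layer: W is row-major with shape (n_out x n_in), b has n_out entries.
struct Dense {
    std::size_t n_in, n_out;
    std::vector<double> W, b;
};

std::vector<double> dense_forward(const Dense& layer, const std::vector<double>& x,
                                  bool relu) {
    assert(x.size() == layer.n_in);
    assert(layer.W.size() == layer.n_in * layer.n_out && layer.b.size() == layer.n_out);
    std::vector<double> y(layer.n_out);
    for (std::size_t o = 0; o < layer.n_out; ++o) {
        double s = layer.b[o];
        for (std::size_t i = 0; i < layer.n_in; ++i)
            s += layer.W[o * layer.n_in + i] * x[i];
        y[o] = (relu && s < 0.0) ? 0.0 : s;
    }
    return y;
}

// Hidden layers use ReLU; the final layer is left linear (an assumption).
std::vector<double> mlp_forward(const std::vector<Dense>& layers, std::vector<double> x) {
    for (std::size_t l = 0; l < layers.size(); ++l)
        x = dense_forward(layers[l], x, l + 1 < layers.size());
    return x;
}
```

For the 6 → 32 → 32 → 1 architecture, `layers` would hold three `Dense` entries and `x` the six scaled input features.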

9. TBNN (nn_tbnn)

Tensor Basis Neural Network (Ling et al. 2016) for anisotropic Reynolds stresses:

$$b_{ij} = \sum_{n=1}^{10} g_n(\lambda_1, \ldots, \lambda_5) \, T_{ij}^{(n)}$$

Architecture: 5 → 64 → 64 → 64 → 10 (outputs one coefficient per basis tensor)

Key Properties:

  • Frame invariance: Guaranteed by using invariant inputs + tensor basis
  • Realizability: Enforced during training
  • Anisotropy: Captures different normal stresses and off-diagonal components

Supported Flow Configurations

2D Channel Flow

Pressure-driven flow between parallel plates.

Configuration BCs Use Case
Poiseuille (laminar) Periodic x, walls y Analytical validation
Turbulent RANS Periodic x, walls y Model comparison

Analytical solution (Poiseuille), with $H = 2\delta$ the full channel height, $y$ measured from the centerline, and $\rho = 1$: $$u(y) = -\frac{1}{2\nu}\frac{dp}{dx}(H^2/4 - y^2)$$

3D Square Duct Flow

Pressure-driven flow in square cross-section.

Configuration BCs Use Case
Laminar duct Periodic x, walls y and z 3D solver validation
Turbulent duct Periodic x, walls y and z Secondary flow study

3D DNS Channel Flow

Direct Numerical Simulation of turbulent channel flow without any turbulence model.

Configuration BCs Use Case
DNS Re_tau=180 Periodic x/z, walls y Turbulence benchmark (MKM 1999)

Requires trip forcing for transition, velocity filter for stability, directional CFL for stretched grids. See docs/DNS_CHANNEL_GUIDE.md.

Spatially-Developing Channel (Recycling Inflow)

Channel flow with turbulent inflow generated by recycling from a downstream plane.

Configuration BCs Use Case
Recycling inflow Inflow x_lo, outflow x_hi, walls y, periodic z Spatially-developing turbulence

See docs/RECYCLING_INFLOW_GUIDE.md.

3D Taylor-Green Vortex

Classic benchmark for unsteady flow and energy decay.

Configuration BCs Use Case
Taylor-Green All periodic DNS verification, energy decay

Initial condition: $$u = \sin(x)\cos(y)\cos(z), \quad v = -\cos(x)\sin(y)\cos(z), \quad w = 0$$

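Energy decay (low Re, viscous limit): $$KE(t) = KE(0) \cdot e^{-2\nu |\mathbf{k}|^2 t}$$ where $|\mathbf{k}|^2$ is the squared wavenumber magnitude of the initial mode ($|\mathbf{k}|^2 = 3$ for the initial condition above).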
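
The initial condition above is divergence-free by construction, which can be checked numerically. The sketch below evaluates the field at a point and estimates its divergence with central differences; the type and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double u, v, w; };

// Taylor-Green initial velocity at (x, y, z).
Vec3 taylor_green(double x, double y, double z) {
    return { std::sin(x) * std::cos(y) * std::cos(z),
            -std::cos(x) * std::sin(y) * std::cos(z),
             0.0 };
}

// Central-difference divergence at a point, used only to sanity-check the field.
double divergence(double x, double y, double z, double h) {
    assert(h > 0.0);
    const double du = taylor_green(x + h, y, z).u - taylor_green(x - h, y, z).u;
    const double dv = taylor_green(x, y + h, z).v - taylor_green(x, y - h, z).v;
    const double dw = taylor_green(x, y, z + h).w - taylor_green(x, y, z - h).w;
    return (du + dv + dw) / (2.0 * h);
}
```

The x and y contributions cancel term by term, so the estimate vanishes up to floating-point rounding for any step size `h`.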


Configuration Reference

All parameters can be set via command-line arguments (--param value) or config file (key-value pairs). Command-line arguments override config file values.

Domain and Mesh

Parameter CLI Default Description
Nx --Nx 64 Grid cells in x-direction
Ny --Ny 64 Grid cells in y-direction
Nz --Nz 1 Grid cells in z-direction (1 = 2D simulation)
x_min - 0.0 Domain minimum in x
x_max - Domain maximum in x
y_min - -1.0 Domain minimum in y
y_max - 1.0 Domain maximum in y
z_min --z_min 0.0 Domain minimum in z
z_max --z_max 1.0 Domain maximum in z
stretch_y --stretch false Enable tanh stretching in y (clusters points near walls)
stretch_beta - 2.0 Y-stretching parameter (higher = more clustering)
stretch_z --stretch_z false Enable tanh stretching in z (3D only)
stretch_beta_z --stretch_beta_z 2.0 Z-stretching parameter

Physics Parameters

Parameter CLI Default Description
Re --Re 1000.0 Reynolds number
nu --nu 0.001 Kinematic viscosity
dp_dx --dp_dx -1.0 Pressure gradient (body force driving flow)
rho - 1.0 Density (constant for incompressible)

Auto-Computation of Physics Parameters

The solver uses the relationship: $\text{Re} = \frac{-dp/dx \cdot \delta^3}{3\nu^2}$ where $\delta$ is the channel half-height.

You should specify only TWO of (Re, nu, dp_dx). The third is computed automatically:

| Specified | Computed | Use Case |
|-----------|----------|----------|
| --Re only | nu (using default dp_dx=-1) | Quick setup at desired Re |
| --Re --nu | dp_dx | Control both Re and viscosity |
| --Re --dp_dx | nu | Control Re and driving force |
| --nu --dp_dx | Re | Specify physical parameters directly |
| None | Re (from defaults) | Uses nu=0.001, dp_dx=-1.0 → Re≈1000 |

If all three are specified, the solver checks consistency and errors if they don't match (within 1% tolerance).
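
The relation between Re, nu, and dp_dx can be inverted for whichever quantity is missing. The sketch below follows the formula stated above; the function names are hypothetical, and measuring the 1% tolerance relative to Re is an assumption about how the consistency check is defined.

```cpp
#include <cassert>
#include <cmath>

// Re = -dp_dx * delta^3 / (3 * nu^2), solved for each unknown in turn.
double re_from(double nu, double dp_dx, double delta = 1.0) {
    assert(nu > 0.0 && dp_dx < 0.0);
    return -dp_dx * delta * delta * delta / (3.0 * nu * nu);
}

double nu_from(double Re, double dp_dx, double delta = 1.0) {
    assert(Re > 0.0 && dp_dx < 0.0);
    return std::sqrt(-dp_dx * delta * delta * delta / (3.0 * Re));
}

double dp_dx_from(double Re, double nu, double delta = 1.0) {
    assert(Re > 0.0 && nu > 0.0);
    return -3.0 * Re * nu * nu / (delta * delta * delta);
}

// Consistency check when all three are given (1% tolerance, taken relative to Re).
bool consistent(double Re, double nu, double dp_dx, double delta = 1.0) {
    return std::fabs(re_from(nu, dp_dx, delta) - Re) <= 0.01 * Re;
}
```

For example, `nu_from(5000.0, -1.0)` gives the viscosity that produces Re = 5000 with the default driving pressure gradient and unit half-height.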

Time Stepping

Parameter CLI Default Description
dt --dt 0.001 Time step size (when not using adaptive)
adaptive_dt --adaptive_dt true Enable CFL-based adaptive time stepping
CFL_max --CFL 0.5 Maximum CFL number for adaptive dt (used for y-direction)
CFL_xz - -1.0 CFL for x/z directions (-1 = use CFL_max). Set higher than CFL_max for stretched grids
dt_safety - 1.0 Safety multiplier on computed dt (0.5-1.0). Provides headroom for within-step CFL growth
time_integrator - euler Time integrator: euler, rk2, or rk3
max_steps --max_steps 10000 Maximum iterations (steady) or time steps (unsteady)
T_final - -1.0 Final simulation time (-1 = use max_steps instead)
tol --tol 1e-6 Convergence tolerance for steady-state

Directional CFL: When CFL_xz is set, the x/z directions use CFL_xz while y uses the stricter CFL_max. This is essential for stretched grids where dy_min << dx. See the Directional CFL section under Time Integration.

Simulation Mode

Parameter CLI Default Description
simulation_mode --simulation_mode steady steady or unsteady
perturbation_amplitude --perturbation_amplitude 0.01 Initial perturbation amplitude for DNS
  • Steady mode: Iterates until $\|\mathbf{u}^{n+1} - \mathbf{u}^n\|_\infty < \text{tol}$ or max_steps is reached
  • Unsteady mode: Runs exactly max_steps time steps (or until T_final)

Numerical Schemes

Parameter CLI Default Description
convective_scheme --scheme central central (2nd-order) or upwind (1st-order, more stable)

Turbulence Model

Parameter CLI Default Description
turb_model --model none Turbulence closure (see table below)
nu_t_max - 1.0 Maximum eddy viscosity (clipping)
nn_preset --nn_preset - NN model preset name (loads from data/models/<NAME>/)
nn_weights_path --weights - Custom NN weights directory
nn_scaling_path --scaling - Custom NN scaling directory

Available turbulence models:

--model value Description
none Laminar (no turbulence model)
baseline Algebraic mixing length with van Driest damping
gep Gene Expression Programming (Weatheritt-Sandberg 2016)
sst SST k-ω transport model (Menter 1994)
komega Standard k-ω (Wilcox 1988)
earsm_wj SST k-ω + Wallin-Johansson EARSM
earsm_gs SST k-ω + Gatski-Speziale EARSM
earsm_pope SST k-ω + Pope quadratic EARSM
nn_mlp Neural network scalar eddy viscosity (requires --nn_preset)
nn_tbnn Tensor Basis NN anisotropy model (requires --nn_preset)

For NN models, you must specify either:

  • --nn_preset NAME (loads from data/models/<NAME>/), or
  • --weights DIR --scaling DIR (explicit paths)

Available presets: tbnn_channel_caseholdout, tbnn_phll_caseholdout, example_tbnn, example_scalar_nut

Trip Forcing (DNS Transition)

Body forcing to trigger laminar-to-turbulent transition in DNS. See docs/DNS_CHANNEL_GUIDE.md for details.

Parameter CLI Default Description
trip_enabled - false Enable trip forcing
trip_amplitude - 3.0 Forcing amplitude (scaled by u_tau^2). 1-5 typical
trip_duration - 2.0 Total duration of trip forcing (physical time, not steps)
trip_ramp_off_start - 1.5 When ramp-off begins (physical time)
trip_x_start - -1.0 Start x-location of trip region (-1 = auto: 0.1*Lx)
trip_x_end - -1.0 End x-location (-1 = auto: 0.2*Lx)
trip_n_modes_z - 8 Number of spanwise Fourier modes
trip_force_w - true Also force w-velocity (creates vortical structures)
trip_w_scale - 1.0 Scale factor for w forcing (>1 boosts 3D structures)

Important: trip_duration and trip_ramp_off_start are in physical simulation time (compared against current_time_), not in friction time units or step counts.

Velocity Filter

Explicit Laplacian filter for DNS stability. See docs/DNS_CHANNEL_GUIDE.md for tuning guide.

Parameter CLI Default Description
filter_strength - 0.0 Filter coefficient (0 = disabled). Range: 0.01-0.05
filter_interval - 10 Apply filter every N steps (0 = disabled)

The filter applies a 3D discrete Laplacian: u_new = u + alpha*(Lx+Lz) + alpha_y*Ly, where alpha = strength*0.25 and alpha_y = alpha*0.5. It must be applied before the projection step.

Recycling Inflow

Turbulent inflow BC from downstream recycle plane. See docs/RECYCLING_INFLOW_GUIDE.md for the full guide.

Parameter CLI Default Description
recycling_inflow - false Enable recycling inflow at x_lo
recycle_x - -1.0 x-location of recycle plane (-1 = auto: x_min + 10*delta)
recycle_shift_z - -1 Spanwise shift in cells (-1 = auto: Nz/4)
recycle_shift_interval - 100 Steps between shift updates (0 = constant)
recycle_filter_tau - -1.0 AR1 filter timescale (-1 = disabled)
recycle_fringe_length - -1.0 Fringe zone length (-1 = auto: 2*delta)
recycle_target_bulk_u - -1.0 Target bulk velocity (-1 = from initial condition)
recycle_remove_transverse_mean - true Remove mean v,w at inlet
recycle_diag_interval - 0 Recycling diagnostics frequency (0 = disabled)

Performance Modes

Parameter CLI Default Description
perf_mode - false Reduced diagnostics (auto-sets diag_interval=50, poisson_check_interval=5)
gpu_only_mode - false Strict GPU-only (no CPU fallbacks, no full-field host reads)
diag_interval - 1 Expensive diagnostics frequency (set >1 for performance)

Poisson Solver

Parameter CLI Default Description
poisson_solver --poisson auto Solver selection (see table below)
poisson_tol --poisson_tol 1e-6 Legacy absolute tolerance (deprecated)
poisson_max_vcycles --poisson_max_vcycles 20 Maximum V-cycles per solve
poisson_omega - 1.8 SOR relaxation parameter (1 < ω < 2)
poisson_abs_tol_floor --poisson_abs_tol_floor 1e-8 Absolute tolerance floor

Poisson solver options:

--poisson value Description Requirements
auto Auto-select best solver (default)
fft 2D FFT in x-z + tridiagonal in y 3D, periodic x AND z, uniform grid
fft2d 1D FFT in x + tridiagonal in y 2D only (Nz=1), periodic x
fft1d 1D FFT + 2D Helmholtz per mode 3D, periodic x OR z (one only)
hypre HYPRE PFMG GPU-accelerated Requires USE_HYPRE build
mg Native geometric multigrid Always available

Auto-selection priority: FFT → FFT2D → FFT1D → HYPRE → MG

Advanced Multigrid Settings

Parameter CLI Default Description
poisson_tol_abs - 0.0 Absolute tolerance on ‖r‖ (0 = disabled)
poisson_tol_rhs - 1e-3 RHS-relative: ‖r‖/‖b‖ (recommended)
poisson_tol_rel - 1e-3 Initial-residual relative: ‖r‖/‖r₀‖
poisson_check_interval - 3 Check convergence every N V-cycles
poisson_use_l2_norm - true Use L2 norm (smoother than L∞)
poisson_linf_safety - 10.0 L∞ safety cap multiplier
poisson_fixed_cycles - 8 Fixed V-cycle count (0 = convergence-based)
poisson_adaptive_cycles - true Enable adaptive checking in fixed-cycle mode
poisson_check_after - 4 Check residual after this many cycles
poisson_nu1 - 0 Pre-smoothing sweeps (0 = auto: 3 for walls)
poisson_nu2 - 0 Post-smoothing sweeps (0 = auto: 1)
poisson_chebyshev_degree - 4 Chebyshev polynomial degree (3-4 typical)
poisson_use_vcycle_graph - true Enable CUDA Graph for V-cycle (GPU only)

Convergence criteria (any triggers exit):

  • tol_rhs: ‖r‖/‖b‖ < ε (recommended for projection)
  • tol_rel: ‖r‖/‖r₀‖ < ε
  • tol_abs: ‖r‖ < ε

Output

Parameter CLI Default Description
output_dir --output output/ Output directory for VTK files
output_freq - 100 Console output frequency (iterations)
num_snapshots --num_snapshots 10 Number of VTK snapshots during simulation
verbose --verbose true Enable verbose output
postprocess --no_postprocess true Enable Poiseuille table + profile output
write_fields --no_write_fields true Enable VTK/field output

Performance and Diagnostics

Parameter CLI Default Description
warmup_iter --warmup_iter 0 Iterations to run before timing (excluded from benchmarks)
turb_guard_enabled --turb_guard_enabled true Enable NaN/Inf guard checks
turb_guard_interval --turb_guard_interval 5 Check for NaN/Inf every N iterations

Benchmark Mode

The --benchmark flag configures the solver for performance timing with minimal overhead:

# Run benchmark with defaults (192^3 grid, 20 iterations)
./duct --benchmark

# Override grid size
./duct --benchmark --Nx 256 --Ny 256 --Nz 256

# Override iteration count
./duct --benchmark --max_steps 100

Benchmark mode sets these defaults (all can be overridden by subsequent flags):

Setting Value Rationale
Grid size 192 × 192 × 192 Large enough for meaningful timing
Domain 3D duct (periodic x, walls y/z) Representative wall-bounded flow
verbose false No console output
postprocess false No profile analysis
write_fields false No VTK output
num_snapshots 0 No intermediate snapshots
convective_scheme upwind First-order upwind
poisson_fixed_cycles 1 Single V-cycle per time step
turb_model none No turbulence model
max_steps 20 Default iteration count
adaptive_dt false Fixed time step (dt=0.001)

Config File Format

Config files use simple key-value syntax:

# Comment lines start with #
Nx = 128
Ny = 256
Re = 5000
turb_model = sst
adaptive_dt = true

Load a config file with --config FILE. Command-line arguments override config file values.
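
A reader for this key-value syntax can be sketched in a few lines. This is an illustration of the file format only, not the solver's parser; the function names are hypothetical, and silently skipping malformed lines is a choice made for the sketch.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Strip leading and trailing whitespace.
static std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t\r\n");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t\r\n");
    return s.substr(first, last - first + 1);
}

// Parse "key = value" lines; blank lines and lines starting with '#' are skipped.
std::map<std::string, std::string> parse_config(std::istream& in) {
    std::map<std::string, std::string> cfg;
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == '#') continue;
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;  // ignore malformed lines
        cfg[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return cfg;
}

// Typed access; asserts that the key is present.
double get_double(const std::map<std::string, std::string>& cfg, const std::string& key) {
    const auto it = cfg.find(key);
    assert(it != cfg.end() && "missing key");
    return std::stod(it->second);
}
```

Command-line overrides can then be modeled by writing the parsed `--param value` pairs into the same map after the file has been read.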


GPU Acceleration

All solver components support GPU offload via OpenMP target directives.

Build with GPU Support

# NVIDIA GPUs (NVHPC compiler)
CC=nvc CXX=nvc++ cmake .. -DCMAKE_BUILD_TYPE=Release -DUSE_GPU_OFFLOAD=ON

# With HYPRE PFMG (fastest Poisson solver)
CC=nvc CXX=nvc++ cmake .. -DUSE_GPU_OFFLOAD=ON -DUSE_HYPRE=ON

GPU-Accelerated Components

  • Momentum equation (convection, diffusion)
  • Pressure Poisson solver (multigrid V-cycles or HYPRE PFMG)
  • Turbulence transport equations (k, ω)
  • EARSM tensor basis computations
  • Neural network inference
  • Recycling inflow (plane extraction, shift, mass correction, divergence correction, fringe blending)
  • Velocity filter
  • Trip forcing
  • Adaptive dt computation (directional CFL reductions)

CUDA Graph Optimization

On NVIDIA GPUs with NVHPC compiler, the multigrid V-cycle is captured as a CUDA Graph:

  • Eliminates per-kernel launch overhead
  • Single cudaGraphLaunch() replaces O(levels × kernels) launches
  • Automatically recaptured if boundary conditions change
  • Disabled automatically for recycling inflow (BCs change each step) and semi-coarsening
  • Can be disabled via poisson_use_vcycle_graph = false or disable_vcycle_graph() API

GPU-Specific Notes

  • gpu_only_mode: When enabled, avoids CPU fallbacks and full-field host reads for maximum performance. Diagnostics that require CPU-side data are skipped.
  • GPU sync: CPU-side diagnostics (statistics, validation) call sync_solution_from_gpu() internally. Custom diagnostic code must sync manually before reading velocity data on the host.
  • Build with compute capability: For specific GPU architectures, use cmake .. -DGPU_CC=90 (H200) or appropriate value.

Validation

~79 tests across 6 labels, organized into Tier 1 (CI, every push) and Tier 2 (SLURM, manual). See docs/VALIDATION.md for full results and docs/TESTING_GUIDE.md for how to run and extend the suite.

Analytical Benchmarks

Test Case Metric Result
Poiseuille flow L2 error vs analytical < 0.2% (2nd-order convergence confirmed)
Taylor-Green vortex (Re=100) Energy decay vs analytical Matches analytical exponential decay
Taylor-Green vortex (Re=1600) Stability through breakdown Stable on 64^3
MMS convergence Spatial order >= 1.8 (2nd-order scheme)

RANS Models (10 closures)

All 10 turbulence models validated for stability, profile shape, and eddy viscosity on stretched grids:

Category Models Status
Algebraic Baseline, GEP Stable, u+ within 30% of MKM DNS
Transport SST k-omega, k-omega Stable (point-implicit destruction fix, March 2026)
EARSM WJ, GS, Pope Stable, frame-invariant
Neural Net MLP, TBNN Infrastructure validated

Physics Conservation

Property Criterion
Divergence-free $\|\nabla \cdot \mathbf{u}\|_\infty < 10^{-10}$
Momentum balance Body force = wall shear (< 10% imbalance)
Channel symmetry $u(y) = u(-y)$ (machine precision)
D·G = L (stretched grid) $< 10^{-10}$ relative error
Galilean invariance Fluctuating KE matches to $10^{-6}$ across frames

DNS Channel Flow

| Test Case | Status | Re_tau Achieved | Reference |
|-----------|--------|-----------------|-----------|
| Channel Re_tau = 180 | Stable (filter-limited) | ~255-278 | Moser, Kim & Mansour (1999) |

GPU Parity

Check Status
CPU/GPU kernel parity All kernels match
GPU utilization gate >= 70% GPU compute time
Cross-backend consistency CPU and GPU outputs within tolerance

Recycling Inflow

Test Status Tolerance
PeriodicVsRecycling Pass < 5% shear stress, < 5% streamwise stress
RecyclingInflow (12 checks) Pass All passing on CPU and GPU

Dataset

Dataset Reference
McConkey et al. Scientific Data 8, 255 (2021)

Training Neural Network Models

Train custom turbulence models on DNS/LES data:

# Setup environment
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Download dataset (~500 MB)
bash scripts/download_mcconkey_data.sh

# Train TBNN model
python scripts/train_tbnn_mcconkey.py \
    --data_dir mcconkey_data \
    --case channel \
    --output data/models/tbnn_channel \
    --epochs 100

# Use in solver
./channel --model nn_tbnn --nn_preset tbnn_channel

Detailed Guides

Guide Description
docs/DNS_CHANNEL_GUIDE.md DNS channel flow: grid requirements, trip forcing, directional CFL, velocity filter, diagnostics, troubleshooting
docs/RECYCLING_INFLOW_GUIDE.md Recycling inflow BC: theory, configuration, GPU implementation, testing
docs/POISSON_SOLVER_GUIDE.md All Poisson solvers: FFT, MG (semi-coarsening, CUDA Graph), HYPRE, selection guide
docs/VALIDATION.md Validation results: all 79 tests, RANS models, DNS, operator correctness, GPU parity
docs/TESTING_GUIDE.md Testing: how to run, test harness API, adding tests, GPU testing, CI architecture
docs/HYPRE_POISSON_SOLVER.md HYPRE PFMG GPU solver details
docs/TRAINING_GUIDE.md Training neural network turbulence models

References

Numerical Methods

  • Chorin, A. J. "Numerical solution of the Navier-Stokes equations." Math. Comput. 22.104 (1968): 745-762
  • Briggs, W. L., Henson, V. E., & McCormick, S. F. A Multigrid Tutorial, 2nd ed. SIAM, 2000

Turbulence Modeling

  • Menter, F. R. "Two-equation eddy-viscosity turbulence models for engineering applications." AIAA J. 32.8 (1994): 1598-1605
  • Wilcox, D. C. "Reassessment of the scale-determining equation for advanced turbulence models." AIAA J. 26.11 (1988): 1299-1310
  • Wallin, S., & Johansson, A. V. "An explicit algebraic Reynolds stress model..." J. Fluid Mech. 403 (2000): 89-132
  • Gatski, T. B., & Speziale, C. G. "On explicit algebraic stress models..." J. Fluid Mech. 254 (1993): 59-78
  • Pope, S. B. "A more general effective-viscosity hypothesis." J. Fluid Mech. 72.2 (1975): 331-340

Neural Network Closures

  • Ling, J., Kurzawski, A., & Templeton, J. "Reynolds averaged turbulence modelling using deep neural networks with embedded invariance." J. Fluid Mech. 807 (2016): 155-166
  • Weatheritt, J., & Sandberg, R. D. "A novel evolutionary algorithm applied to algebraic modifications of the RANS stress-strain relationship." J. Comput. Phys. 325 (2016): 22-37

DNS and Inflow Methods

  • Moser, R. D., Kim, J., & Mansour, N. N. "Direct numerical simulation of turbulent channel flow up to Re_tau = 590." Physics of Fluids 11.4 (1999): 943-945
  • Lund, T. S., Wu, X., & Squires, K. D. "Generation of turbulent inflow data for spatially-developing boundary layer simulations." J. Comput. Phys. 140.2 (1998): 233-258

Dataset

  • McConkey, R., et al. "A curated dataset for data-driven turbulence modelling." Scientific Data 8 (2021): 255

License

MIT License - see license file
