NN-CFD: Neural Network Turbulence Closures for Incompressible Flow

A high-performance C++ solver for incompressible turbulent flow with pluggable turbulence closures ranging from classical algebraic models to advanced transport equations and data-driven neural networks. Features a fractional-step projection method with multiple Poisson solvers, pure C++ NN inference, and comprehensive GPU acceleration via OpenMP target offload.

Features

Fractional-step projection method for incompressible Navier-Stokes
- Explicit time integration (Euler, RK2, RK3) with adaptive CFL-based time stepping
- Directional CFL constraints for stretched grids
- Multiple Poisson solvers with automatic selection (FFT, Multigrid, HYPRE)
- Pressure projection for divergence-free velocity
Staggered MAC grid with second-order central finite differences
DNS capability with trip forcing, velocity filter, and turbulence diagnostics
Recycling inflow BC for spatially-developing turbulent flows (Lund et al. 1998)
10 turbulence closures: algebraic, transport, EARSM, and neural network models
Pure C++ NN inference - no Python/TensorFlow at runtime
GPU acceleration via OpenMP target directives for NVIDIA and AMD GPUs

Quick Start

Build the Solver

```bash
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j4
```

Run Examples

```bash
# Laminar channel flow (Poiseuille, analytical validation)
./channel --Nx 32 --Ny 64 --nu 0.01 --adaptive_dt --max_steps 10000

# Turbulent channel with SST k-omega
./channel --Nx 64 --Ny 128 --Re 5000 --model sst --adaptive_dt

# Neural network turbulence model
./channel --model nn_tbnn --nn_preset tbnn_channel_caseholdout --adaptive_dt

# 3D Taylor-Green vortex
./taylor_green_3d --Nx 64 --Ny 64 --Nz 64 --Re 100 --max_steps 1000
```

Governing Equations

The solver implements the incompressible Reynolds-Averaged Navier-Stokes (RANS) equations:

Momentum Equation

$$\frac{\partial \bar{u}_i}{\partial t} + \bar{u}_j \frac{\partial \bar{u}_i}{\partial x_j} = -\frac{1}{\rho} \frac{\partial \bar{p}}{\partial x_i} + \frac{\partial}{\partial x_j}\left[(\nu + \nu_t) \frac{\partial \bar{u}_i}{\partial x_j}\right] + f_i$$

Continuity Equation (Incompressibility)

$$\nabla \cdot \mathbf{u} = 0$$

Variables:

Symbol	Description
$\bar{u}_i$	Mean velocity components (u, v, w)
$\bar{p}$	Mean pressure
$\nu$	Kinematic viscosity
$\nu_t$	Turbulent eddy viscosity (from closure model)
$f_i$	Body force (e.g., pressure gradient driving force)
$\rho$	Density (constant for incompressible flow)

Numerical Methods

Fractional-Step Projection Method

The solver uses a three-step fractional-step method (Chorin 1968) to decouple pressure and velocity:

Step 1: Provisional Velocity (explicit momentum without pressure)

$$\frac{\mathbf{u}^* - \mathbf{u}^n}{\Delta t} = -(\mathbf{u}^n \cdot \nabla)\mathbf{u}^n + \nabla \cdot [(\nu + \nu_t) \nabla \mathbf{u}^n] + \mathbf{f}$$

Step 2: Pressure Poisson Equation (enforce incompressibility)

$$\nabla^2 p' = \frac{1}{\Delta t} \nabla \cdot \mathbf{u}^*$$

Step 3: Velocity Correction (project onto divergence-free space)

$$\mathbf{u}^{n+1} = \mathbf{u}^* - \Delta t \nabla p'$$

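This makes $\mathbf{u}^{n+1}$ discretely divergence-free. On the MAC grid the discrete divergence and gradient compose exactly to the discrete Laplacian (D·G = L), so the divergence remaining after the correction equals $\Delta t$ times the Poisson residual. A direct (FFT) solve therefore drives it to round-off level, and with iterative solvers it is bounded by the Poisson tolerance.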
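To illustrate Steps 2 and 3, the sketch below projects a velocity field on a doubly periodic 2D MAC grid with uniform spacing and unit density. The `MacGrid` layout and all function names are hypothetical. A plain Gauss-Seidel loop stands in for the solver's FFT and multigrid backends. The point is that the compact face gradient and the cell divergence compose to the 5-point Laplacian, so the corrected field is divergence-free up to the Poisson residual. The sketch also subtracts the mean pressure, which is the gauge fix needed for all-periodic problems.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Doubly periodic N x N MAC grid with uniform spacing h (hypothetical layout):
//   u(i,j) on the x-face between cells i-1 and i, v(i,j) on the y-face
//   between cells j-1 and j, p(i,j) at the centre of cell (i,j).
struct MacGrid {
    int n;
    double h;
    std::vector<double> u, v, p;
    MacGrid(int n_, double h_) : n(n_), h(h_), u(n_ * n_), v(n_ * n_), p(n_ * n_) {}
    int idx(int i, int j) const { return ((i + n) % n) + n * ((j + n) % n); }
};

// Discrete divergence D u at cell centre (i,j).
double divergence(const MacGrid& g, int i, int j) {
    return (g.u[g.idx(i + 1, j)] - g.u[g.idx(i, j)]) / g.h
         + (g.v[g.idx(i, j + 1)] - g.v[g.idx(i, j)]) / g.h;
}

double max_divergence(const MacGrid& g) {
    double m = 0.0;
    for (int j = 0; j < g.n; ++j)
        for (int i = 0; i < g.n; ++i) m = std::max(m, std::fabs(divergence(g, i, j)));
    return m;
}

// Step 2: solve L p' = (D u*) / dt.  Step 3: u^{n+1} = u* - dt G p'.
void project(MacGrid& g, double dt, int sweeps = 5000) {
    assert(g.n >= 4 && dt > 0.0);
    const int n = g.n;
    const double h2 = g.h * g.h;
    std::vector<double> rhs(n * n);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i) rhs[g.idx(i, j)] = divergence(g, i, j) / dt;

    std::fill(g.p.begin(), g.p.end(), 0.0);
    for (int s = 0; s < sweeps; ++s)  // Gauss-Seidel on the 5-point Laplacian
        for (int j = 0; j < n; ++j)
            for (int i = 0; i < n; ++i) {
                const double nb = g.p[g.idx(i + 1, j)] + g.p[g.idx(i - 1, j)]
                                + g.p[g.idx(i, j + 1)] + g.p[g.idx(i, j - 1)];
                g.p[g.idx(i, j)] = 0.25 * (nb - h2 * rhs[g.idx(i, j)]);
            }

    // Gauge fixing: with periodic BCs p' is defined only up to a constant.
    double mean = 0.0;
    for (double x : g.p) mean += x;
    mean /= static_cast<double>(g.p.size());
    for (double& x : g.p) x -= mean;

    // Velocity correction with the compact face gradient G p'.
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i) {
            g.u[g.idx(i, j)] -= dt * (g.p[g.idx(i, j)] - g.p[g.idx(i - 1, j)]) / g.h;
            g.v[g.idx(i, j)] -= dt * (g.p[g.idx(i, j)] - g.p[g.idx(i, j - 1)]) / g.h;
        }
}
```

Start from a field whose discrete divergence is of order 10 on a 16 × 16 grid. After `project`, its divergence should be at round-off level, because the divergence of the corrected field is `dt` times the Gauss-Seidel residual.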

Spatial Discretization

Spatial derivatives use second-order central finite differences on a staggered Marker-and-Cell (MAC) grid. The convective term can optionally use first-order upwinding (see below).

Variable	Grid Location
u-velocity	x-faces (staggered in x)
v-velocity	y-faces (staggered in y)
w-velocity	z-faces (staggered in z)
Pressure, scalars	Cell centers

Gradient (central difference):

$$\left.\frac{\partial u}{\partial x}\right|_{i,j} = \frac{u_{i+1,j} - u_{i-1,j}}{2\Delta x}$$

Laplacian (5-point stencil in 2D, 7-point in 3D):

$$\left.\nabla^2 u\right|_{i,j} = \frac{u_{i+1,j} - 2u_{i,j} + u_{i-1,j}}{\Delta x^2} + \frac{u_{i,j+1} - 2u_{i,j} + u_{i,j-1}}{\Delta y^2}$$

Convective Schemes (selectable via convective_scheme config):

Central differences (default): Second-order accurate, lower numerical dissipation
First-order upwind: More stable at high Reynolds numbers, increased numerical dissipation

Variable Viscosity Diffusion:

$$\nabla \cdot [(\nu + \nu_t) \nabla u] = \frac{1}{\Delta x^2}\left[\nu_{e}(u_{i+1,j} - u_{i,j}) - \nu_{w}(u_{i,j} - u_{i-1,j})\right] + \ldots$$

where $\nu_e, \nu_w$ are face-averaged effective viscosities.
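The following is a minimal sketch of the x-contribution above for a single row of points. It assumes uniform spacing and takes the face viscosities ν_e and ν_w as arithmetic means of the neighbouring values; the solver's actual averaging is not specified here. `diffusion_x` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// x-part of div[(nu + nu_t) grad u] at interior point i, uniform spacing dx.
// Assumption: face viscosity = arithmetic mean of the two adjacent nu_eff values.
double diffusion_x(const std::vector<double>& u, const std::vector<double>& nu_eff,
                   std::size_t i, double dx) {
    assert(i >= 1 && i + 1 < u.size() && u.size() == nu_eff.size());
    const double nu_e = 0.5 * (nu_eff[i] + nu_eff[i + 1]);  // east face
    const double nu_w = 0.5 * (nu_eff[i - 1] + nu_eff[i]);  // west face
    // Difference of face fluxes: conservative when nu_eff varies in space.
    return (nu_e * (u[i + 1] - u[i]) - nu_w * (u[i] - u[i - 1])) / (dx * dx);
}
```

With constant viscosity this reduces to ν times the standard second difference. The flux form keeps the operator conservative when ν_t varies in space.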

Time Integration

Explicit integrators (Euler, RK2, RK3) with adaptive time stepping.

Directional CFL

For stretched grids (common in wall-bounded DNS), the solver supports separate CFL numbers for different directions to avoid unnecessarily small time steps:

$$\Delta t = \text{dt\_safety} \cdot \min\left(\frac{\text{CFL}_{xz} \cdot \Delta x}{|u|_{\max}},\; \frac{\text{CFL}_{\max}}{\max\left(|v|/\Delta y\right)},\; \frac{\text{CFL}_{xz} \cdot \Delta z}{|w|_{\max}},\; \Delta t_{\text{diff}}\right)$$

where $\Delta t_{\text{diff}}$ is the explicit viscous (diffusive) stability limit.

Parameter	Default	Purpose
`CFL_max`	0.5	CFL for y-direction (strict on stretched grids)
`CFL_xz`	-1	CFL for x/z directions (-1 = use CFL_max)
`dt_safety`	1.0	Safety multiplier on computed dt

The y-direction uses CFL_max with per-cell $|v|/\Delta y$ to account for variable grid spacing. For DNS, typical values are CFL_max = 0.15, CFL_xz = 0.30, dt_safety = 0.85.

See docs/DNS_CHANNEL_GUIDE.md for details on directional CFL tuning.
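The directional CFL formula can be written as the function below. This sketch assumes that |u|_max and |w|_max, the per-cell samples of v and Δy, and Δt_diff have already been computed; on the GPU those would be reductions. `CflParams` and `directional_dt` are illustrative names, not the solver's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct CflParams {
    double cfl_max = 0.5;    // CFL for the y-direction
    double cfl_xz = -1.0;    // CFL for x/z (-1 = fall back to cfl_max)
    double dt_safety = 1.0;  // multiplier on the final dt
};

// v[k], dy[k]: wall-normal velocity and local y-spacing per cell (flattened).
double directional_dt(const CflParams& p, double u_max, double w_max, double dx, double dz,
                      const std::vector<double>& v, const std::vector<double>& dy,
                      double dt_diff) {
    assert(v.size() == dy.size());
    const double inf = std::numeric_limits<double>::infinity();
    const double cfl_xz = (p.cfl_xz > 0.0) ? p.cfl_xz : p.cfl_max;
    double v_over_dy = 0.0;  // per-cell |v|/dy accounts for stretched y-grids
    for (std::size_t k = 0; k < v.size(); ++k)
        v_over_dy = std::max(v_over_dy, std::fabs(v[k]) / dy[k]);
    const double dt_x = (u_max > 0.0) ? cfl_xz * dx / u_max : inf;
    const double dt_y = (v_over_dy > 0.0) ? p.cfl_max / v_over_dy : inf;
    const double dt_z = (w_max > 0.0) ? cfl_xz * dz / w_max : inf;
    return p.dt_safety * std::min({dt_x, dt_y, dt_z, dt_diff});
}
```

As an example, take the DNS values above (CFL_max = 0.15, CFL_xz = 0.30, dt_safety = 0.85) with a peak |v|/Δy of 10. The y-limit then gives Δt = 0.85 × 0.15 / 10 ≈ 0.0128, unless another term is smaller.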

Steady-state convergence: Iterate until $|\mathbf{u}^{n+1} - \mathbf{u}^n|_\infty < \text{tol}$

Boundary Conditions

Velocity Boundary Conditions

Type	Description	Implementation
Periodic	Flow wraps around at boundaries	Ghost cells copy from opposite boundary
No-slip (Wall)	Zero velocity at solid surfaces	$u = v = w = 0$ at wall
Inflow	Prescribed velocity profile	User-defined function callbacks
Outflow	Convective/zero-gradient outflow	Extrapolation from interior
Recycling Inflow	Turbulent inflow from downstream recycle plane	Lund et al. (1998) with fringe blending

Recycling Inflow: Generates realistic turbulent inlet data by extracting, shifting, and blending a downstream velocity plane back to the inlet. Includes mass flux correction and divergence correction for clean pressure solves. See docs/RECYCLING_INFLOW_GUIDE.md for details.

Pressure (Poisson) Boundary Conditions

The pressure Poisson equation supports three BC types:

Type	Description	Formula
Periodic	Pressure wraps around	$p(\text{ghost}) = p(\text{periodic partner})$
Neumann	Zero normal gradient	$\partial p / \partial n = 0 \Rightarrow p(\text{ghost}) = p(\text{interior})$
Dirichlet	Fixed pressure value	$p(\text{ghost}) = 2 p_{\text{bc}} - p(\text{interior})$
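The three ghost-cell rules translate directly into code. This sketch fills the ghost value below the first cell of one cell-centred row; `PoissonBC` and `ghost_low` are hypothetical names. The Dirichlet rule is chosen so that the average of the ghost value and the first interior value equals p_bc on the boundary face.

```cpp
#include <cassert>
#include <vector>

enum class PoissonBC { Periodic, Neumann, Dirichlet };

// Ghost value below cell 0 of a cell-centred row p[0..n-1] (hypothetical helper).
double ghost_low(PoissonBC bc, const std::vector<double>& p, double p_bc = 0.0) {
    assert(!p.empty());
    switch (bc) {
        case PoissonBC::Periodic:  return p.back();                 // partner at the far end
        case PoissonBC::Neumann:   return p.front();                // zero normal gradient
        case PoissonBC::Dirichlet: return 2.0 * p_bc - p.front();   // face average = p_bc
    }
    return p.front();  // unreachable; silences missing-return warnings
}
```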

Standard BC Configurations:

Configuration	x-direction	y-direction	z-direction	Use Case
`channel()`	Periodic	Neumann	Periodic	Channel flow
`duct()`	Periodic	Neumann	Neumann	Square duct
`cavity()`	Neumann	Neumann	Neumann	Lid-driven cavity
`all_periodic()`	Periodic	Periodic	Periodic	Periodic box

Gauge Fixing

For problems whose pressure boundaries are all Neumann and/or periodic (no Dirichlet BC), the pressure is determined only up to an additive constant. The solver automatically:

Detects this condition via has_nullspace() check
Subtracts the mean pressure after each solve to fix the gauge

Poisson Solvers

The solver provides 6 Poisson solver options with automatic selection based on grid configuration and boundary conditions:

Automatic Solver Selection Priority

FFT (3D) → FFT2D (2D) → FFT1D (3D partial-periodic) → HYPRE → Multigrid

Available Solvers

Solver	Complexity	Best For	Requirements
FFT	O(N log N)	3D channel flows	Periodic x AND z, uniform grid
FFT2D	O(N log N)	2D channel flows	2D mesh, periodic x
FFT1D	O(N log N) + 2D solve	3D duct flows	Periodic x OR z (one only)
HYPRE PFMG	O(N)	Stretched grids, GPU	`USE_HYPRE` build flag
Multigrid	O(N)	General fallback, stretched grids	Always available
SOR	O(N²)	Testing/debugging	Always available

Geometric Multigrid (V-Cycle)

The default solver implements a geometric multigrid V-cycle:

Pre-smooth: Apply smoothing iterations on fine grid (Chebyshev or Jacobi)
Restrict: Compute residual and transfer to coarse grid (full weighting)
Recurse: Solve on coarse grid (recursively)
Prolongate: Interpolate correction back to fine grid (bilinear)
Post-smooth: Apply smoothing iterations
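To make the five V-cycle steps concrete, here is a self-contained 1D V-cycle for -u'' = f with zero Dirichlet ends on n = 2^k - 1 interior points. It is only a sketch. It uses weighted Jacobi (ω = 2/3) instead of the Chebyshev smoother and linear prolongation (the 1D analogue of bilinear). It also solves exactly once a single point remains. None of the names correspond to the solver's classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

using Vec = std::vector<double>;

// r = f - A u, with A u = (2u[i] - u[i-1] - u[i+1]) / h^2 and zero Dirichlet ends.
Vec residual(const Vec& u, const Vec& f, double h) {
    const int n = static_cast<int>(u.size());
    Vec r(n);
    for (int i = 0; i < n; ++i) {
        const double ul = (i > 0) ? u[i - 1] : 0.0, ur = (i + 1 < n) ? u[i + 1] : 0.0;
        r[i] = f[i] - (2.0 * u[i] - ul - ur) / (h * h);
    }
    return r;
}

double max_abs(const Vec& r) {
    double m = 0.0;
    for (double x : r) m = std::max(m, std::fabs(x));
    return m;
}

// Weighted Jacobi smoother (omega = 2/3).
void smooth(Vec& u, const Vec& f, double h, int sweeps) {
    const int n = static_cast<int>(u.size());
    for (int s = 0; s < sweeps; ++s) {
        const Vec old = u;
        for (int i = 0; i < n; ++i) {
            const double ul = (i > 0) ? old[i - 1] : 0.0, ur = (i + 1 < n) ? old[i + 1] : 0.0;
            const double jac = 0.5 * (ul + ur + h * h * f[i]);
            u[i] = old[i] + (2.0 / 3.0) * (jac - old[i]);
        }
    }
}

void vcycle(Vec& u, const Vec& f, double h, int nu1 = 2, int nu2 = 2) {
    const int n = static_cast<int>(u.size());
    assert(n >= 1 && n % 2 == 1);
    if (n == 1) { u[0] = 0.5 * h * h * f[0]; return; }  // coarsest grid: exact solve
    smooth(u, f, h, nu1);                                // 1. pre-smooth
    const Vec r = residual(u, f, h);
    const int nc = (n - 1) / 2;
    Vec rc(nc), ec(nc, 0.0);
    for (int j = 0; j < nc; ++j)                         // 2. full-weighting restriction
        rc[j] = 0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2];
    vcycle(ec, rc, 2.0 * h, nu1, nu2);                   // 3. recurse on the coarse grid
    for (int j = 0; j < nc; ++j) {                       // 4. linear prolongation
        u[2 * j + 1] += ec[j];
        u[2 * j] += 0.5 * ec[j] + ((j > 0) ? 0.5 * ec[j - 1] : 0.0);
    }
    u[n - 1] += 0.5 * ec[nc - 1];
    smooth(u, f, h, nu2);                                // 5. post-smooth
}
```

On 63 points, ten V(2,2) cycles should reduce the residual by many orders of magnitude. The convergence factor per cycle is essentially independent of grid size, which is where the O(N) cost comes from.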

Features:

O(N) complexity (optimal)
5-15 V-cycles to convergence (vs 1000-10000 SOR iterations)
Semi-coarsening for stretched y-grids: coarsens x/z only, uses y-line Thomas smoother
PCG coarse solver with breakdown restart and convergence check throttling
CUDA Graph optimization: Entire V-cycle captured as single GPU graph (NVHPC compilers)
Chebyshev polynomial smoother with Gershgorin eigenvalue bounds

See docs/POISSON_SOLVER_GUIDE.md for the full guide including semi-coarsening details.

Convergence Criteria (any triggers exit):

tol_rhs: RHS-relative $|r|/|b| < \epsilon$ (recommended for projection)
tol_rel: Initial-residual relative $|r|/|r_0| < \epsilon$
tol_abs: Absolute $|r|_\infty < \epsilon$

FFT-Based Solvers

For problems with periodic boundaries, FFT solvers provide spectral accuracy:

FFT (3D): 2D FFT in x-z + batched tridiagonal solve in y (cuSPARSE)
FFT2D: 1D FFT in x + batched tridiagonal in y
FFT1D: 1D FFT in periodic direction + 2D Helmholtz solve per mode
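After the FFT, each Fourier mode leaves a tridiagonal system in y: the second difference in y, with the squared horizontal wavenumber subtracted on the diagonal. The batched tridiagonal solves handle these systems. Below is a serial sketch of one such solve using the Thomas algorithm; a batched GPU version solves many of them at once. The sketch assumes a diagonally dominant system. The zero-wavenumber mode with Neumann walls is singular and must be regularized separately (not shown). `thomas` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Solves a[i] x[i-1] + b[i] x[i] + c[i] x[i+1] = d[i] for i = 0..n-1
// (a[0] and c[n-1] are ignored). No pivoting: assumes diagonal dominance.
std::vector<double> thomas(const std::vector<double>& a, const std::vector<double>& b,
                           const std::vector<double>& c, std::vector<double> d) {
    const std::size_t n = b.size();
    assert(n >= 1 && a.size() == n && c.size() == n && d.size() == n);
    std::vector<double> cp(n);
    cp[0] = c[0] / b[0];
    d[0] /= b[0];
    for (std::size_t i = 1; i < n; ++i) {  // forward elimination
        const double m = b[i] - a[i] * cp[i - 1];
        cp[i] = c[i] / m;
        d[i] = (d[i] - a[i] * d[i - 1]) / m;
    }
    for (std::size_t i = n - 1; i-- > 0;)  // back substitution, i = n-2 .. 0
        d[i] -= cp[i] * d[i + 1];
    return d;  // d now holds the solution x
}
```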

HYPRE PFMG

GPU-accelerated parallel semicoarsening multigrid from HYPRE:

Supports uniform AND stretched grids
Entire solve runs on GPU via CUDA backend
Automatic download and build via CMake FetchContent

See docs/POISSON_SOLVER_GUIDE.md for detailed documentation.

Turbulence Closures

The solver supports 10 turbulence closure options:

Summary Table

Model	Type	Equations	Anisotropic	GPU
`none`	Direct	0	N/A	Yes
`baseline`	Algebraic	0	No	Yes
`gep`	Algebraic	0	No	Yes
`komega`	Transport	2 (k, ω)	No	Yes
`sst`	Transport	2 (k, ω)	No	Yes
`earsm_wj`	EARSM	2 (k, ω)	Yes	Yes
`earsm_gs`	EARSM	2 (k, ω)	Yes	Yes
`earsm_pope`	EARSM	2 (k, ω)	Yes	Yes
`nn_mlp`	Neural Net	0	No	Yes
`nn_tbnn`	Neural Net	0	Yes	Yes

Algebraic Models (Zero-Equation)

1. Mixing Length Model (`baseline`)

Classical model with van Driest wall damping:

$$\nu_t = (\kappa y)^2 |\mathbf{S}| \left(1 - e^{-y^+/A^+}\right)^2$$

$\kappa = 0.41$ (von Kármán constant)
$A^+ \approx 26$ (van Driest damping constant)
$|\mathbf{S}| = \sqrt{2S_{ij}S_{ij}}$ (strain rate magnitude)
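Here is a direct transcription of the mixing-length formula as a sketch. The caller supplies `y` (wall distance) and `y_plus` (the same distance in wall units). For a simple shear flow u(y), |S| reduces to |du/dy|. The name and signature are illustrative, and the solver may additionally clip the result at `nu_t_max`.

```cpp
#include <cassert>
#include <cmath>

// nu_t = (kappa*y)^2 |S| (1 - exp(-y+/A+))^2
double mixing_length_nut(double y, double y_plus, double S_mag,
                         double kappa = 0.41, double A_plus = 26.0) {
    assert(y >= 0.0 && y_plus >= 0.0 && S_mag >= 0.0);
    const double damping = 1.0 - std::exp(-y_plus / A_plus);  // van Driest factor
    const double lm = kappa * y * damping;                     // damped mixing length
    return lm * lm * S_mag;
}
```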

2. GEP Model (`gep`)

Symbolic regression formula discovered by genetic algorithms (Weatheritt & Sandberg 2016):

$$\nu_t = f_{\text{GEP}}(S_{ij}, \Omega_{ij}, y, \text{Re}_\tau, \ldots)$$

Transport Equation Models (Two-Equation)

3. SST k-ω (`sst`)

Menter's Shear Stress Transport model (1994):

k-equation: $$\frac{\partial k}{\partial t} + \bar{u}_j \frac{\partial k}{\partial x_j} = P_k - \beta^* k \omega + \nabla \cdot [(\nu + \sigma_k \nu_t) \nabla k]$$

ω-equation (with cross-diffusion): $$\frac{\partial \omega}{\partial t} + \bar{u}_j \frac{\partial \omega}{\partial x_j} = \alpha \frac{\omega}{k} P_k - \beta \omega^2 + \nabla \cdot [(\nu + \sigma_\omega \nu_t) \nabla \omega] + CD_\omega$$

Eddy viscosity: $$\nu_t = \frac{a_1 k}{\max(a_1 \omega, S F_2)}$$

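Blending functions: F₁ switches the model coefficients from k-ω behavior near walls to k-ε behavior away from them; F₂ enters the eddy-viscosity limiter above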
Production limiter for numerical stability
Wall boundary conditions: k = 0, ω = ω_wall(y)
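The eddy-viscosity limiter and Menter's F₂ function can be sketched as follows. The constants a₁ = 0.31 and β* = 0.09 are the standard published values; they are assumed here, not read from the solver. `S` is the strain-rate magnitude and `y` the wall distance. Function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// F2 (Menter 1994): arg2 = max(2 sqrt(k) / (beta* omega y), 500 nu / (y^2 omega)).
double sst_F2(double k, double omega, double y, double nu, double beta_star = 0.09) {
    assert(k >= 0.0 && omega > 0.0 && y > 0.0 && nu > 0.0);
    const double arg2 = std::max(2.0 * std::sqrt(k) / (beta_star * omega * y),
                                 500.0 * nu / (y * y * omega));
    return std::tanh(arg2 * arg2);
}

// nu_t = a1 k / max(a1 omega, S F2)
double sst_nut(double k, double omega, double S, double F2, double a1 = 0.31) {
    return a1 * k / std::max(a1 * omega, S * F2);
}
```

Where a₁ω dominates, ν_t reduces to k/ω. Where S·F₂ exceeds a₁ω, the limiter caps ν_t, which is the SST shear-stress limitation.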

4. Standard k-ω (`komega`)

Wilcox (1988) formulation without blending:

$$\nu_t = \frac{k}{\omega}$$

EARSM Models (Explicit Algebraic Reynolds Stress)

EARSM models predict the full Reynolds stress anisotropy tensor using a tensor basis expansion:

$$b_{ij} = \sum_{n=1}^{N} G_n(\eta, \xi) \, T_{ij}^{(n)}(\mathbf{S}, \mathbf{\Omega})$$

where:

$b_{ij}$ = anisotropy tensor (traceless)
$T_{ij}^{(n)}$ = integrity basis tensors
$G_n$ = scalar coefficient functions
$\eta = Sk/\epsilon$, $\xi = \Omega k/\epsilon$ = normalized invariants

Combined with SST k-ω transport for k and ω evolution.
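To make the tensor basis concrete, the sketch below splits a velocity-gradient tensor into S and Ω. It then forms the first three integrity-basis tensors:

- T⁽¹⁾ = S
- T⁽²⁾ = SΩ - ΩS
- T⁽³⁾ = S² - ⅓ tr(S²) I

Finally it contracts them with coefficients G_n. The sketch assumes the gradient is already normalized (e.g. by k/ε) and divergence-free, and it omits the higher-order tensors used by the full models. All names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Mat3 = std::array<std::array<double, 3>, 3>;

Mat3 matmul(const Mat3& A, const Mat3& B) {
    Mat3 C{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k) C[i][j] += A[i][k] * B[k][j];
    return C;
}

// g[i][j] = du_i/dx_j (normalized). Returns T1..T3 of the integrity basis.
std::array<Mat3, 3> basis3(const Mat3& g) {
    Mat3 S{}, W{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) {
            S[i][j] = 0.5 * (g[i][j] + g[j][i]);  // strain rate
            W[i][j] = 0.5 * (g[i][j] - g[j][i]);  // rotation rate
        }
    const Mat3 SW = matmul(S, W), WS = matmul(W, S), SS = matmul(S, S);
    const double trSS = SS[0][0] + SS[1][1] + SS[2][2];
    Mat3 T1 = S, T2{}, T3{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) {
            T2[i][j] = SW[i][j] - WS[i][j];
            T3[i][j] = SS[i][j] - (i == j ? trSS / 3.0 : 0.0);
        }
    return {{T1, T2, T3}};
}

// b_ij = sum_n G_n T^(n)_ij
Mat3 anisotropy(const std::vector<double>& G, const std::array<Mat3, 3>& T) {
    assert(G.size() == T.size());
    Mat3 b{};
    for (std::size_t n = 0; n < G.size(); ++n)
        for (int i = 0; i < 3; ++i)
            for (int j = 0; j < 3; ++j) b[i][j] += G[n] * T[n][i][j];
    return b;
}

// Sanity check: each basis tensor is traceless for a divergence-free gradient.
bool traceless(const Mat3& T, double tol = 1e-12) {
    return std::fabs(T[0][0] + T[1][1] + T[2][2]) < tol;
}
```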

5. Wallin-Johansson EARSM (`earsm_wj`)

Most sophisticated variant with cubic implicit equation for realizability.

6. Gatski-Speziale EARSM (`earsm_gs`)

Quadratic model without implicit solve.

7. Pope Quadratic EARSM (`earsm_pope`)

Classical weak-equilibrium model using first 3 basis tensors.

Re_t-Based Blending:

EARSM models use smooth blending between linear Boussinesq (laminar) and full nonlinear (turbulent):

$$\alpha(\text{Re}_t) = \frac{1}{2}\left(1 + \tanh\left(\frac{\text{Re}_t - \text{Re}_{t,\text{center}}}{\text{Re}_{t,\text{width}}}\right)\right)$$

where $\text{Re}_t = k/(\nu\omega)$. Default transition: center at Re_t = 10, width = 5.
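The blending weight is a one-line function. The sketch also shows one plausible way to apply it: a linear mix of the Boussinesq and EARSM anisotropy components. That mixing rule and both function names are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>

// alpha(Re_t) = 0.5 * (1 + tanh((Re_t - center) / width)), with Re_t = k / (nu * omega).
double earsm_blend(double k, double omega, double nu, double center = 10.0, double width = 5.0) {
    assert(omega > 0.0 && nu > 0.0 && width > 0.0);
    const double re_t = k / (nu * omega);
    return 0.5 * (1.0 + std::tanh((re_t - center) / width));
}

// Assumed mixing rule: alpha = 0 -> linear Boussinesq, alpha = 1 -> full EARSM.
double blended_component(double alpha, double b_linear, double b_earsm) {
    return (1.0 - alpha) * b_linear + alpha * b_earsm;
}
```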

Neural Network Models

8. MLP (`nn_mlp`)

Multi-layer perceptron for scalar eddy viscosity:

$$\nu_t = \text{NN}_{\text{MLP}}(\lambda_1, \ldots, \lambda_5, y/\delta)$$

Inputs (invariants of strain and rotation tensors):

$\lambda_1 = S_{ij}S_{ij}$, $\lambda_2 = \Omega_{ij}\Omega_{ij}$, $\lambda_3 = S_{ij}S_{jk}S_{ki}$, ...
$y/\delta$ = normalized wall distance

Architecture: 6 → 32 → 32 → 1 (ReLU activations)
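Pure C++ inference for a network like this amounts to a chain of dense layers. The following sketch makes these assumptions:

- row-major weights
- ReLU on the hidden layers
- a linear output layer

The solver's weight-file format, input scaling (`nn_scaling_path`) and class names are not shown and may differ; `Dense` and `mlp_forward` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One dense layer y = W x + b, with W stored row-major as (out x in).
struct Dense {
    std::size_t in, out;
    std::vector<double> W, b;
    std::vector<double> apply(const std::vector<double>& x, bool relu) const {
        assert(x.size() == in && W.size() == in * out && b.size() == out);
        std::vector<double> y(b);  // start from the bias
        for (std::size_t o = 0; o < out; ++o) {
            for (std::size_t i = 0; i < in; ++i) y[o] += W[o * in + i] * x[i];
            if (relu) y[o] = std::max(0.0, y[o]);
        }
        return y;
    }
};

// Forward pass: ReLU on every layer except the last (assumed linear output).
double mlp_forward(const std::vector<Dense>& layers, std::vector<double> x) {
    for (std::size_t l = 0; l < layers.size(); ++l)
        x = layers[l].apply(x, l + 1 < layers.size());
    assert(x.size() == 1);
    return x[0];
}
```

A 6 → 32 → 32 → 1 model would be three `Dense` layers with (in, out) = (6, 32), (32, 32), and (32, 1).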

9. TBNN (`nn_tbnn`)

Tensor Basis Neural Network (Ling et al. 2016) for anisotropic Reynolds stresses:

$$b_{ij} = \sum_{n=1}^{10} g_n(\lambda_1, \ldots, \lambda_5) \, T_{ij}^{(n)}$$

Architecture: 5 → 64 → 64 → 64 → 10 (outputs one coefficient per basis tensor)

Key Properties:

Frame invariance: Guaranteed by using invariant inputs + tensor basis
Realizability: Enforced during training
Anisotropy: Captures different normal stresses and off-diagonal components

Supported Flow Configurations

2D Channel Flow

Pressure-driven flow between parallel plates.

Configuration	BCs	Use Case
Poiseuille (laminar)	Periodic x, walls y	Analytical validation
Turbulent RANS	Periodic x, walls y	Model comparison

Analytical solution (Poiseuille), with $y$ measured from the channel centerline and $H$ the full channel height: $$u(y) = -\frac{1}{2\nu}\frac{dp}{dx}\left(\frac{H^2}{4} - y^2\right)$$

3D Square Duct Flow

Pressure-driven flow in square cross-section.

Configuration	BCs	Use Case
Laminar duct	Periodic x, walls y and z	3D solver validation
Turbulent duct	Periodic x, walls y and z	Secondary flow study

3D DNS Channel Flow

Direct Numerical Simulation of turbulent channel flow without any turbulence model.

Configuration	BCs	Use Case
DNS Re_tau=180	Periodic x/z, walls y	Turbulence benchmark (MKM 1999)

Requires trip forcing for transition, velocity filter for stability, directional CFL for stretched grids. See docs/DNS_CHANNEL_GUIDE.md.

Spatially-Developing Channel (Recycling Inflow)

Channel flow with turbulent inflow generated by recycling from a downstream plane.

Configuration	BCs	Use Case
Recycling inflow	Inflow x_lo, outflow x_hi, walls y, periodic z	Spatially-developing turbulence

See docs/RECYCLING_INFLOW_GUIDE.md.

3D Taylor-Green Vortex

Classic benchmark for unsteady flow and energy decay.

Configuration	BCs	Use Case
Taylor-Green	All periodic	DNS verification, energy decay

Initial condition: $$u = \sin(x)\cos(y)\cos(z), \quad v = -\cos(x)\sin(y)\cos(z), \quad w = 0$$

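Energy decay (low Re): this initial field satisfies $\nabla^2 \mathbf{u} = -3\mathbf{u}$. When viscous decay dominates, each velocity component therefore decays as $e^{-3\nu t}$, and $$KE(t) \approx KE(0) \cdot e^{-6\nu t}$$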

Configuration Reference

All parameters can be set via command-line arguments (--param value) or config file (key-value pairs). Command-line arguments override config file values.

Domain and Mesh

Parameter	CLI	Default	Description
`Nx`	`--Nx`	64	Grid cells in x-direction
`Ny`	`--Ny`	64	Grid cells in y-direction
`Nz`	`--Nz`	1	Grid cells in z-direction (1 = 2D simulation)
`x_min`	-	0.0	Domain minimum in x
`x_max`	-	2π	Domain maximum in x
`y_min`	-	-1.0	Domain minimum in y
`y_max`	-	1.0	Domain maximum in y
`z_min`	`--z_min`	0.0	Domain minimum in z
`z_max`	`--z_max`	1.0	Domain maximum in z
`stretch_y`	`--stretch`	false	Enable tanh stretching in y (clusters points near walls)
`stretch_beta`	-	2.0	Y-stretching parameter (higher = more clustering)
`stretch_z`	`--stretch_z`	false	Enable tanh stretching in z (3D only)
`stretch_beta_z`	`--stretch_beta_z`	2.0	Z-stretching parameter

Physics Parameters

Parameter	CLI	Default	Description
`Re`	`--Re`	1000.0	Reynolds number
`nu`	`--nu`	0.001	Kinematic viscosity
`dp_dx`	`--dp_dx`	-1.0	Pressure gradient (body force driving flow)
`rho`	-	1.0	Density (constant for incompressible)

Auto-Computation of Physics Parameters

The solver uses the relationship: $\text{Re} = \frac{-dp/dx \cdot \delta^3}{3\nu^2}$ where $\delta$ is the channel half-height.

You should specify only TWO of (Re, nu, dp_dx). The third is computed automatically:

Specified	Computed	Use Case
`--Re` only	nu (using default dp_dx=-1)	Quick setup at desired Re
`--Re --nu`	dp_dx	Control both Re and viscosity
`--Re --dp_dx`	nu	Control Re and driving force
`--nu --dp_dx`	Re	Specify physical parameters directly
None	Re (from defaults)	Uses nu=0.001, dp_dx=-1.0 → Re≈1000

If all three are specified, the solver checks consistency and errors if they don't match (within 1% tolerance).
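The relation follows from laminar Poiseuille flow. The bulk velocity is U_b = (-dp/dx) δ² / (3ν), and Re = U_b δ / ν. Below is a sketch of the three inversions and the 1% consistency check. It assumes δ = 1, matching the default y-range [-1, 1], and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Re = (-dp/dx) * delta^3 / (3 nu^2), delta = channel half-height.
double re_from(double nu, double dp_dx, double delta = 1.0) {
    assert(nu > 0.0 && dp_dx < 0.0);
    return -dp_dx * delta * delta * delta / (3.0 * nu * nu);
}
double nu_from(double re, double dp_dx, double delta = 1.0) {
    assert(re > 0.0 && dp_dx < 0.0);
    return std::sqrt(-dp_dx * delta * delta * delta / (3.0 * re));
}
double dpdx_from(double re, double nu, double delta = 1.0) {
    assert(re > 0.0 && nu > 0.0);
    return -3.0 * re * nu * nu / (delta * delta * delta);
}
// When all three are given: accept if the implied Re is within 1% of the requested one.
bool consistent(double re, double nu, double dp_dx, double delta = 1.0) {
    return std::fabs(re_from(nu, dp_dx, delta) - re) <= 0.01 * re;
}
```

For example, under this relation `--Re 5000` alone with dp_dx = -1 gives ν = √(1/15000) ≈ 0.00816.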

Time Stepping

Parameter	CLI	Default	Description
`dt`	`--dt`	0.001	Time step size (when not using adaptive)
`adaptive_dt`	`--adaptive_dt`	true	Enable CFL-based adaptive time stepping
`CFL_max`	`--CFL`	0.5	Maximum CFL number for adaptive dt (used for y-direction)
`CFL_xz`	-	-1.0	CFL for x/z directions (-1 = use CFL_max). Set higher than CFL_max for stretched grids
`dt_safety`	-	1.0	Safety multiplier on computed dt (0.5-1.0). Provides headroom for within-step CFL growth
`time_integrator`	-	`euler`	Time integrator: `euler`, `rk2`, or `rk3`
`max_steps`	`--max_steps`	10000	Maximum iterations (steady) or time steps (unsteady)
`T_final`	-	-1.0	Final simulation time (-1 = use max_steps instead)
`tol`	`--tol`	1e-6	Convergence tolerance for steady-state

Directional CFL: When CFL_xz is set, the x/z directions use CFL_xz while y uses the stricter CFL_max. This is essential for stretched grids where dy_min << dx. See Directional CFL.

Simulation Mode

Parameter	CLI	Default	Description
`simulation_mode`	`--simulation_mode`	`steady`	`steady` or `unsteady`
`perturbation_amplitude`	`--perturbation_amplitude`	0.01	Initial perturbation amplitude for DNS

Steady mode: Iterates until $|\mathbf{u}^{n+1} - \mathbf{u}^n|_\infty < \text{tol}$ or max_steps is reached
Unsteady mode: Runs exactly max_steps time steps (or until T_final is reached)

Numerical Schemes

Parameter	CLI	Default	Description
`convective_scheme`	`--scheme`	`central`	`central` (2nd-order) or `upwind` (1st-order, more stable)

Turbulence Model

Parameter	CLI	Default	Description
`turb_model`	`--model`	`none`	Turbulence closure (see table below)
`nu_t_max`	-	1.0	Maximum eddy viscosity (clipping)
`nn_preset`	`--nn_preset`	-	NN model preset name (loads from `data/models/<NAME>/`)
`nn_weights_path`	`--weights`	-	Custom NN weights directory
`nn_scaling_path`	`--scaling`	-	Custom NN scaling directory

Available turbulence models:

`--model` value	Description
`none`	Laminar (no turbulence model)
`baseline`	Algebraic mixing length with van Driest damping
`gep`	Gene Expression Programming (Weatheritt-Sandberg 2016)
`sst`	SST k-ω transport model (Menter 1994)
`komega`	Standard k-ω (Wilcox 1988)
`earsm_wj`	SST k-ω + Wallin-Johansson EARSM
`earsm_gs`	SST k-ω + Gatski-Speziale EARSM
`earsm_pope`	SST k-ω + Pope quadratic EARSM
`nn_mlp`	Neural network scalar eddy viscosity (requires `--nn_preset`)
`nn_tbnn`	Tensor Basis NN anisotropy model (requires `--nn_preset`)

For NN models, you must specify either:

--nn_preset NAME (loads from data/models/<NAME>/), or
--weights DIR --scaling DIR (explicit paths)

Available presets: tbnn_channel_caseholdout, tbnn_phll_caseholdout, example_tbnn, example_scalar_nut

Trip Forcing (DNS Transition)

Body forcing to trigger laminar-to-turbulent transition in DNS. See docs/DNS_CHANNEL_GUIDE.md for details.

Parameter	CLI	Default	Description
`trip_enabled`	-	false	Enable trip forcing
`trip_amplitude`	-	3.0	Forcing amplitude (scaled by u_tau^2). 1-5 typical
`trip_duration`	-	2.0	Total duration of trip forcing (physical time, not steps)
`trip_ramp_off_start`	-	1.5	When ramp-off begins (physical time)
`trip_x_start`	-	-1.0	Start x-location of trip region (-1 = auto: 0.1*Lx)
`trip_x_end`	-	-1.0	End x-location (-1 = auto: 0.2*Lx)
`trip_n_modes_z`	-	8	Number of spanwise Fourier modes
`trip_force_w`	-	true	Also force w-velocity (creates vortical structures)
`trip_w_scale`	-	1.0	Scale factor for w forcing (>1 boosts 3D structures)

Important: trip_duration and trip_ramp_off_start are in physical simulation time (compared against current_time_), not in friction time units or step counts.

Velocity Filter

Explicit Laplacian filter for DNS stability. See docs/DNS_CHANNEL_GUIDE.md for tuning guide.

Parameter	CLI	Default	Description
`filter_strength`	-	0.0	Filter coefficient (0 = disabled). Range: 0.01-0.05
`filter_interval`	-	10	Apply filter every N steps (0 = disabled)

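The filter applies a 3D discrete Laplacian: u_new = u + alpha*(Lx+Lz) + alpha_y*Ly. Here Lx, Ly, and Lz are the discrete second differences of u in x, y, and z; alpha = strength*0.25 and alpha_y = alpha*0.5, so the wall-normal direction is filtered at half strength. The filter must be applied before the projection step so that the filtered field is made divergence-free again.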
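Here is a CPU sketch of the filter formula. It rests on explicit assumptions:

- Lx, Ly, Lz are undivided second differences.
- x and z are periodic.
- The first and last y-rows are left unchanged as wall-adjacent cells.

The function name and indexing are illustrative, and the solver's GPU kernel may treat boundaries differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Explicit Laplacian filter on an nx*ny*nz field indexed as i + nx*(j + ny*k).
std::vector<double> laplacian_filter(const std::vector<double>& u, int nx, int ny, int nz,
                                     double strength) {
    assert(static_cast<std::size_t>(nx) * ny * nz == u.size() && ny >= 3);
    const double alpha = 0.25 * strength, alpha_y = 0.5 * alpha;
    auto id = [&](int i, int j, int k) {  // periodic wrap in x and z only
        return static_cast<std::size_t>(((i + nx) % nx) + nx * (j + ny * ((k + nz) % nz)));
    };
    std::vector<double> out(u);  // reads from u, writes to out (no in-place update)
    for (int k = 0; k < nz; ++k)
        for (int j = 1; j < ny - 1; ++j)
            for (int i = 0; i < nx; ++i) {
                const double c = u[id(i, j, k)];
                const double Lx = u[id(i + 1, j, k)] - 2.0 * c + u[id(i - 1, j, k)];
                const double Lz = u[id(i, j, k + 1)] - 2.0 * c + u[id(i, j, k - 1)];
                const double Ly = u[id(i, j + 1, k)] - 2.0 * c + u[id(i, j - 1, k)];
                out[id(i, j, k)] = c + alpha * (Lx + Lz) + alpha_y * Ly;
            }
    return out;
}
```

A checkerboard in x is the highest resolvable mode, and the filter damps it most: each filtered application scales it by (1 - 4·alpha). A constant field passes through unchanged.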

Recycling Inflow

Turbulent inflow BC from downstream recycle plane. See docs/RECYCLING_INFLOW_GUIDE.md for the full guide.

Parameter	CLI	Default	Description
`recycling_inflow`	-	false	Enable recycling inflow at x_lo
`recycle_x`	-	-1.0	x-location of recycle plane (-1 = auto: x_min + 10*delta)
`recycle_shift_z`	-	-1	Spanwise shift in cells (-1 = auto: Nz/4)
`recycle_shift_interval`	-	100	Steps between shift updates (0 = constant)
`recycle_filter_tau`	-	-1.0	AR1 filter timescale (-1 = disabled)
`recycle_fringe_length`	-	-1.0	Fringe zone length (-1 = auto: 2*delta)
`recycle_target_bulk_u`	-	-1.0	Target bulk velocity (-1 = from initial condition)
`recycle_remove_transverse_mean`	-	true	Remove mean v,w at inlet
`recycle_diag_interval`	-	0	Recycling diagnostics frequency (0 = disabled)

Performance Modes

Parameter	CLI	Default	Description
`perf_mode`	-	false	Reduced diagnostics (auto-sets diag_interval=50, poisson_check_interval=5)
`gpu_only_mode`	-	false	Strict GPU-only (no CPU fallbacks, no full-field host reads)
`diag_interval`	-	1	Expensive diagnostics frequency (set >1 for performance)

Poisson Solver

Parameter	CLI	Default	Description
`poisson_solver`	`--poisson`	`auto`	Solver selection (see table below)
`poisson_tol`	`--poisson_tol`	1e-6	Legacy absolute tolerance (deprecated)
`poisson_max_vcycles`	`--poisson_max_vcycles`	20	Maximum V-cycles per solve
`poisson_omega`	-	1.8	SOR relaxation parameter (1 < ω < 2)
`poisson_abs_tol_floor`	`--poisson_abs_tol_floor`	1e-8	Absolute tolerance floor

Poisson solver options:

`--poisson` value	Description	Requirements
`auto`	Auto-select best solver	(default)
`fft`	2D FFT in x-z + tridiagonal in y	3D, periodic x AND z, uniform grid
`fft2d`	1D FFT in x + tridiagonal in y	2D only (Nz=1), periodic x
`fft1d`	1D FFT + 2D Helmholtz per mode	3D, periodic x OR z (one only)
`hypre`	HYPRE PFMG GPU-accelerated	Requires `USE_HYPRE` build
`mg`	Native geometric multigrid	Always available

Auto-selection priority: FFT → FFT2D → FFT1D → HYPRE → MG

Advanced Multigrid Settings

Parameter	CLI	Default	Description
`poisson_tol_abs`	-	0.0	Absolute tolerance on ‖r‖ (0 = disabled)
`poisson_tol_rhs`	-	1e-3	RHS-relative: ‖r‖/‖b‖ (recommended)
`poisson_tol_rel`	-	1e-3	Initial-residual relative: ‖r‖/‖r₀‖
`poisson_check_interval`	-	3	Check convergence every N V-cycles
`poisson_use_l2_norm`	-	true	Use L2 norm (smoother than L∞)
`poisson_linf_safety`	-	10.0	L∞ safety cap multiplier
`poisson_fixed_cycles`	-	8	Fixed V-cycle count (0 = convergence-based)
`poisson_adaptive_cycles`	-	true	Enable adaptive checking in fixed-cycle mode
`poisson_check_after`	-	4	Check residual after this many cycles
`poisson_nu1`	-	0	Pre-smoothing sweeps (0 = auto: 3 for walls)
`poisson_nu2`	-	0	Post-smoothing sweeps (0 = auto: 1)
`poisson_chebyshev_degree`	-	4	Chebyshev polynomial degree (3-4 typical)
`poisson_use_vcycle_graph`	-	true	Enable CUDA Graph for V-cycle (GPU only)

Convergence criteria (any triggers exit):

tol_rhs: ‖r‖/‖b‖ < ε (recommended for projection)
tol_rel: ‖r‖/‖r₀‖ < ε
tol_abs: ‖r‖ < ε

Output

Parameter	CLI	Default	Description
`output_dir`	`--output`	`output/`	Output directory for VTK files
`output_freq`	-	100	Console output frequency (iterations)
`num_snapshots`	`--num_snapshots`	10	Number of VTK snapshots during simulation
`verbose`	`--verbose`	true	Enable verbose output
`postprocess`	`--no_postprocess`	true	Enable Poiseuille table + profile output
`write_fields`	`--no_write_fields`	true	Enable VTK/field output

Performance and Diagnostics

Parameter	CLI	Default	Description
`warmup_iter`	`--warmup_iter`	0	Iterations to run before timing (excluded from benchmarks)
`turb_guard_enabled`	`--turb_guard_enabled`	true	Enable NaN/Inf guard checks
`turb_guard_interval`	`--turb_guard_interval`	5	Check for NaN/Inf every N iterations

Benchmark Mode

The --benchmark flag configures the solver for performance timing with minimal overhead:

```bash
# Run benchmark with defaults (192^3 grid, 20 iterations)
./duct --benchmark

# Override grid size
./duct --benchmark --Nx 256 --Ny 256 --Nz 256

# Override iteration count
./duct --benchmark --max_steps 100
```

Benchmark mode sets these defaults (all can be overridden by subsequent flags):

Setting	Value	Rationale
Grid size	192 × 192 × 192	Large enough for meaningful timing
Domain	3D duct (periodic x, walls y/z)	Representative wall-bounded flow
`verbose`	false	No console output
`postprocess`	false	No profile analysis
`write_fields`	false	No VTK output
`num_snapshots`	0	No intermediate snapshots
`convective_scheme`	upwind	First-order upwind
`poisson_fixed_cycles`	1	Single V-cycle per time step
`turb_model`	none	No turbulence model
`max_steps`	20	Default iteration count
`adaptive_dt`	false	Fixed time step (dt=0.001)

Config File Format

Config files use simple key-value syntax:

```
# Comment lines start with #
Nx = 128
Ny = 256
Re = 5000
turb_model = sst
adaptive_dt = true
```

Load a config file with --config FILE. Command-line arguments override config file values.
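Parsing this format takes only a few lines. The sketch below reads `key = value` pairs, strips `#` comments and surrounding whitespace, and lets command-line values override file values. It is an illustration, not the solver's own reader, and all names are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

using Config = std::map<std::string, std::string>;

Config parse_config(std::istream& in) {
    auto trim = [](const std::string& s) {
        const auto b = s.find_first_not_of(" \t\r");
        if (b == std::string::npos) return std::string();
        const auto e = s.find_last_not_of(" \t\r");
        return s.substr(b, e - b + 1);
    };
    Config kv;
    std::string line;
    while (std::getline(in, line)) {
        const auto hash = line.find('#');
        if (hash != std::string::npos) line.erase(hash);  // strip comments
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;             // blank or malformed line
        const std::string key = trim(line.substr(0, eq));
        if (!key.empty()) kv[key] = trim(line.substr(eq + 1));
    }
    return kv;
}

Config parse_config_text(const std::string& text) {
    std::istringstream in(text);
    return parse_config(in);
}

// Command-line values take precedence over values from the file.
void apply_overrides(Config& kv, const Config& cli) {
    for (const auto& p : cli) kv[p.first] = p.second;
}

// Typed lookup with a fallback default.
double get_double(const Config& kv, const std::string& key, double fallback) {
    assert(!key.empty());
    const auto it = kv.find(key);
    return it == kv.end() ? fallback : std::stod(it->second);
}
```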

GPU Acceleration

All solver components support GPU offload via OpenMP target directives.

Build with GPU Support

```bash
# NVIDIA GPUs (NVHPC compiler)
CC=nvc CXX=nvc++ cmake .. -DCMAKE_BUILD_TYPE=Release -DUSE_GPU_OFFLOAD=ON

# With HYPRE PFMG (fastest Poisson solver)
CC=nvc CXX=nvc++ cmake .. -DUSE_GPU_OFFLOAD=ON -DUSE_HYPRE=ON
```

GPU-Accelerated Components

Momentum equation (convection, diffusion)
Pressure Poisson solver (multigrid V-cycles or HYPRE PFMG)
Turbulence transport equations (k, ω)
EARSM tensor basis computations
Neural network inference
Recycling inflow (plane extraction, shift, mass correction, divergence correction, fringe blending)
Velocity filter
Trip forcing
Adaptive dt computation (directional CFL reductions)

CUDA Graph Optimization

On NVIDIA GPUs with NVHPC compiler, the multigrid V-cycle is captured as a CUDA Graph:

Eliminates per-kernel launch overhead
Single cudaGraphLaunch() replaces O(levels × kernels) launches
Automatically recaptured if boundary conditions change
Disabled automatically for recycling inflow (BCs change each step) and semi-coarsening
Can be disabled via poisson_use_vcycle_graph = false or disable_vcycle_graph() API

GPU-Specific Notes

gpu_only_mode: When enabled, avoids CPU fallbacks and full-field host reads for maximum performance. Diagnostics that require CPU-side data are skipped.
GPU sync: CPU-side diagnostics (statistics, validation) call sync_solution_from_gpu() internally. Custom diagnostic code must sync manually before reading velocity data on the host.
Build with compute capability: To target a specific GPU architecture, pass its compute capability, e.g. cmake .. -DGPU_CC=90 for an H200.

Validation

~79 tests across 6 labels, organized into Tier 1 (CI, every push) and Tier 2 (SLURM, manual). See docs/VALIDATION.md for full results and docs/TESTING_GUIDE.md for how to run and extend the suite.

Analytical Benchmarks

Test Case	Metric	Result
Poiseuille flow	L2 error vs analytical	< 0.2% (2nd-order convergence confirmed)
Taylor-Green vortex (Re=100)	Energy decay vs analytical	Matches analytical viscous decay
Taylor-Green vortex (Re=1600)	Stability through breakdown	Stable on 64^3
MMS convergence	Spatial order	>= 1.8 (2nd-order scheme)

RANS Models (10 closures)

All 10 turbulence models validated for stability, profile shape, and eddy viscosity on stretched grids:

Category	Models	Status
Algebraic	Baseline, GEP	Stable, u+ within 30% of MKM DNS
Transport	SST k-omega, k-omega	Stable (point-implicit destruction fix, March 2026)
EARSM	WJ, GS, Pope	Stable, frame-invariant
Neural Net	MLP, TBNN	Infrastructure validated

Physics Conservation

Property	Criterion
Divergence-free	$\|\nabla \cdot \mathbf{u}\|_\infty < 10^{-10}$
Momentum balance	Body force = wall shear (< 10% imbalance)
Channel symmetry	$u(y) = u(-y)$ (machine precision)
D·G = L (stretched grid)	$< 10^{-10}$ relative error
Galilean invariance	Fluctuating KE matches to $10^{-6}$ across frames

DNS Channel Flow

Test Case	Status	Re_tau Achieved	Reference
Channel Re_tau = 180	Stable (filter-limited)	~255-278	Moser, Kim & Mansour (1999)

GPU Parity

Check	Status
CPU/GPU kernel parity	All kernels match
GPU utilization gate	>= 70% GPU compute time
Cross-backend consistency	CPU and GPU outputs within tolerance

Recycling Inflow

Test	Status	Tolerance
PeriodicVsRecycling	Pass	< 5% shear stress, < 5% streamwise stress
RecyclingInflow (12 checks)	Pass	All passing on CPU and GPU

Dataset

Dataset	Reference
McConkey et al.	Scientific Data 8, 255 (2021)

Training Neural Network Models

Train custom turbulence models on DNS/LES data:

```bash
# Setup environment
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Download dataset (~500 MB)
bash scripts/download_mcconkey_data.sh

# Train TBNN model
python scripts/train_tbnn_mcconkey.py \
    --data_dir mcconkey_data \
    --case channel \
    --output data/models/tbnn_channel \
    --epochs 100

# Use in solver
./channel --model nn_tbnn --nn_preset tbnn_channel
```

Detailed Guides

Guide	Description
`docs/DNS_CHANNEL_GUIDE.md`	DNS channel flow: grid requirements, trip forcing, directional CFL, velocity filter, diagnostics, troubleshooting
`docs/RECYCLING_INFLOW_GUIDE.md`	Recycling inflow BC: theory, configuration, GPU implementation, testing
`docs/POISSON_SOLVER_GUIDE.md`	All Poisson solvers: FFT, MG (semi-coarsening, CUDA Graph), HYPRE, selection guide
`docs/VALIDATION.md`	Validation results: all 79 tests, RANS models, DNS, operator correctness, GPU parity
`docs/TESTING_GUIDE.md`	Testing: how to run, test harness API, adding tests, GPU testing, CI architecture
`docs/HYPRE_POISSON_SOLVER.md`	HYPRE PFMG GPU solver details
`docs/TRAINING_GUIDE.md`	Training neural network turbulence models

References

Numerical Methods

Chorin, A. J. "Numerical solution of the Navier-Stokes equations." Math. Comput. 22.104 (1968): 745-762
Briggs, W. L., Henson, V. E., & McCormick, S. F. A Multigrid Tutorial, 2nd ed. SIAM, 2000

Turbulence Modeling

Menter, F. R. "Two-equation eddy-viscosity turbulence models for engineering applications." AIAA J. 32.8 (1994): 1598-1605
Wilcox, D. C. "Reassessment of the scale-determining equation for advanced turbulence models." AIAA J. 26.11 (1988): 1299-1310
Wallin, S., & Johansson, A. V. "An explicit algebraic Reynolds stress model..." J. Fluid Mech. 403 (2000): 89-132
Gatski, T. B., & Speziale, C. G. "On explicit algebraic stress models..." J. Fluid Mech. 254 (1993): 59-78
Pope, S. B. "A more general effective-viscosity hypothesis." J. Fluid Mech. 72.2 (1975): 331-340

Neural Network Closures

Ling, J., Kurzawski, A., & Templeton, J. "Reynolds averaged turbulence modelling using deep neural networks with embedded invariance." J. Fluid Mech. 807 (2016): 155-166
Weatheritt, J., & Sandberg, R. D. "A novel evolutionary algorithm applied to algebraic modifications of the RANS stress-strain relationship." J. Comput. Phys. 325 (2016): 22-37

DNS and Inflow Methods

Moser, R. D., Kim, J., & Mansour, N. N. "Direct numerical simulation of turbulent channel flow up to Re_tau = 590." Physics of Fluids 11.4 (1999): 943-945
Lund, T. S., Wu, X., & Squires, K. D. "Generation of turbulent inflow data for spatially-developing boundary layer simulations." J. Comput. Phys. 140.2 (1998): 233-258

Dataset

McConkey, R., et al. "A curated dataset for data-driven turbulence modelling." Scientific Data 8 (2021): 255

License

MIT License - see license file
