A comprehensive benchmarking suite for the PrimisAI Nexus framework, focusing on mathematical problem-solving and EDA-aware RTL generation capabilities.
This repository contains benchmark tests and evaluation frameworks for two main domains:
**Mathematics**

- Uses symbolic computation for solving complex mathematical problems
- Benchmark problems sourced from "Measuring Mathematical Problem Solving With the MATH Dataset" (arXiv:2103.03874)
- Covers various domains including number theory, algebra, geometry, and probability

**RTL Generation**

- Automated Verilog code generation with verification capabilities
- Uses the benchmark suite from "VerilogEval: Evaluating Large Language Models for Verilog Code Generation" (arXiv:2309.07544)
- Includes automated syntax checking and functional verification
- Python 3.8+
- Icarus Verilog (for RTL evaluation)
- Access to LLM API
- Clone the repository:

  ```shell
  git clone https://github.com/PrimisAI/nexus-benchmark.git
  cd nexus-benchmark
  ```

- Install required Python packages:

  ```shell
  pip install -r requirements.txt
  ```

- Set up environment variables: create a `.env` file in the root directory with the following:

  ```
  LLM_API_KEY=your_api_key_here
  LLM_MODEL=gpt-4
  LLM_BASE_URL=your_api_base_url
  ```
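As a minimal stdlib-only sketch of how those `.env` values can be made available to the evaluation scripts (the repository may instead use a package such as python-dotenv; `load_env` is an illustrative helper, not part of the repo):

```python
import os

def load_env(path: str = ".env") -> None:
    """Parse KEY=VALUE pairs from a .env file into os.environ.

    Skips blank lines and '#' comments; existing environment
    variables take precedence over file values.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
```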
Run the mathematics evaluation script:

```shell
python evaluations/math/generate_results.py
```

This will:
- Process benchmark problems from different mathematical domains
- Generate solutions using the Nexus mathematics workflow
Generated results are available at:

```
evaluations/math/generated_results/
├── 1_number_theory_227/
├── 2_algebra_2/
├── 3_geometry_1140/
├── 4_intermediate_algebra_24256/
└── 5_counting_and_probability_25780/
```
- Ensure Icarus Verilog is installed:

  ```shell
  # Ubuntu/Debian
  sudo apt-get install iverilog

  # macOS
  brew install icarus-verilog
  ```

- Run the Verilog evaluation script (with self-verification):

  ```shell
  python evaluations/verilog_eval/generate_results.py
  ```

  Generated results (with claude-3-5-sonnet-20241022-v2) are available at `evaluations/verilog_eval/generated_results/results.jsonl`.

- Alternatively, run without self-verification:

  ```shell
  python evaluations/verilog_eval/generate_results_no_sv.py
  ```

  Generated results (with claude-3-5-sonnet-20241022-v2) without self-verification are available at `evaluations/verilog_eval/generated_results/results_no_sv.jsonl`.
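Each line of a `.jsonl` file is one JSON record, so the results can be inspected with a short stdlib-only loop. A minimal sketch (the field names used in the usage comment are illustrative assumptions, not the repository's documented schema):

```python
import json

def iter_results(path: str):
    """Yield one parsed JSON record per non-empty line of a JSONL file."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)

# Illustrative usage (assumes a hypothetical "passed" field):
# passed = sum(1 for r in iter_results("results.jsonl") if r.get("passed"))
```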
```
├── benchmarks/        # Test cases and problem sets
├── evaluations/       # Evaluation scripts and results
├── nexus_workflows/   # Core workflow implementations
```
- Uses SymPy for symbolic mathematics
- Three-agent architecture: Mathematician → Reviewer → Supervisor
- Supports operations: differentiation, integration, simplification, equation solving, etc.
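The listed operations map directly onto SymPy's symbolic API; a minimal sketch of the kinds of calls the mathematics workflow can delegate to SymPy:

```python
import sympy as sp

x = sp.symbols("x")
expr = x**2 + 2*x + 1

derivative = sp.diff(expr, x)         # differentiation -> 2*x + 2
antideriv = sp.integrate(expr, x)     # integration -> x**3/3 + x**2 + x
factored = sp.factor(expr)            # simplification -> (x + 1)**2
roots = sp.solve(sp.Eq(expr, 0), x)   # equation solving -> [-1]
```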
- EDA-aware Verilog code generation
- Four-agent architecture: Coder → Reviewer → Verifier → Supervisor
- Automated syntax checking and functional verification
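A syntax check of this kind can be built on the Icarus Verilog compiler from the prerequisites: compile the generated source and treat a zero exit code as "syntax OK". A minimal sketch (`verilog_syntax_ok` is an illustrative helper, not the repository's verifier; assumes `iverilog` is on `PATH`):

```python
import os
import subprocess
import tempfile

def verilog_syntax_ok(source: str) -> bool:
    """Compile a Verilog snippet with iverilog; True iff it compiles."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "dut.v")
        out = os.path.join(tmp, "dut.out")
        with open(src, "w") as fh:
            fh.write(source)
        result = subprocess.run(
            ["iverilog", "-o", out, src],
            capture_output=True, text=True,
        )
        return result.returncode == 0
```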
This case study demonstrates the practical application of the Nexus framework in automating FPGA design optimization:
- Objective: Automate Power, Performance, Area (PPA) optimization using Vivado CLI
- Implementation:
- Interactive Streamlit interface for design optimization
- Automated TCL command generation and constraint manipulation
- Real-time parsing of Vivado reports for power, timing, and utilization
- Results: available in `evaluations/ppa/experiment_chats/`
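The "real-time parsing of Vivado reports" step amounts to extracting labeled numeric values from report text. A minimal sketch (the report line in the test is illustrative, not an exact Vivado report excerpt, and `parse_metric` is a hypothetical helper):

```python
import re

def parse_metric(report_text: str, label: str):
    """Return the first numeric value following `label` in report text,
    e.g. a power, timing, or utilization figure; None if absent."""
    match = re.search(
        rf"{re.escape(label)}\s*[:|]\s*([-\d.]+)", report_text)
    return float(match.group(1)) if match else None
```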
For detailed instructions on setting up and running this case study, please refer to: PPA Optimization Case Study Documentation