A comprehensive benchmarking suite for the PrimisAI Nexus framework, focusing on mathematical problem-solving and EDA-aware RTL generation capabilities.
This repository contains benchmark tests and evaluation frameworks for two main domains:
**Mathematics**

- Uses symbolic computation for solving complex mathematical problems
- Benchmark problems sourced from "Measuring Mathematical Problem Solving With the MATH Dataset" (arXiv:2103.03874)
- Covers various domains including number theory, algebra, geometry, and probability

**RTL Generation**

- Automated Verilog code generation with verification capabilities
- Uses the benchmark suite from "VerilogEval: Evaluating Large Language Models for Verilog Code Generation" (arXiv:2309.07544)
- Includes automated syntax checking and functional verification
- Python 3.8+
- Icarus Verilog (for RTL evaluation)
- Access to LLM API
- Clone the repository:

  ```shell
  git clone https://github.com/PrimisAI/nexus-benchmark.git
  cd nexus-benchmark
  ```

- Install required Python packages:

  ```shell
  pip install -r requirements.txt
  ```

- Set up environment variables: create a `.env` file in the root directory with the following:

  ```
  LLM_API_KEY=your_api_key_here
  LLM_MODEL=gpt-4
  LLM_BASE_URL=your_api_base_url
  ```
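As a minimal stdlib-only sketch of how those `.env` values can be made available to the evaluation scripts (the repository may instead use a package such as python-dotenv; `load_env` is an illustrative helper, not part of the repo):

```python
import os

def load_env(path: str = ".env") -> None:
    """Parse KEY=VALUE pairs from a .env file into os.environ.

    Skips blank lines and '#' comments; existing environment
    variables take precedence over file values.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
```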
Run the mathematics evaluation script:

```shell
python evaluations/math/generate_results.py
```

This will:
- Process benchmark problems from different mathematical domains
- Generate solutions using the Nexus mathematics workflow
Generated results are available at:

```
evaluations/math/generated_results/
├── 1_number_theory_227/
├── 2_algebra_2/
├── 3_geometry_1140/
├── 4_intermediate_algebra_24256/
└── 5_counting_and_probability_25780/
```
- Ensure Icarus Verilog is installed:

  ```shell
  # Ubuntu/Debian
  sudo apt-get install iverilog

  # macOS
  brew install icarus-verilog
  ```

- Run the Verilog evaluation script (with self-verification):

  ```shell
  python evaluations/verilog_eval/generate_results.py
  ```

  Generated results (with claude-3-5-sonnet-20241022-v2) are available at `evaluations/verilog_eval/generated_results/results.jsonl`.

- Alternatively, run without self-verification:

  ```shell
  python evaluations/verilog_eval/generate_results_no_sv.py
  ```

  Generated results (with claude-3-5-sonnet-20241022-v2) without self-verification are available at `evaluations/verilog_eval/generated_results/results_no_sv.jsonl`.
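Each line of a `.jsonl` file is one JSON record, so the results can be inspected with a short stdlib-only loop. A minimal sketch (the field names used in the usage comment are illustrative assumptions, not the repository's documented schema):

```python
import json

def iter_results(path: str):
    """Yield one parsed JSON record per non-empty line of a JSONL file."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)

# Illustrative usage (assumes a hypothetical "passed" field):
# passed = sum(1 for r in iter_results("results.jsonl") if r.get("passed"))
```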
```
├── benchmarks/        # Test cases and problem sets
├── evaluations/       # Evaluation scripts and results
├── nexus_workflows/   # Core workflow implementations
```
- Uses SymPy for symbolic mathematics
- Three-agent architecture: Mathematician → Reviewer → Supervisor
- Supports operations: differentiation, integration, simplification, equation solving, etc.
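The listed operations map directly onto SymPy's symbolic API; a minimal sketch of the kinds of calls the mathematics workflow can delegate to SymPy:

```python
import sympy as sp

x = sp.symbols("x")
expr = x**2 + 2*x + 1

derivative = sp.diff(expr, x)         # differentiation -> 2*x + 2
antideriv = sp.integrate(expr, x)     # integration -> x**3/3 + x**2 + x
factored = sp.factor(expr)            # simplification -> (x + 1)**2
roots = sp.solve(sp.Eq(expr, 0), x)   # equation solving -> [-1]
```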
- EDA-aware Verilog code generation
- Four-agent architecture: Coder → Reviewer → Verifier → Supervisor
- Automated syntax checking and functional verification
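A syntax check of this kind can be built on the Icarus Verilog compiler from the prerequisites: compile the generated source and treat a zero exit code as "syntax OK". A minimal sketch (`verilog_syntax_ok` is an illustrative helper, not the repository's verifier; assumes `iverilog` is on `PATH`):

```python
import os
import subprocess
import tempfile

def verilog_syntax_ok(source: str) -> bool:
    """Compile a Verilog snippet with iverilog; True iff it compiles."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "dut.v")
        out = os.path.join(tmp, "dut.out")
        with open(src, "w") as fh:
            fh.write(source)
        result = subprocess.run(
            ["iverilog", "-o", out, src],
            capture_output=True, text=True,
        )
        return result.returncode == 0
```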
This case study demonstrates the practical application of the Nexus framework in automating FPGA design optimization:
- Objective: Automate Power, Performance, Area (PPA) optimization using Vivado CLI
- Implementation:
- Interactive Streamlit interface for design optimization
- Automated TCL command generation and constraint manipulation
- Real-time parsing of Vivado reports for power, timing, and utilization
- Results: available in `evaluations/ppa/experiment_chats/`
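The "real-time parsing of Vivado reports" step amounts to extracting labeled numeric values from report text. A minimal sketch (the report line in the test is illustrative, not an exact Vivado report excerpt, and `parse_metric` is a hypothetical helper):

```python
import re

def parse_metric(report_text: str, label: str):
    """Return the first numeric value following `label` in report text,
    e.g. a power, timing, or utilization figure; None if absent."""
    match = re.search(
        rf"{re.escape(label)}\s*[:|]\s*([-\d.]+)", report_text)
    return float(match.group(1)) if match else None
```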
For detailed instructions on setting up and running this case study, please refer to: PPA Optimization Case Study Documentation