PrimisAI/nexus-benchmarks

Nexus Benchmark

A benchmarking suite for the PrimisAI Nexus framework, focusing on mathematical problem solving and EDA-aware RTL generation.

Overview

This repository contains benchmark tests and evaluation scripts for two domains:

Mathematical Problem Solving

  • Uses symbolic computation to solve mathematical problems
  • Benchmark problems are sourced from "Measuring Mathematical Problem Solving With the MATH Dataset" (arXiv:2103.03874)
  • Covers number theory, algebra, geometry, intermediate algebra, and counting and probability

EDA-aware RTL Generation

  • Automated Verilog code generation with verification
  • Uses the benchmark suite from "VerilogEval: Evaluating Large Language Models for Verilog Code Generation" (arXiv:2309.07544)
  • Includes automated syntax checking and functional verification

Prerequisites

  • Python 3.8+
  • Icarus Verilog (for RTL evaluation)
  • Access to an LLM API
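
The first two prerequisites can be checked from Python before running anything. The following is a minimal sketch, not part of the repository: the function name `check_prerequisites` is hypothetical, and it assumes that Icarus Verilog exposes its compiler as `iverilog` on the PATH.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version taken from the prerequisites list


def check_prerequisites(need_iverilog=True):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, found {found}")
    # Icarus Verilog is only needed for the RTL evaluation.
    if need_iverilog and shutil.which("iverilog") is None:
        problems.append("iverilog not found on PATH (needed for RTL evaluation)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("MISSING:", problem)
```

Pass `need_iverilog=False` when only the mathematics evaluation will be run.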

Installation

  1. Clone the repository:
     git clone https://github.com/PrimisAI/nexus-benchmark.git
     cd nexus-benchmark
  2. Install required Python packages:
     pip install -r requirements.txt
  3. Set up environment variables by creating a .env file in the root directory with the following:
     LLM_API_KEY=your_api_key_here
     LLM_MODEL=gpt-4
     LLM_BASE_URL=your_api_base_url
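
To confirm that the .env file defines all three variables before starting a run, something like the sketch below may help. It is an illustration only: how the repository actually loads the file is not stated here, and the helper names `parse_env_text` and `missing_keys` are hypothetical.

```python
REQUIRED_KEYS = ("LLM_API_KEY", "LLM_MODEL", "LLM_BASE_URL")


def parse_env_text(text):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as fh:
        missing = missing_keys(parse_env_text(fh.read()))
    print("Missing keys:", missing if missing else "none")
```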

Running Evaluations

Mathematics Evaluation

Run the mathematics evaluation script:

python evaluations/math/generate_results.py

This will:

  • Process benchmark problems from different mathematical domains
  • Generate solutions using the Nexus mathematics workflow

Generated results are available at:

evaluations/math/generated_results/
├── 1_number_theory_227/
├── 2_algebra_2/
├── 3_geometry_1140/
├── 4_intermediate_algebra_24256/
└── 5_counting_and_probability_25780/
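
The directory names above appear to follow an `<order>_<domain>_<number>` pattern. The sketch below splits a name into those three parts, assuming that pattern holds for every result directory. Reading the trailing number as a problem identifier is a guess, so the field is simply called `number`. `ResultDir` and `parse_result_dir` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class ResultDir(NamedTuple):
    order: int    # leading index, e.g. 1
    domain: str   # e.g. "number_theory"
    number: int   # trailing number; its exact meaning is an assumption


# Leading digits, a lowercase/underscore domain, trailing digits, optional slash.
_PATTERN = re.compile(r"^(\d+)_([a-z_]+)_(\d+)/?$")


def parse_result_dir(name: str) -> Optional[ResultDir]:
    """Split a result directory name into its parts, or return None if it does not match."""
    match = _PATTERN.match(name)
    if match is None:
        return None
    return ResultDir(int(match.group(1)), match.group(2), int(match.group(3)))


if __name__ == "__main__":
    print(parse_result_dir("4_intermediate_algebra_24256/"))
```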

Verilog RTL Evaluation

  1. Ensure Icarus Verilog is installed:
     # Ubuntu/Debian
     sudo apt-get install iverilog
     # macOS
     brew install icarus-verilog
  2. Run the Verilog evaluation script (with self-verification):
     python evaluations/verilog_eval/generate_results.py

     Generated results (with claude-3-5-sonnet-20241022-v2) are available at:

     evaluations/verilog_eval/generated_results/results.jsonl
  3. Alternatively, run without self-verification:
     python evaluations/verilog_eval/generate_results_no_sv.py

     Generated results (with claude-3-5-sonnet-20241022-v2) without self-verification are available at:

     evaluations/verilog_eval/generated_results/results_no_sv.jsonl
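
Both result files use the `.jsonl` extension, which suggests one JSON record per line. A minimal reader might look like the sketch below. The record schema is not documented here, so the code only loads and counts records. `read_jsonl` and `load_results` are hypothetical helper names.

```python
import json


def read_jsonl(lines):
    """Parse an iterable of JSON Lines strings into a list, skipping blank lines."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
    return records


def load_results(path):
    """Read every record from a .jsonl file on disk."""
    with open(path, encoding="utf-8") as fh:
        return read_jsonl(fh)


if __name__ == "__main__":
    results = load_results("evaluations/verilog_eval/generated_results/results.jsonl")
    print(f"{len(results)} records loaded")
```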

Project Structure

├── benchmarks/          # Test cases and problem sets
├── evaluations/         # Evaluation scripts and results
└── nexus_workflows/     # Core workflow implementations

Workflows

Mathematics Workflow

  • Uses SymPy for symbolic mathematics
  • Three-agent architecture: Mathematician → Reviewer → Supervisor
  • Supports operations: differentiation, integration, simplification, equation solving, etc.
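
For readers unfamiliar with SymPy, the listed operations correspond to standard SymPy calls. The sketch below shows one call per operation on an arbitrary example polynomial. It illustrates SymPy itself, not how the Nexus workflow invokes it.

```python
import sympy as sp

x = sp.symbols("x")
expr = x**3 - 6*x**2 + 11*x - 6  # example polynomial, equal to (x - 1)(x - 2)(x - 3)

# Differentiation
derivative = sp.diff(expr, x)  # 3*x**2 - 12*x + 11

# Integration (indefinite; SymPy omits the constant of integration)
antiderivative = sp.integrate(expr, x)

# Simplification
simplified = sp.simplify(sp.sin(x)**2 + sp.cos(x)**2)  # 1

# Equation solving
roots = sp.solve(sp.Eq(expr, 0), x)  # [1, 2, 3]

if __name__ == "__main__":
    print(derivative, antiderivative, simplified, roots, sep="\n")
```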

RTL Generation Workflow

  • EDA-aware Verilog code generation
  • Four-agent architecture: Coder → Reviewer → Verifier → Supervisor
  • Automated syntax checking and functional verification
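
A syntax check with Icarus Verilog can be sketched as follows. This is an assumption about how such a check could be done, not the repository's implementation. It uses the `-t null` target so that no output file is generated, and `-g2012` to select the language generation. The function names are hypothetical.

```python
import shutil
import subprocess


def build_syntax_check_cmd(verilog_path, generation="2012"):
    """Build an iverilog command that checks a file without generating output."""
    return ["iverilog", f"-g{generation}", "-t", "null", verilog_path]


def check_syntax(verilog_path):
    """Return (ok, stderr), or None when iverilog is not installed."""
    if shutil.which("iverilog") is None:
        return None
    proc = subprocess.run(
        build_syntax_check_cmd(verilog_path),
        capture_output=True,
        text=True,
    )
    # A zero exit status means iverilog reported no errors.
    return proc.returncode == 0, proc.stderr


if __name__ == "__main__":
    print(check_syntax("top_module.v"))  # hypothetical file name
```

Functional verification would additionally require compiling the design together with a testbench and running the simulation, which this sketch does not cover.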

Case Studies

FPGA Design Optimization using Vivado

This case study demonstrates a practical application of the Nexus framework to automating FPGA design optimization:

  • Objective: Automate Power, Performance, Area (PPA) optimization using Vivado CLI
  • Implementation:
    • Interactive Streamlit interface for design optimization
    • Automated TCL command generation and constraint manipulation
    • Real-time parsing of Vivado reports for power, timing, and utilization
  • Results: Available in evaluations/ppa/experiment_chats/

For detailed instructions on setting up and running this case study, refer to the PPA Optimization Case Study Documentation.
