Single Benchmark Directions Don't Work #2

@ForbiddenEra

Dude, c'mon, don't just post slop.

Instructions say:

# Start server with speculative decoding
llama-server \
  -m /path/to/target-model.gguf \
  --model-draft /path/to/draft-model.gguf \
  -ngl 99 -c 4096 --port 8080

# In another terminal, run benchmark
python bench.py --url http://127.0.0.1:8080 --requests 5 --max-tokens 512

So I run a server... and... lol

#$ python3 bench.py --url http://127.0.0.1:8123 --requests 5 --max-tokens 512
usage: bench.py [-h] --base-url BASE_URL --model MODEL [--api-key API_KEY] [--compare-url COMPARE_URL] [--compare-model COMPARE_MODEL] [--compare-api-key COMPARE_API_KEY] [--compare-label COMPARE_LABEL]
                [--label LABEL] [--runs RUNS] [--max-tokens MAX_TOKENS] [--temperature TEMPERATURE] [--prompt PROMPT]
bench.py: error: the following arguments are required: --base-url, --model
#$ python3 bench.py --base-url http://127.0.0.1:8123 --requests 5 --max-tokens 512
usage: bench.py [-h] --base-url BASE_URL --model MODEL [--api-key API_KEY] [--compare-url COMPARE_URL] [--compare-model COMPARE_MODEL] [--compare-api-key COMPARE_API_KEY] [--compare-label COMPARE_LABEL]
                [--label LABEL] [--runs RUNS] [--max-tokens MAX_TOKENS] [--temperature TEMPERATURE] [--prompt PROMPT]
bench.py: error: the following arguments are required: --model
#$ python3 bench.py --base-url http://127.0.0.1:8123 --requests 5 --max-tokens 512 --model /mnt/models/qwen/bartowski/Qwen_Qwen2.5-VL-72B-Instruct-Q8_0-00001-of-00002.gguf
usage: bench.py [-h] --base-url BASE_URL --model MODEL [--api-key API_KEY] [--compare-url COMPARE_URL] [--compare-model COMPARE_MODEL] [--compare-api-key COMPARE_API_KEY] [--compare-label COMPARE_LABEL]
                [--label LABEL] [--runs RUNS] [--max-tokens MAX_TOKENS] [--temperature TEMPERATURE] [--prompt PROMPT]
bench.py: error: unrecognized arguments: --requests 5
#$ python3 bench.py --base-url http://127.0.0.1:8123 --max-tokens 512 --model /mnt/models/qwen/bartowski/Qwen_Qwen2.5-VL-72B-Instruct-Q8_0-00001-of-00002.gguf 
======================================================================
  draftbench - OpenAI-compatible endpoint benchmark
======================================================================
  Endpoint : http://127.0.0.1:8123
  Model    : /mnt/models/qwen/bartowski/Qwen_Qwen2.5-VL-72B-Instruct-Q8_0-00001-of-00002.gguf
  Prompts  : 3
  Runs     : 1
  MaxTok   : 512
  Temp     : 0.0
======================================================================

  [baseline] request 1/3  ERROR: HTTPConnectionPool(host='127.0.0.1', port=8123): Read timed out. (read timeout=120)
  [baseline] request 2/3 ...
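
For reference, a back-of-the-envelope check on that timeout. This is only a sketch: it assumes the benchmark sends non-streaming requests, so the whole completion has to arrive inside the 120 s read timeout shown in the error. `min_tokens_per_second` is a name made up for illustration, not something in bench.py.

```python
def min_tokens_per_second(max_tokens: int, read_timeout_s: float) -> float:
    """Lowest generation speed that finishes max_tokens before the read timeout.

    Assumes a non-streaming request and ignores prompt processing time,
    so the real requirement is somewhat higher.
    """
    if read_timeout_s <= 0:
        raise ValueError("read_timeout_s must be positive")
    return max_tokens / read_timeout_s


# Values taken from the run above: --max-tokens 512, read timeout=120
needed = min_tokens_per_second(512, 120)
print(f"need at least {needed:.2f} tok/s")
```

A 72B Q8_0 model that is mostly offloaded to system RAM may well run slower than that, which would explain the timeout even once the flags are right.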

Also, I need to be able to pass offload splits (e.g. -ngl 9 -ts 8,1), so I'm not sure I can even try the server runner. (Unless you wanna share some hardware with a bro: I'm stuck trying to split a 12 GB 3060 and a 6 GB 2060 on an old Xeon server with 768 GB of RAM, or use an M1 Studio with 64 GB.)
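
To make the ask concrete, here is roughly what I mean by passing splits through. A sketch only: `build_server_cmd` and its `extra_args` parameter are hypothetical and not part of the tool. The llama-server flags are the ones from the instructions above plus my -ngl 9 -ts 8,1.

```python
import shlex


def build_server_cmd(target, draft, extra_args="", port=8080, ctx=4096):
    """Build a llama-server argv, appending user-supplied flags verbatim.

    Hypothetical helper: extra_args is a shell-style string such as
    "-ngl 9 -ts 8,1" that is split and passed through unchanged.
    """
    cmd = [
        "llama-server",
        "-m", target,
        "--model-draft", draft,
        "-c", str(ctx),
        "--port", str(port),
    ]
    # Pass-through flags go last so the user controls offload and split.
    cmd += shlex.split(extra_args)
    return cmd


cmd = build_server_cmd(
    "/path/to/target-model.gguf",
    "/path/to/draft-model.gguf",
    extra_args="-ngl 9 -ts 8,1",
    port=8123,
)
print(shlex.join(cmd))
```

Anything along these lines would do, as long as the extra flags reach llama-server untouched.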

Was really hoping I could get a bit more out of my sh!tty server with this tool :)