Releases: lavantien/llm-tournament

v1.7

09 Mar 17:14

LLM Tournament v1.7 Release Notes

Release Date: March 10, 2025

Overview

Version 1.7 introduces a tiered scoring system that gives a more nuanced framework for evaluating LLMs. This release features an 11-tier classification with spiritual names, a revised color scheme, and improved score distribution mechanisms for more accurate model ranking.

New Features

Tiered Scoring System

  • Implemented an 11-tier classification system with spiritual names (Divine, Legendary, Mythical, etc.)
  • Updated the scoring scale to a 0-3000+ Elo range
  • Introduced a color scheme with reversed colors for better visual hierarchy
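
As a rough illustration of how a score on the 0-3000+ scale could be mapped to one of 11 tiers, here is a minimal Go sketch. Only the names Divine, Legendary, and Mythical come from these notes. Their relative order, the remaining placeholder names, every threshold, and the identifiers tier, tiers, and tierFor are assumptions made for the example, not the project's actual values.

```go
package main

import "fmt"

// tier pairs a display name with the minimum score needed to reach it.
type tier struct {
	name     string
	minScore int
}

// tiers is ordered from highest to lowest. Only the first three names appear
// in the release notes; the other names and all thresholds are hypothetical
// placeholders chosen for illustration.
var tiers = []tier{
	{"Divine", 3000},
	{"Legendary", 2700},
	{"Mythical", 2400},
	{"Tier 4", 2100},
	{"Tier 5", 1800},
	{"Tier 6", 1500},
	{"Tier 7", 1200},
	{"Tier 8", 900},
	{"Tier 9", 600},
	{"Tier 10", 300},
	{"Tier 11", 0},
}

// tierFor returns the name of the highest tier whose threshold the score
// meets. Negative scores fall back to the lowest tier.
func tierFor(score int) string {
	for _, t := range tiers {
		if score >= t.minScore {
			return t.name
		}
	}
	return tiers[len(tiers)-1].name
}

func main() {
	for _, s := range []int{3150, 2750, 1200, 0} {
		fmt.Printf("%4d -> %s\n", s, tierFor(s))
	}
}
```

Keeping the thresholds in one ordered table means a tier can be renamed or a cut-off moved without touching the lookup logic.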

Mock Data Generation

  • Added even distribution functionality to mock results generation
  • Updated score weights for random score generation
  • Implemented proper random number generator seeding for more consistent testing
  • Updated mock scores generation to support the 11-tier system
  • Added specific score distribution for tier testing
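
The seeding and even-distribution items above might look roughly like the following Go sketch. It assumes "even distribution" means cycling through equal-width score bands so each band receives about the same number of mock scores. The names newSeededRNG and evenMockScores, the band count, and the score ceiling are hypothetical and not taken from the project.

```go
package main

import (
	"fmt"
	"math/rand"
)

// newSeededRNG returns a generator with a fixed seed, so repeated test runs
// produce the same sequence of mock scores.
func newSeededRNG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// evenMockScores generates n scores, cycling through `bands` equal-width
// bands of [0, maxScore) so every band gets roughly the same share.
// It returns nil when the arguments cannot produce a valid band width.
func evenMockScores(rng *rand.Rand, n, bands, maxScore int) []int {
	if n <= 0 || bands <= 0 || maxScore < bands {
		return nil
	}
	width := maxScore / bands // at least 1 because maxScore >= bands
	scores := make([]int, n)
	for i := range scores {
		band := i % bands
		scores[i] = band*width + rng.Intn(width)
	}
	return scores
}

func main() {
	rng := newSeededRNG(42)
	fmt.Println(evenMockScores(rng, 11, 11, 3000))
}
```

With a fixed seed the same scores come back on every run, which is what makes tier-specific tests repeatable.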

Model Updates

  • Added phi-4-mini to the model roster

Bug Fixes

  • Fixed UI flicker and state issues with "Generate Random Mock Scores" button
  • Ensured consistent score calculation and display throughout the application
  • Fixed sorting consistency in stats chart
  • Ensured total score chart sorts in descending order
  • Added protection to ensure positive arguments for random number generation
  • Fixed client-side handling of sorted data to prevent display inconsistencies
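
Two of the fixes above lend themselves to a short Go sketch: guarding the argument passed to the random number generator, and sorting totals in descending order with a deterministic tie-break. Go's rand.Intn panics when its argument is not positive, which is the kind of failure such a guard avoids. The names safeIntn, modelTotal, and sortDescending, and the sample totals, are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// modelTotal is a hypothetical row of the total score chart.
type modelTotal struct {
	Model string
	Total int
}

// safeIntn wraps rng.Intn, which panics when n <= 0. For non-positive n it
// returns 0 instead of calling the generator.
func safeIntn(rng *rand.Rand, n int) int {
	if n <= 0 {
		return 0
	}
	return rng.Intn(n)
}

// sortDescending orders rows by total score, highest first. Ties are broken
// by model name so repeated renders produce the same order.
func sortDescending(totals []modelTotal) {
	sort.SliceStable(totals, func(i, j int) bool {
		if totals[i].Total != totals[j].Total {
			return totals[i].Total > totals[j].Total
		}
		return totals[i].Model < totals[j].Model
	})
}

func main() {
	rng := rand.New(rand.NewSource(1))
	fmt.Println(safeIntn(rng, 0)) // guarded: no panic

	totals := []modelTotal{
		{"model-b", 1200},
		{"phi-4-mini", 2100},
		{"model-a", 1200},
	}
	sortDescending(totals)
	fmt.Println(totals)
}
```

Sorting once on the server with an explicit tie-break gives the client a stable order to display, rather than leaving equal totals in an arbitrary position.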

Under the Hood

  • Removed unused math import
  • Updated tier names to single words for clarity and consistency
  • Various code optimizations for better performance

Breaking Changes

None. All changes are backward compatible with existing data.

Upgrade Instructions

Standard update procedure:

  1. Pull the latest code
  2. Restart the application

No data migration is required for this update.

v1.6

06 Mar 19:30

Changelog - v1.6 (March 2025)

🚀 New Features

  • Added screenshots to README showcasing the program
  • Enhanced UI with emoji button replacements for improved intuitiveness
  • Added "Previous" button in results page to restore previous state after changes
  • Improved evaluate page with markdown-formatted prompt and solution display
  • Added copy button to capture raw markdown prompt text
  • Added Previous/Next navigation buttons for prompt browsing
  • Added math functions to template system for advanced calculations
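
Math functions in a Go template system are typically registered through a template.FuncMap. The sketch below assumes the app uses html/template and shows how an add helper could turn a zero-based prompt index into the "Prompt N of M" label used for Previous/Next navigation. The helper names add and sub, the Index field, and renderPosition are hypothetical. TotalPrompts mirrors the template data field mentioned in this changelog.

```go
package main

import (
	"bytes"
	"fmt"
	"html/template"
)

// mathFuncs exposes simple integer arithmetic to templates.
var mathFuncs = template.FuncMap{
	"add": func(a, b int) int { return a + b },
	"sub": func(a, b int) int { return a - b },
}

// renderPosition renders a 1-based position label from a 0-based index.
func renderPosition(index, total int) (string, error) {
	tmpl, err := template.New("pos").Funcs(mathFuncs).
		Parse(`Prompt {{add .Index 1}} of {{.TotalPrompts}}`)
	if err != nil {
		return "", err
	}
	data := struct{ Index, TotalPrompts int }{index, total}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	label, err := renderPosition(0, 20)
	if err != nil {
		fmt.Println("render error:", err)
		return
	}
	fmt.Println(label) // prints "Prompt 1 of 20"
}
```

Funcs must be called before Parse so that the parser recognizes the helper names.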

🛠️ Fixes

  • Fixed various UI rendering issues in results page:
    • Ensured table rows display correctly with improved debug logging
    • Corrected template type mismatch in score button coloring
    • Added nil check to ModelFilter in results template
    • Fixed score buttons that were displaying too large
    • Ensured raw markdown is copied instead of processed HTML
  • Resolved WebSocket handling issues:
    • Improved error logging
    • Enhanced connection stability
    • Fixed initial table rendering on first load
  • Fixed JSON parsing error in "Generate Random Mock Scores" functionality
  • Fixed prompt index display in evaluate page
  • Added TotalPrompts field to template data for proper rendering
  • Optimized button arrangement for better usability

💅 UI Improvements

  • Refactored overall UI for improved user experience
  • Enhanced results page layout and design
  • Center-aligned model and prompt headers for better readability
  • Reduced progress bar region width to 20% of screen for more balanced display

📦 Updates

  • Updated model definitions and configurations
  • Updated Anthropic Claude Thinking 96K pipeline to v0.4

v1.5

01 Mar 18:46
  • add more tools: openwebui/pipes/anthropic-claude-thinking-96k
  • add tools section to readme
  • fix chart overflow bug
  • beautify the UI
  • update screenshots

v1.4

28 Feb 20:12
  • modularize the code base
  • granular scoring schemes and matching color scheme for cells
  • comprehensive stats page with tiered ranking
  • random persistent mock scores generator and ensure live update
  • fix prompt suite rename issue
  • import/export to json instead of csv
  • enhance websockets logic and stability
  • enhance readme quality
  • streamlined the prompt and contestant counts to 20
  • support aider configs for o1 high, o3-mini high, v3, r1, 3.7 sonnet, codestral
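
For the switch from CSV to JSON import/export, a round trip with Go's encoding/json could look like the sketch below. The result struct, its field names, and the functions exportResults and importResults are hypothetical. The project's actual file layout is not described in these notes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// result is a hypothetical per-model record of prompt scores.
type result struct {
	Model  string `json:"model"`
	Scores []int  `json:"scores"`
}

// exportResults serializes results as indented JSON.
func exportResults(rs []result) ([]byte, error) {
	return json.MarshalIndent(rs, "", "  ")
}

// importResults parses JSON produced by exportResults.
func importResults(data []byte) ([]result, error) {
	var rs []result
	if err := json.Unmarshal(data, &rs); err != nil {
		return nil, err
	}
	return rs, nil
}

func main() {
	data, err := exportResults([]result{{Model: "example-model", Scores: []int{100, 80}}})
	if err != nil {
		fmt.Println("export error:", err)
		return
	}
	fmt.Println(string(data))
}
```

Unlike CSV, JSON keeps nested data such as a per-prompt score list in a single record without a custom column layout.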

v1.3

19 Jan 05:30
  • prompts page now has another filter: by profile
  • full set of system prompts in XML for different purposes
  • tools: good, lightweight local TTS powered by Kokoro 82M and ONNX
  • enhance prompts quality and readme quality
  • update contestant list to 33
  • update default prompt list to 33

v1.2

18 Jan 21:10
  • optimize project structure and simplify the code base
  • changing a profile's name is now also reflected in related prompt rendering and selection
  • full text search in profile page can now properly handle XML

v1.1

17 Jan 18:43
  • added copy button to profile
  • update contestant list to 32
  • update default prompt list to 32

v1.0

15 Jan 08:03
  • all functionalities finished
  • performance optimized
  • prepared contestant list (30 models)
  • prepared prompt suite (30 quality prompts)
  • prepared profiles (chain-of-thought + ReAct reasoning system-prompt, pali-vietnamese-translating system-prompt)