Releases: lavantien/llm-tournament
v1.7
LLM Tournament v1.7 Release Notes
Release Date: March 10, 2025
Overview
Version 1.7 introduces a tiered scoring system that gives a more nuanced framework for evaluating LLMs. This release features an 11-tier classification system with spiritual names, a revised color scheme, and improved score distribution for more accurate model ranking.
New Features
Tiered Scoring System
- Implemented an 11-tier classification system with spiritual names (Divine, Legendary, Mythical, etc.)
- Updated the scoring scale to a 0-3000+ Elo range
- Introduced a color scheme with reversed colors for better visual hierarchy
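A minimal sketch of how a score could be mapped to a tier, for illustration only. The three tier names come from the list above; their ordering, every threshold, the `Tier` type, the `tierFor` helper, and the "Unranked" fallback are assumptions, not the project's actual values.

```go
package main

import "fmt"

// Tier pairs a display name with the lowest score that qualifies for it.
type Tier struct {
	Name     string
	MinScore int
}

// exampleTiers is illustrative only: the names appear in the release notes,
// but the thresholds and the ordering are assumptions.
// Entries must be ordered from highest MinScore to lowest.
var exampleTiers = []Tier{
	{Name: "Divine", MinScore: 3000},
	{Name: "Legendary", MinScore: 2700},
	{Name: "Mythical", MinScore: 2400},
}

// tierFor returns the name of the first tier whose threshold the score meets.
func tierFor(score int, tiers []Tier) string {
	for _, t := range tiers {
		if score >= t.MinScore {
			return t.Name
		}
	}
	return "Unranked" // hypothetical fallback for scores below every threshold
}

func main() {
	for _, s := range []int{3100, 2750, 2400, 900} {
		fmt.Printf("%d -> %s\n", s, tierFor(s, exampleTiers))
	}
}
```

Keeping the thresholds in one ordered slice means adding or renaming a tier touches a single table rather than a chain of conditionals.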
Mock Data Generation
- Added even distribution functionality to mock results generation
- Updated score weights for random score generation
- Implemented proper random number generator seeding for more consistent testing
- Updated mock scores generation to support the 11-tier system
- Added specific score distribution for tier testing
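The seeding and even-distribution ideas above can be sketched roughly as follows. This is an assumption-laden illustration: `mockScores`, `safeIntn`, the band layout, and the use of a fixed seed are hypothetical, and the project's real generator may differ. The guard in `safeIntn` reflects the fact that `rand.Intn` panics when its argument is not positive.

```go
package main

import (
	"fmt"
	"math/rand"
)

// safeIntn wraps rand.Intn, which panics when n <= 0.
func safeIntn(r *rand.Rand, n int) int {
	if n <= 0 {
		return 0
	}
	return r.Intn(n)
}

// mockScores returns count scores cycled through equal-width bands that
// together cover [0, max). A fixed seed makes the output reproducible.
func mockScores(seed int64, count, bands, max int) []int {
	if count <= 0 || bands <= 0 || max <= 0 {
		return nil
	}
	r := rand.New(rand.NewSource(seed)) // local generator, no global state
	width := max / bands
	scores := make([]int, count)
	for i := range scores {
		band := i % bands // round-robin so every band gets samples
		scores[i] = band*width + safeIntn(r, width)
	}
	return scores
}

func main() {
	// 22 scores over 11 hypothetical bands of width 300.
	fmt.Println(mockScores(42, 22, 11, 3300))
}
```

Calling `mockScores` twice with the same seed yields the same slice, which is what makes tier-related tests repeatable.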
Model Updates
- Added phi-4-mini to the model roster
Bug Fixes
- Fixed UI flicker and state issues with "Generate Random Mock Scores" button
- Ensured consistent score calculation and display throughout the application
- Fixed sorting consistency in stats chart
- Ensured total score chart sorts in descending order
- Added protection to ensure positive arguments for random number generation
- Fixed client-side handling of sorted data to prevent display inconsistencies
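For the sorting fixes above, one way to get a consistent descending order is a stable sort with an explicit tie-breaker. The sketch below is hypothetical: the `ModelTotal` type, the `sortByTotalDesc` helper, and the tie-break on name are assumptions, and only `phi-4-mini` is a model name taken from these notes.

```go
package main

import (
	"fmt"
	"sort"
)

// ModelTotal is a hypothetical row for the total score chart.
type ModelTotal struct {
	Name  string
	Total int
}

// sortByTotalDesc orders rows by Total, highest first. Ties fall back to
// the name so repeated renders always produce the same order.
func sortByTotalDesc(totals []ModelTotal) {
	sort.SliceStable(totals, func(i, j int) bool {
		if totals[i].Total != totals[j].Total {
			return totals[i].Total > totals[j].Total
		}
		return totals[i].Name < totals[j].Name
	})
}

func main() {
	rows := []ModelTotal{
		{Name: "model-b", Total: 1200}, // placeholder name
		{Name: "phi-4-mini", Total: 1800},
		{Name: "model-a", Total: 1200}, // placeholder name
	}
	sortByTotalDesc(rows)
	for _, r := range rows {
		fmt.Println(r.Name, r.Total)
	}
}
```

Sorting once on the server and having the client render rows in the order received avoids the two sides disagreeing on tie order.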
Under the Hood
- Removed unused math import
- Updated tier names to single words for clarity and consistency
- Various code optimizations for better performance
Breaking Changes
None. All changes are backward compatible with existing data.
Upgrade Instructions
Standard update procedure:
- Pull the latest code
- Restart the application
No data migration is required for this update.
v1.6
Changelog - v1.6 (March 2025)
🚀 New Features
- Added screenshots to README showcasing the program
- Enhanced UI with emoji button replacements for improved intuitiveness
- Added "Previous" button in results page to restore previous state after changes
- Improved evaluate page with markdown-formatted prompt and solution display
- Added copy button to capture raw markdown prompt text
- Added Previous/Next navigation buttons for prompt browsing
- Added math functions to template system for advanced calculations
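As an illustration of math functions in a Go template, the sketch below registers a few helpers through `template.FuncMap` and uses one to show a 1-based prompt position. The helper names (`add`, `sub`, `mul`), the `Index` field, and `renderPosition` are assumptions; they are not taken from the project's code.

```go
package main

import (
	"bytes"
	"fmt"
	"html/template"
)

// mathFuncs is a hypothetical set of arithmetic helpers for templates.
var mathFuncs = template.FuncMap{
	"add": func(a, b int) int { return a + b },
	"sub": func(a, b int) int { return a - b },
	"mul": func(a, b int) int { return a * b },
}

// renderPosition turns a zero-based index into a "Prompt N of M" label.
func renderPosition(index, total int) (string, error) {
	// Funcs must be called before Parse so the parser knows the names.
	tmpl, err := template.New("pos").Funcs(mathFuncs).
		Parse(`Prompt {{add .Index 1}} of {{.TotalPrompts}}`)
	if err != nil {
		return "", err
	}
	data := struct{ Index, TotalPrompts int }{index, total}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := renderPosition(0, 20)
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Println(out)
}
```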
🛠️ Fixes
- Fixed various UI rendering issues in results page:
  - Ensured table rows display correctly with improved debug logging
  - Corrected template type mismatch in score button coloring
  - Added nil check to ModelFilter in results template
- Fixed score buttons that were displaying too large
- Ensured raw markdown is copied instead of processed HTML
- Resolved WebSocket handling issues:
  - Improved error logging
  - Enhanced connection stability
  - Fixed initial table rendering on first load
- Fixed JSON parsing error in "Generate Random Mock Scores" functionality
- Fixed prompt index display in evaluate page
- Added TotalPrompts field to template data for proper rendering
- Optimized button arrangement for better usability
💅 UI Improvements
- Refactored overall UI for improved user experience
- Enhanced results page layout and design
- Center-aligned model and prompt headers for better readability
- Reduced progress bar region width to 20% of screen for more balanced display
📦 Updates
- Updated model definitions and configurations
- Updated Anthropic Claude Thinking 96K pipeline to v0.4
v1.5
v1.4
- modularize the code base
- granular scoring schemes and matching color scheme for cells
- comprehensive stats page with tiered ranking
- random persistent mock scores generator and ensure live update
- fix prompt suite rename issue
- import/export to json instead of csv
- enhance websockets logic and stability
- enhance readme quality
- streamlined the prompt and contestant counts to 20
- support aider configs for o1 high, o3-mini high, v3, r1, 3.7 sonnet, codestral
v1.3
- prompts page now has another filter: by profile
- full set of system prompts in XML for different purposes
- tools: good and lightweight local TTS powered by Kokoro 82M and ONNX
- enhance prompts quality and readme quality
- update contestant list to 33
- update default prompt list to 33