🔍 LLM Test Results
Sherlock Holmes benchmark across frontier models