Benchmarking Domain Intelligence

Large language models are improving rapidly; to date, this improvement has largely been measured via academic benchmarks. These benchmarks, such as MMLU and...

Beyond the Leaderboard: Unpacking Function Calling Evaluation

1. Introduction The research and engineering community has been continuously iterating on Large Language Models (LLMs) to make them...