A Framework for the Evaluation of Large Language Models