Human Judgment and Language Models
AI can predict the probability of events, but it cannot make decisions that depend on personal preferences. For example, AI can predict the chance of rain, but it cannot decide whether you should carry an umbrella; that decision depends on your preferences and judgment. AI excels at prediction but lacks judgment. It can provide information, but ultimately decisions are made by individuals who weigh that information against what they value. For instance, while AI might predict an employee’s future performance, it is the employer who applies judgment and decides whom to fire.
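To make the division of labor concrete, here is a minimal sketch of a decision rule in which the prediction machine supplies a probability of rain and the person supplies the payoffs that encode their preferences. The payoff numbers are entirely hypothetical and are not drawn from any cited source.

```python
# A minimal sketch: the AI supplies a probability; the person supplies the payoffs.

def expected_payoff(p_rain: float, payoffs: dict) -> dict:
    """Return the expected payoff of each action given a rain probability."""
    return {
        "carry umbrella": p_rain * payoffs["carry_rain"]
                          + (1 - p_rain) * payoffs["carry_dry"],
        "leave it home":  p_rain * payoffs["leave_rain"]
                          + (1 - p_rain) * payoffs["leave_dry"],
    }

# Hypothetical personal payoffs: how much do you mind getting wet vs. lugging an umbrella?
my_payoffs = {"carry_rain": 8, "carry_dry": -1, "leave_rain": -10, "leave_dry": 2}

p_rain = 0.3  # the prediction machine's output
options = expected_payoff(p_rain, my_payoffs)
print(max(options, key=options.get))  # the judgment call, given these payoffs
```

The prediction (p_rain) is the same for everyone; the payoffs, and therefore the decision, differ from person to person. That gap is what the rest of this piece calls judgment.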
In their 2018 book “Prediction Machines,” Ajay Agrawal, Joshua Gans, and Avi Goldfarb saw a role for reward function engineers, who would determine the rewards for actions taken on the basis of AI predictions. These engineers would be a skilled complement to AI adoption. However, innovation in reward function engineering has been slow, with little progress in developing tools to codify human judgment into machines. Recently, large language models have transformed how AI assists in decision-making by changing how humans provide judgment. Despite their intelligence, LLMs are still just prediction machines.
When asked to rewrite a paragraph for a certain audience, ChatGPT produces the paragraph without offering options or delivering a lecture on grammar and rhetoric. That it can do so is impressive, considering the reward and risk issues involved: the writing must be helpful, honest, and harmless. ChatGPT is trained on existing writing and produces paragraphs using a kind of “autocomplete,” predicting one word after another. Despite this, it writes better than the average person. How does ChatGPT judge quality from the content it was trained on? Some believe that large language models uncover fundamental rules of grammar, which would make the writing readable but not necessarily clear and compelling.
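As a rough illustration of the “autocomplete” idea, the sketch below picks each next word by sampling from a probability table. The table here is made up for illustration; a real LLM computes these probabilities with a large neural network trained on its corpus.

```python
import random

# Toy next-token table: given the last two words, how likely is each candidate next word?
# These probabilities are invented; an LLM would compute them from learned parameters.
next_token_probs = {
    ("the", "weather"): {"is": 0.6, "was": 0.25, "looks": 0.15},
    ("weather", "is"):  {"nice": 0.5, "bad": 0.3, "unpredictable": 0.2},
}

def autocomplete(tokens, steps=2):
    """Extend the text by repeatedly sampling a next word from the table."""
    tokens = list(tokens)
    for _ in range(steps):
        context = tuple(tokens[-2:])
        dist = next_token_probs.get(context)
        if dist is None:
            break
        choices, weights = zip(*dist.items())
        tokens.append(random.choices(choices, weights=weights)[0])
    return " ".join(tokens)

print(autocomplete(["the", "weather"]))  # e.g. "the weather is nice"
```

Nothing in this mechanism, by itself, distinguishes a clear, compelling continuation from a merely plausible one; that is where the human feedback described next comes in.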
A 2022 paper from OpenAI researchers describes how raw large language models were prompted to generate several alternative outputs to the same prompt, which real people then ranked on criteria such as helpfulness, honesty, and harmlessness. With clear instructions and training, different people could readily agree on these rankings. The rankings were then used to fine-tune the algorithm, with the model learning human judgment through positive and negative reinforcement. With just a few thousand examples of human judgment in the form of ranked responses, the AI began producing highly ranked outputs even for queries far from the ones evaluated. Human judgment about writing quality spread throughout the model.
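A simplified sketch of how such rankings can be turned into a learned reward signal appears below. It assumes a stand-in reward_model that scores response embeddings; real systems score full prompt-and-response token sequences with a language-model backbone, and the pairwise loss shown is one common formulation rather than the exact recipe in the paper.

```python
import torch
import torch.nn.functional as F

# Stand-in reward model: maps a 768-dimensional response embedding to a scalar score.
# In practice this would be a full language-model backbone with a scalar head.
reward_model = torch.nn.Linear(768, 1)
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

def preference_loss(preferred_emb, rejected_emb):
    """Pairwise (Bradley-Terry style) loss: the human-preferred response should score higher."""
    score_preferred = reward_model(preferred_emb)
    score_rejected = reward_model(rejected_emb)
    return -F.logsigmoid(score_preferred - score_rejected).mean()

# One training step on a hypothetical batch of ranked pairs (random stand-in embeddings).
preferred = torch.randn(32, 768)  # embeddings of the higher-ranked responses
rejected = torch.randn(32, 768)   # embeddings of the lower-ranked responses
loss = preference_loss(preferred, rejected)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

Each ranked pair nudges the scorer toward the evaluators’ judgment; after enough pairs, the scorer generalizes that judgment to responses no human ever ranked.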
The evaluators of LLMs were effectively reward function engineers. Unlike statistical models, large language models interact in plain language, so anyone can help teach the model judgment. With little effort, these reward function engineers trained LLMs to be useful and safe, which allowed OpenAI to launch a consumer-facing model that did not suffer from the flaws of its predecessors. This simple method of codifying human judgment into machines supercharged AI performance, combining the machine’s ability to predict word sequences with human judgment about which sequences people find most useful.
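Once a reward model like the one sketched above exists, fine-tuning typically optimizes the language model against it while penalizing drift from the original model. The snippet below is a hedged illustration of that combined objective; the coefficient and the numbers are hypothetical, and real systems compute this per token inside a reinforcement-learning loop.

```python
import torch

def rlhf_reward(preference_score, logprobs_new, logprobs_base, kl_coeff=0.1):
    """Reward for one sampled response: the human-judgment score minus a drift penalty."""
    drift_penalty = kl_coeff * (logprobs_new - logprobs_base).sum()
    return preference_score - drift_penalty

# Hypothetical numbers for a single sampled response.
score = torch.tensor(1.8)                          # output of the learned reward model
logprobs_new = torch.tensor([-0.2, -0.5, -0.1])    # fine-tuned model's token log-probs
logprobs_base = torch.tensor([-0.4, -0.6, -0.3])   # original model's token log-probs
print(rlhf_reward(score, logprobs_new, logprobs_base))
```

The drift penalty keeps the fine-tuned model close to the fluent prediction machine it started as, while the preference score pulls its outputs toward what the human evaluators judged useful and safe.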
The discovery of an easy method for machines to apply human judgment made all the difference. For many decisions, specialized reward function engineers will still be needed to deploy AI prediction machines at scale. But the discovery of an intuitive approach for codifying human judgment into machines – fine-tuning via reinforcement learning from human feedback – may unlock valuable AI applications where human judgment is difficult to codify in advance but easy to apply once an output is seen.