There’s growing interest in using AI language models to generate text for business applications. Large companies are deploying their own systems while others are leveraging models like OpenAI’s GPT-3 via APIs. According to OpenAI, GPT-3 is now being used in over 300 apps by thousands of developers, producing an average of more than 4.5 billion novel words per day.
But while recent language models are impressively fluent, they have a tendency to write falsehoods, ranging from factual inaccuracies to potentially harmful disinformation. To quantify the risks associated with “deceptive” models, researchers at the University of Oxford and OpenAI created a dataset called TruthfulQA that contains questions some humans might answer incorrectly due to false beliefs or misconceptions. The researchers found that the best-performing model was truthful on 58% of questions, falling short of human performance, which was 94%.