A Grounded Assessment of the Generative A.I. Explosion | Icelandic Institute for Intelligent Machines

by Helgi Páll Helgason

The image was generated with Midjourney 5.1. The prompt used was simply “a man looking at the generative AI explosion”.

We now live in a world where generative AI can conjure photorealistic images of pretty much anything we can think of, often with results that are indistinguishable from the real thing (this comes with its own set of problems, but that’s a topic for another time). Then there are highly capable Large Language Models (LLMs) that can service very complex requests phrased in natural language, with OpenAI’s GPT-4 currently reigning supreme. Consider that you can make absurd requests, such as…

“Prove the Pythagorean theorem in a German poem and then list the elements in the periodic table in Chinese”

… and the model will usually generate a correct result from scratch in mere seconds. The same goes for more useful requests, such as writing a piece of code, or reviewing, rewriting or even generating written content on almost any topic. These examples only scratch the surface of what is possible.

It is clear that LLMs, at their present stage of development, can already create significant business value, but these models have limitations that are sometimes overlooked.

In the midst of the current storm of progress and activity in generative AI, I’d like to pause for a moment to reflect and offer a grounded, practical assessment.

LLMs are very large artificial neural networks. It is sometimes said that they simulate the inner workings of the human brain, but this is true to a far lesser extent than commonly perceived. Since the 1980s, it has been well understood that neural networks are function approximators. New components (e.g. attention) and new architectures (e.g. transformers) have not changed this fundamental nature. In simplified terms, you can think of them as learning to map a set of training data points to their correct result values and then interpolating between these data points when given novel data. While this is often very effective, there is no guarantee that it will always produce correct results: an approximation of a function is not the same as the actual function. As statistician George Box famously said, “All models are wrong, but some are useful”.
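To make the function-approximation point concrete, here is a deliberately simple sketch in Python. It is not a neural network, let alone an LLM: a piecewise-linear interpolator (an illustrative stand-in chosen for brevity; `make_interpolator` and `f_hat` are hypothetical names) is "trained" on five exact samples of sin(x) and then queried. It reproduces the training points exactly, is slightly off between them, and can be badly wrong outside the range it was trained on.

```python
import math

def make_interpolator(xs, ys):
    """Build a piecewise-linear approximation from sampled (x, y) points.

    Illustrative stand-in for a learned approximator: between known points it
    interpolates linearly; outside the sampled range it clamps to the nearest
    endpoint value.
    """
    pairs = sorted(zip(xs, ys))

    def approx(x):
        if x <= pairs[0][0]:
            return pairs[0][1]
        if x >= pairs[-1][0]:
            return pairs[-1][1]
        for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
            if x0 <= x <= x1:
                t = (x - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

    return approx

# "Training data": five exact samples of sin(x) on [0, pi].
train_x = [i * math.pi / 4 for i in range(5)]
train_y = [math.sin(x) for x in train_x]
f_hat = make_interpolator(train_x, train_y)

# A training point, a point between training points, and a point outside the range.
for x in (math.pi / 4, math.pi / 8, 3 * math.pi / 2):
    error = abs(f_hat(x) - math.sin(x))
    print(f"x={x:.3f}  approx={f_hat(x):+.3f}  true={math.sin(x):+.3f}  error={error:.3f}")
```

Real neural networks interpolate in vastly higher-dimensional spaces, but the basic lesson carries over: matching the training data does not guarantee matching the true function everywhere else.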

In reality, there are few guarantees about the performance of an LLM, other than that you will get some result. Here are some key considerations:

There are no guarantees that an LLM will exhibit the same level of performance and accuracy across all possible inputs on a task that involves variable data (or parameters, if you prefer). Sentiment analysis is one example of such a task, as it involves free-form text.
There are no guarantees that an LLM will correctly service a novel variant of a well-tested request that may occur in a production setting.
There are no guarantees that an LLM will return results in the expected format. Asking for a result in JSON format does not guarantee that the output will always be valid JSON.
There are no guarantees that an LLM will return factually correct results. LLMs can return made-up results that look plausible. Present-day LLMs do not know when they are hallucinating, so the model itself cannot warn the consumer of the results.

Note that these issues are fundamental and largely unlikely to change in the near future.
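Because the output format is not guaranteed, a common precaution is to treat model output as untrusted input and validate it before using it. The sketch below assumes a hypothetical sentiment-analysis request that asks for JSON with `sentiment` and `confidence` fields; the schema, the `parse_llm_json` helper and the sample outputs are all invented for illustration and use only Python's standard `json` module.

```python
import json

# Hypothetical schema for this example only.
REQUIRED_KEYS = {"sentiment", "confidence"}
ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}

def parse_llm_json(raw: str):
    """Return a validated dict, or None if the model output cannot be used as-is."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not valid JSON at all
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None  # valid JSON, but not the shape that was requested
    sentiment = data["sentiment"]
    if not isinstance(sentiment, str) or sentiment not in ALLOWED_SENTIMENTS:
        return None  # unexpected label
    conf = data["confidence"]
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        return None  # missing or out-of-range score
    return data

# Illustrative (invented) variations a model could return for the same request.
samples = [
    '{"sentiment": "positive", "confidence": 0.92}',
    'Sure! Here is the JSON: {"sentiment": "positive", "confidence": 0.92}',
    '{"sentiment": "very positive", "confidence": 0.92}',
    "{'sentiment': 'negative', 'confidence': 0.4}",
]
for s in samples:
    print(parse_llm_json(s))
```

This only addresses the format problem. A response that passes every check can still contain a wrong or hallucinated answer, which no amount of JSON validation will catch.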

The potential business value of LLMs depends on ideal conditions, which do not always hold in practice. This has significant implications for their use in a production setting:

LLMs can be highly valuable assistants to human experts, as long as their results are always considered with a healthy degree of criticism by people equipped to do so.
LLMs can be used for decision making when supervised by a qualified human.
LLMs can be used for automation on tasks where the cost of failure (economic or otherwise) is low and acceptable.
LLMs should not be used for automation on tasks where the cost of failure is high and unacceptable. Exceptions apply if the result can be verified and/or corrected in an automated fashion before any action is taken on it.
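The guidelines above can be read as a simple decision rule. The sketch below encodes it in Python; the `Mode` names, the `choose_mode` and `handle` helpers, and the invoice-total task are hypothetical, and a real deployment decision would involve far more than two yes/no questions.

```python
from enum import Enum

class Mode(Enum):
    AUTOMATE = "automate"                 # cost of failure is low and acceptable
    VERIFY_THEN_ACT = "verify, then act"  # cost is high, but output can be checked automatically
    HUMAN_REVIEW = "human review"         # cost is high and there is no automated check

def choose_mode(failure_cost_acceptable: bool, automatically_verifiable: bool) -> Mode:
    """Hypothetical encoding of the guidelines: automate only when failure is cheap or checkable."""
    if failure_cost_acceptable:
        return Mode.AUTOMATE
    if automatically_verifiable:
        return Mode.VERIFY_THEN_ACT
    return Mode.HUMAN_REVIEW

def handle(result, verify, act, escalate):
    """Act on an LLM result only if `verify` accepts it; otherwise pass it to a person."""
    return act(result) if verify(result) else escalate(result)

# Hypothetical task: the model was asked to total two invoice amounts.
# The total can be recomputed deterministically, so a wrong answer never reaches `book`.
def total_is_correct(r):
    return r["a"] + r["b"] == r["total"]

def book(r):
    return f"booked {r['total']}"

def send_to_reviewer(r):
    return "sent to reviewer"

print(choose_mode(failure_cost_acceptable=False, automatically_verifiable=True))
print(handle({"a": 17, "b": 25, "total": 42}, total_is_correct, book, send_to_reviewer))
print(handle({"a": 17, "b": 25, "total": 52}, total_is_correct, book, send_to_reviewer))
```

The key design choice is that verification happens before any action is taken, which is the exception the last guideline allows for; when no such check exists, the result goes to a qualified human instead.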

In conclusion, LLMs are powerful AI models that can significantly increase productivity and create business value when used appropriately, but they can be counterproductive and potentially harmful when applied in unsuitable contexts.

—————
Thanks to my PhD thesis advisor and friend Dr. Kristinn R. Thórisson for discussion and feedback related to this article.
