Parallel Development and Survivorship Bias in LLMs
Analysis
By Yanai Levy, Nov 25th, 2023
Large Language Models (LLMs) are the new hotness making the rounds in the technology sphere. Many are billing them as artificial intelligence of the kind we see in Star Trek and other concerning futuristic fantasies, but that is not the case. LLMs are powerful and a huge leap forward in many ways, but they are also cognitively and operationally limited, though the latter has been improving as of late. Despite that, we are living through a very exciting time. The potential for change that this tech represents is nothing to scoff at, and being able to watch the development of something possibly on the scale of the internet is an opportunity to be cherished. Unlike the internet’s development, which we can now view in hindsight, the development of LLMs is full of uncertainty right now. However, using perspective from the past, we can make some educated guesses as to how LLMs will progress in the near future.
Let’s take a few steps back to the development of the internet. The internet’s first form was ARPANET, built during the Cold War. The US government was concerned that the Soviet Union’s advancements meant that existing ways of routing information between government computers and command centers were in danger of being cut off from each other as the result of a nuclear or conventional attack. To combat this possibility, it tasked ARPA, now known as DARPA, with conceiving a way to share information between different nodes that would allow the information to be routed along many paths, avoiding a catastrophic failure along any given one. Essentially, if the normal A-to-B path that a telegraph line ran through was suddenly cut, the telegraph was useless. In contrast, ARPANET could route its packets of information through other nodes on the network to get around even multiple physical path failures.
While ARPANET was the earliest implementation of an internet-type technology, other technologies were competing with it and augmenting it at the time. One of the best known is the French CYCLADES network, whose designer Louis Pouzin pioneered the datagram concept that later shaped TCP/IP, the protocol suite still used by modern internet services. After ARPANET split into MILNET and ARPANET, the former reserved for military use, the term Internet was coined to refer to both, as well as contemporaries such as SATNET and ALOHAnet. Into the eighties, Usenet and the NSF-developed NSFNET became more widely used, and shortly afterwards ARPANET fell into disuse and was decommissioned. The adoption of TCP/IP let many of these networks exchange information through gateways, and the Internet was truly born: a network of networks.
What has just been presented as an early and basic timeline of the internet is the accepted view; however, it falls prey to a large dose of “history is written by the victors” syndrome and survivorship bias. In other words, the history is likely inaccurate because its details have been lost: certain “winners” at key points in its development recorded only their own accounts and suppressed others. In addition, we know little about failed startups or the companies that did not win the bidding war to build ARPANET, since their development largely ended when they leaned the wrong way at their tipping points.
Okay, back to LLMs. The reason all that context about the internet is necessary is that we are now going through the equivalent of the post-ARPANET period, where many different companies are vying to be the one that carries us forward now that the technology has been established as possible. OpenAI would have you believe that ChatGPT will take us into the LLM world. Google would like you to believe its Bard LLM will be the LLM of the future. Microsoft is betting on Bing Chat, its customized GPT-powered assistant internally codenamed Sydney.
So, what is an LLM really? The first thing to note is that they are not all that new. While conversational programs date back to MIT’s ELIZA in 1966, LLMs in their current guise popped up on the scene in 2017 with Google’s Transformer architecture, which it released to the public. This kind of transformer was the basis for GPT-1, the current ChatGPT’s great-grandfather. LLMs are neural networks loosely modeled after the human brain, which can be fed extremely large amounts of data (the “large” in LLM). Their strength lies in finding patterns within this data and generating new output that follows those patterns. That is why the current crop of LLMs are so good at sounding human: they have been fed millions and millions of conversations, so they have a good model of how different types of sentences are formed. But they know nothing. They do not reason or think. What they can do is parse tons of information and spit it back out in a way that sounds like a sentence. That is the big difference between an AI like you see in sci-fi movies and what we have today. They are often confidently wrong, they can contradict themselves, and they can go off the rails with a little coaxing. Think of a toddler that just learned to walk: the fact it can do so is incredible, but it sure isn’t very good at planning where to go and how to get there.
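To make that pattern-replay idea concrete, here is a deliberately tiny sketch, and not how a real transformer works: a bigram “language model” that only counts which word tends to follow which in its training text, then generates by replaying those counts. The corpus and function names are invented for illustration. Even this toy produces sentence-shaped output while understanding nothing, which is the point the paragraph above is making, just at a vastly smaller scale.

```python
# Toy illustration only: a bigram model, nothing like a real transformer.
# It memorizes which word follows which, then replays those statistics.
from collections import defaultdict, Counter

def train_bigrams(corpus):
    """For each word, count the words that follow it in the training text."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for a, b in zip(words, words[1:]):
            follows[a][b] += 1
    return follows

def generate(follows, start, max_words=10):
    """Emit the most common continuation at each step -- pure pattern replay."""
    out = [start]
    for _ in range(max_words - 1):
        nxt = follows.get(out[-1])
        if not nxt:  # dead end: this word was never seen mid-sentence
            break
        out.append(nxt.most_common(1)[0][0])
    return " ".join(out)

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
]
model = train_bigrams(corpus)
# Produces fluent-looking word sequences assembled purely from counts --
# and happily loops or goes "off the rails", since nothing is understood.
print(generate(model, "the"))
```

Scale the corpus up by many orders of magnitude and replace the word-pair counts with a transformer’s learned weights, and you get the gist of why an LLM sounds human without knowing anything.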
The thing to keep in mind is that we don’t know which of these versions will be the ubiquitous one in 10-20 years. We don’t even know if any of them will be; maybe we haven’t yet seen the conversational language model that we will have in our homes or in our eyeballs. Any and all who claim to have that information are trying to sell you something or are not very bright. Because of the survivorship bias mentioned earlier, chances are that no one will even remember which LLMs are around now once this technology matures, bar maybe one or two that either beat the odds and become the one chiefly used or make significant enough contributions to the field to be noted in the Wikipedia article about it. Our responsibility is to monitor the directions this technology grows in, so that it does things we support it doing. We are watching something momentous happen, but that does not relieve us of our responsibility to guide it in the right direction.
Thanks for reading!