🗿 All LLMs are converging to the same point

How to know what's next for crypto

GM. I just wrapped a 60-page doc and a 117-slide RFP that took hundreds of hours to make. It got me thinking: how far away are we from automating creative services? Will it ever be possible? Should it be possible?

LLMs are designed to automate the binary and the manual: the things that can be replicated over and over. Think of the time I worked on Wall Street in leveraged loans and had to fax our allocated trades to brokers.

But complex creative processes, like making a movie, will always require manual labor if you want the result to be good. Why? Because all LLMs are converging to the same point.

In the age of automation, if you want to stand out and be different, you have to be uniquely yourself and know how to express that.

Sure, one-man award-winning movies will be possible. Successful franchises created without humans may also become a reality. But if a cookie-cutter Marvel movie made by robots performed well in the market, that would say more about poor cultural taste than it would pose an existential threat to our ability to make a superior, unique film in the age of automation.

Apologies in advance: this is a stream-of-consciousness post, and I'll organize my thoughts later.

Without further ado. ☕ *knuckle cracks* ☕ Let's get into it.

VaultCraft launches V2, TVL skyrockets above $100M

VaultCraft launches V2, partners with Safe, and secures $100M+ in Bitcoin

  • Matrixport, one of Asia's leading crypto providers, commits $100M+ in Bitcoin

  • OKX Web3 to launch Safe Smart Vaults with $250K+ in rewards

Our AI creative tech stack (and why creativity can never be automated)

We use AI for our creative ops. And we're nowhere close to automating our creativity. Frankly, we don't want to. But we're leaning hard into trimming the fat.

Our high-level AI creative tech stack at Supernova:

  • Perplexity and Claude for research

  • GPT for process outlining

  • Midjourney, Luma AI for image + video, storyboarding

  • Adobe Creative Suite for editing

  • Play.ai, ElevenLabs for audio

  • Topaz for optimization

We use more AI tools, but this is the bread and butter. I had an "aha" moment when I spoke with an executive at buck.co, and she told me they designed all marketing assets for OpenAI.

Why would one of the world's leading AI companies spend millions of dollars on an external vendor for creative services?

Because creativity cannot be automated. And if you want to be the best of the best, it never should be. I'm not an engineer, so feel free to reply and correct me if I'm wrong.

But this is my understanding of how LLMs work, why they are inherently limited, and why that's actually a good thing.

Language models are trained on data scraped from the Internet. It's a battle over who has the best data. And no language model can have all the information, just like no country can have all the money in the world.

Without absolute totalitarianism or some far-off utopia, that's not feasible. OK, so each dataset is limited and distinct, right?

Well not exactly.

Because to be an LLM, you have to pull from a ton of public data. And many of them built their pre-training data with the same web scrapers, to the point where it feels like all LLMs are converging to the same point. From this rabbit-hole Reddit thread:

"I generated a list of 100 items last night. I used Gemini, GPT4, GPT4o, llama405B, MistralLarge, CommandR and DeepSeek2.5.

Outside of deepseek, the first 6 generated almost identical datasets and grouped them almost exactly the same. The yapping was obviously different between the models, but the main data I needed was damn near exactly the same. The order of the data by category was also similar. As I stared at the data, it dawned on me that they are all converging toward the same location.

I don't think that location points to ASI. I suppose with them all being trained on almost the same data it's to be expected, but it got me thinking."
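That Redditor's observation is anecdotal, but it's easy to put a rough number on. One simple way to measure how much two models' lists overlap is Jaccard similarity: the share of unique items that show up in both lists. Here's a minimal Python sketch, assuming each model's output has already been parsed into a list of strings. The model names and fruit lists are made-up placeholders, not the poster's data, and the normalization (lowercase, trimmed whitespace) is an assumption you'd probably want to tune.

```python
from itertools import combinations


def normalize(item: str) -> str:
    """Lowercase and trim so 'Apple ' and 'apple' count as the same item."""
    return item.strip().lower()


def jaccard(a: list[str], b: list[str]) -> float:
    """Share of unique items two lists have in common (1.0 = identical sets)."""
    set_a = {normalize(x) for x in a}
    set_b = {normalize(x) for x in b}
    if not set_a and not set_b:
        return 1.0  # two empty lists are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)


def pairwise_overlap(lists: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Jaccard score for every pair of models, keyed by (model, model)."""
    return {
        (m1, m2): jaccard(lists[m1], lists[m2])
        for m1, m2 in combinations(sorted(lists), 2)
    }


if __name__ == "__main__":
    # Hypothetical toy outputs, purely for illustration.
    outputs = {
        "model_a": ["apple", "banana", "cherry", "date"],
        "model_b": ["Apple", "banana", "cherry", "elderberry"],
        "model_c": ["fig", "grape", "kiwi", "lemon"],
    }
    for pair, score in pairwise_overlap(outputs).items():
        print(pair, round(score, 2))
```

Scores close to 1.0 across most pairs would line up with the "almost identical" lists in the thread. Note this only compares which items appear, not their order or grouping, which would need a separate check.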

One reply pointed out:

"The original LLMs, namely GPT-3.5/4 and Llama 1, used web scrapers to scrape a large portion of Internet content and format it into a dataset. They then used this as pre-training data, which produces a base model, effectively a high-quality text autocomplete. This base model then undergoes instruct tuning, which teaches it to follow instructions and thereby chat with people. In the case of ChatGPT, they taught it to respond in a professional, dry, robotic, corporate manner. They also used RLHF to rank its responses and optimize for human preference. However, in the case of Llama 1, because this was prior to mass usage of synthetic data, it actually had very colorful, realistic, human-like use of language, but terrible intelligence compared to GPT.

After the Llama 1 leak, the first fine-tunes of Llama 1 came out. These were purely research-oriented, but the original Alpaca and Vicuna were research showing that training large language models on GPT-3.5 chat logs significantly improved their performance. People began collecting GPT chat logs and turning them into massive datasets, leading to what we now know as synthetic data, in other words, data generated by an LLM.

The use of synthetic data is essentially a form of distillation, which means taking a large model and training a small model on its outputs, to bring the small model's intelligence and responses as close to the large model's as possible.

Meta almost immediately caught on to how effective synthetic data was and used it in its training datasets for Llama 2, causing its intellectual capabilities to skyrocket, but the manner of speech and verbal tics of GPT were now included in the Llama series. After this came Mistral 7B, a small open-source model from a small open-source startup, which showed that even a small model can be significantly better than a large model if trained properly. Mistral, being a French company without many resources, has always relied very heavily on synthetic data.

Mistral 7B kicked off the small-model craze, in which experimentation with new optimization techniques was rampant, and RLHF generally fell out of favor in both the open-source and corporate communities. It requires human labor, is slow, and is expensive, so only large corporations can do it, which was a huge problem for the open-source community. It was replaced by much newer techniques like DPO (direct preference optimization) and experimental ones like SimPO or KTO.

One of the most common complaints during this era was that every model talked exactly like ChatGPT. Meta caught on to this, and when they released Llama 3, they used DPO during training to make the model seem much friendlier and more human. This boosted its place on a human-preference leaderboard, LMSYS, and caught on very quickly. Other models followed suit, with Claude 3.5 also opting for similar friendly speech and Gemma 2 doing the same. Nowadays, that is the standard.

I'm no fine-tuner, so my understanding is limited, but modern synthetic data collection generally works by creating a dataset of questions, figuring out which model is considered the 'best' at the time, making tons of API calls in parallel, and adding the LLM's responses to the dataset. As of right now, that would be Claude 3.5 Sonnet. However, some people prefer to spin up a RunPod instance or host locally, in which case they use the best local model they can get their hands on, either Mistral Large 123B or Llama 3.1 405B, to generate the answers for the dataset.

I know for a fact that synthetic data is heavily used during instruct tuning, but as for how it figures into pre-training, I'm not quite sure; you may want to read a paper about that."
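The last part of that reply describes how synthetic data usually gets collected: a list of questions, lots of parallel calls to whatever model is "best" right now, and every answer appended to the dataset. Here's a minimal Python sketch of that loop under a few assumptions. `ask_teacher` is a hypothetical placeholder for whatever API client you'd actually use (no real provider SDK is shown), and the `prompt`/`response` JSON Lines layout is an illustrative choice, not a format every fine-tuning tool expects. Training a smaller model on the resulting file is the distillation step the reply mentions.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def build_synthetic_dataset(
    questions: list[str],
    ask_teacher: Callable[[str], str],
    max_workers: int = 8,
) -> list[dict[str, str]]:
    """Send every question to the 'teacher' model in parallel and pair each
    question with the teacher's answer, keeping the original question order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `questions`
        answers = list(pool.map(ask_teacher, questions))
    return [
        {"prompt": q, "response": a}
        for q, a in zip(questions, answers)
        if a.strip()  # drop empty answers rather than training on them
    ]


def to_jsonl(rows: list[dict[str, str]]) -> str:
    """Serialize rows as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows)


if __name__ == "__main__":
    # Hypothetical stand-in for a real API call to the current "best" model.
    def fake_teacher(question: str) -> str:
        return f"A careful answer to: {question}"

    data = build_synthetic_dataset(["What is TVL?", "What is a vault?"], fake_teacher)
    print(to_jsonl(data))
```

Threads are a reasonable fit here because each call mostly waits on the network; in practice you'd also want retries and rate limiting, which this sketch leaves out.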

These charts are from a great read by Cameron R. Wolfe (spelt his last name wrong), who dives deeper into how language models are trained.
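Earlier in that reply, DPO (direct preference optimization) comes up as the technique that displaced RLHF for a lot of teams. Roughly, it trains directly on pairs of preferred and rejected answers, without the separate reward model and reinforcement-learning loop that RLHF uses. Below is a minimal Python sketch of the standard per-pair DPO loss. It assumes you already have log-probabilities from the model being tuned and from a frozen reference copy; real training code computes those from logits in batches inside an autodiff framework, which this deliberately skips. `beta = 0.1` is just an illustrative default, and the numbers in the usage block are made up.

```python
import math


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x)), i.e. log(1 + exp(-x))."""
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one preference pair.

    Each *_logp is the total log-probability a model assigns to a whole
    response. "chosen" is the preferred answer, "rejected" the other one.
    The reference model is a frozen copy of the model before this tuning.
    """
    # How much more (or less) the tuned model likes each answer than the reference.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    # Loss shrinks as the chosen answer gains ground relative to the rejected one.
    return neg_log_sigmoid(beta * (chosen_shift - rejected_shift))


def mean_dpo_loss(pairs: list[tuple[float, float, float, float]], beta: float = 0.1) -> float:
    """Average loss over (policy_chosen, policy_rejected, ref_chosen, ref_rejected) tuples."""
    return sum(dpo_loss(*p, beta=beta) for p in pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up log-probabilities purely for illustration.
    untouched = dpo_loss(-12.0, -15.0, -12.0, -15.0)  # tuned model == reference
    improved = dpo_loss(-10.0, -18.0, -12.0, -15.0)  # tuned model now favors "chosen" more
    print(untouched, improved)
```

The loss sits at log 2 when the tuned model hasn't moved from the reference, and it drops as the model shifts probability toward the preferred answer and away from the rejected one.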

The TLDR: unless one mega model were trained on all human activity ever, and had every human as a user feeding it context to enhance it even further, no AGI could dominate creative services on its own.

And even then, how could this superhuman create ideas without the influence of historical data? Humans would still be capable of creating net-new ideas. For a machine to do the same, it would essentially have to achieve Inception, and if it did, this being would truly be the manifestation of God. Maybe that's the Second Coming of Jesus we've been waiting for.

I'm sure there are a million holes in my reasoning here, but it's a good thought starter for where we're headed nonetheless. I think we'll find out the answer sooner rather than later, as technology accelerates, manual tasks diminish, and humans are left to uncover the darkest corners of the Universe.

What we're reading

Your daily AI dose

Mindstream is your one-stop shop for all things AI.

How good are we? Well, we became only the second-ever newsletter (after The Hustle) to be acquired by HubSpot. Our small team of writers works hard to put out the most enjoyable and informative newsletter on AI around.

It's completely free, and you'll get a bunch of free AI resources when you subscribe.

Reach Over 100 Million Humans

This is NOFOMO, the newsletter keeping you up-to-date on all things emerging tech.

No Fomo is part of Supernova's media conglomerate: a media network reaching 100 million humans and over 10 billion impressions/month across socials, gaming, newsletters, and podcasts.

DISCLAIMER: None of this is financial advice. This newsletter is strictly educational and is not investment advice or a solicitation to buy or sell assets or make financial decisions. Please be careful and do your own research.
