
OpenAI president reveals the GPT-6 bottleneck! Responding to Jensen Huang's question, he admits to having almost "mortgaged the future" for computing power.









On August 16, Greg Brockman, co-founder and president of OpenAI, shared his latest thinking on key AI issues, including the bottlenecks in AI technology development and the relationship between research and engineering, at the World AI Engineers Conference. A veteran of the AI industry who entered the field in 2015, Brockman made a notable observation in response to the host's question about the challenges of developing GPT-6:


As computing power and data scale expand rapidly, foundational research is making a comeback, and the importance of algorithms is once again coming to the forefront, becoming a key bottleneck in the future development of AI technology.


For Brockman, this is not necessarily a bad thing. He finds it somewhat tedious to constantly focus on the classic paper “Attention is All You Need” and the Transformer model, which can leave one intellectually unsatisfied. Currently, reinforcement learning has emerged as one of the new directions in algorithm research, but he also acknowledges that there are still significant gaps in capabilities.


Engineering and research are the two main drivers of AI development. Brockman, who comes from an engineering background, believes that engineers' contributions are on par with those of researchers, and in some ways even more important. Without research innovation, there would be nothing to build; without engineering capability, those ideas could not be realised.


OpenAI has always treated engineering and research as equally important, even though their ways of thinking are different. For new engineers joining OpenAI, Brockman's first lesson is to maintain technical humility, as methods that have proven effective in traditional internet giants may not apply at OpenAI.


Resource coordination between product development and research is another challenge OpenAI frequently faces. Brockman acknowledged in the interview that to support the massive computational power requirements for product launches, OpenAI had to borrow computational resources originally allocated for research, effectively ‘mortgaging the future.’ However, he believes this trade-off is worthwhile.


Brockman also reflected on his childhood interest in mathematics, his transition to programming, his transfer from Harvard to MIT, and his eventual decision to drop out and join the fintech startup Stripe. Due to space constraints, this portion of the interview was not included in the transcript.


Towards the end of the interview, Brockman answered two questions from NVIDIA founder and CEO Jensen Huang, which focused on the future form of AI infrastructure and the evolution of development processes.


This interview with Greg Brockman was recorded in June this year. The following is a summary of some of the highlights (with minor edits made by Zhi Dong Xi to ensure the original meaning remains intact):


1. Engineers and researchers are equally important; the first lesson at OpenAI is technical humility


Host: In 2022, you said that now is the time to become a machine learning engineer, and great engineers can contribute to future progress on the same level as great researchers. Does this still hold true today?


Greg Brockman: I believe engineers' contributions are comparable to those of researchers, and even greater.


In the early days, OpenAI was a group of PhD-level research scientists who proposed ideas and tested them. Engineering was essential to this research. AlexNet was essentially an engineering feat of ‘implementing fast convolutional kernels on GPUs.’


Interestingly, the people in Alex Krizhevsky's lab at the time didn't think much of this research. They believed AlexNet was just a fast kernel for a specific image dataset and wasn't significant.


But Ilya said, ‘We can apply it to ImageNet. It will definitely work well.’ This decision combined great engineering with theoretical innovation.


I believe my previous view remains valid today. Now, the engineering required by the industry is not just about building specific kernels, but about constructing complete systems, scaling them up to 100,000 GPUs, building reinforcement learning systems, and coordinating the relationships between all the components.


Without innovative ideas, there is nothing to build; without engineering capability, those ideas cannot be realised. What we need to do is combine the two harmoniously.


The relationship between Ilya and Alex symbolises the collaboration between research and engineering, which is now OpenAI's philosophy.


From the beginning, OpenAI has regarded engineering and research as equally important, with the two teams needing to work closely together. The relationship between research and engineering is also an issue that can never be fully resolved; after solving problems at the current level, more complex issues arise.


I have noticed that the problems we encounter are largely the same as those faced by other laboratories, though we may go further or encounter different variations. I believe there are fundamental reasons behind this. From the outset, I clearly sensed a significant difference in how individuals with engineering backgrounds and those with research backgrounds understood system constraints.


As an engineer, you might think, ‘If the interface is already defined, there is no need to concern oneself with its underlying implementation; I can implement it in any way I choose.’


But as a researcher, you might think, ‘If any part of the system malfunctions, I only see a slight drop in performance, no error alerts, and no indication of where the error occurred. I must take full responsibility for the entire code segment.’ Unless the interface is extremely robust and completely reliable—a very high standard—researchers must assume responsibility for the code. This difference often leads to friction.


I once witnessed in an early project that after engineers wrote the code, researchers would engage in lengthy discussions over every line, resulting in extremely slow progress. Later, we changed our approach: I directly participated in the project, proposing five ideas at once, and researchers would say four of them were unacceptable, which I found to be exactly the feedback I wanted.


The greatest value we recognised—and what I often emphasise to new OpenAI colleagues from the engineering field—is technical humility.


You bring valuable skills here, but this is a completely different environment from traditional internet startups. Learning when to rely on your existing intuition and when to set it aside is not easy.


Most importantly, stay humble, listen carefully, and assume there are things you don’t understand until you truly grasp the reasons. Only then should you change the architecture or adjust the abstraction layer. Truly understanding and acting with this humility is the key factor determining success or failure.


2. Some research computing power has been diverted to product development, and OpenAI sometimes has to 'mortgage the future'


Host: Let's discuss some of OpenAI's recent major releases and share a few interesting stories. One particularly noteworthy issue is scalability: at each new scale, everything can potentially break.


When ChatGPT was launched, it attracted 1 million users in just five days; and after the release of GPT-4o image generation this year, the user base surpassed 100 million within a similar five-day period. What are the key differences between these two phases?


Greg Brockman: They are similar in many ways. ChatGPT was originally a low-key research preview; we launched it quietly, but it quickly led to system crashes.


We anticipated it would be popular, but at the time we believed it would take until GPT-4 to reach that level of popularity. Colleagues inside the company had long had access to the model, so internally it no longer felt particularly impressive.


This is also a characteristic of the field—the update pace is very fast. You might just see something and think, “This is the most amazing thing I’ve ever seen,” and the next moment, you’re thinking, “Why can’t it merge 10 PRs (pull requests) at once?” The situation with ImageGen was similar—it was extremely popular after release, with an astonishing spread and user growth.


To support these two releases, we even broke with tradition and diverted some computational resources from research to support the product launch. This was akin to ‘mortgaging the future’ to keep the system running, but if we could deliver on time and meet the demand, allowing more people to experience the magic of the technology, the trade-off was worth it.


We have always adhered to the same philosophy: to provide users with the best experience, drive technological development, create unprecedented achievements, and do our utmost to bring them to the world and achieve success.


3. AI programming is not just about ‘showcasing skills’; it is transitioning towards serious software engineering.


Host: ‘Vibe coding’ has now become a phenomenon. What are your thoughts on it?


Greg Brockman: Vibe coding is a magical enabling mechanism that reflects future trends. Its specific forms will continue to evolve over time.


Even with technologies like Codex, our vision is that when these agents are deployed, they won't be just one or ten copies, but hundreds, thousands, or even tens of thousands running simultaneously.


You would want to collaborate with them as you would with colleagues—they run in the cloud and can connect to various systems. Even when you're asleep or your laptop is turned off, they can continue to work.


Currently, people generally view vibe coding as an interactive loop, but this form will change. Future interactions will become more frequent, and agentic AI will step in and move beyond this model, driving the construction of more systems.


An interesting phenomenon is that many vibe coding demos focus on creating 'cool' projects like fun apps or prank websites, but what's truly novel and transformative is that AI is beginning to transform and deeply integrate into existing applications.


Many companies face the challenge of migrating, updating libraries, and converting legacy code bases—such as COBOL—to modern languages, which is both difficult and tedious. AI is gradually addressing these issues.


The starting point for vibe coding is 'creating cool applications,' but it is evolving into serious software engineering, especially in its ability to delve into existing systems and improve them. This will enable businesses to grow faster, and that is precisely where we are heading.


Host: I hear that Codex is like a ‘child you've raised yourself’ for you. You've emphasised from the start the importance of making it modular and well-documented. How do you think Codex will change the way we program?


Greg Brockman: Calling it my 'child' is a bit of an exaggeration. I have an outstanding team that has been working tirelessly to realise this vision. This direction is both fascinating and full of potential.


The most interesting point is that the structure of the codebase determines how much value can be extracted from Codex.


Most existing codebases are designed to leverage human strengths, while models excel at handling diverse tasks without the deep conceptual connections humans can make. If the system aligns more closely with the model's characteristics, the results will be better.


The ideal approach is to break the code into smaller modules, write high-quality tests that run quickly, and then let the model fill in the details. The model will run the tests and complete the implementation on its own. Connecting components (architecture diagrams) is relatively easy to build, while filling in the details is often the most challenging part.
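The workflow described here (small modules, fast deterministic tests, a model filling in the details) can be sketched roughly as follows. This is a hypothetical illustration; the `slugify` function and its tests are invented, not anything from Codex itself:

```python
# Hypothetical sketch of the "small modules + fast tests" workflow: a human
# defines a narrow interface and quick tests, and an agent iterates on the
# function body until the tests pass.

def slugify(title: str) -> str:
    """Turn an article title into a URL slug (the detail an agent would fill in)."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in title)
    return "-".join(cleaned.split())

# Fast, deterministic tests the agent can rerun hundreds of times per session.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  Attention   Is All You Need ") == "attention-is-all-you-need"

test_slugify()
```

The point is not the function itself but the shape of the loop: the human-authored tests are cheap enough that a model can run them far more often than a person ever would.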


This approach sounds like good software engineering practice, but in reality, humans often skip this step because they can process more complex conceptual abstractions in their minds. Writing and refining tests is a tedious task, but models can run 100 or even 1,000 times more tests than humans, thereby taking on more work.


In a sense, we aim to build a codebase designed for junior developers to maximise the model's value. Of course, as the model's capabilities improve, whether this structure remains optimal will be an interesting question.


The advantage of this approach is that it aligns with the practices humans should follow for maintainability. The future of software engineering may require reintroducing practices we abandoned for shortcuts, enabling systems to achieve their full potential.




4. As training systems become increasingly complex, checkpoint design needs to be updated accordingly.


Q: The tasks we are currently executing often take longer, consume more GPU resources, and are unreliable, frequently resulting in failures that interrupt training. This is well known.


However, you mentioned that it is possible to restart a run, which is fine. But how should we handle this when training agents with long-term trajectories? Because if the trajectory itself is non-deterministic and has already progressed halfway, it is difficult to truly restart from the beginning.


Greg Brockman: As model capabilities improve, you will continuously encounter new problems, solve them, and face new challenges.


When runtime is short, these issues are not significant; but if a task requires days of runtime, you must carefully consider details such as how to save state. In short, as training system complexity increases, such issues must be given serious attention.


A few years ago, we primarily focused on traditional unsupervised training, where saving checkpoints was relatively straightforward, but even then, it was no easy task. If you want to transition from ‘occasionally saving checkpoints’ to ‘saving at every step,’ you must seriously consider how to avoid issues like data duplication and blocking.
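The duplication and blocking concerns mentioned here can be made concrete with a minimal sketch. All names are assumed for illustration: the data cursor is saved alongside the weights so a restart neither repeats nor skips examples, and the file is written atomically so a crash mid-save cannot corrupt the last good checkpoint.

```python
# Minimal sketch (assumed names) of resumable training state: the data
# cursor travels with the weights, and the write is atomic.
import json
import os
import tempfile

def save_checkpoint(path, step, data_cursor, weights):
    state = {"step": step, "data_cursor": data_cursor, "weights": weights}
    # Write to a temp file, then rename: os.replace is atomic, so a crash
    # during the write leaves the previous checkpoint file intact.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    # On restart, resume the data loader from state["data_cursor"] so no
    # example is consumed twice and none is skipped.
    with open(path) as f:
        return json.load(f)
```

In a real system the weights would be tensors written with a framework's own serialisation, and the save would typically run on a background thread so the training loop is not blocked; this sketch only shows the atomicity and cursor bookkeeping.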


In more complex reinforcement learning systems, checkpoints remain important, such as saving caches to avoid redundant computations. Our system has an advantage: the state of the language model is relatively clear and easy to store and process. However, if the connected external tools themselves have state, they may not be able to resume smoothly after an interruption.


Therefore, it is necessary to plan the checkpoint mechanism for the entire system end-to-end. In some cases, interrupting and restarting the system, causing some fluctuations in the results curve, may be acceptable because the model is intelligent enough to handle such situations. The new feature we plan to launch allows users to take control of the virtual machine, save its state, and then resume operation.


5. Building AGI is not just about software; it also requires the simultaneous development of supercomputers.


Jensen Huang: I wish I could ask you this question in person. In this new world, data centre workloads and AI infrastructure will become extremely diverse. On one hand, some agents engage in deep research, responsible for thinking, reasoning, and planning, and require a large amount of memory; on the other hand, some agents need to respond as quickly as possible.


How do we build an AI infrastructure that can efficiently handle large prefill workloads, large decode workloads, and everything in between, while also meeting the needs of low-latency, high-performance multimodal visual and speech AI? These AI systems are like your R2-D2 (the droid from Star Wars) or an always-available companion.


These two types of workloads are fundamentally different: one is extremely computationally intensive and may run for extended periods, while the other demands low latency. What would an ideal AI infrastructure look like in the future?


Greg Brockman: Of course, this requires a large number of GPUs. In essence, Jensen is asking me to tell him what kind of hardware to build.


There are two types of demands: one is long-term, large-scale computational needs, and the other is real-time, instantaneous computational needs. This is indeed challenging because it is a complex co-design problem.


I come from a software background, and we initially thought we were just developing AGI (Artificial General Intelligence) software, but we soon realised that to achieve these goals, we must build large-scale infrastructure.


If we want to create systems that truly change the world, we may need to build the largest computer in human history, which is reasonable to some extent.


A simple approach is indeed to use two types of accelerators: one that maximises computational throughput and one that minimises latency. Stack a large amount of high-bandwidth memory (HBM) on one and a large number of compute units on the other, and the problem is essentially solved. The real challenge lies in predicting the proportion of demand for each type: if the balance is off, part of the cluster becomes useless, which sounds terrifying.
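The provisioning risk described here can be illustrated with back-of-envelope arithmetic. The numbers below are invented purely for illustration: a fleet split between two accelerator pools goes partially idle as soon as the demand mix drifts from the split it was provisioned for.

```python
# Illustrative capacity-planning sketch, not real fleet data: two accelerator
# pools sized for one demand mix, then hit with a different mix.

def pool_utilisation(capacity: float, demand: float) -> float:
    """Fraction of a pool kept busy; demand beyond capacity goes unserved."""
    return min(demand / capacity, 1.0)

# Fleet provisioned 70/30 for compute-heavy vs latency-sensitive work...
compute_capacity, latency_capacity = 0.7, 0.3
# ...but actual demand arrives 50/50.
compute_demand, latency_demand = 0.5, 0.5

compute_util = pool_utilisation(compute_capacity, compute_demand)
latency_util = pool_utilisation(latency_capacity, latency_demand)
# The compute pool runs at roughly 71% (idle capital), while the latency
# pool is saturated and 0.2 units of latency demand go unserved.
unserved = max(latency_demand - latency_capacity, 0.0)
```

Both failure modes show up at once: stranded capital in one pool and unmet demand in the other, which is exactly why the demand forecast matters more than the hardware split itself.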


However, since this field has no fixed rules or constraints and is primarily an optimisation problem, if resource allocation deviates from the ideal, we can usually find ways to utilise those resources, though at a significant cost.


For example, the entire industry is shifting towards Mixture-of-Experts (MoE) models. In part, this is because some DRAM is idle, so we utilise these idle resources to increase model parameters, thereby improving machine learning computational efficiency without incurring additional computational costs. Therefore, even if resource balancing is off, it does not lead to disaster.
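The MoE trade-off mentioned here, more parameters parked in otherwise idle memory at no extra per-token compute, can be sketched with a toy top-1 router in plain Python. Dimensions, seeding, and names are invented for illustration; this is not any production system:

```python
# Toy mixture-of-experts forward pass: parameter count scales with the number
# of experts held in memory, but each token's compute touches only the one
# expert it is routed to.
import random

random.seed(0)
d, n_experts = 4, 8

def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

experts = [rand_matrix(d, d) for _ in range(n_experts)]  # lives in (otherwise idle) memory
gate = rand_matrix(d, n_experts)                         # small routing layer

def vecmat(x, m):
    """Row-vector times matrix."""
    return [sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0]))]

def moe_forward(x):
    scores = vecmat(x, gate)                              # cheap routing matmul
    top = max(range(n_experts), key=lambda e: scores[e])  # pick top-1 expert
    return vecmat(x, experts[top])                        # compute cost of ONE d x d expert

total_params = n_experts * d * d + d * n_experts  # eight experts' worth of parameters
y = moe_forward([1.0, 0.0, -1.0, 0.5])            # one expert's worth of compute
```

Production MoE layers route each token to the top-k of many experts and mix their outputs with gating weights; the point here is only the memory-versus-compute asymmetry Brockman is describing.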


The homogenisation of accelerators is a good starting point, but I believe that ultimately, customising accelerators for specific purposes is also reasonable. As infrastructure capital expenditures reach astonishing scales, highly optimising workloads also becomes reasonable.


However, the industry has not yet reached a consensus, as research and development is progressing at an extremely rapid pace, which in turn largely dictates the overall direction.


6. Basic research is making a comeback, with algorithms replacing data and computing power as the key bottlenecks


Q: I didn't intend to ask this question, but you mentioned research. Could you rank the bottlenecks in scaling up to GPT-6: compute, data, algorithms, power, funding? Which are first and second? Which is OpenAI most constrained by?


Greg Brockman: I believe we are now in an era of the return of basic research, which is very exciting. There was a time when the focus was: we have the Transformer, so let's keep scaling it up.


In these clear-cut problems, the primary task was simply to improve metrics, which was certainly interesting, but in some ways also felt intellectually unchallenging and unsatisfying. Life shouldn’t be limited to the mindset of the original “Attention is All You Need” paper.


Today, what we’re seeing is that as computational power and data scale expand rapidly, the importance of algorithms is once again coming to the fore, almost becoming the key bottleneck for future progress.


These issues are foundational and critical components. Though they may appear imbalanced in daily practice, fundamentally, this balance must be maintained. The progress in paradigms like reinforcement learning is highly encouraging, and this is an area we have consciously invested in over the years.


When we trained GPT-4 and interacted with it for the first time, everyone wondered, 'Is this AGI?' Clearly it was not yet AGI, but it was difficult to articulate precisely why not. It performed very smoothly, but sometimes took the wrong direction.


This indicates that reliability remains a core issue: it has never truly experienced the world, more like someone who has only read all the books or understands the world through observation, separated from it by a glass window.


Therefore, we realise that a different paradigm is needed, and we must continue to drive improvements until the system truly possesses practical capabilities. I believe this situation still exists today, with many obvious capability gaps that need to be addressed. As long as we keep pushing forward, we will eventually reach our goal.


7. A ‘diverse model library’ is gradually taking shape, and the future economy will be driven by AI


Jensen Huang: For the AI-native engineers here, they might be thinking that in the coming years, OpenAI will have AGI (Artificial General Intelligence), and they will build domain-specific agents on top of OpenAI's AGI. As OpenAI's AGI becomes more powerful, how will their development process change?


Greg Brockman: I think this is a very interesting question. You can look at it from many angles, and people hold firm but differing views. My own view is: first, anything is possible.


Perhaps in the future, AI will be so powerful that we only need to let them write all the code; perhaps there will be AI running in the cloud; perhaps there will be many domain-specific agents that require a significant amount of customisation to achieve.


I believe the trend is moving toward this ‘diverse model library’ approach, which is very exciting because different models have different inference costs, and from a system perspective, distillation techniques work very well. In fact, much of the capability comes from a model's ability to call upon other models.


This will create numerous opportunities, and we are moving toward an AI-driven economy. Although we have not fully arrived there yet, the signs are already evident. The people present here are building this future. The economic system is vast, diverse, and dynamic.


When people envision the potential of AI, it is easy to focus solely on what we are doing now and the ratio of AI to humans. However, the real focus should be: how can we increase economic output by tenfold and ensure everyone benefits more?


In the future, models will become more powerful, foundational technologies will be more robust, we will use them for more tasks, and the barriers to entry will be lower.


In fields like healthcare, AI cannot be applied simplistically; we need to think responsibly about the right approach. In education, it involves parents, teachers, and students, and each stage requires expertise and significant effort.


Therefore, there will be numerous opportunities to build these systems, and every engineer here possesses the potential to achieve this goal.

