
In 2016, I Started an AI Code Generation Venture
In 2016, when AI was far from becoming the widespread wave it is today, I had already begun earnestly attempting something: using language models to understand, modify, and generate code.
I named the company ai.codes. This name hardly needed explanation: AI codes—treating 'code' as a verb, meaning 'AI writes code'.
This vision finally became reality a decade later. The wave of 'Agentic Coding' has swept the globe. It's just that when all this truly arrived, my venture from back then had already stopped in 2016.
Today, I want to write down this past story. It serves both as a record of that eve before the technological explosion and as a belated reply to my 30-year-old self.

The vision from back then
Backstory: Why I Believed in This
From 2011 to 2014, I worked at Google. Back then, I had a habit: checking what code Jeff Dean had recently written.
In the summer of 2013, I noticed he was involved in a project called DistBelief. That was Google's early large-scale distributed neural network training system, essentially making the forward and backward processes of neural networks into distributed computing on CPU clusters. Although I didn't know where Jeff Dean ultimately wanted to take this path at the time, this incident became an opportunity: I started spending a lot of time systematically learning about neural networks.
In 2014, I left Google, moved to the San Francisco Bay Area, and joined Fitbit, which wasn't very prominent at the time.
Because I already had a relatively deep understanding of the capabilities of neural networks, at Fitbit, I proactively introduced convolutional neural networks into the problem of sleep cycle classification. Fitbit was one of the earliest companies globally to do sleep cycle recognition on wearable devices, and the small team I led back then should also be considered one of the earlier teams in the industry to actually apply convolutional networks to this problem. The mainstream method at the time was still feature engineering plus linear classifiers, and we quickly discovered in practice that convolutional neural networks performed significantly better: the signal representations they learned were stronger than manually constructed features.
At work, I used convolutional networks, but what truly fascinated me was actually the potential of neural networks in natural language processing: translation, understanding text, and further, understanding programs.
I was particularly curious back then: Could neural networks write code? Especially the kind of code I found very verbose, not highly complex, but with many patterns. Because in my work, I often saw that engineers, especially those working on underlying systems, were often either writing configurations or writing 'code that generates configurations.' Meanwhile, machine translation had begun to achieve real breakthroughs. Since AI could do translation between natural languages, why couldn't it do translation from natural language to code? Moreover, much code is more regular and has lower entropy in a statistical sense than natural language.
Back then, I kept writing 'Programming Pearls: Extra Chapters' on my blog, and I had quite a bit of reflection on the characteristics of various programming languages and the various difficulties in human programming. I gradually realized that when AI was still far from perfect, the truly feasible path wasn't pure AI, but cautiously combining AI with compiler technology.
These simple judgments, long-accumulated technical interest, and confidence in my own engineering abilities ultimately converged into the starting point of my venture.
In August 2016, I resigned from Fitbit, deciding to pursue this full-time.
Back then, many people registered .ai domains. I deliberately went against the trend and registered a domain I still find quite striking to this day: ai.codes. The meaning is very direct: AI writes code.
The First Version of the Product: Starting with Autocompletion
The connection between language models and code autocompletion is actually very direct. So the earliest product I made was a smarter code autocompletion tool, with a simple goal: improving programmers' efficiency in the IDE.
To achieve this, I downloaded a large amount of open-source code from GitHub at the time, specifically built a set of backend storage and training architecture, and trained a language model that was already not small for its time on a GPU. Its task was not fundamentally different from today's large language models: predicting the next token.
In 2016, Transformer hadn't appeared yet; LSTM was still the most realistic sequence model choice. So my prediction model ended up being a 4-layer LSTM. Looking back now, it was certainly still early-stage, but it was impressive enough at the time.
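To make "predicting the next token" concrete, here is a minimal sketch. It is not the 4-layer LSTM described above: it swaps in a simple bigram counter so the example stays self-contained, and the function names and the tiny corpus are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(token_streams):
    """Count, for every token, which tokens follow it in the training code."""
    counts = defaultdict(Counter)
    for tokens in token_streams:
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev_token, k=3):
    """Return up to k most frequent followers of prev_token."""
    return [tok for tok, _ in counts[prev_token].most_common(k)]

# Hypothetical, pre-tokenized training data.
corpus = [
    ["for", "i", "in", "range", "(", "n", ")", ":"],
    ["for", "x", "in", "items", ":"],
    ["for", "i", "in", "range", "(", "10", ")", ":"],
]
counts = train_bigram(corpus)
top_after_in = predict_next(counts, "in", k=1)
top_after_range = predict_next(counts, "range", k=1)
```

A neural model replaces the count table with a learned function of the whole preceding context, but the interface is the same: context in, ranked next tokens out.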
Code has a natural advantage compared to natural language: its next token is often subject to stronger grammatical constraints. For example, if a left parenthesis hasn't appeared before, a right parenthesis shouldn't appear out of thin air later. So I naturally added a layer of constraint back then: filtering out those tokens that couldn't possibly appear in the current grammatical position.
Looking back today, this is similar to the later familiar concept of constrained decoding; extrapolating further, it also shares common ground with the constrained generation used later when models generate valid tool calls or valid JSON. At the time, I didn't consider it any theoretical breakthrough; I just felt it was too intuitive and should be done this way as a matter of course.
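The filtering idea can be sketched in a few lines. This is an illustration under simplifying assumptions, not the original ai.codes implementation: the only grammar rule checked is bracket balance, the model's output is stood in for by a plain dictionary of scores, and all names are hypothetical.

```python
OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(OPEN_TO_CLOSE.values())

def allowed_tokens(prefix_tokens, vocab):
    """Drop closing brackets that cannot legally appear after the prefix."""
    stack = []  # closing brackets we still owe, innermost last
    for tok in prefix_tokens:
        if tok in OPEN_TO_CLOSE:
            stack.append(OPEN_TO_CLOSE[tok])
        elif tok in CLOSERS and stack and stack[-1] == tok:
            stack.pop()
    expected = stack[-1] if stack else None
    # Non-bracket tokens always pass; a closer passes only if it is the one owed.
    return {tok for tok in vocab if tok not in CLOSERS or tok == expected}

def constrained_argmax(scores, prefix_tokens):
    """Pick the highest-scoring token among those the grammar check allows."""
    allowed = allowed_tokens(prefix_tokens, scores.keys())
    return max(allowed, key=lambda tok: scores[tok])

# Hypothetical model scores for the next token.
scores = {")": 2.0, "x": 1.5, "(": 0.3, "]": 0.1}
no_open_paren = constrained_argmax(scores, ["foo", "=", "bar"])
open_paren = constrained_argmax(scores, ["foo", "("])
```

Without an open parenthesis in the prefix, the top-scoring ")" is filtered out and "x" wins; once "(" has appeared, ")" becomes legal again. A real system would consult a parser state rather than a bracket stack, but the masking step looks the same.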
Soon, I wrote an IntelliJ plugin that could do code completion in the editor. Traditional IDE autocompletion usually only helps you complete a function name, for example, you type Array., and it suggests a candidate method after the cursor; but my system could already complete entire statements.
Later, I also scraped a large number of code snippets from Stack Overflow and indexed them. I invented a comment method using three slashes: as long as you input a comment line with three slashes and press enter, for example:
/// Here I need to read the file contents into a list

The model would then attempt to translate this natural language sentence into code.
Even today, I remember the almost magical feeling the first time I saw it truly 'write code.' The method behind it wasn't actually mysterious: I first searched the index, retrieved many candidate code snippets, and then had the language model continue generation combining the context. The resulting program was not only syntactically correct but could even automatically align variable names with the context. Looking back today, this was already very close to the later so-called retrieval-augmented generation approach.
In 2016, such results were quite impressive.
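The retrieve-then-generate flow behind the three-slash comment can be sketched as follows. This is a toy reconstruction under stated assumptions: the index is an in-memory list, ranking is plain word overlap rather than whatever the real index used, and the final step only assembles the text a language model would continue from. Every name and snippet here is hypothetical.

```python
import re

def tokenize(text):
    """Lowercase bag of words, good enough for a toy overlap score."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query, snippets, k=2):
    """Rank (description, code) pairs by word overlap with the query."""
    q = tokenize(query)
    ranked = sorted(snippets, key=lambda s: len(q & tokenize(s[0])), reverse=True)
    return [s for s in ranked[:k] if q & tokenize(s[0])]

def build_prompt(comment_line, editor_context, snippets):
    """Combine retrieved code with the current file for the model to continue."""
    query = comment_line.lstrip("/ ").strip()
    hits = retrieve(query, snippets)
    parts = ["# Retrieved examples:"]
    parts += [code for _, code in hits]
    parts += ["# Current file:", editor_context, "# Task: " + query]
    return "\n".join(parts)

# Hypothetical stand-in for the indexed Stack Overflow snippets.
SNIPPETS = [
    ("read a file into a list of lines", "lines = open(path).read().splitlines()"),
    ("sort a list in reverse order", "items.sort(reverse=True)"),
    ("parse a json string", "data = json.loads(text)"),
]
comment = "/// Here I need to read the file contents into a list"
hits = retrieve(comment.lstrip("/ "), SNIPPETS)
prompt = build_prompt(comment, "config_path = 'settings.txt'", SNIPPETS)
```

Because the current file is part of the prompt, the model sees names such as config_path next to the retrieved pattern, which is what makes aligning variable names with the context possible.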
But I soon made a huge mistake: I didn't know how to promote it, nor how to turn it into revenue.
I believed back then that such technology could only be sold to companies, and needed to be combined with a company's internal codebase for customized fine-tuning to truly show its magic. So I spent a long time trying to sell something to various companies that most people generally couldn't understand and whose full value wasn't apparent without customization. Apart from some friends and acquaintances willing to try it, I never managed to obtain real revenue.
Beyond Technology, People and the Era Were Even Harder
In Silicon Valley, people often joke that a company with no revenue often has the highest valuation. But when you're doing something that almost no one can sufficiently understand, fundraising becomes exceptionally difficult.
I remember once sitting in a venture capitalist's office, facing a Stanford MBA who knew almost nothing about AI technology. Her core question for me wasn't about the model, the product, or programmer workflows, but: Can this technology be sold in China to make money?
She even directly told me that if the system couldn't understand Chinese comments yet, they wouldn't invest, because their backers had considerations related to Chinese capital.
Such feedback quickly made me realize that the problem wasn't just that I wasn't explaining it well enough, but that many people weren't even discussing the same issue as me. We all called it AI verbally, but we weren't thinking about the same thing in our minds.
Most Silicon Valley investors weren't actually pessimistic about the prospects of this direction, but it was simply too far ahead of its time back then; they needed more signals. And being ahead of its time precisely meant a lack of signals.
In September, I went to Mountain View for a Y Combinator interview. During the interview, everyone actually quite liked my product, but the final feedback was still that they needed more signals, and they encouraged me to come back later.

The rejection letter from YC
Fundraising occupied a lot of my time and directly slowed down product and market progress. It was only then that I truly realized how passive a position the lack of a suitable co-founder could force on a startup that was too far ahead of its time.
I had a highly respected colleague from my time at Google. I even flew back to Chicago many times, trying to persuade him to join. But in his view, this thing was too fanciful. Later, a colleague from my graduate school days was willing to take this risk, but the entire technology stack was neither something that could be directly learned from textbooks nor his PhD direction, so progress was still slow.
Looking back, I can understand these colleagues and investors. They weren't wrong; it's just that no one was willing to bet on a vision that hadn't yet been proven by the era.
So, the reality back then was: I almost single-handedly carried everything—technology, product, fundraising, narrative, psychological pressure, and life itself.
Back then, my child was just over a year old, had just learned to walk, and chased after me to play every day. Yet I was running around outside all day. I still remember a very specific scene to this day: One day I came home very late, my wife and son were already asleep, the kitten curled up beside them. I stood there, knowing in my heart that I vaguely saw a puzzle about the future, but in the real world, I couldn't find the next step to piece it together. I kept hitting walls. Fundraising, partners, product path—all futures were separated by a layer of fog. My parents would only say one thing: Don't work too hard.
That feeling wasn't simply hard work, but a deeper loneliness: You know what you've seen, but you can't translate it into a language others can also believe. And at the same time, you must do something, so as not to let down those who still believe in you.
How the Venture Ended
After being rejected by YC, I decided to first clearly think through the product's shortcomings. I did some more experiments, trained a few small models, and gradually saw the core of the problem: What I lacked wasn't the next plugin, the next demo, or a prettier fundraising pitch deck.
What I truly lacked was a stronger model. More directly, it was the money to train a larger model.
My model's performance was still not stable enough, and its predictions often went wrong. I had already used all the compiler tricks and syntax checking methods I could think of and patched almost everywhere possible. The remaining problems couldn't be solved by minor engineering fixes; they came from the model itself not being strong enough.
Around Christmas, I seriously calculated once: I probably needed about $250,000 to train this model.
This number didn't seem exaggerated to me, but in the eyes of many VCs, it was almost heretical. In their imagination, fundraising money should be used to hire people, drive growth, tell stories, not be sent into data centers and ultimately turned into GPU heat. They didn't understand this direction to begin with, and at this step, they understood even less: why I wanted to directly burn $250,000 for a model that wasn't necessarily successful yet.
The problem wasn't whether they respected me, but that I truly couldn't build a bridge convincing enough for them between the 'existing product' and the 'product that might emerge if model capabilities leap forward.'
I needed money to train a stronger model; but I also needed that stronger model to prove this thing was worth investing in.
Looking back now, perhaps I could have thought of other paths. For example, changing the entry point and first finding users willing to tolerate a model that wasn't smart enough but could still benefit them; or building, earlier on, a product around toolchains, data, and workflows that could grow slowly. But back then, I was trapped by this deadlock, feeling I couldn't push forward for the time being.
So I started thinking, maybe the only way was to first monetize the technical ability I had on hand, go work first, wait until I saved enough money, then come back and continue doing this.
I remember that Christmas, the financial pressure was already very real. I couldn't bring myself to buy a Christmas tree, and only remembered to buy my son a toy truck the day before Christmas Eve. After that Christmas, I finally admitted that I couldn't keep struggling like this. Rather than wearing myself down in place, it was better to take a step back first, wait for the timing to mature, then come back.
So, I transferred the entire backend technology for downloading GitHub code to a company that needed it. Then I decided that I would only go to two companies next to earn some money: OpenAI and Reddit.

The three-step plan formulated at the time: first do autocompletion, then do small-scale filling, finally achieve using only natural language, because code is ultimately just a sequence of symbols computers can understand.
OpenAI and Reddit
OpenAI back then was still just a small non-profit organization, and it even required solving some problems first to get an interview. Those problems weren't difficult for me. Among the several engineers who ultimately interviewed me, one was Andrej Karpathy.
The technical questions themselves didn't stump me. The real problem was that I knew too clearly what I wanted to do: I wanted to do natural language processing, I wanted to do AI writing code. But OpenAI's focus at the time was more on reinforcement learning and vision, and I wasn't very interested in those directions.
I remember they asked me what I wanted to do if I joined OpenAI. I said, I could do research, I could do engineering, but what I truly wanted to do was make AI writing code happen.
Conversely, this also meant I might not be the person OpenAI most needed at that stage. Looking back now, I was probably quite a character back then. So, receiving a rejection letter wasn't surprising.
The second company was Reddit.
My judgment at the time was: If natural language processing was truly going to become an important capability in the future, then communities like Reddit would definitely become increasingly important, because NLP companies worldwide would eventually need data like Reddit's. Later facts largely proved this to be true.
So I joined Reddit as a Senior Director in the machine learning direction. Later, some friends felt sorry for me, thinking if I had gone to OpenAI back then, my life trajectory might have been completely different. I don't think that way. Life isn't a problem that can be re-checked over and over; many choices can only hold true for that version of oneself at that time.
Looking Back, Where Did I Go Wrong
I certainly made many mistakes.
First, I spent too much time trying to persuade those who fundamentally couldn't understand this thing. Entrepreneurs need to tell stories, but not everyone is worth your repeated explanations. For a direction that's too far ahead of its time, the more realistic approach often isn't convincing everyone, but quickly finding that tiny minority who can inherently understand.
Second, I didn't dare to build in public back then. The startup environment today is already very different: writing publicly, doing publicly, iterating publicly is itself part of accumulating momentum. But I didn't have that mindset back then, always thinking about waiting until it was more complete and more polished before going public. The result was that much important work was done silently, without truly taking part in the sweeping history that followed.
Finally, and most importantly: I didn't have a clear enough understanding of my own failure modes back then. I didn't know clearly enough in what ways I would fail, what constraints I would get stuck on, and how I should adjust my approach when these constraints appeared simultaneously. For example, I thought I had saved enough money to support a year of building the startup; but when I actually reached the critical juncture, I didn't have $250,000 in my pocket to make a big bet.
Looking Back, What Did I Do Right
If there's anything I did right, probably two things.
First, I truly did something crazy. It didn't succeed, but it was real: there was code, there were models, and there were countless specific memories. It's part of my life, not an intellectual's self-satisfaction that never leaves the head.
Second, I'm glad I didn't take that investor's money back then. That was the first truly decent investment contract I received, for $1.5 million, but I didn't like the contract terms at all. After rejecting that contract, I might have offended half the Silicon Valley investment circle, but I still like that 'unwilling to be tied down' mindset from back then. I paid a price for it, and also preserved a bit of freedom because of it.
An Anecdote, Written for Those Anxious in the Current AI Wave
Finally, a small story.
In the summer of 2017, I had already gone to Reddit, working on recommendation systems rather than language models, but I was still following the direction of AI writing code closely. Back then, people in Silicon Valley doing this knew each other to some extent.
One day, two young Russians found me, wanting to chat about this direction. One was named Illia Polosukhin, the other Alexander Skidanov.
Illia introduced himself as a technical lead on the Google TensorFlow team at the time; Alexander was a top ACM ICPC competitor. We chatted for an afternoon at a café near Union Square in San Francisco. I told them about my attempts and the difficulties I encountered, and also told them about some investors I had been in contact with and the challenges this direction might face. I even candidly mentioned an investor who was very enthusiastic about this direction but whom I didn't like; that person had already started shifting toward blockchain, and I wasn't sure if he would still invest in the code direction.
They also told me what they wanted to do: use AI to generate code solutions for ACM programming competition problems. And I could sense they might have stronger methods than I did.
When we parted, I sincerely wished them well, and thought: these two might really be able to accomplish something, and I should keep an eye on them.
A month later, they received funding from that investor, the company was named near.ai, meaning AI is already very near to us.
But what happened next was almost completely different from the linear technological evolution path I had in mind back then. A few months later, the company switched to doing AI-generated dApps; a few more months later, it switched again to building a public chain; finally it became a blockchain company outright.
Why do I tell this story?
Because those familiar with large language model history probably already know: Illia Polosukhin is actually one of the authors of the Transformer paper 'Attention Is All You Need.' At the time, this epoch-making paper had just been published, but no one could clearly see its significance. Today's ChatGPT, and the entire modern large language model era, are almost all built upon this Transformer work.
The part of this anecdote that moves me the most is this: those of us in the middle of it are always surrounded by the fog of the future. So-called 'seeing the right direction' or 'seeing the wrong direction' often isn't enough to determine where you will ultimately end up. You might see the future, but lack resources; you might possess resources, but go in another direction; you might participate in the most crucial foundational work, yet still not know how it will change the world years later.
The future doesn't unfold linearly.
So, anxiety can't truly help us get closer to the future. What's more important is, within the boundaries you can see at present, to make a choice worthy of yourself; as for the rest, leave it to time.