Low code, no code to AI programmer

Interested in learning about the newest software that can automatically generate the code you want to write? This article shares my experience of using GitHub’s Copilot, which is powered by OpenAI, and looks at alternative coding assistants such as CodeT5. Tools like Copilot and CodeT5 can help any developer generate code based on intent and patterns.

I recently received an invitation to test GitHub Copilot and could not wait for the weekend to take it for a spin! After a few days of using it, I must say I was thoroughly impressed. GitHub Copilot recognizes my intent in multiple ways, including a documenting comment at the start of the program, and automatically generates the code that I wanted to write.

For example, a comment such as “# binary search function” at the start of the program file serves as an intent for GitHub Copilot. Copilot in turn suggests the entire function body that I can choose to accept as is. In my experiment, Copilot also suggested other possible solutions as well as test cases that I should add. 

And for further fun, I put in a comment saying that I wanted the same function built as a recursive one, and Copilot came back with a fully generated function that I could use. The following shows how I ended up with fully functioning code using just a comment and a few Tab and Enter key presses.

Fig 1: Binary search algorithm in Python — built using Copilot 
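
For readers who cannot see the figure, here is a minimal hand-written sketch of the kind of code such a session ends up with: an iterative and a recursive binary search, each introduced by a comment stating the intent. It is an illustration only, not Copilot's actual output, and the function names are assumptions.

```python
# binary search function
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1      # target can only be in the upper half
        else:
            high = mid - 1     # target can only be in the lower half
    return -1


# recursive binary search function
def binary_search_recursive(items, target, low=0, high=None):
    """Same contract as binary_search, implemented with recursion."""
    if high is None:
        high = len(items) - 1
    if low > high:
        return -1
    mid = (low + high) // 2
    if items[mid] == target:
        return mid
    if items[mid] < target:
        return binary_search_recursive(items, target, mid + 1, high)
    return binary_search_recursive(items, target, low, mid - 1)
```

Both versions assume the input list is already sorted in ascending order.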

Or consider the following, where Copilot helped generate a ShoppingItem class. It recognized the attributes that I wanted to include from the comment. When I added a new attribute, it recognized my intent to add a getter/setter for it, include it in the “toString” method, and so on.

Fig 2: Shopping item class in Java — built using Copilot 

“I am a firm believer that developers and the creativity they bring to the table are key ingredients to software and its quality. Having said that, any technology that helps developers be more productive and reduces mundane activities is always welcome.” 

A typical process for any developer would include 

  • understanding the requirements
  • developing a high-level design
  • doing the low-level design, which breaks the problem down into smaller and simpler problems, and
  • finally, writing code using frameworks and languages of choice, primarily involving usage of predefined libraries, APIs, etc.

The last step in the process is where you need an understanding of languages and frameworks. Each language has its own programming paradigms and patterns. 

Developers new to the programming language or framework would see this as a barrier. And for experienced developers, I would say, applying them over a period of time becomes a mundane activity. 

GitHub Copilot excels in this last activity — generating code fragments based on developer intent and following programming paradigms and patterns. 

In this blog I explore 

  • how GitHub Copilot works, including a look at OpenAI’s GPT-3, which is the foundation for Copilot
  • what developers need to be aware of before they start using it extensively 
  • economics of such technology 
  • other similar solutions that are under development 
  • implications for the industry in the near future 

GitHub Copilot under the hood 

Many of you would have used the autocomplete feature in Gmail. It is driven by models that consider what the user has typed in the current email, the email subject, and the previous email body, among others, and provide autocomplete suggestions. GitHub Copilot works in a similar fashion. It recognizes user intent via a documenting comment, package structure, previously written code, etc., and generates autocomplete suggestions, which in this case are code. Under the hood, Copilot makes use of a language model based on GPT-3, developed by OpenAI.

OpenAI was founded in 2015 by a group that included Elon Musk and Sam Altman, with the aim of building artificial general intelligence. OpenAI built Generative Pre-trained Transformer 3 (GPT-3), an autoregressive language model that uses deep learning to produce human-like text.

GPT-3’s main skill is generating natural language in response to a natural language prompt. It is a language model trained on hundreds of billions of words from the Internet, and its full version has 175 billion machine learning parameters. The quality of the text generated by GPT-3 is so high that it can be difficult to determine whether it was written by a human, which has both benefits and risks.

There are a number of models built on GPT-3, each with different capabilities. More details on each model are available in OpenAI’s documentation.

Fig 3: GPT-3 models 

GitHub Copilot is powered by the OpenAI Codex model. Codex is trained on natural language as well as billions of lines of source code from publicly available sources, including code in public GitHub repositories. As a result, OpenAI Codex has much of the natural language understanding of GPT-3, but it produces working code.

Fig 4: GitHub Copilot 

Apart from code autocomplete, GPT-3 has several other applications including 

  • Transpilation: A transpiler (also called a source-to-source translator, source-to-source compiler, or transcompiler) takes the source code of a program written in one programming language as its input and produces equivalent source code in the same or a different programming language. This would be handy when converting legacy programs to a new technology stack.
  • Code refactoring 
  • Explaining code 

A word of caution to developers using technology such as Copilot 

GPT-3 at its core is a language model. It takes the user’s intent and predicts what the user is looking to type next. While it is already trained on hundreds of billions of words and has 175 billion parameters, it is constrained by its training data set. It cannot dynamically come up with content unlike anything it was trained on. GPT-3 does not understand the design or purpose of your code. It just looks at patterns, predicts what you are going to type next, and provides autocomplete suggestions.
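
To make "predicting what comes next from patterns" concrete, here is a deliberately tiny sketch: a bigram counter that suggests the token it has most often seen after the current one. This is a toy stand-in chosen for illustration, not how GPT-3 or Codex is implemented (those are large neural networks), and the corpus and function names are assumptions.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count, for every token, which tokens followed it in the training text."""
    following = defaultdict(Counter)
    for line in corpus:
        tokens = line.split()
        for current, nxt in zip(tokens, tokens[1:]):
            following[current][nxt] += 1
    return following


def suggest_next(following, token):
    """Return the most frequent follower of token, or None if it was never seen."""
    if token not in following:
        return None
    return following[token].most_common(1)[0][0]


# Hypothetical "training data": three lines of code-like text.
corpus = [
    "for i in range ( n ) :",
    "for item in items :",
    "for i in range ( 10 ) :",
]
model = train_bigrams(corpus)
print(suggest_next(model, "in"))     # range (seen twice, versus items once)
print(suggest_next(model, "while"))  # None, because "while" never appeared
```

The second call shows the limitation described above in miniature: a model that only predicts from observed patterns has nothing to offer for input unlike its training data.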

The FAQ section of the GitHub Copilot page (“Your AI pair programmer”) calls out the following limitations that you should be aware of and consider when using Copilot:

  • While we are working hard to make GitHub Copilot better, code suggested by GitHub Copilot should be carefully tested, reviewed, and vetted, like any other code. As the developer, you are always in charge. 
  • GitHub Copilot doesn’t actually test the code it suggests, so the code may not even compile or run. 
  • GitHub Copilot can only hold a very limited context, so even single source files longer than a few hundred lines are clipped and only the immediately preceding context is used. 
  • And GitHub Copilot may suggest old or deprecated uses of libraries and languages. You can use the code anywhere, but you do so at your own risk. 

In my few days with the technology, I found it best suited to generating small code fragments where the developer expresses a precise intent. With entire function blocks, the results are sub-optimal.

GitHub Copilot and similar technology will excel at relieving developers of having to learn each new language’s nuances, programming paradigms, and patterns. Instead of having to Google for possible solutions or search Stack Overflow, you get them at your fingertips as autocomplete suggestions.

The developers can focus on design and breaking down complex problems into simpler ones. And this is a key responsibility of any developer that is not going away anytime soon. 

Economics of code generation 

GitHub has recently said that, for some languages, up to 30% of new code on its platform is written with the help of Copilot. This has significant implications. If this translates into time and effort saved, it would result in a significant reduction in the cost of developing software.

The commercial model for Codex is not yet available. But even with a licensing cost for using the technology, I can see how almost all software companies can benefit from it.

Language models like GPT-3 require extensive resources to train and run. They also need to be regularly updated and fine-tuned, which imposes more expenses on the company hosting the machine learning model. By one estimate, training GPT-3 would take 355 GPU-years and cost over $4.6M using a Tesla V100 cloud instance. So I can see why such technologies will require a commercial model to be sustainable.
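
As a back-of-the-envelope check on those two numbers, the sketch below converts GPU-years to GPU-hours and derives the hourly price they imply. The resulting rate is an inference from the quoted figures, not a price stated in the source, and the variable names are assumptions.

```python
HOURS_PER_YEAR = 365 * 24        # 8,760 hours, ignoring leap years

gpu_years = 355                  # training time quoted in the text
total_cost_usd = 4_600_000       # lower bound quoted in the text ("over $4.6M")

gpu_hours = gpu_years * HOURS_PER_YEAR
implied_rate = total_cost_usd / gpu_hours

print(f"{gpu_hours:,} GPU-hours")            # 3,109,800 GPU-hours
print(f"${implied_rate:.2f} per GPU-hour")   # $1.48 per GPU-hour
```

In other words, the estimate corresponds to roughly a dollar and a half per hour of single-GPU cloud time, multiplied by about three million hours.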

In 2019, OpenAI transitioned from a non-profit to a “capped” for-profit structure. OpenAI then announced its intention to commercially license its technologies, with Microsoft as its preferred partner. Microsoft announced on September 22, 2020, that it had licensed “exclusive” use of GPT-3; others can still use the public API to receive output, but only Microsoft has access to GPT-3’s underlying model.

OpenAI gives developers access to GPT-3 via a user-friendly API, which is also at the heart of its commercial product.

While this makes financial sense for OpenAI and Microsoft, its primary investor, it does result in a significant limitation for companies that want to extend this further. There are other similar initiatives under development, such as CodeT5, and it will be interesting to see how this field evolves.

Other initiatives like OpenAI GPT-3 

CodeT5, from Salesforce Research, is described by its authors as “Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation”. While GPT-3 is a generic language model, CodeT5 better leverages the code semantics conveyed by developer-assigned identifiers. It employs a unified framework that seamlessly supports both code understanding and generation tasks and allows for multi-task learning. Pre-trained CodeT5 models and fine-tuned checkpoints are publicly available via the Hugging Face library.

CodeT5 provides several capabilities, including code generation and code summarization.

I got to play around with the publicly available pre-trained model. I used the code generated by Codex as input to the CodeT5 model :-).

Fig 5: CodeT5 demo of generating explanation of code fragment provided 
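
For anyone who wants to try something similar, the following is a rough sketch of code summarization with a CodeT5 checkpoint through the Hugging Face transformers library. The checkpoint name is an assumption (one of the summarization checkpoints Salesforce has published on the Hugging Face hub) and not necessarily the one behind the demo above. Running it needs the transformers and torch packages plus a model download.

```python
# Assumed checkpoint name; swap in whichever CodeT5 checkpoint you want to try.
MODEL_NAME = "Salesforce/codet5-base-multi-sum"


def summarize_code(source, max_length=32):
    """Generate a short natural-language summary for a code fragment."""
    # Imported lazily so that defining this function does not require the library.
    from transformers import RobertaTokenizer, T5ForConditionalGeneration

    tokenizer = RobertaTokenizer.from_pretrained(MODEL_NAME)
    model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)

    # Encode the source code, generate summary tokens, and decode them to text.
    input_ids = tokenizer(source, return_tensors="pt").input_ids
    generated_ids = model.generate(input_ids, max_length=max_length)
    return tokenizer.decode(generated_ids[0], skip_special_tokens=True)
```

Calling summarize_code with the text of a function, for example a binary search, returns a one-line description of what the code does. The wording of that description depends on the checkpoint.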

Looking ahead 

There is a lot of hype around GPT-3 and Codex at the moment. And rightly so. In my limited time with the solution, I can see why everyone is so excited about this technology. There are a number of applications that will make the life of a developer much more productive. 

Automated code generation technology such as GPT-3 can, in turn, fuel other software and could turn out to be a multi-billion-dollar industry of its own. For example, Debuild, a tool based on GPT-3, lets you generate functional web apps quickly from a simple English description. There are many more that have already taken flight.

Interesting times ahead for the software development industry. 

One of our basic screening tests for candidates applying for a position with us is to develop a program that parses user input such as “1 plus 5 minus 4” and returns the result. We look at the developer’s thought process: how comprehensive the algorithm is, whether it handles precedence rules, what the time complexity is, etc. With GitHub Copilot and a simple documenting comment, I was able to auto-generate a program, including test cases. It is not the comprehensive program that we expect and falls short in a number of respects (for example, it does not work without parentheses), but it is a clear indicator of how Copilot can improve developer productivity by taking over some of the repetitive and mundane activities.
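
To give a sense of the task, here is a minimal hand-written sketch of such a parser. It is neither the program Copilot generated nor a model answer to the screening test. The words “plus” and “minus” come from the example above; “times” is an assumed extension added only to show precedence handling.

```python
import operator

# Word operators grouped by precedence: HIGH binds tighter than LOW.
HIGH = {"times": operator.mul}
LOW = {"plus": operator.add, "minus": operator.sub}


def evaluate(expression):
    """Evaluate input such as '1 plus 5 minus 4', with 'times' binding tighter."""
    tokens = expression.split()
    if len(tokens) % 2 == 0:
        raise ValueError("expected: number, then operator-number pairs")

    # Pass 1: fold every 'times' into its left operand.
    reduced = [int(tokens[0])]
    for op, raw in zip(tokens[1::2], tokens[2::2]):
        value = int(raw)
        if op in HIGH:
            reduced[-1] = HIGH[op](reduced[-1], value)
        elif op in LOW:
            reduced.extend([op, value])
        else:
            raise ValueError(f"unknown operator: {op}")

    # Pass 2: apply 'plus' and 'minus' from left to right.
    result = reduced[0]
    for op, value in zip(reduced[1::2], reduced[2::2]):
        result = LOW[op](result, value)
    return result


print(evaluate("1 plus 5 minus 4"))   # 2
print(evaluate("2 plus 3 times 4"))   # 14
```

The two passes each visit every token once, so the running time is linear in the length of the input. Parentheses, division, and non-integer numbers are left out to keep the sketch short.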

GPT-3 and Codex learn from publicly available code, especially on GitHub. And as GitHub reports, about 30% of new code for some languages on its platform is now suggested by Copilot. So in the near future, could these models end up learning predominantly from code they generated themselves?

Rajesh Rajagopalan
President and CTO
