As an AI development intern at VIA Technologies, I had an opportunity to put many of the mathematics and computer science principles from my undergraduate studies to practical use. The internship gave me a chance to work on a number of assignments and projects, ranging from relatively simple machine learning algorithms to deep, complex neural networks. By far the most interesting project I worked on was an AI text generation project.
The objective was to create an AI that could automatically generate marketing material, including social media posts and possibly even full-length articles. This would allow us to automate much of the sales and promotion work that VIA employees currently engage in, freeing them to perform more important tasks. Though I was certainly a bit surprised when I received the initial project assignment, I was more than ready to dive in. I had been studying deep learning networks for some time, and this project felt like the perfect opportunity to sink my teeth into the field and gain some proper experience.
Designing the Neural Network Architecture
The first decision I had to make was how to design my neural network for the task at hand. While we often hear the term “neural networks” and might assume them all to be basically the same, they actually vary wildly based on what you wish to use them for. There are convolutional neural networks for computer vision tasks, fully connected networks for statistical modeling, and the type I was most concerned with: recurrent neural networks (RNNs), which are used primarily for text generation.
Before moving forward, it’s probably best to explain what exactly a recurrent neural network is. Below are some ‘vanilla’ examples of the most common RNN architectures.
In this graphic, pink boxes represent inputs, green boxes represent the learned values of the network, and blue boxes represent outputs. As the arrows show, what makes an RNN special is that learned values throughout the network take in, as inputs, the previous learned values from earlier in the network. Moreover, unlike other neural networks that give a single output, RNNs are capable of returning a sequential output of varying length. This makes them especially effective for language translation or language modeling.
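That feedback loop can be made concrete with a tiny sketch. This is not my actual network, just a scalar toy (all weights hand-picked) showing how each step’s hidden value depends on both the current input and the previous hidden value:

```python
import math

def rnn_step(x, h_prev, w_xh, w_hh, b):
    # The new hidden value mixes the current input (x) with the
    # previous hidden value (h_prev) -- this recurrence is what
    # distinguishes an RNN from a feed-forward network.
    return math.tanh(w_xh * x + w_hh * h_prev + b)

h = 0.0                              # initial hidden state
for x in [1.0, 0.5, -0.3]:           # a short input sequence
    h = rnn_step(x, h, w_xh=0.8, w_hh=0.5, b=0.1)

print(h)  # the final hidden state summarizes the whole sequence
```

In a real network, `x` and `h` are vectors and the weights are learned matrices, but the recurrence is the same.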
How Language Modeling Networks Work
All of this talk of weights and learning might sound intimidating, but I promise I’ll bring it home to something we can more easily understand. We’ve all seen the predictive text feature on iMessage, right? Type in any word, and it will immediately suggest the next word, or more accurately, the word most likely to appear next given the specific word you just typed. That’s exactly how a language modeling network works. Given a particular start word, it can then suggest a new word. Then, based on that new word, it suggests the next new word, and so on until it decides to stop.
To make the process a bit clearer, here’s an example of a character-level language model:
As we can see, this model is trying to predict the word ‘hello’ given the starting character ‘h.’ If we simply extend this to a word-level model, and include the entire English dictionary as our classes, we have something that can be trained.
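The prediction loop itself can be sketched in a few lines. Here the hand-written `probs` dictionary stands in for a trained model’s output distributions (a real model computes these from its learned weights); the loop repeatedly picks the most probable next character until the model has nothing more to suggest:

```python
# Toy "model": maps the text generated so far to a probability
# distribution over the next character. Numbers are illustrative.
probs = {
    'h':    {'e': 0.9, 'a': 0.1},
    'he':   {'l': 0.8, 'y': 0.2},
    'hel':  {'l': 0.7, 'p': 0.3},
    'hell': {'o': 0.9, 'a': 0.1},
}

text = 'h'                 # the starting character
while text in probs:
    dist = probs[text]
    # Greedy decoding: always take the most probable next character
    text += max(dist, key=dist.get)

print(text)  # hello
```

Swapping the dictionary lookup for a neural network forward pass, and characters for words, gives the word-level language model described above.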
Implementing and Training the Model
I implemented the model using the PyTorch deep learning framework, which was developed by Facebook. As experimentation with deep learning has become more common, frameworks like PyTorch have become intuitive and easy to use, so the actual implementation of the model was not the most difficult part. After just a few weeks of coding, I had arrived at a workable architecture.
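To give a flavor of what such a PyTorch model looks like, here is a minimal sketch of a recurrent language model. The exact architecture I used isn’t reproduced here; the class name, layer sizes, and vocabulary size below are all illustrative:

```python
import torch
import torch.nn as nn

class CharRNN(nn.Module):
    """Minimal recurrent language model: embed tokens, run an RNN,
    project hidden states to a score for every token in the vocabulary."""

    def __init__(self, vocab_size, hidden_size=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.RNN(hidden_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens, hidden=None):
        emb = self.embed(tokens)            # (batch, seq, hidden)
        out, hidden = self.rnn(emb, hidden) # (batch, seq, hidden)
        return self.fc(out), hidden         # scores: (batch, seq, vocab)

model = CharRNN(vocab_size=65)
tokens = torch.zeros(1, 10, dtype=torch.long)  # a dummy input sequence
logits, hidden = model(tokens)
print(logits.shape)  # torch.Size([1, 10, 65])
```

Training then consists of feeding in text, comparing each position’s scores against the true next token with a cross-entropy loss, and backpropagating; in practice an LSTM or GRU layer is usually swapped in for `nn.RNN` to help with longer sequences.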
Now, all that was left to do was train the model. That task, as it turns out, represented the largest hurdle. In order to train these networks, you need incredible amounts of the kind of text you want to generate. Unfortunately, I did not have access to nearly enough marketing material to generate clear, reliable text. The sentence below is perhaps the most comprehensible sample marketing Tweet generated by my model:
‘Autonomous vehicles are here, now, in your home. Find out technology, alter your management’
However, as a proof of concept that my model did indeed work, I trained it on some much larger data sets, including the complete works of William Shakespeare and the source code for the Linux operating system. The results are shown in order below:
Though the project is still certainly a work in progress, it does demonstrate the incredible advancements being made in language modeling as a result of deep neural networks. I am hugely thankful to VIA for giving me the opportunity to work on this AI text generation project, and look forward to pursuing the topic further in the future.
Written by Ajay Chopra, Intern at VIA Technologies, Inc.