A few weeks ago we had the opportunity to witness two of the world’s most influential thought leaders hold an open discussion on AI. Taking the form of a traditional panel, the event was hosted by Epan Wu, President of the Embedded and IoT Business Unit here at VIA. The panel took place at HTC Headquarters in Taipei, Taiwan and was sponsored by HTC, DeepQ, VIA Technologies, Inc., Vive, and TVBS.
The lead panelist was the world-renowned Dr Yann LeCun, a French-American computer scientist working primarily in the fields of machine learning, computer vision, mobile robotics, and computational neuroscience. Perhaps most notably, he is also Chief AI Scientist at Facebook. Dr LeCun and Epan were joined by Dr Edward Chang, a pioneer of data-driven deep learning currently serving as President of AI Research and Healthcare (DeepQ) at HTC, and an adjunct professor in the Computer Science department at Stanford. Read on for a summary of some of the fascinating topics discussed by the panel (or skip the reading and watch the video in full below).
How do Computers Learn? What is Self-Supervised Learning?
Epan began the panel by asking about the current state of AI development and the origins and impact of self-supervised learning. Dr LeCun explained that AI has been able to develop more rapidly in recent years because of the success of Deep Learning, a phenomenon resulting directly from the availability of large data sets and the processing power of modern GPUs (Graphics Processing Units). The challenge today is not a lack of computational power (historically a major inhibitor of progress), but rather the type of learning we are using in AI systems. The learning paradigms we use to teach machines are substantially less efficient than those used by humans.
Before we get to the idea of ‘self-supervised learning’, let’s examine the two dominant paradigms currently used in today’s AI applications: ‘supervised learning’ and ‘reinforcement learning’. Supervised learning involves teaching or training a machine to classify images, for example to differentiate between cats, dogs and tables. It involves showing the machine thousands of examples of cats, dogs and tables. When the machine fails to give a correct answer, a human corrects it so that the answer it produces moves closer to the correct one. In this approach, you give the computer the correct answer. Reinforcement learning works a little differently. You don’t tell the machine the right answer, you just tell it whether its answer is good or bad, and it has to figure out by a process of trial and error which answer is the best one.
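The supervised paradigm can be sketched in a few lines of code. This is a minimal illustration, not anything shown at the panel: a toy linear classifier (a perceptron) in which every wrong answer triggers a correction toward the true label, exactly the "human gives the computer the correct answer" loop described above. The data points, labels and learning rate are invented for the example.

```python
# Supervised learning in miniature: a perceptron for two classes.
# Every mistake produces a correction (err) toward the true label.
# All data and hyperparameters here are invented for illustration.

def train_supervised(samples, labels, epochs=20, lr=0.1):
    """Learn weights for a 2-class linear classifier from labeled examples."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(samples, labels):
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = y - pred          # the "correction" a supervisor supplies
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

# Toy data: class 1 sits above the line x2 = x1, class 0 below it.
samples = [(0.1, 0.9), (0.2, 0.8), (0.9, 0.1), (0.8, 0.3)]
labels = [1, 1, 0, 0]
w, b = train_supervised(samples, labels)

def predict(x1, x2):
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0

print(predict(0.15, 0.95), predict(0.85, 0.2))   # -> 1 0
```

Note how the training signal is the full correct answer for every example; this is exactly the labeling burden that makes the approach data-hungry.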
Both approaches are really inefficient, requiring a huge number of interactions, trials and samples. Dr LeCun used an example from his work at Facebook, where the AI is trained using supervised learning and a data set of approximately four billion photos. This is the basis on which Facebook is able to curate your news feed, presenting things you may be interested in or products you might want to buy, and it is made possible only by the sheer size of the data set and the millions of interactions involved.
Reinforcement learning is perhaps even less efficient. For example, to play a classic Atari-style video game from the 1980s, a computer using reinforcement learning would require eighty to one hundred hours of practice to reach a level of performance that a human can achieve in around fifteen minutes. Eventually the machine can become superior to humans, but the process of trial and error means that it takes a very long time.
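The trial-and-error loop of reinforcement learning can be sketched with a toy "bandit" problem rather than a full Atari game. Everything below – the hidden reward values, the exploration rate, the number of trials – is an invented illustration of the principle: the agent is never told the right answer, only how good each attempt turned out to be.

```python
# Reinforcement learning in miniature: pick among 3 actions, observe only a
# noisy reward, and discover the best action by repeated trial and error.
# The reward values and trial count are invented for illustration.
import random

random.seed(0)
true_rewards = [0.2, 0.5, 0.9]   # hidden quality of each action
estimates = [0.0, 0.0, 0.0]      # the agent's learned value estimates
counts = [0, 0, 0]

for trial in range(2000):
    # Explore occasionally; otherwise exploit the best estimate so far.
    if random.random() < 0.1:
        a = random.randrange(3)
    else:
        a = max(range(3), key=lambda i: estimates[i])
    reward = true_rewards[a] + random.gauss(0, 0.1)      # feedback only
    counts[a] += 1
    estimates[a] += (reward - estimates[a]) / counts[a]  # running average

best = max(range(3), key=lambda i: estimates[i])
print("best action:", best)   # -> best action: 2
```

Even for three actions it takes hundreds of trials to separate the estimates reliably; scaling this blind search up to the state space of a video game is why the machine needs eighty-plus hours where a human needs fifteen minutes.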
Let’s apply this to autonomous vehicle development. It is completely impractical to use either of these two common techniques to teach a car to drive itself. The car would have to drive for millions of hours, have tens of thousands of accidents, run off the edge of cliffs, and perhaps even kill pedestrians as part of the learning process – clearly this would be utterly impractical in the real world.
So how is it possible that a human is able to master basic driving skills in around twenty to thirty hours of practice, and do so without causing any accidents?
Humans have a model of the world in our heads, and we know how the world works. If a car is near a cliff edge, we can infer that it will run off the cliff with disastrous consequences. We have a predictive model that can anticipate when something bad is going to happen, a model that does not rely entirely on experiential data. Self-supervised learning attempts to address this issue by giving the computer a model of the world, so that it can make accurate predictions about consequences.
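To make the idea concrete, here is a toy sketch of a predictive model trained by self-supervision: every pair of consecutive observations is a free training example (current state, next state), and once the model can predict the next observation it can foresee the cliff without ever having to drive off it. The observation sequence, the "cliff at x = 10" setup, and the constant-velocity model are all invented for illustration and bear no resemblance to a real driving system.

```python
# Self-supervised learning in miniature: the training signal comes from the
# data itself (predict the next observation), with no labels and no rewards.
# The scenario and numbers are invented for illustration.

# Observed positions of a car rolling toward a cliff edge at x = 10.
observations = [0.0, 1.0, 2.0, 3.0, 4.0]

# Each consecutive pair is a free training example. Here the "learned model"
# is simply the average step between observations.
steps = [b - a for a, b in zip(observations, observations[1:])]
velocity = sum(steps) / len(steps)

def predict_position(pos, n_steps):
    """Roll the learned model forward n_steps from position pos."""
    return pos + n_steps * velocity

# Before acting, consult the model: will we be over the edge in 7 steps?
future = predict_position(observations[-1], 7)
danger = future >= 10.0
print(future, danger)   # -> 11.0 True: brake now, no crash required to learn
```

The crucial point is that no accident ever appears in the training data; the model of how positions evolve is enough to predict the disastrous outcome in advance.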
How do Humans Learn? Can Machines Use the Same Technique?
Epan Wu notes that despite successful AIs seeming to have an almost magical quality, the average toddler has a better grasp of the world around them. Referencing the importance of self-supervised learning, Dr LeCun first argues that infant children and animals have an innate ability to develop an understanding of the world around them. This is based largely on the accumulation of an enormous amount of background knowledge about how the world works, acquired from observing the world around them. Eventually it allows infants to develop what we refer to as ‘common sense’, or an ability to ‘fill in the blanks’.
To illustrate further, Dr LeCun reminded everyone in the room that although we cannot see his back right now, or the area of floor directly behind him, we can all infer what his back looks like, because we have seen many backs before. Likewise we are likely to correctly infer that the floor behind him looks very similar to the floor in front of him.
He then offered a second example using language, and the example sentence, ‘John picks up his bag and leaves the conference room’. From this sentence we can immediately infer lots of information – John is probably at work, he is picking up his bag using his hand, he’s walking out of the room, he’s opening the door… etc. There are many things we can infer that perhaps are not so obvious, for example John is probably a man, he is not going to dematerialize out of the room, he’s not going to fly, and he’s not going to pick up his bag using telekinesis. We also know that once he has left the room, he will not be in the room anymore because John cannot be in two places at the same time.
Having background knowledge about how the world works allows us to make these inferences with a high degree of accuracy. As humans we take this for granted, but in fact we had to learn all of these things – that the world is three dimensional, the fact that there are objects in the world, all of which we learn in the first few months of life. How do we get machines to learn this? How do we give machines the enormous amount of knowledge needed to develop ‘common sense’?
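One way machines are given this kind of "fill in the blanks" training signal is by masking a word in a sentence and asking the system to predict it from raw text alone, with no human labels required. The toy corpus and simple bigram statistics below are an invented, drastically simplified stand-in for the large neural networks used in practice; they only illustrate the shape of the idea.

```python
# 'Fill in the blanks' in miniature: mask a word and predict it from raw
# text statistics gathered with no human labels. The corpus and the bigram
# model are invented for illustration.
from collections import Counter, defaultdict

corpus = (
    "john picks up his bag . john leaves the room . "
    "mary picks up her bag . mary leaves the office ."
).split()

# Count which word follows each word -- self-supervision from raw text.
following = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    following[prev][word] += 1

def fill_blank(prev_word):
    """Predict the masked word from the word that precedes it."""
    return following[prev_word].most_common(1)[0][0]

print(fill_blank("picks"))   # -> up
print(fill_blank("leaves"))  # -> the
```

Scaled up from bigram counts to deep networks and billions of sentences, this same masked-prediction objective is how systems begin to absorb the background regularities of language.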
What about Human Level Intelligence?
Epan then moved the discussion on to the topic of human level AI and the massive gap between that and what we have achieved so far. She asks the panel if they could identify a realistic target for achieving human level intelligence. Dr LeCun begins by noting that in his opinion human level AI is a preferable term to Artificial General Intelligence (AGI), which, despite being popular in certain circles, is actually a complete misnomer, largely because human intelligence is actually very specialized and should not be considered general.
He notes that we often don’t appreciate just how specialized our intelligence really is, because we have no capacity to recognize intellectual capacities that we don’t have. So we tend to think that our intelligence is somewhat general, but in fact it is not; we are really quite specialized. Take the example of AlphaGo, an AI developed to play the popular game Go. After AlphaGo had proved its competitive superiority, many of the best human players in the world were forced to concede that they were many levels behind an AI developed specifically for the task of playing Go. Humans are therefore not as intelligent as we often think we are.
There are many narrow domains where an AI can become superior to a human, but these remain somewhat narrow – driving a car, for example, is not yet among them. Dr LeCun then expressed his strong personal belief that there will come a time when machines have superior intelligence to humans in all areas where humans are intelligent. However, he rejects the idea that an AI could one day take over the world, as we frequently see in science fiction. He argues that the desire to dominate is not linked to intelligence; in fact, most intelligent beings have no desire to dominate. The most intelligent humans often have little or no desire to assume the role of commander-in-chief.
The Human Brain as a Model for AI
Epan asked the panel about the missing pieces in the development of human level AI. Dr Chang began by pointing out that we have currently achieved some level of perception using CNNs (Convolutional Neural Networks), with examples including image recognition and natural language processing. We also know that humans have four different lobes in our brains, and that perception is handled by only one of them, with the other lobes handling memory, knowledge and so on. Recently, however, neuroscience has discovered that although the functions of the lobes are different, physiologically they are very similar, mostly resembling neural nets. Some scientists believe that using a computational model with four lobes could help us get closer to human level intelligence.
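For readers curious what the convolution at the heart of a CNN actually computes, here is a minimal hand-rolled sketch: a small filter slides across an image, and each output value measures how strongly the local patch matches the filter. The 4×4 image and the vertical-edge filter below are invented for illustration; real CNNs learn many such filters from data rather than having them written by hand.

```python
# The core CNN operation in miniature: slide a 2x2 filter over a tiny image.
# The image and the hand-written edge filter are invented for illustration.

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [   # responds to a dark-to-bright vertical edge
    [-1, 1],
    [-1, 1],
]

def conv2d(img, ker):
    """Valid (no-padding) 2D convolution of img with ker."""
    kh, kw = len(ker), len(ker[0])
    out = []
    for i in range(len(img) - kh + 1):
        row = []
        for j in range(len(img[0]) - kw + 1):
            s = sum(img[i + di][j + dj] * ker[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

result = conv2d(image, kernel)
print(result)   # -> [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The strong response in the middle column is the filter "lighting up" on the edge between the dark and bright halves of the image; stacking many learned filters in many layers is what turns this simple operation into perception.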
Dr Chang also raised the question of how to model knowledge using only a neural net. Our intuition tells us that knowledge is represented by natural language: if we think about knowledge we think about a huge library, we think about books. Humans differ from animals because we can accumulate knowledge through natural language. Books contain natural language and photos or images. If we can model that information, and also try to fuse it with perception, perhaps this would be an interesting direction to explore.
Language, Intelligence and Thought
Although Dr LeCun agrees, he points out that this raises very interesting questions about whether or not language is an essential element of intelligence. Many intelligent species of animals do not use language; orangutans, for example, use very little language and do not communicate much with each other, yet they grasp abstract concepts that allow them to build tools and houses. He argues that we have to be careful when assuming that language plays an essential role in thought, especially because there is a strong urge for us to do so. It’s important for us to remember that there are forms of intelligence that do not require language.
Dr Chang went on to mention how progress in the area of genomics now suggests that information could be passed on through our genetics. When we are born, we are not born with nothing; we are probably initialized by genomics, which can explain certain kinds of knowledge, such as the idea of being hungry – what we describe as prior knowledge, or instincts embedded in our DNA.
The panel go on to discuss a variety of fascinating topics, including the difference between animal and human intelligence, the structure of the human brain, the different approaches to AI development being used around the world, the topic of data in the context of GDPR, the role of governments and more.
If you haven’t already, check out the AI Panel with Dr LeCun and Dr Chang in the video below: