What Is Deep Reinforcement Learning and Why Does It Matter? |
NEWS |
Deep reinforcement learning (deep RL) is a technique used by data scientists working in AI. A deep-RL algorithm learns to develop solutions over multiple steps to achieve a goal without human intervention. The critical challenge for software developers building deep-RL algorithms is to set the goals and reward functions that then inform the learning process. Deep RL has been given top billing as a potential AI technology that could enable far greater autonomy in the automotive, consumer, financial, and industrial spaces.
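To make that division of labor concrete, the minimal sketch below uses a toy, tabular form of RL (a lookup table rather than a deep network): the developer writes only the reward function that encodes the goal, and the agent discovers the multi-step behavior on its own. The corridor environment and all parameter values are illustrative assumptions, not drawn from any of the projects discussed here.

```python
# Tiny, tabular illustration of the principle described above: the developer
# specifies only the goal (via a reward function), and the agent learns
# multi-step behavior by trial and error. Deep RL replaces the lookup table
# below with a neural network, but the reward-driven loop is the same.
# All names and numbers are illustrative, not taken from any framework.

import random

N_STATES = 6           # positions 0..5 along a corridor; the goal is position 5
ACTIONS = (-1, +1)     # step left or step right
GOAL = N_STATES - 1

def reward(state):
    """Developer-defined goal: +1 only when the goal position is reached."""
    return 1.0 if state == GOAL else 0.0

# Value table: estimated future reward for each (state, action) pair.
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount, exploration rate

def greedy(state):
    """Best-known action for a state, breaking ties randomly."""
    best = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == best])

for _ in range(500):                     # episodes of trial and error
    state = 0
    for _ in range(100):                 # cap episode length
        action = random.choice(ACTIONS) if random.random() < epsilon else greedy(state)
        next_state = min(max(state + action, 0), GOAL)
        r = reward(next_state)
        # Q-learning update: move the estimate toward reward + discounted future value.
        q[(state, action)] += alpha * (r + gamma * max(q[(next_state, a)] for a in ACTIONS)
                                       - q[(state, action)])
        state = next_state
        if state == GOAL:
            break

# After training, the learned policy walks straight toward the goal (+1 everywhere).
print([greedy(s) for s in range(GOAL)])
```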
The field has seen a number of significant advances in the past few years, which have allowed deep-RL software agents to play games at superhuman levels. Notable reinforcement learning projects include DeepMind’s Deep Q-Network (DQN), software that learned to play Atari games; AlphaGo and AlphaGo Zero, DeepMind software that became the world’s best player of the board game Go; and OpenAI Five, which learned to play the video game Dota 2. Deep RL has also now seen its first major commercial deployment by Google Cloud, which is using deep-RL software to manage the water-cooling systems of its data centers, reducing its cooling bill by 40% in test iterations.
While these projects have yielded incredibly impressive results, other companies have found them difficult either to reproduce or to leverage. When developers downloaded the code from some of the famous projects noted above, they found it extremely challenging to get it to work in any context other than the one for which the deep-RL software was trained, even when the code was intended to be general. To stand any chance of leveraging these projects, developers must take on challenging and time-consuming tasks, such as writing a goal-orientation process from scratch, working through large amounts of filler code around the analysis of decision trees, and performing extensive code refactoring.
To address these issues and to improve researchers’ ability to experiment and to come up with new applications for deep RL, Google has launched Dopamine, a new AI framework for developing deep-RL software. Dopamine sits on top of the open-source software library TensorFlow. It offers access to several pretrained deep-RL algorithms, built on popular architectures such as DQN, C51, and Rainbow, which can also be trained from scratch, as well as a specialized environment in which to set goals and test deep-RL software agents: the Arcade Learning Environment, used to train and evaluate models on Atari games.
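As a rough indication of what working with Dopamine looks like, the sketch below wires one of the bundled agents (DQN) to an Atari game through the experiment runner, in the style of the examples published at launch. The module paths, class names, and constructor arguments shown (for example, game_name) are assumptions based on those launch-era examples and may differ in the version you install, so treat this as an outline to check against the Dopamine repository rather than a definitive recipe.

```python
# Hedged outline of a Dopamine training run (DQN on an Atari game), modeled on
# the launch-era examples. Module paths and constructor arguments below are
# assumptions and may not match later releases of the library.

from dopamine.agents.dqn import dqn_agent      # bundled DQN implementation
from dopamine.atari import run_experiment      # experiment runner (launch-era path)

BASE_DIR = '/tmp/dopamine_demo'                # where logs and checkpoints are written

def create_dqn_agent(sess, environment, summary_writer=None):
    """Factory the runner calls to build the agent for the chosen game."""
    return dqn_agent.DQNAgent(sess,
                              num_actions=environment.action_space.n,
                              summary_writer=summary_writer)

# The runner couples the agent to the Arcade Learning Environment and drives
# the train/evaluate/checkpoint loop.
runner = run_experiment.Runner(BASE_DIR, create_dqn_agent, game_name='Pong')
runner.run_experiment()
```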
Google Takes Further Control of Deep RL with Dopamine |
IMPACT |
Dopamine represents a significant step forward for researchers looking to develop deep-RL software. Other frameworks do not provide the combination of flexibility and stability necessary for researchers to iterate on different models effectively and thus explore new research directions. Due to a lack of quality open-source tools, deep-RL development has been confined to research organizations that have already invested heavily in the technology and have the skills and resources to build algorithms from scratch in nonspecialized frameworks. If executed properly, Dopamine could lower the barrier to entry for training and productizing deep-RL technology.
Dopamine could also extend Google’s lead and influence in the AI framework ecosystem. AI frameworks that fall under the Open Neural Network Exchange (ONNX) bracket, such as PyTorch, MXNet, and Caffe2, and other non-Google frameworks do not today offer an extended deep-RL tool set equivalent to Dopamine. Dopamine benefits developers already familiar with TensorFlow, and it may lead more developers to opt for that framework. If Dopamine proves to be a popular platform, it will also give Google more control over managing the development of deep-RL technology, as it has done so far with TensorFlow and, before that, Android.
Two other open-source platforms are currently used in the deep-RL research community for training algorithms: OpenAI’s Gym and Unity’s Machine Learning (ML) Agents. Unity’s ML Agents has been developed in partnership with Google’s DeepMind and is more widely used today, particularly by gaming companies. Dopamine will likely supersede Unity’s and OpenAI’s deep-RL frameworks: although they have been good for demonstrating the potential of deep RL within the environments they offer, models built on them have not been easy to reproduce and productize outside of those environments.
Deep RL Still Has Issues to Overcome to Enable Its Wider Adoption |
RECOMMENDATIONS |
More open-source tools for deep RL will provide a boost to the technology. However, three critical issues remain as barriers to the wider application of deep-RL software: the compute resources and expertise required to train algorithms, the training environments available today, and the acquisition and handling of the data required to train deep-RL algorithms to the point where they can be productized.
Deep-RL’s need for dedicated software and hardware: We can explore these issues by looking at DeepMind’s AlphaGo Zero. The deep-RL model designed to play the board game Go was trained through self-play, a process in which the algorithm was tuned solely by playing against itself. AlphaGo Zero was trained on four Tensor Processing Units (TPUs) and took 40 days of training to surpass the previous champion Go-playing algorithm.
Deep-RL’s need for significant investment in this technology: DeepMind has gone from training AlphaGo models on 176 GPUs to just four TPUs in the Zero iteration of the software. On the surface this may seem like a dramatic decline in compute resources, but the truth is very different. The AlphaGo project first debuted in 2015; it took until October 2017 for DeepMind to refine the model-training process to the point where it could run on just four TPUs. DeepMind had the luxury of using Google’s TPUs, hardware specialized for AI workloads like deep RL that is available to users outside Google only through a beta cloud service. DeepMind’s access to resources in terms of time, highly skilled researchers, proprietary custom hardware, and capital allowed the company to go through countless iterations of AlphaGo to optimize the training process and the model architecture to maximize performance. Most enterprises would not be able or prepared to support research at such a scale, making deep RL an intimidating technology to develop.
Deep-RL’s need for a revolution in how we approach machine programming: It is also worth considering that Go is a game for which the training environment is very easy to create. Building software that emulates the context in which, for instance, real-world machines operate is considerably more complex. Researchers will increasingly have to address the question of how to model environments virtually. Outside of gaming, most software today is designed to operate machines in a given environment, and the software and the machines have no understanding of anything outside that environment. Deep RL inverts this paradigm: the focus of software developers shifts to creating environments in which deep-RL machines can learn to operate by themselves. Much work remains, both in building platforms that can support the creation of environments of sufficient quality to train algorithms and in using real-world data to inform those environments, before wider adoption of deep-RL software can take place. Dopamine’s Arcade Learning Environment may prove sufficient for some basic deployments, but it will not be enough for productization in many scenarios.
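One way to picture this environment-first workflow is the sketch below, which uses the widely adopted Gym interface (reset and step methods plus declared action and observation spaces) to describe a toy control problem rather than hard-coding the controller’s behavior; any deep-RL agent that speaks this interface could then be trained inside it. The cooling scenario, its dynamics, and all numbers are invented for illustration.

```python
# Minimal sketch of describing a world for an agent to learn in, using the
# classic Gym interface. The developer specifies states, actions, dynamics,
# and a reward; the agent's behavior is learned rather than programmed.
# The scenario and all quantities below are illustrative only.

import gym
import numpy as np
from gym import spaces

class CoolingEnv(gym.Env):
    """Toy stand-in for a real-world control problem: keep a temperature
    near a setpoint by choosing how hard to run a cooler each step."""

    def __init__(self, setpoint=22.0):
        self.setpoint = setpoint
        self.action_space = spaces.Discrete(3)  # 0: off, 1: low, 2: high cooling
        self.observation_space = spaces.Box(low=0.0, high=60.0,
                                            shape=(1,), dtype=np.float32)
        self.temp = 30.0
        self.steps = 0

    def reset(self):
        self.temp, self.steps = 30.0, 0
        return np.array([self.temp], dtype=np.float32)

    def step(self, action):
        heat_in = np.random.uniform(0.5, 1.5)   # uncontrolled heat load
        cooling = [0.0, 1.0, 2.5][action]       # effect of the chosen action
        self.temp += heat_in - cooling
        self.steps += 1
        # Reward: penalize distance from the setpoint (the goal the agent pursues).
        reward = -abs(self.temp - self.setpoint)
        done = self.steps >= 200
        return np.array([self.temp], dtype=np.float32), reward, done, {}
```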
Deep-RL’s need for machines to start measuring their own success: When training algorithms to play Go, feedback on each action creates new, relevant data instantaneously. In contrast, when deep RL is applied to other real-world problems, relevant feedback may be sparse relative to the number of actions taken. Sparse feedback creates a need for far greater exploration in the training process than when feedback is instantaneous, which in turn increases the compute required to build and train a robust deep-RL model. Most machines today are also not capturing enough of the right kind of data to monitor this feedback process, so they can neither cope with the problem of sparse feedback in the first place nor support model training and development.
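The contrast can be stated in a few lines of illustrative code: a dense reward gives the learner informative feedback on every action, while a sparse reward stays silent until the goal is actually reached, so the agent must stumble onto success through exploration before any learning signal appears. Both functions below are toy illustrations, not taken from any production system.

```python
# Toy illustration of dense versus sparse feedback. With the dense reward every
# action produces an informative signal; with the sparse reward almost every
# action returns 0, so far more exploration (and compute) is needed before the
# agent sees anything it can learn from.

def dense_reward(distance_to_goal: float) -> float:
    # Feedback on every step: getting closer is immediately rewarded.
    return -distance_to_goal

def sparse_reward(distance_to_goal: float) -> float:
    # Feedback only at the goal: all other transitions look identical to the agent.
    return 1.0 if distance_to_goal == 0 else 0.0
```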
Dopamine holds a lot of promise for wider exploration of what deep RL can do, and it gives Google an opportunity to push its AI work further, but researchers still need to address critical issues with deep RL before it can live up to the hype that surrounds it today.