Reinforcement Learning: A Human-Inspired Machine Intelligence

The NIMI group’s activities at InfAi Institute involve developing novel RL algorithms and techniques tailored to the specific challenges of optimizing mechanical ventilation and extracorporeal lung support.


Authors (NIMI Team, Institute for Applied Computer Science e.V.): Dr. Sahar Vahdati, Dr. Roman Ließner, Farhad Safaei, Jason Li, Nikhil Ostwal, Prathmesh Dudeh

Artificial Intelligence (AI) has been a subject of interest and research for several decades. However, a few breakthrough moments brought AI to the forefront of public awareness and sparked widespread discussion about its ability to surpass human performance in certain tasks.

One of these moments came in 1997, when IBM’s chess-playing computer Deep Blue defeated world chess champion Garry Kasparov, showcasing the potential of AI in complex decision-making tasks. Years later, in 2011, IBM’s Watson, a question-answering AI system, competed on a popular quiz show against former champions Ken Jennings and Brad Rutter. Watson’s victory captured public attention and raised awareness of AI’s potential for understanding and processing human language. In 2016, Google DeepMind’s AI program AlphaGo defeated the world champion Go player Lee Sedol. This milestone demonstrated AI’s ability to master complex, strategic games and highlighted its potential for tackling real-world challenges. A key ingredient behind such systems is Reinforcement Learning (RL), which draws inspiration from the principles of human intelligence to develop algorithms capable of learning to make, or suggest, decisions in dynamic environments.

The foundations of Reinforcement Learning were laid by Alan Turing, Norbert Wiener, and Claude Shannon, whose work explored machines that simulate human intelligence and adapt to their environment through feedback mechanisms. Around the same time, Richard Bellman established the theory of Markov Decision Processes. Bellman’s work introduced the notion of value functions and the Bellman equation, which became fundamental to RL algorithms by providing a recursive way to compute optimal policies. It is worth mentioning that one of the earliest practical applications of RL was Donald Michie’s machine for learning to play Tic-Tac-Toe. The modern concept of RL then took shape from the 1970s onward, when researchers such as Richard Sutton and Christopher Watkins developed algorithms that enable agents to learn through interaction with their environment, receiving rewards or punishments based on their actions. RL has since found applications in many fields, including robotics, autonomous vehicles, recommendation systems, finance, and healthcare. RL algorithms have been used to optimize complex decision-making processes and achieve performance levels that were previously unattainable.
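To make the Bellman equation concrete, here is a minimal sketch of tabular Q-learning (the algorithm introduced by Watkins) on a toy, hypothetical environment: a five-state corridor where the agent earns a reward of 1.0 only upon reaching the rightmost state. The environment, the hyperparameters, and all names are illustrative assumptions, not part of the original article; the point is the Bellman-style update, which recursively pulls the value of a state-action pair toward the reward plus the discounted value of the best next action.

```python
import random

N_STATES = 5          # states 0..4; state 4 is terminal
ACTIONS = [-1, +1]    # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

# Q-table: estimated value of taking action a in state s
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Environment dynamics: move, clamp to the corridor, reward at the end."""
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def greedy(state):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

random.seed(0)
for _ in range(500):                      # training episodes
    state = 0
    while state != N_STATES - 1:
        # epsilon-greedy action selection: mostly exploit, sometimes explore
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = greedy(state)
        nxt, reward = step(state, action)
        # Bellman-style update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        target = reward + GAMMA * max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = nxt

# After training, the greedy policy moves right in every non-terminal state.
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # expected: [1, 1, 1, 1]
```

The same update rule, with the Q-table replaced by a neural network, is the core of the Deep Q-Network discussed below.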

Breakthroughs in deep learning and the availability of vast amounts of data in the early 2010s renewed interest in RL. Deep RL combines RL with deep neural networks, enabling agents to learn directly from high-dimensional sensory inputs. Notable advances, such as DeepMind’s Deep Q-Network (DQN), demonstrated the ability to surpass human performance in playing Atari games.

The release of OpenAI’s language model GPT-3 in 2020 showcased unprecedented language generation capabilities, producing coherent and contextually relevant text across various domains. Currently, we are experiencing the breakthrough of Large Language Models (LLMs). LLMs such as OpenAI’s GPT (Generative Pre-trained Transformer) models have demonstrated remarkable capabilities in natural language processing and generation, with significant advances in language understanding, contextual reasoning, and generating responses that mimic human-like conversation. Reinforcement Learning also plays a significant role in the development and training of LLMs by providing a framework for optimizing them to generate contextually relevant, coherent, and engaging text. RL is used mainly in fine-tuning and policy optimization of LLMs, reward modeling, balancing exploration and exploitation, learning from human feedback, curriculum learning, and adaptive and interactive systems. It helps enhance the quality, versatility, and adaptability of these models for a wide range of natural language processing tasks and applications.

Here at NIMI, RL is one of our mainstream research directions. We focus on the application of RL to healthcare in the IntelliLung project, which is particularly concerned with the care of invasively mechanically ventilated patients with acute respiratory failure. The group’s activities involve developing novel RL algorithms and techniques tailored to the specific challenges of optimizing mechanical ventilation and extracorporeal lung support. Through data analysis and modeling, we aim to improve the combination of device settings by leveraging RL to learn from the deviations made by experts and from the patient’s condition. The ultimate goal is to enhance patient care by reducing the risk of lung injury while ensuring effective respiratory support. The group’s interdisciplinary approach combines expertise in artificial intelligence, medical research, and healthcare to create innovative solutions that can significantly impact patient outcomes in critical care settings.

If you would like to get in contact with the NIMI Team, please read here.
