Points of View

Memory Networks Part I: Fix your AI’s goldfish memory by putting memory networks on your radar

Feb 28, 2019, by Reetika Fleming and Maria Terekhova

As we recently discussed, deep learning (DL) is the most sophisticated form of fundamental AI available to enterprises today (see Exhibit 1). The multi-layered structure of DL networks makes them more computationally powerful than other forms of AI and better able to perform complex tasks (see Exhibit 2).


However, DL systems still have a fundamental limitation: they don’t have the equivalent of a memory. Without some type of memory, DL networks cannot develop the equivalent of transferable human skills. That makes them costly to train and less useful than they could be for enterprises looking to scale their AI implementations, especially those that rely on highly transactional interactions with clients or on sensitive materials. A growing number of companies are trying to remedy this shortcoming by developing DL networks that can learn and apply knowledge more adaptively, making them more broadly useful and resource-efficient for enterprises.


Exhibit 1: The building blocks of AI


Source: HFS Research, 2019



Exhibit 2: Using deep learning networks for facial recognition


Source: MongoDB, 2018



Memory networks could be the key to help enterprises unlock deep learning’s full value


DL systems learn by adjusting their parameters to fit a specific task. When the same network is then trained on a new task, those parameters are overwritten and performance on the earlier task collapses, a phenomenon known as “catastrophic forgetting.” As a result, these systems cannot carry knowledge between tasks as memories, which leaves them unable to build context that persists beyond a one-time transaction, such as a full picture of a customer, another entity, or a procedure. A memory-enhanced customer service chatbot, by contrast, could remember its previous interactions with a customer and use that knowledge to anticipate some of the customer’s requests and preferences. If the customer were booking a specific hotel for the second time, for example, the chatbot could suggest the same type and quality of room. Enhancements like this could make the reservation process both more efficient and more seamless for the consumer.
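The forgetting effect described above can be illustrated with a toy example. The sketch below is only an illustration under simplifying assumptions: the one-parameter model, the two made-up tasks (`task_a`, `task_b`), and the learning rate are all hypothetical and stand in for a real deep network and real training data.

```python
def train(w, data, lr=0.1, epochs=50):
    """Plain gradient descent on squared error for the model y = w * x."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x   # derivative of (w*x - y)^2 w.r.t. w
            w -= lr * grad
    return w


def mse(w, data):
    """Mean squared error of the model y = w * x on a dataset."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


# Two hypothetical tasks that want different values for the same parameter.
task_a = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5)]    # task A: y = 2x
task_b = [(x, -1.0 * x) for x in (0.5, 1.0, 1.5)]   # task B: y = -x

w = train(0.0, task_a)             # learn task A
loss_a_before = mse(w, task_a)     # error on A right after learning it

w = train(w, task_b)               # keep training the same parameter on task B
loss_a_after = mse(w, task_a)      # error on A once B has overwritten it

print("task A error after learning A:", loss_a_before)
print("task A error after learning B:", loss_a_after)
```

In this toy setting the error on task A goes from near zero to roughly 10.5 once the single shared parameter has been re-fitted to task B, which is the same overwrite problem, in miniature, that larger networks face.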


Because DL systems are significantly more GPU-intensive and data-hungry than simpler ML models, having their memories wiped between tasks means spending more resources on re-training them and on obtaining fresh training data for each task. This lack of memory makes DL systems less adaptable and diminishes their usefulness in dynamic environments.


The goal of advanced memory network researchers is to develop deep learning systems capable of learning sequentially rather than by absorbing large blocks of data in one go. As one author puts it, underpinning these efforts is “a novel way of looking at sequential data: instead of analyzing it piece by piece, updating an internal fixed-size memory representation (that forgets more from the past the more inputs it gets), memory networks consider the entire history so far explicitly, with a dedicated vector representation for each history element, effectively removing the chance to ‘forget.’” In theory, this should make DL networks more adaptable and able to move between diverse contexts in complex real-world environments.
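The idea in the quotation above, one stored vector per history element plus a lookup over all of them, can be sketched in a few lines. This is a minimal illustration, not any vendor’s implementation: the function names, the three hand-written memory vectors, and the query are hypothetical, and in a real memory network the vectors would be learned embeddings rather than fixed numbers.

```python
import math


def softmax(scores):
    """Turn raw match scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def read_memory(query, memory):
    """Score the query against every stored slot, then return the
    attention weights and the weighted sum of the slots."""
    weights = softmax([dot(query, slot) for slot in memory])
    dim = len(memory[0])
    readout = [sum(w * slot[i] for w, slot in zip(weights, memory))
               for i in range(dim)]
    return weights, readout


# Each history element gets its own slot; earlier slots are never overwritten.
memory = []
for vec in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
    memory.append(vec)

query = [4.0, 0.0, 0.0]            # hypothetical query resembling the first slot
weights, readout = read_memory(query, memory)
print("attention weights:", weights)
print("readout:", readout)
```

Because the first slot is still stored intact, the query puts most of its weight on it no matter how many elements were added afterward. The trade-off is that memory grows with the length of the history instead of staying fixed in size.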


The outcome of such tweaks is DL networks with “the equivalent of a working memory system that can store fragments of inferred knowledge and their relationships so that it can be easily accessed from different layers in the network.” This ability is the equivalent of teaching an employee a transferable skill that they can apply to another role when they move jobs.


Recently, advances in AI research and breakthroughs in neuroscientific research have led to significant progress in making such memory networks a reality. Architectures that feature in this work include recurrent neural networks (RNNs), neural Turing machines (NTMs), and convolutional neural networks (ConvNets), to name just a few.


Enterprises should keep memory network R&D leaders on their radars or risk being left behind


Several companies are emerging as pioneers in the memory network field. Much of this research is still academic, but it is increasingly aimed at commercializing the technology, and some companies have already started selling their solutions to enterprises. Below is our shortlist of leaders:

  • DeepMind. Google’s DeepMind Technologies had already shaped the public’s perception of AI when its AlphaGo system beat the best human player at the incredibly complex game of Go in 2017. That same year, DeepMind and Imperial College London, a renowned medical research institution, designed and tested the elastic weight consolidation (EWC) algorithm, modeled on the way the hippocampus stores useful experiences and memories for future use (see Exhibit 3). DeepMind used the EWC algorithm to augment its earlier Deep Q-Network, which made headlines by demonstrating extreme prowess with Atari games. The “elastic” in EWC refers to visualizing neural connections as springs whose tightness increases with how useful they are for solving a task. The tighter the spring, the harder it is to change that connection between tasks. The AI system analyzes its performance after every task and retains the connections that helped it execute well, carrying them over to the next task. The EWC-enhanced Deep Q-Network, which was trained to play 10 different games (including Atari offerings), was shown to be capable of sequential learning and memory retrieval between games and played seven of the 10 games as proficiently as a human. DeepMind’s next steps are to use and improve upon these findings to create a “general-purpose intelligence” able to adapt to any context, akin to a human brain.


Exhibit 3: EWC performance in training


Source: DeepMind, 2017

  • Facebook AI Research (FAIR). Founded in 2013, Facebook’s dedicated AI research center today has offices (or “labs”) in North America, Europe, and the Middle East. It sits apart from Facebook’s in-house ML unit, has a separate, less product-focused mandate, and partners with the open-source community and academic institutions on long-term AI research projects. Reflecting this open-source ethos, much of FAIR’s research is published as papers open to the general public and targeted specifically at the open-source developer community. FAIR’s focus is predominantly on developing AI solutions that allow for more seamless consumer interaction with data systems. Its ultimate goal is to make neural networks capable of self-supervised learning and of exhibiting something like “common sense,” and memory networks play a significant part in this. Through open-source collaboration, FAIR has already developed a system with a form of long-term memory that can extract information from unsupervised, dispersed sources like Wikipedia articles to answer a specific question. Using recurrent neural networks (RNNs), it also developed a system that, after analyzing a single passage from The Lord of the Rings book series, could later answer questions about the series as a whole. FAIR says its research could be instrumental in making the next generation of cognitive assistants far more flexible, knowledgeable, and efficient.


Exhibit 4: FAIR research timeline


Source: Facebook


  • Nnaisense. One of the few startups to break into the memory network race, US- and Switzerland-based Nnaisense was founded in 2014 by Jürgen Schmidhuber, the AI researcher credited with co-developing long short-term memory (LSTM), a form of RNN, which earned him the moniker “the guy who taught AI to remember.” The company’s mission statement is to “build large-scale neural network solutions for superhuman perception and intelligent automation, with the ultimate goal of marketing general-purpose neural network-based artificial intelligence.” Its activity to date reflects this commercial focus: despite its youth, Nnaisense’s technology is already being used by German asset manager Acatis for financial market prediction, by a large unnamed steel producer for defect detection and materials classification, and, perhaps most publicly, by car giant Audi to autonomously park a model car. Nnaisense and Audi developed a reinforcement learning system that trained the car’s computer, using an RNN, to park in a simulation environment using raw car camera data. The trained RNN was then able to navigate and park the car without human supervision.
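The “springs” in the DeepMind entry above can be made concrete. As published, EWC adds a quadratic penalty to the new task’s loss, of the form (lambda / 2) * sum_i F_i * (theta_i - theta*_i)^2, where theta* holds the parameters learned on the earlier task and F_i measures how important parameter i was to it. The sketch below applies that penalty to a two-parameter toy model. It is an illustration under stated assumptions, not DeepMind’s implementation: the tasks, function names, learning rate, and `lam` value are hypothetical, and the importance estimate (the mean squared input per feature) is a simple stand-in for the diagonal Fisher information.

```python
def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w, data):
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def train(w, data, anchor=None, importance=None, lam=0.0, lr=0.05, epochs=2000):
    """Gradient descent on squared error, plus an optional EWC-style penalty
    (lam / 2) * sum_i importance[i] * (w[i] - anchor[i]) ** 2."""
    w = list(w)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            for i, xi in enumerate(x):
                grad[i] += 2 * err * xi / len(data)
        if anchor is not None:
            for i in range(len(w)):
                # The "spring": a stiffer pull for more important parameters.
                grad[i] += lam * importance[i] * (w[i] - anchor[i])
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


xs = (0.5, 1.0, 1.5)
task_a = [((x, 0.0), 2.0 * x) for x in xs]   # hypothetical task A: uses feature 0 only
task_b = [((x, x), -1.0 * x) for x in xs]    # hypothetical task B: either feature can solve it

w_a = train([0.0, 0.0], task_a)              # parameters after learning task A

# Assumed importance estimate: mean squared input per feature on task A.
importance = [sum(x[i] ** 2 for x, _ in task_a) / len(task_a) for i in range(2)]

w_plain = train(w_a, task_b)                                   # no protection
w_ewc = train(w_a, task_b, anchor=w_a, importance=importance, lam=10.0)

print("task A error, plain training:", mse(w_plain, task_a))
print("task A error, with penalty:  ", mse(w_ewc, task_a))
print("task B error, with penalty:  ", mse(w_ewc, task_b))
```

In this toy case, plain training on task B shifts both parameters and degrades task A. With the penalty, the parameter that mattered for task A is held in place and the unimportant one absorbs the change, so both tasks remain solved.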


The Bottom Line


As this shortlist indicates, the holy grail of memory network R&D is developing a “general-purpose artificial intelligence,” i.e., an AI that’s capable of learning as adaptively and quickly as a human. Fast, adaptive learning would be of obvious value to enterprises looking to automate as much of their operations as possible, as it would make it easier to disseminate AI throughout their organizations.


Today, one of the biggest challenges to scaling AI through organizations and developing holistic AI strategies is that AI is trained to perform very specific tasks and specialize in narrow areas. This is due in part to training data scarcity and in part, as we’ve seen, to AI systems’ inability to develop transferable skills. As such, endowing AI with memory could be the key to overcoming this gargantuan challenge and disseminating AI throughout the business world. However, as we’ll see in Part II of this POV, what memory networks can achieve in practice still differs from what they promise in theory.
