Machine learning systems are everywhere. Computer software in these machines predicts the weather, forecasts earthquakes, provides recommendations based on the books and movies we like and even applies the brakes on our cars when we are not paying attention.
To do this, computer systems are programmed to find predictive relationships in the massive amounts of data we supply to them. Machine learning systems use advanced algorithms (sets of rules for solving math problems) to identify these predictive relationships from “training data.” The system then uses this data to construct the models and features that enable it to correctly predict your desire to read the latest best-seller, or the likelihood of rain next week.
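To make the idea of learning from training data concrete, here is a minimal, hypothetical sketch. The `train` function and the toy weather records are illustrative assumptions, not anything from the researchers' work: the “model” estimates the chance of rain on cloudy and clear days simply by counting outcomes in the training data.

```python
from collections import Counter


def train(examples):
    """Estimate the chance of rain for cloudy and clear days by counting.

    Each example is a hypothetical (cloudy, rained) pair of booleans.
    """
    totals = Counter()  # days seen under each condition
    rainy = Counter()   # rainy days seen under each condition
    for cloudy, rained in examples:
        totals[cloudy] += 1
        if rained:
            rainy[cloudy] += 1
    # The "model" is the observed rain frequency for each condition.
    return {cond: rainy[cond] / totals[cond] for cond in totals}


training_data = [
    (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, True), (False, False),
]
model = train(training_data)
print(round(model[True], 3), model[False])  # 0.667 0.25
```

Real systems learn far richer relationships, but the principle is the same: the model's numbers are derived from the training records, which is why a record can keep influencing predictions even after the record itself is deleted.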
This intricate learning process means that a piece of raw data often goes through a series of computations in a given system. Together, the data, the computations and the information the system derives from that data form a complex propagation network called the data’s “lineage.” Researchers Yinzhi Cao of Lehigh University and Junfeng Yang of Columbia University build on this notion of lineage in a novel approach they are pioneering to make such learning systems forget.
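One way to picture a lineage is as a directed graph in which each edge points from an input to something computed from it. The sketch below is a hypothetical illustration under that assumption (the `LineageGraph` class and the node names are invented for the example): it finds everything a raw record has influenced by following edges transitively, which is the set of results a forgetting system would need to update.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy lineage graph: edges point from an input to anything computed from it."""

    def __init__(self):
        self.derived_from = defaultdict(set)  # node -> nodes computed from it

    def record(self, output, inputs):
        """Record that `output` was computed from each node in `inputs`."""
        for node in inputs:
            self.derived_from[node].add(output)

    def lineage(self, item):
        """Return every node that depends, directly or indirectly, on `item`."""
        seen = set()
        queue = deque([item])
        while queue:
            node = queue.popleft()
            for child in self.derived_from[node]:
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return seen


g = LineageGraph()
g.record("feature:avg_rating", ["rating:alice", "rating:bob"])
g.record("model:recommender", ["feature:avg_rating", "rating:carol"])
g.record("stats:daily_report", ["model:recommender"])

print(sorted(g.lineage("rating:alice")))
# ['feature:avg_rating', 'model:recommender', 'stats:daily_report']
```

The breadth-first traversal visits each derived node once, even when several computation paths lead to it.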
Because forgetting is so important for increasing security and protecting privacy, Cao and Yang believe that forgetting systems that are easy to adopt will be increasingly in demand. The pair has developed a way to make systems forget faster and more effectively than existing methods.
Their concept, called “machine unlearning,” is so promising that the duo has been awarded a four-year, $1.2 million National Science Foundation grant, split between Lehigh and Columbia, to develop the approach.
“Effective forgetting systems must be able to let users specify the data to forget with different levels of granularity,” said Yinzhi Cao, Assistant Professor of Computer Science and Engineering at Lehigh University’s P.C. Rossin College of Engineering & Applied Science and a Principal Investigator on the project. “These systems must remove the data and undo its effects so that all future operations run as if the data never existed.”
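The requirement that a system behave “as if the data never existed” can be made concrete with a toy model. The sketch below is a hypothetical illustration, not Cao and Yang's implementation: it assumes a model whose entire state is a few running sums (here, a least-squares line), so forgetting a point means subtracting its contribution, and it checks that the result matches a model retrained from scratch without that data. `SummationModel` and the sample data are invented for the example.

```python
from fractions import Fraction


class SummationModel:
    """Least-squares line y = a + b*x whose whole state is a few running sums.

    Every training point only adds to these sums, so forgetting a point is
    just subtracting its contribution again. (Hypothetical example class.)
    """

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0

    def _update(self, x, y, sign):
        # sign is +1 to learn a point and -1 to forget it.
        self.n += sign
        self.sx += sign * x
        self.sy += sign * y
        self.sxx += sign * x * x
        self.sxy += sign * x * y

    def learn(self, points):
        for x, y in points:
            self._update(x, y, +1)

    def forget(self, points):
        """Remove one or many points and undo their effect on the model."""
        for x, y in points:
            self._update(x, y, -1)

    def params(self):
        """Return (intercept, slope) as exact fractions."""
        denom = self.n * self.sxx - self.sx ** 2
        slope = Fraction(self.n * self.sxy - self.sx * self.sy, denom)
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


data = [(1, 2), (2, 4), (3, 7), (4, 8)]
user_data = [(10, 0)]  # the records a user asks the system to forget

model = SummationModel()
model.learn(data + user_data)
model.forget(user_data)

retrained = SummationModel()
retrained.learn(data)

# After forgetting, the model is identical to one that never saw user_data.
assert model.params() == retrained.params()
print(model.params())  # (Fraction(0, 1), Fraction(21, 10))
```

Integer data keeps the sums exact, so the forgotten model matches the retrained one exactly; with floating-point data the two would agree only up to rounding error. Passing a single record or all of a user's records to `forget` mirrors the idea of specifying the data to forget at different levels of granularity.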
There are a number of reasons why an individual user or service provider might want a system to forget data and its complete lineage. Privacy is one.
Naturally, users unhappy with such privacy risks want their data, and its influence on the models and statistics derived from it, to be completely forgotten.
Security is another reason.