Protecting web browsing data from hackers
Researchers find the root cause of side-channel attacks that are easy to implement but hard to detect
Posted: Tuesday, July 5, 2022 – 12:02 PM
Malicious agents can use machine learning to launch powerful attacks that steal information in ways that are hard to prevent and often even harder to investigate.
Attackers can capture data that “leaks” between software programs running on the same computer. They then use machine learning algorithms to decode these signals, allowing them to obtain passwords or other private information. These are called “side-channel attacks” because the information is acquired through a channel not intended for communication.
MIT researchers have shown that machine learning-assisted side-channel attacks are both extremely robust and poorly understood. The use of machine learning algorithms, which are often impossible to fully understand due to their complexity, is a particular challenge. In a new paper, the team investigated a documented attack that was thought to work by capturing signals leaked when a computer accesses memory. They found that the mechanisms behind this attack had been misidentified, which would prevent researchers from designing effective defenses.
To study the attack, they removed all memory access and noticed that the attack became even more powerful. Next, they looked for sources of information leaks and discovered that the attack was actually monitoring events that interrupt other processes on a computer. They show that an adversary can use this machine learning-assisted attack to exploit a security hole and determine the website a user is browsing with near-perfect accuracy.
With this knowledge in hand, the team developed two strategies that can thwart this attack.
“The focus of this work is really on the analysis to find the root cause of the problem,” says senior author Mengjia Yan, Homer A. Burnell Career Development Assistant Professor of Electrical Engineering and Computer Science (EECS) and member of the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT. “As researchers, we should really try to dig deeper and do more analysis, rather than blindly using black-box-type machine learning tactics to demonstrate one attack after another. The lesson we learned is that these machine learning-assisted attacks can be extremely deceptive.”
The lead author of the paper is Jack Cook, a recent computer science graduate. Co-authors include CSAIL graduate student Jules Drean and Jonathan Behrens. The research will be presented at the International Symposium on Computer Architecture.
A side-channel surprise
Cook started the project while taking Yan’s advanced seminar course. For a class assignment, he tried to replicate a machine learning-assisted side-channel attack from the literature. Previous work concluded that this attack counts the number of times the computer accesses memory when loading a website and then uses machine learning to identify the website. This is called a website fingerprinting attack.
Cook showed that the previous work relied on a flawed machine learning-based analysis and, as a result, misidentified the source of the attack. Machine learning cannot prove causation in these types of attacks, Cook says.
“All I did was remove the memory access, and the attack still worked just as well, if not better,” Cook says. “So, I wondered, what actually opens the side channel?”
This led to a research project in which Cook and his collaborators embarked on a careful analysis of the attack. They designed an almost identical attack, but without memory access, and studied it in detail.
They discovered that the attack actually records a computer’s timer values at fixed intervals and uses this information to infer which website is being accessed. Essentially, the attack measures how busy the computer is over time.
A fluctuation in the timer value means that the computer is processing a different amount of information in that interval. This is due to system interrupts. A system interrupt occurs when computer processes are interrupted by requests from hardware devices; the computer must pause whatever it is doing to process the new request.
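The measurement described above can be illustrated with a minimal Python sketch. It is not the researchers' code: the function name `collect_trace`, its parameters, and the choice to count loop iterations per interval are assumptions made for illustration. The idea is that when an interrupt pauses the measuring code, fewer iterations fit into that interval, so the count dips.

```python
import time
from typing import Callable, List


def collect_trace(num_intervals: int,
                  interval_s: float,
                  clock: Callable[[], float] = time.perf_counter) -> List[int]:
    """Return one busy-loop iteration count per fixed-length interval.

    Hypothetical helper: a low count suggests the loop was paused during
    that interval (for example by interrupt handling); a high count
    suggests it ran mostly undisturbed.
    """
    trace = []
    for _ in range(num_intervals):
        start = clock()
        count = 0
        # Spin until the timer says the interval is over.
        while clock() - start < interval_s:
            count += 1
        trace.append(count)
    return trace


if __name__ == "__main__":
    # Ten intervals of 5 ms each; the absolute numbers depend on the machine.
    print(collect_trace(num_intervals=10, interval_s=0.005))
```

The `clock` parameter is injectable only so the loop can be exercised with a deterministic timer; an attacker's code would simply read whatever timer the platform exposes.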
When a website loads, it sends instructions to a web browser to run scripts, display graphics, load videos, and more. Each of these elements can trigger many system interrupts.
An attacker monitoring the timer can use machine learning to infer high-level information from these system interrupts to determine which website a user is visiting. This is possible because the interrupt activity generated by a website, like CNN.com, is very similar each time it loads, but very different from other websites, like Wikipedia.com, Cook explains.
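To make the fingerprinting idea concrete, here is a toy Python sketch. It is an illustration under stated assumptions, not the method from the paper: the traces are synthetic, the two "site" patterns are invented, and a simple nearest-centroid rule stands in for whatever machine learning model an attacker would actually train.

```python
import random
from typing import Dict, List, Optional

# Invented per-interval activity patterns for two hypothetical sites.
SITE_PATTERNS: Dict[str, List[float]] = {
    "site_a": [100, 40, 40, 100, 100, 60],
    "site_b": [100, 100, 50, 50, 90, 100],
}


def synthetic_trace(base: List[float], rng: random.Random,
                    noise: float = 3.0) -> List[float]:
    """One noisy 'page load': the site's pattern plus Gaussian noise."""
    return [value + rng.gauss(0.0, noise) for value in base]


def centroid(traces: List[List[float]]) -> List[float]:
    """Element-wise mean of several equal-length traces."""
    return [sum(column) / len(traces) for column in zip(*traces)]


def squared_distance(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(labeled: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """Summarize each site's training traces by their centroid."""
    return {site: centroid(traces) for site, traces in labeled.items()}


def predict(model: Dict[str, List[float]], trace: List[float]) -> str:
    """Return the site whose centroid is closest to the observed trace."""
    return min(model, key=lambda site: squared_distance(model[site], trace))


def demo(rng: Optional[random.Random] = None) -> str:
    rng = rng or random.Random(0)
    training = {site: [synthetic_trace(base, rng) for _ in range(20)]
                for site, base in SITE_PATTERNS.items()}
    model = train(training)
    # Classify a fresh, unseen load of site_a.
    return predict(model, synthetic_trace(SITE_PATTERNS["site_a"], rng))


if __name__ == "__main__":
    print(demo())
```

The sketch works for the reason Cook gives: loads of the same site produce traces that cluster together, while different sites sit far apart.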
The attack is extremely successful. For example, when a computer was running Chrome on macOS, the attack identified websites with 94% accuracy. On every commercial browser and operating system the researchers tested, the attack achieved more than 91% accuracy.
There are many factors that can affect a computer’s timer, so figuring out what led to an attack with such precision was like finding a needle in a haystack, Cook says. The team conducted many controlled experiments, removing one variable at a time, until they realized the signal must be coming from system interrupts, which often cannot be processed separately from the attacker’s code.
Once the researchers understood the attack, they devised security strategies to prevent it.
First, they created a browser extension that generates frequent interrupts, for example by pinging random websites to create bursts of activity. The added noise makes it much more difficult for the attacker to decode the signals. This dropped the attack’s accuracy from 96% to 62%, but it slowed down computer performance.
For their second countermeasure, they modified the timer to return values that are close to, but not exactly, the actual time. This makes it much harder for an attacker to measure computer activity over an interval, Cook says. This mitigation reduced the accuracy of the attack from 96% to just 1%.
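A rough Python sketch of the timer idea follows. The article does not say how the randomness was added, so everything here is an assumption for illustration: the class name `FuzzyTimer`, the uniform jitter, and the clamp that keeps readings from going backward.

```python
import random
import time
from typing import Callable, Optional


class FuzzyTimer:
    """Hypothetical timer wrapper that returns slightly perturbed readings."""

    def __init__(self,
                 max_jitter_s: float = 0.001,
                 clock: Callable[[], float] = time.perf_counter,
                 rng: Optional[random.Random] = None) -> None:
        self._max_jitter_s = max_jitter_s
        self._clock = clock
        self._rng = rng or random.Random()
        self._last = float("-inf")

    def now(self) -> float:
        # Perturb the true reading by a bounded random offset.
        jitter = self._rng.uniform(-self._max_jitter_s, self._max_jitter_s)
        noisy = self._clock() + jitter
        # Never report a time earlier than one already handed out.
        self._last = max(self._last, noisy)
        return self._last


if __name__ == "__main__":
    timer = FuzzyTimer(max_jitter_s=0.001)
    print(timer.now(), timer.now())
```

Each reading stays within `max_jitter_s` of a true clock value, which is enough for ordinary page logic but blurs the fine-grained interval measurements the attack depends on.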
“I was surprised how such a small mitigation like adding randomness to the timer could be so effective,” he says. “This mitigation strategy could really be implemented today. It does not affect how you use most websites.”
Building on this work, the researchers plan to develop a systematic analysis framework for machine learning-assisted side-channel attacks. This could help researchers find the root cause of more attacks, Yan says. They also want to see how they can use machine learning to uncover other types of vulnerabilities.
“This paper introduces a new interrupt-based side-channel attack and demonstrates that it can be used effectively for website fingerprinting attacks, whereas previously such attacks were believed to be possible due to cache side channels,” says Yanjing Li, an assistant professor in the University of Chicago’s Department of Computer Science, who was not involved in this research. “I liked this paper immediately after reading it for the first time, not only because the new attack is interesting and successfully challenges existing notions, but also because it highlights a key limitation of ML-assisted side-channel attacks: Blindly relying on machine learning models without careful analysis cannot provide any understanding of the real causes or sources of an attack, and can even be misleading. It’s very insightful, and I believe it will inspire much future work in this direction.”
This research was funded, in part, by the National Science Foundation, the Air Force Office of Scientific Research, and the MIT-IBM Watson AI Lab.
First published June 9, 2022 at MIT News.