William & Mary

Using a CAREER award to advance the Insight Computer Architecture Lab

  • Towards a better GPU: Adwait Jog leads the Insight Computer Architecture Lab in William & Mary's Department of Computer Science, supported by a recent CAREER award from the National Science Foundation. Photo by Stephen Salpukas
  • The Insight Computer Architecture Lab: Adwait Jog (second from left) is an assistant professor of computer science leading a team that includes grad students Gurunath Kadam, Mohamed Ibrahim, Hongyuan Liu and (not pictured) Haonan Wang. Photo by Stephen Salpukas

GPUs — graphics processing units — aren’t just for computer graphics anymore.

Actually, the GPU outgrew its role as little brother to the central processing unit (CPU) some time ago. Adwait Jog says that computer architects have been finding non-graphic uses for GPUs for a decade or so.

Jog is an assistant professor in the William & Mary Department of Computer Science. He leads the Insight Computer Architecture Lab, dedicated to advancing the performance of GPUs. He recently received a five-year CAREER award from the National Science Foundation to continue research and work with students on the next generation of GPU architecture.

The NSF awards CAREER funding to “early-career faculty who have the potential to serve as academic role models in research and education and to lead advances in the mission of their department or organization.”

The Insight Computer Architecture Lab deals with the often knotty trade-offs that arise when striving to design computer hardware that will deliver optimum performance. You sometimes increase one quality at the expense of another. For example, latency refers to how long it takes to move data from point A to point B. The lower the latency, the faster the processing.

Then there is throughput — the amount of data moving from point A to point B. Low latency/high throughput is the goal. Basic questions of latency and throughput are addressed by the strategic assignment of computational tasks to the kind of processors that do them best.

CPUs, Jog explained, are good for operations where speed — low latency — is key. On the other hand, GPUs are designed for high throughput. Originally engineered to handle a firehose of commands that resulted in continual refreshing of what a user sees on a monitor, GPUs were put into service for computational tasks that had little or nothing to do with graphics.

Such as?

“Bitcoin mining. That’s a very good example of what people are using GPUs for today,” Jog said.

Bitcoin mining is an open computational competition to verify bitcoin transactions. To complete the verification process, the miner must solve a complex mathematical problem, known as proof-of-work. Proof-of-work calculations require a lot of computing power and time is very much of the essence, as the first miner to submit the proof-of-work solution is rewarded with bitcoins.
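A toy version of that puzzle (a sketch of the idea only, not bitcoin's actual protocol or parameters) fits in a few lines: keep trying nonces until the hash of the block data plus the nonce meets a difficulty target.

```python
import hashlib

def proof_of_work(block_data: str, difficulty: int) -> int:
    """Find a nonce such that SHA-256(block_data + nonce) starts with
    `difficulty` zero hex digits -- a toy stand-in for bitcoin's puzzle."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

# A low difficulty so the search finishes quickly; real mining uses a far
# harder target, which is why miners throw high-throughput GPUs at it.
nonce = proof_of_work("example-block", 4)
```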

To solve problems more quickly, miners invest in more and more powerful, high-throughput GPUs, driving up the price of graphic processing units — to the annoyance of video gaming enthusiasts and other GPU consumers.

Jog says that the interest in advanced GPUs goes far beyond bitcoin miners and gamers.

“If you want to do anything artificial intelligence or object detection, you will be interested in GPUs,” he said. “GPUs are used for object detection because they're very fast.”

He added that self-driving cars are a good example of the emerging use of GPUs for object detection. “Tesla has a good collaboration with Nvidia,” he said, referencing the U.S. GPU-design firm that has donated equipment to his lab.

Jog and his lab are taking on the challenges involved in designing GPUs for the new generation of high-performance computing machines capable of computing at exascale. Computing performance is expressed in units called flops — floating point operations per second.

The goal of exascale GPUs is 10¹⁸ flops. Jog pointed out that current GPUs top out in the neighborhood of 10¹² — teraflop range. The road to exascale is filled with trade-offs, but a few things are clear. For one thing, there are practical reasons why a single, large GPU isn’t the answer.
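The gap those prefixes describe is easy to quantify:

```python
EXAFLOP = 10**18   # exascale target: 10^18 floating point ops per second
TERAFLOP = 10**12  # roughly where a single current GPU tops out

# Closing the gap means a millionfold increase in delivered flops.
print(EXAFLOP // TERAFLOP)  # → 1000000
```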

“The size of a GPU is becoming more and more an immediate concern,” Jog explained, “because the chip size is already huge. If you make it bigger and it breaks or there’s some sort of failure, you have to throw it away, and it’s a big problem.”

Jog is working on an idea that goes in the opposite direction. Instead of designing big, vulnerable chips, he is pursuing what the industry has termed chiplet technology.

“You build tiny chips, then put them together,” he said. “Multiple chiplets work together.”

Many chiplets can be assembled into a single GPU. The chiplet approach represents yet another way to improve GPUs by distributing computational eggs over a large number of processing baskets. The harnessing of many processing units to split up a computational task is known as parallelism.
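The eggs-and-baskets idea can be sketched in ordinary CPU code (a hypothetical illustration; Python threads don't deliver true hardware parallelism the way GPU cores do, but the divide-and-combine structure is the same):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum_of_squares(chunk):
    # Each worker handles one "basket" of the data independently.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=4):
    """Split the input into chunks, process each chunk concurrently,
    then combine the partial results."""
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum_of_squares, chunks))

print(parallel_sum_of_squares(range(1000)))  # → 332833500
```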

Jog said the Insight Computer Architecture Lab has external funding of more than $1 million, including $450,000 from the NSF CAREER award allocated over five years and some important donations of equipment from Nvidia. The CAREER award will help to support the graduate students working with Jog in the Insight Computer Architecture Lab: Mohamed Ibrahim, Gurunath Kadam, Hongyuan Liu and Haonan Wang.

Jog explained that the lab is approaching the demands of GPU architecture from four perspectives.

“One is pure performance,” he said. “With chiplet technology, there is a concern about data movement. You need to move the data around to do computation on it. Data movement is expensive from an energy perspective, from a cost perspective and from a performance perspective.”

Jog said his lab is developing techniques to minimize data movement, basically by trying to localize computation packets onto a single chiplet. He added that data-management concerns can be addressed through software as well as hardware.
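One way to picture that kind of locality-aware placement (purely illustrative; the article does not describe the lab's actual techniques in this detail): give each data block a fixed home chiplet, then schedule each task on the chiplet that already holds its data, so nothing crosses chiplet boundaries.

```python
def home_chiplet(block_id: int, num_chiplets: int) -> int:
    """Place each data block on a fixed chiplet so the computation that
    uses it can run there, avoiding cross-chiplet data movement."""
    return block_id % num_chiplets

def schedule(tasks, num_chiplets):
    """Affinity-scheduling sketch: assign each (task_id, block_id) pair
    to the chiplet that holds its data block."""
    plan = {c: [] for c in range(num_chiplets)}
    for task_id, block_id in tasks:
        plan[home_chiplet(block_id, num_chiplets)].append(task_id)
    return plan

# Blocks 0 and 4 both live on chiplet 0, so tasks 0 and 1 co-locate there.
plan = schedule([(0, 0), (1, 4), (2, 1), (3, 5)], num_chiplets=4)
print(plan)  # → {0: [0, 1], 1: [2, 3], 2: [], 3: []}
```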

“We’re focusing on the hardware, because that’s my specialty. I’m looking at different aspects of data movement reduction. Approximation is one of them,” he said.

Approximation, or approximate computing, is another trade-off. It is essentially a strategic sacrifice of some degree of precision to gain increased performance.
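A simple illustration of that sacrifice (a hypothetical sketch, not the lab's method): estimate a statistic from a random sample instead of the full dataset, touching a tenth of the data in exchange for an answer that is close but not exact.

```python
import random

def exact_mean(data):
    return sum(data) / len(data)

def approx_mean(data, sample_frac=0.1, seed=0):
    """Estimate the mean from a random sample: far less data touched,
    in exchange for a small, controlled loss of precision."""
    random.seed(seed)
    k = max(1, int(len(data) * sample_frac))
    sample = random.sample(data, k)
    return sum(sample) / k

data = list(range(100_000))
error = abs(approx_mean(data) - exact_mean(data)) / exact_mean(data)
print(f"relative error: {error:.2%}")  # small, for ~10x less work
```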

The second perspective is how to optimize the resources of GPUs in challenging, but common, conditions such as those involving multiple users and multiple applications — often in cloud-based situations.

“From the hardware perspective, if two or three people are playing games, I want to put them on the same hardware, if possible. So, if the GPUs are larger, more people can be co-located on the same hardware,” he said.

Jog acknowledged that it’s becoming increasingly difficult to satisfy — and predict — the computing wants and needs of an increasingly heterogeneous computing public. The latency/throughput equation is prominent in optimization discussions.

“It’s very difficult from the hardware perspective to make sure that all users are happy,” he said. “That is very challenging, because every user has different requirements. Maybe someone wants low latency; maybe someone else wants high throughput.”

The third concern for the Insight Computer Architecture Lab is security.

“That’s always a concern for anything I do. We want to make sure that the data that goes through our GPUs is safe,” Jog said, adding that bitcoin is a good example. “Bitcoin has secret keys inside it. I want to make sure that nobody steals them.”

The fourth, but not least, concern is the reliability of the GPUs. That part of the project is in collaboration with Evgenia Smirni, the S.P. Chockley Professor in the computer science department.

“We want to make sure that GPUs don’t fail that often,” Jog said.