EXTREEMS mathematics initiative takes dead aim at 'big data'

William & Mary’s mathematicians are taking data analysis to a new extreme — and they’re looking for students to join them.

The Department of Mathematics was awarded an $880,000 grant from the National Science Foundation (NSF) for a new program called EXTREEMS-QED. This five-year grant will support undergraduate research in computational and statistical theory and techniques in the study of large data sets.

Professor of Mathematics Junping Shi is the principal investigator of the program. His co-PIs are Assistant Professor of Mathematics Tanujit Dey, Ferguson Professor of Mathematics Chi-Kwong Li and Associate Professor of Mathematics Gexin Yu. A total of 18 faculty members have a hand in this project, bringing big-data expertise from William & Mary’s departments of applied science, physics and biology, as well as the Virginia Institute of Marine Science (VIMS).

The full title of the NSF-funded initiative is “EXTREEMS-QED: Computational and Statistical theory and techniques in the study of large data sets.” Shi says the program’s interdisciplinary approach will allow undergraduate mathematics majors to gain experience in manipulating data across other disciplines. Working in an eight-week session each summer for the next five years, students and faculty will form teams to address a number of “big data” research problems.

Shi explained that data-intensive and data-centric science is taking center stage in the careers of the next generation of American scientists and engineers. Offering “big data” training to undergraduate mathematics majors will prepare them for careers in the sciences and give them the experience to lead their fields as technology develops. The sheer volume of data collected today, and the need for many research and commercial endeavors to handle it, make big-data analysis a skill highly sought after by employers, he added.

“The amount of digital data is increasing dramatically in virtually every area of science, engineering, technology and daily life,” said Shi. “New innovative training in mathematical and statistical tools for analyzing large data sets is imminently needed.”

For example, Shi cites a Harvard Business Review article that says more data cross the internet every second than were stored in the entire internet 20 years ago. Single data sets, he explained, are measured in terabytes, petabytes and even exabytes. A petabyte is a million gigabytes, enough to store the text from 20 million filing cabinets.

According to Shi, the EXTREEMS-QED program will allow undergraduate mathematics majors to expand and enhance their knowledge in computational and data-enabled science and engineering (CDS&E). The program will bring “big data” into William & Mary mathematics courses such as linear algebra, statistics, data analysis and probability by introducing data-intensive teaching modules to these classes.

Additionally, new courses on topics including matrix and graph theory, bioinformatics and complex networks will be added to the department’s offerings. EXTREEMS-QED will also include an eight-week summer research session that will bring undergraduate students and faculty together to work on theoretical and applied science projects related to “big data.” Shi said the program will also bring one or two teams of faculty and undergraduate students from Virginia State University, Hampton University or Norfolk State University to the annual summer research session, where they will work with William & Mary faculty and students on research projects.

One project that the math faculty are particularly excited about is the search for predictive signals in electronic data from neonatal intensive care units (NICUs). Shi explained that the program recently acquired large data sets, including NICU monitor records for more than 3,000 infants. The records were used by a team led by Professor of Physics John Delos in a “big data” initiative to improve the performance of NICU monitors. Through this program, Shi said, undergraduates and faculty alike will be able to build mathematical models of these unique data sets to predict aspects of the electronic signals and help save the lives of premature infants.

Shi hopes the program will allow undergraduate mathematics majors to gain experience in computational and data-enabled science and engineering to prepare them for future graduate study and careers in math and science.

Mathematics students interested in studying big data can find more information at this web site.