Analysts and researchers ask questions that require a better handle on the “wheres” in the world.
Who is allowed to vote in an election? Where are disease rates increasing the fastest? What school districts are improving? Falling behind?
If you want to answer any of these questions, you need to have a precise understanding of subnational borders — those boundaries that separate states, provinces and counties.
And gaining that understanding hasn’t always been easy, or even possible. Until now.
Dan Runfola says no accurate, public, open-source database of the world’s states, provinces and counties exists. It’s a condition that persists even in prosperous countries.
“If you want to know exactly where the governorates in Saudi Arabia are, it can take months or years to figure out, as we just realized,” Runfola said. “That is, unless you have the money to buy the data from a for-profit company.”
Runfola is an assistant professor in William & Mary’s Department of Applied Science. He is the faculty mentor of a group that just published a paper in the journal PLOS ONE announcing the geoBoundaries Global Administrative Database. It’s online, it’s free and the geoBoundaries team says it’s open – that is, every license for use is selected to be as permissive as possible.
But the geoBoundaries Global Administrative Database is not for everyone. The geoBoundaries database is not what you pull up on your phone at your cousin Brigid’s wedding to settle the argument between Uncle Seán and your mother about whether the ancestral home was in County Kildare or just over the border in Meath.
“This is predominantly aimed at analysts and researchers,” Runfola explained. “This is not a product that is intended for someone with no training that wants to see a map (though you can look at maps on the website!). We release what are called GeoJSONs and shapefiles, which are the industry standards for delineating geographic spaces.”
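For readers unfamiliar with the format, the sketch below shows roughly what an analyst does with a GeoJSON file: parse it and walk through the administrative units it contains. The sample data is hand-written for illustration; the property name `shapeName`, the district names and the coordinates are assumptions, not taken from the geoBoundaries files.

```python
import json

# Hypothetical, hand-written GeoJSON for illustration only. The property
# name "shapeName" and all coordinates are assumptions, not real data.
SAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"shapeName": "Example District A"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0], [0.0, 0.0]]]
      }
    },
    {
      "type": "Feature",
      "properties": {"shapeName": "Example District B"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[2.0, 0.0], [4.0, 0.0], [4.0, 2.0], [2.0, 2.0], [2.0, 0.0]]]
      }
    }
  ]
}
"""


def list_units(geojson_text):
    """Return (name, number of outer-ring vertices) for each Polygon feature."""
    collection = json.loads(geojson_text)
    units = []
    for feature in collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] != "Polygon":
            continue  # this sketch ignores MultiPolygon and other geometry types
        outer_ring = geometry["coordinates"][0]  # first ring is the outer boundary
        units.append((feature["properties"]["shapeName"], len(outer_ring)))
    return units


for name, vertex_count in list_units(SAMPLE_GEOJSON):
    print(name, vertex_count)
```

Real boundary files are far larger and often use MultiPolygon geometries, so in practice analysts would likely reach for a dedicated geospatial library rather than the standard `json` module used here.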
The database is the product of three years of work by a group that consisted substantially of William & Mary undergraduates and recent alumni. Co-authors on the PLOS ONE paper undertook a wide range of roles in the creation of the geoBoundaries product. Joshua Panganiban ’20, Lauren Hobbs ’19 and Leigh Seitz ’17 all served as team leads, providing mentorship and leadership to team members.
The students on the team spent thousands of hours drawing the lines that define geographic boundaries, as well as reaching out to governments all across the world to get permission to use existing products. Student contributors included Austin Anderson ’21, Heather Baier ’20, Matt Crittenden ’21, Elizabeth Dowker ’20, Seth Goodman ’21, Grace Grimsley ’19, Lauren Hobbs ’19, Rachel Layko ’19, Graham Melville ’19, Maddy Mulder ’21, Rachel Oberman ’19, Andrew Peck ’21, Hannah Slevin ’21 and Rebecca Youngerman ’19.
Co-authors Sylvia Shea ’21 and Sydney Fuhrig ’21 will be taking on leadership roles next academic year to continue to update the geoBoundaries product.
The geoBoundaries team is one of four groups that make up the geoLab, a group of student-driven data-science initiatives mentored by Runfola. The other teams are geoData, geoParsing and geoDev.
Runfola said that governmental agencies – ranging from the Department of State to the Intelligence Community – have expressed interest in the geoBoundaries project, alongside academic researchers and NGOs.
“We’re in discussions right now with a large NGO that is dedicated to environmental sustainability,” Runfola said. “And they’re asking questions like: What are the rates of deforestation in different states around the world? And in order to answer that question, you have to know the boundaries of states.”
Increasingly, people and agencies with such questions are getting their answers from the database created and maintained by the geoBoundaries group. Runfola adds that the geoBoundaries database is especially important because more and more resource-allocation decisions must be made at the subnational level.
“Let me just give you a real example,” Runfola said. “Let's say that you want to allocate funding for schools. You know there are school districts, but don't have the information on where the districts are. All the information in the world about how well individual districts are doing doesn’t matter unless you know where those districts are. Without that district data, you don't know what schools need help. You have to have more granular, district-level information to do the most meaningful things.”
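The school example comes down to a simple spatial question: given a point, which boundary contains it? A minimal sketch of one common way to answer it, the ray-casting point-in-polygon test, is shown below. The district shapes, school names and coordinates are made up for illustration, and the article does not say which method analysts using geoBoundaries actually apply.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: is (x, y) inside the polygon ring?

    `ring` is a list of (x, y) vertices. Points that fall exactly on an
    edge are not handled specially in this sketch.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_to_district(x, y, districts):
    """Return the name of the first district containing (x, y), else None."""
    for name, ring in districts.items():
        if point_in_ring(x, y, ring):
            return name
    return None


# Hypothetical districts (two adjacent squares) and school locations.
DISTRICTS = {
    "District A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "District B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
SCHOOLS = {"North School": (1, 1), "East School": (3, 1), "Far School": (5, 5)}

assignments = {
    school: assign_to_district(x, y, DISTRICTS)
    for school, (x, y) in SCHOOLS.items()
}
print(assignments)
```

Without the boundary polygons, the lookup has nothing to test against, which is the gap Runfola describes.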
The need for more granular data is global, he said, adding that the challenges in collecting that data become more acute in countries that need the information most — countries that lack the infrastructure to collect and maintain their internal boundaries.
Joshua Panganiban, a three-year veteran of the geoBoundaries team, gave an example of how the need for accurate, fine-grained border information shows up in different ways around the world. Some countries, he said, have local government structures and traditions that require information of a finer grain than others.
“In the Philippines, for example, the national government may pass the policy and give out a budget, but really the people making the real-time decisions on a local level are the barangays, or villages,” he said.
Panganiban explained that barangays are the smallest political unit in the country, operating at levels below the state, the province and even the municipality.
“Policymakers or NGOs would like to identify which barangays may be the poorest. Which one needs more testing kits. Which one needs more education funding,” he said. “They don't know sometimes where exactly these barangays are. That’s something that we would be able to do with all these subnational boundaries — provide those researchers and policymakers with an understanding of where that very small unit is.”
Runfola explained that the benefit of the geoBoundaries database is not in defining the legal boundaries themselves: “The boundaries are what they are,” he said. “What we’ve done is put them all together so that any practitioner can get access to the information.”
But precision was very much on the minds of the geoBoundaries team, and Runfola said that compilers of subnational borders have to weigh file size against accuracy.
“Some organizations have decided to make their files relatively small, so that more people can use them. You can use them for web rendering or all sorts of other things,” he explained. “For our data, we have retained a tremendous amount of precision, but the cost of that is that it’s harder to use for purposes other than analysis.”
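The trade-off Runfola describes can be illustrated with a rough sketch: the same boundary stored with fewer vertices and fewer decimal places takes up far less space, at the cost of detail. The thinning approach below (keep every tenth vertex, round to three decimals) is a naive stand-in chosen for brevity; it is not the method used by the geoBoundaries team or by any other organization, and the circular "boundary" is synthetic.

```python
import json
import math


def make_ring(n_vertices):
    """Build a synthetic closed boundary: n points on a unit circle."""
    ring = [
        [math.cos(2 * math.pi * i / n_vertices),
         math.sin(2 * math.pi * i / n_vertices)]
        for i in range(n_vertices)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


def reduce_ring(ring, decimals, keep_every):
    """Naively shrink a closed ring: keep every k-th vertex, round coordinates."""
    body = ring[:-1]  # drop the closing vertex before thinning
    thinned = [
        [round(x, decimals), round(y, decimals)]
        for x, y in body[::keep_every]
    ]
    thinned.append(thinned[0])  # close the ring again
    return thinned


detailed = make_ring(1000)
reduced = reduce_ring(detailed, decimals=3, keep_every=10)

# Compare the size of each version when serialized as JSON text.
detailed_bytes = len(json.dumps(detailed))
reduced_bytes = len(json.dumps(reduced))
print(detailed_bytes, reduced_bytes)
```

The smaller version renders quickly in a web map, but any analysis that depends on exactly where the line falls would be working from a coarser approximation.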
But, he added, providing precise, open data for analysts was the goal of the geoBoundaries team all along. “With open data comes a wide range of opportunity – from promoting research replication to ensuring anyone can openly share information in a meaningful way.”