The Social Concerns of Geo-Located Rectangles

Recently, this story on a ‘mapping glitch’ was posted. MaxMind, a company that does IP-to-Geo mapping, returns a farm close to the exact center of the USA as the location for any failed geo-lookup. The article estimates that some 600 million IP addresses map to that coordinate, leaving that farm in Kansas to bear the brunt of the resulting targeting and harassment.

They’ve gotten visited by FBI agents, federal marshals, IRS collectors, ambulances searching for suicidal veterans, and police officers searching for runaway children. They’ve found people scrounging around in their barn. The renters have been doxxed, their names and addresses posted on the internet by vigilantes. Once, someone left a broken toilet in the driveway as a strange, indefinite threat.

What’s really interesting here is the social concern around geo-referenced locations. If you’ve seen almost any talk of mine in the past three years (or attended the geoCHI tutorial I ran with Brent Hecht), you have heard me say that computer scientists love rectangles. We know how to store them, and we know how to compute their center. Herein lies the problem, and it extends well beyond IP-to-Geo lookup. Even a simple example can illustrate the greater issue.
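The rectangle most geo-systems store is simple: a bounding box given as minimum and maximum latitude and longitude. Its "center" is just the midpoint of each axis. Here is a minimal Python sketch of that idea. The `BBox` class is a hypothetical helper, and the midpoint deliberately ignores boxes that cross the 180° meridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in decimal degrees (hypothetical helper)."""
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float

    def center(self) -> tuple[float, float]:
        # Midpoint of each axis; assumes the box does not cross the 180° meridian.
        return ((self.min_lat + self.max_lat) / 2,
                (self.min_lon + self.max_lon) / 2)


box = BBox(min_lat=0.0, min_lon=10.0, max_lat=2.0, max_lon=14.0)
print(box.center())  # (1.0, 12.0)
```

That one line of arithmetic is the whole appeal. It is cheap and deterministic, and it always yields a valid-looking coordinate, even when nothing was actually located there.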

In 2014, I was fortunate enough to spend a sabbatical at the Keio-NUS CUTE Center in Singapore. Not knowing the island, we (thanks Bart!) pulled about 1 million public geo-tagged photos from Flickr. From there, I made a simple ggplot2 plot in R to find out where people were taking the most photos.

A simple ggplot of 1 million public geotagged photos from Flickr.

Ok, that tells us nothing. A little command-line fu showed me the most photographed point.
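The post doesn't show the exact command-line incantation. The underlying idea is simply a frequency count over exact coordinate pairs. A rough Python equivalent, assuming the photos are available as (lat, lon) tuples (the sample values below are made up for illustration):

```python
from collections import Counter


def most_common_point(points, n=1):
    """Return the n most frequent exact (lat, lon) pairs with their counts."""
    return Counter(points).most_common(n)


# Made-up sample: three photos share one exact coordinate, two are elsewhere.
photos = [
    (1.365306, 103.827722),
    (1.365306, 103.827722),
    (1.365306, 103.827722),
    (1.283, 103.860),
    (1.300, 103.850),
]
print(most_common_point(photos))  # [((1.365306, 103.827722), 3)]
```

This counts exact matches rather than binning, and that choice is what exposes the problem. Genuine GPS fixes scatter, so thousands of photos sharing one identical coordinate is itself a warning sign.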

21,857 photos at 1°21'55.1"N, 103°49'39.8"E
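For readers who think in decimal degrees, that point converts with the usual formula: degrees + minutes/60 + seconds/3600. A small Python sketch (the function name is our own):

```python
def dms_to_decimal(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative by convention.
    return -value if hemisphere.upper() in ("S", "W") else value


lat = dms_to_decimal(1, 21, 55.1, "N")
lon = dms_to_decimal(103, 49, 39.8, "E")
print(round(lat, 5), round(lon, 5))  # 1.36531 103.82772
```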

So I made a more sophisticated kernel-density plot, which gives a better heat map over a Stamen map, with the popular region highlighted.
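The plot itself was made in R, but the technique is standard kernel density estimation. Each photo contributes a small Gaussian "bump," and the heat map is the sum of those bumps. Below is a minimal pure-Python sketch. It assumes a single, hand-picked bandwidth (real tools choose bandwidths more carefully), and the `densest_cell` helper is hypothetical.

```python
import math


def gaussian_kde_2d(points, x, y, bandwidth):
    """Isotropic 2D Gaussian kernel density estimate at (x, y).

    Treats lon/lat as planar coordinates, a rough approximation that is
    tolerable over a small area such as a single city.
    """
    n = len(points)
    norm = 1.0 / (n * 2 * math.pi * bandwidth ** 2)
    total = 0.0
    for px, py in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-d2 / (2 * bandwidth ** 2))
    return norm * total


def densest_cell(points, xs, ys, bandwidth):
    """Evaluate the KDE on a grid and return the (x, y) with the highest density."""
    return max(((x, y) for x in xs for y in ys),
               key=lambda p: gaussian_kde_2d(points, p[0], p[1], bandwidth))


# Made-up data: a pile of identical points plus a few scattered ones.
points = [(0.0, 0.0)] * 50 + [(3.0, 1.0), (4.0, 4.0), (1.0, 4.0)]
grid = [float(i) for i in range(6)]
print(densest_cell(points, grid, grid, bandwidth=1.0))  # (0.0, 0.0)
```

Note what this does with the Singapore data. Stacking 21,857 photos on a single coordinate produces one enormous bump. The estimator therefore faithfully highlights the artifact as the most "popular" place.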

A kernel-density plot of the 1 million photos with the highlighted popular area.

Ok, I’m not really sure what’s there. Not knowing the island, I turned to Google Street View to find out.

Google Street View of the most popular lon/lat point from the 1 million public Flickr photos.

Nada. Just a highway and the wall of a country club. If we zoom out, we find this is the center of the bounding box that is Singapore.
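Once you know the pattern, it is cheap to screen for it. Here is a hedged Python sketch. Given bounding boxes for the regions a dataset covers, it flags any point sitting on a box's center so that point can be excluded or down-weighted before analysis. The box below is a made-up placeholder, not Singapore's real extent, and the tolerance is also an assumption.

```python
def center_of(bbox):
    """Midpoint of a (min_lat, min_lon, max_lat, max_lon) box."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return ((min_lat + max_lat) / 2, (min_lon + max_lon) / 2)


def flag_centroid_artifacts(points, bboxes, tol=1e-4):
    """Split points into (suspect, clean).

    A point is suspect if it lies within `tol` degrees of the center
    of any known region's bounding box.
    """
    centers = [center_of(b) for b in bboxes]
    suspect, clean = [], []
    for lat, lon in points:
        if any(abs(lat - c_lat) <= tol and abs(lon - c_lon) <= tol
               for c_lat, c_lon in centers):
            suspect.append((lat, lon))
        else:
            clean.append((lat, lon))
    return suspect, clean


regions = [(1.0, 103.0, 2.0, 104.0)]  # placeholder box, not real data
pts = [(1.5, 103.5), (1.3, 103.8), (1.5, 103.5)]
suspect, clean = flag_centroid_artifacts(pts, regions)
print(len(suspect), len(clean))  # 2 1
```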

Singapore, its approximate bounding box, and its exact center.

These are public geo-tagged photos, not erroneous IP-to-Geo lookups. What looks highly accurate (21,857 photos at 1°21'55.1"N, 103°49'39.8"E) is completely incorrect. Three things are likely happening here, alone or in combination: (1) geo-fencing (where the accuracy is obscured for privacy reasons), (2) GPS accuracy failing (where maybe someone went indoors and the signal was lost), and (3) manual geo-tagging (where someone just picks Singapore as the ‘venue’ where the photo was taken). If a geo-tagged photo has an accuracy at the city, state, or country level (which in Singapore are all the same), modern geographic information systems will typically return a lon/lat at the center of the bounding box.
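The fallback described above takes only a few lines, which is exactly why it is so common. Below is a Python sketch of that behavior. The `Accuracy` levels, the threshold, and the function name are hypothetical stand-ins, not any particular system's API.

```python
from enum import IntEnum


class Accuracy(IntEnum):
    """Hypothetical precision levels, coarsest to finest."""
    COUNTRY = 1
    REGION = 2
    CITY = 3
    STREET = 4


def naive_point(reported_point, accuracy, region_bbox, min_accuracy=Accuracy.STREET):
    """Return a single (lat, lon), the way many systems do.

    If the fix is coarser than `min_accuracy`, fall back to the center of the
    enclosing region's bounding box, silently discarding how uncertain it was.
    """
    if accuracy >= min_accuracy:
        return reported_point
    min_lat, min_lon, max_lat, max_lon = region_bbox
    return ((min_lat + max_lat) / 2, (min_lon + max_lon) / 2)


bbox = (1.0, 103.0, 2.0, 104.0)  # placeholder region box, not real data
photos = [
    ((1.29, 103.85), Accuracy.COUNTRY),
    ((1.35, 103.99), Accuracy.CITY),
    ((1.44, 103.78), Accuracy.STREET),
]
for point, acc in photos:
    # COUNTRY and CITY both print (1.5, 103.5); STREET keeps its own point.
    print(acc.name, naive_point(point, acc, bbox))
```

Every coarse record lands on the same coordinate, wherever it was actually taken. That coordinate also carries no trace of how uncertain it is.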

We have a problem. Geo-systems want to return a lon and lat. In these cases where the location is indeterminate, what should happen? Should it return null or nil? Should it just return the bounding box? Should it return 0°N, 0°E? Should it return something impossible like 91°N, 181°E? Or should it return the center point with an accuracy level? In the end, all of these are product/platform decisions. And therein lies the problem: returning the center of a box is all too easy, but it carries a social cost in how that point gets interpreted, and that cost often falls outside engineering’s scope.
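One way to act on the last option is to make uncertainty part of the return type, so a bare point is only handed out when it means something. Below is a minimal Python sketch. `GeoResult`, `Accuracy`, and the STREET threshold are illustrative assumptions, not a design from any specific platform.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Accuracy(IntEnum):
    """Hypothetical precision levels, coarsest to finest."""
    COUNTRY = 1
    REGION = 2
    CITY = 3
    STREET = 4


@dataclass(frozen=True)
class GeoResult:
    """A location answer that keeps its uncertainty attached."""
    bbox: tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)
    accuracy: Accuracy

    def center(self) -> tuple[float, float]:
        min_lat, min_lon, max_lat, max_lon = self.bbox
        return ((min_lat + max_lat) / 2, (min_lon + max_lon) / 2)

    def point_if_precise(self, min_accuracy: Accuracy = Accuracy.STREET) -> Optional[tuple[float, float]]:
        """Return a point only when the result is precise enough.

        Otherwise return None, forcing the caller to handle the region explicitly.
        """
        return self.center() if self.accuracy >= min_accuracy else None


coarse = GeoResult((1.0, 103.0, 2.0, 104.0), Accuracy.COUNTRY)  # placeholder box
print(coarse.point_if_precise())  # None
print(coarse.center())            # (1.5, 103.5)
```

The center is still available, but only if the caller asks for it explicitly. Anyone who wants a pin on a map has to decide what to do with a country-sized answer instead of receiving one silently.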

Beyond wondering how much existing research may exhibit similar errors, many people would simply say that engineering optimizations often ignore their social consequences. In other cases, optimizations are called trade-offs needed to ship a product. It’s easy to criticize developers for being ignorant of these issues. From what I have seen, though, these consequences are often noted (and even discussed) by developers, designers, and managers, but they are not prioritized, or they are marked as an ‘edge case.’ These so-called glitches, as in the original news story, have led to misinterpretation by everyone from individuals to government enforcement.

And this is only the beginning of the geo-systems problem. Take the versioning of location names, for example: it rarely happens. It doesn’t take a cartographer to tell you that places change their names over time, but many of today’s geocoding systems can’t tell you that a neighborhood wasn’t called La Lengua back in the 1990s, or that those islands used to belong to another country. However unintended, these problems have become rather institutionalized by the systems that digitally feed us.

In the meantime, it’s best that we, as researchers and/or engineers, begin to re-prioritize and understand the data we are handed, rather than blindly using it in volume. Of course, this will take time, and it won’t help that farm and family in Kansas right now… sadly.
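Versioning place names is not technically hard. It is mostly a matter of storing validity intervals alongside the names. Below is a Python sketch of a time-aware gazetteer entry. The class, the placeholder names, and the years are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlaceName:
    """One name for a place, valid over [start_year, end_year]; None means open-ended."""
    name: str
    start_year: Optional[int]
    end_year: Optional[int]


def name_in_year(history, year):
    """Return the names that were valid for the place in the given year."""
    return [
        n.name for n in history
        if (n.start_year is None or n.start_year <= year)
        and (n.end_year is None or year <= n.end_year)
    ]


history = [
    PlaceName("Old Name", None, 1999),  # hypothetical
    PlaceName("New Name", 2000, None),  # hypothetical
]
print(name_in_year(history, 1995))  # ['Old Name']
print(name_in_year(history, 2015))  # ['New Name']
```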

scientist/research director: @toyotaresearch @FXPAL @cwi_dis @yahoo @flickr @sigchi. instructions: place in direct sunlight, water daily.