Mar 31, 2023
Suppose we have a city, such as Cologne in Germany, although everything described in this post will apply to any other city.
I'm using Florence, my experimental library for analyzing city places by writing simple, life-inspired functions. GeojsonCloud, in turn, enables access to interesting geospatial datasets from the F# language level.
You can download the full notebook here. You can open and run it with Visual Studio Code (Polyglot Notebooks extension is required).
To analyze the best place for ... anything, we must determine what place means.
The first notion that comes to mind is an address, representing a registered, official place in a city.
Let's look at addresses in Cologne:
After examining the pictures, two immediate observations come to mind.
~160K addresses is a pretty large amount of data, especially if they are going to be joined with other datasets.
Many nearby addresses will have the same value for the vast majority of analyses:
What is the red shape in the picture? It is an area with a geographically consistent value, typically bounded by physical structures like streets, rivers, or other barriers. Crossing them is a challenge and changes the context in some way, hence the potential value of the place.
It is called a block. If the block is still large or many people live there (skyscrapers), it is often better to use census tracts for analyses.
24k places sounds like a better number than 160k. Obviously, on a more central level, like budget participation (collaborative governance), we may want to analyze the city on a statistical or even district level:
Downsides of NOT using geospatial indexing systems
All we have seen so far is nice, and we can have fun with the data and think about what our analyses should do with these data.
But the world is not that perfect:
Having access to carefully crafted geospatial blocks like those presented is still a rare case for most cities.
Address data can be obtained from either open city data portals or general data sources like https://openaddresses.io/, but for a lot of cities it is still missing (London). It then has to be parsed from OpenStreetMap data, which works well in a lot of cases, but the data are often incomplete and not unified.
A particular point from one data source (or UI) will typically be slightly different from, or not reflected in, the address datasets. The analyses are then not possible without an additional nearby search, which significantly complicates them and impacts performance.
We often have to use centroids to analyze blocks or similar tracts. As blocks vary in shape, the analyses are compromised. We also have to apply non-trivial algorithms to select the specific block for a requested point.
All of it differs between cities; hence analyses cannot be easily shared among cities
H3 indexing to the rescue
H3 maps any point to a cell of a hierarchical hexagonal grid at a desired resolution. Converting points to places is fast and unified.
Geospatial indexing is a very interesting topic, but the details are out of the scope of this post, so please refer to the docs if you are curious.
The downside of geospatial indexing is that if we choose the wrong resolution, close points can be represented completely differently:
We can tackle this by choosing a resolution carefully. In return, we get a range of fast algorithms for nearest neighbors, shortest paths, etc.
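To make the boundary effect concrete, here is a minimal Python sketch (not the notebook's F# code and not H3 itself). It assumes a flat plane measured in meters and a plain pointy-top hexagonal grid with axial coordinates, whereas H3 projects the Earth onto an icosahedron. The names `point_to_hex`, `cube_round`, and `hex_distance` are made up for this illustration.

```python
import math


def cube_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon."""
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Recompute the coordinate with the largest rounding error so that
    # q + r + s == 0 still holds after rounding.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return (rq, rr)


def point_to_hex(x, y, size):
    """Map a planar point (meters) to the axial (q, r) cell of a
    pointy-top hexagonal grid whose hexagons have circumradius `size`."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return cube_round(q, r)


def hex_distance(a, b):
    """Number of hexagon steps between two axial cells."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


# Two points only 1 m apart, on either side of a cell edge (x ~ 86.6 m).
a = point_to_hex(86, 0, size=100)
b = point_to_hex(87, 0, size=100)
print(a, b, hex_distance(a, b))  # (0, 0) (1, 0) 1

# With a coarser grid the same two points share one cell.
print(point_to_hex(86, 0, size=1000) == point_to_hex(87, 0, size=1000))  # True
```

The two points get different cell identifiers even though they are practically the same place. A coarser grid only moves the edges elsewhere, so besides picking the resolution with care, it can help to treat cells at grid distance 1 as "the same neighborhood" when comparing points.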
Going back to the city of Cologne.
You can split it with H3 indexing as below.
I'm using H3.net behind the scenes, a port of Uber's H3 to .NET.
The resolution parameter says how deep in the hierarchy we should go, from the Earth level down to roughly a square meter.
What resolution should we use, then? Of course, it depends on what problem we are trying to address.
This is how many hexagons we get for resolutions from 4 to 13:
Resolutions from 7 to 10 seem sufficient for most analysis needs.
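As a sanity check on such numbers, the count can be roughly estimated without any H3 library. The sketch below is in Python rather than the notebook's F#. It relies on one fact about H3 (the grid has 2 + 120 * 7^res cells at resolution res) and on two approximations that are my assumptions, not values from the notebook: an Earth surface of about 510 million km² and a Cologne area of about 405 km². It ignores cell-size distortion and partially covered hexagons along the city border, so treat the output as an order of magnitude only.

```python
import math

EARTH_AREA_KM2 = 510_065_621.7  # approximate surface area of the Earth
COLOGNE_AREA_KM2 = 405.0        # approximate area of Cologne (assumption)


def h3_cell_count(res):
    """Total number of H3 cells on Earth at a given resolution (0..15)."""
    if not 0 <= res <= 15:
        raise ValueError("H3 resolutions range from 0 to 15")
    return 2 + 120 * 7 ** res


def avg_cell_area_km2(res):
    """Average cell area, taken as Earth area divided by the cell count."""
    return EARTH_AREA_KM2 / h3_cell_count(res)


def estimate_cells(area_km2, res):
    """Rough number of cells needed to cover an area at a resolution."""
    return math.ceil(area_km2 / avg_cell_area_km2(res))


for res in range(4, 14):
    print(res, estimate_cells(COLOGNE_AREA_KM2, res))
```

Each step up in resolution multiplies the count by about 7, which is why a small change in resolution has a large effect on both the level of detail and the amount of data to process.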
So let's do one!
Let's find the best place to live in Cologne.
All of you who live in Cologne (Koeln) can try my Mapbox editor, which enables you to specify your city life visually. You can autofill your places and receive a magical type that provides distances to your vital places through their names.
For those not living in Cologne (like me), we can easily randomize some places from any datasets we have used so far:
And write the analyzing function:
There are more ways to write it. Besides our life context, we could also include city facilities, but for the scope of this post, this is enough. Now let's evaluate it and see the results on a map:
Among the colored hexagons, you will notice the exact function result, place, quantile rank, index, and nice descriptive distances to all your vital places.
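For readers who do not run the notebook, the idea behind such a function and its quantile rank can be sketched in a few lines of Python. This is not Florence's API. The scoring rule (negative sum of great-circle distances to the vital places), the helper names, and all coordinates are assumptions chosen only for illustration.

```python
import math
from bisect import bisect_right

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def score(place, vital_places):
    """Higher is better: the closer to all vital places, the higher the score."""
    return -sum(haversine_km(place, p) for p in vital_places.values())


def quantile_ranks(scores):
    """Fraction of places whose score is less than or equal to each place's score."""
    ordered = sorted(scores.values())
    return {k: bisect_right(ordered, v) / len(ordered) for k, v in scores.items()}


# Hypothetical vital places and candidate cell centers around Cologne.
vital_places = {"work": (50.94, 6.96), "gym": (50.93, 6.95)}
candidates = {"A": (50.94, 6.96), "B": (50.99, 7.05), "C": (50.95, 6.93)}

scores = {name: score(center, vital_places) for name, center in candidates.items()}
ranks = quantile_ranks(scores)

# Top scorers first.
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 2), round(ranks[name], 2))
```

In the real analysis, the candidates would be the centers of the H3 hexagons covering the city, and the function could weigh the distances or include other datasets.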
We can get the top scorers:
But what if we need to check how a particular location performs? (After all, we don't normally talk in grids and hexagons.)
Suppose we have a job or housing portal with a collection of locations, and we want to check how our function ranks them:
The first step is to map these locations into indexes. We only have to ensure that we use the same resolution we used to split the city earlier:
The second and last step is to find these indexes in our results and display values:
We have already covered a lot of topics and will pause here.
Maybe except for one additional, small example :)
While describing the pros and cons of H3 indexing, I mentioned that we could apply various algorithms and optimizations.
Imagine we don't want to cover the entire city, just our neighborhood. Or we simply don't have any dataset. How do we generate adequate hexagons then?
We can create them by providing the resolution and the distance we want to cover. However, in this specific scenario, the distance is not expressed in meters but as the number of hexagons we want in each direction:
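The "distance as a number of hexagons" idea can be illustrated on a plain hexagonal grid with axial coordinates. The Python sketch below is not H3 and not the notebook's F# code, and `hex_disk` is a hypothetical helper name. It only shows which cells lie within k steps of a center and how quickly their number grows.

```python
def hex_disk(center, k):
    """All axial (q, r) cells within k hexagon steps of `center`, center included."""
    cq, cr = center
    cells = []
    for dq in range(-k, k + 1):
        # For a fixed dq, dr is bounded so that the cell stays within k steps.
        for dr in range(max(-k, -dq - k), min(k, -dq + k) + 1):
            cells.append((cq + dq, cr + dr))
    return cells


for k in range(4):
    # A disk of radius k contains 1 + 3 * k * (k + 1) cells.
    print(k, len(hex_disk((0, 0), k)))
```

The disk grows quadratically with k: 1, 7, 19, 37 cells, and so on. The area actually covered on the ground therefore depends on both k and the chosen resolution.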
We can rank these places in the same manner as previously.
That is all for now. If you like it, please buy me a coffee so I can reserve more time for similar endeavors.