The Problem is Stated

In 2015, while developing our first mobile application, I wanted to write an algorithm capable of handling a significant scale-up. Despite the lack of users on our mobile app, a developer must anticipate the worst technical scenarios (what a mistake!) to handle the best business scenarios. To recap, Ocurz allowed users to find geolocated activities around them easily. Existing databases did not adequately solve my problem, so I decided to create a custom solution.

Data Fragmentation

At that time, I understood that horizontal scaling required data fragmentation. Each machine would be responsible for only a portion of the events, which meant finding a method to index and group the information. Quickly, my research led me to geohash, a key concept of the protocol I adopted. A geohash represents a static area of the world map, encoding GPS coordinates on a variable base (16, 32, 64, etc.). The length of the geohash determines the precision of the area covered: the more characters, the smaller the area.

The Principle of Geohash

Encoding and decoding a geohash relies on the successive dichotomy of the world map, where each character translates into a series of bits (0, 1).

For example, on a base 16 (A: 0000, B: 0001, C: 0010, D: 0011, E: 0100, F: 0101, 0: 0110, 1: 0111, 2: 1000, 3: 1001, 4: 1010, 5: 1011, 6: 1100, 7: 1101, 8: 1110, 9: 1111), “F3” corresponds to the bits 0101 1001.

Starting Interval:  [-90, 90]           [-180, 180]

1st Dichotomy:      [-90, 0]            [-180, 180]
2nd Dichotomy:      [-90, 0]            [0, 180]
3rd Dichotomy:      [-90, -45]          [0, 180]
4th Dichotomy:      [-90, -45]          [90, 180]
5th Dichotomy:      [-67.5, -45]        [90, 180]
6th Dichotomy:      [-67.5, -45]        [90, 135]
7th Dichotomy:      [-67.5, -56.25]     [90, 135]
8th Dichotomy:      [-67.5, -56.25]     [112.5, 135]

image

Once these bits are decoded, they help determine the associated geospatial area. Each position is represented as a pair (latitude, longitude) within the interval ([-90, 90], [-180, 180]). For “F3,” the successive splitting algorithm applies as follows:

Initial Algorithm

With a method to encode and group positions, I tackled the question of loading the nearest positions from an origin point. My first algorithm worked rather naively:

  1. Encode the origin position ( p_0 ) into a geohash corresponding to the parent zone.
  2. Load the positions in that zone.
  3. Calculate and sort neighboring zones by distance to ( p_0 ).
  4. Load the content of neighboring zones incrementally.
  5. Repeat the calculation of the following zones.

Refining the Algorithm

A problem quickly arose: how to guarantee an exact order when loading the zones? In other words, how to ensure that there are no positions in an unloaded zone that are closer than a displayed position?

To solve this problem, zones must be sorted and loaded in a way that gradually expands a radius, ensuring a non-mutable order of already loaded positions. Therefore, we calculate the minimum distance of each zone with ( p_0 ). The displayed positions must be less than this minimum distance to the next zone to be loaded. This means that some positions, although already retrieved, must be kept in memory as long as their distance is not less than the distance to the next zone to be loaded.

Conclusion

It took me some time to clarify this rule, found by trial and error. Initially, I tried from the center of the geohash with more complex rules than the one finally exposed. Thanks to this first overcome difficulty, a first version of the algorithm was born, laying the foundation for the search algorithm. Obviously, many problems remained to be solved, such as how to ensure linear loading time and a smooth user experience? How to integrate the notion of an event (geolocated and dated)?

This is what I will explain in the next article.