learning System design as a landscape architect 11

Fri, Apr 1, 2022 5-minute read

Rethink Location Based Service system design in a much fun way, as a former urban planner/landscape planner. Take Uber as example

Design UBER
Scenario analysis
- Functional Requirement
- data
Service
Storage
- Schema
  - Trip Class
  - Location Class
save and query location info
- Search for vehicles within a 2km radius use geohash in database
Storage geohash
Scale

System Design = Logic Design(why) + Infrastructure Design(how)

Design UBER

Scenario analysis

Functional Requirement

First step:

Driver report location
Rider request Uber, match a driver with rider

Second step:

Rider cancel a request
Driver deny / accept a request
Driver cancel a request
Driver pick up a rider
Driver drop off a rider

data

2022 2 million drivers in Uber (google)

assume Peak Driver QPS = 300 k

Storage acculation

assume every location data: 600 k * 86400 / 4 * 100 bytes ~ 1.3 T (Driver report locations by every 4 seconds)
assume only store current location data: 600 k * 100 bytes = 60 M 1 M = 10^6 bites

Service

GeoServer (record locations)
DispatchServer

How Drivers receive Riders' request

Report location, at the same time, the Service return matched request

Storage

Dispatch Web Server –> Trip Table

Geo Web Server –> Location Table

Consider read/write frequency on a database table?

Schema

Trip Class

class Trip {
    public Integer tripId;
    public Integer driverId, riderId;
    public Double startLatitude, startLongitude;
    public Double endLatitude, endLongitude;
    public Integer status;
    public Datetime createAt;
}

Trip Table            type            comments
//
status                 int            New request / waiting for driver / on the way to pick up / in trip / cancelled / ended

Location Class

class Location {
    public Integer driverId;
    public Double Latitude, Longitude;
    public Datetime updateAt;
}

Location Table          type            comments
driver_id               fk              Primary key
lat                     float
lng                     float
updated_at              timestamp

Trip Class is a heavily read database, and Location class is the opposite

save and query location info

how to search nearby drivers

query the lat and lng from Location table, inorder to improve the query process, The lat and lng statement are used to create indexes in tables.

create composite index lat_lng_idx on location_table ?

actually, composite index can only be used in accurate lat value and certain range of lng.

[111, 300] ~ [111, 400]

The index of the database can only solve one dimension Range Query.

Range Query with multiple dimensions cannot be efficiently queried.

Search for vehicles within a 2km radius use geohash in database

according to the geohash length, eg: find all vehicles start with 9q9jv

Storage (geohash)

SQL

CREATE INDEX on geohash
SELECT * FROM location WHERE geohash LIKE 9q9hv%

NoSQL (Cassandra)

set geohash as column key
range query (9q9hv0, 9q9hvz)

NoSQL (Redis)

storing Driver Location data hierarchy in redis
- Driver locatoin is 9q9hvt, then store it in 9q9hvt, 9q9hv, 9q9h
- geohash of length 6 produces cells that cover less than a square 1 kilometer of land, which is enough for Uber
- geohash of length 4 produces cells that cover more than a square 20 kilometer of land
key = 9q9hvt, value = set of drivers in this location

NoSQL (Redis) as database

support list, set, remove in set is O(1).

Read and write speed close to memory access speed >100k QPS.

data flow between riders and drivers

Riders' request for drivers

User makes a taxi request to find drivers around a given location
- (lat,lng) → geohash → [driver1, driver2,…]
- Check the 6-digit geohash to find the one within 0.6 km
- If there is no reply, then check the 5th geohash to find the one within 2.4 km
- If there is no reply, then check the 4-digit geohash ༌ to find the one within 20 kilometers

Location Table

key         geohash
value       {driver1_id, driver2_id, driver3_id..}

The driver is successfully matched, and the user queries the location of the driver

Driver Table

key         driver_id
value       (lat, lng, status, updated_at, trip_id)

Drivers data from in database

Drivers report their location
- Calculate the geohash of the current location lat, lng
  - geohash4, geohash5, geohash6
- query old location
- updates in the Redis
- updates their latest active time in Driver Table
Drivers accepty the riders request
- edit the Trip status
  - when a riders sends request, Trip Table creates a trip and matches the closest driver.
- update their status as nonavailable in Driver Table
Driver complete the trip
- edit the Trip status
- update their status as available in Driver Table

data flow

Rider make requests, server create a trip
- return trip_id to rider
- send requests every few seconds to the server to check the match
server find the matched driver, write to the trip, waiting for the driver’s response
- update the driver's status as nonavailable, save in the related trip_id
drivers report their location
- find the trip_id in Driver table
- query the trip in trip table, return to the driver
driver accept the request
- update the status info in Driver Table, Trip table
- rider finds the match, get driver’s info
driver rejuect the request
- update the status info in Driver Table, Trip table
- server rematch a new driver, repeat step 2

Scale

demand: 300 QPS

Redis is good choice for > 100 QPS. So we need 3-4 Redis server?

Interviewer: one Redis server is crash?

DB Sharding

Distributing incoming network traffic
Avoiding Single Point Failure

how to shard

sharding is based on the first 4 length Geohash
query is based on the 4-6 length geohash numbers

or sharding by cities (geo fence)

Interviewer: How to check rider is in Airport

create a geofence around a city then create a geofence arround airport.

Interviewer: How to reduce impact on db crash?

method 1: Redis Master Slave

write to the Master server, but slave server the reduce the read traffic.

method 2: better NoSQL database choice

ig: 300k QPS, use 1000 Cassandra –the average QPS of 300 per unit.

It will help you better deal with Replica and recovery after a crash.

I'm Riley Shen