learning System design as a landscape architect 1

Tue, Mar 1, 2022 6-minute read

Rethink system design in a much fun way, as a former urban planner/landscape planner. Take design twitter as example

Design twitter – Present a Landscape Design

1. what’s is the scenario planning of this project

Landscape Architects won’t jump into drawing a good site planning immediately, nor the system design. The first step is always to clarify the project and know your client demands as much as possible.

  • what problem you want to solve?
  • what basic function need here?
  • Features, QPS, DAU, Interface analysis (site condition like land, weather, population, culture analysis in a project)

  1. need basic data about twitter: MAU 330M, DAO: 200m

  2. connunicate the basic function need here

    • Register / Login
    • User Profile Display / Edit
    • Upload Image / Video
    • Search
    • Post / Share a tweet
    • Timeline/News Feed
    • Follow / unFollow a user
  3. Pick the one your clients need (interviewee)

    • Post
    • Timeline
    • News Feed
    • Follow / Unfollow
    • Register / Login
  4. Features (site condition)

    • Concurrent User :

    • Read QPS: DAO * average number of queries by a user / seconds in a day

    150 M * 50 / 86400 ~ 100k

    • Peak = 100k * 3 ~ 300k
    • Fast Growing: Max peak users in 3 months = Peak users * 2 (not for twitter)
    • Write QPS

QPS

  • 100: your laptop can the web server
  • 1 k : a web server, need consider single point failure, SQL database
  • 1 m : 100 web server cluster, need consider maintainance, NoSQL database(Memchched), Cassandra is 10 k QPS

2. master plan and zoning concept

After site analysis, It’s time to give a draft drawing sketche, with enough imformation (which need communicate with your client/team members)

In a draft drawing sketches, it would contains:

  • traffic plan
  • commerical plan
  • green system

System design:(zoning plan)

Service

Router

+ User Service

  • Register
  • Login

+ Tweet Service

  • Post a tweet
  • News Feed
  • Timeline

+ Media Service

  • Upload Image
  • Upload Video

+ Friendship Service

  • Follow
  • Unfollow

3. core area plans(1: 500)

In a draft drawing sketches, it would contains:

  • pavment plan
  • rainstom plan

Storage

+ User Service

  • Register
  • Login
  • SQL

+ Tweet Service

  • Post a tweet
  • News Feed
  • Timeline
  • NoSQL

+ Media Service

  • Upload Image
  • Upload Video
  • File System

+ Friendship Service

  • Follow
  • Unfollow
  • SQL/NoSQL

Please design schema User Service

User Table
id          integer
username    varchar
email       varchar
password    varchar

Tweet Service

Tweet Table
id              integer
user_id         Foreign Key
content         text
created_time    timestamp

Media Service

Friendship Service

Friendship Table
from_user_id    Foreign Key     
to_user_id      Foreign Key
created_time    timestamp

Interviewer: how to storage and get the News Feed

PULL vs PUSH

PULL

algorithm: when users check New Feed, get every friend’s the first 100 tweets, merge them into the frist 100 News Feed

Mergr K sorted Arrays

time complexity: basically you can ignore the Mergr K sorted Arrays times, N times DB Reads is much more costly

User – fetch News Feed -> Web Server <-get followings->Friendship Table

Web Server <-get tweets from followings->Tweet Table

Web Server -merge and return->User

cons:

multiple DB Reads is very slow when users fetch News Feed

getNewsFeed(request)
     followings = DB.getFollowings(user=request.user)
     news_feed = empty
     for follow in followings:  tweets = DB.getTweets(follow.to_user, 100)  news_feed.merge(tweets)
     sort(news_feed)
     return news_feed[:100] # return the first 100


 postTweet(request, tweet)
     DB.insertTweet(request.user, tweet)
     return succes

PUSH

algorithm: every user has a list store his News Feed information

Fanout: User post a tweet, push the tweet into his followers’s News Feed lists

when a user check his News Feed, he just read from his News Feed list

time complexity: 1 DB Read, N times Writes. (Async)

User <– Post a new tweet -> Web Server <-Insert the tweet to DB->tweet Table

Web Server -sent tweets to friends->Async Tasks Server

Async Tasks Server <-get followers->friendship Table

Async Tasks Server <-Fanout: insert new tweet to followers news feed->New Feed Table

cons:

huge followers numbers

getNewsFeed(request)
     return DB.getNewsFeed(request.user)

 postTweet(request, tweet_info)

     tweet = DB.insertTweet(request.user, tweet_info)
     AsyncService.fanoutTweet(request.user, tweet)
     return success

AsyncService::fanoutTweet(user, tweet)
     followers = DB.getFollowers(user)
     for follower in followers:
     DB.insertNewsFeed(tweet, follower )   

PULL vs PUSH

tweet and facebook are using pull

4. details design show how to solve some specific problem and improve the plan in the future

  • Optimize

  • Scalability Enhancement of Pull Server functions

    • the slowest part is happening when user request News Feed
      • add Cache before DB Read
      • cache every user’s Timeline
        • n times cache reqest
        • trade off: all the cache, or the recent 1000 in cache
      • cache every user’s News Feed
        • with Cache: merge users recent 100 tweets, then get the first 100
        • without Cache: merger all the tweets after timestamp
  • Scalability Enhancement of Push Server functions

    • compare Pull model store News Feed in Memory, Push model stores in Disk, and Disk is cheap
    • Inactive Users
      • Rank followers by weight(last login time)
    • followers » following: fanout would take hours
      • trade off: Pull + Push vs Pull
  • right Optimize thinking

Try to optimize with minimal changes to the existing model;

Estimate long-term growth and assess whether it is worth switching the entire model

  • Push + Pull

    • ordinal user (Push)
    • influencer (Pull), followers come to influencer’s timeline fetch the News Feed, merge into their News Feed
User Table
id           integer
username     varchar
email        varchar
password     varchar
is_influencer boolean
  • Maintenance

    • Robust
    • Scalability

unfollow problem(async)

  • follow: merge timeline into News Feed asynchronously

  • unfollow: pick out tweets from News Feed asynchronously

  • Async reduces the processing time

  • async improve user experience

  • however, after unfollowing, the user may still find his News Feed in timeline

store likes

Tweet Table
id                  integer
user_id             Foreign Key
content             text
created_time        timestamp
nums_of_like *      integer
nums_of_comments *  integer
nums_of_retweets *  integer
Like Table
id                  integer
user_id             Foreign Key
tweet_id            Foreign Key
created_time        timestamp

Normalize vs Denormalize

Normalize vs Denormalize likes

Normalize likes

SELECT COUNT * FROM like_table where tweet_id=xxx;

  • Pros: Standardized, most accurate

  • cons: slow, would cause O(n) SQL Queries

Denormalize likes

UPDATE like_table SET num_of_likes = num_of_likes + 1 where tweet_id=xxx;

UPDATE like_table SET num_of_likes = num_of_likes - 1 where tweet_id=xxx;

  • no need SQL Queries, because num_of_likes stores in tweet

Thundering Herd

The “thundering herd” problem occurs in a highly concurrent environment (typically, many users). When many users make a request to the same piece of data at the same time, and there is a cache miss (the data for the cached element is not present in the cache, data is kicked out due to cache expiration or elimination by the elimination algorithm, etc) the thundering herd problem is triggered.