Learning System Design as a Landscape Architect 10

Fri, Apr 1, 2022 4-minute read

Rethinking system design in a more fun way as a former urban planner and landscape planner, taking YouTube as an example.


Design YouTube

Scenario analysis

Functional Requirements

  • upload video
  • watch and share
  • generate thumbnail
  • like and comment

Other requirements:

  • high availability
  • smooth playback (fluency)
  • real-time recommendation

data

  • MAU: 2B
  • DAU: 1.5B
  • videos watched daily: 5B
  • more than 500 hours of video are uploaded to YouTube every minute
  • people spend an average of 40 minutes on YouTube

Calculation

  • videos watched per second: 5B / 86,400 s ≈ 60,000 videos/second
  • uploads per second, assuming an upload-to-watch ratio of 1 : 500: 60,000 / 500 = 120 videos/second
  • video minutes uploaded per second, assuming every video is 5 minutes long: 120 * 5 = 600 minutes/second
  • storage, assuming 1 minute of video needs 50 MB: 600 * 50 MB = 30,000 MB/s = 30 GB/s
  • bandwidth, assuming every second of uploaded video takes 166 KB/s: 600 * 60 * 166 KB/s ≈ 6 GB/s
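To double-check the arithmetic, here is a small Python sketch that reproduces each step. Every input is one of the assumptions listed above (5B daily views, a 1 : 500 upload-to-watch ratio, 5-minute videos, 50 MB per minute, 166 KB/s per second of video); the function and parameter names are my own, and rounding views per second to 60,000 mirrors the text.

```python
SECONDS_PER_DAY = 86_400


def estimate(daily_views=5e9, watches_per_upload=500, minutes_per_video=5,
             mb_per_minute=50, kb_per_video_second=166):
    """Reproduce the back-of-the-envelope numbers step by step."""
    # Round to the nearest 10,000 as the text does (57,870 -> 60,000).
    views_per_s = round(daily_views / SECONDS_PER_DAY, -4)
    uploads_per_s = views_per_s / watches_per_upload
    video_minutes_per_s = uploads_per_s * minutes_per_video
    storage_gb_per_s = video_minutes_per_s * mb_per_minute / 1_000
    # minutes -> seconds of video, times KB per second of video, KB -> GB
    bandwidth_gb_per_s = video_minutes_per_s * 60 * kb_per_video_second / 1_000_000
    return {
        "views_per_s": views_per_s,
        "uploads_per_s": uploads_per_s,
        "video_minutes_per_s": video_minutes_per_s,
        "storage_gb_per_s": storage_gb_per_s,
        "bandwidth_gb_per_s": bandwidth_gb_per_s,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.1f}")
```

Changing one assumption (for example `minutes_per_video=10`) shows immediately how sensitive storage and bandwidth are to it.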

Service

  • User Service
  • Upload Service
  • Encode Service
  • ThumbService
  • Video Service

Storage

Upload Service

video upload process

Client –1. choose videos and upload–> Web Server –2. save the processed videos into the file system–> Cloud Storage Server

Web Server: handles user login, encoding, and thumbnail generation.

How to avoid failures when uploading large video files?

  1. video sharding
    • split the video into chunks on the client side, e.g. with JavaScript
  2. resume from the breakpoint
    • After the video is sharded, hash each chunk to generate its chunk_id.
    • The client initiates an upload request and sends the chunk_ids to the server.
    • The server generates a video_id, saves the directory, and returns it to the client.
    • The client starts uploading the chunks.
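The sharding and resume steps above can be sketched in Python as follows. This is a minimal in-memory illustration, not a real upload protocol: SHA-256 as the chunk hash, the 4 MiB chunk size, the `V0001`-style ids, and the names `split_into_chunks` and `UploadServer` are all assumptions. The key idea is that the server remembers which chunk_ids it expects, so after a disconnect the client only re-sends the missing ones.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical chunk size: 4 MiB


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[tuple[str, bytes]]:
    """Client side: shard the video and derive each chunk_id from its SHA-256 hash."""
    return [
        (hashlib.sha256(data[i:i + chunk_size]).hexdigest(), data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


class UploadServer:
    """In-memory stand-in for the server side of a resumable upload (hypothetical)."""

    def __init__(self):
        self.sessions = {}  # video_id -> {"expected": [chunk_id, ...], "received": {chunk_id: bytes}}
        self._next_id = 1

    def init_upload(self, chunk_ids: list[str]) -> str:
        """Register the expected chunk_ids and hand back a new video_id."""
        video_id = f"V{self._next_id:04d}"
        self._next_id += 1
        self.sessions[video_id] = {"expected": list(chunk_ids), "received": {}}
        return video_id

    def upload_chunk(self, video_id: str, chunk_id: str, piece: bytes) -> None:
        """Store one chunk after checking that its hash matches its chunk_id."""
        if hashlib.sha256(piece).hexdigest() != chunk_id:
            raise ValueError("chunk content does not match chunk_id")
        self.sessions[video_id]["received"][chunk_id] = piece

    def missing_chunks(self, video_id: str) -> list[str]:
        """After a disconnect, the client asks which chunks still need uploading."""
        s = self.sessions[video_id]
        return [c for c in s["expected"] if c not in s["received"]]

    def is_complete(self, video_id: str) -> bool:
        return not self.missing_chunks(video_id)

    def assemble(self, video_id: str) -> bytes:
        """Concatenate the chunks in their original order."""
        s = self.sessions[video_id]
        return b"".join(s["received"][c] for c in s["expected"])


if __name__ == "__main__":
    video = bytes(i % 251 for i in range(5000))  # fake video data
    chunks = split_into_chunks(video, chunk_size=1024)
    server = UploadServer()
    vid = server.init_upload([cid for cid, _ in chunks])
    server.upload_chunk(vid, *chunks[0])          # upload one chunk, then "disconnect"
    todo = set(server.missing_chunks(vid))        # on reconnect, resume from the breakpoint
    for cid, piece in chunks:
        if cid in todo:
            server.upload_chunk(vid, cid, piece)
    assert server.is_complete(vid) and server.assemble(vid) == video
```

Hashing each chunk also lets the server reject a chunk that was corrupted in transit, which is why `upload_chunk` re-checks the hash.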

If the user stops uploading partway, what happens to the part that was already uploaded?

It is stored for a period of time and deleted once it expires.
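One way to implement "keep for a while, then delete" is a time-to-live on each unfinished upload. The sketch below assumes the clock starts over whenever a new chunk arrives and uses a hypothetical 7-day TTL; the text does not specify either, and `PartialUploadStore` is an illustrative name. The clock is injectable so the expiry logic can be tested without waiting.

```python
import time

PARTIAL_UPLOAD_TTL = 7 * 24 * 3600  # hypothetical retention period: 7 days, in seconds


class PartialUploadStore:
    """Keeps unfinished uploads and deletes them once they expire."""

    def __init__(self, ttl=PARTIAL_UPLOAD_TTL, clock=time.time):
        self.ttl = ttl
        self.clock = clock
        self.uploads = {}  # video_id -> (last_activity_timestamp, {chunk_id: bytes})

    def touch(self, video_id, chunk_id, piece):
        """Record a received chunk and reset the upload's expiry clock."""
        _, chunks = self.uploads.get(video_id, (None, {}))
        chunks[chunk_id] = piece
        self.uploads[video_id] = (self.clock(), chunks)

    def purge_expired(self):
        """Delete uploads idle for longer than the TTL; return their video_ids."""
        now = self.clock()
        expired = [vid for vid, (ts, _) in self.uploads.items() if now - ts > self.ttl]
        for vid in expired:
            del self.uploads[vid]
        return expired
```

In practice `purge_expired` would run from a periodic background job rather than on the upload path.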

Transcoding and Thumbnail generation

Transcoding: convert the video into the formats and quality levels needed for playback on the YouTube website.

Thumbnails: the preview thumbnails shown over the progress bar and the cover thumbnail, both generated by the Encode Server.
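As a concrete illustration of what an Encode Server might run, the sketch below builds ffmpeg command lines for transcoding and thumbnail extraction. ffmpeg is my choice of tool, not something the text names, and the resolution ladder, the 10-second thumbnail interval, the 160-pixel thumbnail width, and the output file names are all assumptions. The functions only build argument lists, so they can be inspected or handed to `subprocess.run`.

```python
import shlex

RESOLUTIONS = [1080, 720, 480, 360]  # hypothetical output ladder (video heights)


def transcode_commands(src: str, out_dir: str) -> list[list[str]]:
    """One ffmpeg command per target resolution: H.264 video, AAC audio, MP4 container."""
    return [
        ["ffmpeg", "-i", src, "-vf", f"scale=-2:{h}",
         "-c:v", "libx264", "-c:a", "aac", f"{out_dir}/{h}p.mp4"]
        for h in RESOLUTIONS
    ]


def cover_thumbnail_command(src: str, out_dir: str, moment: int = 0) -> list[str]:
    """Grab a single frame at `moment` seconds as the cover thumbnail."""
    return ["ffmpeg", "-ss", str(moment), "-i", src, "-frames:v", "1", f"{out_dir}/cover.jpg"]


def progress_bar_thumbnails_command(src: str, out_dir: str, every_s: int = 10) -> list[str]:
    """Emit one small frame every `every_s` seconds for the progress-bar previews."""
    return ["ffmpeg", "-i", src, "-vf", f"fps=1/{every_s},scale=160:-2",
            f"{out_dir}/thumb_%04d.jpg"]


if __name__ == "__main__":
    # Print the commands; pass each list to subprocess.run(cmd, check=True) to execute them.
    for cmd in transcode_commands("upload.mp4", "out"):
        print(shlex.join(cmd))
    print(shlex.join(cover_thumbnail_command("upload.mp4", "out", moment=40)))
    print(shlex.join(progress_bar_thumbnails_command("upload.mp4", "out")))
```

`scale=-2:H` keeps the aspect ratio while forcing an even width, which the H.264 encoder requires.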

file upload flow

Client –> Web Server –> Encode Server –> Cloud Storage Server

Optimize the file upload process

Client –1. send file upload request –> Web Server –2. return Encode server address –> Client

Client –3. upload file into Encode Server–> Encode Server –4. storage–> Cloud Storage Server
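In the optimized flow, the web server only hands out an Encode Server address and the video bytes go straight from the client to the Encode Server. The text does not say how that address is chosen; the sketch below assumes a simple least-loaded policy, and `EncodeServer`, `WebServer.request_upload`, and the addresses are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class EncodeServer:
    address: str
    active_uploads: int = 0


class WebServer:
    """Step 2 of the optimized flow: hand the client an Encode Server address."""

    def __init__(self, encode_servers: list[EncodeServer]):
        self.encode_servers = encode_servers

    def request_upload(self, user_id: str) -> str:
        # user_id would be used for authentication (not shown).
        # Least-loaded selection is an assumed policy, not stated in the text.
        server = min(self.encode_servers, key=lambda s: s.active_uploads)
        server.active_uploads += 1
        return server.address


if __name__ == "__main__":
    web = WebServer([EncodeServer("10.0.0.1:9000", 3), EncodeServer("10.0.0.2:9000", 1)])
    print(web.request_upload("U001"))  # the client then uploads directly to this address
```

Keeping the large upload off the web server means the web tier only handles small control requests and can stay lightweight.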

Video and Thumbnail image storage

metadata

How do we display information such as a video's title, author, and create_time?

This metadata is saved in the Video Table.

storage

Video Table

column       type                   explain           example
video_id     VARCHAR, Primary Key                     'V0001'
hash_code    VARCHAR                                  'a8Lxxaaa0c9zqq'
resolution   VARCHAR                                  '1080P'
size         INTEGER                size in bytes     545259520
duration     INTEGER                seconds           450
metadata     BLOB                                     '{language: xxx, tags: xxx, category: xxx}'

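Why do we need hash_code here?

To avoid storing duplicate video files: if an uploaded file has the same hash as an existing video, the existing file can be reused.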
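A minimal sketch of the hash_code check, assuming the hash is SHA-256 over the whole file (the text does not name the algorithm, and its example value is a placeholder). `VideoStore` is a hypothetical name; a dictionary stands in for a lookup on the hash_code column.

```python
import hashlib


class VideoStore:
    """Deduplicates uploads by the hash of the whole file (the hash_code column)."""

    def __init__(self):
        self.by_hash = {}  # hash_code -> video_id
        self._next = 1

    def save(self, data: bytes) -> tuple[str, bool]:
        """Return (video_id, is_new); a duplicate file reuses the existing video_id."""
        hash_code = hashlib.sha256(data).hexdigest()
        if hash_code in self.by_hash:
            return self.by_hash[hash_code], False  # duplicate: no second copy is stored
        video_id = f"V{self._next:04d}"
        self._next += 1
        self.by_hash[hash_code] = video_id
        return video_id, True
```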

Chunk Table

column       type                   explain           example
chunk_id     VARCHAR, Primary Key                     'd9908ljhoihjoih'
video_id     VARCHAR, Foreign Key                     'V0001'
start_time   INTEGER                                  8
end_time     INTEGER                                  23
folder       VARCHAR                                  '/1000/'
resolution   VARCHAR                                  '1080P'

Thumbnail Table

column       type                   explain           example
thumb_id     VARCHAR, Primary Key                     'T000'
video_id     VARCHAR, Foreign Key                     'V0001'
folder       VARCHAR                                  '/aaaa/10/'
size         INTEGER                size in KB        10
type         VARCHAR                                  'progress bar'
moment       INTEGER                                  40

User Table

column       type                   explain           example
user_id      VARCHAR, Primary Key                     'U001'
user_name    VARCHAR                                  'aa'
gender       INTEGER                0 = female        0

User Video Table

column       type                   explain           example
user_id      VARCHAR, Foreign Key                     'U001'
video_id     VARCHAR, Foreign Key                     'V0001'
create_time  DATETIME                                 1988/03/24
metadata     BLOB                                     '{}'
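To make the tables above concrete, here is a sketch of them as SQLite DDL run from Python's built-in `sqlite3` module. Column names follow the tables; a few details are my own assumptions and are marked in comments: `hash_code` is declared UNIQUE to support deduplication, the User Table is named `users`, and the User Video Table gets a composite primary key. A production system would more likely use a server database, so treat this as a schema sketch only.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE video (
    video_id   VARCHAR PRIMARY KEY,
    hash_code  VARCHAR UNIQUE,      -- UNIQUE is an added assumption, for dedup
    resolution VARCHAR,
    size       INTEGER,             -- bytes
    duration   INTEGER,             -- seconds
    metadata   BLOB                 -- JSON: language, tags, category
);
CREATE TABLE chunk (
    chunk_id   VARCHAR PRIMARY KEY,
    video_id   VARCHAR REFERENCES video(video_id),
    start_time INTEGER,
    end_time   INTEGER,
    folder     VARCHAR,
    resolution VARCHAR
);
CREATE TABLE thumbnail (
    thumb_id VARCHAR PRIMARY KEY,
    video_id VARCHAR REFERENCES video(video_id),
    folder   VARCHAR,
    size     INTEGER,               -- KB
    type     VARCHAR,               -- e.g. 'cover' or 'progress bar'
    moment   INTEGER
);
CREATE TABLE users (                -- the User Table
    user_id   VARCHAR PRIMARY KEY,
    user_name VARCHAR,
    gender    INTEGER               -- 0 = female
);
CREATE TABLE user_video (
    user_id     VARCHAR REFERENCES users(user_id),
    video_id    VARCHAR REFERENCES video(video_id),
    create_time DATETIME,
    metadata    BLOB,
    PRIMARY KEY (user_id, video_id) -- composite key is an assumption
);
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    meta = json.dumps({"language": "en", "tags": ["demo"], "category": "music"})
    conn.execute("INSERT INTO video VALUES (?, ?, ?, ?, ?, ?)",
                 ("V0001", "a8Lxxaaa0c9zqq", "1080P", 545259520, 450, meta))
    conn.execute("INSERT INTO users VALUES (?, ?, ?)", ("U001", "aa", 0))
    conn.execute("INSERT INTO user_video VALUES (?, ?, ?, ?)",
                 ("U001", "V0001", "1988-03-24", "{}"))
    # List a user's videos together with their duration and upload time.
    for row in conn.execute("""SELECT v.video_id, v.duration, uv.create_time
                               FROM user_video uv JOIN video v ON v.video_id = uv.video_id
                               WHERE uv.user_id = ?""", ("U001",)):
        print(row)
```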

complete video upload process

Client –1. send file upload request –> Web Server –2. return Encode server address –> Client

Client –3. upload file into Encode Server–> Encode Server –4. store metadata–> Database

Encode Server –5. store file–> Cloud Storage Server

Web Server –6. insert data into User Video Table –> Database

What is the difference between small file storage and normal file storage?

  • chunks and images are small
  • they are read very frequently

Video and thumbnail loading

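How to play video smoothly

  1. play while loading: start playback as soon as the first chunks arrive and keep loading the rest in the background

  2. preload: fetch the upcoming chunks before the viewer reaches them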
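Both ideas can be expressed as one rule: keep the chunk under the playhead plus the next few chunks buffered. The sketch assumes fixed-length 15-second chunks and a preload window of two chunks; both numbers and the name `chunks_to_fetch` are hypothetical. A player would call this on every tick and request whatever it returns.

```python
import math

CHUNK_SECONDS = 15   # hypothetical fixed chunk length
PRELOAD_AHEAD = 2    # hypothetical: chunks to buffer beyond the one playing


def chunks_to_fetch(position_s: float, duration_s: float, buffered: set[int]) -> list[int]:
    """Chunk indices to request now so playback does not stall on the network."""
    current = int(position_s // CHUNK_SECONDS)           # chunk under the playhead
    last = math.ceil(duration_s / CHUNK_SECONDS) - 1      # index of the final chunk
    wanted = range(current, min(current + PRELOAD_AHEAD, last) + 1)
    return [i for i in wanted if i not in buffered]
```

A larger `PRELOAD_AHEAD` makes playback more robust on unstable networks at the cost of downloading chunks the viewer may never watch.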

How to load thumbnails

  1. cover thumbnail
  • read from the file server when the front end loads the video list
  2. progress bar thumbnail
  • When the user moves the mouse over the progress bar, the thumbnail is loaded into the local cache.
  • When the user clicks on a video, all thumbnails of the video are loaded into the local cache.

The front end and back end cooperate: the front end passes the current position on the progress bar, and the back end fetches the matching thumbnail data from the cache and returns the thumbnail for that position.
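A sketch of the back-end side of that exchange: map the hover position to a thumbnail `moment` (as in the Thumbnail Table) and serve it through a small LRU cache. The 10-second thumbnail interval, the cache capacity, and the names `thumbnail_moment` and `ThumbnailCache` are assumptions; `fetch` stands in for a read from the thumbnail store.

```python
from collections import OrderedDict

THUMB_INTERVAL_S = 10  # hypothetical: one progress-bar thumbnail every 10 seconds


def thumbnail_moment(fraction: float, duration_s: int) -> int:
    """Map a hover position on the progress bar (0.0 to 1.0) to a thumbnail moment."""
    fraction = min(max(fraction, 0.0), 1.0)
    second = min(int(fraction * duration_s), duration_s - 1)  # stay inside the video
    return (second // THUMB_INTERVAL_S) * THUMB_INTERVAL_S


class ThumbnailCache:
    """Small LRU cache in front of the thumbnail store."""

    def __init__(self, fetch, capacity=256):
        self.fetch = fetch          # function (video_id, moment) -> bytes
        self.capacity = capacity
        self.items = OrderedDict()  # (video_id, moment) -> bytes, oldest first

    def get(self, video_id: str, fraction: float, duration_s: int) -> bytes:
        key = (video_id, thumbnail_moment(fraction, duration_s))
        if key in self.items:
            self.items.move_to_end(key)      # mark as most recently used
            return self.items[key]
        data = self.fetch(*key)              # cache miss: read from the store
        self.items[key] = data
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used entry
        return data
```

Snapping to the thumbnail interval means many nearby mouse positions share one cache entry, so hovering back and forth hits the cache.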

Scale

How to optimize reads with a CDN

Push the most popular videos to the CDN so that users can watch them from the server closest to them.
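Choosing "the most popular videos" can be as simple as a top-k by view count. The sketch assumes view counts over some recent window are already available as a dictionary; the window, `k`, and the name `videos_to_push` are hypothetical.

```python
import heapq


def videos_to_push(view_counts: dict[str, int], k: int) -> list[str]:
    """Return the k most-viewed video_ids, the candidates to push to the CDN."""
    top = heapq.nlargest(k, view_counts.items(), key=lambda kv: kv[1])
    return [video_id for video_id, _ in top]
```

`heapq.nlargest` avoids sorting the full catalogue when only a small k is needed.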

How to find the CDN server

client –> web server (video content, client IP; find the closest CDN) –redirect to CDN–> video –> client
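A sketch of the "find the closest CDN" step, assuming the client IP has already been turned into a latitude/longitude by some IP-geolocation lookup (not shown) and that the web server knows each edge's location. The host names and coordinates are hypothetical; distance is the great-circle (haversine) distance.

```python
import math

# Hypothetical CDN edge locations: host -> (latitude, longitude).
CDN_SERVERS = {
    "cdn-tokyo.example.com": (35.68, 139.69),
    "cdn-frankfurt.example.com": (50.11, 8.68),
    "cdn-virginia.example.com": (38.95, -77.45),
}


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))


def closest_cdn(client_location: tuple[float, float], servers=CDN_SERVERS) -> str:
    """Web server side: pick the CDN edge nearest to the client's location."""
    return min(servers, key=lambda host: haversine_km(client_location, servers[host]))
```

Geographic distance is only a proxy; a fuller version might also weigh edge load or measured latency.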

What if user data grows exponentially?

Shard the database.

How to shard the database

  • by user_id when querying user data
  • by video_id when querying video data
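A minimal sketch of routing a query to a shard by its key (user_id for user data, video_id for video data). The shard count of 16 and the name `shard_for` are assumptions. A stable hash from `hashlib` is used because Python's built-in `hash()` for strings is randomized per process, which would send the same key to different shards after a restart.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Stable hash of the shard key (user_id or video_id) -> shard number."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    print("user U001 ->", shard_for("U001"))    # user data is routed by user_id
    print("video V0001 ->", shard_for("V0001"))  # video data is routed by video_id
```

Plain modulo sharding moves most keys when `num_shards` changes; consistent hashing is the usual alternative if shards will be added over time.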