Learning system design as a landscape architect 10
Rethinking system design in a more fun way, as a former urban planner/landscape planner. This time, take YouTube as the example.
Design YouTube
Scenario analysis
Functional requirements
- upload video
- watch and share
- generate thumbnail
- like and comment
Non-functional requirements:
- high availability
- smooth playback
- real-time recommendation
Data
- MAU: 2B
- DAU: 1.5B
- videos watched daily: 5B
- more than 500 hours of video are uploaded to YouTube every minute
- people spend an average of 40 minutes on YouTube
Calculation
- Videos watched per second: 5B / 86,400 s ≈ 58,000, rounded to 60,000 videos/second
- Assume upload : watch = 1 : 500, so 60,000 / 500 = 120 videos uploaded per second
- Assume every video is 5 minutes long: 120 * 5 = 600 minutes of video per second
- Storage calculation: 1 minute of video needs 50 MB, so 600 * 50 = 30,000 MB/s = 30 GB/s
- Bandwidth calculation: assume every second of uploaded video takes 166 KB/s of bandwidth, so 600 * 60 * 166 ≈ 6,000,000 KB/s ≈ 6 GB/s
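The arithmetic above can be checked with a few lines of Python. This is only a back-of-envelope sketch: it assumes 5 billion daily views (the figure consistent with the roughly 60,000 videos per second used above), and the function and parameter names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def estimate(videos_watched_per_day=5_000_000_000,
             watch_to_upload_ratio=500,
             avg_video_minutes=5,
             mb_per_video_minute=50,
             upload_kb_per_video_second=166):
    """Back-of-envelope capacity estimate; all inputs are assumptions."""
    # ~57,870 views per second, rounded to the nearest 10,000 as in the notes
    watched_per_second = round(videos_watched_per_day / SECONDS_PER_DAY, -4)
    uploads_per_second = watched_per_second / watch_to_upload_ratio
    video_minutes_per_second = uploads_per_second * avg_video_minutes
    # decimal units: 1 GB = 1,000 MB = 1,000,000 KB
    storage_gb_per_second = video_minutes_per_second * mb_per_video_minute / 1_000
    bandwidth_gb_per_second = (
        video_minutes_per_second * 60 * upload_kb_per_video_second / 1_000_000
    )
    return {
        "watched_per_second": watched_per_second,
        "uploads_per_second": uploads_per_second,
        "video_minutes_per_second": video_minutes_per_second,
        "storage_gb_per_second": storage_gb_per_second,
        "bandwidth_gb_per_second": bandwidth_gb_per_second,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.3f}")
```

Changing any single assumption (for example the watch-to-upload ratio) shows quickly how sensitive the storage and bandwidth numbers are to it.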
Service
- User Service
- Upload Service
- Encode Service
- Thumbnail Service
- Video Service
Storage
Upload Service
Video upload process
Client
–1. choose videos and upload–> Web Server
–2. save the processed videos into the file system–> Cloud Storage Server
Web Server: handles user login, encoding, and thumbnail generation.
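How to avoid failures when uploading large files?
- video sharding: split the video into chunks on the client (for example with JavaScript)
- resumable upload: continue from the breakpoint instead of starting over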
- After the video is sharded, hash each chunk to generate its chunk_id.
- The client initiates an upload request and sends the chunk_ids to the server.
- The server generates a video_id, saves the directory, and returns the video_id to the client.
- The client starts uploading chunks.
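The chunked upload steps above can be sketched roughly as follows. Everything here is an illustrative assumption rather than how YouTube actually does it: the tiny chunk size, SHA-256 as the hash, the `UploadServer` class, and the in-memory dictionaries standing in for real storage.

```python
import hashlib


def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Shard the file into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def make_chunk_id(chunk: bytes) -> str:
    """Derive the chunk_id from the chunk content (SHA-256 is an assumption)."""
    return hashlib.sha256(chunk).hexdigest()


class UploadServer:
    """Hypothetical in-memory server that tracks which chunks have arrived."""

    def __init__(self):
        self._sessions = {}
        self._next_number = 1

    def start_upload(self, chunk_ids: list[str]) -> str:
        """Register the announced chunk_ids and hand back a new video_id."""
        video_id = f"V{self._next_number:04d}"
        self._next_number += 1
        self._sessions[video_id] = {"expected": list(chunk_ids), "received": {}}
        return video_id

    def upload_chunk(self, video_id: str, chunk_id: str, chunk: bytes) -> None:
        """Store one chunk after checking it matches an announced chunk_id."""
        session = self._sessions[video_id]
        if chunk_id not in session["expected"] or make_chunk_id(chunk) != chunk_id:
            raise ValueError("unexpected or corrupted chunk")
        session["received"][chunk_id] = chunk

    def missing_chunks(self, video_id: str) -> list[str]:
        """Chunk ids still to upload; this is what lets a client resume."""
        session = self._sessions[video_id]
        return [c for c in session["expected"] if c not in session["received"]]


if __name__ == "__main__":
    chunks = split_into_chunks(b"abcdefghij", chunk_size=4)
    ids = [make_chunk_id(c) for c in chunks]
    server = UploadServer()
    video_id = server.start_upload(ids)
    server.upload_chunk(video_id, ids[0], chunks[0])
    print(video_id, "still missing", len(server.missing_chunks(video_id)), "chunks")
```

After an interruption the client only needs to ask for `missing_chunks` and send those, instead of restarting the whole upload.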
If the user stops uploading partway through, what happens to the part already uploaded?
It is kept for a period of time, and the uploaded part is deleted after it expires.
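One possible way to implement this expiry is a periodic cleanup job. The sketch below assumes each partial upload records the time of its last activity; the `PartialUpload` type, the `purge_expired` helper, and the 24-hour retention period in the demo are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PartialUpload:
    video_id: str
    last_activity: float                      # Unix timestamp of the last chunk
    chunks: dict = field(default_factory=dict)


def purge_expired(uploads: dict, now: float, ttl_seconds: float) -> list:
    """Delete partial uploads idle for longer than ttl_seconds; return their ids."""
    expired = [
        video_id
        for video_id, upload in uploads.items()
        if now - upload.last_activity > ttl_seconds
    ]
    for video_id in expired:
        del uploads[video_id]
    return expired


if __name__ == "__main__":
    uploads = {
        "V0001": PartialUpload("V0001", last_activity=0.0),
        "V0002": PartialUpload("V0002", last_activity=90_000.0),
    }
    # Assumed retention period of 24 hours
    print(purge_expired(uploads, now=100_000.0, ttl_seconds=24 * 3600))
```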
Transcoding and thumbnail generation
Transcoding: convert the format and quality so the video can be played on the YouTube website.
Thumbnail: the preview thumbnails over the progress bar and the cover thumbnail are generated by the Encode Server.
File upload flow
Client
–> Web Server
–> Encode Server
–> Cloud Storage Server
Optimize the file upload process
Client
–1. send file upload request –> Web Server
–2. return Encode server address –> Client
Client
–3. upload file into Encode Server–> Encode Server
–4. storage–> Cloud Storage Server
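In the optimized flow the Web Server no longer relays the file; it only tells the client which Encode Server to upload to. A minimal sketch of steps 1 and 2, assuming simple round-robin selection (the source does not say how the server is chosen) and made-up server addresses:

```python
import itertools


class WebServer:
    """Hypothetical web server that only hands out Encode Server addresses."""

    def __init__(self, encode_servers: list[str]):
        # Round-robin is an assumption; a real system might pick by load or region.
        self._next_server = itertools.cycle(encode_servers)

    def request_upload(self, user_id: str) -> str:
        """Steps 1 and 2: accept the upload request, return an address."""
        return next(self._next_server)


if __name__ == "__main__":
    web = WebServer(["encode-1.example.internal", "encode-2.example.internal"])
    for user in ("U001", "U002", "U003"):
        print(user, "->", web.request_upload(user))
```

The client then uploads directly to the returned address (step 3), which keeps large files off the Web Server.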
Video and thumbnail image storage
Metadata
How to display information such as a video's title, author, and create_time?
The metadata is saved in the Video Table.
Storage
Video Table
| column | type | description | example |
|---|---|---|---|
| video_id | Primary Key, VARCHAR | | 'V0001' |
| hash_code | VARCHAR | | 'a8Lxxaaa0c9zqq' |
| resolution | VARCHAR | | '1080P' |
| size | INTEGER | size in KB | 545259520 |
| duration | INTEGER | seconds | 450 |
| metadata | BLOB | | '{language: xxx, tags: xxx, category: xxx}' |
Why is hash_code needed here?
To avoid uploading duplicate video files.
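A rough sketch of how hash_code could be used for de-duplication: hash the file content and look it up before storing a new copy. SHA-256 and the in-memory `VideoIndex` class are assumptions for illustration; in the design above the lookup would be a query on the hash_code column of the Video Table.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Content hash stored as hash_code (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


class VideoIndex:
    """Hypothetical stand-in for a hash_code lookup on the Video Table."""

    def __init__(self):
        self._video_id_by_hash = {}

    def register(self, video_id: str, data: bytes) -> str:
        """Return the existing video_id for identical content, else record a new one."""
        code = file_hash(data)
        existing = self._video_id_by_hash.get(code)
        if existing is not None:
            return existing          # duplicate: reuse the stored file
        self._video_id_by_hash[code] = video_id
        return video_id


if __name__ == "__main__":
    index = VideoIndex()
    print(index.register("V0001", b"same bytes"))
    print(index.register("V0002", b"same bytes"))   # points back to V0001
```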
Chunk Table
| column | type | description | example |
|---|---|---|---|
| chunk_id | Primary Key, VARCHAR | | 'd9908ljhoihj oih' |
| video_id | Foreign Key, VARCHAR | | 'V0001' |
| start_time | INTEGER | | 8 |
| end_time | INTEGER | | 23 |
| folder | VARCHAR | | '/1000/' |
| resolution | VARCHAR | | '1080P' |
Thumbnail Table
| column | type | description | example |
|---|---|---|---|
| thumb_id | Primary Key, VARCHAR | | 'T000' |
| video_id | Foreign Key, VARCHAR | | 'V0001' |
| folder | VARCHAR | | '/aaaa/10/' |
| size | INTEGER | size in KB | 10 |
| type | VARCHAR | | 'progress bar' |
| moment | INTEGER | | 40 |
User Table
| column | type | description | example |
|---|---|---|---|
| user_id | Primary Key, VARCHAR | | 'U001' |
| user_name | VARCHAR | | 'aa' |
| gender | INTEGER | 0 = female | 0 |
User Video Table
| column | type | description | example |
|---|---|---|---|
| user_id | Foreign Key, VARCHAR | | 'U001' |
| video_id | Foreign Key, VARCHAR | | 'V001' |
| create_time | DATETIME | | 1988/03/24 |
| metadata | BLOB | | '{}' |
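To make the relationships between these tables concrete, here is a trimmed-down version of the schema in SQLite, using Python's built-in `sqlite3` module. It is only a sketch: SQLite stands in for whatever database is actually used, the lowercase table names are chosen for illustration, and the Thumbnail Table is left out to keep it short.

```python
import sqlite3

SCHEMA = """
CREATE TABLE video (
    video_id   VARCHAR PRIMARY KEY,
    hash_code  VARCHAR,
    resolution VARCHAR,
    size       INTEGER,
    duration   INTEGER,
    metadata   BLOB
);
CREATE TABLE chunk (
    chunk_id   VARCHAR PRIMARY KEY,
    video_id   VARCHAR REFERENCES video(video_id),
    start_time INTEGER,
    end_time   INTEGER,
    folder     VARCHAR,
    resolution VARCHAR
);
CREATE TABLE users (
    user_id   VARCHAR PRIMARY KEY,
    user_name VARCHAR,
    gender    INTEGER
);
CREATE TABLE user_video (
    user_id     VARCHAR REFERENCES users(user_id),
    video_id    VARCHAR REFERENCES video(video_id),
    create_time DATETIME,
    metadata    BLOB
);
"""


def create_db() -> sqlite3.Connection:
    """Create an in-memory database with the tables defined above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def videos_of_user(conn: sqlite3.Connection, user_id: str) -> list:
    """List (video_id, duration) for every video uploaded by one user."""
    rows = conn.execute(
        "SELECT v.video_id, v.duration FROM user_video uv "
        "JOIN video v ON v.video_id = uv.video_id "
        "WHERE uv.user_id = ? ORDER BY v.video_id",
        (user_id,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = create_db()
    conn.execute("INSERT INTO users VALUES ('U001', 'aa', 0)")
    conn.execute(
        "INSERT INTO video VALUES ('V0001', 'a8Lxxaaa0c9zqq', '1080P', 545259520, 450, NULL)"
    )
    conn.execute("INSERT INTO user_video VALUES ('U001', 'V0001', '1988-03-24', NULL)")
    print(videos_of_user(conn, "U001"))
```

The User Video Table is what connects an uploader to their videos, so listing a user's uploads is a single join.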
Complete video upload process
Client
–1. send file upload request –> Web Server
–2. return Encode server address –> Client
Client
–3. upload file into Encode Server–> Encode Server
–4. store metadata–> Database
Encode Server
–5. store file–> Cloud Storage Server
Web Server
–6. insert data into User Video Table –> Database
What is the difference between small-file storage and normal file storage?
- Chunks and images are small
- They are read very frequently
Video and thumbnail loading
How to play video smoothly
- load and play the video at the same time
- preload
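Because a video is stored as chunks with a start_time and end_time (see the Chunk Table), preloading can be thought of as fetching the chunks that overlap a short window ahead of the playback position. A minimal sketch, where the `Chunk` type, the `chunks_to_preload` helper, and the 20-second lookahead are assumptions:

```python
from typing import NamedTuple


class Chunk(NamedTuple):
    chunk_id: str
    start_time: int   # seconds from the start of the video
    end_time: int


def chunks_to_preload(chunks: list, position: int, lookahead: int) -> list:
    """Ids of chunks overlapping the window [position, position + lookahead)."""
    window_end = position + lookahead
    return [
        c.chunk_id
        for c in chunks
        if c.end_time > position and c.start_time < window_end
    ]


if __name__ == "__main__":
    chunks = [
        Chunk("c1", 0, 8),
        Chunk("c2", 8, 23),
        Chunk("c3", 23, 40),
        Chunk("c4", 40, 60),
    ]
    # Playing at second 10 with an assumed 20-second lookahead
    print(chunks_to_preload(chunks, position=10, lookahead=20))
```

The player keeps requesting this window as the position advances, so playback and loading overlap.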
How to load thumbnails
- cover thumbnail
- read from the file server when the front end loads the video list
- progress bar thumbnail
- When the user moves the mouse over the progress bar, the thumbnail is loaded into the local cache.
- When the user clicks on a video, load all the thumbnails of the video into the local cache.
The front end and back end cooperate: the front end sends the current progress bar position, and the back end looks up the matching thumbnail in the cache and returns it.
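The position-to-thumbnail lookup can be sketched as a binary search over the `moment` column of the Thumbnail Table. The `ThumbnailCache` class and the file paths are hypothetical, and the sketch assumes `moment` is measured in seconds.

```python
import bisect


class ThumbnailCache:
    """Hypothetical cache of one video's progress bar thumbnails."""

    def __init__(self, thumbnails: list):
        # thumbnails: list of (moment_in_seconds, path) pairs
        self._items = sorted(thumbnails)
        self._moments = [moment for moment, _ in self._items]

    def lookup(self, position: int):
        """Return the path of the latest thumbnail at or before position."""
        if not self._items:
            return None
        index = bisect.bisect_right(self._moments, position) - 1
        index = max(index, 0)   # before the first moment: use the first thumbnail
        return self._items[index][1]


if __name__ == "__main__":
    cache = ThumbnailCache([
        (0, "/aaaa/10/t0.jpg"),
        (10, "/aaaa/10/t10.jpg"),
        (20, "/aaaa/10/t20.jpg"),
        (40, "/aaaa/10/t40.jpg"),
    ])
    print(cache.lookup(25))
```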
Scale
How to optimize reads with a CDN
Push the most popular videos to the CDN, so users can watch them from the server closest to them.
How to find the CDN server
client –> web server (video content, client IP, find closest CDN) – redirect to CDN –> video –> client
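A minimal sketch of the "find closest CDN" step, assuming the web server has already resolved the client IP to an approximate latitude and longitude. The node list, the coordinates, the `ORIGIN` fallback for unpopular videos, and great-circle distance as the notion of "closest" are all assumptions; real CDNs usually rely on DNS or anycast routing instead.

```python
import math

ORIGIN = "origin"   # hypothetical name for the non-CDN storage servers


def haversine_km(a: tuple, b: tuple) -> float:
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))


def pick_server(client_location: tuple, video_id: str,
                cdn_nodes: dict, popular_videos: set) -> str:
    """Send popular videos to the nearest CDN node, everything else to origin."""
    if video_id not in popular_videos or not cdn_nodes:
        return ORIGIN
    return min(cdn_nodes, key=lambda name: haversine_km(client_location, cdn_nodes[name]))


if __name__ == "__main__":
    nodes = {"tokyo": (35.68, 139.69), "frankfurt": (50.11, 8.68)}
    osaka = (34.69, 135.50)
    print(pick_server(osaka, "V0001", nodes, popular_videos={"V0001"}))
```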
What if the user data grows exponentially?
Shard the database.
How to shard the database
- by user id when querying user data
- by video id when querying video data
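Sharding by id can be sketched as hashing the key and taking it modulo the number of shards. This is a simplified illustration with an assumed shard count of 8; plain modulo hashing forces most keys to move when the shard count changes, which is why production systems often prefer consistent hashing.

```python
import hashlib

NUM_SHARDS = 8   # assumed shard count


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user_id or video_id to a shard number in [0, num_shards)."""
    # md5 is used only as a stable, well-spread hash, not for security.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


if __name__ == "__main__":
    for key in ("U001", "U002", "V0001"):
        print(key, "-> shard", shard_for(key))
```

User data is routed with `shard_for(user_id)` and video data with `shard_for(video_id)`, so each lookup touches a single shard.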