System design

Fri, Oct 1, 2021 3-minute read

Process:

1. Clarify the requirements: understand what functionality the system must provide, and clarify use cases and constraints

discuss assumptions
- who (use it)
- how
- how many users
- what does the system do
- inputs and outputs
- how much data the system is expected to handle
- how many requests per second are expected
- the expected read-to-write ratio
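The assumptions above can be turned into rough capacity numbers. Below is a minimal back-of-the-envelope sketch; every input value (user count, writes per user, record size, retention) is a made-up assumption for illustration, not a figure from any real system.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60


@dataclass
class Assumptions:
    # All fields are hypothetical inputs agreed on during the discussion.
    daily_active_users: int
    writes_per_user_per_day: int
    read_write_ratio: float   # reads per write
    bytes_per_write: int
    retention_days: int


def estimate(a: Assumptions) -> dict:
    """Derive average request rates and total storage from the assumptions."""
    writes_per_day = a.daily_active_users * a.writes_per_user_per_day
    write_qps = writes_per_day / SECONDS_PER_DAY
    read_qps = write_qps * a.read_write_ratio
    storage_bytes = writes_per_day * a.bytes_per_write * a.retention_days
    return {
        "write_qps": write_qps,
        "read_qps": read_qps,
        "storage_bytes": storage_bytes,
    }


example = estimate(Assumptions(
    daily_active_users=1_000_000,
    writes_per_user_per_day=2,
    read_write_ratio=10,
    bytes_per_write=1_000,
    retention_days=365,
))
print(example)
```

With these inputs the sketch gives roughly 23 writes per second, 230 reads per second, and 730 GB of storage per year. These are averages; peak traffic is usually estimated separately.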

2. high level design

draw
- components and connection
- justify the ideas

3. design core components

4. scale the design: potential solutions and trade-offs

identify and address bottlenecks
- load balancer
- horizontal scaling
- caching
- database sharding
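As one illustration of the last item, database sharding needs a rule that maps each key to a shard. The sketch below uses hash-modulo routing, which is only one possible scheme; the function name `shard_for` and the key format are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in [0, num_shards) using a stable hash."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # sha256 is used (rather than the built-in hash()) so the mapping is
    # stable across processes and restarts.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Group a few hypothetical user keys by shard.
placement = {}
for user_key in ("user:1", "user:2", "user:3", "user:4"):
    placement.setdefault(shard_for(user_key, 4), []).append(user_key)
print(placement)
```

One trade-off worth raising in an interview: with plain modulo routing, changing `num_shards` remaps most keys, which is why consistent hashing is often discussed as an alternative.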

5. scale

basic concept

Latency: the time it takes to travel through the pipe

Bandwidth: the width of the pipe

Throughput: the amount of water that flows through the pipe per second
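To make the pipe analogy concrete, here is a small worked sketch. It assumes the simple model "transfer time = latency + size / bandwidth", which ignores protocol overhead and congestion; the function names are illustrative.

```python
def transfer_time_s(size_bytes: float, latency_s: float,
                    bandwidth_bytes_per_s: float) -> float:
    """Time to deliver a payload: fixed latency plus time on the wire."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def effective_throughput(size_bytes: float, elapsed_s: float) -> float:
    """Bytes actually delivered per second, including the latency cost."""
    return size_bytes / elapsed_s


# Assumed link: 500 us latency, 100 MB/s bandwidth.
LATENCY_S = 500e-6
BANDWIDTH = 100_000_000

big = transfer_time_s(1_000_000, LATENCY_S, BANDWIDTH)   # 1 MB payload
small = transfer_time_s(1_000, LATENCY_S, BANDWIDTH)     # 1 KB payload

print(effective_throughput(1_000_000, big))  # close to the bandwidth
print(effective_throughput(1_000, small))    # dominated by latency
```

For the 1 MB payload the effective throughput stays near the bandwidth (about 95 MB/s), while for the 1 KB payload latency dominates and throughput drops to about 2 MB/s.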

scalability

Vertical scaling

MySQL

Horizontal scaling

Horizontal scaling (aka scaling out) refers to adding nodes or machines to your infrastructure to cope with new demands.

Cassandra, MongoDB

Load balancing
Database replication
Database partitioning
Clones
Databases
Caches
Asynchronism
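As an example of the caching topic, the sketch below shows a cache-aside read path backed by a tiny LRU cache. The class and function names are hypothetical, and `load_from_db` stands in for whatever slow lookup the real system would perform.

```python
from collections import OrderedDict


class LRUCache:
    """A minimal least-recently-used cache with a fixed capacity."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


def read_through(cache: LRUCache, key, load_from_db):
    """Cache-aside read: try the cache first, fall back to the database."""
    value = cache.get(key)
    if value is None:  # note: a stored None is treated as a miss
        value = load_from_db(key)
        cache.put(key, value)
    return value


db_calls = []


def fake_db(key):
    # Stand-in for a slow database lookup; records each call.
    db_calls.append(key)
    return f"row-for-{key}"


cache = LRUCache(capacity=2)
first = read_through(cache, "a", fake_db)
second = read_through(cache, "a", fake_db)  # served from the cache
```

Only the first read reaches the stand-in database; the second is a cache hit.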

Performance vs scalability

Latency vs throughput

maximal throughput with acceptable latency
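One way to read this goal: among the load levels you have measured, pick the highest throughput whose latency still fits the budget. The sketch below does exactly that; the measurements are invented for illustration.

```python
def best_operating_point(measurements, latency_budget_ms):
    """Return the (throughput, latency) pair with the highest throughput
    whose latency is within the budget, or None if nothing qualifies."""
    acceptable = [m for m in measurements if m[1] <= latency_budget_ms]
    if not acceptable:
        return None
    return max(acceptable, key=lambda m: m[0])


# Hypothetical load-test results: (requests per second, p99 latency in ms).
measurements = [(100, 12.0), (500, 15.0), (1000, 40.0), (1500, 180.0)]

chosen = best_operating_point(measurements, latency_budget_ms=50.0)
print(chosen)
```

With a 50 ms budget the sketch picks the 1000 requests-per-second point; pushing to 1500 would raise throughput but break the latency target.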

Availability vs consistency

CP - Consistency/Partition Tolerance

Waiting for a response from the partitioned node could result in a timeout error. Choose consistency over availability when your business requirements dictate atomic reads and writes.

AP - Availability/Partition Tolerance

AP is a good choice if the business needs allow for eventual consistency, or when the system needs to continue working despite external errors (shopping carts, etc.).
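The difference between the two choices can be shown with a toy read path. This is a deliberately simplified model, not a real replication protocol: the `Replica` class, the version counter, and the majority rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    value: str
    version: int
    reachable: bool = True


def read(replicas, mode: str) -> str:
    """Read the newest value visible to the caller under the given mode."""
    reachable = [r for r in replicas if r.reachable]
    if mode == "CP":
        # Refuse to answer unless a majority of replicas can be reached.
        if len(reachable) * 2 <= len(replicas):
            raise TimeoutError("majority of replicas unreachable")
    elif mode == "AP":
        # Answer from whatever is reachable, even if it may be stale.
        if not reachable:
            raise TimeoutError("no replica reachable")
    else:
        raise ValueError(f"unknown mode: {mode}")
    return max(reachable, key=lambda r: r.version).value


# A partition hides the two replicas that hold the newest write.
replicas = [
    Replica("new", version=2, reachable=False),
    Replica("new", version=2, reachable=False),
    Replica("old", version=1),
]

ap_result = read(replicas, "AP")  # stale but available
try:
    cp_result = read(replicas, "CP")
except TimeoutError:
    cp_result = None  # consistent but unavailable during the partition
```

During the partition the AP read returns the stale value, while the CP read times out rather than risk returning old data.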

RPC VS REST

| Operation | RPC | REST |
| --- | --- | --- |
| Sign up | POST /signup | POST /persons |
| Resign | POST /resign {"personid": "1234"} | DELETE /persons/1234 |
| Read a person's info | GET /readPerson?personid=1234 | GET /persons/1234 |
| Read a person's item list | GET /readUsersItemsList?personid=1234 | GET /persons/1234/items |
| Add an item to a person's item list | POST /addItemToUserItemsList {"personid": "1234", "itemid": "456"} | POST /persons/1234/items {"itemid": "456"} |
| Update an item | POST /modifyItem {"itemid": "456", "key": "value"} | PUT /items/456 {"key": "value"} |
| Delete an item | POST /removeItem {"itemid": "456"} | DELETE /items/456 |
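The pattern in the table can be sketched in a few lines: the RPC style puts the action in the path and the identifiers in the parameters, while the REST style puts the resource in the path and the action in the HTTP method. The helper names below are hypothetical, and the tuples are plain data rather than real HTTP calls.

```python
def rpc_request(action: str, method: str = "POST", **params):
    """RPC style: the path names the action; identifiers travel as params."""
    return (method, f"/{action}", params)


def rest_request(method: str, *path_segments, body=None):
    """REST style: the path names the resource; the method names the action."""
    path = "/" + "/".join(str(segment) for segment in path_segments)
    return (method, path, body or {})


rpc_delete = rpc_request("removeItem", itemid="456")
rest_delete = rest_request("DELETE", "items", 456)

rpc_add = rpc_request("addItemToUserItemsList", personid="1234", itemid="456")
rest_add = rest_request("POST", "persons", 1234, "items", body={"itemid": "456"})
```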

Numbers you should know

| Power | Exact value | Approx. value | Bytes |
| --- | --- | --- | --- |
| 7 | 128 | | |
| 8 | 256 | | |
| 10 | 1024 | 1 thousand | 1 KB |
| 16 | 65,536 | | 64 KB |
| 20 | 1,048,576 | 1 million | 1 MB |
| 30 | 1,073,741,824 | 1 billion | 1 GB |
| 32 | 4,294,967,296 | 4 billion | 4 GB |
| 40 | 1,099,511,627,776 | 1 trillion | 1 TB |
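The exact values in the powers-of-two table are easy to regenerate rather than memorize. A minimal sketch:

```python
# The exponents that appear in the table.
POWERS = (7, 8, 10, 16, 20, 30, 32, 40)

# Map each exponent to its exact value.
exact_values = {power: 2 ** power for power in POWERS}

for power, value in exact_values.items():
    print(f"2^{power} = {value:,}")
```

The approximations follow from 2^10 being close to 10^3, so each additional 10 in the exponent adds roughly three decimal digits.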

Latency numbers

2020 data

Latency Comparison Numbers

| Operation | ns | us | ms | Notes |
| --- | --- | --- | --- | --- |
| L1 cache reference | 0.5 | | | |
| Branch mispredict | 5 | | | |
| L2 cache reference | 7 | | | 14x L1 cache |
| Mutex lock/unlock | 25 | | | |
| Main memory reference | 100 | | | 20x L2 cache, 200x L1 cache |
| Compress 1 KB with Zippy | 10,000 | 10 | | |
| Send 1 KB over 1 Gbps network | 10,000 | 10 | | |
| Read 4 KB randomly from SSD* | 150,000 | 150 | | ~1 GB/sec SSD |
| Read 1 MB sequentially from memory | 250,000 | 250 | | |
| Round trip within same datacenter | 500,000 | 500 | | |
| Read 1 MB sequentially from SSD* | 1,000,000 | 1,000 | 1 | ~1 GB/sec SSD, 4x memory |
| HDD seek | 10,000,000 | 10,000 | 10 | 20x datacenter round trip |
| Read 1 MB sequentially from 1 Gbps | 10,000,000 | 10,000 | 10 | 40x memory, 10x SSD |
| Read 1 MB sequentially from HDD | 30,000,000 | 30,000 | 30 | 120x memory, 30x SSD |
| Send packet CA->Netherlands->CA | 150,000,000 | 150,000 | 150 | |

Notes

1 ns = 10^-9 seconds

1 us = 10^-6 seconds = 1,000 ns

1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns

Read sequentially from HDD at 30 MB/s

Read sequentially from 1 Gbps Ethernet at 100 MB/s

Read sequentially from SSD at 1 GB/s

Read sequentially from main memory at 4 GB/s

6-7 world-wide round trips per second

2,000 round trips per second within a data center
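The rates in these notes follow directly from the latency figures listed earlier. The sketch below redoes the arithmetic; the helper names are illustrative, and "1 MB" is treated as a single unit so the result reads as MB per second.

```python
NS_PER_SECOND = 1_000_000_000


def per_second(latency_ns: float) -> float:
    """How many back-to-back operations of this latency fit in one second."""
    return NS_PER_SECOND / latency_ns


# Round trips per second, from the round-trip latencies.
world_round_trips = per_second(150_000_000)   # CA -> Netherlands -> CA
datacenter_round_trips = per_second(500_000)  # within one datacenter

# Sequential read rates in MB/s, from the "read 1 MB sequentially" latencies.
hdd_mb_per_s = per_second(30_000_000)
network_mb_per_s = per_second(10_000_000)
ssd_mb_per_s = per_second(1_000_000)
memory_mb_per_s = per_second(250_000)

print(world_round_trips, datacenter_round_trips)
print(hdd_mb_per_s, network_mb_per_s, ssd_mb_per_s, memory_mb_per_s)
```

This gives about 6.7 world-wide round trips and 2,000 datacenter round trips per second, and roughly 33 MB/s, 100 MB/s, 1 GB/s, and 4 GB/s for HDD, 1 Gbps Ethernet, SSD, and memory respectively, which the notes round to the figures shown.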

The figures above are from 2009; in practice, sequential read speeds for memory, SSDs, and hard disks have improved dramatically since then.

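Scaled so that 1 ns becomes 1 second, the latencies look like this (the SSD and HDD values imply roughly 16 us and 2 ms, newer than the figures listed above):

Memory: 100 seconds

SSD: 4.4 hours

Round trip within the same datacenter: 5.8 days

HDD seek: 23.1 days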

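Reading data from a remote server's memory is faster than reading it directly from a local hard disk.

For reading 1 MB of data, memory, SSD, and disk are each roughly an order of magnitude apart:

Memory: 50 minutes

SSD: 13.6 hours

Disk: 9.5 days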
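The human-scale durations above are consistent with stretching 1 ns to 1 second. The sketch below checks that arithmetic. The input latencies for SSD random read (16 us), HDD seek (2 ms), and the three 1 MB reads (3 us, 49 us, 825 us) are assumptions inferred by working backwards from the durations, not values stated in the text.

```python
def scaled(latency_ns: float) -> dict:
    """Stretch a latency so that 1 ns becomes 1 second, in several units."""
    seconds = float(latency_ns)
    return {
        "seconds": seconds,
        "minutes": seconds / 60,
        "hours": seconds / 3600,
        "days": seconds / 86400,
    }


memory_reference = scaled(100)["seconds"]          # 100 seconds
ssd_random_read = scaled(16_000)["hours"]          # assumed 16 us
datacenter_round_trip = scaled(500_000)["days"]    # 500 us
hdd_seek = scaled(2_000_000)["days"]               # assumed 2 ms

read_1mb_memory = scaled(3_000)["minutes"]         # assumed 3 us
read_1mb_ssd = scaled(49_000)["hours"]             # assumed 49 us
read_1mb_disk = scaled(825_000)["days"]            # assumed 825 us

print(round(ssd_random_read, 1), round(datacenter_round_trip, 1), round(hdd_seek, 1))
print(round(read_1mb_memory), round(read_1mb_ssd, 1), round(read_1mb_disk, 1))
```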

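This matters especially when designing a storage engine: many open-source projects (Kafka, LevelDB, RocksDB) take full advantage of the fact that sequential reads and writes on storage media are far faster than random ones, and use append-only writes to get the best performance.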
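The append-only idea can be sketched as a tiny key-value store that only ever writes to the end of one file and keeps an in-memory index of where each key's latest value lives. This is a toy illustration under simplifying assumptions (no compaction, no crash recovery, no concurrency); it is not how Kafka, LevelDB, or RocksDB are actually implemented, and the class name is hypothetical.

```python
import os
import tempfile


class AppendOnlyStore:
    """Toy key-value store: writes are appended, never overwritten in place."""

    def __init__(self, path: str):
        self._path = path
        self._index = {}  # key -> (offset, length) of the latest value
        open(path, "ab").close()  # create the log file if it is missing

    def put(self, key: str, value: bytes) -> None:
        with open(self._path, "ab") as log:
            log.seek(0, os.SEEK_END)  # make the append position explicit
            offset = log.tell()
            log.write(value)          # sequential write at the end of the file
        self._index[key] = (offset, len(value))

    def get(self, key: str) -> bytes:
        offset, length = self._index[key]
        with open(self._path, "rb") as log:
            log.seek(offset)
            return log.read(length)


with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "data.log")
    store = AppendOnlyStore(log_path)
    store.put("a", b"first")
    store.put("b", b"other")
    store.put("a", b"second")  # an update is just another append
    latest_a = store.get("a")
    value_b = store.get("b")
    log_size = os.path.getsize(log_path)
```

Updating a key leaves the old bytes in the file and simply points the index at the new ones, so the log grows until something reclaims the stale entries.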

I'm Riley Shen