System design

Fri, Oct 1, 2021 3-minute read

流程:

1. 清楚 requirement, 需要看到这个系统提供什么功能, clarify use cases and constraints

  • discuss assumptions
    • who (use it)
    • how
    • how many users
    • what does the system do
    • input and out
    • how much data expect to handle
    • how many requests per second expected
    • the expected read to wirte radio

2. high level design

  • draw
    • components and connection
    • jusity the idea

3. design core componets

4. scale the design,potential solutions and trade-offs

  • identify and address bottlenecks
    • load balancer
    • horizontal scaling
    • caching
    • database sharding

5. scale

basic concept

延迟(Latency) : 通过管道需要花费的时间

带宽(Bandwidth) : 管道的宽度

吞吐(Throughput) : 每秒钟流过的水的数量就是吞吐

scalability

  • Vertical scaling

MySQL

  • Horizontal scaling Horizontal scaling (aka scaling out) refers to adding additional nodes or machines to your infrastructure to cope with new demands.

Cassandra, MongoDB

  • Load balancing
  • Database replication
  • Database partitioning
  • Clones
  • Databases
  • Caches
  • Asynchronism

Performance vs scalability

Latency vs throughput

maximal throughput with acceptable latency

Availability vs consistency

CP - Consistency/Partition Tolerance

could result in a timeout error

等待分区节点的响应可能会导致延时错误。如果你的业务需求需要原子读写,CP 是一个不错的选择。

Choose Consistency over Availability when your business requirements dictate atomic reads and writes.

AP - Availability/Partition Tolerance

the system needs to continue to function in spite of external errors (shopping carts, etc.)

AP is a good choice if the business needs allow for eventual consistency or when the system needs to continue working despite external errors

如果业务需求允许最终一致性,或当有外部故障时要求系统继续运行,AP 是一个不错的选择

RPC VS REST

| 操作 | RPC | REST | | :—- | :—- |:—- | | 注册 | POST/signup | POST/persons | | 注销 | POST/resign {“personid”: “1234”} |DELETE/persons/1234 | | 读取用户信息 | GET/readPerson?personid=1234 |GET/persons/1234 | | 读取用户物品列表 | GET/readUsersItemsList?personid=1234 |GET/persons/1234/items | | 向用户物品列表添加一项 |Post/addItemToUserItemsList{“personid”: “1234”; “itemid”:“456”} |POST/persons/1234/items{“itemid”:“456”} | | 更新一个物品 | Post/modifyItem{“itemid”:“456”;“key”: “value” } |PUT/items/456{“key”: “value”} | |删除一个物品 | Post/removeItem{“itemid”:“456”} |DELETE/items/456|

该知道的时间数据

| power | Exact Value | Approx Value | Bytes | :—- | :—- |:—- |:—- | 7 | 128 | | | 8 | 256 | | | 10 | 1024 | 1 thousand | 1 kB | 16 | 65,536 | | 64 kB | 20 | 1,048,576 | 1 million | 1 MB | 30 | | 1 billion | 1 GB | 32 | | billion | 4 GB | 40 | | 1 trillion | 1 TB

延迟数字

2020 年数据

Latency Comparison Numbers

L1 cache reference 0.5 ns

Branch mispredict 5 ns

L2 cache reference 7 ns 14x L1 cache

Mutex lock/unlock 25 ns

Main memory reference 100 ns 20x L2 cache, 200x L1

cache

Compress 1K bytes with Zippy 10,000 ns 10 us

Send 1 KB bytes over 1 Gbps network 10,000 ns 10 us

Read 4 KB randomly from SSD* 150,000 ns 150 us ~1GB/sec SSD

Read 1 MB sequentially from memory 250,000 ns 250 us

Round trip within same datacenter 500,000 ns 500 us

Read 1 MB sequentially from SSD* 1,000,000 ns 1,000 us 1 ms ~1GB/sec SSD, 4X memory

HDD seek 10,000,000 ns 10,000 us 10 ms 20x datacenter roundtrip

Read 1 MB sequentially from 1 Gbps 10,000,000 ns 10,000 us 10 ms 40x memory, 10X SSD

Read 1 MB sequentially from HDD 30,000,000 ns 30,000 us 30 ms 120x memory, 30X SSD

Send packet CA->Netherlands->CA 150,000,000 ns 150,000 us 150 ms

Notes

1 ns = 10^-9 seconds 1 us = 10^-6 seconds = 1,000 ns 1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns

Read sequentially from HDD at 30 MB/s

Read sequentially from 1 Gbps Ethernet at 100 MB/s

Read sequentially from SSD at 1 GB/s

Read sequentially from main memory at 4 GB/s

6-7 world-wide round trips per second

2,000 round trips per second within a data center

以上是 2009 的数据,实际上内存、SSD 和机械硬盘顺序读取速度有了非常大的提升。

内存:100 秒

SSD:4.4 小时

同一数据中心往返:5.8 天

机械硬盘寻址:23.1 天

从远程服务器的内存中读数据要比直接从硬盘上读取要快的

对于读取 1MB 数据,内存、SSD 和磁盘基本差了一个数量级:

内存: 50 分钟

SSD: 13.6 小时

磁盘: 9.5 天

尤其在设计存储引擎时,很多开源软件(Kafka、Leveldb、Rocksdb)都充分利用了存储介质顺序读、写速度远远快过随机读、写的特性,只做追加写操作来达到最佳性能。