System design
Process:
1. Clarify the requirements: determine what functionality the system must provide, and clarify use cases and constraints
- discuss assumptions
- who (use it)
- how
- how many users
- what does the system do
- inputs and outputs
- how much data the system is expected to handle
- how many requests per second are expected
- the expected read-to-write ratio
2. high level design
- draw
- components and connection
- justify the ideas
3. design core components
4. scale the design, potential solutions and trade-offs
- identify and address bottlenecks
- load balancer
- horizontal scaling
- caching
- database sharding
5. scale
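The questions in step 1 (how many users, requests per second, read-to-write ratio, data volume) can be turned into a quick back-of-the-envelope estimate. The sketch below is only an illustration: the function name `estimate_load` and all of its parameters are hypothetical, and it assumes traffic is spread evenly over the day, which real systems rarely are.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_load(daily_active_users, requests_per_user_per_day,
                  read_write_ratio, bytes_per_write):
    """Rough average load, assuming traffic is uniform over the day.

    read_write_ratio is reads per write, e.g. 9 means 9 reads for 1 write.
    """
    total_per_day = daily_active_users * requests_per_user_per_day
    avg_rps = total_per_day / SECONDS_PER_DAY
    write_fraction = 1 / (read_write_ratio + 1)
    writes_per_day = total_per_day * write_fraction
    return {
        "avg_requests_per_second": avg_rps,
        "avg_reads_per_second": avg_rps * (1 - write_fraction),
        "avg_writes_per_second": avg_rps * write_fraction,
        "new_bytes_per_day": writes_per_day * bytes_per_write,
    }


if __name__ == "__main__":
    # Made-up inputs: 864,000 users, 10 requests each, 9:1 reads to writes.
    for name, value in estimate_load(864_000, 10, 9, 1_000).items():
        print(f"{name}: {value:,.0f}")
```

Peak load is usually a multiple of the average, so a number like this is a lower bound for capacity planning rather than a target.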
Basic concepts
Latency: the time it takes to pass through the pipe
Bandwidth: the width of the pipe
Throughput: the amount of water that flows through per second
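To make the pipe analogy concrete, here is a minimal sketch that models one transfer as a fixed latency plus the payload size divided by the bandwidth. This is a simplified model (no congestion, no protocol overhead), and the function names and example numbers are assumptions for illustration.

```python
def transfer_time_seconds(size_bytes, latency_seconds, bandwidth_bytes_per_second):
    """Simplified model: time = latency + size / bandwidth."""
    return latency_seconds + size_bytes / bandwidth_bytes_per_second


def effective_throughput(size_bytes, elapsed_seconds):
    """Bytes actually delivered per second for one transfer."""
    return size_bytes / elapsed_seconds


if __name__ == "__main__":
    size = 1_000_000          # 1 MB payload
    latency = 0.0005          # 0.5 ms one-off delay
    bandwidth = 100_000_000   # 100 MB/s link
    elapsed = transfer_time_seconds(size, latency, bandwidth)
    print(f"elapsed: {elapsed * 1000:.2f} ms")
    print(f"throughput: {effective_throughput(size, elapsed) / 1e6:.1f} MB/s")
```

For small payloads the latency term dominates, so the achieved throughput stays well below the bandwidth; for large payloads it approaches the bandwidth.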
scalability
- Vertical scaling
MySQL
- Horizontal scaling (aka scaling out): adding additional nodes or machines to your infrastructure to cope with new demands.
Cassandra, MongoDB
- Load balancing
- Database replication
- Database partitioning
- Clones
- Databases
- Caches
- Asynchronism
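Two of the items above, load balancing and database partitioning, can be sketched in a few lines. This is a toy illustration under simple assumptions (a fixed server list, a fixed shard count); the names `RoundRobinBalancer` and `shard_for` are hypothetical and not taken from any particular library.

```python
import hashlib
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def next_server(self):
        return next(self._cycle)


def shard_for(key, shard_count):
    """Map a key to a shard with a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
    print([balancer.next_server() for _ in range(4)])
    print(shard_for("user:1234", 4))
```

Plain modulo sharding like this remaps most keys when `shard_count` changes; schemes such as consistent hashing are commonly used to reduce that cost.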
Performance vs scalability
Latency vs throughput
Aim for maximal throughput with acceptable latency.
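One way to see this trade-off is batching: larger batches amortize fixed overhead and raise throughput, but every item waits for the whole batch. The sketch below assumes a deliberately simple cost model (a fixed per-batch overhead plus a constant per-item cost); the function name and parameters are hypothetical.

```python
def best_batch_size(per_batch_overhead_ms, per_item_ms, max_latency_ms,
                    max_batch=1024):
    """Return (batch_size, items_per_second, latency_ms) for the batch size
    with the highest throughput whose latency stays within the limit,
    or None if even a batch of one item is too slow."""
    best = None
    for batch in range(1, max_batch + 1):
        latency_ms = per_batch_overhead_ms + batch * per_item_ms
        if latency_ms > max_latency_ms:
            break  # latency only grows with batch size
        throughput = batch / latency_ms * 1000
        if best is None or throughput > best[1]:
            best = (batch, throughput, latency_ms)
    return best


if __name__ == "__main__":
    # Assumed costs: 10 ms per batch, 1 ms per item, 50 ms latency budget.
    print(best_batch_size(10, 1, 50))
```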
Availability vs consistency
CP - Consistency/Partition Tolerance
Waiting for a response from the partitioned node could result in a timeout error.
Choose consistency over availability (CP) when your business requirements dictate atomic reads and writes.
AP - Availability/Partition Tolerance
AP is a good choice if the business needs allow for eventual consistency, or when the system needs to continue working despite external errors (shopping carts, etc.).
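The difference between the two choices shows up when a node cannot reach the node holding the latest data. The toy sketch below is not a real replication protocol: the `Replica` class, the `read` function, and the `primary_reachable` flag are hypothetical stand-ins for a network partition.

```python
class Replica:
    def __init__(self, value):
        self.value = value


def read(local, primary, primary_reachable, mode):
    """Serve a read during normal operation or during a partition.

    mode "CP": fail rather than risk returning stale data.
    mode "AP": answer from the local copy, which may be stale.
    """
    if primary_reachable:
        return primary.value
    if mode == "CP":
        raise TimeoutError("primary unreachable; refusing possibly stale read")
    return local.value


if __name__ == "__main__":
    primary, local = Replica("cart v2"), Replica("cart v1")
    print(read(local, primary, False, "AP"))   # stale but available
    try:
        read(local, primary, False, "CP")
    except TimeoutError as exc:
        print("CP read failed:", exc)
```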
RPC vs REST
| Operation | RPC | REST |
| :--- | :--- | :--- |
| Sign up | POST /signup | POST /persons |
| Resign | POST /resign {"personid": "1234"} | DELETE /persons/1234 |
| Read a person | GET /readPerson?personid=1234 | GET /persons/1234 |
| Read a person's items list | GET /readUsersItemsList?personid=1234 | GET /persons/1234/items |
| Add an item to a person's items list | POST /addItemToUserItemsList {"personid": "1234", "itemid": "456"} | POST /persons/1234/items {"itemid": "456"} |
| Update an item | POST /modifyItem {"itemid": "456", "key": "value"} | PUT /items/456 {"key": "value"} |
| Delete an item | POST /removeItem {"itemid": "456"} | DELETE /items/456 |
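As a small illustration of the "add an item" row, the sketch below builds the same request in both styles without sending anything over the network. The helper names are hypothetical; the paths and field names are the ones from the table.

```python
import json


def rpc_add_item(person_id, item_id):
    """RPC style: the verb lives in the path, all arguments in the body."""
    body = json.dumps({"personid": person_id, "itemid": item_id})
    return ("POST", "/addItemToUserItemsList", body)


def rest_add_item(person_id, item_id):
    """REST style: the path names the resource, the HTTP method is the verb."""
    body = json.dumps({"itemid": item_id})
    return ("POST", f"/persons/{person_id}/items", body)


if __name__ == "__main__":
    print(rpc_add_item("1234", "456"))
    print(rest_add_item("1234", "456"))
```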
Numbers you should know
| Power | Exact Value | Approx Value | Bytes |
| :--- | :--- | :--- | :--- |
| 7 | 128 | | |
| 8 | 256 | | |
| 10 | 1024 | 1 thousand | 1 kB |
| 16 | 65,536 | | 64 kB |
| 20 | 1,048,576 | 1 million | 1 MB |
| 30 | 1,073,741,824 | 1 billion | 1 GB |
| 32 | 4,294,967,296 | 4 billion | 4 GB |
| 40 | 1,099,511,627,776 | 1 trillion | 1 TB |
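The rows of this table can be recomputed rather than memorized. A minimal sketch, with a hypothetical helper name and the table's convention that every 10 powers of two step up one unit:

```python
UNITS = ["B", "kB", "MB", "GB", "TB"]


def power_of_two_row(power):
    """Return (exact value, size label) for 2**power, for 0 <= power < 50."""
    exact = 2 ** power
    label = f"{2 ** (power % 10)} {UNITS[power // 10]}"
    return exact, label


if __name__ == "__main__":
    for power in (7, 8, 10, 16, 20, 30, 32, 40):
        exact, label = power_of_two_row(power)
        print(f"2^{power} = {exact:,} ({label})")
```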
Latency numbers
2020 data
Latency Comparison Numbers
| Operation | ns | us | ms | Notes |
| :--- | :--- | :--- | :--- | :--- |
| L1 cache reference | 0.5 | | | |
| Branch mispredict | 5 | | | |
| L2 cache reference | 7 | | | 14x L1 cache |
| Mutex lock/unlock | 25 | | | |
| Main memory reference | 100 | | | 20x L2 cache, 200x L1 cache |
| Compress 1K bytes with Zippy | 10,000 | 10 | | |
| Send 1 KB over 1 Gbps network | 10,000 | 10 | | |
| Read 4 KB randomly from SSD* | 150,000 | 150 | | ~1GB/sec SSD |
| Read 1 MB sequentially from memory | 250,000 | 250 | | |
| Round trip within same datacenter | 500,000 | 500 | | |
| Read 1 MB sequentially from SSD* | 1,000,000 | 1,000 | 1 | ~1GB/sec SSD, 4X memory |
| HDD seek | 10,000,000 | 10,000 | 10 | 20x datacenter roundtrip |
| Read 1 MB sequentially from 1 Gbps | 10,000,000 | 10,000 | 10 | 40x memory, 10X SSD |
| Read 1 MB sequentially from HDD | 30,000,000 | 30,000 | 30 | 120x memory, 30X SSD |
| Send packet CA->Netherlands->CA | 150,000,000 | 150,000 | 150 | |
Notes
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns
Read sequentially from HDD at 30 MB/s
Read sequentially from 1 Gbps Ethernet at 100 MB/s
Read sequentially from SSD at 1 GB/s
Read sequentially from main memory at 4 GB/s
6-7 world-wide round trips per second
2,000 round trips per second within a data center
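The rates in these notes tie back to the latency figures by simple division. The sketch below redoes that arithmetic, treating 1 MB as 10^6 bytes; the helper names are hypothetical.

```python
MB = 1_000_000
GB = 1_000 * MB


def read_time_ms(size_bytes, bytes_per_second):
    """Time to read size_bytes sequentially at a steady rate."""
    return size_bytes / bytes_per_second * 1000


def round_trips_per_second(round_trip_ms):
    """Back-to-back round trips that fit in one second."""
    return 1000 / round_trip_ms


if __name__ == "__main__":
    print(f"1 MB from HDD:    {read_time_ms(MB, 30 * MB):.1f} ms")
    print(f"1 MB from 1 Gbps: {read_time_ms(MB, 100 * MB):.1f} ms")
    print(f"1 MB from SSD:    {read_time_ms(MB, 1 * GB):.2f} ms")
    print(f"1 MB from memory: {read_time_ms(MB, 4 * GB):.2f} ms")
    print(f"world-wide:  {round_trips_per_second(150):.1f} round trips/s")
    print(f"data center: {round_trips_per_second(0.5):.0f} round trips/s")
```

At 30 MB/s a 1 MB read works out to about 33 ms, which the latency list rounds to 30 ms.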
The numbers above date from 2009; since then, sequential read speeds for memory, SSDs, and HDDs have improved greatly.
Scaled to human time (1 ns becomes 1 second):
Memory: 100 seconds
SSD: 4.4 hours
Round trip within same datacenter: 5.8 days
HDD seek: 23.1 days
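Reading data from a remote server's memory is faster than reading it directly from a local disk.
For reading 1 MB of data, memory, SSD, and disk differ by roughly an order of magnitude each:
Memory: 50 minutes
SSD: 13.6 hours
Disk: 9.5 days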
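This matters especially when designing storage engines: many open-source projects (Kafka, LevelDB, RocksDB) take full advantage of the fact that sequential reads and writes on storage media are far faster than random ones, and perform only append writes to achieve the best performance.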
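The append-only idea can be sketched with a tiny length-prefixed log file. This is only an illustration of the access pattern, not how Kafka, LevelDB, or RocksDB are implemented; the class name and record format are assumptions.

```python
import os
import tempfile


class AppendOnlyLog:
    """Records are only ever added at the end of the file, so writes stay
    sequential. Each record is a 4-byte big-endian length plus the payload."""

    def __init__(self, path):
        self._path = path

    def append(self, record):
        """Append one record and return the byte offset where it starts."""
        with open(self._path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(len(record).to_bytes(4, "big") + record)
        return offset

    def read_all(self):
        """Scan the file front to back, which is a sequential read."""
        records = []
        with open(self._path, "rb") as f:
            while True:
                header = f.read(4)
                if len(header) < 4:
                    break
                records.append(f.read(int.from_bytes(header, "big")))
        return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        log = AppendOnlyLog(os.path.join(directory, "events.log"))
        print(log.append(b"first"), log.append(b"second"))
        print(log.read_all())
```

Updates and deletes in such a design are typically written as new records, with old data reclaimed later by a background compaction step.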