Learning System Design as a Landscape Architect, Part 8
Rethinking system design in a more fun way, as a former urban planner/landscape planner. This time the example is Bigtable.
Bigtable
NoSQL, and how to scale
SQL and complex queries are needed in real life, and both are ultimately built on the file system.
A DB system stores data as files on the device's local filesystem and provides an interface to operate on them.
To better understand why we need a DB system: you open a closet, find the drawer, and grab the socks you want to wear today.
In this analogy, the DB system is the drawer and the file is the sock.
Scenario analysis: search for a key, return its value.
Storage
How about saving all the data in memory? Bad idea: in a distributed file system, file sizes are often measured in PB.
How about saving on disk (with sorting and binary search)?
But how do we edit a file saved on disk?
Method 1: edit the file in place. (If the original record is 4 bytes and the edit grows it to 8 bytes, everything after it has to move.)
Method 2: read the file, edit it, delete the original, and rewrite the whole file.
Method 3: don't edit at all; append a new record at the end of the file (fast).
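A minimal sketch of method 3 in Python; the tab-separated record format and file path are illustrative assumptions, not Bigtable's actual on-disk format:

```python
# Sketch of method 3: append-only writes. Appending never moves existing
# bytes, so every write is one fast sequential I/O at the end of the file.

def append_record(log_path, key, value):
    with open(log_path, "a") as f:
        f.write(f"{key}\t{value}\n")

def read_latest(log_path, key):
    # In an append-only file, the newest record for a key is the last match.
    latest = None
    with open(log_path) as f:
        for line in f:
            k, v = line.rstrip("\n").split("\t")
            if k == key:
                latest = int(v)
    return latest
```

Note that "editing" a key just appends a second record, which is exactly why the read side has to figure out which record is the newest.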
Bigtable takes method 3: it uses APPEND to improve write performance.
Cons:
- When reading data, how do we recognize the newest record?
  - timestamp: the latest record for a key wins
- How do we binary-search unsorted data?
  - keep each block in sorted order
- Duplicate keys across blocks hurt query performance
  - merge K sorted blocks
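The merge-K-sorted-blocks step can be sketched like this; representing a block as a sorted list of (key, value) pairs, with a larger block index meaning newer data, is an assumption for illustration:

```python
import heapq

def merge_blocks(blocks):
    # blocks[i] is sorted by key; a larger block index means newer data.
    # Tag each record with its block index so duplicate keys resolve
    # to the newest value during the merge.
    tagged = [[(key, idx, value) for key, value in blk]
              for idx, blk in enumerate(blocks)]
    merged = []
    # heapq.merge lazily merges already-sorted inputs in O(N log K).
    for key, idx, value in heapq.merge(*tagged):
        if merged and merged[-1][0] == key:
            merged[-1] = (key, value)   # same key, larger idx: keep newest
        else:
            merged.append((key, value))
    return merged
```

This is essentially what compaction does: it removes the duplicate records that the append-only write path leaves behind.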
Write process
Client 1 – Write key: Aa + appearance value: 3 –> Memory (file 0, file 1, ...) – Append key: Aa + appearance value: 3 –> file 1 (unordered)
When file 1 reaches 256 (or 512) bytes – merge/sort –> ordered block –> open a new unordered block
Making the latest file go from unordered to ordered:
- Method 1: sort in memory (write once to disk, read once from disk, sort in memory, write once to disk)
- Method 2: keep the data sorted in memory from the beginning (sort in memory, write once to disk)
Client 1 – Write key: Aa + appearance value: 3 –> Memory – Append key: Aa + appearance value: 3 –> Sorted List (when it reaches full size) – Serialize to disk –> file 1 (ordered)
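A toy sketch of method 2: a plain dict that is sorted at flush time stands in for the skip list a real system uses, and the flush threshold and file format are illustrative assumptions:

```python
# Sketch of method 2: buffer recent writes in memory, then serialize the
# whole block to disk, already sorted, in one sequential pass.

class MemTable:
    def __init__(self, flush_threshold=4):
        self.data = {}                      # in-memory buffer
        self.flush_threshold = flush_threshold

    def put(self, key, value, path):
        self.data[key] = value
        if len(self.data) >= self.flush_threshold:
            self.flush(path)

    def flush(self, path):
        # One sequential write of an already-sorted block (an "SSTable").
        with open(path, "w") as f:
            for key in sorted(self.data):
                f.write(f"{key}\t{self.data[key]}\n")
        self.data = {}
```

Compared with method 1, the block never hits disk in unordered form, so one disk write and one disk read are saved per block.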
How to recover the in-memory data after a crash, whatever the file format:
Write-Ahead Log
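The write-ahead log idea in a minimal sketch (the log format is an assumption): persist every write to a durable log before applying it in memory, so the memtable can be replayed after a crash.

```python
# Sketch of a write-ahead log: the write is appended to disk before it
# touches the in-memory table, so nothing in memory is ever the only copy.

def wal_put(log_path, memtable, key, value):
    with open(log_path, "a") as f:
        f.write(f"{key}\t{value}\n")    # 1. persist the intent first
        f.flush()
    memtable[key] = value               # 2. then apply it in memory

def wal_recover(log_path):
    # Replay the log in order; later entries overwrite earlier ones.
    memtable = {}
    with open(log_path) as f:
        for line in f:
            key, value = line.rstrip("\n").split("\t")
            memtable[key] = int(value)
    return memtable
```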
Read process
Client 1 – 1. Read key: Aa + appearance –> Memory – 2. Read key: Aa + appearance –> Sorted List (Aa + appearance value: 3)
If there is no result – 3. Read key: Aa + appearance –> file 1 (ordered)
How to query an ordered file:
- Index
- BloomFilter
One easy way to build an index
Memory index (one entry per block):
- A: address
- D: address
- S: address
GFS SSTable 0:
- Apple: 10
- Cxxx: 3
- Daaaa: 8
- Sxxx: 9
- Xuuu: 19
To look up key Xuuu, we only need to binary-search the block that starts at index entry S.
In real life, a B-tree index is a good choice.
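The sparse-index lookup above can be sketched with the standard-library `bisect` module; representing the index as a sorted list of (first_key, block_id) pairs is an assumption for illustration:

```python
import bisect

# Sparse index over a sorted SSTable: one (first_key, block_id) entry per
# block. bisect finds the only block whose key range can contain the key,
# so only that block needs a binary search, not the whole file.

def find_block(index, key):
    keys = [k for k, _ in index]
    pos = bisect.bisect_right(keys, key) - 1
    return index[pos][1] if pos >= 0 else None
```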
One better way to decide whether to read a file at all
- Bloom filter
- using 2 hash functions
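A minimal Bloom filter with 2 hash functions; using md5/sha1 as the hashes and a 1024-bit array are illustrative choices. "Might contain" can be a false positive, but "definitely absent" is exact, which is what lets the read path skip an SSTable entirely.

```python
import hashlib

class BloomFilter:
    def __init__(self, m=1024):
        self.m = m
        self.bits = [False] * m

    def _hashes(self, key):
        # Two independent hash positions per key (illustrative choice).
        h1 = int(hashlib.md5(key.encode()).hexdigest(), 16) % self.m
        h2 = int(hashlib.sha1(key.encode()).hexdigest(), 16) % self.m
        return (h1, h2)

    def add(self, key):
        for h in self._hashes(key):
            self.bits[h] = True

    def might_contain(self, key):
        # False means the key was never added; True means "maybe".
        return all(self.bits[h] for h in self._hashes(key))
```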
Complete read process with index and Bloom filter
Memory:
- file 0, Address 0
- Index 0, Bloom filter 0
- file 1, Address 1
- Index 1, Bloom filter 1
Sorted List:
- xx + appearance: 9
Disk: File 0 (ordered):
- aa + appearance: 3
- bb + appearance: 8
- Index 0
- Bloom filter 0
Disk: File 1 (ordered):
- cc + appearance: 3
- dd + appearance: 8
- xx + appearance: 9
- Index 1
- Bloom filter 1
First, query the in-memory data:
Client 1 – 1. Read key: Aa + appearance –> Memory – 2. Read key: Aa + appearance –> Sorted List (Aa + appearance value: 3)
Return the result if it exists.
If not, query the ordered files starting from the latest one.
Use the Bloom filter to check for the key: move on to the next file if the key definitely isn't there; otherwise use the index to find the value for the key.
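The whole read path above can be sketched in a few lines; here each SSTable is modeled as a plain dict, with a set of its keys standing in for the Bloom filter, which is a simplification of the real structures:

```python
# Read path: memtable first, then SSTables from newest to oldest, using
# each file's Bloom filter to skip files that definitely lack the key.

def read(key, memtable, sstables):
    # memtable: dict of the newest, not-yet-flushed writes
    if key in memtable:
        return memtable[key]
    # sstables: list of (bloom, table) pairs, newest last
    for bloom, table in reversed(sstables):
        if key not in bloom:        # definite miss: skip the whole file
            continue
        if key in table:            # index + binary search in a real system
            return table[key]
    return None
```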
SSTable = Sorted String Table
Sorted List = Skip List
Complete write process
Client 1 – Write key: Aa + appearance value: 3 –> Memory – Append key: Aa + appearance value: 3 –> Skip List (sorted, in memory) – when full –> SSTable 1 (ordered, on disk)
How to read/write a key: value from a 1 PB file
Sharding
- Horizontal sharding
- Consistent hashing (on the row key, e.g. name)
Multiple machines:
- Master holds a HashMap(key –> server address)
- Slaves hold the data
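The consistent-hashing step can be sketched as a hash ring; the server names, the 32-bit hash space, and using md5 are illustrative assumptions, and real systems add virtual nodes to balance the load:

```python
import bisect
import hashlib

class ConsistentHash:
    def __init__(self, servers):
        # Place each server at a fixed point on the ring.
        self.ring = sorted((self._hash(s), s) for s in servers)

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16) % (2 ** 32)

    def server_for(self, row_key):
        h = self._hash(row_key)
        # First server clockwise from the key's position (wrap around).
        pos = bisect.bisect(self.ring, (h, ""))
        return self.ring[pos % len(self.ring)][1]
```

The point of the ring is that adding or removing a server only remaps the keys between it and its neighbor, instead of reshuffling everything as `hash(key) % N` would.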
How to read from BigTable with multiple servers
Client – 1. Read key –> BigTable Master
– 2. Get the row key from the key, use the row key to look up server id = 1 –> client
Client – 3. Read key –> Slave Server 1
– 4. Return value –> client
How to write to BigTable
Write a key:
Client – 1. Write key –> BigTable Master
– 2. Get the row key from the key, use the row key to look up server id = 1 –> client
Client – 3. Write key, value –> Slave Server 1
– 4. Done –> client
Memory
- Write-Ahead Log first
- then save the data into memory
Disk
- when the SkipList is full
- write it to an SSTable
Too much data for the slave server's local disk?
Save the data in GFS.
Tablet Server = the slave server in BigTable that stores tablets
Client –> Tablet Server 1 –> GFS
Reading and writing the same key at the same time?
Distributed lock
- ZooKeeper
Write a key with a distributed lock
Client 1 – 1. Write key and lock the key –> Lock Server
– 2. Get the row key from the key –> client 1 – 3. key, value –> Tablet Server 1
– 4. Done –> client 1
Client 1 – 5. Unlock key –> Lock Server
Client 2 – 1. Read key and lock the key –> Lock Server
– 2. key is locked –> client 2
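A toy sketch of the lock-server interaction: a single in-memory lock table stands in for ZooKeeper, which is a big simplification (no sessions, timeouts, or fault tolerance). `acquire` returning False models step 2 above, where client 2 is told the key is locked.

```python
class LockServer:
    def __init__(self):
        self.locks = {}                 # key -> client id holding the lock

    def acquire(self, key, client):
        if key in self.locks and self.locks[key] != client:
            return False                # someone else holds the lock
        self.locks[key] = client
        return True

    def release(self, key, client):
        # Only the holder may release its own lock.
        if self.locks.get(key) == client:
            del self.locks[key]
```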
Advantages of the distributed lock
- the metadata is stored on the Lock Server
- the master doesn't need to store metadata
Write with lock
Client 1 – 1. Write key and lock the key –> Lock Server – 2. Return server id –> client 1
Client 1 – 3. key, value –> Tablet Server 1 – 4. Write log
Tablet Server 1 – 5. Write key, value –> Memory – 6. Write to GFS –> GFS
Read with lock
Client 1 – 1. Read key and lock the key –> Lock Server – 2. Return server id –> client 1
Client 1 – 3. Read key –> Tablet Server 1 – 4. Check memory, then the Bloom filter and index on each SSTable –> Memory
Tablet Server 1 – 5. Binary search in the SSTable to find the value –> GFS