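luk · filed under Tutorial

2025-01-15 · about 2,900 words · estimated reading time 14 minutes

🍃 MongoDB Wire Protocol: The Complete Guide

⏱️ Reading time: 12 minutes 🎯 Difficulty: ⭐⭐ (intermediate)

🎯 Key Points

Understand how the MongoDB Wire Protocol works: the BSON format, the OP_MSG message structure, the CRUD command flow, and how it differs from relational database protocols.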

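🤔 What Is the MongoDB Wire Protocol?

MongoDB Wire Protocol = the communication protocol between MongoDB clients and servers

In one sentence: the MongoDB Wire Protocol is a binary request/response protocol whose messages carry BSON (Binary JSON) documents, designed to move JSON-like documents efficiently.

Analogy: a bookshelf vs. a filing cabinet

SQL database = a tidy table
- Every column has a fixed position
- Like a neatly arranged bookshelf

MongoDB = a flexible filing cabinet
- Each document can have different fields
- Like folders, each holding different contents

🏗️ Where the MongoDB Wire Protocol Sits in the Network Model

OSI 7-Layer Model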

┌───────────────────────┬──────────────────────────┐
│ 7. Application Layer  │ MongoDB Wire Protocol    │ ← MongoDB lives here
├───────────────────────┼──────────────────────────┤
│ 6. Presentation Layer │ BSON (serialization)     │
├───────────────────────┼──────────────────────────┤
│ 5. Session Layer      │ Session setup and upkeep │
├───────────────────────┼──────────────────────────┤
│ 4. Transport Layer    │ TCP                      │
├───────────────────────┼──────────────────────────┤
│ 3. Network Layer      │ IP                       │
├───────────────────────┼──────────────────────────┤
│ 2. Data Link Layer    │ Ethernet                 │
├───────────────────────┼──────────────────────────┤
│ 1. Physical Layer     │ Cables, optical fiber    │
└───────────────────────┴──────────────────────────┘

The MongoDB Wire Protocol sits at Layer 7 (the application layer)

- The MongoDB Wire Protocol is an application-layer protocol
- It carries data as BSON (Binary JSON); BSON handles serialization, while encryption, when used, comes from TLS
- It provides the communication channel for a NoSQL document database

TCP/IP 4-Layer Model

┌─────────────────────────┬───────────────────────┐
│ 4. Application Layer    │ MongoDB Wire Protocol │ ← MongoDB lives here
├─────────────────────────┼───────────────────────┤
│ 3. Transport Layer      │ TCP                   │
├─────────────────────────┼───────────────────────┤
│ 2. Internet Layer       │ IP                    │
├─────────────────────────┼───────────────────────┤
│ 1. Network Access Layer │ Ethernet              │
└─────────────────────────┴───────────────────────┘

The MongoDB Wire Protocol sits at Layer 4 (the application layer)

- In the TCP/IP model, the MongoDB Wire Protocol is an application-layer protocol
- It uses TCP as its transport (default port 27017)
- TCP provides reliable, connection-oriented delivery

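Comparison table:

Database	Protocol	Protocol type	OSI layer	TCP/IP layer	Transport	Port
MySQL	MySQL Protocol	Binary	Layer 7	Layer 4	TCP	3306
PostgreSQL	PostgreSQL Protocol	Binary	Layer 7	Layer 4	TCP	5432
Redis	RESP	Text	Layer 7	Layer 4	TCP	6379
MongoDB	Wire Protocol	Binary (BSON)	Layer 7	Layer 4	TCP	27017

Key takeaways:

- The MongoDB Wire Protocol is an application-layer protocol in both models
- It runs over TCP (default port 27017)
- It is a binary protocol (BSON), which is cheaper to parse than JSON text
- OP_MSG is the main message type in modern MongoDB

🏗️ MongoDB Wire Protocol Characteristics

Compared with SQL Database Protocols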

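Feature	MongoDB	MySQL	PostgreSQL	Redis
Data model	Document	Table	Table	Key-Value
Data format	BSON	Column values	Column values	RESP
Schema	❌ Not required (flexible)	✅ Strict	✅ Strict	N/A
Protocol type	Binary (BSON)	Binary	Binary	Text
Query language	BSON command documents (JSON-like)	SQL	SQL	Commands
Default port	27017	3306	5432	6379
Typical use	Flexible schema	Structured data	Complex queries	Caching

What sets MongoDB apart:

✅ Schema-less by default: no structure has to be defined up front (schema validation is optional)
✅ BSON format: Binary JSON, typed and quick to traverse
✅ Document-oriented: each record is one self-contained BSON document
✅ Nested structures: arrays and embedded objects are supported
✅ Horizontal scaling: sharding is built in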

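📋 The BSON (Binary JSON) Format

What Is BSON?

BSON = Binary JSON

JSON (text):
{
  "name": "Alice",
  "age": 25,
  "email": "alice@example.com"
}
→ 53 bytes minified (66 bytes as formatted above, counting whitespace and newlines)

BSON (binary):
\x3B\x00\x00\x00          ← total length: 59 bytes
\x02name\x00\x06\x00\x00\x00Alice\x00
\x10age\x00\x19\x00\x00\x00
\x02email\x00\x12\x00\x00\x00alice@example.com\x00
\x00
→ 59 bytes (not smaller here; the gain is in typing and parsing, not size)

Advantages:
✅ Explicit types (\x02 = String, \x10 = Int32)
✅ Fast traversal (length prefixes let a reader skip values)
✅ Rich data types (Date, Binary, ObjectId)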

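BSON Data Types

Type code	Type name	Description	Example
0x01	Double	64-bit floating point	3.14
0x02	String	UTF-8 string	"hello"
0x03	Object	Embedded document	{a: 1}
0x04	Array	Array	[1, 2, 3]
0x05	Binary	Binary data	Buffer
0x07	ObjectId	MongoDB document ID	ObjectId("…")
0x08	Boolean	Boolean	true/false
0x09	Date	UTC datetime (milliseconds since the Unix epoch)	ISODate("…")
0x0A	Null	Null value	null
0x10	Int32	32-bit integer	42
0x11	Timestamp	MongoDB internal timestamp	Timestamp(1, 2)
0x12	Int64	64-bit integer	9007199254740992
0x13	Decimal128	128-bit decimal	Decimal128("1.23")

Types worth knowing:

ObjectId: a 12-byte unique ID, specific to MongoDB
Timestamp: MongoDB's internal timestamp, used mainly for replication
Decimal128: exact decimal arithmetic for financial calculations (comparable to SQL DECIMAL/NUMERIC)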

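BSON Encoding Examples

A simple document:

{
  name: "Alice",
  age: 25
}

BSON encoding:
\x1E\x00\x00\x00           // total length: 30 bytes
  \x02                     // type: String
  name\x00                 // field name "name"
  \x06\x00\x00\x00         // string length 6 (including the trailing null)
  Alice\x00                // string value "Alice"

  \x10                     // type: Int32
  age\x00                  // field name "age"
  \x19\x00\x00\x00         // value 25 (little-endian)
\x00                       // end of document

A nested document:

{
  user: {
    name: "Alice",
    age: 25
  }
}

BSON encoding:
\x29\x00\x00\x00           // outer total length: 41 bytes
  \x03                     // type: Object
  user\x00                 // field name "user"
  \x1E\x00\x00\x00         // inner document length: 30
    \x02name\x00\x06\x00\x00\x00Alice\x00
    \x10age\x00\x19\x00\x00\x00
  \x00                     // end of inner document
\x00                       // end of outer document

An array:

{
  tags: ["mongodb", "database", "nosql"]
}

BSON encoding:
\x3C\x00\x00\x00           // total length: 60 bytes
  \x04                     // type: Array
  tags\x00                 // field name "tags"
  \x31\x00\x00\x00         // array length: 49 bytes
    \x02 0\x00\x08\x00\x00\x00mongodb\x00   // tags[0]
    \x02 1\x00\x09\x00\x00\x00database\x00  // tags[1]
    \x02 2\x00\x06\x00\x00\x00nosql\x00     // tags[2]
  \x00                     // end of array
\x00                       // end of document

Note: an array is stored as a document whose keys are the index strings "0", "1", "2". (The space after \x02 above is only for readability; it is not a byte in the encoding.)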
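
The three layouts above can be reproduced with a few lines of Python. This is a minimal sketch, not a full BSON implementation: it assumes only the types used in this section (String, Int32, Boolean, embedded document, array), and the function names `encode_element` and `encode_document` are made up for illustration.

```python
import struct


def encode_cstring(name: str) -> bytes:
    """Field names are null-terminated UTF-8 strings without a length prefix."""
    return name.encode("utf-8") + b"\x00"


def encode_element(name: str, value) -> bytes:
    """Encode one element: type byte, field name, then the value."""
    key = encode_cstring(name)
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return b"\x08" + key + (b"\x01" if value else b"\x00")
    if isinstance(value, int):  # Int32 only; larger values raise struct.error
        return b"\x10" + key + struct.pack("<i", value)
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        return b"\x02" + key + struct.pack("<i", len(data)) + data
    if isinstance(value, dict):
        return b"\x03" + key + encode_document(value)
    if isinstance(value, list):  # arrays are documents keyed by "0", "1", ...
        as_doc = {str(i): item for i, item in enumerate(value)}
        return b"\x04" + key + encode_document(as_doc)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def encode_document(doc: dict) -> bytes:
    """Length prefix (counts itself and the terminator) + elements + 0x00."""
    body = b"".join(encode_element(k, v) for k, v in doc.items())
    total = 4 + len(body) + 1
    return struct.pack("<i", total) + body + b"\x00"


simple = encode_document({"name": "Alice", "age": 25})
nested = encode_document({"user": {"name": "Alice", "age": 25}})
array = encode_document({"tags": ["mongodb", "database", "nosql"]})
print(len(simple), len(nested), len(array))  # 30 41 60
```

The printed sizes match the length prefixes in the hex listings: every length field counts its own four bytes plus the trailing null terminator.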

ObjectId in Detail

ObjectId is MongoDB's default unique ID format:

ObjectId("507f1f77bcf86cd799439011")
         └────────────┬────────────┘
              12 bytes (24 hex chars)

Structure (12 bytes):
[0-3]  Timestamp     (4 bytes) - Unix timestamp in seconds, big-endian
[4-8]  Random value  (5 bytes) - generated once per process
[9-11] Counter       (3 bytes) - incrementing counter, starts at a random value

Older drivers split bytes 4-8 into a 3-byte machine ID and a 2-byte process ID.

Example:
507f1f77 bcf86cd799 439011
└──┬───┘ └───┬────┘ └─┬──┘
Timestamp  Random   Counter
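
To make the byte layout concrete, here is a small sketch that splits the example ObjectId into its parts using only the standard library. The function name `parse_object_id` and the dictionary keys are hypothetical; the five bytes after the timestamp are returned as raw hex because their meaning depends on the driver generation (a random value in current drivers, machine ID plus process ID in older ones).

```python
from datetime import datetime, timezone


def parse_object_id(hex_id: str) -> dict:
    """Split a 24-character hex ObjectId into timestamp, middle bytes, counter."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[0:4], "big")  # big-endian Unix seconds
    return {
        "timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "middle": raw[4:9].hex(),  # random value (or machine ID + PID)
        "counter": int.from_bytes(raw[9:12], "big"),
    }


parts = parse_object_id("507f1f77bcf86cd799439011")
print(parts["timestamp"].isoformat())     # 2012-10-17T21:13:27+00:00
print(parts["middle"], parts["counter"])  # bcf86cd799 4427793
```

Because the timestamp occupies the most significant bytes, sorting ObjectIds as raw bytes sorts them by creation second.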

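Advantages:
✅ Unique in practice, with no central coordination
✅ Roughly time-ordered (the timestamp comes first)
✅ Carries time information (the creation time can be recovered)
✅ Distributed-friendly (IDs generated on different machines do not collide)

vs. SQL auto-increment IDs:
MySQL AUTO_INCREMENT:
❌ Needs a central coordinator (single point)
❌ Hard to use in a distributed setup
✅ Simple and sequential

MongoDB ObjectId:
✅ Distributed-friendly
✅ No central coordination
❌ 12 bytes (vs. a 4-byte INT)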

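📡 MongoDB Wire Protocol Message Structure

Standard Message Header (shared by all messages)

Message Header (16 bytes, all fields little-endian):

[0-3]   messageLength    (int32)  - total message length, including this header
[4-7]   requestID        (int32)  - identifier chosen by the sender
[8-11]  responseTo       (int32)  - requestID being answered (0 in a request)
[12-15] opCode           (int32)  - operation code

Example:
\x3D\x00\x00\x00    // messageLength = 61 bytes
\x01\x00\x00\x00    // requestID = 1
\x00\x00\x00\x00    // responseTo = 0 (a request, not a reply)
\xDD\x07\x00\x00    // opCode = 2013 (OP_MSG)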
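
The header is four little-endian int32 values, so Python's `struct` module can pack and unpack it directly. A minimal sketch follows; the helper names `pack_header` and `unpack_header` are made up for this example.

```python
import struct

HEADER = struct.Struct("<iiii")  # four little-endian int32 fields = 16 bytes
OP_MSG = 2013


def pack_header(message_length: int, request_id: int,
                response_to: int, op_code: int) -> bytes:
    """Serialize the standard 16-byte message header."""
    return HEADER.pack(message_length, request_id, response_to, op_code)


def unpack_header(data: bytes) -> dict:
    """Read the header from the first 16 bytes of a message."""
    length, request_id, response_to, op_code = HEADER.unpack_from(data)
    return {
        "messageLength": length,
        "requestID": request_id,
        "responseTo": response_to,
        "opCode": op_code,
    }


raw = pack_header(61, 1, 0, OP_MSG)
print(raw.hex(" "))  # 3d 00 00 00 01 00 00 00 00 00 00 00 dd 07 00 00
print(unpack_header(raw))
```

A reader on a TCP stream would typically read these 16 bytes first, then read `messageLength - 16` more bytes to get the rest of the message.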

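OP_MSG (used by modern MongoDB)

MongoDB 3.6+ uses OP_MSG for commands; the legacy opcodes such as OP_QUERY and OP_INSERT were removed in MongoDB 5.1:

OP_MSG structure:

Message Header (16 bytes)
flagBits (4 bytes)
Sections:
  Kind 0: body, a single BSON command document (exactly one per message)
  Kind 1: document sequence, an identifier plus a run of BSON documents (optional, may repeat)
  ...
Optional Checksum (4 bytes, CRC-32C)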

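Example: querying users
{
  "find": "users",
  "filter": { "age": { "$gt": 18 } },
  "limit": 10,
  "$db": "testdb"
}

Full message:
[Header: 16 bytes]
  messageLength: 105
  requestID: 1
  responseTo: 0
  opCode: 2013 (OP_MSG)

[flagBits: 4 bytes]
  0x00000001  // checksumPresent, because a checksum follows

[Section: kind 0]
  kind: 0 (1 byte, body document)
  { find: "users", filter: { age: { $gt: 18 } }, limit: 10, $db: "testdb" }  // 80 bytes of BSON; $db is required

[Checksum: 4 bytes]
  0x12345678  // placeholder for the CRC-32C of the preceding bytes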
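
The framing can be sketched in Python as follows. This is an illustration under stated assumptions, not a driver: it frames a single kind-0 section, omits the optional checksum (CRC-32C is not in the standard library), and takes the BSON body as pre-encoded bytes. The names `build_op_msg` and `parse_op_msg` are hypothetical, and the body used here is a hand-encoded `{"ping": 1, "$db": "admin"}`.

```python
import struct

OP_MSG = 2013


def build_op_msg(request_id: int, body: bytes, flag_bits: int = 0) -> bytes:
    """Frame one already-encoded BSON command document (no checksum)."""
    payload = struct.pack("<I", flag_bits) + b"\x00" + body  # kind byte 0 = body
    header = struct.pack("<iiii", 16 + len(payload), request_id, 0, OP_MSG)
    return header + payload


def parse_op_msg(message: bytes) -> dict:
    """Inverse of build_op_msg for a message holding one kind-0 section."""
    length, request_id, _response_to, op_code = struct.unpack_from("<iiii", message, 0)
    if op_code != OP_MSG or length != len(message):
        raise ValueError("not a complete OP_MSG")
    (flag_bits,) = struct.unpack_from("<I", message, 16)
    kind = message[20]
    (doc_len,) = struct.unpack_from("<i", message, 21)  # BSON length prefix
    return {
        "requestID": request_id,
        "flagBits": flag_bits,
        "kind": kind,
        "body": message[21:21 + doc_len],
    }


# Hand-encoded BSON for {"ping": 1, "$db": "admin"} (30 bytes)
ping = (b"\x1e\x00\x00\x00"
        b"\x10ping\x00\x01\x00\x00\x00"
        b"\x02$db\x00\x06\x00\x00\x00admin\x00"
        b"\x00")

msg = build_op_msg(request_id=1, body=ping)
print(len(msg))  # 51 = 16 header + 4 flagBits + 1 kind + 30 body
print(parse_op_msg(msg)["body"] == ping)  # True
```

With a checksum, the message would grow by 4 bytes and bit 0 of flagBits would have to be set.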

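flagBits Reference

Bit	Name	Description
0	checksumPresent	The message ends with a CRC-32C checksum
1	moreToCome	The sender will send another message without waiting for a reply; the receiver must not respond to this one
16	exhaustAllowed	The client accepts multiple replies to this request (exhaust cursors)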
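
Reading the flags is plain bit masking. A short sketch, with constant and function names chosen for this example:

```python
CHECKSUM_PRESENT = 1 << 0   # bit 0
MORE_TO_COME = 1 << 1       # bit 1
EXHAUST_ALLOWED = 1 << 16   # bit 16


def decode_flag_bits(flag_bits: int) -> dict:
    """Report which of the three flags from the table are set."""
    return {
        "checksumPresent": bool(flag_bits & CHECKSUM_PRESENT),
        "moreToCome": bool(flag_bits & MORE_TO_COME),
        "exhaustAllowed": bool(flag_bits & EXHAUST_ALLOWED),
    }


print(decode_flag_bits(0x00010001))
# {'checksumPresent': True, 'moreToCome': False, 'exhaustAllowed': True}
```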

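moreToCome example:

Streaming a large result through an exhaust cursor:

Client → Server: { getMore: <cursor id>, collection: "users" }
  flagBits: exhaustAllowed=1  ← the client accepts a stream of replies

Server → Client: reply 1 { cursor: { nextBatch: [...] } }
  flagBits: moreToCome=1  ← more replies follow, no new request needed

Server → Client: reply 2 { cursor: { nextBatch: [...] } }
  flagBits: moreToCome=1

...

Server → Client: final reply { cursor: { id: 0, nextBatch: [...] } }
  flagBits: moreToCome=0  ← the last one

The server pushes batches back to back, which saves one round trip per batch.

On a request, moreToCome=1 tells the server not to reply at all, which is how unacknowledged writes (w: 0) are sent. A bulk insert of 100,000 documents is still split by the driver into separate insert commands, and each acknowledged command gets its own reply.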

🔧 CRUD Operation Examples

1. Insert

Command:

db.users.insertOne({
  name: "Alice",
  age: 25,
  email: "alice@example.com"
})

OP_MSG:

{
  "insert": "users",
  "documents": [
    {
      "_id": ObjectId("..."),  // generated by the driver when omitted
      "name": "Alice",
      "age": 25,
      "email": "alice@example.com"
    }
  ],
  "$db": "testdb"
}

Server reply:

{
  "ok": 1,
  "n": 1  // 1 document inserted
}

2. Find

Command:

db.users.find({ age: { $gt: 18 } }).limit(10)

OP_MSG:

{
  "find": "users",
  "filter": {
    "age": { "$gt": 18 }
  },
  "limit": 10,
  "$db": "testdb"
}

Server reply:

{
  "cursor": {
    "id": 12345678901234,  // cursor ID (Int64)
    "ns": "testdb.users",
    "firstBatch": [
      { "_id": ObjectId("..."), "name": "Alice", "age": 25, ... },
      { "_id": ObjectId("..."), "name": "Bob", "age": 30, ... },
      ...
    ]
  },
  "ok": 1
}

When the result set is large, a cursor is used:

// The first find returns firstBatch plus a cursor ID

// Later batches are fetched with getMore
{
  "getMore": 12345678901234,  // cursor ID
  "collection": "users",
  "$db": "testdb"
}

// Reply
{
  "cursor": {
    "id": 12345678901234,
    "ns": "testdb.users",
    "nextBatch": [
      { ... },
      { ... },
      ...
    ]
  },
  "ok": 1
}

// When cursor.id = 0, there is no more data
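
The find/getMore loop can be simulated without a database. The sketch below uses an in-memory `FakeServer` class, which is purely hypothetical and only mimics the reply shape shown above (`firstBatch`, `nextBatch`, and a cursor ID that drops to 0); a real server may also hand back a non-zero ID followed by an empty final batch, which this toy does not model.

```python
class FakeServer:
    """In-memory stand-in for a MongoDB server (illustration only)."""

    def __init__(self, documents):
        self._documents = list(documents)
        self._cursors = {}      # cursor ID -> documents not yet sent
        self._next_id = 1000

    def find(self, batch_size):
        batch = self._documents[:batch_size]
        rest = self._documents[batch_size:]
        return self._reply("firstBatch", batch, rest, None)

    def get_more(self, cursor_id, batch_size):
        remaining = self._cursors.pop(cursor_id)
        batch, rest = remaining[:batch_size], remaining[batch_size:]
        return self._reply("nextBatch", batch, rest, cursor_id)

    def _reply(self, field, batch, rest, cursor_id):
        if rest:  # keep the cursor open while documents remain
            if cursor_id is None:
                cursor_id = self._next_id
                self._next_id += 1
            self._cursors[cursor_id] = rest
        else:     # exhausted: report cursor ID 0
            cursor_id = 0
        return {"cursor": {"id": cursor_id, field: batch}, "ok": 1}


def fetch_all(server, batch_size):
    """Client side: one find, then getMore until the cursor ID is 0."""
    reply = server.find(batch_size)
    results = list(reply["cursor"]["firstBatch"])
    round_trips = 1
    while reply["cursor"]["id"] != 0:
        reply = server.get_more(reply["cursor"]["id"], batch_size)
        results.extend(reply["cursor"]["nextBatch"])
        round_trips += 1
    return results, round_trips


docs = [{"n": i} for i in range(250)]
results, trips = fetch_all(FakeServer(docs), batch_size=100)
print(len(results), trips)  # 250 3
```

Drivers run this same loop behind their cursor objects, which is why iterating a query result looks like iterating an ordinary list.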

3. Update

Command:

db.users.updateOne(
  { name: "Alice" },
  { $set: { age: 26 } }
)

OP_MSG:

{
  "update": "users",
  "updates": [
    {
      "q": { "name": "Alice" },      // query
      "u": { "$set": { "age": 26 } }, // update
      "multi": false                  // false = updateOne, true = updateMany
    }
  ],
  "$db": "testdb"
}

Server reply:

{
  "ok": 1,
  "nModified": 1,  // 1 document modified
  "n": 1           // 1 document matched
}

4. Delete

Command:

db.users.deleteMany({ age: { $lt: 18 } })

OP_MSG:

{
  "delete": "users",
  "deletes": [
    {
      "q": { "age": { "$lt": 18 } },
      "limit": 0  // 0 = deleteMany, 1 = deleteOne
    }
  ],
  "$db": "testdb"
}

Server reply:

{
  "ok": 1,
  "n": 5  // 5 documents deleted
}
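
The mapping from shell helpers to command documents is mechanical: the one-versus-many distinction becomes the `multi` flag for updates and the `limit` value for deletes. A small sketch, where `update_command` and `delete_command` are hypothetical helper names and the dictionaries stand in for the BSON documents a driver would send:

```python
def update_command(collection, db, query, update, many=False):
    """Build the command document behind updateOne / updateMany."""
    return {
        "update": collection,
        "updates": [{"q": query, "u": update, "multi": many}],
        "$db": db,
    }


def delete_command(collection, db, query, many=False):
    """Build the command document behind deleteOne / deleteMany."""
    return {
        "delete": collection,
        "deletes": [{"q": query, "limit": 0 if many else 1}],
        "$db": db,
    }


print(update_command("users", "testdb", {"name": "Alice"}, {"$set": {"age": 26}}))
print(delete_command("users", "testdb", {"age": {"$lt": 18}}, many=True))
```

Both `updates` and `deletes` are arrays, so one command can carry several statements.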

5. Aggregation

Command:

db.orders.aggregate([
  { $match: { status: "completed" } },
  { $group: { _id: "$customerId", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } },
  { $limit: 10 }
])

OP_MSG:

{
  "aggregate": "orders",
  "pipeline": [
    { "$match": { "status": "completed" } },
    { "$group": { "_id": "$customerId", "total": { "$sum": "$amount" } } },
    { "$sort": { "total": -1 } },
    { "$limit": 10 }
  ],
  "cursor": { "batchSize": 100 },
  "$db": "testdb"
}

Server reply:

{
  "cursor": {
    "id": 0,  // 0 because every result fits in firstBatch
    "ns": "testdb.orders",
    "firstBatch": [
      { "_id": "customer1", "total": 5000 },
      { "_id": "customer2", "total": 4500 },
      ...
    ]
  },
  "ok": 1
}

🔄 MongoDB vs. SQL Protocols

Query Comparison

SQL (PostgreSQL message names shown; MySQL's prepared-statement protocol follows the same prepare/bind/execute idea):

-- Query
SELECT name, age FROM users WHERE age > 18 LIMIT 10

-- Protocol (extended query)
Parse: "SELECT name, age FROM users WHERE age > $1 LIMIT $2"
Bind: [18, 10]
Execute

-- Result
RowDescription: [name: VARCHAR, age: INT]
DataRow: ["Alice", 25]
DataRow: ["Bob", 30]
...

NoSQL (MongoDB):

// Query
db.users.find({ age: { $gt: 18 } }).limit(10)

// Protocol
OP_MSG
{
  find: "users",
  filter: { age: { $gt: 18 } },
  limit: 10
}

// Result
{
  cursor: {
    firstBatch: [
      { _id: ObjectId(...), name: "Alice", age: 25, ... },
      { _id: ObjectId(...), name: "Bob", age: 30, ... },
      ...
    ]
  }
}

Key differences:

SQL:
- A structured query language, sent as a string
- Fixed columns, described once by RowDescription
- Every row lists its values in the same order

MongoDB:
- JSON-style queries, sent as BSON documents
- Each document can have different fields
- Every result is a complete, self-describing document
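Insert Comparison

SQL:

INSERT INTO users (name, age, email)
VALUES ('Alice', 25, 'alice@example.com');

-- Drawbacks:
❌ The schema must be defined up front
❌ Adding a column requires ALTER TABLE
❌ Every row must share the same structure

MongoDB:

db.users.insertOne({
  name: "Alice",
  age: 25,
  email: "alice@example.com",
  hobbies: ["reading", "coding"],  // array; in SQL this is usually a separate table
  address: {                        // nested object; in SQL this usually means a JOIN
    city: "Taipei",
    country: "Taiwan"
  }
})

// Advantages:
✅ No schema has to be defined up front
✅ Fields can be added freely
✅ Nested structures and arrays are supported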

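Update Comparison

SQL:

UPDATE users
SET age = 26, updated_at = NOW()
WHERE name = 'Alice';

-- Limitations:
❌ Arrays and nested values usually live in separate tables, or in JSON columns with vendor-specific functions
❌ A complex nested update may take several statements wrapped in a transaction

MongoDB:

db.users.updateOne(
  { name: "Alice" },
  {
    $set: { age: 26 },
    $push: { hobbies: "swimming" },  // append to an array
    $inc: { loginCount: 1 },          // increment
    $currentDate: { lastModified: true }
  }
)

// Advantages:
✅ Rich update operators ($set, $push, $inc, $pull, etc.)
✅ Arrays and nested objects are updated atomically within a single document
✅ One command performs a complex update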

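🎓 Common Interview Questions

Q1: What is BSON, and how does it differ from JSON?

A: BSON = Binary JSON

BSON is the binary format MongoDB uses

JSON (text):
{
  "name": "Alice",
  "age": 25,
  "active": true
}
→ Every value is encoded as text
→ Must be parsed before use

BSON (binary):
\x27\x00\x00\x00
\x02name\x00\x06\x00\x00\x00Alice\x00
\x10age\x00\x19\x00\x00\x00
\x08active\x00\x01
\x00
→ Explicit types (\x02 = String, \x10 = Int32, \x08 = Boolean)
→ Values can be read directly, with no text parsing

Main differences:

1. Data types
   JSON:
   - String, Number, Boolean, Null, Array, Object

   BSON:
   - String, Int32, Int64, Double, Boolean, Null, Array, Object
   - ObjectId, Date, Binary, Decimal128, Timestamp, Regex...

2. Performance
   JSON:
   - Text format, must be parsed
   - "123456" → parseInt() → 6 bytes of digits

   BSON:
   - Binary format, read directly
   - 123456 → 4 bytes (Int32)

3. Size
   JSON:
   { "age": 25 } → 13 bytes (10 bytes minified)

   BSON:
   \x10age\x00\x19\x00\x00\x00 → 9 bytes for the element, 14 bytes as a complete document
   → Comparable in size; not always smaller

4. Traversability
   JSON:
   The text must be scanned to find where each value ends

   BSON:
   Variable-length values carry a length prefix; fixed-size types have a known width
   → Unneeded fields can be skipped quickly

Example:
BSON document:
\x46\x00\x00\x00  ← total length: 70 bytes
\x02name\x00\x06\x00\x00\x00Alice\x00  ← 16 bytes
\x03address\x00\x28\x00\x00\x00...     ← embedded document of known length 40, can be skipped outright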
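
The skipping behaviour can be shown with a short scanner that looks up one Int32 field and jumps over everything else using the length prefixes. This is a sketch that assumes only the String, Int32, embedded document, and array types; `find_int32` is a hypothetical name, and the sample bytes encode `{name: "Alice", address: {city: "Taipei"}, age: 25}` by hand.

```python
import struct


def find_int32(document: bytes, wanted: str):
    """Return the Int32 stored under `wanted`, skipping other values undecoded."""
    (total,) = struct.unpack_from("<i", document, 0)
    pos = 4
    while pos < total - 1:                      # stop at the final 0x00
        type_code = document[pos]
        name_end = document.index(b"\x00", pos + 1)
        name = document[pos + 1:name_end].decode("utf-8")
        pos = name_end + 1
        if type_code == 0x10:                   # Int32: fixed 4 bytes
            if name == wanted:
                return struct.unpack_from("<i", document, pos)[0]
            pos += 4
        elif type_code == 0x02:                 # String: int32 length + bytes
            (length,) = struct.unpack_from("<i", document, pos)
            pos += 4 + length
        elif type_code in (0x03, 0x04):         # Document/Array: length covers itself
            (length,) = struct.unpack_from("<i", document, pos)
            pos += length
        else:
            raise ValueError(f"type 0x{type_code:02x} not handled in this sketch")
    return None


sample = (b"\x3d\x00\x00\x00"
          b"\x02name\x00\x06\x00\x00\x00Alice\x00"
          b"\x03address\x00"
          b"\x16\x00\x00\x00\x02city\x00\x07\x00\x00\x00Taipei\x00\x00"
          b"\x10age\x00\x19\x00\x00\x00"
          b"\x00")

print(find_int32(sample, "age"))  # 25
```

The embedded `address` document is never decoded: the scanner reads its 4-byte length and moves past it in one step.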

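Advantages:
✅ Rich types (ObjectId, Date, Binary)
✅ Fast decoding (no text parsing)
✅ Traversable (lengths are known)

Disadvantages:
❌ Not human-readable (binary)
❌ Often slightly larger (type tags and length fields add bytes)

Conclusion:
BSON = JSON + binary encoding + more types
Well suited to database storage and network transfer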

Q2: How does the MongoDB Wire Protocol differ from SQL database protocols?

A: Document-oriented vs. table-oriented

Core differences:

1️⃣ Data model
SQL (MySQL/PostgreSQL):
- Tables
- Fixed columns in every row
- Relational (JOIN)

Example:
users table:
| id | name  | age |
|----|-------|-----|
| 1  | Alice | 25  |
| 2  | Bob   | 30  |

MongoDB:
- Documents
- Each document can differ
- Embedded data

Example:
{
  _id: ObjectId(...),
  name: "Alice",
  age: 25,
  hobbies: ["reading", "coding"],  ← array
  address: { city: "Taipei" }      ← nested
}

{
  _id: ObjectId(...),
  name: "Bob",
  age: 30
  // no hobbies or address ← flexible!
}

2️⃣ Query language
SQL:
SELECT name, age FROM users WHERE age > 18

MongoDB:
db.users.find({ age: { $gt: 18 } })
→ Queries are JSON-style documents

3️⃣ Protocol format
MySQL/PostgreSQL:
- Binary protocol
- Column values (fixed types)

MongoDB:
- BSON-based protocol
- Complete documents (flexible types)

4️⃣ Schema
SQL:
CREATE TABLE users (
  id INT PRIMARY KEY,
  name VARCHAR(50) NOT NULL,
  age INT
);
→ Must be defined up front

MongoDB:
✅ No schema definition required (validation rules are optional)
✅ Fields can be added at any time
✅ Each document can differ

5️⃣ Update operations
SQL:
UPDATE users SET age = 26 WHERE id = 1;

MongoDB:
db.users.updateOne(
  { _id: ObjectId(...) },
  {
    $set: { age: 26 },
    $push: { hobbies: "swimming" },
    $inc: { loginCount: 1 }
  }
)
→ Rich update operators

6️⃣ JOIN vs. embedding
SQL:
SELECT u.name, o.amount
FROM users u
JOIN orders o ON u.id = o.user_id

MongoDB:
{
  _id: ObjectId(...),
  name: "Alice",
  orders: [               ← embedded directly
    { amount: 100, date: ... },
    { amount: 200, date: ... }
  ]
}
→ Fewer JOINs; a single query returns all the data

7️⃣ Scalability
SQL:
- Traditionally scaled vertically (bigger hardware)
- Horizontal scaling usually needs extra tooling (sharding is more involved)

MongoDB:
- Horizontal scaling (sharding is built in)
- Automatic chunk distribution and load balancing

Summary:
SQL protocols → structured, strict schema, relational
MongoDB protocol → flexible, document-oriented, easy to scale out

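Q3: What is an ObjectId, and how does it differ from SQL AUTO_INCREMENT?

A: A distributed-friendly unique ID

ObjectId structure (12 bytes):
ObjectId("507f1f77bcf86cd799439011")

[0-3]  Timestamp     (4 bytes) - Unix timestamp in seconds
[4-8]  Random value  (5 bytes) - generated once per process (older drivers: 3-byte machine ID + 2-byte process ID)
[9-11] Counter       (3 bytes) - incrementing counter

Example:
507f1f77 bcf86cd799 439011
└──┬───┘ └───┬────┘ └─┬──┘
   │         │        └─ Counter (4427793)
   │         └────────── Random value (bcf86cd799)
   └──────────────────── Timestamp (2012-10-17 21:13:27 UTC)

vs. SQL AUTO_INCREMENT:

MySQL AUTO_INCREMENT:
Advantages:
✅ Simple (1, 2, 3, 4...)
✅ Sequential
✅ 4 bytes (compact)

Disadvantages:
❌ Single point (needs a central coordinator)
❌ Hard to use in a distributed setup
❌ Carries no time information

Distributed scenario:
Server 1: id = 1, 2, 3...
Server 2: id = 1, 2, 3...  ← collision!

Workaround: UUID
→ But a UUID is 16 bytes, and random (v4) UUIDs are unordered, which hurts index locality

MongoDB ObjectId:
Advantages:
✅ Distributed-friendly (no central coordination)
✅ Unique in practice
✅ Roughly time-ordered (timestamp comes first)
✅ Carries time information

Disadvantages:
❌ 12 bytes (vs. a 4-byte INT)
❌ Not sequential

Distributed scenario:
Server 1: ObjectId("507f1f77bcf86c...")
Server 2: ObjectId("507f1f77aef12d...")
→ No collision (the bytes after the timestamp differ per process)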
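
Why no coordinator is needed becomes clear from a generator sketch. The code below follows the timestamp + per-process random value + counter layout using only the standard library; it is an illustration, not the algorithm of any particular driver, and `new_object_id` is a made-up name.

```python
import itertools
import os
import struct
import time

# Chosen once per process: 5 random bytes and a random counter start.
_PROCESS_RANDOM = os.urandom(5)
_counter = itertools.count(int.from_bytes(os.urandom(3), "big"))


def new_object_id(now=None) -> bytes:
    """Return 12 bytes: 4-byte big-endian seconds, 5 random bytes, 3-byte counter."""
    seconds = int(time.time() if now is None else now)
    count = next(_counter) & 0xFFFFFF          # wrap the counter at 3 bytes
    return struct.pack(">I", seconds) + _PROCESS_RANDOM + count.to_bytes(3, "big")


first = new_object_id()
second = new_object_id()
print(first.hex(), second.hex())
print(first != second)  # True: the counter advanced
```

Two processes only collide if they draw the same 5 random bytes and reach the same counter value within the same second, which is why each one can generate IDs locally.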

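In practice:

SQL (single node):
CREATE TABLE users (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(50)
);

MongoDB (distributed):
{
  _id: ObjectId("507f1f77bcf86cd799439011"),  ← generated automatically
  name: "Alice"
}

// A custom _id can also be supplied
db.users.insertOne({
  _id: "my-custom-id",  ← a string works too
  name: "Bob"
})

Reading the time from an ObjectId:
objectId.getTimestamp()
→ 2012-10-17 21:13:27 UTC

When to use which:
✅ Distributed system → ObjectId
✅ Single node that needs sequential IDs → AUTO_INCREMENT
✅ Fully random IDs → UUID

Conclusion:
ObjectId is an ID designed for distributed NoSQL systems
It suits horizontal scaling better than AUTO_INCREMENT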

Q4: How does MongoDB handle large result sets? What is a cursor for?

A: A cursor fetches the data in batches

The problem:
Suppose a query matches one million documents:
db.users.find()

If everything came back at once:
→ Server memory would be exhausted
→ The network transfer would be huge
→ Client memory would be exhausted

The solution: a cursor

How it works:
1. The client sends a find query
2. The server returns the first batch (firstBatch) plus a cursor ID
3. The client uses the cursor ID to fetch the next batch (getMore)
4. Step 3 repeats until the cursor ID is 0 (no more data)

Flow:

// Round 1: find
Client → Server:
{
  find: "users",
  batchSize: 100  ← 100 documents per batch
}

Server → Client:
{
  cursor: {
    id: 12345678901234,  ← cursor ID
    ns: "testdb.users",
    firstBatch: [
      { _id: ..., name: "User1" },
      { _id: ..., name: "User2" },
      ...  // 100 documents
    ]
  },
  ok: 1
}

// Round 2: getMore
Client → Server:
{
  getMore: 12345678901234,  ← continue with the cursor ID
  collection: "users",
  batchSize: 100
}

Server → Client:
{
  cursor: {
    id: 12345678901234,  ← more data remains
    ns: "testdb.users",
    nextBatch: [
      { _id: ..., name: "User101" },
      { _id: ..., name: "User102" },
      ...  // 100 documents
    ]
  },
  ok: 1
}

// Round N: getMore (last batch)
Server → Client:
{
  cursor: {
    id: 0,  ← 0 means no more data
    ns: "testdb.users",
    nextBatch: [
      { _id: ..., name: "User9901" },
      ...  // the remaining documents
    ]
  },
  ok: 1
}

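Python example:
from pymongo import MongoClient

client = MongoClient()
db = client.testdb

# The cursor is handled automatically
for user in db.users.find():
    print(user['name'])
    # pymongo sends getMore behind the scenes

# Control the batch size manually
cursor = db.users.find().batch_size(100)
for user in cursor:
    print(user['name'])

# Cap the number of documents returned
cursor = db.users.find().limit(1000)  # fetches at most 1000 documents

Advantages:
✅ Bounded memory (the full result is never loaded at once)
✅ Network-friendly (data travels in batches)
✅ Streaming (process documents as they arrive)

Caveats:
❌ An idle cursor times out after 10 minutes by default
❌ If no getMore arrives within that window, the server closes the cursor

Adjusting timeouts:
cursor = db.users.find(no_cursor_timeout=True)  # disable the idle timeout; close the cursor yourself
cursor = db.users.find().max_time_ms(60000)     # a separate setting: caps query execution time at 60 seconds

vs. SQL:
SQL databases have cursors too, but they are used less often
In MongoDB, cursors are the default behavior for queries

Conclusion:
The cursor is MongoDB's core mechanism for large result sets
Batches are fetched automatically, which avoids memory and network problems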

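Q5: When is MongoDB a good fit, and when should you use a SQL database?

A: Choose based on the shape of the data

✅ Good fits for MongoDB:

1️⃣ The schema changes often
Example:
User records may carry very different fields
- Regular user: { name, email }
- VIP user: { name, email, vipLevel, vipExpireDate }
- Business user: { name, email, company, taxId }

SQL: needs ALTER TABLE or several tables
MongoDB: insert documents with different structures directly ✅

2️⃣ Nested or array structures are needed
Example: a blog post
{
  _id: ObjectId(...),
  title: "MongoDB Tutorial",
  content: "...",
  tags: ["mongodb", "database", "nosql"],  ← array
  comments: [                               ← array of nested objects
    { user: "Alice", text: "Great!" },
    { user: "Bob", text: "Thanks!" }
  ]
}

SQL: needs 3 tables plus JOINs
MongoDB: a single document ✅

3️⃣ Horizontal scaling is required
Example: social networks, IoT data
- Data volume grows quickly
- Sharding is needed

MongoDB: built-in sharding ✅
SQL: sharding is more involved and often needs middleware

4️⃣ Heavy reads and heavy writes
Example: real-time analytics, logging systems
MongoDB: high write throughput ✅

5️⃣ Prototyping
Fast iteration while the schema is still uncertain
MongoDB: no schema has to be defined up front ✅

---

✅ Good fits for SQL (MySQL/PostgreSQL):

1️⃣ The data structure is fixed and well understood
Example: a financial system
- Clear columns: account_id, amount, date
- Rarely changes

SQL: an enforced schema keeps the data consistent ✅

2️⃣ Complex JOINs are needed
Example: an e-commerce order system
SELECT u.name, o.total, p.name
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN products p ON o.product_id = p.id
WHERE o.date > '2025-01-01'

SQL: powerful JOIN support ✅
MongoDB: needs $lookup (often slower) or denormalized data

3️⃣ ACID transactions are needed
Example: a bank transfer
BEGIN TRANSACTION;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

SQL: strong ACID guarantees ✅
MongoDB: multi-document transactions since 4.0 (replica sets) and 4.2 (sharded clusters), at some performance cost

4️⃣ Complex queries and aggregation
Example: business intelligence
SQL: expressive syntax (GROUP BY, HAVING, window functions) ✅
MongoDB: the aggregation pipeline can do it, but tends to be more verbose

5️⃣ Data integrity constraints
Example: user management
- UNIQUE constraint (no duplicate email)
- FOREIGN KEY
- CHECK constraint (age > 0)

SQL: constraints enforced by the database ✅
MongoDB: unique indexes and optional schema validation exist, but there are no foreign keys, so referential checks fall to the application

---

Decision tree:

Is the data structure fixed?
├─ Yes → SQL
└─ No
    └─ Are complex JOINs needed?
        ├─ Yes → SQL
        └─ No
            └─ Is horizontal scaling needed?
                ├─ Yes → MongoDB
                └─ No
                    └─ Is the data nested or array-shaped?
                        ├─ Yes → MongoDB
                        └─ No → SQL (more mature)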
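
The decision tree translates directly into a chain of early returns. A minimal sketch, with a hypothetical function name and parameter names that mirror the four questions:

```python
def choose_database(fixed_schema: bool, complex_joins: bool,
                    horizontal_scaling: bool, nested_or_array_data: bool) -> str:
    """Walk the decision tree top to bottom and return the first match."""
    if fixed_schema:
        return "SQL"
    if complex_joins:
        return "SQL"
    if horizontal_scaling:
        return "MongoDB"
    if nested_or_array_data:
        return "MongoDB"
    return "SQL"  # default: the more mature option


print(choose_database(False, False, True, False))   # MongoDB
print(choose_database(False, True, True, True))     # SQL
```

The order of the checks matters: a need for complex JOINs wins over scaling needs, exactly as in the tree.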

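Mixing both:
Many companies run several systems side by side:
- SQL (core business data)
- MongoDB (logs, user behavior, caching)
- Redis (sessions, real-time data)

Example:
- Order data → PostgreSQL (ACID matters)
- User browsing history → MongoDB (flexible, write-heavy)
- Shopping cart → Redis (fast access)

Conclusion:
There is no silver bullet. Choose based on the shape of the data and the business requirements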

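✅ Recap

MongoDB Wire Protocol characteristics:

BSON format (Binary JSON)
OP_MSG message structure
Document-oriented (vs. SQL's table orientation)
Schema-less (flexible structure)

BSON essentials:

Binary JSON
Rich data types (ObjectId, Date, Binary, Decimal128)
Fast decoding (explicit types, traversable)
Often slightly larger, but cheaper to process

ObjectId:

12-byte unique ID
Holds a timestamp, a per-process random value (machine ID and process ID in older drivers), and a counter
Distributed-friendly (no central coordination)
Roughly ordered (timestamp comes first)

CRUD operations:

Insert: { insert: "collection", documents: […] }
Find: { find: "collection", filter: {…} }
Update: { update: "collection", updates: […] }
Delete: { delete: "collection", deletes: […] }
Aggregate: { aggregate: "collection", pipeline: […] }

vs. SQL databases:

MongoDB: flexible, document-oriented, easy to scale out
SQL: structured, strong ACID, complex queries

Interview focus:

✅ BSON vs. JSON
✅ MongoDB vs. SQL protocols
✅ ObjectId vs. AUTO_INCREMENT
✅ The cursor mechanism
✅ Choosing the right tool for the scenario

Mnemonic:

"Flexible documents, scale out" = Document, Schema-less, Sharding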

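🎉 End of the Database Protocol Series

Congratulations! You have finished the whole database protocol series (4/4):

✅ MySQL Protocol - the relational classic
✅ PostgreSQL Protocol - advanced SQL features
✅ Redis Protocol (RESP) - caching and NoSQL
✅ MongoDB Wire Protocol - document-oriented NoSQL

What you now have a handle on:

📊 SQL vs. NoSQL protocol differences
🔐 Text vs. binary protocols
💾 Connection pooling and performance tuning
🚀 Batch operations (COPY, Pipeline)
🔒 SQL injection prevention
📡 The design philosophy behind each database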


Last updated: 2025-01-15
