2025-01-17 約 1200 字預計閱讀 6 分鐘

10-3. 索引優化技巧（Database Index Optimization）

📌 什麼是索引？

簡單說： 索引就像書的目錄，幫助快速找到數據

定義： 數據庫索引是一種數據結構，用於加速數據查詢，就像書籍的目錄可以快速定位章節一樣。

🔍 索引的工作原理

沒有索引的查詢（全表掃描）

-- 查詢用戶
SELECT * FROM users WHERE email = 'user@example.com';

-- 數據庫操作：
第 1 行: email = 'alice@example.com' → 不匹配
第 2 行: email = 'bob@example.com' → 不匹配
第 3 行: email = 'charlie@example.com' → 不匹配
...
第 100萬行: email = 'user@example.com' → 匹配！✓

時間複雜度: O(n) - 掃描所有行

性能： 100 萬行數據 → 需要掃描 100 萬行

有索引的查詢（B-Tree 索引）

B-Tree 索引結構：

                    [m]
                   /   \
              [c-l]     [n-z]
             /    \     /    \
        [a-f]  [g-l] [m-r] [s-z]
         /  \
    [a-c] [d-f]
     / \    / \
   [a][b][c][d][e][f]
         ↓
   [user@example.com] → 指向實際數據行

性能： 100 萬行數據 → 只需 log2(1,000,000) ≈ 20 次比較

速度提升： 1,000,000 / 20 = 50,000 倍！

🎯 Django 中的索引配置

1. 單字段索引

# models.py
class User(models.Model):
    username = models.CharField(max_length=50)
    email = models.EmailField(db_index=True)  # 添加索引
    phone = models.CharField(max_length=20)
    created_at = models.DateTimeField(auto_now_add=True, db_index=True)

    class Meta:
        indexes = [
            models.Index(fields=['username']),  # 另一種方式
            models.Index(fields=['phone'], name='phone_idx'),  # 自定義索引名
        ]

生成的 SQL：

CREATE INDEX users_email_idx ON users (email);
CREATE INDEX users_username_idx ON users (username);
CREATE INDEX phone_idx ON users (phone);
CREATE INDEX users_created_at_idx ON users (created_at);

2. 複合索引（Composite Index）

class Post(models.Model):
    title = models.CharField(max_length=200)
    author = models.ForeignKey(User, on_delete=models.CASCADE)
    status = models.CharField(max_length=20)  # draft, published, archived
    created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        indexes = [
            # 複合索引：狀態 + 創建時間
            models.Index(fields=['status', 'created_at']),
            # 複合索引：作者 + 狀態
            models.Index(fields=['author', 'status']),
        ]

適用查詢：

# ✅ 會使用索引 (status, created_at)
Post.objects.filter(status='published').order_by('-created_at')

# ✅ 會使用索引 (author, status)
Post.objects.filter(author=user, status='published')

# ❌ 不會使用索引 (只查詢第二個字段)
Post.objects.filter(created_at__gte=date)

3. 唯一索引（Unique Index）

class User(models.Model):
    username = models.CharField(max_length=50, unique=True)  # 自動創建唯一索引
    email = models.EmailField(unique=True)
    phone = models.CharField(max_length=20)

    class Meta:
        constraints = [
            # 複合唯一約束
            models.UniqueConstraint(
                fields=['phone', 'country_code'],
                name='unique_phone_per_country'
            ),
        ]

4. 部分索引（Partial Index）

class Post(models.Model):
    title = models.CharField(max_length=200)
    status = models.CharField(max_length=20)
    created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        indexes = [
            # 只為已發布的文章創建索引
            models.Index(
                fields=['created_at'],
                name='published_posts_date_idx',
                condition=models.Q(status='published')
            ),
        ]

SQL：

CREATE INDEX published_posts_date_idx
ON posts (created_at)
WHERE status = 'published';

優點： 索引更小，查詢更快

📊 複合索引的順序很重要！

索引順序規則

規則： 複合索引遵循「最左前綴原則」

class Order(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    status = models.CharField(max_length=20)
    created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        indexes = [
            # 複合索引：(user, status, created_at)
            models.Index(fields=['user', 'status', 'created_at']),
        ]

會使用索引的查詢

# ✅ 使用完整索引 (user, status, created_at)
Order.objects.filter(user=user, status='pending').order_by('-created_at')

# ✅ 使用部分索引 (user, status)
Order.objects.filter(user=user, status='pending')

# ✅ 使用部分索引 (user)
Order.objects.filter(user=user)

不會使用索引的查詢

# ❌ 跳過了第一個字段 (user)
Order.objects.filter(status='pending')

# ❌ 只查詢最後一個字段
Order.objects.filter(created_at__gte=date)

# ❌ 跳過了中間字段 (status)
Order.objects.filter(user=user, created_at__gte=date)

如何選擇索引順序？

原則 1：高選擇性字段在前

# ❌ 不好：status 的選擇性低（只有幾個值）
models.Index(fields=['status', 'user', 'created_at'])

# ✅ 好：user 的選擇性高（每個用戶唯一）
models.Index(fields=['user', 'status', 'created_at'])

選擇性計算：

# 選擇性 = 不同值的數量 / 總行數

# status: 3 種狀態 / 1,000,000 行 = 0.000003（低選擇性）
# user_id: 10,000 個用戶 / 1,000,000 行 = 0.01（高選擇性）
# id: 1,000,000 個唯一值 / 1,000,000 行 = 1（完美選擇性）

原則 2：常用查詢條件在前

# 如果 90% 的查詢是按 user 過濾
models.Index(fields=['user', 'status', 'created_at'])

# 如果 90% 的查詢是按 status 過濾
models.Index(fields=['status', 'user', 'created_at'])

原則 3：範圍查詢放最後

# ✅ 好：範圍查詢（created_at）放最後
models.Index(fields=['status', 'created_at'])

# 適用查詢
Post.objects.filter(status='published', created_at__gte=date)

⚠️ 索引失效的場景

1. 使用函數或表達式

# ❌ 索引失效：對索引字段使用函數
User.objects.filter(email__upper='USER@EXAMPLE.COM')

# ✅ 好：使用函數索引（PostgreSQL）
class User(models.Model):
    email = models.EmailField()

    class Meta:
        indexes = [
            models.Index(
                fields=['email'],
                name='email_upper_idx',
                opclasses=['text_pattern_ops']
            ),
        ]

# 或使用 __iexact（不區分大小寫）
User.objects.filter(email__iexact='user@example.com')

2. 使用 OR 查詢

# ❌ 可能不使用索引
User.objects.filter(
    models.Q(username='alice') | models.Q(email='alice@example.com')
)

# ✅ 改進：分別查詢再合併（如果可能）
users_by_username = User.objects.filter(username='alice')
users_by_email = User.objects.filter(email='alice@example.com')
users = users_by_username.union(users_by_email)

3. 使用 != 或 NOT

# ❌ 索引失效
Post.objects.filter(status__ne='draft')

# ✅ 好：使用 IN
Post.objects.filter(status__in=['published', 'archived'])

4. LIKE 開頭使用通配符

# ❌ 索引失效：% 在開頭
User.objects.filter(email__contains='example')  # %example%
User.objects.filter(email__endswith='example.com')  # %example.com

# ✅ 可以使用索引
User.objects.filter(email__startswith='user')  # user%

5. 隱式類型轉換

# models.py
class Product(models.Model):
    code = models.CharField(max_length=20, db_index=True)

# ❌ 隱式類型轉換，索引失效
Product.objects.filter(code=12345)  # 整數查詢字符串字段

# ✅ 好
Product.objects.filter(code='12345')

🎯 實戰案例

案例 1：電商搜索優化

# models.py
class Product(models.Model):
    name = models.CharField(max_length=200)
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
    price = models.DecimalField(max_digits=10, decimal_places=2)
    is_active = models.BooleanField(default=True)
    created_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        indexes = [
            # 1. 分類 + 上架狀態 + 價格（常用篩選條件）
            models.Index(fields=['category', 'is_active', 'price']),

            # 2. 只為上架商品創建索引
            models.Index(
                fields=['created_at'],
                name='active_products_date',
                condition=models.Q(is_active=True)
            ),

            # 3. 全文搜索索引（PostgreSQL）
            # 需要使用 GinIndex
        ]

# views.py
def product_list(request):
    category = request.GET.get('category')
    min_price = request.GET.get('min_price', 0)
    max_price = request.GET.get('max_price', 99999)

    # ✅ 會使用索引 (category, is_active, price)
    products = Product.objects.filter(
        category_id=category,
        is_active=True,
        price__gte=min_price,
        price__lte=max_price
    ).order_by('price')

    return render(request, 'products.html', {'products': products})

案例 2：用戶活動日誌

# models.py
class ActivityLog(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    action = models.CharField(max_length=50)
    created_at = models.DateTimeField(auto_now_add=True)
    ip_address = models.GenericIPAddressField()

    class Meta:
        indexes = [
            # 1. 用戶 + 時間（查詢用戶歷史）
            models.Index(fields=['user', '-created_at']),

            # 2. 動作 + 時間（統計特定動作）
            models.Index(fields=['action', '-created_at']),

            # 3. 時間（定期清理舊數據）
            models.Index(fields=['-created_at']),
        ]

# 查詢示例
def user_activities(user, days=30):
    since = timezone.now() - timedelta(days=days)

    # ✅ 使用索引 (user, created_at)
    return ActivityLog.objects.filter(
        user=user,
        created_at__gte=since
    ).order_by('-created_at')

案例 3：社交媒體帖子

# models.py
class Post(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    content = models.TextField()
    is_pinned = models.BooleanField(default=False)
    is_public = models.BooleanField(default=True)
    created_at = models.DateTimeField(auto_now_add=True)
    likes_count = models.IntegerField(default=0)

    class Meta:
        indexes = [
            # 1. 公開帖子 + 置頂 + 時間
            models.Index(
                fields=['-is_pinned', '-created_at'],
                name='public_posts_idx',
                condition=models.Q(is_public=True)
            ),

            # 2. 用戶帖子 + 時間
            models.Index(fields=['user', '-created_at']),

            # 3. 熱門帖子（按讚數）
            models.Index(
                fields=['-likes_count', '-created_at'],
                name='hot_posts_idx',
                condition=models.Q(is_public=True, likes_count__gte=100)
            ),
        ]

# 查詢示例
def get_feed(user):
    # ✅ 使用部分索引
    return Post.objects.filter(
        is_public=True
    ).order_by('-is_pinned', '-created_at')[:50]

📊 索引的代價

優點 ✅

加快查詢速度（SELECT）
加快排序速度（ORDER BY）
加快 JOIN 速度
保證唯一性（UNIQUE）

缺點 ⚠️

佔用磁盤空間

無索引：100 萬行 × 1KB = 1GB
5 個索引：5 × 200MB = 1GB
總計：2GB（翻倍！）

降低寫入速度

# 每次 INSERT/UPDATE/DELETE 都需要更新索引

無索引：
INSERT INTO users VALUES (...);  # 1 次寫入

有 5 個索引：
INSERT INTO users VALUES (...);  # 1 次寫入
UPDATE index_1 ...;              # 5 次索引更新
UPDATE index_2 ...;
UPDATE index_3 ...;
UPDATE index_4 ...;
UPDATE index_5 ...;

索引碎片
- 頻繁的 UPDATE/DELETE 會產生碎片
- 需要定期重建索引

🔧 索引維護

1. 查看索引使用情況（PostgreSQL）

-- 查看未使用的索引
SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan as index_scans
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY schemaname, tablename;

2. 查看索引大小

-- PostgreSQL
SELECT
    indexname,
    pg_size_pretty(pg_relation_size(indexname::regclass)) as size
FROM pg_indexes
WHERE tablename = 'your_table';

3. 重建索引

-- PostgreSQL
REINDEX INDEX index_name;
REINDEX TABLE table_name;

-- MySQL
OPTIMIZE TABLE table_name;

4. Django 管理命令

# management/commands/analyze_indexes.py
from django.core.management.base import BaseCommand
from django.db import connection

class Command(BaseCommand):
    help = '分析數據庫索引使用情況'

    def handle(self, *args, **options):
        with connection.cursor() as cursor:
            cursor.execute("""
                SELECT
                    tablename,
                    indexname,
                    idx_scan,
                    idx_tup_read,
                    idx_tup_fetch
                FROM pg_stat_user_indexes
                ORDER BY idx_scan DESC;
            """)

            for row in cursor.fetchall():
                self.stdout.write(f"{row[0]}.{row[1]}: {row[2]} scans")

💡 最佳實踐

1. 只在需要的地方加索引

# ❌ 過度索引
class User(models.Model):
    username = models.CharField(max_length=50, db_index=True)
    email = models.EmailField(db_index=True)
    first_name = models.CharField(max_length=50, db_index=True)
    last_name = models.CharField(max_length=50, db_index=True)
    bio = models.TextField(db_index=True)  # 錯誤！TextField 不應加索引
    created_at = models.DateTimeField(db_index=True)
    updated_at = models.DateTimeField(db_index=True)

# ✅ 合理索引
class User(models.Model):
    username = models.CharField(max_length=50, unique=True)
    email = models.EmailField(unique=True)
    first_name = models.CharField(max_length=50)
    last_name = models.CharField(max_length=50)
    bio = models.TextField()
    created_at = models.DateTimeField(db_index=True)
    updated_at = models.DateTimeField()

2. 基於查詢模式設計索引

# 分析查詢模式
# 80% 的查詢：按 status 和 created_at 過濾
# 15% 的查詢：按 author 過濾
# 5% 的查詢：按 category 過濾

class Post(models.Model):
    class Meta:
        indexes = [
            models.Index(fields=['status', '-created_at']),  # 80% 的查詢
            models.Index(fields=['author', '-created_at']),  # 15% 的查詢
            # category 不加索引（使用率低）
        ]

3. 監控查詢性能

# settings.py
LOGGING = {
    'version': 1,
    'handlers': {
        'file': {
            'class': 'logging.FileHandler',
            'filename': 'slow_queries.log',
        },
    },
    'loggers': {
        'django.db.backends': {
            'handlers': ['file'],
            'level': 'DEBUG',
            'filters': ['slow_query'],
        },
    },
    'filters': {
        'slow_query': {
            '()': 'path.to.SlowQueryFilter',
        }
    }
}

# filters.py
class SlowQueryFilter(logging.Filter):
    def filter(self, record):
        # 只記錄執行時間超過 100ms 的查詢
        duration = float(record.duration)
        return duration > 0.1

4. 定期審查索引

# 每季度檢查：
# 1. 未使用的索引
# 2. 重複的索引
# 3. 索引大小
# 4. 查詢性能

💡 面試要點

Q1: 什麼是索引？為什麼能加快查詢？

答：

索引是特殊的數據結構（通常是 B-Tree）
通過有序結構實現二分查找
時間複雜度從 O(n) 降到 O(log n)

Q2: 為什麼不給所有字段都加索引？

答：

佔用大量磁盤空間
降低寫入性能（INSERT/UPDATE/DELETE）
維護成本高（索引碎片）

Q3: 複合索引的最左前綴原則是什麼？

答：

索引 (a, b, c) 可以用於：(a), (a,b), (a,b,c)
不能用於：(b), (c), (b,c)
因為 B-Tree 按第一個字段排序

Q4: 如何判斷查詢是否使用了索引？

答：

使用 EXPLAIN 分析查詢計劃
查看 type 字段：index/range 表示使用索引
查看 key 字段：顯示使用的索引名稱

🔗 下一篇

在下一篇文章中，我們將學習 查詢性能分析工具，了解如何使用各種工具診斷和優化查詢性能。

閱讀時間：8 分鐘

Django 面試準備 10-3：索引優化技巧