2025-01-31 · about 1,500 words · estimated reading time 7 minutes

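04-2. Memory Leaks

Memory leaks are a common but hard-to-spot problem in Django production environments. This chapter covers their causes, how to diagnose them, and how to fix them.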

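1. What Is a Memory Leak?

Definition

A memory leak occurs when memory a program has allocated is never released, so the memory available to the process steadily shrinks.

# Illustration of a memory leak
# Worker at startup: 100MB
# After 1,000 requests: 500MB
# After 2,000 requests: 900MB
# After 3,000 requests: 1.2GB → killed by the system OOM killer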

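Python's memory management

Python (CPython) manages memory with reference counting plus a garbage collector:

import sys

a = [1, 2, 3]
print(sys.getrefcount(a))  # 2 (a itself + the getrefcount argument)

b = a  # reference count +1
print(sys.getrefcount(a))  # 3

del b  # reference count -1
print(sys.getrefcount(a))  # 2

# When the reference count drops to 0, the memory is released

In some situations, however, memory is not released promptly, or at all:

Circular references (freed only when the cycle collector runs)
Accumulating global variables
Caches that are never cleared
Leaks in C extension libraries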
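The circular-reference case can be observed directly with the standard library. The following is a minimal sketch, assuming CPython; the `Box` class and `make_cycle` helper are hypothetical names used only for illustration. It shows that reference counting alone does not free a cycle, while a pass of the cycle collector does.

```python
import gc
import weakref


class Box:
    """Tiny container used only for this demonstration (hypothetical)."""

    def __init__(self):
        self.other = None


def make_cycle():
    """Create two objects that reference each other; return a weak ref to one."""
    a, b = Box(), Box()
    a.other, b.other = b, a
    return weakref.ref(a)  # a weak reference does not keep `a` alive


gc.disable()  # pause automatic collection so the effect is visible
probe = make_cycle()
# The local names a and b are gone, but each object still references the other,
# so neither reference count reaches 0.
alive_after_return = probe() is not None
gc.collect()  # the cycle detector finds the unreachable pair and frees it
alive_after_collect = probe() is not None
gc.enable()
```

With automatic collection enabled, the same pair would be freed at some later point chosen by the collector, which is why cycles show up as delayed rather than permanent memory growth.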

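2. Why Do Memory Leaks Happen?

Cause 1: Accumulating global variables

# ❌ Wrong: a global list that keeps growing
from datetime import datetime

REQUEST_LOG = []  # global variable

def log_request(request):
    # Every request appends to the global list
    REQUEST_LOG.append({
        'path': request.path,
        'user': request.user,  # a lazy object that may keep the whole request alive
        'time': datetime.now(),
    })

    # REQUEST_LOG is never cleared
    # After 10,000 requests it may hold 100MB+
    return JsonResponse({'status': 'ok'})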

Cause 2: Caches without an expiry time

# ❌ Wrong: the cache never expires
from django.http import JsonResponse

USER_CACHE = {}  # in-process cache (a plain module-level dict)

def get_user_info(request, user_id):
    # Check the cache
    if user_id in USER_CACHE:
        return JsonResponse(USER_CACHE[user_id])

    # Query the database
    user = User.objects.get(id=user_id)
    user_info = {
        'name': user.name,
        'email': user.email,
    }

    # Stored in the cache forever
    USER_CACHE[user_id] = user_info  # never deleted!

    return JsonResponse(user_info)

# 1,000,000 users × 1KB = 1GB (per worker process)
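For contrast, here is a minimal sketch of an in-process cache that cannot grow without bound: it caps the number of entries and expires each entry after a fixed time. The `TTLCache` class, its parameters, and the injectable `clock` are hypothetical names for illustration, not part of Django or any library, and the sketch is not thread-safe.

```python
import time
from collections import OrderedDict


class TTLCache:
    """Small in-process cache with a size cap and per-entry expiry (illustrative)."""

    def __init__(self, maxsize=1000, ttl=300.0, clock=time.monotonic):
        self.maxsize = maxsize
        self.ttl = ttl
        self.clock = clock          # injectable so expiry can be tested
        self._data = OrderedDict()  # key -> (expires_at, value)

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._data[key]     # drop the expired entry
            return default
        self._data.move_to_end(key)  # mark as recently used
        return value

    def set(self, key, value):
        self._data[key] = (self.clock() + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used entry

    def __len__(self):
        return len(self._data)
```

Expired entries are only removed when they are looked up, so the size cap is what actually bounds memory here; the expiry mainly keeps data from going stale.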

Cause 3: Circular references

# ❌ Wrong: circular references delay memory release
class Node:
    def __init__(self, value):
        self.value = value
        self.parent = None
        self.children = []

    def add_child(self, child):
        child.parent = self  # cycle: child → parent → child
        self.children.append(child)

def process_tree(request):
    root = Node('root')

    for i in range(10000):
        child = Node(f'child_{i}')
        root.add_child(child)

    # root and its children reference each other, so their reference counts
    # never reach 0 when the function returns; the memory is held until
    # the cycle collector runs
    return JsonResponse({'status': 'ok'})

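Cause 4: Database resources not closed

# ❌ Wrong: managing a cursor by hand and forgetting to close it
from django.db import connection

def complex_query(request):
    cursor = connection.cursor()

    cursor.execute("SELECT * FROM large_table")
    results = cursor.fetchall()  # also pulls every row into memory at once

    # Forgot to close the cursor!
    # cursor.close()

    return JsonResponse({'count': len(results)})

# Each request leaves a cursor open until it happens to be garbage-collected,
# so the resources behind it are held longer than necessary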

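Cause 5: QuerySet result caches

# ❌ Wrong: the QuerySet caches a huge result set
def get_all_users(request):
    # The QuerySet is lazy; nothing is loaded yet
    users = User.objects.all()  # assume 1,000,000 rows

    # Iterating evaluates the query and caches every row on the QuerySet
    for user in users:
        process_user(user)

    # users now holds all rows in its result cache, using a lot of memory
    return JsonResponse({'status': 'ok'})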

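Cause 6: Leaks in third-party libraries

# ❌ Memory allocated inside C extensions is not tracked by Python's GC,
#    so a bug in an extension can leak it
import numpy as np

def process_images(request):
    for i in range(1000):
        # Each array here is about 800MB (10000 × 10000 float64)
        arr = np.random.rand(10000, 10000)
        result = np.dot(arr, arr)

        # Even after the arrays go out of scope, the process may not
        # return all of that memory to the operating system right away
    return JsonResponse({'status': 'ok'})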

3. How to Diagnose a Memory Leak

Step 1: Monitor memory usage

# Watch the memory usage of the Gunicorn workers
watch -n 1 'ps aux | grep gunicorn'

# Example output:
# USER   PID  %CPU %MEM    VSZ   RSS
# www   1234  2.3  5.2  250000 210000  # RSS = resident (actual) memory in KB
# www   1235  2.1  8.5  350000 340000  # this worker uses noticeably more memory

Step 2: Use memory_profiler

# Install
pip install memory-profiler

# views.py
from memory_profiler import profile

@profile  # add the decorator
def memory_intensive_view(request):
    data = []

    for i in range(100000):
        data.append({
            'id': i,
            'value': 'x' * 1000,
        })

    return JsonResponse({'count': len(data)})

# Run the server and inspect memory usage
# (--noreload keeps everything in a single process)
python -m memory_profiler manage.py runserver --noreload

# Example output:
# Line #    Mem usage    Increment   Line Contents
# ================================================
#     45     50.2 MiB     50.2 MiB   @profile
#     46                             def memory_intensive_view(request):
#     47     50.2 MiB      0.0 MiB       data = []
#     48    150.5 MiB    100.3 MiB       for i in range(100000):
#     49    150.5 MiB      0.0 MiB           data.append({...})
#     52    150.5 MiB      0.0 MiB       return JsonResponse({'count': len(data)})

Step 3: Use tracemalloc (built into Python)

# middleware.py - trace memory usage per request
# (for debugging only: tracing adds overhead, and start/stop is process-wide)
import tracemalloc
import logging

logger = logging.getLogger(__name__)

class MemoryTraceMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Start tracing
        tracemalloc.start()

        response = self.get_response(request)

        # Read the current and peak traced memory
        current, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()

        # Log requests with high memory usage
        if peak > 100 * 1024 * 1024:  # more than 100MB
            logger.warning(
                f"High memory usage: {request.path} "
                f"current={current/1024/1024:.1f}MB "
                f"peak={peak/1024/1024:.1f}MB"
            )

        return response

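Step 4: Analyze objects with objgraph

# Install
pip install objgraph

# Python shell
import objgraph

# Show the most common object types
objgraph.show_most_common_types(limit=10)

# Example output:
# dict       150000
# list       100000
# str         80000
# tuple       50000
# ...

# Show how the number of objects of each type grows
import gc
gc.collect()
objgraph.show_growth()  # the first call records the baseline

# Handle some requests...

objgraph.show_growth()

# Example output: (only types whose count grew)
# dict       +5000
# User       +2000
# QuerySet   +1000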
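When objgraph cannot be installed, the same idea can be roughly approximated with the standard library. This is a minimal sketch, not objgraph's implementation: `type_counts`, `growth`, and the `Session` class are hypothetical names, and only objects tracked by the garbage collector are counted.

```python
import gc
from collections import Counter


def type_counts():
    """Count GC-tracked objects by type name."""
    gc.collect()
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, limit=10):
    """Return (type_name, delta) pairs for types whose count increased."""
    delta = after - before  # Counter subtraction keeps only positive counts
    return delta.most_common(limit)


class Session:
    """Hypothetical class standing in for an object that leaks."""

    def __init__(self):
        self.payload = {}


baseline = type_counts()
leaked = [Session() for _ in range(500)]  # simulate objects retained by a global
report = dict(growth(baseline, type_counts(), limit=50))
```

Taking the baseline after the application has warmed up, then comparing again after a batch of requests, keeps one-time startup allocations out of the report.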

Step 5: Analyze a memory snapshot

# views.py
# tracemalloc.start() must already have been called (for example at process
# startup); otherwise take_snapshot() raises RuntimeError
import tracemalloc
import linecache

def take_memory_snapshot(request):
    """Return the top memory allocation sites as JSON."""
    snapshot = tracemalloc.take_snapshot()
    top_stats = snapshot.statistics('lineno')

    result = []
    for stat in top_stats[:10]:
        frame = stat.traceback[0]
        result.append({
            'file': frame.filename,
            'line': frame.lineno,
            'size_mb': stat.size / 1024 / 1024,
            'count': stat.count,
            'code': linecache.getline(frame.filename, frame.lineno).strip(),
        })

    return JsonResponse({'top_memory_usage': result})
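A single snapshot shows where memory is held right now; a leak is usually easier to spot by comparing two snapshots taken some time apart. The following is a minimal sketch using only `tracemalloc`; the `retained` list is a stand-in for whatever the application is accidentally keeping alive.

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Simulate a leak: about 2MB of byte strings kept alive on purpose
retained = [bytes(1024) for _ in range(2000)]

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Differences per source line, largest change first
top = after.compare_to(before, 'lineno')
biggest = top[0]
summary = f"{biggest.traceback[0].filename}:{biggest.traceback[0].lineno} +{biggest.size_diff} bytes"
```

In a running service the two snapshots would typically be taken minutes apart, and the lines at the top of the comparison are the first places to look.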

4. Solutions

Solution 1: Avoid accumulating global variables

# ❌ Wrong
REQUEST_LOG = []

def log_request(request):
    REQUEST_LOG.append({...})

# ✅ Right, option 1: use the logging module
import logging

logger = logging.getLogger(__name__)

def log_request(request):
    logger.info(f"Request: {request.path}")

# ✅ Right, option 2: use an LRU cache (bounded in size)
from functools import lru_cache

@lru_cache(maxsize=1000)  # caches at most 1,000 entries
def get_user_data(user_id):
    return User.objects.get(id=user_id)
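If recent requests really do need to stay in memory (for a debug page, say), a bounded container keeps the footprint fixed. This is a minimal sketch using `collections.deque`; `RECENT_REQUESTS` and `remember_request` are hypothetical names, and it stores plain strings rather than request or user objects so nothing larger is kept alive.

```python
from collections import deque
from datetime import datetime, timezone

# Keeps only the newest 1,000 entries; older ones are discarded automatically
RECENT_REQUESTS = deque(maxlen=1000)


def remember_request(path, username):
    """Record a lightweight summary of a request."""
    RECENT_REQUESTS.append({
        'path': path,
        'user': username,
        'time': datetime.now(timezone.utc),
    })


# Simulate 5,000 requests: the deque never grows past its maxlen
for i in range(5000):
    remember_request(f'/items/{i}', 'alice')
```

After the loop, only the entries for `/items/4000` through `/items/4999` remain.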

Solution 2: Use a Redis cache

# ✅ Right: use Redis instead of an in-process cache
# (assumes the CACHES setting points Django's cache framework at Redis)
from django.core.cache import cache

def get_user_info(request, user_id):
    cache_key = f'user_info_{user_id}'

    # Read from Redis
    user_info = cache.get(cache_key)

    if user_info is None:
        user = User.objects.get(id=user_id)
        user_info = {
            'name': user.name,
            'email': user.email,
        }

        # Store in Redis with a 5-minute TTL
        cache.set(cache_key, user_info, timeout=300)

    return JsonResponse(user_info)

Solution 3: Use iterator() for large data sets

# ❌ Wrong: all rows end up in memory
def process_all_users(request):
    users = User.objects.all()  # iterating caches all 1,000,000 rows

    for user in users:
        process_user(user)

# ✅ Right: stream rows with iterator()
def process_all_users(request):
    # Rows are fetched in chunks and not cached on the QuerySet
    users = User.objects.all().iterator(chunk_size=1000)

    for user in users:
        process_user(user)

    return JsonResponse({'status': 'ok'})

# Memory use: from 1,000,000 rows down to roughly 1,000 rows at a time
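The difference between loading everything and streaming can be measured without a database. This is a minimal sketch in plain Python: `fetch_rows` is a hypothetical stand-in for a database cursor, and `peak_bytes` uses `tracemalloc` to compare peak memory for the two styles.

```python
import tracemalloc


def fetch_rows(n):
    """Hypothetical stand-in for a database cursor: yields one row at a time."""
    for i in range(n):
        yield {'id': i, 'name': f'user_{i}'}


def peak_bytes(consume):
    """Run consume() and return the peak traced memory in bytes."""
    tracemalloc.start()
    consume()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


def load_all():
    rows = list(fetch_rows(50_000))  # every row is alive at the same time
    return sum(row['id'] for row in rows)


def stream():
    # Each row becomes garbage as soon as the next one is requested
    return sum(row['id'] for row in fetch_rows(50_000))


peak_list = peak_bytes(load_all)
peak_stream = peak_bytes(stream)
```

The exact numbers depend on the Python version, but the streaming peak stays roughly constant as the row count grows, while the list-based peak grows linearly.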

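Solution 4: Restart workers automatically with max_requests

# gunicorn.conf.py
# ✅ Restart each worker after it has handled a set number of requests

max_requests = 1000  # restart after 1,000 requests
max_requests_jitter = 50  # random offset (avoids all workers restarting at once)

# Benefits:
# 1. Even a small leak is released periodically
# 2. Avoids memory fragmentation from long-running processes
# 3. Restarts are graceful and do not interrupt service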
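To see why the jitter matters, the effect can be simulated. Gunicorn's documentation describes the jitter as `randint(0, max_requests_jitter)` added to `max_requests` for each worker; the sketch below only imitates that arithmetic and is not Gunicorn code. The worker count of 8 is an assumption for illustration.

```python
import random

MAX_REQUESTS = 1000
MAX_REQUESTS_JITTER = 50


def restart_threshold(rng):
    """Request count after which one simulated worker would restart."""
    return MAX_REQUESTS + rng.randint(0, MAX_REQUESTS_JITTER)


rng = random.Random(42)  # seeded only to make the sketch repeatable
thresholds = [restart_threshold(rng) for _ in range(8)]  # 8 hypothetical workers
```

Because each worker draws its own threshold between 1000 and 1050, workers that receive similar traffic tend to restart at slightly different moments instead of all at once.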

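Solution 5: Trigger garbage collection manually

# ✅ Trigger the GC by hand after processing large amounts of data
# (this only helps with objects caught in reference cycles)
import gc

def process_large_dataset(request):
    # Process a large amount of data
    for batch in range(100):
        data = load_batch(batch)
        process_batch(data)

        # Trigger the GC once every 10 batches
        if batch % 10 == 0:
            gc.collect()

    return JsonResponse({'status': 'ok'})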

Solution 6: Use context managers to guarantee resource release

# ✅ Right: use with so the resource is always released
from django.db import connection

def complex_query(request):
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM large_table")
        results = cursor.fetchall()

        # the cursor is closed automatically, even if an exception is raised

    return JsonResponse({'count': len(results)})
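The same pattern applies to objects that have a `close()` method but do not support `with` on their own. This is a minimal sketch using the standard library's `sqlite3` and `contextlib.closing`; the in-memory database, the `items` table, and `count_rows` are assumptions for illustration.

```python
import sqlite3
from contextlib import closing


def count_rows(db_path=':memory:'):
    """Create a small table and count its rows, closing everything afterwards."""
    # closing() calls .close() on exit, even if an exception is raised
    with closing(sqlite3.connect(db_path)) as conn:
        with closing(conn.cursor()) as cursor:
            cursor.execute('CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)')
            cursor.executemany(
                'INSERT INTO items (name) VALUES (?)',
                [('a',), ('b',), ('c',)],
            )
            cursor.execute('SELECT COUNT(*) FROM items')
            (total,) = cursor.fetchone()
    return total


row_count = count_rows()
```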

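5. Best Practices

Principle 1: Set a memory limit

# gunicorn.conf.py
# Cap the memory each worker can use

import resource

def post_fork(server, worker):
    """Cap worker memory (runs in each worker right after it is forked)."""
    # Limit the address space to 512MB
    # (RLIMIT_AS caps virtual memory, not RSS)
    limit = 512 * 1024 * 1024
    resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

# Or use systemd (this limit applies to the whole service, not per worker)
# [Service]
# MemoryMax=512M
# (older systemd versions use MemoryLimit=512M)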

Principle 2: Monitor memory regularly

# monitoring.py
import logging
import os

import psutil

logger = logging.getLogger(__name__)

def get_memory_usage():
    """Return the memory usage of the current process."""
    process = psutil.Process(os.getpid())
    mem_info = process.memory_info()

    return {
        'rss_mb': mem_info.rss / 1024 / 1024,  # resident memory
        'vms_mb': mem_info.vms / 1024 / 1024,  # virtual memory
        'percent': process.memory_percent(),   # share of total system memory
    }

# Log it from a middleware
class MemoryMonitorMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        mem = get_memory_usage()

        if mem['percent'] > 80:  # above 80%
            logger.warning(f"High memory usage: {mem}")

        response = self.get_response(request)
        return response

Principle 3: Use weak references

# ✅ Use weakref to avoid reference cycles
import weakref

class Node:
    def __init__(self, value):
        self.value = value
        self._parent = None  # holds a weak reference to the parent
        self.children = []

    @property
    def parent(self):
        return self._parent() if self._parent else None

    @parent.setter
    def parent(self, node):
        self._parent = weakref.ref(node) if node else None

    def add_child(self, child):
        child.parent = self  # does not create a reference cycle
        self.children.append(child)
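The effect of the weak parent link can be checked directly. This is a minimal sketch, assuming CPython: it repeats a `Node` class of the same shape so the block is self-contained, disables the cycle collector, and confirms that the parent is freed by reference counting alone once its last strong reference is gone.

```python
import gc
import weakref


class Node:
    def __init__(self, value):
        self.value = value
        self._parent = None  # holds a weakref.ref, or None
        self.children = []

    @property
    def parent(self):
        return self._parent() if self._parent is not None else None

    @parent.setter
    def parent(self, node):
        self._parent = weakref.ref(node) if node is not None else None

    def add_child(self, child):
        child.parent = self
        self.children.append(child)


gc.disable()  # rule out the cycle collector for this check
root = Node('root')
child = Node('child')
root.add_child(child)
parent_before = child.parent is root  # the weak link still resolves
del root  # the only strong reference to the parent goes away
parent_after = child.parent  # the weak reference is now dead
gc.enable()
```

One consequence of this design is that a child no longer keeps its parent alive, so code that walks upward through `parent` has to handle `None`.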

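Principle 4: Restart workers regularly

# gunicorn.conf.py
# Layered protection

# 1. Limit on the number of requests handled
max_requests = 1000
max_requests_jitter = 50

# 2. Limit on worker lifetime
# Gunicorn has no built-in setting for this (max-worker-lifetime is a uWSGI
# option), so a time-based restart needs an external script or process manager

# 3. Limit on memory usage (needs a separate monitoring script)
# monitor_memory.sh:
# (adjust the pgrep pattern to match your worker process names)
# while true; do
#   for pid in $(pgrep -f 'gunicorn.*worker'); do
#     mem=$(ps -o rss= -p $pid)
#     if [ $mem -gt 524288 ]; then  # 512MB
#       kill -TERM $pid  # graceful stop; the master starts a replacement
#     fi
#   done
#   sleep 60
# done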

6. Case Study: Memory Leak in an Image Processing Service

The problem

# ❌ Problem: memory is not released after image processing
from django.http import FileResponse
from PIL import Image
import io

PROCESSED_IMAGES = {}  # global cache

def process_image(request):
    image_id = request.GET['id']

    # Check the cache
    if image_id in PROCESSED_IMAGES:
        return FileResponse(
            io.BytesIO(PROCESSED_IMAGES[image_id]),
            content_type='image/jpeg',
        )

    # Read the original image
    image = Image.open(f'/uploads/{image_id}.jpg')  # 10MB

    # Generate the thumbnail
    thumbnail = image.copy()
    thumbnail.thumbnail((200, 200))

    # Save it to memory
    buffer = io.BytesIO()
    thumbnail.save(buffer, format='JPEG')
    buffer.seek(0)

    # Cache it (forever!)
    PROCESSED_IMAGES[image_id] = buffer.getvalue()

    return FileResponse(buffer, content_type='image/jpeg')

# Problems:
# - 10,000 images × 50KB (thumbnail) = 500MB
# - image is never closed, so its file handle and pixel data linger
# - memory keeps growing

The solution

# ✅ Option 1: drop the global cache and use Redis
from django.core.cache import cache
from django.http import FileResponse
from PIL import Image
import io

def process_image(request):
    image_id = request.GET['id']
    cache_key = f'thumbnail_{image_id}'

    # Read from Redis
    thumbnail_data = cache.get(cache_key)

    if thumbnail_data is None:
        # Read and process the image
        with Image.open(f'/uploads/{image_id}.jpg') as image:
            # with guarantees the source image is closed
            thumbnail = image.copy()
            thumbnail.thumbnail((200, 200))

            buffer = io.BytesIO()
            thumbnail.save(buffer, format='JPEG')
            thumbnail_data = buffer.getvalue()

        # Store in Redis with a 1-hour TTL
        cache.set(cache_key, thumbnail_data, timeout=3600)

    # Return the image
    return FileResponse(
        io.BytesIO(thumbnail_data),
        content_type='image/jpeg'
    )

# ✅ Option 2: generate thumbnails ahead of time
import os

from celery import shared_task
from django.http import FileResponse, JsonResponse
from PIL import Image

@shared_task
def generate_thumbnail(image_id):
    """Generate the thumbnail asynchronously and save it to disk."""
    with Image.open(f'/uploads/{image_id}.jpg') as image:
        thumbnail = image.copy()
        thumbnail.thumbnail((200, 200))
        thumbnail.save(f'/thumbnails/{image_id}.jpg')

def process_image(request):
    image_id = request.GET['id']

    # Serve the pre-generated thumbnail directly
    thumbnail_path = f'/thumbnails/{image_id}.jpg'

    if not os.path.exists(thumbnail_path):
        # Not generated yet: queue the asynchronous task
        generate_thumbnail.delay(image_id)
        return JsonResponse({'status': 'processing'}, status=202)

    # FileResponse closes the file once the response has been sent
    return FileResponse(open(thumbnail_path, 'rb'), content_type='image/jpeg')

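Common Interview Questions

Q1: How does Python's garbage collection work? Why can memory still leak?

Answer:

Python uses reference counting plus generational garbage collection:

Reference counting: every object keeps a reference count and is freed immediately when it drops to 0
Generational GC: handles reference cycles

Leaks are still possible:

Global variables: the reference count never drops to 0
Caches without expiry: large numbers of objects accumulate
C extension libraries: their allocations are not managed by Python's GC

# Example: a reference cycle
class Node:
    def __init__(self):
        self.ref = None

a = Node()
b = Node()
a.ref = b  # a → b
b.ref = a  # b → a

# Even after del a, del b, the cycle keeps both objects alive
# until the cycle collector runs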

Q2: How do you diagnose a memory leak in a Django application?

Answer:

Use three tools:

tracemalloc (built in): traces memory allocations

import tracemalloc
tracemalloc.start()
# ... run the program ...
snapshot = tracemalloc.take_snapshot()
top = snapshot.statistics('lineno')

memory_profiler: line-by-line memory analysis

# pip install memory-profiler
from memory_profiler import profile

@profile
def my_function():
    ...

objgraph: analyzes object growth
```
import objgraph
objgraph.show_growth()
```

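Q3: What does Gunicorn's max_requests setting do?

Answer:

max_requests makes a worker restart automatically after it has handled a set number of requests:

# gunicorn.conf.py
max_requests = 1000  # restart after 1,000 requests
max_requests_jitter = 50  # add a random offset

Benefits:

Even a small leak is released periodically
Avoids memory fragmentation
Restarts are graceful and do not interrupt service (the other workers keep handling requests)

Note: this treats the symptom, not the cause. You should still find the source of the leak.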

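Q4: How do you avoid memory problems caused by the Django ORM?

Answer:

Use .iterator() for large data sets:

# ❌ Wrong: iterating this QuerySet caches every row in memory
users = User.objects.all()  # 1,000,000 rows

# ✅ Right: process in chunks
users = User.objects.all().iterator(chunk_size=1000)

for user in users:
    process(user)

Other techniques:

Use .only() and .defer() to load only the fields you need
Use .values() or .values_list() to get dicts or tuples instead of model instances
Use pagination instead of loading everything at once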

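Summary

Principles for dealing with memory leaks:

Monitor memory usage: check worker memory consumption regularly
Avoid global accumulation: do not use global variables as caches
Use an external cache: Redis instead of in-process caches
Set expiry times: every cache entry needs a TTL
Process data in chunks: use .iterator() to avoid loading large data sets
Restart workers regularly: use max_requests as a safety net
Use diagnostic tools: tracemalloc, memory_profiler, objgraph

Remember: memory leaks accumulate, and a small leak becomes a big problem over a long enough time!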
