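Lesson 7: Introduction to Web Scraping | Learn to Code

📖 Stage 1: HTTP Basics (about 20 minutes)

1. What happens behind the browser?

⏱ 10 minutes

When you open a page, the browser sends an HTTP request and the server replies with a response. Python can play the browser's role too, and that is the foundation of web scraping: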

# pip install requests
import requests

# A timeout keeps the script from hanging if the server never answers
response = requests.get("https://jiu-tao.com", timeout=10)
print(response.status_code)  # 200 = success
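
The status code printed above is only one of many a server can return. As a small offline sketch (the helper name `describe_status` is made up for this illustration and is not part of requests), the standard library's `http.HTTPStatus` can turn a code into its official phrase and its family:

```python
from http import HTTPStatus

# Hypothetical helper for illustration: label a status code with its
# standard phrase and the family it belongs to (2xx, 4xx, ...).
FAMILIES = {
    1: "informational",
    2: "success",
    3: "redirect",
    4: "client error",
    5: "server error",
}

def describe_status(code: int) -> str:
    phrase = HTTPStatus(code).phrase  # raises ValueError for unknown codes
    return f"{code} {phrase} ({FAMILIES[code // 100]})"

for code in (200, 301, 404, 500):
    print(describe_status(code))
```

In a scraper, the same idea usually shows up as a check on `response.status_code` before the body is parsed.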

2. The requests library in full

⏱ 15 minutes

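# GET request
r = requests.get("https://api.example.com/data")
print(r.json())  # parse the JSON response body into Python objects

# With query parameters (sent as ?q=Python&page=1)
r = requests.get("https://api.example.com/search",
    params={"q": "Python", "page": 1})

# POST request with a JSON body
r = requests.post("https://api.example.com/login",
    json={"username": "ding"})

# Error handling
try:
    r = requests.get("https://bad-site.com", timeout=5)
    r.raise_for_status()  # raises HTTPError on 4xx/5xx responses
except requests.exceptions.Timeout:
    print("Request timed out")
except requests.exceptions.RequestException as e:
    # Covers HTTPError, ConnectionError and other requests failures
    print(f"Request failed: {e}")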
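
To see what requests actually puts on the wire, one option is to build a request and prepare it without sending it. The sketch below never touches the network. The `User-Agent` value `lesson7-bot/0.1` is a made-up client name, used only to show where a custom header goes:

```python
import requests

# Build the request object but do not send it.
req = requests.Request(
    "GET",
    "https://api.example.com/search",
    params={"q": "Python", "page": 1},
    headers={"User-Agent": "lesson7-bot/0.1"},  # hypothetical client name
)
prepared = req.prepare()

print(prepared.method)                 # HTTP verb
print(prepared.url)                    # params encoded into the query string
print(prepared.headers["User-Agent"])  # header that identifies the client
```

Setting a `User-Agent` tells the server who is calling. Some sites reject the library's default value, so identifying a scraper honestly is both practical and polite.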

📖 Stage 2: Parsing HTML (about 20 minutes)

3. BeautifulSoup: a scalpel for HTML

⏱ 15 minutes

# pip install beautifulsoup4
from bs4 import BeautifulSoup

# r is the response from an earlier requests.get() call
soup = BeautifulSoup(r.text, "html.parser")
title = soup.title.text if soup.title else ""  # a page may have no <title>
all_links = soup.find_all("a")  # every <a> tag on the page
for link in all_links[:5]:
    print(link.get("href"), link.text)

# CSS selectors
articles = soup.select("article h2")
prices = soup.select(".price")
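
The calls above can be tried without any network access by parsing a small HTML string. The markup below is invented for this sketch (a two-item shop page) and only illustrates what `find_all` and `select` return:

```python
from bs4 import BeautifulSoup

# Invented sample page, used only for illustration.
html = """
<html><head><title>Demo Shop</title></head>
<body>
  <article><h2>Keyboard</h2><span class="price">$30</span></article>
  <article><h2>Mouse</h2><span class="price">$12</span></article>
  <a href="/about">About</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

title = soup.title.get_text()
# select() takes a CSS selector and returns a list of matching tags
names = [h2.get_text(strip=True) for h2 in soup.select("article h2")]
prices = [p.get_text(strip=True) for p in soup.select(".price")]
# find_all() returns a list of tags; attributes are read with .get()
links = [(a.get("href"), a.get_text()) for a in soup.find_all("a")]

print(title, names, prices, links)
```

Both `find_all` and `select` return lists of tag objects, so text and attributes are extracted per element.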

📖 Stage 3: Hands-on (about 20 minutes)

4. Project: live weather lookup

⏱ 15 minutes

import requests

# The dataset is keyed by county/city name, so the default is "臺北市"
def get_weather(city="臺北市"):
    url = "https://opendata.cwa.gov.tw/api/v1/rest/datastore/F-C0032-001"
    params = {"Authorization": "YOUR_API_KEY", "locationName": city}
    try:
        r = requests.get(url, params=params, timeout=10)
        r.raise_for_status()
        data = r.json()
        locations = data["records"]["location"]
        if not locations:  # no match for this location name
            print(f"No forecast found for {city}")
            return
        loc = locations[0]
        print(f"🌤️ {loc['locationName']} weather:")
        for wx in loc["weatherElement"]:
            if wx["elementName"] == "Wx":  # the weather description element
                print(wx["time"][0]["parameter"]["parameterName"])
    except Exception as e:
        print(f"Lookup failed: {e}")
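
The parsing step can be separated from the network call, which makes it easier to test. The sketch below assumes the response has the same nested shape that `get_weather` reads. The helper name `extract_wx` and the `sample` dictionary are invented for illustration and are not real API output:

```python
def extract_wx(data, element="Wx"):
    """Return (location name, first forecast text), or None if absent."""
    locations = data.get("records", {}).get("location", [])
    if not locations:
        return None
    loc = locations[0]
    for item in loc.get("weatherElement", []):
        if item.get("elementName") == element and item.get("time"):
            first_period = item["time"][0]
            return loc["locationName"], first_period["parameter"]["parameterName"]
    return None

# Made-up sample mirroring the structure the lesson's code expects.
sample = {
    "records": {
        "location": [
            {
                "locationName": "臺北市",
                "weatherElement": [
                    {
                        "elementName": "Wx",
                        "time": [{"parameter": {"parameterName": "Cloudy"}}],
                    }
                ],
            }
        ]
    }
}

print(extract_wx(sample))
print(extract_wx({"records": {"location": []}}))
```

Returning `None` for a missing location lets the caller decide how to report the problem instead of relying on an `IndexError`.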

🌐 Scraping etiquette: respect robots.txt, add a delay between requests, and don't flood the server.
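
These rules can be sketched with the standard library alone. The robots.txt text, the URLs, and the names `is_allowed`, `polite_urls`, and `lesson7-bot` below are all made up for illustration. A real scraper would download the site's own robots.txt first:

```python
import time
from urllib.robotparser import RobotFileParser

# Invented robots.txt content for the example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def is_allowed(robots_txt, user_agent, url):
    """Check a URL against robots.txt rules given as plain text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

def polite_urls(urls, robots_txt, user_agent, delay=1.0):
    """Yield only allowed URLs, sleeping `delay` seconds between them."""
    first = True
    for url in urls:
        if not is_allowed(robots_txt, user_agent, url):
            continue  # skip anything the site asks crawlers to avoid
        if not first:
            time.sleep(delay)
        first = False
        yield url

urls = [
    "https://example.com/public/page",
    "https://example.com/private/data",
    "https://example.com/public/other",
]
for url in polite_urls(urls, ROBOTS_TXT, "lesson7-bot", delay=0):
    print("would fetch:", url)
```

The delay is set to 0 here so the demo runs instantly. A second or more between requests is a common starting point against a real site.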

🧪 Quiz: did you get it?

Question 1 of 5

What does a status_code of 200 from requests.get() mean?

Question 2 of 5

What is BeautifulSoup mainly used for?

Question 3 of 5

What does soup.find_all("a") return?

Question 4 of 5

Why set a User-Agent header?

Question 5 of 5

What does r.raise_for_status() do?

🎯 Lesson 7: what did you learn?

HTTP requests → requests → BeautifulSoup → weather lookup

You can now treat the whole web as a data source!
