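Python Crawler Example: Scraping Jokes from a Website

As is well known, Python is a great tool for writing crawlers. Today the author uses Python to write a small crawler that pulls down the many jokes on a joke website. The target site is "http://ishuo.cn/". We first look at the URL pattern of the subpages that hold the jokes, and it is easy to see that each one is "http://ishuo.cn/subject/" + a number. Testing shows that the site's anti-scraping measures are weak, so all of its pages can be crawled without difficulty. We now use Python's re and urllib libraries to scrape all of its jokes.

import sys
import re
import urllib

# Return the page's HTML
def gethtml(url):
    page = urllib.urlopen(url)
    html = page.read()
    return html

def getmessage(html):
    p = re.compile(r'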

(.*)
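The fetch-and-extract idea described above can be sketched roughly as follows. This is a minimal sketch, not the author's original code: it is written for Python 3, where `urlopen` lives in `urllib.request`, and the names `subject_url`, `get_html`, `get_messages`, and `JOKE_PATTERN` are made up for illustration. The regular expression is an assumption, since the original pattern did not survive; it supposes each joke sits in a `<div class="content">` element, which may not match the real page. The UTF-8 decoding is likewise an assumption.

```python
import re
import urllib.request

BASE_URL = "http://ishuo.cn/subject/"  # URL prefix given in the article

# Hypothetical pattern: assumes each joke is wrapped in
# <div class="content">...</div>; adjust it to the real page markup.
JOKE_PATTERN = re.compile(r'<div class="content">(.*?)</div>', re.S)


def subject_url(number):
    """Build the URL of one joke page from its numeric id."""
    return BASE_URL + str(number)


def get_html(url, timeout=10):
    """Download a page and decode it as UTF-8 (the encoding is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as page:
        return page.read().decode("utf-8", errors="replace")


def get_messages(html, pattern=JOKE_PATTERN):
    """Return the stripped text of every pattern match found in the page."""
    return [match.strip() for match in pattern.findall(html)]


# Offline demonstration on a made-up page, so no request is sent.
sample_html = (
    '<html><div class="content"> first joke </div>'
    '<div class="content">second\njoke</div></html>'
)
sample_jokes = get_messages(sample_html)
```

To crawl the real site, one would loop over a range of numbers, call `get_html(subject_url(n))` inside a `try`/`except` for missing pages, and pass the result to `get_messages`. Pausing briefly between requests keeps the load on the site low.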