Python Crawler Example: Scraping Jokes from a Website

Source: 中文源码网 | Views: 294 | Date: 2024-04-18 23:51:57

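Python is widely regarded as an excellent tool for writing crawlers. In this article the author writes a small Python crawler that scrapes the jokes from a joke website.
The target site is "http://ishuo.cn/". We first look at the URL pattern of the subpages that hold the jokes, and it is easy to see that each one is "http://ishuo.cn/subject/" followed by a number.
Testing shows that the site's anti-crawling measures are weak, so all of its pages can be crawled without difficulty.
We now use Python's re and urllib libraries to scrape all of its jokes.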
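The URL pattern described above can be sketched roughly as follows. This is only an illustration: the helper name `subject_url` and the range of IDs are assumptions made for the example, and the article does not say which subject numbers actually exist.

```python
BASE_URL = "http://ishuo.cn/subject/"  # prefix given in the article


def subject_url(subject_id):
    """Build the subpage URL for a numeric subject ID (hypothetical helper)."""
    return BASE_URL + str(subject_id)


# Build a few candidate URLs. The range 1..3 is arbitrary and chosen
# only for illustration; valid IDs on the real site are not known here.
urls = [subject_url(i) for i in range(1, 4)]
print(urls)
```

Keeping the prefix in one constant means only the number changes from page to page, which is what makes the site easy to walk through in a loop.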
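import sys
import re
import urllib.request

# Fetch a page and return its HTML as text
def gethtml(url):
    page = urllib.request.urlopen(url)
    # Assumption: the page is UTF-8 encoded; undecodable bytes are replaced
    html = page.read().decode('utf-8', errors='replace')
    return html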

def getmessage(html):
    p=re.compile(r'
    (.*)
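A minimal sketch of what the extraction step might look like, assuming each joke sits inside a `<div class="content">` element. That markup, the sample HTML, and the names `JOKE_PATTERN` and `extract_jokes` are all hypothetical and chosen only for illustration; the real structure of the site's pages may differ.

```python
import re

# Hypothetical markup: the real site's HTML structure is an assumption here.
SAMPLE_HTML = '''
<div class="content">First joke text</div>
<div class="content">Second joke text</div>
'''

# Non-greedy capture so each block is matched separately;
# re.S lets "." also match newlines inside a joke.
JOKE_PATTERN = re.compile(r'<div class="content">(.*?)</div>', re.S)


def extract_jokes(html):
    """Return the text captured between the assumed opening and closing tags."""
    return [m.strip() for m in JOKE_PATTERN.findall(html)]


print(extract_jokes(SAMPLE_HTML))
```

The non-greedy `(.*?)` is a deliberate choice in this sketch: a greedy `(.*)` combined with `re.S` would swallow everything between the first opening tag and the last closing tag on the page.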
