A Detailed Guide to Searching Content with Python's Beautiful Soup Module

Example of div tag with class identical

"""
combine_soup = BeautifulSoup(combine_html, 'lxml')
identical_div = combine_soup.find("div", class_="identical")
print(identical_div)
Searching with find_all()
find() returns only the first match, whereas find_all() returns every match.
The filters accepted by find() work in find_all() as well. In fact, they work in every search method, for example find_parents() and find_next_siblings().
# Find every tag whose class attribute is tertiaryconsumerlist.
all_tertiaryconsumers = soup.find_all(class_='tertiaryconsumerlist')
print(type(all_tertiaryconsumers))
for tertiaryconsumers in all_tertiaryconsumers:
    print(tertiaryconsumers.div.string)
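To make the point about shared filters concrete, here is a minimal sketch that applies the same class_ filter through find(), find_all(), and find_parents(). The markup is a small hypothetical sample, not the article's original document, and the built-in html.parser is used so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup used only for this illustration.
html = """
<div class="ecopyramid">
  <ul id="producers">
    <li class="producerlist"><div class="name">plants</div></li>
    <li class="producerlist"><div class="name">algae</div></li>
  </ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# The same keyword filter works in find() and find_all() ...
first = soup.find(class_="producerlist")
all_items = soup.find_all(class_="producerlist")

# ... and in the relative search methods such as find_parents().
name_div = soup.find("div", class_="name")
ancestors = name_div.find_parents(class_="producerlist")

print(len(all_items))         # 2
print(ancestors[0] is first)  # True
```

The last line checks identity rather than equality: the search methods return the tag objects that live in the parsed tree, not copies.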
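The signature of find_all() is:
find_all(name, attrs, recursive, text, limit, **kwargs)
Its parameters match those of find(), plus one extra: limit, which caps the number of results returned. find() behaves like find_all() with limit set to 1, except that it returns the single match itself (or None when nothing matches) instead of a list. Since Beautiful Soup 4.4.0 the text parameter is named string; the old name is still accepted.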
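The effect of limit can be sketched as follows. The markup is a hypothetical sample rather than the article's original document, and html.parser is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup used only for this illustration.
html = """
<ul id="producers">
  <li class="producerlist">plants</li>
  <li class="producerlist">algae</li>
  <li class="producerlist">moss</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Stop after the first two matches.
first_two = soup.find_all("li", limit=2)
names = [li.string for li in first_two]
print(names)  # ['plants', 'algae']

# limit=1 still returns a list; find() returns the tag itself.
only_one = soup.find_all("li", limit=1)
print(only_one[0] is soup.find("li"))  # True
```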
You can also pass a list of strings to match any of several tag names, attribute values, custom attribute values, or CSS classes.
# Find all div and li tags
div_li_tags = soup.find_all(["div", "li"])
print(div_li_tags)
print()
# Find all tags whose class is producerlist or primaryconsumerlist
all_css_class = soup.find_all(class_=["producerlist", "primaryconsumerlist"])
print(all_css_class)
print()
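Searching for related tags
find() and find_all() locate specific tags. Beautiful Soup can also search for tags related to one you already have: its parents, its siblings, and the tags before or after it.
Searching for parent tags
Use find_parent() or find_parents() to search a tag's ancestors.
find_parent() returns the first match and find_parents() returns all matches, mirroring find() and find_all().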
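# Find the tags to start from
primaryconsumers = soup.find_all(class_='primaryconsumerlist')
print(len(primaryconsumers))
# Take the first matching tag
primaryconsumer = primaryconsumers[0]
# Find every ul ancestor of that tag
parent_ul = primaryconsumer.find_parents('ul')
print(len(parent_ul))
# Each result includes the full contents of the ancestor tag
print(parent_ul)
print()
# Find only the nearest match. With no arguments, find_parent() returns the
# immediate parent; with a tag name, it returns the nearest ancestor of that name.
immediateprimary_consumer_parent = primaryconsumer.find_parent()
# immediateprimary_consumer_parent = primaryconsumer.find_parent('ul')
print(immediateprimary_consumer_parent)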
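The snippet above relies on a soup object built earlier in the article. As a self-contained sketch of the same idea, the following uses a small hypothetical document (the nesting is an assumption made for illustration) to show how find_parent() with and without a tag name can return different ancestors.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup used only for this illustration.
html = """
<div class="ecopyramid">
  <ul id="primaryconsumers">
    <li class="primaryconsumerlist"><div class="name">deer</div></li>
  </ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
name_div = soup.find("div", class_="name")

# No arguments: the immediate parent, whatever tag it is.
immediate = name_div.find_parent()
print(immediate.name)    # li

# With a tag name: the nearest ancestor with that name.
nearest_ul = name_div.find_parent("ul")
print(nearest_ul["id"])  # primaryconsumers

# find_parents() walks outward, so the closest ancestor comes first.
outer = [tag.name for tag in name_div.find_parents(["ul", "div"])]
print(outer)             # ['ul', 'div']
```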
Searching for sibling tags
Beautiful Soup can also search among a tag's siblings.
find_next_siblings() returns all siblings that follow the tag at the same level, while find_next_sibling() returns only the first one.
producers = soup.find(id='producers')
next_siblings = producers.find_next_siblings()
print(next_siblings)
Likewise, find_previous_siblings() and find_previous_sibling() search the siblings that precede the tag.
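A minimal sketch of the backward sibling search, using hypothetical markup with three sibling lists (the ids are assumptions chosen to resemble the article's examples):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup used only for this illustration.
html = """
<div class="ecopyramid">
  <ul id="producers"><li>plants</li></ul>
  <ul id="primaryconsumers"><li>deer</li></ul>
  <ul id="secondaryconsumers"><li>fox</li></ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
last_ul = soup.find(id="secondaryconsumers")

# The closest preceding sibling that is a ul.
previous_one = last_ul.find_previous_sibling("ul")
print(previous_one["id"])  # primaryconsumers

# All preceding ul siblings, nearest first.
previous_all = last_ul.find_previous_siblings("ul")
previous_ids = [ul["id"] for ul in previous_all]
print(previous_ids)        # ['primaryconsumers', 'producers']
```

Note that the results come back in reverse document order, because the search walks backward from the starting tag.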
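Searching for following tags
find_next() returns the first matching tag that appears after the current one in the document, while find_all_next() returns every matching tag that follows.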
# Find every li tag that comes after the first div
first_div = soup.div
all_li_tags = first_div.find_all_next("li")
print(all_li_tags)
Searching for preceding tags
As with following tags, find_previous() and find_all_previous() search the tags that appear before the current one in the document.
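The backward search can be sketched like this, again on a small hypothetical document. One detail worth seeing in practice: because an ancestor's opening tag appears earlier in the document, ancestors can show up in the results too.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup used only for this illustration.
html = """
<div id="pyramid">
  <ul id="producers">
    <li>plants</li>
    <li>algae</li>
  </ul>
  <p id="note">end of list</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
note = soup.find(id="note")

# The closest li that appears before the paragraph.
nearest_li = note.find_previous("li")
print(nearest_li.string)  # algae

# Every li before the paragraph, nearest first.
earlier = [li.string for li in note.find_all_previous("li")]
print(earlier)            # ['algae', 'plants']

# The enclosing div also counts as "previous", since it opens earlier.
enclosing = note.find_previous("div")
print(enclosing["id"])    # pyramid
```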
Summary
That is everything for this article. Hopefully it is useful to anyone learning or using Python. If you have questions, feel free to leave a comment.
