Not sure what to scrape? Let's just pick a random site~
URL: http://www.netbian.com/index.htm
First, fetch the HTML page of the target URL:
- Open DevTools (F12) and pull the request headers; here we only need the User-Agent.
- Set the decoding format according to the page's meta tag (a sketch that detects this automatically follows the verification below).
The code is as follows:
import requests

def get_image():
    base_url = "http://www.netbian.com/index.htm"
    headers = {
        'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Mobile Safari/537.36'
    }
    # fetch the page; decode with gbk, the charset declared in the meta tag
    response = requests.get(base_url, headers=headers)
    response_data = response.content.decode('gbk')
    # optional check that the request succeeded
    # print(response.status_code)
    # save the page locally for inspection
    with open('wall.html', 'w', encoding='gbk') as f:
        f.write(response_data)

get_image()
Open the saved file locally to verify: it renders without issues.
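Hard-coding 'gbk' works here because the page's meta tag declares it, but the charset can also be read out of the meta tag itself. Below is a minimal sketch of that idea using the same requests + lxml stack; detect_charset is a made-up helper, not part of the original script:

import requests
from lxml import etree

def detect_charset(raw_bytes, default='gbk'):
    # read the charset the page declares about itself (hypothetical helper)
    tree = etree.HTML(raw_bytes)
    # HTML5 style: <meta charset="gbk">
    charset = tree.xpath('//meta/@charset')
    if charset:
        return charset[0]
    # legacy style: <meta http-equiv="Content-Type" content="text/html; charset=gbk">
    for content in tree.xpath('//meta[@http-equiv]/@content'):
        if 'charset=' in content.lower():
            return content.lower().split('charset=')[-1].strip()
    return default

headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get('http://www.netbian.com/index.htm', headers=headers)
html = response.content.decode(detect_charset(response.content), errors='replace')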
Without further ado, here's the complete code:
import requests
from lxml import etree

def get_image(url):
    try:
        headers = {
            'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Mobile Safari/537.36'
        }
        # directory the images are saved to
        path = "C:/Users/Administrator/Desktop/image/"
        # fetch the page; decode with gbk per the meta tag
        response = requests.get(url, headers=headers)
        response_data = response.content.decode('gbk')
        # parse the response into an HTML tree
        parse_data = etree.HTML(response_data)
        # collect every thumbnail URL: img src under div > ul > li > a
        item_list = parse_data.xpath('//div/ul/li/a/img/@src')
        # download each image; the last 7 characters of the URL (e.g. "abc.jpg") serve as the filename
        for item in item_list:
            final_data = requests.get(item, headers=headers).content
            with open(path + item[-7:], 'wb') as f:
                f.write(final_data)
    except Exception as e:
        print('error:', e)

def get_page():
    # build the URLs of the first 10 list pages
    # (note: the site's first page is index.htm, so index_1.htm may 404)
    urls = ["http://www.netbian.com/index_{}.htm".format(i) for i in range(1, 11)]
    return urls

if __name__ == '__main__':
    for url in get_page():
        get_image(url)
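To make the parsing step easier to follow in isolation, here is a self-contained sketch of what '//div/ul/li/a/img/@src' actually matches. The sample_html fragment is made up to mimic the wallpaper-list markup, not copied from the real page:

from lxml import etree

# made-up fragment mimicking the list markup
sample_html = """
<div class="list">
  <ul>
    <li><a href="/desk/1.htm"><img src="http://img.netbian.com/s/abc.jpg"></a></li>
    <li><a href="/desk/2.htm"><img src="http://img.netbian.com/s/def.jpg"></a></li>
  </ul>
</div>
"""

tree = etree.HTML(sample_html)
# same expression as in the script: every img src under div > ul > li > a
srcs = tree.xpath('//div/ul/li/a/img/@src')
print(srcs)                    # ['http://img.netbian.com/s/abc.jpg', ...]
print([s[-7:] for s in srcs])  # ['abc.jpg', 'def.jpg'] -- the filenames the script saves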
Run result:
To briefly summarize, the workflow is:
1. Pick the target URL and fill in the request headers.
2. Fetch the data with urllib or requests.
3. Parse the data with regex, BeautifulSoup, or XPath (a BeautifulSoup variant of the parsing step is sketched after this list).
4. Save the data.
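XPath is only one option for step 3. As a point of comparison, here is a sketch of the same extraction with BeautifulSoup instead; it assumes the same div > ul > li > a > img structure, uses the stdlib 'html.parser', and its CSS selector matches descendants rather than the strict child steps of the XPath version:

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get('http://www.netbian.com/index.htm', headers=headers)
soup = BeautifulSoup(response.content.decode('gbk', errors='replace'), 'html.parser')

# descendant CSS selector; looser than //div/ul/li/a/img/@src
srcs = [img['src'] for img in soup.select('div ul li a img') if img.has_attr('src')]
print(srcs)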