Python 3.7: scraping the 163 Mail message list and message details with requests


Published June 27, 2023 (author: not listed)

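I recently wanted to use NetEase 163 mailboxes to register several secondary accounts on a few websites in bulk. Each registration needs the verification code sent to the mailbox, and opening the messages one at a time is tedious, so I decided to write this script. I ran into a few problems along the way, and I explain them in detail below.

This article uses the requests library. urllib, imaplib or selenium would also work:
1. The urllib approach is very similar to requests.
2. imaplib applies when the mailbox has the IMAP/SMTP service enabled.
3. selenium is mostly used in automated testing and drives a real browser.

To reach my small goal, I first broke it into steps:
1. Log in to 163 Mail automatically
2. Fetch the message list
3. Find the message that contains the verification code (skipped in this article for certain reasons)
4. Fetch the full content of a message

Here are the problems I hit during the implementation:
1. SSL certificate verification. The code below handles this by passing verify=False and silencing urllib3's InsecureRequestWarning.
2. When fetching message details, the server kept reporting that the login had timed out.

Both problems have been solved. The implementation is described step by step below.

I. Automatic login to 163 Mail

Approach: log in with the account name and password to obtain the sid and the cookies.

Login has no real difficulty. As long as you use the correct URL, it works. The code is as follows:

import re

import requests
import urllib3

# Ignore the certificate warning triggered by verify=False
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


class MAIL163:
    def __init__(self, username, password):
        # A Session keeps the login cookies for all later requests
        self.session = requests.Session()
        self.username = username
        self.password = password
        self.sid = None

    def login(self):
        # The scheme, host and the rest of the query string were lost from this URL in the source
        loginUrl = "/entry/cgi/ntesdoor?style=-1&df=mail163_letter&net=&language=-1&from=web&race=&iframe=1&product=mail163&funcid=loginone&pas"
        headers = {
            'Referer': "/",
            'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.2 Safari/605.1.15"
        }
        postData = {
            'savelogin': "0",
            'url2': "/errorpage/",
            'username': self.username,
            'password': self.password
        }
        response = self.session.post(loginUrl, headers=headers, data=postData, verify=False)
        # Extract the sid; it is needed to fetch mail data
        pattern = re.compile(r'sid=(.*?)&', re.S)
        self.sid = re.search(pattern, response.text).group(1)

II. Fetching the message list

Approach: use the sid and cookies obtained at login to request the list data.

I ran into a problem while writing this part, mainly in extracting the fields of each message. In my Charles capture, the returned data looked like JSON. When I requested it with requests and parsed the response body as JSON, I got a format error:

Expecting property name enclosed in double quotes: line 1 column 2 (char 1)

The error means the property names are not enclosed in double quotes, so the body is JavaScript object-literal syntax rather than strict JSON. I later found a solution: process it with the demjson package (see the referenced article). In the end, though, I chose regular-expression matching (for no particular reason). The code is as follows:

    def messageList(self):
        # The scheme and host were lost from these URLs in the source
        listUrl = '/js6/s?sid=%s&func=mbox:listMessages' % self.sid
        Headers = {
            'Accept': "text/javascript",
            'Accept-Language': "zh-CN,zh;q=0.9",
            'Connection': "keep-alive",
            'Host': "",  # host name lost in the source
            'Referer': "/js6/?sid=%s&df=mail163_letter" % self.sid,
            'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.2 Safari/605.1.15"
        }
        response = self.session.get(listUrl, headers=Headers, verify=False)
        # The body is not strict JSON, so the fields are pulled out with a regex
        pattern = re.compile(
            "id..'(.*?)',.*?from..'(.*?)',.*?to..'(.*?)',.*?subject..'(.*?)',.*?sentDate..(.*?),\n.*?receivedDate..(.*?),.*?hmid..(.*?),\n",
            re.S)
        mails = re.findall(pattern, response.text)
        for mail in mails:
            mid = mail[0]
            print('-' * 45)
            print('mid:', mid)
            print('From:', mail[1], 'Subject:', mail[3], 'Sent:', mail[4])
            print('To:', mail[2], 'Received:', mail[5])

III. Fetching the full content of a message

Approach: request the data with the sid and cookies from login plus the mid taken from the message list.

This part was where I got stuck. I changed the headers again and again but could never get the message content. Some articles blamed an expired sid, but in my Charles captures the sid never expired. I guessed that a parameter was missing somewhere. I compared everything in the request Charles recorded (headers, cookies, form data and so on) and found that the sid also has to be added to the existing cookies, under the field name ‘’. The code is as follows:

    def message(self, mid):
        Headers = {
            'Accept': "text/javascript",
            'Accept-Language': "zh-CN,zh;q=0.9",
            'Connection': "keep-alive",
            'Host': "",  # host name lost in the source
            'Referer': "/js6/?sid=%s&df=mail163_letter" % self.sid,
            'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.2 Safari/605.1.15"
        }
        # Only with this cookie added can the message details be fetched
        cookie = {
            '': self.sid,  # cookie name lost in the source; use the one seen in the captured request
        }
        url = '/js6/read/?mid=%s&userType=ud&font=15&color=064977' % mid
        requests.utils.add_dict_to_cookiejar(self.session.cookies, cookie)
        response = self.session.get(url, headers=Headers, verify=False)
        print('Message details =====>')
        print(response.text)

IV. Final step: tying everything together

The complete email_ code is as follows:

#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import re

import requests
import urllib3

# Ignore the certificate warning triggered by verify=False
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


class MAIL163:
    def __init__(self, username, password):
        self.session = requests.Session()
        self.username = username
        self.password = password
        self.sid = None

    def login(self):
        # The scheme, host and the rest of the query string were lost from this URL in the source
        loginUrl = "/entry/cgi/ntesdoor?style=-1&df=mail163_letter&net=&language=-1&from=web&race=&iframe=1&product=mail163&funcid=loginone&pas"
        headers = {
            'Referer': "/",
            'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.2 Safari/605.1.15"
        }
        postData = {
            'savelogin': "0",
            'url2': "/errorpage/",
            'username': self.username,
            'password': self.password
        }
        response = self.session.post(loginUrl, headers=headers, data=postData, verify=False)
        # Extract the sid; it is needed to fetch mail data
        pattern = re.compile(r'sid=(.*?)&', re.S)
        self.sid = re.search(pattern, response.text).group(1)

    # Use the sid to fetch the inbox list
    def messageList(self):
        listUrl = '/js6/s?sid=%s&func=mbox:listMessages' % self.sid
        # New request headers
        Headers = {
            'Accept': "text/javascript",
            'Accept-Language': "zh-CN,zh;q=0.9",
            'Connection': "keep-alive",
            'Host': "",  # host name lost in the source
            'Referer': "/js6/?sid=%s&df=mail163_letter" % self.sid,
            'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.2 Safari/605.1.15"
        }
        response = self.session.get(listUrl, headers=Headers, verify=False)
        pattern = re.compile(
            "id..'(.*?)',.*?from..'(.*?)',.*?to..'(.*?)',.*?subject..'(.*?)',.*?sentDate..(.*?),\n.*?receivedDate..(.*?),.*?hmid..(.*?),\n",
            re.S)
        mails = re.findall(pattern, response.text)
        for mail in mails:
            mid = mail[0]
            print('-' * 45)
            print('id:', mid)
            print('From:', mail[1], 'Subject:', mail[3], 'Sent:', mail[4])
            print('To:', mail[2], 'Received:', mail[5])
            self.message(mid)

    def message(self, mid):
        Headers = {
            'Accept': "text/javascript",
            'Accept-Language': "zh-CN,zh;q=0.9",
            'Connection': "keep-alive",
            'Host': "",  # host name lost in the source
            'Referer': "/js6/?sid=%s&df=mail163_letter" % self.sid,
            'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.2 Safari/605.1.15"
        }
        # Only with this cookie added can the message details be fetched
        cookie = {
            '': self.sid,  # cookie name lost in the source
        }
        url = '/js6/read/?mid=%s&userType=ud&font=15&color=064977' % mid
        requests.utils.add_dict_to_cookiejar(self.session.cookies, cookie)
        response = self.session.get(url, headers=Headers, verify=False)
        print('Message details =====>')
        print(response.text)


if __name__ == "__main__":
    mail = MAIL163('*****@', '****')
    mail.login()
    mail.messageList()

References:
1. (Detailed summary) Several ways to scrape 163 inbox message content and the inbox list with Python (urllib, requests, selenium)
2. Parsing JSON in Python: ValueError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
3. /WYL-BruceLong/parse_attach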
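The login step pulls the sid out of the login response with the pattern sid=(.*?)&. The sketch below runs that extraction offline on a made-up response fragment. Both the sample text and the helper name extract_sid are hypothetical and are not real 163 output. One design difference from the article: the helper returns None when nothing matches, where re.search(...).group(1) in login() would raise AttributeError on a failed login.

```python
import re

# Same pattern as in login(): the sid value runs up to the next '&'
SID_PATTERN = re.compile(r'sid=(.*?)&', re.S)


def extract_sid(text):
    """Return the sid found in text, or None if there is none.

    Hypothetical helper: login() calls re.search(...).group(1) directly,
    which raises AttributeError when the pattern does not match.
    """
    match = SID_PATTERN.search(text)
    return match.group(1) if match else None


# Made-up login response fragment, for illustration only
sample = 'location.href = "/js6/main.jsp?sid=ABC123xyz&df=mail163_letter";'
print(extract_sid(sample))          # ABC123xyz
print(extract_sid('login failed'))  # None
```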
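The message-list step fails with json because the body uses single-quoted keys. The sketch below reproduces that failure and then extracts fields with a regex. SAMPLE is a hypothetical body shaped like a JavaScript object literal, not captured 163 output. MAIL_PATTERN is a simplified pattern written for this sample only, not the article's exact regex. parse_mails and try_json are illustrative names.

```python
import json
import re

# Hypothetical response body shaped like a JavaScript object literal
# (single-quoted keys and strings, new Date(...) values); NOT real 163 output.
SAMPLE = """{'code':'S_OK','var':[{
'id':'101',
'from':'"Shop" <noreply@example.com>',
'subject':'Your code',
'sentDate':new Date(2023,5,27,9,30,0),
},{
'id':'102',
'from':'"News" <news@example.com>',
'subject':'Weekly digest',
'sentDate':new Date(2023,5,26,8,0,0),
}]}"""


def try_json(text):
    """Return json's error message, or None if text is strict JSON."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as exc:
        return str(exc)


# Simplified pattern: one match per message, capturing id, sender and subject;
# re.S lets .*? run across line breaks between fields.
MAIL_PATTERN = re.compile(
    r"'id':'(.*?)',.*?'from':'(.*?)',.*?'subject':'(.*?)',", re.S)


def parse_mails(text):
    """Return a list of (mid, sender, subject) tuples."""
    return MAIL_PATTERN.findall(text)


print(try_json(SAMPLE))
# Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
for mid, sender, subject in parse_mails(SAMPLE):
    print(mid, sender, subject)
```

The regex is brittle: it depends on field order and on values not containing `',`. A tolerant parser such as the demjson route mentioned in the article avoids that, at the cost of an extra dependency.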
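The fix for the "login timed out" problem is to copy the sid into the session's cookie jar. The sketch below does that with requests.utils.add_dict_to_cookiejar. It then builds, without sending, a request to check that the cookie would be attached. The cookie name SID_COOKIE is a hypothetical placeholder because the real name did not survive in the article. example.com stands in for the mail host, and add_sid_cookie is an illustrative helper name.

```python
import requests

# Hypothetical placeholder: the real cookie name is missing from the article;
# take it from the request captured in Charles.
SID_COOKIE_NAME = "SID_COOKIE"


def add_sid_cookie(session, sid):
    """Copy the sid into the session's cookie jar so later requests send it."""
    requests.utils.add_dict_to_cookiejar(session.cookies, {SID_COOKIE_NAME: sid})
    return session


session = add_sid_cookie(requests.Session(), "ABC123xyz")
# Prepare (but do not send) a request to inspect the Cookie header it would carry;
# example.com is a stand-in for the mail host.
prepared = session.prepare_request(
    requests.Request("GET", "https://example.com/js6/read/"))
print(prepared.headers.get("Cookie"))
```

Because the cookie lives in the Session's jar rather than in a per-request cookies= argument, every later call on the same session carries it. This matches how message() relies on self.session.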
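Step 3 of the plan, finding the verification code, is skipped in the article. The sketch below shows one way to take the code from the message body that message() prints. It assumes the code is a standalone run of 4 to 6 digits and that HTML tags should be stripped first. The helper name find_verification_code and the sample body are hypothetical.

```python
import html
import re

# Assumption: the code is a standalone run of 4 to 6 digits
CODE_PATTERN = re.compile(r'(?<!\d)(\d{4,6})(?!\d)')
TAG_PATTERN = re.compile(r'<[^>]+>')


def find_verification_code(body):
    """Return the first standalone 4-6 digit number in the body, or None."""
    # Replace HTML tags with spaces, then decode entities such as &nbsp;
    text = html.unescape(TAG_PATTERN.sub(' ', body))
    match = CODE_PATTERN.search(text)
    return match.group(1) if match else None


# Made-up message body, for illustration only
body = '<div>Your verification code is <b>482913</b>. It expires in 10 minutes.</div>'
print(find_verification_code(body))  # 482913
```

A year such as 2023 or a short order number earlier in the body would also match. For real messages it is safer to anchor the pattern to nearby wording from the site's template.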

Publisher: admin. When reposting, please credit the source: http://www.yc00.com/web/1687866625a52137.html
