Published July 20, 2023
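Web Crawling: Advanced Usage of Requests

Now that we know the basics of requests, such as simple GET and POST requests and the Response object, let's look at some of its more advanced features, such as file upload, Cookies, and proxy settings.

1. File Upload

requests can simulate submitting data, and if a website requires a file upload, we can handle that with it too. It is very simple:

    import requests

    # Fill in the path of the local file to upload; open it in binary mode
    with open("", "rb") as f:
        files = {"files": f}
        response = requests.post("http://192.168.1.104/post", files=files)
    print(response.text)

Result:

{ "args": {},
"data": "",
"files": { "files": "data:application/octet-stream;base64,AAABAAEAICAAAAEAIACoEAAAFgAAACgAAAAgAAAAQAAAAAEAIAAAAAAAABAAABMLAAATCwAAAAAAAAAAA..." },
"form": {},
"headers": { "Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Connection": "keep-alive",
"Content-Length": "4434",
"Content-Type": "multipart/form-data; boundary=cc063f258f2e82b60255f44bada47474",
"Host": "192.168.1.104",
"User-Agent": "python-requests/2.25.1" },
"json": null,
"origin": "192.168.1.101",
"url": "http://192.168.1.104/post"}

The site echoes the request back. The response contains a files field while the form field is empty, which shows that an uploaded file is reported in its own files field.

2. Cookies

Iterating over and parsing cookies

We handled Cookies with urllib earlier, and the code was fairly involved. With requests, getting or setting Cookies takes just one step:

    import requests

    response = requests.get("")  # target URL
    print(response.cookies)  # a RequestsCookieJar object
    # Iterate over the cookies as name/value pairs
    for k, v in response.cookies.items():
        print(k + "=" + v)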
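The upload in section 1 needs a real file and a running server. As a hedged sketch that runs offline, the code below builds the same kind of upload from an in-memory file, prepares the request without sending it, and inspects the multipart body that requests.post(url, files=files) would transmit. The file name hello.txt and its contents are made up for illustration; the (filename, fileobj, content_type) tuple is one of the value shapes requests accepts in files, and it lets you control the file name and MIME type.

```python
import io

import requests

# Hypothetical in-memory file, so the sketch needs no file on disk.
fake_file = io.BytesIO(b"hello upload")

# A files value may be a file object, or a (filename, fileobj, content_type)
# tuple that sets the file name and MIME type sent in the multipart body.
files = {"files": ("hello.txt", fake_file, "text/plain")}

# Prepare the request without sending it, to inspect what
# requests.post(url, files=files) would put on the wire.
prepared = requests.Request("POST", "http://192.168.1.104/post", files=files).prepare()

content_type = prepared.headers["Content-Type"]
print(content_type)  # multipart/form-data; boundary=<random hex>
print(b'filename="hello.txt"' in prepared.body)
print(b"hello upload" in prepared.body)
```

The random boundary in Content-Type is what separates the parts of the body, which is why the server can report the file separately from ordinary form fields.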
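Cookies also go the other way: a dict or a RequestsCookieJar can be passed with the cookies parameter. A minimal sketch, assuming the made-up cookie names, values and the example.com domain shown, that fills a jar by hand, iterates it the same way as response.cookies in section 2, and prepares a request offline to show the Cookie header requests would send:

```python
import requests
from requests.cookies import RequestsCookieJar

# Build a cookie jar by hand; names, values and domain are illustrative only.
jar = RequestsCookieJar()
jar.set("session_id", "abc123", domain="example.com", path="/")
jar.set("theme", "dark", domain="example.com", path="/")

# Iterate over name/value pairs, just like response.cookies.items().
for name, value in jar.items():
    print(name + "=" + value)

# Passing the jar via cookies= attaches it to the request. Preparing the
# request (without sending it) shows the resulting Cookie header.
prepared = requests.Request("GET", "http://example.com/", cookies=jar).prepare()
cookie_header = prepared.headers["Cookie"]
print(cookie_header)
```

A plain dict such as cookies={"theme": "dark"} also works for simple cases; the jar is useful when domain and path should be respected.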
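    import requests

    response = requests.get("/", verify=False)
    print(response.status_code)

Result: 200

Even with verify=False, requests still emits a warning recommending that we specify a certificate. There are several ways to deal with this:

1) Suppress the warning:

    import requests
    from requests.packages import urllib3

    urllib3.disable_warnings()
    response = requests.get("/", verify=False)
    print(response.status_code)

2) Capture the warning into the logging system, so that it is handled by logging (for example, written to a log file) rather than shown as a Python warning:

    import logging

    import requests

    logging.captureWarnings(True)
    response = requests.get("/", verify=False)
    print(response.status_code)

3) Specify a client certificate. This can be a single file (containing both the key and the certificate) or a tuple with the paths of the two files. Note that the private key must be unencrypted; requests does not support encrypted keys:

    import requests

    response = requests.get("/", cert=("/path/", "/path/key"))
    print(response.status_code)

5. Proxy Settings

With some websites, a few requests during testing return content normally, but once large-scale crawling begins, the site may answer frequent requests with a CAPTCHA, redirect to a login page, or even block the client outright so that it cannot access the site for a while. To avoid this, we route requests through proxies, using the proxies parameter:

    import requests

    proxies = {
        "http": "http://192.168.1.104:8989",
        "https": "http://192.168.1.104:8990",
    }
    response = requests.get("", proxies=proxies)

Note: these are my own local proxies, so they will probably not work for you.

If the proxy requires HTTP Basic Auth, put the credentials in the proxy URL:

    import requests

    proxies = {
        "http": "http://user:passwd@192.168.1.104:8989",
    }
    response = requests.get("", proxies=proxies)

To use a SOCKS proxy, first install the extra dependency: pip install requests[socks]

    import requests

    proxies = {
        "http": "socks5://user:passwd@192.168.1.104:8989",
        "https": "socks5://user:passwd@192.168.1.104:8989",
    }
    response = requests.get("", proxies=proxies)

6. Timeout Settings

In case the server does not respond in time, we can set a timeout: if no response arrives within that time, an exception is raised. This uses the timeout parameter:

    import requests

    response = requests.get("", timeout=1)
    print(response.status_code)

Result: 200

Here the timeout is 1 second, so an exception is raised if the server does not respond within one second. A request actually goes through two phases, connect and read, and a single number is used as the timeout for each of them. The two can also be set separately by passing a (connect, read) tuple:

    import requests

    response = requests.get("", timeout=(5, 11))
    print(response.status_code)

Result: 200

If you don't want a timeout, simply omit the parameter or pass timeout=None; requests will then wait for the response indefinitely.

7. Authentication

Some pages require authentication before they can be viewed. For these, we can use the authentication support built into requests:

    import requests
    from requests.auth import HTTPBasicAuth

    response = requests.get("http://localhost:8080", auth=HTTPBasicAuth("username", "passwd"))
    print(response.status_code)

We can also pass the username and password to auth directly as a tuple; requests then uses HTTPBasicAuth by default:

    import requests

    response = requests.get("http://localhost:8080", auth=("username", "passwd"))
    print(response.status_code)

OAuth authentication requires the requests_oauthlib package: pip install requests_oauthlib

Code using OAuth authentication:

    import requests
    from requests_oauthlib import OAuth1

    url = "http://localhost:8080/1.1/account/verify_"
    auth = OAuth1("YOUR_APP_KEY", "YOUR_APP_SECRET",
                  "USER_OAUTH_TOKEN", "USER_OAUTH_TOKEN_SECRET")
    response = requests.get(url=url, auth=auth)

8. Prepared Request

When we covered urllib, we saw that a request can be represented as a data structure, with all of its parameters held in a Request object. requests can do the same, and this data structure is called a Prepared Request:

    from requests import Request, Session

    url = "http://192.168.1.104/post"
    data = {"name": "germey"}
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36"}

    s = Session()
    # Build a Request object, then let the Session turn it into a PreparedRequest
    req = Request("POST", url=url, headers=headers, data=data)
    prep = s.prepare_request(req)
    r = s.send(prep)
    print(r.text)

Result:

{ "args": {},
"data": "",
"files": {},
"form": { "name": "germey" },
"headers": { "Accept": "*/*",
"Accept-Encoding": "gzip, deflate",
"Connection": "keep-alive",
"Content-Length": "11",
"Content-Type": "application/x-www-form-urlencoded",
"Host": "192.168.1.104",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36" },
"json": null,
"origin": "192.168.1.101",
"url": "http://192.168.1.104/post"}

Here we import Request, construct a Request object from url, data, and headers, call the Session's prepare_request() method to convert it into a Prepared Request object, and then send it with send(). Representing requests as Request objects lets us treat each request as an independent object, which is very convenient for queue scheduling.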
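To make the queue-scheduling idea concrete, here is a hedged sketch. It starts a throwaway local echo server (an assumption of this sketch, standing in for the 192.168.1.104 httpbin endpoint), puts several Request objects in a collections.deque, and prepares and sends them one at a time through a Session. The handler class, the port and the second name "alice" are illustrative only.

```python
import threading
from collections import deque
from http.server import BaseHTTPRequestHandler, HTTPServer

from requests import Request, Session


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in for the /post endpoint: echoes the body back."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


# Port 0 asks the OS for any free port on localhost.
server = HTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/post"

# A queue of independent Request objects waiting to be scheduled.
queue = deque(Request("POST", url, data={"name": name}) for name in ("germey", "alice"))

s = Session()
s.trust_env = False  # ignore proxy environment variables; stay on localhost
results = []
while queue:
    prep = s.prepare_request(queue.popleft())  # Request -> PreparedRequest
    r = s.send(prep, timeout=5)
    results.append(r.text)

server.shutdown()
server.server_close()
print(results)  # ['name=germey', 'name=alice']
```

Because each queued item is a plain Request, a scheduler can reorder, retry or re-prepare it independently of the Session that finally sends it.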
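Returning to the proxies parameter from section 5: requests picks an entry from the proxies dict based on the target URL. The sketch below uses requests.utils.select_proxy, a helper requests uses internally for this lookup (it is not part of the top-level documented API, so treat it as an inspection aid), with the proxy addresses from the text plus one hypothetical host-specific key. Nothing is sent over the network.

```python
from requests.utils import select_proxy

proxies = {
    "http": "http://192.168.1.104:8989",
    "https": "http://192.168.1.104:8990",
    # Hypothetical host-specific entry: "scheme://host" keys take precedence
    # over the plain scheme keys.
    "https://api.example.com": "http://192.168.1.104:9000",
}

print(select_proxy("http://example.com/page", proxies))     # http://192.168.1.104:8989
print(select_proxy("https://example.com/page", proxies))    # http://192.168.1.104:8990
print(select_proxy("https://api.example.com/v1", proxies))  # http://192.168.1.104:9000
```

The "https" key only says which proxy handles HTTPS targets; the proxy itself is still reached over plain HTTP here, which is the usual setup for HTTP proxies.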
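For the timeout settings in section 6, the sketch below shows the exception without depending on a remote site. It assumes a throwaway local server (hypothetical, for illustration) that deliberately waits 1.5 seconds before replying. With timeout=(3, 0.3) the connection succeeds immediately, but the read phase exceeds 0.3 seconds, so requests raises requests.exceptions.ReadTimeout.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical local server that waits before answering, like a slow site."""

    def do_GET(self):
        time.sleep(1.5)  # longer than the read timeout used below
        try:
            self.send_response(200)
            self.send_header("Content-Length", "2")
            self.end_headers()
            self.wfile.write(b"ok")
        except (BrokenPipeError, ConnectionResetError):
            pass  # the client has already given up and closed the socket

    def log_message(self, *args):
        pass


server = HTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables

try:
    # timeout=(connect, read): connecting to localhost is instant,
    # but the server sends nothing for 1.5 s, so the read phase times out.
    session.get(url, timeout=(3, 0.3))
    outcome = "no timeout"
except requests.exceptions.ReadTimeout:
    outcome = "read timeout"

print(outcome)  # read timeout
```

In a crawler you would typically catch requests.exceptions.Timeout, the parent class of both ConnectTimeout and ReadTimeout, and then retry or skip the URL.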
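For the Basic authentication in section 7, preparing a request without sending it shows what auth= actually does. This sketch reuses the placeholder http://localhost:8080 URL and the username/passwd credentials from those examples, and checks that HTTPBasicAuth and the plain tuple shorthand produce the same Authorization header:

```python
import base64

import requests
from requests.auth import HTTPBasicAuth

url = "http://localhost:8080"  # placeholder endpoint from section 7

# Prepare (but do not send) the same request with both forms of auth=.
explicit = requests.Request("GET", url, auth=HTTPBasicAuth("username", "passwd")).prepare()
shorthand = requests.Request("GET", url, auth=("username", "passwd")).prepare()

# Basic auth is just an "Authorization: Basic <base64(user:password)>" header.
expected = "Basic " + base64.b64encode(b"username:passwd").decode("ascii")
print(explicit.headers["Authorization"] == expected)   # True
print(shorthand.headers["Authorization"] == expected)  # True
```

Because the credentials are only Base64-encoded, not encrypted, Basic auth should be used over HTTPS whenever the credentials matter.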
Published by: admin. If you repost this article, please credit the source: http://www.yc00.com/xiaochengxu/1689816075a288437.html