Scraping Google Trends Data with a Python Crawler

Published July 20, 2023

1. Preface

Scraping Google Trends data requires a proxy that can reach Google.

2. Approach

Scraping Google Trends is simple; the code is just a bit long. It breaks down into a few steps. The three pages we scrape all return JSON, so there is no HTML to parse: obtain the matching token and req values, then build the request URLs from them. Both values are contained in the response of the explore endpoint; parse that response and pull out token and req. I am not too familiar with SOCKS5 proxies, so I borrowed a snippet from the web that installs a global proxy for the current program, after which the script runs fine.

The full code is as follows (the code in the original post was garbled during extraction; it is reconstructed here, with the Google Trends host and the truncated header values restored as commented assumptions):

```python
import socket
import json
import logging

import socks
import requests
import pandas as pd

# Installing a SOCKS5 proxy this way makes it the global proxy for the
# whole program.
socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 1080)
socket.socket = socks.socksocket

# Without this, verify=False triggers an InsecureRequestWarning on every
# request; harmless, but annoying to look at.
logging.captureWarnings(True)
# Alternatively, silence the requests certificate warning directly:
# from requests.packages.urllib3.exceptions import InsecureRequestWarning
# requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

# DataFrames collecting the data from the three pages.
time_trends = pd.DataFrame()
related_topic = pd.DataFrame()
related_search = pd.DataFrame()

# The scheme and host were stripped from the original post; this is the
# standard Google Trends host.
BASE = 'https://trends.google.com'

# Fill in the request headers copied from your own browser session.
# The x-client-data and cookie values in the original post were
# truncated, so they are left as placeholders here.
headers = {
    'user-agent': ('Mozilla/5.0 (Windows NT 6.1; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/89.0.4389.114 Safari/537.36'),
    'referer': BASE + '/trends/explore',
    # 'x-client-data': '...',  # copy from your browser
    # 'cookie': '...',         # copy from your browser
}


# Get the req and token values for the three widgets we need.
def get_token_req(keyword):
    url = (BASE + '/trends/api/explore?hl=zh-CN&tz=-480&req='
           '{{"comparisonItem":[{{"keyword":"{}","geo":"US",'
           '"time":"today 12-m"}}],"category":0,"property":""}}'
           '&tz=-480').format(keyword)
    html = requests.get(url, headers=headers, verify=False).text
    # Strip the anti-JSON-hijacking prefix before parsing.
    data = json.loads(html[5:])
    return {
        'req_1': data['widgets'][0]['request'],
        'token_1': data['widgets'][0]['token'],
        'req_2': data['widgets'][2]['request'],
        'token_2': data['widgets'][2]['token'],
        'req_3': data['widgets'][3]['request'],
        'token_3': data['widgets'][3]['token'],
    }


# Request the three pages. They all return JSON, so no HTML parsing is
# needed -- perfect.
def get_info(keyword):
    result = get_token_req(keyword)
    result_1 = result_2 = result_3 = None

    # First page: interest over time.
    url_1 = (BASE + '/trends/api/widgetdata/multiline?hl=zh-CN&tz=-480'
             '&req={}&token={}&tz=-480').format(result['req_1'],
                                                result['token_1'])
    r_1 = requests.get(url_1, headers=headers, verify=False)
    if r_1.status_code == 200:
        try:
            content_1 = json.loads(
                r_1.text.encode('utf-8').decode('unicode_escape')[6:]
            )['default']['timelineData']
            result_1 = pd.json_normalize(content_1)
            result_1['value'] = result_1['value'].map(lambda x: x[0])
            result_1['keyword'] = keyword
        except Exception as e:
            print(e)
    else:
        print(r_1.status_code)

    # Second page: related topics.
    url_2 = (BASE + '/trends/api/widgetdata/relatedsearches?hl=zh-CN'
             '&tz=-480&req={}&token={}').format(result['req_2'],
                                                result['token_2'])
    r_2 = requests.get(url_2, headers=headers, verify=False)
    if r_2.status_code == 200:
        try:
            content_2 = json.loads(
                r_2.text.encode('utf-8').decode('unicode_escape')[6:]
            )['default']['rankedList'][1]['rankedKeyword']
            result_2 = pd.json_normalize(content_2)
            result_2['link'] = BASE + result_2['link']
            result_2['keyword'] = keyword
        except Exception as e:
            print(e)
    else:
        print(r_2.status_code)

    # Third page: related queries.
    url_3 = (BASE + '/trends/api/widgetdata/relatedsearches?hl=zh-CN'
             '&tz=-480&req={}&token={}').format(result['req_3'],
                                                result['token_3'])
    r_3 = requests.get(url_3, headers=headers, verify=False)
    if r_3.status_code == 200:
        try:
            content_3 = json.loads(
                r_3.text.encode('utf-8').decode('unicode_escape')[6:]
            )['default']['rankedList'][1]['rankedKeyword']
            result_3 = pd.json_normalize(content_3)
            result_3['link'] = BASE + result_3['link']
            result_3['keyword'] = keyword
        except Exception as e:
            print(e)
    else:
        print(r_3.status_code)

    return [result_1, result_2, result_3]


def main():
    global time_trends, related_search, related_topic
    # The keyword-file path was truncated to 'C:' in the original post;
    # point it at your own keyword list, one keyword per line.
    with open(r'C:', 'r', encoding='utf-8') as f:
        words = f.readlines()
    for keyword in words:
        keyword = keyword.strip()
        data_all = get_info(keyword)
        time_trends = pd.concat([time_trends, data_all[0]], sort=False)
        related_topic = pd.concat([related_topic, data_all[1]], sort=False)
        related_search = pd.concat([related_search, data_all[2]], sort=False)


if __name__ == "__main__":
    main()
```

That concludes this article on scraping Google Trends data with a Python crawler. For more on the topic, search the site's earlier articles, and we hope you will continue to support us!
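A note on the `html[5:]` slicing: the Trends endpoints prepend an anti-JSON-hijacking prefix such as `)]}'` before the real JSON body, and the slice simply drops it (the widgetdata endpoints use a slightly longer prefix, hence `[6:]` there). A minimal sketch with a hard-coded sample payload, not a live request:

```python
import json

# Sample payload mimicking the shape of the explore response, with the
# 5-character anti-JSON-hijacking prefix the article strips via html[5:].
raw = ")]}'\n" + json.dumps(
    {"widgets": [{"token": "abc", "request": {"geo": "US"}}]}
)

# Drop the prefix, then parse the remainder as ordinary JSON.
data = json.loads(raw[5:])
print(data["widgets"][0]["token"])  # -> abc
```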
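The timeline handling (`pd.json_normalize` followed by mapping `value` to its first element) can be illustrated on a hand-written sample of `timelineData`; the field names below mirror the API response, but the numbers are made up:

```python
import pandas as pd

# Hypothetical sample of the 'timelineData' entries returned by the
# multiline endpoint: each point carries its value in a one-element list.
timeline = [
    {"time": "1617948000", "formattedTime": "2021-04-09", "value": [55]},
    {"time": "1618034400", "formattedTime": "2021-04-10", "value": [61]},
]

df = pd.json_normalize(timeline)
# Unwrap the single-element lists, as the crawler does with
# result_1['value'].map(lambda x: x[0]).
df["value"] = df["value"].map(lambda x: x[0])
print(df["value"].tolist())  # -> [55, 61]
```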
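As an aside, monkey-patching `socket.socket` is not the only way to route traffic through SOCKS5: requests can take a per-session proxy configuration instead (this needs `pip install requests[socks]`). A sketch, assuming the same local proxy on 127.0.0.1:1080 as in the article:

```python
import requests

# Per-session SOCKS5 proxy as an alternative to a global socket patch.
# 'socks5h' makes DNS resolution also go through the proxy.
session = requests.Session()
session.proxies = {
    "http": "socks5h://127.0.0.1:1080",
    "https": "socks5h://127.0.0.1:1080",
}
print(session.proxies["https"])  # -> socks5h://127.0.0.1:1080
```

This keeps the proxy scoped to one session, so other sockets in the process are unaffected.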
