Web Scraping in VBA with MSXML: DOM Properties and XPath

Published June 29, 2023

Method 1: classic DOM properties

Sub Main()
    Url = "/public-holidays-by-date/"
    Set oHttp = CreateObject("P") ' create an XMLHTTP object
    Set odom = CreateObject("htmlfile") ' create a DOM object
    With oHttp
        ' Open creates a new HTTP request and specifies its method, URL and credentials (user name/password)
        ' send sends the request to the HTTP server and receives the response
        .Open "GET", Url, False ' GET request; False means synchronous (not asynchronous) loading
        .send ' send the request prepared by Open to the web server
        TML = .responseText ' assign the response HTML to the DOM object; only the content inside the body tag is needed
    End With
    dom (odom)
End Sub

Sub dom(odom As Object)
    i = 2
    For Each Item In 
        If ame = "list-item" Then
            For Each itemch In en
                If ame = "list-item-heading" Then
                    Range("a" & i) = ext
                ElseIf ame = "list-subitem" Then
                    Range("b" & i) = en(1).innerText
                    Range("c" & i) = en(3).innerText
                    i = i + 1
                End If
            Next
            Exit For
        End If
    Next
End Sub

Method 2: convert to XML and use XPath (more cumbersome)

Sub Main()
    Url = "/public-holidays-by-date/"
    Set oHttp = CreateObject("P") ' create an XMLHTTP object
    Set odom = CreateObject("htmlfile") ' create a DOM object
    With oHttp
        ' Open creates a new HTTP request and specifies its method, URL and credentials (user name/password)
        ' send sends the request to the HTTP server and receives the response
        .Open "GET", Url, False ' GET request; False means synchronous (not asynchronous) loading
        .send ' send the request prepared by Open to the web server
        TML = .responseText ' assign the response HTML to the DOM object; only the content inside the body tag is needed
    End With
    ' The HTML text must be normalized before it can be loaded into the XML document
    ' and queried with the built-in XPath. For example, every node needs both a start
    ' and an end tag, and attribute values must be enclosed in double quotes.
    ' For example:
    'sXML = ""
    'sXML = sXML & " true"
    'sXML = sXML & " APCD03"
    'sXML = sXML & " OIS"
    'sXML = sXML & " "
    'sXML = sXML & " "
    'sXML = sXML & " false"
    'sXML = sXML & " APCD04"
    'sXML = sXML & " OIS"
    'sXML = sXML & " "
    ' sXML
    Dim sXML As String, xDoc, a, nodelist, node
    For Each Item In 
        If ame = "list-item" Then
            sXML = TML
            Exit For
        End If
    Next
    sXML = rr(sXML, "", "")
    sXML = rr(sXML, "class=.*?>", ">")
    Set xDoc = CreateObject("ument")
    a = L(sXML)
    ' a is True when the load succeeds and False when it fails
    ' a
    ' If a is False, write sXML to a text file first and check which parts still violate the XML rules
    'file = & ""
    'Open file For Output As #1
    'Print #1, sXML
    'Close #1
    Set nodelist = Nodes("//P")
    Set node = SingleNode("//P")
    ' For Each Item In xt
End Sub

Function rr(str As String, pattern As String, repstr As String)
    Set reg = CreateObject("")
    With 
        = n = pattern
    End With
    rr = e(str, repstr)
End Function
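Both approaches can be tried offline on a small hard-coded fragment before pointing them at a live page. The following is a minimal sketch under stated assumptions, not a reconstruction of the listings above: the sample HTML, the procedure names DomSketch, XPathSketch and RegexReplace, and the ProgIDs "VBScript.RegExp" and "MSXML2.DOMDocument.6.0" are chosen here for illustration and do not come from the original code.

```vba
Option Explicit

' Hypothetical helper: replace every match of pattern in text (late-bound VBScript.RegExp).
Function RegexReplace(ByVal text As String, ByVal pattern As String, _
                      ByVal replacement As String) As String
    Dim reg As Object
    Set reg = CreateObject("VBScript.RegExp")
    With reg
        .Global = True          ' replace all matches, not only the first
        .IgnoreCase = True
        .Pattern = pattern
    End With
    RegexReplace = reg.Replace(text, replacement)
End Function

' Approach 1: walk the HTML DOM and filter elements by className.
Sub DomSketch()
    Dim odom As Object, el As Object
    Set odom = CreateObject("htmlfile")
    ' Assumed sample markup standing in for the downloaded page body
    odom.body.innerHTML = "<div class=""list-item""><p>1 Jan</p><p>New Year's Day</p></div>"
    For Each el In odom.all
        If el.className = "list-item" Then
            ' children is zero-based: children(0) is the first child element
            Debug.Print el.children(0).innerText, el.children(1).innerText
            Exit For
        End If
    Next el
End Sub

' Approach 2: clean the markup into well-formed XML, then query it with XPath.
Sub XPathSketch()
    Dim sXML As String, xDoc As Object, node As Object
    ' Assumed fragment with upper-case tags and unquoted attributes,
    ' which is not well-formed XML until the attributes are removed
    sXML = "<DIV class=list-item><P class=day>1 Jan</P><P class=name>New Year's Day</P></DIV>"
    ' Drop everything from " class=" up to the closing ">" of each tag
    sXML = RegexReplace(sXML, "\s+class=[^>]*>", ">")
    Set xDoc = CreateObject("MSXML2.DOMDocument.6.0")
    If Not xDoc.loadXML(sXML) Then
        ' parseError explains why the text was rejected
        Debug.Print "Load failed: " & xDoc.parseError.reason
        Exit Sub
    End If
    ' XML is case-sensitive, so //P matches <P> but not <p>
    For Each node In xDoc.selectNodes("//P")
        Debug.Print node.Text
    Next node
End Sub
```

Run either procedure from the VBA editor and read the results in the Immediate window. The sketch uses MSXML 6.0 because its default selection language is XPath; with the version-independent "MSXML2.DOMDocument" ProgID (MSXML 3.0), setting the "SelectionLanguage" property to "XPath" via setProperty is needed before anything beyond simple paths behaves as expected. Checking parseError.reason is usually quicker than dumping the string to a text file when loadXML returns False.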

Posted by: admin. Please credit the source when reposting: http://www.yc00.com/web/1687985719a63939.html
