scrapy_selector_css: basic syntax | 江阴雨辰互联

Posted on June 29, 2023

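CSS selectors

Expression            Example               Description
*                     *                     Selects all elements
E                     p                     Selects E elements
E1,E2                 div,pre               Selects E1 elements and E2 elements
E1 E2                 div p                 Selects E2 elements that are descendants of an E1 element
E1>E2                 div>p                 Selects E2 elements that are children of an E1 element
E1+E2                 p+…                   Selects E2 elements that immediately follow a sibling E1 element
.CLASS                …                     Selects elements whose class attribute contains CLASS
#ID                   #main                 Selects the element whose id attribute is ID
[ATTR]                [href]                Selects elements that have an ATTR attribute
[ATTR=VALUE]          [method=post]         Selects elements whose ATTR attribute equals VALUE
[ATTR~=VALUE]         [class~=clearfix]     Selects elements whose ATTR attribute contains VALUE as one of its whitespace-separated words
E:nth-child(n)        a:nth-child(1)        Selects E elements that are the n-th child of their parent
E:nth-last-child(n)   a:nth-last-child(2)   Selects E elements that are the n-th child of their parent, counting from the end
E:first-child         a:first-child         Selects E elements that are the first child of their parent
E:last-child          a:last-child          Selects E elements that are the last child of their parent
E:empty               div:empty             Selects E elements that have no children
E::text               p::text               Selects the text nodes (Text Node) of E elements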

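# 《精通 Scrapy 网络爬虫》, Chapter 3, Section 4 (i.e. 3.4): CSS example
from scrapy.http import HtmlResponse

# NOTE: the HTML tags of this body were lost in the source; only the text survived.
# The markup below is reconstructed to be consistent with the selectors and the
# output that follow. Attribute values (href, src, style) are assumptions.
body = '''
<html>
    <head>
        <base href='http://example.com/' />
        <title>Example website</title>
    </head>
    <body>
        <div id='images-1' style='width: 1230px;'>
            <a href='image1.html'>Name:Image 1 <br/><img src='image1.jpg' /></a>
            <a href='image2.html'>Name:Image 2 <br/><img src='image2.jpg' /></a>
            <a href='image3.html'>Name:Image 3 <br/><img src='image3.jpg' /></a>
        </div>

        <div id='images-2'>
            <a href='image4.html'>Name:Image 4 <br/><img src='image4.jpg' /></a>
            <a href='image5.html'>Name:Image 5 <br/><img src='image5.jpg' /></a>
        </div>
    </body>
</html>
'''
response = HtmlResponse(url='/', body=body, encoding='utf-8')

# E: select E elements
print('[1]==========E: select E elements==========')
print(response.css('img'))  # equivalent to print(response.xpath('//img'))

# E1,E2: select E1 and E2 elements
print('[2]==========E1,E2: select E1 and E2 elements==========')
print(response.css('base,title'))

# E1 E2: select E2 elements that are descendants of E1
print('[3]==========E1 E2: select E2 elements that are descendants of E1==========')
print(response.css('div img'))  # equivalent to print(response.xpath('//div//img'))

# E1>E2: select E2 elements that are children of E1
print('[4]==========E1>E2: select E2 elements that are children of E1==========')
print(response.css('body>div'))

# [ATTR]: select elements that have an ATTR attribute
print('[5]==========[ATTR]: select elements that have an ATTR attribute==========')
print(response.css('[style]'))  # equivalent to print(response.xpath('//*[@style]'))

# [ATTR=VALUE]: select elements whose ATTR attribute equals VALUE
print('[6]==========[ATTR=VALUE]: select elements whose ATTR attribute equals VALUE==========')
print(response.css('[id="images-1"]'))  # equivalent to print(response.xpath('//*[@id="images-1"]'))

# E:nth-child(n): select E elements that are the n-th child of their parent
print('[7]==========E:nth-child(n): select E elements that are the n-th child of their parent==========')
# Select the first <a> of every div
print(response.css('div>a:nth-child(1)'))
# Select the first <a> of the second div
print(response.css('div:nth-child(2)>a:nth-child(1)'))
# E:first-child: select E elements that are the first child of their parent
# E:last-child: select E elements that are the last child of their parent
print(response.css('div:first-child>a:first-child'))
print(response.css('div:last-child>a:last-child'))

# E::text: select the text nodes of E elements
print('[8]==========E::text: select the text nodes of E elements==========')
print(response.css('a::text').extract())  # equivalent to print(response.xpath('//a/text()').extract())

Output (each Selector repr is shown only as <Selector>):

---------------------------
D: D:/Project0611/ScrapyBook/practise/
[1]==========E: select E elements==========
[<Selector>, <Selector>, <Selector>, <Selector>, <Selector>]
[2]==========E1,E2: select E1 and E2 elements==========
[<Selector>, <Selector>]
[3]==========E1 E2: select E2 elements that are descendants of E1==========
[<Selector>, <Selector>, <Selector>, <Selector>, <Selector>]
[4]==========E1>E2: select E2 elements that are children of E1==========
[<Selector>, <Selector>]
[5]==========[ATTR]: select elements that have an ATTR attribute==========
[<Selector>]
[6]==========[ATTR=VALUE]: select elements whose ATTR attribute equals VALUE==========
[<Selector>]
[7]==========E:nth-child(n): select E elements that are the n-th child of their parent==========
[<Selector>, <Selector>]
[<Selector>]
[<Selector>]
[<Selector>]
[8]==========E::text: select the text nodes of E elements==========
['Name:Image 1 ', 'Name:Image 2 ', 'Name:Image 3 ', 'Name:Image 4 ', 'Name:Image 5 ']

Process finished with exit code 0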