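Articles in this series

  1. Web Crawler Lesson 1: Deconstructing the Basic Concepts of Crawlers Through a Case Study
  2. Filling a Hole I Left 18 Years Ago: My Summary of CSS Selectors
  3. Choosing a Data Persistence Method for Crawlers
  4. Crawling Static Blog Pages to Analyze This Site's Topology
  5. Performance Testing and Bottleneck Analysis of Python Programs
  6. A Guide to Standardized Development of Python Projects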

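CSS Selectors

This is a hole I should have filled 18 years ago.

Eighteen years ago, I was an impatient kid, the kind who was ready to give up at the first difficulty. "CSS selectors" was one of those difficulties. Over all these years, this hole has come back to torment me every now and then, and it has cost me many opportunities. Having learned my lesson the hard way, today I am going to settle it once and for all, with the help of two reference materials.

Below is an overview of the CSS selectors summarized in this article. It really is simple~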

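Basic selectors

Selector | Example | Meaning
element | p | Element selector: selects HTML elements by tag name; p selects all <p> elements.
.class | .intro | Class selector: selects elements by class name; .intro selects all elements whose class attribute includes intro.
#id | #firstname | ID selector: selects an element by its unique identifier (ID); #firstname selects the element with id="firstname".
* | * | Universal selector: selects all elements.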
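
To make the table concrete, here is a minimal Python sketch of how each basic selector decides whether it matches a single element. The `Element` class and the `matches_simple` function are hypothetical helpers made up for illustration; they are not part of any crawler library, and a real HTML parser builds a much richer tree.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical, minimal stand-in for a parsed HTML element."""
    tag: str                      # stored lowercase, as HTML parsers normalize it
    attrs: dict = field(default_factory=dict)

    @property
    def classes(self) -> list:
        # The class attribute is a whitespace-separated list of names.
        return self.attrs.get("class", "").split()


def matches_simple(el: Element, selector: str) -> bool:
    """Match one basic selector: '*', 'tag', '.class' or '#id'."""
    if selector == "*":                       # universal selector
        return True
    if selector.startswith("."):              # class selector: name must be in the list
        return selector[1:] in el.classes
    if selector.startswith("#"):              # ID selector
        return el.attrs.get("id") == selector[1:]
    return el.tag == selector.lower()         # element (type) selector, case-insensitive


page = [
    Element("p", {"class": "intro lead"}),
    Element("input", {"id": "firstname"}),
    Element("div", {"class": "intro"}),
]
for sel in ("p", ".intro", "#firstname", "*"):
    print(sel, [e.tag for e in page if matches_simple(e, sel)])
```

Note that `.intro` also matches the `<p>` whose class is `"intro lead"`: a class selector tests membership in the class list, not equality with the whole attribute.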

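Combining multiple selectors

Selector | Example | Meaning
element,element | div, p | Selects all <div> elements and all <p> elements.
element element | div p | Selects all <p> elements inside <div> elements (descendants at any depth).
element>element | div > p | Selects all <p> elements whose parent is a <div> element.
element+element | div + p | Selects every <p> element placed immediately after a <div> element (the two share a parent).
element1~element2 | p ~ ul | Selects every <ul> element that has an earlier sibling <p> element.
.class1.class2 | .name1.name2 | Selects all elements whose class attribute contains both name1 and name2.
.class1 .class2 | .name1 .name2 | Selects all elements with class name2 that are descendants of an element with class name1.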
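
The four combinators in the table differ only in which related element they inspect: any ancestor (space), the parent (>), the sibling immediately before (+), or any earlier sibling (~). The Python sketch below shows that difference. It assumes a hypothetical `Node` tree and tag-only selectors on both sides (no classes, IDs, or chains longer than two parts), so it is an illustration, not a selector engine.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by (recursive) content
class Node:
    """Hypothetical tree node: a tag plus parent/children links."""
    tag: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, *kids: "Node") -> "Node":
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def walk(node: Node) -> Iterator[Node]:
    """Yield nodes in document order (pre-order)."""
    yield node
    for child in node.children:
        yield from walk(child)


def ancestors(node: Node) -> Iterator[Node]:
    p = node.parent
    while p is not None:
        yield p
        p = p.parent


def preceding_siblings(node: Node) -> List[Node]:
    if node.parent is None:
        return []
    sibs = node.parent.children
    return sibs[: sibs.index(node)]


def matches(node: Node, left: str, comb: str, right: str) -> bool:
    """Does `node` match `left <comb> right` (tag names only)?"""
    if node.tag != right:
        return False
    if comb == " ":                       # descendant: any ancestor
        return any(a.tag == left for a in ancestors(node))
    if comb == ">":                       # child: the parent only
        return node.parent is not None and node.parent.tag == left
    prev = preceding_siblings(node)
    if comb == "+":                       # adjacent sibling: the one right before
        return bool(prev) and prev[-1].tag == left
    if comb == "~":                       # general sibling: any earlier sibling
        return any(s.tag == left for s in prev)
    raise ValueError(f"unsupported combinator {comb!r}")


def select(root: Node, left: str, comb: str, right: str) -> List[Node]:
    return [n for n in walk(root) if matches(n, left, comb, right)]


# <body><div><p/><section><p/></section></div><p/><span/><ul/></body>
p1, p2, p3, ul = Node("p"), Node("p"), Node("p"), Node("ul")
div = Node("div").add(p1, Node("section").add(p2))
body = Node("body").add(div, p3, Node("span"), ul)
```

With this tree, `div p` finds both paragraphs inside the `<div>`, `div > p` only the direct child `p1`, `div + p` only `p3`, and `p ~ ul` finds the `<ul>` even though a `<span>` sits between them.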

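Attribute selectors

Selector | Example | Meaning
element.class | p.intro | Selects all <p> elements whose class attribute includes intro.
[attribute] | [target] | Selects all elements that have a target attribute.
[attribute=value] | [target=_blank] | Selects all elements with target="_blank".
[attribute~=value] | [title~=flower] | Selects all elements whose title attribute contains the word "flower" as one of its whitespace-separated words.
[attribute|=value] | [lang|=en] | Selects all elements whose lang attribute is exactly "en" or begins with "en-".
[attribute^=value] | a[href^="https"] | Selects every <a> element whose href attribute value begins with "https".
[attribute$=value] | a[href$=".pdf"] | Selects every <a> element whose href attribute value ends with ".pdf".
[attribute*=value] | a[href*="abc"] | Selects every <a> element whose href attribute value contains the substring "abc".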
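
The value operators are easy to confuse, so here they are modeled as plain string tests in Python. `attr_matches` is a hypothetical helper; it assumes case-sensitive comparison (the default for most attribute values) and uses `None` for an attribute that is absent.

```python
from typing import Optional


def attr_matches(value: Optional[str], op: Optional[str], expected: str = "") -> bool:
    """Hypothetical sketch of CSS attribute-selector tests on one attribute value.

    value    -- the element's attribute value, or None if the attribute is absent
    op       -- None for [attr], otherwise '=', '~=', '|=', '^=', '$=' or '*='
    expected -- the value written in the selector
    """
    if value is None:                  # a missing attribute never matches
        return False
    if op is None:                     # [attr]: presence is enough
        return True
    if op == "=":                      # exact match
        return value == expected
    if op == "~=":                     # one whitespace-separated word equals expected
        return expected in value.split()
    if op == "|=":                     # exactly expected, or expected followed by '-'
        return value == expected or value.startswith(expected + "-")
    if op == "^=":                     # prefix (an empty expected matches nothing)
        return expected != "" and value.startswith(expected)
    if op == "$=":                     # suffix
        return expected != "" and value.endswith(expected)
    if op == "*=":                     # substring
        return expected != "" and expected in value
    raise ValueError(f"unknown operator {op!r}")


print(attr_matches("a red flower", "~=", "flower"))  # True: a whole word
print(attr_matches("flowers", "~=", "flower"))       # False: not a whole word
print(attr_matches("en-US", "|=", "en"))             # True
print(attr_matches("english", "|=", "en"))           # False: no '-' after "en"
```

For a crawler, a test such as `a[href$=".pdf"]` is a convenient way to pick out download links without inspecting link text.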

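Pseudo-class selectors

A CSS pseudo-class is a keyword added to a selector that specifies a special state of the selected element. For example, the pseudo-class :hover can be used to select a button so that the button is styled while the user's pointer hovers over it.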

/* Any button the user's pointer is hovering over */
button:hover {
  color: blue;
}

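A pseudo-class consists of a colon (:) followed by the pseudo-class name (for example, :hover). A functional pseudo-class also contains a pair of parentheses that define its arguments (for example, :dir()). The element a pseudo-class is attached to is called the _anchor element_ (for example, button in button:hover).

Pseudo-classes let you style an element based not only on the content of the document tree but also on external factors: the navigation history (for example, :visited), the state of its content (such as :checked on certain form elements), or the position of the mouse (such as :hover, which tells you whether the mouse is over an element).

Tree-structural pseudo-classes (12)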

These pseudo-classes relate to the location of an element within the document tree.

:root
Represents an element that is the root of the document. In HTML this is usually the <html> element.
:empty
Represents an element with no children other than white-space characters.
:nth-child
Uses An+B notation to select elements from a list of sibling elements.
:nth-last-child
Uses An+B notation to select elements from a list of sibling elements, counting backwards from the end of the list.
:first-child
Matches an element that is the first of its siblings.
:last-child
Matches an element that is the last of its siblings.
:only-child
Matches an element that has no siblings. For example, a list item with no other list items in that list.
:nth-of-type
Uses An+B notation to select elements from the list of sibling elements of the same type.
:nth-last-of-type
Uses An+B notation to select elements from the list of sibling elements of the same type, counting backwards from the end of the list.
:first-of-type
Matches an element that is the first of its siblings of the same type.
:last-of-type
Matches an element that is the last of its siblings of the same type.
:only-of-type
Matches an element that has no siblings of the chosen type selector.
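
The An+B notation selects the sibling at 1-based position A·n + B for n = 0, 1, 2, and so on; the keyword odd means 2n+1 and even means 2n. The Python sketch below uses hypothetical helpers and a plain list of sibling tag names instead of a real DOM. It shows how the notation is parsed and the key difference between :nth-child (counts every sibling) and :nth-of-type (counts only siblings with the same tag). The "last" variants work the same way, counting from the end of the list.

```python
import re

# "An+B" with optional sign and coefficient, e.g. "2n+1", "-n+3", "n", "3n".
_ANB = re.compile(r"^([+-]?\d*)n([+-]\d+)?$")


def parse_anb(text: str) -> tuple:
    """Parse 'odd', 'even', 'B' or 'An+B' into (A, B). Hypothetical helper."""
    t = text.replace(" ", "").lower()
    if t == "odd":
        return 2, 1
    if t == "even":
        return 2, 0
    m = _ANB.match(t)
    if m is None:                      # plain integer such as "3"
        return 0, int(t)
    coef, offset = m.groups()
    a = {"": 1, "+": 1, "-": -1}.get(coef)
    if a is None:
        a = int(coef)
    return a, int(offset) if offset else 0


def anb_matches(a: int, b: int, index: int) -> bool:
    """True if the 1-based index equals a*n + b for some integer n >= 0."""
    if a == 0:
        return index == b
    diff = index - b
    return diff % a == 0 and diff // a >= 0


def nth_child(tags: list, tag: str, formula: str) -> list:
    """1-based positions matched by `tag:nth-child(formula)`: counts ALL siblings."""
    a, b = parse_anb(formula)
    return [i for i, t in enumerate(tags, 1) if t == tag and anb_matches(a, b, i)]


def nth_of_type(tags: list, tag: str, formula: str) -> list:
    """1-based positions matched by `tag:nth-of-type(formula)`: same-tag siblings only."""
    a, b = parse_anb(formula)
    out, k = [], 0
    for i, t in enumerate(tags, 1):
        if t == tag:
            k += 1                     # position among siblings of this type
            if anb_matches(a, b, k):
                out.append(i)
    return out


siblings = ["h2", "p", "p", "div", "p"]   # children of one parent, in order
print(nth_child(siblings, "p", "2"))      # [2]: the 2nd child happens to be a <p>
print(nth_of_type(siblings, "p", "2"))    # [3]: the 2nd <p> is the 3rd child
```

The formula -n+3 is a handy case: it matches positions 3, 2 and 1 (n = 0, 1, 2), so `li:nth-child(-n+3)` selects the first three list items.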

Location pseudo-classes (7)

These pseudo-classes relate to links, and to targeted elements within the current document.

:any-link
Matches an element if the element would match either :link or :visited.
:link
Matches links that have not yet been visited.
:visited
Matches links that have been visited.
:local-link
Matches links whose absolute URL is the same as the target URL. For example, anchor links to the same page.
:target
Matches the element which is the target of the document URL.
:target-within
Matches elements which are the target of the document URL, but also elements which have a descendant which is the target of the document URL.
:scope
Represents elements that are a reference point for selectors to match against.

Functional pseudo-classes (4)

These pseudo-classes accept a selector list or forgiving selector list as a parameter.

:is()
The matches-any pseudo-class matches any element that matches any of the selectors in the list provided. The list is forgiving.
:not()
The negation, or matches-none, pseudo-class represents any element that is not represented by its argument.
:where()
The specificity-adjustment pseudo-class matches any element that matches any of the selectors in the list provided without adding any specificity weight. The list is forgiving.
:has()
The relational pseudo-class represents an element if any of the relative selectors match when anchored against the attached element.
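
:is() and :where() accept the same arguments; the practical difference is specificity. :is(), :not() and :has() take the specificity of their most specific argument, while :where() always contributes zero. The rough Python sketch below computes specificity as an (IDs, classes/attributes/pseudo-classes, types) tuple. The function names are hypothetical, and it assumes a restricted grammar: one complex selector, no pseudo-elements, no `:nth-child(... of S)`, and no escapes or `]` inside attribute values.

```python
import re

# Specificity as (A, B, C): A = IDs, B = classes/attributes/pseudo-classes,
# C = type selectors. Python tuples compare left to right, like CSS specificity.
ZERO = (0, 0, 0)


def _add(x: tuple, y: tuple) -> tuple:
    return tuple(p + q for p, q in zip(x, y))


def _split_top(arg: str) -> list:
    """Split a selector list on commas that are not inside parentheses."""
    parts, depth, cur = [], 0, ""
    for ch in arg:
        depth += (ch == "(") - (ch == ")")
        if ch == "," and depth == 0:
            parts.append(cur)
            cur = ""
        else:
            cur += ch
    parts.append(cur)
    return [p.strip() for p in parts]


def specificity(sel: str) -> tuple:
    """Specificity of ONE complex selector (restricted grammar, see above)."""
    total, i = ZERO, 0
    while i < len(sel):
        rest = sel[i:]
        if rest[0] == "#":                      # ID selector
            total = _add(total, (1, 0, 0))
            i += re.match(r"#[\w-]+", rest).end()
        elif rest[0] == ".":                    # class selector
            total = _add(total, (0, 1, 0))
            i += re.match(r"\.[\w-]+", rest).end()
        elif rest[0] == "[":                    # attribute selector
            total = _add(total, (0, 1, 0))
            i += rest.index("]") + 1
        elif rest[0] == ":":
            m = re.match(r":([\w-]+)(\(?)", rest)
            i += m.end()
            if not m.group(2):                  # plain pseudo-class, e.g. :hover
                total = _add(total, (0, 1, 0))
                continue
            depth, j = 1, i                     # find the matching ')'
            while depth:
                depth += (sel[j] == "(") - (sel[j] == ")")
                j += 1
            arg, i = sel[i:j - 1], j
            name = m.group(1)
            if name == "where":                 # :where() always adds zero
                pass
            elif name in ("is", "not", "has"):  # most specific argument wins
                total = _add(total, max(specificity(s) for s in _split_top(arg)))
            else:                               # e.g. :nth-child(2) counts once
                total = _add(total, (0, 1, 0))
        elif re.match(r"[A-Za-z]", rest):       # type selector
            total = _add(total, (0, 0, 1))
            i += re.match(r"[A-Za-z][\w-]*", rest).end()
        else:                                   # '*', whitespace, > + ~
            i += 1
    return total


for s in ("#nav .item a", ":is(#nav, .menu) a", ":where(#nav, .menu) a"):
    print(s, specificity(s))
```

Here `:is(#nav, .menu) a` scores (1, 0, 1) because the #nav argument wins, while `:where(#nav, .menu) a` scores only (0, 0, 1). That is why :where() suits default styles that should be easy to override.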

Input pseudo-classes (17)

These pseudo-classes relate to form elements, and enable selecting elements based on HTML attributes and the state that the field is in before and after interaction.

:autofill
Matches when an <input> has been autofilled by the browser.
:enabled
Represents a user interface element that is in an enabled state.
:disabled
Represents a user interface element that is in a disabled state.
:read-only
Represents any element that cannot be changed by the user.
:read-write
Represents any element that is user-editable.
:placeholder-shown
Matches an input element that is currently displaying placeholder text, for example an <input> or <textarea> element whose placeholder attribute text is shown.
:default
Matches one or more UI elements that are the default among a set of elements.
:checked
Matches when elements such as checkboxes and radio buttons are toggled on.
:indeterminate
Matches UI elements when they are in an indeterminate state.
:blank
Matches a user-input element which is empty, containing an empty string or other null input.
:valid
Matches an element with valid contents. For example, an input element with the type ‘email’ that contains a validly formed email address or an empty value if the control is not required.
:invalid
Matches an element with invalid contents. For example, an input element with type ‘email’ that contains a plain name rather than an email address.
:in-range
Applies to elements with range limitations. For example, a slider control when the selected value is in the allowed range.
:out-of-range
Applies to elements with range limitations. For example, a slider control when the selected value is outside the allowed range.
:required
Matches when a form element is required.
:optional
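Matches when a form element is optional.
:user-invalid
Like :invalid, but matches only after the user has interacted with the control, for example by leaving it or attempting to submit the form.

User action pseudo-classes (5)

These pseudo-classes require some interaction by the user in order to apply, such as holding the mouse pointer over an element.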
:hover
Matches when a user designates an item with a pointing device, such as holding the mouse pointer over the item.
:active
Matches when an item is being activated by the user. For example, when the item is clicked on.
:focus
Matches when an element has focus.
:focus-visible
Matches when an element has focus and the user agent identifies that the element should be visibly focused.
:focus-within
Matches an element to which :focus applies, plus any element that has a descendant to which :focus applies.

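Time-dimensional pseudo-classes

See [6].

Element display state pseudo-classes

See [6].

Linguistic pseudo-classes

See [6].

Resource state pseudo-classes

See [6].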


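(I'll fill these in when I need them; my crawler doesn't use them yet. For pseudo-classes I follow the organization of [6] and treat the w3school material [7] as the authoritative source.)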

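Conclusion

I never expected CSS selectors to be this simple, yet I let this Achilles' heel torment me for more than a decade.

Whenever I think of how this failing has tormented me, there isn't a day I don't regret it: not just this one thing, but many similar episodes. Looking back on the past, I want to talk to that little fool who made the mistake. Tell him how I feel now. Tell him there are other ways to solve problems. But I can't anymore. That boy is long gone, and all that's left is this old body of mine. So I have to accept it.

But rehabilitate myself? That's a bullshit word. What difference would it make if I changed? Given that, I shouldn't care at all.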

References and notes

…content from www.runoob.com/cssref/css-selectors.html.


  6. The concept and categories of pseudo-classes. English version: Pseudo-classes
  7. w3school's CSS Selector Reference, used here as the authoritative reference.