
Web追踪技术综述

王晓茜, 刘奇旭, 刘潮歌, 张方娇, 刘心宇, 崔翔

Citation: Wang Xiaoxi, Liu Qixu, Liu Chaoge, Zhang Fangjiao, Liu Xinyu, Cui Xiang. Survey of Web Tracking[J]. Journal of Computer Research and Development, 2023, 60(4): 839-859. DOI: 10.7544/issn1000-1239.202110681. CSTR: 32373.14.issn1000-1239.202110681


基金项目: 中国科学院青年创新促进会(2019163);国家自然科学基金项目(61902396);中国科学院战略性先导科技专项项目(XDC02040100);中国科学院网络测评技术重点实验室和网络安全防护北京市重点实验室项目
    Corresponding author:

    Liu Qixu (liuqixu@iie.ac.cn)

  • CLC number: TP391

Survey of Web Tracking

Funds: This work was supported by the Youth Innovation Promotion Association of Chinese Academy of Sciences (2019163), the National Natural Science Foundation of China (61902396), the Strategic Priority Research Program of Chinese Academy of Sciences (XDC02040100), and the Project of the CAS Key Laboratory of Network Assessment Technology and Beijing Key Laboratory of Network Security and Protection Technology
More Information
    Author Bio:

    Wang Xiaoxi: born in 1990. PhD candidate. Her main research interests include cyber security and Web tracking

    Liu Qixu: born in 1984. PhD, professor, PhD supervisor. His main research interests include Web security, attribution, and forensics

    Liu Chaoge: born in 1986. PhD, associate professor. His main research interests include malware and Web security

    Zhang Fangjiao: born in 1989. PhD candidate. Her main research interests include cyber attack and defense and cybersecurity talent evaluation

    Liu Xinyu: born in 1997. PhD candidate. Her main research interests include Web security and Android security

    Cui Xiang: born in 1978. PhD, professor, PhD supervisor. His main research interests include malware analysis and Web security

  • Abstract:

    Web tracking has become a research hotspot in the information age and is an important means of user identification and behavior analysis. Following the research results in this field, we analyze the state of research and development in Web tracking from two aspects: tracking technology and defense technology. First, according to how the technology is implemented, we divide Web tracking into storage-based tracking and fingerprint-based tracking and review the current research on each. Second, according to tracking scope, we divide Web tracking into three levels (single-browser tracking, cross-browser tracking, and cross-device tracking), analyze and discuss how features are acquired and what attributes they have, and describe the relationship among features, correlation techniques, and tracking scope. From the perspective of the form a defense takes, we then describe the implementation characteristics and anti-tracking measures of browser extensions, browser-embedded defenses, defense frameworks, tools and mechanisms, and defense countermeasures or environments. Finally, we summarize existing research, analyze the strengths and weaknesses of Web tracking and Web tracking defense technologies, and point out current problems and possible directions for future development.

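  • As Internet service platforms enter a stage in which growth comes from the existing user base, registered users continuously generate interactions of all kinds. Providing more accurate services to existing users has therefore become the primary task of most online service platforms. Sequential recommendation explicitly models users' sequential behavior and predicts the items a user may be interested in at a future time. Early work[1] built correlation models from groups of users who purchased similar items. However, the number of items a user interacts with is negligible compared with the total number of items, which causes a severe sparsity problem. Many studies[2-4] introduce item side information (such as category, title, and home-page image) to enrich item representations. Although this alleviates sparsity to some extent, such information is uploaded by merchants from templates and lacks diversity.

    The Internet hosts a large amount of rich user-generated content: multimodal information created spontaneously by users, including reviews, images, videos, and ratings, which can supplement basic interactions such as purchasing and browsing. Researchers have tried to mine user preferences from user-generated content to address the sparsity problem[5-8]; however, noise in the data, such as advertisements or accidental operations, limits the performance of recommender systems. In Figure 1, for example, the user reviews contain words such as "green" and "discount" that are unrelated to the true preference and interfere with capturing the user's textual preference. Some work[9-10] uses consistency across modalities to reduce the interference of noise, but it must be acknowledged that completely eliminating noise from user-generated content is unrealistic.

    Figure 1. Illustration of noise in user-generated content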

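    The rise of large language models (LLMs) has injected new vitality into recommender systems and offers new possibilities for dealing with noise. An LLM can perform deep semantic analysis, understand a user's implicit intent, and generate personalized recommendations. However, the training data of LLMs is unrelated to recommendation scenarios, so the semantic spaces are hard to align. Some work tries to guide LLMs toward more accurate recommendations by adjusting instructions, but the results are easily disturbed and may suffer from hallucination.

    To address these problems, this paper proposes a large language model-based trusted multi-modal recommendation algorithm (Large-TR). It uses the strong semantic understanding of LLMs to filter noise from multimodal data and models multimodal user preferences. In addition, because noise cannot be eliminated completely, we design a trusted decision mechanism that dynamically evaluates the uncertainty of recommendation results and keeps them usable in high-risk scenarios. Experiments on 4 widely used public datasets show that the proposed algorithm outperforms the baseline algorithms.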

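    In summary, the main contributions of this paper are threefold:

    1) We use a large language model to filter noise from low-quality multimodal information and model more precise and fine-grained user preferences, which effectively improves recommendation performance;

    2) We design a simple but effective trusted decision mechanism that dynamically evaluates the credibility of recommendation results and keeps the recommender system usable in the presence of noise;

    3) We conduct experiments on 4 widely used real-world datasets, and the results show that the proposed algorithm significantly improves performance.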

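    Sequential recommendation models the evolution of user preferences from a user's historical interaction sequence and predicts the next item the user may interact with. Early work on sequential recommendation[11-14] was mostly based on the Markov chain assumption, but sparsity severely limited the achievable performance. Zhou et al.[15] model the relationship between items and attributes to enhance item representations. Xie et al.[16] mine user preferences from textual information such as item descriptions and titles to improve recommendation performance. In many cases images, as visual content, provide information that language cannot express. Inspired by this, some work[17] incorporates visual signals into the modeling of user preferences to predict the next item. In recent years micro-videos have emerged as a new content form on e-commerce platforms, showing products in multiple scenes and from multiple angles; Lei et al.[18] proposed a micro-video-based sequential recommendation algorithm. In short, multimodal sequential recommendation has begun to integrate information from images, text, and other modalities (such as video and audio) to represent items more fully and understand user preferences more accurately. Existing methods still face 2 main problems: 1) they rely only on item-related side information to enrich item representations and do not model user preferences directly, so their understanding of user needs is shallow; 2) multimodal information often contains mismatched or irregular noisy data, which seriously degrades recommender performance. This paper therefore proposes a multimodal sequential recommendation method based on user-generated content, which denoises multimodal content with the semantic understanding of a large language model and mines sequential preferences and behavior trends from the user's perspective.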

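    Language models play a crucial role in the development of recommender systems. Early studies[19-20] found that a language-model framework has the potential to model all kinds of recommendation tasks in a unified way. With the emergence of large language models, the recommendation field has advanced greatly. These models have very strong language understanding, generation, and generalization abilities, and offer new ideas for cold start, personalized recommendation, and explainability. Achieving this, however, usually requires a large amount of recommendation data for training. Recently, Dai et al.[21] used specific instructions to prompt a large language model to generate recommendations directly, further confirming the potential of LLMs in recommendation. Building on this, some studies[22-23] fine-tune on instructions to align the semantic space of the LLM with that of the recommender system as closely as possible and thereby obtain better recommendations. LLM-based recommendation is still at an exploratory stage: its performance depends heavily on instruction quality and the generated results are easily disturbed, so generating robust recommendations is a key issue for future work. For this reason, this paper does not generate recommendations directly with the LLM. Instead, it uses the LLM's demonstrated semantic understanding and analysis ability to filter noise from multimodal interaction data, and injects both the multimodal information and the LLM's knowledge into the recommender system to improve the accuracy and robustness of the results.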

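    Given the historical interaction sequence of user $U$, $S^U=\{I_1,I_2,\ldots,I_N\}$, the user's text reviews of the items in the sequence are denoted $R^U=\{r_1^U,r_2^U,\ldots,r_N^U\}$; correspondingly, all text reviews received by item $I$ are denoted $R^I=\{r_1^I,r_2^I,\ldots,r_M^I\}$. We collect the home-page image of each item in the interaction sequence as the image part of the multimodal content, denoted $G^U=\{g_1^U,g_2^U,\ldots,g_N^U\}$. The goal of this paper is therefore to predict, from the observed interaction sequence $S^U$ and the multimodal content $(R^U,R^I,G^U)$, the probability that the user interacts with each unobserved item, and to provide the uncertainty of that prediction.

    The overall architecture of the proposed model is shown in Figure 2. It consists of 3 parts: an LLM denoising module, a multi-view sequential preference module, and a trusted decision module. The design of Large-TR rests on the following assumptions: 1) large language models can understand the semantics of images and text; 2) user-generated multimodal information is an important supplement for revealing user preferences; 3) effective recommendations can be obtained from both the user side and the item side.

    Figure 2. Overview of Large-TR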

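    A user's text reviews are an important source of information about personal preferences, but noise such as advertisements and mistaken reviews is common. Item images, as templated content uploaded by merchants, are of higher quality and also provide visual information that language cannot express. By analyzing the correlation between image and text, we can better judge whether a review is genuine and accurate. Large language models generalize well in sparse image-text matching scenarios and achieve considerable performance on few-shot and even zero-shot tasks. We assume that the LLM already has the ability to understand image and text and to judge whether they are consistent; what remains is to trigger this ability by adjusting the prompt. A prompt is a piece of text given to the LLM to set the context or indicate how it should continue generating text, and prompts are widely used in natural language processing tasks such as text generation and document summarization. This paper uses the LLM's image-text understanding to judge whether a user's review text is related to the item image, and thereby filters noise from the text. To this end, we draw on work on in-context learning[24] and instruction tuning[22] and express this ability as a task with a domain-specific prompt.

    The left side of Figure 2 shows how we adjust the prompt to elicit image-text matching from the LLM. Our prompt has 2 parts: 1) Description: domain-related text designed to strengthen the LLM's awareness of the knowledge required for the task. 2) Task: given a user's review and the corresponding item image, the LLM is asked to give an answer. During optimization we add examples for the LLM to help it understand the task requirements precisely. The model output is then evaluated against ground-truth labels, and the prompt design is improved iteratively based on the evaluation to raise the model's image-text matching ability. The prompt optimization process is given in Appendix A.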
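    The two-part prompt described above (a description plus a task, optionally preceded by examples) could be assembled roughly as follows. This is a minimal sketch, not the prompt from Appendix A: the function name `build_match_prompt`, the wording of the description, and the yes/no answer format are all hypothetical, and the item image is assumed to be passed to the multimodal model separately.

```python
def build_match_prompt(review, examples=()):
    """Assemble a description + examples + task prompt (hypothetical wording)."""
    # Part 1: domain description, meant to cue the knowledge the task needs.
    description = (
        "You are reviewing e-commerce feedback. A review is relevant when it "
        "talks about the product shown in the attached image."
    )
    lines = [description, ""]
    # Optional in-context examples: (review text, expected label) pairs.
    for ex_review, ex_label in examples:
        lines.append(f'Example review: "{ex_review}" -> {ex_label}')
    # Part 2: the task itself, posed on the review to be checked.
    lines.append(f'Review: "{review}"')
    lines.append(
        "Does the review describe the product in the attached image? "
        "Answer yes or no."
    )
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_match_prompt("Nice colour, fits well",
                             [("Great discount code inside", "no")]))
```

    Reviews for which the model answers "no" would then be treated as noise and dropped before embedding; how the answer is parsed is left open here.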

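    To use the LLM-filtered multimodal information more effectively in the downstream recommendation task, we embed text and images with pre-trained models. For text, we split the input sentence into tokens and add special tokens such as [CLS] at the start and [SEP] at the end of the sentence. The pre-trained model outputs the hidden state of [CLS], which represents the semantics of the whole sentence and is used for downstream tasks. Specifically, we treat $r_n^U$ and $r_m^I$ as token sequences, i.e. $r_n^U=\{token_t\}_{t=1}^{|r_n^U|}$ and $r_m^I=\{token_t\}_{t=1}^{|r_m^I|}$, and embed each review as follows:

    $$\boldsymbol{t}_n^U=F_{\mathrm{t}}\left(r_n^U\right),\quad \boldsymbol{t}_m^I=F_{\mathrm{t}}\left(r_m^I\right),\quad \boldsymbol{t}_n^U,\boldsymbol{t}_m^I\in\mathbb{R}^{d_{\mathrm{t}}}, \tag{1}$$

    where $F_{\mathrm{t}}(\cdot)$ is the BERT (bidirectional encoder representations from Transformers) model[25]. From Eq. (1) we obtain the text embeddings $\boldsymbol{T}^U=[\boldsymbol{t}_1^U,\boldsymbol{t}_2^U,\ldots,\boldsymbol{t}_N^U]$ and $\boldsymbol{T}^I=[\boldsymbol{t}_1^I,\boldsymbol{t}_2^I,\ldots,\boldsymbol{t}_M^I]$. By aggregating and analyzing the varied reviews that different users give the same item, we can extract more complete and richer item features, and also filter out, to some extent, user textual preferences that are unrelated to the item. For these reasons, we use a co-attention mechanism[26] to extract the matching pattern between $R^U$ and $R^I$ and re-model the user's textual preference. The mechanism has 3 steps: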

    1) Build the affinity matrix.

    $$\boldsymbol{A}^{UI}=\tanh\left(\left(\boldsymbol{T}^I\right)^{\mathrm{T}}\boldsymbol{M}^R\boldsymbol{T}^U\right),\quad \boldsymbol{A}^{UI}\in(-1,1)^{M\times N}, \tag{2}$$

    where $\boldsymbol{M}^R$ is a parameter matrix, and the element in row $i$ and column $j$ of $\boldsymbol{A}^{UI}$ is the similarity between the $i$-th review in $R^I$ and the $j$-th review in $R^U$.

    2) Apply a row-wise maximum to $\boldsymbol{A}^{UI}$, then use softmax to produce the relevance vector.

    $$\boldsymbol{a}^{UI}=\mathrm{softmax}\left(\mathrm{RowMax}\left(\boldsymbol{A}^{UI}\right)\right),\quad \boldsymbol{a}^{UI}\in(0,1)^{M}. \tag{3}$$

    3) Aggregate the item review matrix with the attention weights to compute the final relevance embedding, and concatenate it with the original embedding.

    $$\tilde{\boldsymbol{t}}_n^U=\boldsymbol{t}_n^U\oplus\boldsymbol{a}^{UI}\boldsymbol{T}^I,\quad \tilde{\boldsymbol{t}}_n^U\in\mathbb{R}^{d_{\mathrm{t}}}, \tag{4}$$
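    The three co-attention steps in Eqs. (2)-(4) can be traced with a small pure-Python sketch. It is only an illustration under stated assumptions: embeddings are plain lists (one vector per review), the parameter matrix `MR` is supplied by the caller rather than learned, and the concatenation yields a vector of length $2d_{\mathrm{t}}$ because no projection back to $d_{\mathrm{t}}$ is applied here.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _matvec(matrix, v):
    return [_dot(row, v) for row in matrix]


def _softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def co_attention(TU, TI, MR):
    """TU: N user-review vectors, TI: M item-review vectors, MR: d x d matrix."""
    # Step 1 (Eq. 2): affinity A[i][j] between item review i and user review j.
    A = [[math.tanh(_dot(ti, _matvec(MR, tu))) for tu in TU] for ti in TI]
    # Step 2 (Eq. 3): row-wise max, then softmax over the M item reviews.
    a = _softmax([max(row) for row in A])
    # Step 3 (Eq. 4): attention-weighted sum of item reviews, concatenated
    # to every user-review embedding.
    d = len(TI[0])
    context = [sum(a[i] * TI[i][k] for i in range(len(TI))) for k in range(d)]
    return [tu + context for tu in TU], a


if __name__ == "__main__":
    TU = [[1.0, 0.0], [0.0, 1.0]]
    TI = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    rebuilt, weights = co_attention(TU, TI, identity)
    print(weights, rebuilt[0])
```

    In this toy input every item review has the same maximum affinity, so the relevance weights are uniform and the context vector is the plain average of the item-review embeddings.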

    In Eq. (4), $\oplus$ denotes vector concatenation. This step re-models each $r_n^U$ in $S^U$ to obtain the reconstructed user textual preference $\tilde{\boldsymbol{T}}^U=[\tilde{\boldsymbol{t}}_1^U,\tilde{\boldsymbol{t}}_2^U,\ldots,\tilde{\boldsymbol{t}}_N^U]$. Many preferences are hard to convey accurately in words; images give an intuitive visual presentation and capture a user's preferences for an item's appearance, style, color, and so on. As with text, we embed each image with a pre-trained model:

    $$\boldsymbol{z}_n^U=F_{\mathrm{v}}\left(g_n^U\right),\quad \boldsymbol{z}_n^U\in\mathbb{R}^{d_{\mathrm{v}}}, \tag{5}$$

    where $F_{\mathrm{v}}(\cdot)$ is the CLIP (contrastive language-image pre-training) model[27], and the user's visual preference is denoted $\boldsymbol{Z}^U=[\boldsymbol{z}_1^U,\boldsymbol{z}_2^U,\ldots,\boldsymbol{z}_N^U]$. Both embedding modules support flexible embedding dimensions and can be extended to more modalities. These pre-trained models can be regarded as encoders that extract features from each modality. Combining textual and visual preferences strengthens the understanding of a user's multi-level, multi-dimensional preferences. We therefore integrate the textual and visual preferences as the multimodal view:

    $$\boldsymbol{x}_1^U=\boldsymbol{W}_{\mathrm{m}}\left[\boldsymbol{Z}^U\oplus\tilde{\boldsymbol{T}}^U\right],\quad \boldsymbol{x}_1^U\in\mathbb{R}^{d_{\mathrm{m}}\times N}, \tag{6}$$

    where $\boldsymbol{W}_{\mathrm{m}}\in\mathbb{R}^{d_{\mathrm{m}}\times(d_{\mathrm{t}}+d_{\mathrm{v}})}$ is a weight matrix. To keep the user's textual and visual preferences aligned as far as possible and reduce the noise that the textual preference may introduce, this paper uses a cross-modal loss function[28]. Image-text pairs are constructed as $\{(r_i^U,g_j^U),o_{ij}\}_{i,j=1}^{N}$, where $o_{ij}=1$ means the image and text match. The projection probability of an image-text match is then

    $$p_{ij}=\dfrac{\exp\left(\left(\tilde{\boldsymbol{t}}_i^U\right)^{\mathrm{T}}\bar{\boldsymbol{z}}_j^U\right)}{\displaystyle\sum\limits_{k=1}^{N}\exp\left(\left(\tilde{\boldsymbol{t}}_i^U\right)^{\mathrm{T}}\bar{\boldsymbol{z}}_k^U\right)}, \tag{7}$$

    where $\bar{\boldsymbol{z}}_j^U=\dfrac{\boldsymbol{z}_j^U}{\left\|\boldsymbol{z}_j^U\right\|}$ is the normalized image representation and $(\tilde{\boldsymbol{t}}_i^U)^{\mathrm{T}}\bar{\boldsymbol{z}}_j^U$ is the scalar projection of the textual preference $\tilde{\boldsymbol{t}}_i^U$ onto the image preference $\boldsymbol{z}_j^U$. Computing the Kullback-Leibler (KL) divergence between the projection probability $p_{ij}$ and the true matching probability $q_{ij}$ gives the loss between text and image:

    $$\mathcal{L}_{\mathrm{rec}}\left(S^U\right)=\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{N}p_{ij}\ln\frac{p_{ij}}{q_{ij}+\varepsilon}, \tag{8}$$

    where $\varepsilon$ is a very small number used to avoid numerical problems, and $q_{ij}=\dfrac{o_{ij}}{\displaystyle\sum\limits_{k=1}^{N}p_{ik}}$ is the normalized true matching probability of $(r_i^U,g_j^U)$. Minimizing the loss makes the shape of the projection probability distribution approach that of the true matching distribution, so that matched image-text pairs get the largest projection values and unmatched pairs the smallest.
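    Eqs. (7)-(8) can be checked numerically with the sketch below. It is a minimal illustration, not the authors' implementation: the function name is hypothetical, inputs are plain lists, and the true matching probability is obtained by normalizing each row of the match indicators $o_{ij}$ so that it sums to 1, which is an assumption about the intended normalization.

```python
import math


def _softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def cross_modal_projection_loss(T, Z, O, eps=1e-8):
    """T: N text vectors, Z: N image vectors, O: N x N 0/1 match indicators."""
    n = len(T)
    # Normalize every image vector to unit length (z-bar in Eq. 7).
    z_bar = []
    for z in Z:
        norm = math.sqrt(sum(x * x for x in z))
        z_bar.append([x / norm for x in z])
    total = 0.0
    for i in range(n):
        # Eq. 7: softmax over the scalar projections of text i on each image.
        logits = [sum(a * b for a, b in zip(T[i], zb)) for zb in z_bar]
        p = _softmax(logits)
        row_sum = sum(O[i])
        for j in range(n):
            # Assumed normalization: q sums to 1 over the matches of text i.
            q = O[i][j] / row_sum if row_sum else 0.0
            if p[j] > 0.0:  # guard against log(0) after underflow
                total += p[j] * math.log(p[j] / (q + eps))  # Eq. 8 summand
    return total / n


if __name__ == "__main__":
    T = [[5.0, 0.0], [0.0, 5.0]]
    Z = [[2.0, 0.0], [0.0, 3.0]]
    print(cross_modal_projection_loss(T, Z, [[1, 0], [0, 1]]))
    print(cross_modal_projection_loss(T, Z, [[0, 1], [1, 0]]))
```

    With aligned text and image vectors the loss is small when the indicators mark the aligned pairs as matches and large when they mark the opposite pairs, which is the behavior the paragraph above describes.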

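    In recommendation, representing items by a unique identifier (ID) has long been the mainstream practice. This paper recognizes the value of item IDs and wants to preserve the original ID information as far as possible, free from interference by other factors. We therefore treat it as an independent view. Specifically, the interaction sequence $S^U$ is fed into an item embedding layer to obtain the item-view embedding:

    $$\boldsymbol{x}_2^U=F_i\left(S^U\right),\quad \boldsymbol{x}_2^U\in\mathbb{R}^{d_{\mathrm{i}}\times N}. \tag{9}$$

    Item IDs and multimodal information can each support effective recommendation, but simply concatenating or fusing the two loses valuable semantic information. We therefore consider the final recommendation from complementary but independent views $\{\boldsymbol{x}_v^U\}_{v=1}^{|V|}$, where $v=1$ is the multimodal view and $v=2$ is the item-ID view; the number of views can be extended as the recommender system requires. We use a gated recurrent unit (GRU) to capture the sequential information in each view:

    $$\boldsymbol{h}_{v,t}^U=\mathrm{GRU}\left(\boldsymbol{h}_{v,t-1}^U,\boldsymbol{x}_{v,t}^U\right), \tag{10}$$

    where $\boldsymbol{h}_{v,t}^U$ is the hidden state of view $v$, which we use as the preference learned in that view. Feeding the hidden state directly into a softmax layer would produce a recommendation, but softmax pushes the network toward the class with the highest confidence and away from other possible choices. As a result, even inaccurate recommendations can be ranked first with excessive confidence. Users may then receive recommendations that do not match their actual interests, which lowers the reliability of the recommender system and user satisfaction. A method that effectively safeguards the reliability of recommendation results is therefore needed.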

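    Rather than simply merging the information from all views into one representation, as is traditionally done, this paper evaluates the confidence of each view at the evidence level and then integrates the views to obtain the recommendation probability of each item and the overall uncertainty of the current prediction.

    Recommendation is essentially a multi-class classification problem with $|I|$ classes. With a small change, a traditional classifier becomes an evidence-based classifier in which the neural network captures evidence from the input to induce a classification opinion[29]. We replace the softmax layer of the traditional classifier with an activation layer (a ReLU layer). For the $v$-th view, the resulting non-negative values are regarded as the evidence vector $\boldsymbol{e}_v=[e_{v,1},e_{v,2},\ldots,e_{v,|I|}]$. Trusted decision theory based on subjective logic[30] defines a framework for obtaining, from evidence, the probability of each class (belief mass) and the overall uncertainty (uncertainty mass). Evidence is closely tied to the parameters of the Dirichlet distribution: the parameter $\alpha_{v,i}$ is derived from the evidence $e_{v,i}$ as

    $$\alpha_{v,i}=e_{v,i}+1. \tag{11}$$

    Subjective logic assigns a belief mass $b_{v,i}$ to each item and an overall uncertainty $c_v$ to the whole frame under that view. In view $v$, the $|I|+1$ mass values are non-negative and sum to 1:

    $$\sum_{i=1}^{|I|}b_{v,i}+c_v=1, \tag{12}$$

    where $b_{v,i}\ge 0$ is the probability that the $i$-th item is recommended and $c_v$ is the overall uncertainty under that view. They are computed as

    $$b_{v,i}=\frac{e_{v,i}}{D_v}=\frac{\alpha_{v,i}-1}{D_v},\quad c_v=\frac{|I|}{D_v}. \tag{13}$$

    In Eq. (13), $D_v=\displaystyle\sum\limits_{i=1}^{|I|}\left(e_{v,i}+1\right)=\displaystyle\sum\limits_{i=1}^{|I|}\alpha_{v,i}$ is the Dirichlet strength. The belief assignment can be read as a subjective opinion: the more evidence there is for item $i$, the larger the probability assigned to it.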
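    As a worked illustration of Eqs. (11)-(13), the sketch below turns one view's raw scores into evidence, Dirichlet parameters, belief masses, and an overall uncertainty. The function names and the toy scores are assumptions made for the example.

```python
def relu(scores):
    """Non-negative evidence from raw network outputs (ReLU layer)."""
    return [max(0.0, s) for s in scores]


def opinion_from_evidence(evidence):
    """Return (belief masses, uncertainty, Dirichlet parameters) for one view."""
    num_items = len(evidence)
    alpha = [e + 1.0 for e in evidence]        # Eq. 11
    strength = sum(alpha)                      # Dirichlet strength D_v
    belief = [e / strength for e in evidence]  # Eq. 13, b = e / D
    uncertainty = num_items / strength         # Eq. 13, c = |I| / D
    return belief, uncertainty, alpha


if __name__ == "__main__":
    e = relu([4.0, -2.0, -1.0])          # evidence [4, 0, 0]
    b, c, alpha = opinion_from_evidence(e)
    print(b, c, sum(b) + c)              # masses and uncertainty sum to 1 (Eq. 12)
    print(opinion_from_evidence([0.0, 0.0, 0.0])[1])  # no evidence: uncertainty 1
```

    With evidence $[4,0,0]$ the Dirichlet strength is 7, so the first item receives belief $4/7$ and the uncertainty is $3/7$; with no evidence at all the whole mass falls on the uncertainty.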

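    Table 2. Experimental Results for Each Dataset
    k | Model | Amazon-Beauty Recall/% | Amazon-Beauty NDCG/% | Amazon-Sport Recall/% | Amazon-Sport NDCG/% | Amazon-Toys Recall/% | Amazon-Toys NDCG/% | Yelp Recall/% | Yelp NDCG/%
    10 | GRU4Rec | 5.29 | 2.66 | 3.12 | 1.57 | 3.70 | 1.84 | 3.61 | 1.84
    10 | Caser | 4.74 | 2.39 | 2.27 | 1.18 | 3.61 | 1.86 | 3.80 | 1.97
    10 | Bert4Rec | 5.29 | 2.37 | 2.95 | 1.30 | 5.33 | 2.34 | 5.24 | 3.27
    10 | SASRec | 8.28 | 3.71 | 5.26 | 2.33 | 8.31 | 3.75 | 6.50 | 4.01
    10 | RNS | 8.96 | 4.04 | 5.32 | 2.35 | 9.36 | 4.25 | 6.76 | 4.12
    10 | DIF-SR | 9.08 | 4.46 | 5.56 | 2.64 | 10.13 | 5.04 | 6.98 | 4.19
    10 | MMSRec | 9.49 | 4.76 | 6.35 | 3.23 | 11.54 | 6.14 | 7.49 | 4.52
    10 | Ours | 9.43 | 4.85 | 6.39 | 3.35 | 11.83 | 6.92 | 8.43 | 4.90
    20 | GRU4Rec | 8.93 | 3.44 | 4.82 | 2.01 | 5.88 | 2.39 | 5.92 | 2.43
    20 | Caser | 7.31 | 3.02 | 3.64 | 1.53 | 5.660 | 2.38 | 6.08 | 2.55
    20 | Bert4Rec | 8.15 | 3.09 | 4.65 | 1.73 | 7.87 | 2.97 | 7.56 | 3.85
    20 | SASRec | 11.97 | 4.64 | 7.73 | 2.95 | 11.68 | 4.60 | 9.28 | 4.71
    20 | RNS | 12.31 | 4.78 | 7.74 | 2.99 | 12.31 | 5.12 | 9.54 | 4.72
    20 | DIF-SR | 12.84 | 5.41 | 7.98 | 3.25 | 13.82 | 4.97 | 10.03 | 2.96
    20 | MMSRec | 13.41 | 5.77 | 9.57 | 4.23 | 10.03 | 7.04 | 11.07 | 5.25
    20 | Ours | 14.34 | 6.22 | 10.18 | 4.72 | 11.68 | 7.83 | 12.69 | 5.95
    Note: Bold numbers indicate the best experimental results.

    The quality of each view often varies with the quality of the sample. To fuse the views we propose an adaptive fusion method rather than assigning each view a fixed weight. Dempster-Shafer theory allows evidence from different sources to be combined; in our model it combines the mass sets of the multimodal view and the item-ID view to obtain a joint mass. This extends flexibly to the fusion of more views. Specifically, $M_1=\{\{b_{1,i}\}_{i=1}^{|I|},c_1\}$ is the mass set of the multimodal view and $M_2=\{\{b_{2,i}\}_{i=1}^{|I|},c_2\}$ is the mass set of the item-ID view; combining them gives the joint mass set $M$:

    $$M=M_1\oplus M_2. \tag{14}$$

    The detailed combination rule for each part is

    $$b_i=\frac{1}{1-\beta}\left(b_{1,i}b_{2,i}+b_{1,i}c_2+b_{2,i}c_1\right),\quad c=\frac{c_1c_2}{1-\beta}, \tag{15}$$

    where $\beta=\displaystyle\sum\limits_{i\ne j}b_{1,i}b_{2,j}$ measures the degree of conflict between the 2 mass sets and $\dfrac{1}{1-\beta}$ is used for normalization. The corresponding joint evidence and Dirichlet parameters are then induced as

    $$D=\frac{|I|}{c},\quad e_i=b_i\times D,\quad \alpha_i=e_i+1. \tag{16}$$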
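    The combination rule in Eqs. (15)-(16) is easy to verify on a small example. The sketch below is an illustration under stated assumptions: each view's opinion is passed in as a list of belief masses plus an uncertainty, and the function name and toy numbers are hypothetical.

```python
def combine_opinions(b1, c1, b2, c2):
    """Fuse two views' (belief, uncertainty) opinions with the reduced
    Dempster-Shafer rule of Eq. 15, then recover evidence via Eq. 16."""
    k = len(b1)
    # Conflict: mass that the two views assign to different items.
    beta = sum(b1[i] * b2[j] for i in range(k) for j in range(k) if i != j)
    scale = 1.0 / (1.0 - beta)
    belief = [scale * (b1[i] * b2[i] + b1[i] * c2 + b2[i] * c1) for i in range(k)]
    uncertainty = scale * c1 * c2
    # Eq. 16: joint Dirichlet strength, evidence, and parameters.
    strength = k / uncertainty
    alpha = [bi * strength + 1.0 for bi in belief]
    return belief, uncertainty, alpha


if __name__ == "__main__":
    # View 1 is fairly sure about item 0; view 2 agrees but less strongly.
    b, c, alpha = combine_opinions([4 / 7, 0.0, 0.0], 3 / 7,
                                   [0.5, 0.25, 0.0], 0.25)
    print(b, c, alpha)
```

    For these inputs the conflict is $\beta=1/7$, the fused belief in item 0 is 0.75, and the fused uncertainty is 0.125, lower than either view's own uncertainty because the two views largely agree.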

    In summary, we obtain the estimated joint evidence $\boldsymbol{e}$ and the corresponding Dirichlet parameters $\boldsymbol{\alpha}$, from which the final probability of each item and the overall uncertainty follow. The corresponding loss is an adjusted cross-entropy loss:

    $$\mathcal{L}_{\mathrm{cer}}\left(\boldsymbol{\alpha}_v\right)=\sum_{j=1}^{|I|}y_{ij}\left(\psi\left(D_v\right)-\psi\left(\alpha_{v,j}\right)\right), \tag{17}$$

    where $\psi(\cdot)$ is the digamma function. This loss is the integral of the cross-entropy loss over the simplex determined by $\boldsymbol{\alpha}_v$; it lets the correct label of each sample generate more evidence than the other class labels. The overall loss of the proposed model is therefore

    $$L=\mathcal{L}_{\mathrm{rec}}\left(S^U\right)+\lambda\left(\sum_{v=1}^{|V|}\mathcal{L}_{\mathrm{cer}}\left(\boldsymbol{\alpha}_v\right)+\mathcal{L}_{\mathrm{cer}}\left(\boldsymbol{\alpha}\right)\right). \tag{18}$$

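    To verify the effectiveness of the model, we run experiments on 4 real-world public datasets. We first introduce the datasets, evaluation metrics, baseline methods, and parameter settings. We then compare the model's performance with that of the baselines. Finally, we run ablation experiments and answer the following questions:

    Question 1. Can the model outperform classic sequential recommendation methods and multimodal sequential recommendation methods?

    Question 2. How do the different components affect the model's performance?

    Question 3. How do the hyperparameter settings affect the model?

    We select 4 real-world public datasets for the experiments: Amazon-Beauty, Amazon-Sport, Amazon-Toys, and Yelp. For all datasets we select product images and user reviews as the additional multimodal information. Following previous work[31], we preprocess the data by keeping users and items with 5 or more interactions. Each user's interactions are then sorted by timestamp. All interactions are treated as implicit feedback. The statistics of the 4 datasets are shown in Table 1: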

    Table 1. Dataset
    Dataset | Users | Items | Interactions | Sparsity/%
    Amazon-Beauty | 22 363 | 12 101 | 198 502 | 99.93
    Amazon-Sport | 35 598 | 18 357 | 256 3083 | 99.91
    Amazon-Toys | 19 412 | 11 924 | 167 597 | 99.93
    Yelp | 30 499 | 20 068 | 2 563 083 | 99.95
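    The preprocessing step (keeping users and items with at least 5 interactions) and the sparsity column can be sketched as follows. This is an illustration only: the text does not say whether the filter is applied once or repeated until stable, so the iterative version here is an assumption, and sparsity is taken to be one minus the fraction of observed user-item pairs.

```python
from collections import Counter


def k_core_filter(interactions, k=5):
    """Repeatedly drop (user, item) pairs whose user or item has < k interactions."""
    data = list(interactions)
    while True:
        user_count = Counter(u for u, _ in data)
        item_count = Counter(i for _, i in data)
        kept = [(u, i) for u, i in data
                if user_count[u] >= k and item_count[i] >= k]
        if len(kept) == len(data):  # nothing removed: the filter has converged
            return kept
        data = kept


def sparsity(num_users, num_items, num_interactions):
    """Fraction of the user-item matrix that is unobserved."""
    return 1.0 - num_interactions / (num_users * num_items)


if __name__ == "__main__":
    toy = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u3", "c")]
    print(k_core_filter(toy, k=2))  # u3 and item c are dropped
    # Amazon-Beauty row of Table 1: 22 363 users, 12 101 items, 198 502 interactions.
    print(round(100 * sparsity(22363, 12101, 198502), 2))
```

    Under this definition the Amazon-Beauty and Amazon-Toys rows of Table 1 both come out at 99.93%, matching the reported sparsity.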

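    To evaluate the sequential recommender, we use 2 metrics, top-$k$ recall (Recall@$k$) and top-$k$ normalized discounted cumulative gain (NDCG@$k$); higher values mean better performance. $k$ is chosen from the 2 commonly used values {10, 20}. Recall is computed as

    $$Recall=\frac{\left|R(u)\cap T(u)\right|}{\left|T(u)\right|}, \tag{19}$$

    where $R(u)$ is the set of items the model predicts should be recommended to the user and $T(u)$ is the ground-truth set of items for the user in the test set. NDCG is computed as

    $$NDCG=\frac{DCG}{IDCG}. \tag{20}$$

    The parts of Eq. (20) are computed as follows:

    $$DCG=\sum_{i=1}^{p}\frac{2^{rel_i}-1}{\log_2(i+1)},\quad IDCG=\sum_{i=1}^{|REL|}\frac{2^{rel_i}-1}{\log_2(i+1)}, \tag{21}$$

    where $rel_i$ indicates whether the user likes the item at position $i$ (1 if so, 0 otherwise). $REL$ is the set of recommended items sorted by relevance from high to low; every numerator in $IDCG$ is 1, so it represents the ideal case in which the recalled items are ordered by how much the user likes them. Following the suggestion of [32], we evaluate model performance with full ranking for a fair comparison: the ranking is obtained over the entire item set rather than over a sampled set.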
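    The two metrics in Eqs. (19)-(21) can be computed for a single user as in the sketch below. It assumes binary relevance, so each gain $2^{rel_i}-1$ is either 1 or 0, and it truncates the ideal ranking at $k$; the function names and the toy ranking are hypothetical.

```python
import math


def recall_at_k(ranked_items, relevant_items, k):
    """Share of the user's ground-truth items found in the top-k list (Eq. 19)."""
    relevant = set(relevant_items)
    hits = len(set(ranked_items[:k]) & relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked_items, relevant_items, k):
    """Binary-relevance NDCG@k (Eqs. 20-21)."""
    relevant = set(relevant_items)
    # Position i (0-based) contributes 1 / log2(i + 2) when the item is relevant.
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked_items[:k]) if item in relevant)
    # Ideal ranking: all relevant items placed first.
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg else 0.0


if __name__ == "__main__":
    ranking = ["a", "b", "c", "d"]
    truth = {"c"}
    print(recall_at_k(ranking, truth, 2), recall_at_k(ranking, truth, 3))
    print(ndcg_at_k(ranking, truth, 3))
```

    With the single ground-truth item at rank 3, Recall@2 is 0, Recall@3 is 1, and NDCG@3 is $1/\log_2 4=0.5$; reported scores average these values over all users.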

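    We compare against 2 groups of methods: classic sequential recommendation methods (GRU4Rec, SASRec, Caser, and BERT4Rec) and newer methods that incorporate different side information, namely DIF-SR, which uses item attributes, RNS, which uses review information, and MMSRec, which uses multimodal information. All of these methods consider only item-related side information.

    GRU4Rec[3]: a session-based recommendation model that uses gated recurrent units to better capture users' sequential behavior and improve recommendation performance.

    SASRec[31]: a sequential recommendation model that uses self-attention to analyze the whole user behavior sequence and predict the next item the user may interact with.

    Caser[33]: a sequential recommendation model that uses convolutional filters to capture global-level and local-level patterns in users' sequential behavior.

    BERT4Rec[34]: a sequential recommendation model with a bidirectional encoder, trained with a cloze-style task to predict masked items.

    DIF-SR[16]: a sequential recommendation model that considers various item attributes; it moves item-related side information from the input layer to the attention layer and decouples the attention computation from the item representation.

    RNS[35]: a review-driven sequential recommendation model that jointly considers users' intrinsic preferences and sequential behavior patterns.

    MMSRec[36]: a self-supervised multimodal model that integrates visual and textual features and uses a two-tower architecture with self-supervised learning to improve sequential recommendation.

    For all of these models we use the publicly released source code and real-world datasets. For their hyperparameters we follow the original authors' recommendations and report the best result of each baseline under the recommended settings. The proposed model Large-TR is implemented in PyTorch; the large language model is qwen-max-1201, and the model is trained for 100 rounds with an adaptive gradient algorithm, a batch size of 32, and a learning rate of 0.001. For the other hyperparameters we search all settings to find the best result: hidden size ∈ {100, 200, 400, 500, 600, 700, 1000}, number of layers ∈ {1, 2, 3, 4}, image embedding dimension ∈ {256, 512, 1024}, and text embedding dimension ∈ {256, 512, 1024}. The best setting may differ between datasets.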

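    The results on the 4 datasets are shown in Table 2. They show that Large-TR performs well in multimodal sequential recommendation. In most cases, recommendation methods that include side information (such as MMSRec, RNS, and DIF-SR) outperform traditional methods (such as GRU4Rec, Caser, BERT4Rec, and SASRec), which confirms that introducing side information can improve performance. Moreover, MMSRec outperforms the other side-information methods (such as RNS and DIF-SR), which indicates that, compared with methods using a single modality, using multimodal content lets the model extract information from more dimensions and richer context and so reflect users' real intent and interests more accurately. Finally, Large-TR achieves the best result against the baselines in almost all cases. This is mainly due to the following. First, we make full use of user-generated multimodal content, which gives the model richer and more diverse input features. Second, we explicitly address the noise in multimodal content with an LLM-based denoising strategy, which improves data quality. Third, through the trusted decision mechanism we obtain consistent and trustworthy recommendation decisions from the 2 views of multimodal information and item information.

    To determine how different types of side information affect Large-TR, we run ablation experiments with different side information. We use data commonly available on recommendation platforms: item ID, user reviews (textual information), and item images (visual information). On these data we design 4 types of experiments, shown in Table 3: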

    Table 3. Side-Information Ablation Results for Each Dataset (%)
    Side information | Amazon-Beauty Recall@10 | Amazon-Beauty NDCG@10 | Amazon-Sport Recall@10 | Amazon-Sport NDCG@10
    Item ID | 5.29 | 2.66 | 3.12 | 1.57
    Item ID + image | 7.21 | 3.78 | 4.84 | 2.39
    Item ID + text | 8.93 | 4.49 | 5.50 | 2.67
    All | 9.43 | 4.85 | 6.39 | 3.35

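    The results show that adding text or image information alone already improves recommendation performance, which demonstrates the value of side information. Combining the 2 modalities improves performance further, which indicates that the proposed fusion algorithm makes effective use of information from different modalities.

    To study how the different components of Large-TR affect the results, we also design a component ablation study and run it on the Amazon-Beauty, Amazon-Toys, and Yelp datasets. The results are shown in Table 4: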

    Table 4. Component Ablation Results for Each Dataset (%)
    Experiment | Amazon-Beauty Recall@10 | Amazon-Beauty NDCG@10 | Amazon-Toys Recall@10 | Amazon-Toys NDCG@10 | Yelp Recall@10 | Yelp NDCG@10
    T1 | 8.82 | 4.49 | 9.36 | 4.25 | 7.11 | 4.32
    T2 | 9.11 | 4.67 | 10.88 | 5.98 | 8.01 | 4.73
    T3 | 9.43 | 4.85 | 11.83 | 6.92 | 8.43 | 4.90

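    The experimental settings are as follows:

    T1. Large-TR w/o LLM: images and text are passed through their encoders and concatenated directly; everything else is unchanged;

    T2. Large-TR w/o Trust: the trusted decision part is replaced by a single softmax layer; everything else is unchanged;

    T3. Large-TR: the full model with all components.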

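    From these results we conclude that the 2 components, LLM denoising and trusted decision, both improve the performance of the recommendation algorithm, which supports the model design. The large language model uses its rich semantic knowledge to filter out irrelevant image-text content and markedly improves the quality of the multimodal data. The trusted decision mechanism evaluates the confidence of each view at the evidence level and integrates the views to obtain each item's recommendation probability and the overall uncertainty of the current prediction, providing trusted recommendations.

    This section studies how 2 hyperparameters, the hidden dimension and the number of layers of the network, affect the results. Figure 3 shows the Recall@10 and NDCG@10 scores of Large-TR under different hidden dimensions. The results on the 4 datasets show that performance is best when the hidden dimension is 600. Figure 4 shows the relationship between the number of layers and model performance; the model performs best with 2 layers.

    Figure 3. Impact of hidden layer dimension on performance
    Figure 4. Impact of network layers on performance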

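    This paper studies sequential recommendation based on multimodal content and proposes Large-TR to address the challenge of obtaining trusted recommendations from noisy user-generated content. The method uses the rich semantic knowledge of a large language model to identify and filter noise in user-generated content and improve the quality of the multimodal data. It then mines user preferences from 2 views, item ID and multimodal information, dynamically evaluates the uncertainty of the recommendation, and returns the recommendation together with its confidence. Experiments on 4 public datasets show that the model outperforms most existing sequential recommendation models, and the ablation experiments confirm the effectiveness of each component of Large-TR. Future work will consider explainable trusted recommender systems that help users better understand recommendation results. The evaluation of confidence is also a direction worth further study.

    Author contributions: Yan Meng (闫萌) proposed the algorithm and the experimental plan and wrote the paper; Xu Cai (徐偲) assisted with writing and revised the paper; Huang Haibin (黄海槟) was responsible for data processing, running the experiments, and preparing figures and tables; Zhao Wei (赵伟) and Guan Ziyu (管子玉) provided key guidance.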

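  • Table 1   Analysis of Storage Tracking Technology Capabilities

    Columns: Storage tracking mechanism | Scope of use (mainstream browsers) | Shared across domains | Multi-location storage | Shared across browsers | Storage duration | Other
    Cookie: scope: all; yes (achievable with front-end and back-end techniques); storage duration: set by expiry time; other: relatively low security
    Flash Cookie: scope: all; storage duration: no expiry by default; other: deprecated
    UserData: scope: Windows + IE5; storage duration: set by expiry time; other: deprecated
    EverCookie: scope: all; yes (uses the Flash Cookie mechanism); storage duration: can be rebuilt through various storage mechanisms; other: deprecated
    SessionStorage: scope: browsers other than versions below IE8; storage duration: within the browser session; other: relatively low security
    LocalStorage: scope: browsers other than versions below IE8; storage duration: never expires unless actively deleted; other: relatively low security
    Web SQL: scope: some browsers; storage duration: unlimited; other: deprecated
    IndexedDB: scope: all; storage duration: unlimited; other: relatively low security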

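    Table 2   API of JavaScript Object and the Corresponding Obtained Information

    JavaScript object | API | Information obtained
    Navigator (host object) | navigator.platform | System platform
    Navigator (host object) | navigator.userAgent | User agent
    Navigator (host object) | navigator.language | Preferred browser language
    Navigator (host object) | navigator.cpuClass | Browser CPU class
    Navigator (host object) | navigator.plugins | Plugin list
    Navigator (host object) | navigator.doNotTrack | Whether Do Not Track is set
    Screen (host object) | screen.width/screen.height | Screen resolution
    Screen (host object) | screen.availWidth/screen.availHeight | Available screen resolution
    Screen (host object) | colorDepth | Color depth
    Date (built-in object) | getTimezoneOffset() | Time zone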

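    Table 3   Classification and Introduction of Fingerprint Tracking Technology Features

    Feature category | Usable feature | Factors of change | Acquisition method
    Browser features | User agent (UA) | Changes with browser upgrades; stable in the short term | API
    Browser features | Language | Depends on user settings; relatively stable | API
    Browser features | Fonts | Depends on fonts installed on the system; unstable | Enumeration probing
    Browser features | History, browser cache | Changes with usage habits; can be cleared; unstable | Enumeration probing
    Browser features | Plugins | Changes with software installed on the system; unstable | API
    Browser features | Extensions | Changes with user preference and usage; unstable | Enumeration probing
    Operating system features | Operating system type | Depends on the installed system; fairly stable | API
    Operating system features | Time zone | Depends on user settings; relatively stable | API
    Operating system features | Host cache | Depends on user behavior; relatively stable in the short term | Enumeration, indirect exploitation
    Network features | Public IP | Depends on the network environment; unstable | Obtained from HTTP headers
    Network features | Intranet IP, intranet host information, and open ports | Depends on the intranet environment; unstable | Vulnerability exploitation, enumeration probing
    Network features | TLS session tracking | Depends on website and browser settings; relatively stable | Attack exploitation
    Hardware features | Hardware platform, resolution, color depth | Depends on the hardware itself; unchanged in the short term; fairly stable | API
    Hardware features | CPU, GPU, audio, battery | Depends on hardware attributes and performance; fairly stable | API, measurement
    User interaction features | Whether "Do Not Track" is set, mouse and keyboard records | Depends on user actions; unstable | Algorithmic analysis
    Storage tracking identifiers | Cookie, HTML5 storage APIs | Depends on user actions; can be cleared; relatively stable | API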

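    Table 4   Researches on Technologies of Enumerating Browser Extension

    Technique | Extension dataset | Form of technique | Capability and performance
    XHOUND[29] | 10000 most popular Google Chrome extensions | Detects the unique modifications an extension makes to the page DOM | Fingerprints dozens of extensions within a few seconds
    Discovering Extensions via WARs[26] | 43429 Chrome extensions | Uses Web accessible resources to detect whether an extension is present | Detects more than 50% of the top 1000 free Chrome extensions and 28% of all Chrome extensions
    Extension breakdown[28] | 718 Safari extensions | URI leakage technique | Identifies more than 40% of Safari extensions
    Latex Gloves[27] | 62994 Chrome extensions, 8646 Firefox extensions | WAR detection, plus detection of code that extensions inject into web pages | Identifies 90% of content-injecting extensions
    Carnus[30] | 29428 detectable Chrome extensions | 4 different detection techniques | 83.6%~87.92% of fingerprints remain effective under state-of-the-art countermeasures
    Fingerprinting in Style[31] | 116485 Chrome extensions | Enumerates extensions through the CSS they inject | Identifies 4446 extensions, of which 1074 (24%) were not identified by earlier techniques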

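    Table 5   The Forms and Used Resistance Means of Defense Technology

    Columns: Defense technology | Form: extension, embedded defense mechanism, framework/tool/mechanism, countermeasure or environment | Resistance means: randomization strategy, noise injection, attribute recombination, rewriting/spoofing, access control, homogeneous environment, algorithm, information isolation, user participation
    Rows: FPGuard[101]; PriVaricator[102]; Blink[96]; FP-Block[89]; TrackingFree[95]; Cliqz[98]; TrackMeOrNot[88]; hiding feature attributes[97]; Tor[113]; BrowsingFog[91]; FPRandom[103]; Latex Gloves[27]; PETInspector[114]; FingerprintAlert[90]; curbing extension bloat[100]; CloakX[111]; UNIGL[105]; docker cluster assembly[112]; VisibleV8[106]; Canvas Blocker[92]; Canvas Deceiver[93]; FPSelect[106]; BrFAST[108]; My Rules[94]
    Note: "√" indicates that the defense technology takes a given form or includes a given resistance means.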

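    Table 6   The Defensive Coverage that Defensive Technology Possessed to Tracking Technology

    Columns: Defense technology | Basic features | IP | Cookie | Canvas fingerprint | WebGL fingerprint | Audio fingerprint | History | Fonts | Extensions | Cache | Hardware information | JS attribute enumeration detection | TLS session | Interactive tracking
    Rows: FPGuard[101]; PriVaricator[102]; Blink[96]; FP-Block[89]; TrackingFree[95]; Cliqz[98]; TrackMeOrNot[88]; hiding feature attributes[97]; Tor[113]; BrowsingFog[91]; FPRandom[103]; Latex Gloves[27]; PETInspector[114]; FingerprintAlert[90]; curbing extension bloat[100]; CloakX[111]; UNIGL[105]; docker cluster assembly[112]; VisibleV8[106]; Canvas Blocker[92]; Canvas Deceiver[93]; FPSelect[106]; BrFAST[108]; My Rules[94]
    Note: "√" indicates that the defense technology covers the given tracking technology.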
  • [1]

    Eckersley P. How unique is your web browser? [C/OL] //Proc of the 10th Int Symp on Privacy Enhancing Technologies. Berlin: Springer, 2010[2021-03-21]. https://link.springer.com/content/pdf/10.1007/978-3-642-14527-8.pdf

    [2] 张玉清,武倩如,刘奇旭,等. 第三方追踪的安全研究[J]. 通信学报,2014,35(9):1−11 doi: 10.3969/j.issn.1000-436x.2014.09.001

    Zhang Yuqing, Wu Qianru, Liu Qixu, et al. Research on security of third-party tracking[J]. Journal on Communications, 2014, 35(9): 1−11 (in Chinese) doi: 10.3969/j.issn.1000-436x.2014.09.001

    [3]

    Bujlow T, Carela-Español V, Sole-Pareta J, et al. A survey on web tracking: Mechanisms, implications, and defenses[J]. Proceedings of the IEEE, 2017, 105(8): 1476−1510

    [4]

    Takasu K, Saito T, Yamada T, et al. A survey of hardware features in modern browsers[C] //Proc of the 9th Int Conf on Innovative Mobile and Internet Services in Ubiquitous Computing. Piscataway, NJ: IEEE, 2015: 520−524

    [5]

    Soltani A, Canty S, Mayo Q, et al. Flash cookies and privacy[C] //Proc of the 2010 AAAI Spring Symp Series. Palo Alto, CA: AAAI, 2010: 22−24

    [6]

    Samyk. EverCookie[CP/OL]. (2017-11-13) [2021-04-20]. https://samy.pl/evercookie/

    [7]

    Acar G, Eubank C, Englehardt S, et al. The web never forgets: Persistent tracking mechanisms in the wild[C] //Proc of the 14th ACM SIGSAC Conf on Computer and Communications Security. New York: ACM, 2014: 674−689

    [8]

    West W, Pulimood S M. Analysis of privacy and security in HTML5 Web Storage[J]. Journal of Computing Sciences in Colleges, 2012, 27(3): 80−87

    [9]

    Kimak S, Ellman J. The role of HTML5 IndexedDB, the past, present and future[C] //Proc of the 10th Int Conf for Internet Technology and Secured Transactions (ICITST). Piscataway, NJ: IEEE, 2015: 379−383

    [10] 张玉清,贾岩,雷柯楠,等. HTML5新特性安全研究综述[J]. 计算机研究与发展,2016,53(10):2163−2172 doi: 10.7544/issn1000-1239.2016.20160686

    Zhang Yuqing, Jia Yan, Lei Kenan, et al. Survey of HTML5 new features security[J]. Journal of Computer Research and Development, 2016, 53(10): 2163−2172 (in Chinese) doi: 10.7544/issn1000-1239.2016.20160686

    [11]

    Nair K V, RoseLalson E. The unique ID's you can't delete: Browser fingerprints[C/OL] //Proc of the Int Conf on Emerging Trends and Innovations in Engineering and Technological Research (ICETIETR). Piscataway, NJ: IEEE, 2018 [2021-05-04]. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8529040

    [12]

    Laperdrix P, Bielova N, Baudry B, et al. Browser fingerprinting: A survey[J]. ACM Transactions on the Web, 2020, 14(2): 1−33

    [13]

    Felten E W, Schneider M A. Timing attacks on web privacy[C] //Proc of the 7th ACM Conf on Computer and Communications Security. New York: ACM, 2000: 25−32

    [14]

    Weinberg Z, Chen E Y, Jayaraman P R, et al. I still know what you visited last summer: Leaking browsing history via user interaction and side channel attacks[C]//Proc of the 32nd IEEE Symp on Security and Privacy. Piscataway, NJ: IEEE, 2011: 147−161

    [15]

    Wondracek G, Holz T, Kirda E, et al. A practical attack to de-anonymize social network users[C] //Proc of the 31st IEEE Symp on Security and Privacy. Piscataway, NJ: IEEE, 2010: 223−238

    [16]

    Baron L D. Preventing attacks on a user’s history through CSS: Visited selectors[EB/OL]. 2010[2020-05-11]. https://dbaron.org/mozilla/visited-privacy

    [17]

    Janc A, Olejnik L. Web browser history detection as a real-world privacy threat[C] //Proc of the 15th European Symp on Research in Computer Security. Berlin: Springer, 2010: 215−231

    [18]

    Olejnik L, Castelluccia C, Janc A. Why johnny can't browse in peace: On the uniqueness of web browsing history patterns[C/OL] //Proc of the 5th Workshop on Hot Topics in Privacy Enhancing Technologies (HotPETs 2012). (2012-11-02)[2021-04-20]. https://hal.inria.fr/hal-00747841/document

    [19]

    Yan Z. Weird new tricks for browser fingerprinting[EB/OL]. 2015[2020-10-12]. https://zyan.scripts.mit.edu/presentations/toorcon2015.pdf

    [20]

    Smith M, Disselkoen C, Narayan S, et al. Browser history re: Visited[C/OL] //Proc of the 12th USENIX Workshop on Offensive Technologies (WOOT 18). 2018[2020-05-11]. https://www.usenix.org/system/files/conference/woot18/woot18-paper-smith.pdf

    [21]

    Huang Anxin, Zhu Chen, Wu Deweb, et al. An adaptive method for cross-platform browser history sniffing[C/OL] //Proc of the 2nd Measurements, Attacks, and Defenses for the Web Workshop. Reston: The Internet Society, 2020[2021-03-08]. https://www.ndss-symposium.org/wp-content/uploads/2020/02/23006.pdf

    [22]

    Boda K, Földes Á M, Gulyás G G, et al. User tracking on the web via cross-browser fingerprinting[C] //Proc of the 16th Nordic Conf on Secure IT Systems. Berlin: Springer, 2011: 31−46

    [23]

    Fifield D, Egelman S. Fingerprinting web users through font metrics[C] //Proc of the 19th Int Conf on Financial Cryptography and Data Security. Berlin: Springer, 2015: 107−124

    [24]

    Saito T, Takahashi K, Yasuda K, et al. OS and application identification by installed fonts[C] //Proc of the 30th Int Conf on Advanced Information Networking and Applications (AINA). Piscataway NJ: IEEE, 2016: 684−689

    [25]

    Kotowicz K, Osborn K. Advanced chrome extension exploitation leveraging API powers for better evil [EB/OL]. Black Hat USA, 2012[2020-05-11]. https://paper.bobylive.com/Meeting_Papers/BlackHat/USA-2012/BH_US_12_Osborn_Kotowicz_Advanced_Chrome_Extension_WP.pdf

    [26]

    Sjösten A, Van Acker S, Sabelfeld A. Discovering browser extensions via web accessible resources[C] //Proc of the 7th ACM Conf on Data and Application Security and Privacy. New York: ACM, 2017: 329−336

    [27]

    Sjösten A, Van Acker S, Picazo-Sanchez P, et al. Latex gloves: Protecting browser extensions from probing and revelation attacks[EB/OL]. 2018[2020-05-11]. http://singularity.be/public/papers/latexgloves.pdf

    [28]

    Sanchez-Rola I, Santos I, Balzarotti D. Extension breakdown: Security analysis of browsers extension resources control policies[C] //Proc of the 26th USENIX Security Symp. Berkeley, CA: USENIX Association, 2017: 679−694

    [29]

    Starov O, Nikiforakis N. XHOUND: Quantifying the fingerprintability of browser extensions[C] //Proc of the 38th IEEE Symp on Security and Privacy. Piscataway, NJ: IEEE, 2017: 941−956

    [30]

    Karami S, Ilia P, Solomos K, et al. Carnus: Exploring the privacy threats of browser extension fingerprinting[C/OL] //Proc of the 27th Network and Distributed System Security Symp (NDSS). Reston: The Internet Society, 2020[2020-05-11]. https://www.ndss-symposium.org/wp-content/uploads/2020/02/24383-paper.pdf

    [31]

    Laperdrix P, Starov O, Chen Quan, et al. Fingerprinting in Style: Detecting browser extensions via injected style sheets[C/OL] //Proc of the 30th USENIX Security Symp. Berkeley, CA: USENIX Association, 2021[2021-05-31]. https://www.usenix.org/system/files/sec21fall-laperdrix.pdf

    [32]

    Nikiforakis N, Kapravelos A, Joosen W, et al. Cookieless monster: Exploring the ecosystem of Web-based device fingerprinting[C] //Proc of the 34th IEEE Symp on Security and Privacy. Piscataway, NJ: IEEE, 2013: 541−555

    [33]

    Mulazzani M, Reschl P, Huber M, et al. Fast and reliable browser identification with Javascript engine fingerprinting[C] //Proc of the 7th Web 2.0 Workshop on Security and Privacy (W2SP). 2013: 4−14

    [34]

    Takei N, Saito T, Takasu K, et al. Web browser fingerprinting using only cascading style sheets[C] //Proc of the 10th Int Conf on Broadband and Wireless Computing, Communication and Applications (BWCCA). Piscataway, NJ: IEEE, 2015: 57−63

    [35]

    Schwarz M, Lackner F, Gruss D. JavaScript template attacks: Automatically inferring host information for targeted exploits[EB/OL]. 2019[2021-05-31]. https://www.ndss-symposium.org/wp-content/uploads/2019/02/ndss2019_01B4_Schwarz_paper.pdf

    [36]

    Mowery K, Bogenreif D, Yilek S, et al. Fingerprinting information in JavaScript implementations[C/OL] //Proc of the 5th Web 2.0 Workshop on Security and Privacy (W2SP). 2011[2021-04-20]. https://cseweb.ucsd.edu/~kmowery/papers/js-fingerprinting.pdf

    [37]

    Solomos K, Kristoff J, Kanich C, et al. Persistent tracking in modern browsers[C/OL] //Proc of the 28th Symp on Network and Distributed System Security (NDSS). Reston: The Internet Society, 2021[2021-05-31]. https://www.ndss-symposium.org/wp-content/uploads/ndss2021_1C-5_24202_paper.pdf

    [38]

    Bansal C, Preibusch S, Milic-Frayling N. Cache timing attacks revisited: Efficient and repeatable browser history, OS and network sniffing[C] //Proc of the 30th Int Information Security and Privacy Conf. Berlin: Springer, 2015: 97−111

    [39]

    Solís-Martínez J, Espada J P, Crespo R G, et al. UXJs: Tracking and analyzing Web usage information with a Javascript oriented approach[J]. IEEE Access, 2020, 8: 43725−43735 doi: 10.1109/ACCESS.2020.2977879

    [40]

    Navalpakkam V, Churchill E. Mouse tracking: Measuring and predicting users' experience of Web-based content[C] //Proc of the SIGCHI Conf on Human Factors in Computing Systems. New York: ACM, 2012: 2963−2972

    [41]

    Mueller F, Lockerd A. Cheese: Tracking mouse movement activity on websites, a tool for user modeling[C/OL] //Proc of the 1st CHI Conf on Human Factors in Computing Systems. 2001[2021-04-20]. https://www.cc.gatech.edu/fac/athomaz/papers/cheese.pdf

    [42]

    Katerina T, Nicolaos P, Charalampos Y. Mouse tracking for Web marketing: Enhancing user experience in Web application software by measuring self-efficacy and hesitation levels[J]. International Journal on Strategic Innovative Marketing, 2014(1): 233−247

    [43]

    Lipp M, Gruss D, Schwarz M, et al. Practical keystroke timing attacks in sandboxed Javascript[C] //Proc of the 22nd European Symp on Research in Computer Security. Berlin: Springer, 2017: 191−209

    [44]

    Mowery K, Shacham H. Pixel perfect: Fingerprinting canvas in HTML5[C/OL] //Proc of the 6th Web 2.0 Workshop on Security and Privacy (W2SP). 2012[2021-04-20]. https://cseweb.ucsd.edu/~kmowery/papers/html5-fingerprint.pdf

    [45]

    Le H, Fallace F, Barlet-Ros P. Towards accurate detection of obfuscated web tracking[C/OL] //Proc of the 5th IEEE Int Workshop on Measurement and Networking (M&N). Piscataway, NJ: IEEE, 2017[2021-04-20]. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8078365

    [46]

    Laperdrix P, Rudametkin W, Baudry B. Beauty and the beast: Diverting modern Web browsers to build unique browser fingerprints[C] //Proc of the 37th IEEE Symp on Security and Privacy. Piscataway, NJ: IEEE, 2016: 878−894

    [47]

    Daud N I, Haron G R, Othman S S S. Adaptive authentication: Implementing random canvas fingerprinting as user attributes factor[C] //Proc of the 4th IEEE Symp on Computer Applications & Industrial Electronics (ISCAIE). Piscataway, NJ: IEEE, 2017: 152−156

    [48]

    Raschke P, Küpper A. Uncovering canvas fingerprinting in real-time and analyzing its usage for Web-tracking[C/OL] //Proc of the Workshops der INFORMATIK 2018-Architekturen, Prozesse, Sicherheit und Nachhaltigkeit. Köllen Druck+Verlag GmbH, 2018[2021-04-20]. https://dl.gi.de/bitstream/handle/20.500.12116/17237/3032414_GI_P_285_09.pdf?sequence=1&isAllowed=y

    [49]

    Englehardt S, Narayanan A. Online tracking: A 1-million-site measurement and analysis[C] //Proc of the 16th ACM SIGSAC Conf on Computer and Communications Security. New York: ACM, 2016: 1388−1401

    [50]

    Englehardt S, Eubank C, Zimmerman P, et al. OpenWPM: An automated platform for Web privacy measurement[J/OL]. Manuscript, 2015[2021-05-31]. https://senglehardt.com/papers/openwpm_03-2015.pdf

    [51]

    Upathilake R, Li Yingkun, Matrawy A. A classification of Web browser fingerprinting techniques[C/OL] //Proc of the 7th Int Conf on New Technologies, Mobility and Security (NTMS). Piscataway, NJ: IEEE, 2015[2021-04-20]. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7266460

    [52]

    Nakibly G, Shelef G, Yudilevich S. Hardware fingerprinting using HTML5[J]. arXiv preprint, arXiv: 1503.01408, 2015

    [53]

    Cao Yinzhi, Li Song, Wijmans E. (Cross-) browser fingerprinting via OS and hardware level features[C/OL] //Proc of the 24th Symp on Network and Distributed System Security (NDSS). 2017[2021-05-31]. https://www.yinzhicao.org/TrackingFree/crossbrowsertracking_NDSS17.pdf

    [54]

    Saito T, Yasuda K, Ishikawa T, et al. Estimating CPU features by browser fingerprinting[C] //Proc of the 10th Int Conf on Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS). Piscataway, NJ: IEEE, 2016: 587−592

    [55]

    Saito T, Yasuda K, Tanabe K, et al. Web browser tampering: Inspecting CPU features from side-channel information[C] //Proc of Int Conf on Broadband and Wireless Computing, Communication and Applications. Berlin: Springer, 2017: 392−403

    [56]

    Diaz C, Olejnik L, Acar G, et al. The leaking battery: A privacy analysis of the HTML5 battery status API[G] //LNCS 9481: Int Workshop on Data Privacy Management. Berlin: Springer, 2015: 254−263

    [57]

    Sanchez-Rola I, Santos I, Balzarotti D. Clock around the clock: Time-based device fingerprinting[C] //Proc of the 18th ACM SIGSAC Conf on Computer and Communications Security. New York: ACM, 2018: 1502−1514

    [58]

    Mishra V, Laperdrix P, Vastel A, et al. Don’t count me out: On the relevance of IP address in the tracking ecosystem[C] //Proc of the Web Conf. New York: ACM, 2020: 808−815

    [59]

    Hosoi R, Saito T, Ishikawa T, et al. A browser scanner: Collecting intranet information[C] //Proc of the 19th Int Conf on Network-Based Information Systems (NBiS). Piscataway, NJ: IEEE, 2016: 140−145

    [60]

    Al-Fannah N M, Li Wanpeng. Not all browsers are created equal: Comparing Web browser fingerprintability[C] //Proc of the Int Workshop on Security. Berlin: Springer, 2017: 105−120

    [61]

    Hazhirpasand M, Ghafari M. One leak is enough to expose them all[C] //Proc of the Int Symp on Engineering Secure Software and Systems. Berlin: Springer, 2018: 61−76

    [62]

    Sy E, Burkert C, Federrath H, et al. Tracking users across the Web via TLS session resumption[C] //Proc of the 34th Annual Computer Security Applications Conf. New York: ACM, 2018: 289−299

    [63]

    Jia Yaoqi, Dong Xinshu, Liang Zhenkai, et al. I know where you've been: Geo-inference attacks via the browser cache[J]. IEEE Internet Computing, 2014, 19(1): 44−53

    [64]

    Klein A, Pinkas B. DNS cache-based user tracking[C/OL] //Proc of the 26th Symp on Network and Distributed System Security (NDSS). 2019[2021-05-31]. https://www.ndss-symposium.org/wp-content/uploads/2019/02/ndss2019_04B-4_Klein_paper.pdf

    [65]

    Mirheidari S A, Arshad S, Onarlioglu K, et al. Cached and confused: Web cache deception in the wild[C] //Proc of the 29th USENIX Security Symp. Berkeley, CA: USENIX Association, 2020: 665−682

    [66]

    Solomos K, Ilia P, Ioannidis S, et al. Cross-device tracking: Systematic method to detect and measure CDT[J]. arXiv preprint, arXiv: 1812.11393, 2018

    [67]

    Gómez-Boix A, Laperdrix P, Baudry B. Hiding in the crowd: An analysis of the effectiveness of browser fingerprinting at large scale[C] //Proc of the 2018 World Wide Web Conf. New York: ACM, 2018: 309−318

    [68]

    Brookman J, Rouge P, Alva A, et al. Cross-device tracking: Measurement and disclosures[J]. Proceedings on Privacy Enhancing Technologies, 2017(2): 133−148

    [69]

    Kane S K, Karlson A K, Meyers B R, et al. Exploring cross-device web use on PCs and mobile devices[C] //Proc of the IFIP Conf on Human-Computer Interaction. Berlin: Springer, 2009: 722−735

    [70]

    Karakaya C, Toğuç H, Kuzu R S, et al. Survey of cross device matching approaches with a case study on a novel database[C] //Proc of the 3rd Int Conf on Computer Science and Engineering (UBMK). Piscataway, NJ: IEEE, 2018: 139−144

    [71]

    Yen Tingfang, Xie Yinglian, Yu Fang, et al. Host fingerprinting and tracking on the Web: Privacy and security implications[C/OL] //Proc of the 19th Symp on Network and Distributed System Security (NDSS). Reston: The Internet Society, 2012[2021-03-20]. https://www.ndss-symposium.org/wp-content/uploads/2017/09/11_3.pdf

    [72]

    Zimmeck S, Li J S, Kim H, et al. A privacy analysis of cross-device tracking[C] //Proc of the 26th USENIX Security Symp. Berkeley, CA: USENIX Association, 2017: 1391−1408

    [73]

    Díaz-Morales R. Cross-device tracking: Matching devices and cookies[C] //Proc of the IEEE Int Conf on Data Mining Workshop (ICDMW). Piscataway, NJ: IEEE, 2015: 1699−1704

    [74]

    Li Song, Cao Yinzhi. Who touched my browser fingerprint? A large-scale measurement study and classification of fingerprint dynamics[C] //Proc of the 20th ACM Internet Measurement Conf. New York: ACM, 2020: 370−385

    [75]

    Yamada T, Saito T, Takasu K, et al. Robust identification of browser fingerprint comparison using edit distance[C] //Proc of the 10th Int Conf on Broadband and Wireless Computing, Communication and Applications (BWCCA). Piscataway, NJ: IEEE, 2015: 107−113

    [76]

    Liu Xiaofeng, Liu Qixu, Wang Xiaoxi, et al. Fingerprinting Web browser for tracing anonymous Web attackers[C] //Proc of the 1st IEEE Int Conf on Data Science in Cyberspace (DSC). Piscataway, NJ: IEEE, 2016: 222−229

    [77]

    Jiang Wei, Wang Xiaoxi, Song Xinfang, et al. Tracking your browser with high-performance browser fingerprint recognition model[J]. China Communications, 2020, 17(3): 168−175 doi: 10.23919/JCC.2020.03.014

    [78]

    Dong Shichuan, Farha F, Cui Shan, et al. CPG-FS: A CPU performance graph based device fingerprint scheme for devices identification and authentication[C] //Proc of the IEEE 38th Int Conf on Dependable, Autonomic and Secure Computing, 17th Int Conf on Pervasive Intelligence and Computing, 5th Int Conf on Cloud and Big Data Computing, 4th Int Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). Piscataway, NJ: IEEE, 2019: 266−270

    [79]

    Vastel A, Laperdrix P, Rudametkin W, et al. Fp-stalker: Tracking browser fingerprint evolutions[C] //Proc of the 39th IEEE Symp on Security and Privacy (SP). Piscataway, NJ: IEEE, 2018: 728−741

    [80]

    Liu Qixu, Liu Xinyu, Luo Cheng, et al. Android browser fingerprinting identification method based on bidirectional recurrent neural network[J]. Journal of Computer Research and Development, 2020, 57(11): 2294−2311 (in Chinese) doi: 10.7544/issn1000-1239.2020.20200459

    [81]

    Tanabe K, Hosoya R, Saito T. Combining features in browser fingerprinting[C] //Proc of the 18th Int Conf on Broadband and Wireless Computing, Communication and Applications. Berlin: Springer, 2018: 671−681

    [82]

    Iqbal U, Englehardt S, Shafiq Z. Fingerprinting the fingerprinters: Learning to detect browser fingerprinting behaviors[C] //Proc of the 42nd IEEE Symp on Security and Privacy (SP). Piscataway, NJ: IEEE, 2021: 1143−1161

    [83]

    Bird S, Mishra V, Englehardt S, et al. Actions speak louder than words: Semi-supervised learning for browser fingerprinting detection[J]. arXiv preprint, arXiv: 2003.04463, 2020

    [84]

    Durey A, Laperdrix P, Rudametkin W, et al. An iterative technique to identify browser fingerprinting scripts[J]. arXiv preprint, arXiv: 2103.00590, 2021

    [85]

    Acar G, Juarez M, Nikiforakis N, et al. FPDetective: Dusting the Web for fingerprinters[C] //Proc of the 13th ACM SIGSAC Conf on Computer & Communications Security. New York: ACM, 2013: 1129−1140

    [86]

    Hannak A, Soeller G, Lazer D, et al. Measuring price discrimination and steering on e-commerce Web sites[C] //Proc of the 14th Conf on Internet Measurement. New York: ACM, 2014: 305−318

    [87]

    Mathur A, Vitak J, Narayanan A, et al. Characterizing the use of browser-based blocking extensions to prevent online tracking[C] //Proc of the 14th Symp on Usable Privacy and Security (SOUPS 2018). Berkeley, CA: USENIX Association, 2018: 103−116

    [88]

    Meng Wei, Lee B, Xing Xinyu, et al. TrackMeOrNot: Enabling flexible control on Web tracking[C] //Proc of the 25th Int Conf on World Wide Web. New York: ACM, 2016: 99−109

    [89]

    Torres C F, Jonker H, Mauw S. FP-Block: Usable Web privacy by controlling browser fingerprinting[C] //Proc of the 20th European Symp on Research in Computer Security. Berlin: Springer, 2015: 3−19

    [90]

    Al-Fannah N M, Li Wanpeng, Mitchell C J. Beyond cookie monster amnesia: Real world persistent online tracking[C] //Proc of the 33rd Conf on Information Security. Berlin: Springer, 2018: 481−501

    [91]

    Starov O, Nikiforakis N. Extended tracking powers: Measuring the privacy diffusion enabled by browser extensions[C] //Proc of the 26th Int Conf on World Wide Web. New York: ACM, 2017: 1481−1490

    [92]

    kkapsner. Canvas Blocker[EB/OL]. [2020-04-05]. https://github.com/kkapsner/CanvasBlocker

    [93]

    Obida M A, Obeidat S, Holst J, et al. Canvas Deceiver−A new defense mechanism against canvas fingerprinting[J/OL]. The Journal on Systemics, Cybernetics and Informatics. 2020[2021-05-31]. http://www.iiisci.org/journal/PDV/sci/pdfs/SA899XU20.pdf

    [94]

    Leiva L A, Arapakis I, Iordanou C. My mouse, my rules: Privacy issues of behavioral user profiling via mouse tracking[C] //Proc of the 6th ACM SIGIR Conf on Human Information Interaction and Retrieval. New York: ACM, 2021: 51−61

    [95]

    Pan Xiang, Cao Yinzhi, Chen Yan. I do not know what you visited last summer: Protecting users from third-party Web tracking with TrackingFree browser[C/OL] //Proc of the 22nd Annual Network and Distributed System Security Symp (NDSS). Reston: The Internet Society, 2015[2021-05-31]. https://users.cs.northwestern.edu/~ychen/Papers/trackingfree_NDSS15.pdf

    [96]

    Laperdrix P, Rudametkin W, Baudry B. Mitigating browser fingerprint tracking: Multi-level reconfiguration and diversification[C] //Proc of the 10th Int Symp on Software Engineering for Adaptive and Self-Managing Systems. Piscataway, NJ: IEEE, 2015: 98−108

    [97]

    Baumann P, Katzenbeisser S, Stopczynski M, et al. Disguised chromium browser: Robust browser, flash and canvas fingerprinting protection[C] //Proc of the 15th ACM on Workshop on Privacy in the Electronic Society. New York: ACM, 2016: 37−46

    [98]

    Yu Zhonghao, Macbeth S, Modi K, et al. Tracking the trackers[C] //Proc of the 25th Int Conf on World Wide Web. New York: ACM, 2016: 121−132

    [99]

    Macbeth S. Tracking the trackers: Analysing the global tracking landscape with ghostrank. Technical report, Ghostery[EB/OL]. 2017[2021-04-20]. https://www.medienkraft.at/cms/wp-content/uploads/2018/10/user-tracking-studie-ghostery.pdf

    [100]

    Starov O, Laperdrix P, Kapravelos A, et al. Unnecessarily identifiable: Quantifying the fingerprintability of browser extensions due to bloat[C] //Proc of the 28th World Wide Web Conf. New York: ACM, 2019: 3244−3250

    [101]

    FaizKhademi A, Zulkernine M, Weldemariam K. FPGuard: Detection and prevention of browser fingerprinting[C] //Proc of the 29th IFIP Annual Conf on Data and Applications Security and Privacy. Berlin: Springer, 2015: 293−308

    [102]

    Nikiforakis N, Joosen W, Livshits B. PriVaricator: Deceiving fingerprinters with little white lies[C] //Proc of the 24th Int Conf on World Wide Web. New York: ACM, 2015: 820−830

    [103]

    Laperdrix P, Baudry B, Mishra V. FPRandom: Randomizing core browser objects to break advanced device fingerprinting techniques[C] //Proc of the 6th Int Symp on Engineering Secure Software and Systems. Berlin: Springer, 2017: 97−114

    [104]

    Yokoyama S, Uda R. A proposal of preventive measure of pursuit using a browser fingerprint[C/OL] //Proc of the 9th Int Conf on Ubiquitous Information Management and Communication. New York: ACM, 2015[2021-04-20]. https://dl.acm.org/doi/pdf/10.1145/2701126.2701210

    [105]

    Wu Shujiang, Li Song, Cao Yinzhi, et al. Rendered private: Making GLSL execution uniform to prevent WebGL-based browser fingerprinting[C] //Proc of the 28th USENIX Security Symp. Berkeley, CA: USENIX Association, 2019: 1645−1660

    [106]

    Jueckstock J, Kapravelos A. VisibleV8: In-browser monitoring of JavaScript in the wild[C] //Proc of the 19th Internet Measurement Conf. New York: ACM, 2019: 393−405

    [107]

    Andriamilanto N, Allard T, Le Guelvouit G. FPSelect: Low-cost browser fingerprints for mitigating dictionary attacks against Web authentication mechanisms[C] //Proc of the Annual Computer Security Applications Conf. New York: ACM, 2020: 627−642

    [108]

    Andriamilanto N, Allard T. BrFAST: A tool to select browser fingerprinting attributes for Web authentication according to a usability-security trade-off[C] //Proc of the Web Conf. New York: ACM, 2021: 701−704

    [109]

    Solomos K, Ilia P, Ioannidis S, et al. Automated measurements of cross-device tracking[C] //Proc of the 13th Int Workshop on Information and Operational Technology Security Systems. Berlin: Springer, 2018: 73−80

    [110]

    Solomos K, Ilia P, Ioannidis S, et al. TALON: An automated framework for cross-device tracking detection[C] //Proc of the 22nd Int Symp on Research in Attacks, Intrusions and Defenses (RAID 2019). Berlin: Springer, 2019: 227−241

    [111]

    Trickel E, Starov O, Kapravelos A, et al. Everyone is different: Client-side diversification for defending against extension fingerprinting[C] //Proc of the 28th USENIX Security Symp. Berkeley, CA: USENIX Association, 2019: 1679−1696

    [112]

    Gómez-Boix A, Frey D, Bromberg Y D, et al. A collaborative strategy for mitigating tracking through browser fingerprinting[C] //Proc of the 6th ACM Workshop on Moving Target Defense. New York: ACM, 2019: 67−78

    [113]

    Overdorf R, Juarez M, Acar G, et al. How unique is your onion? An analysis of the fingerprintability of Tor onion services[C] //Proc of the 17th ACM SIGSAC Conf on Computer and Communications Security. New York: ACM, 2017: 2021−2036

    [114]

    Datta A, Lu Jianan, Tschantz M C. The effectiveness of privacy enhancing technologies against fingerprinting[J]. arXiv preprint, arXiv: 1812.03920, 2018

    [115]

    Lanze F, Panchenko A, Engel T. A formalization of fingerprinting techniques[C] //Proc of the 2015 IEEE 14th Trustcom/1st BigDataSE/13th ISPA. Piscataway, NJ: IEEE, 2015: 818−825

    [116]

    Vastel A. Tracking versus security: Investigating the two facets of browser fingerprinting[D/OL]. 2019[2021-05-20]. https://tel.archives-ouvertes.fr/tel-02343930/document

    [117]

    Antonio E, Fajardo A, Medina R. Tracking browser fingerprint using rule based algorithm[C] //Proc of the 16th IEEE Int Colloquium on Signal Processing & Its Applications. Piscataway, NJ: IEEE, 2020: 225−229

    [118]

    Vastel A, Rudametkin W, Rouvoy R. FP-TESTER: Automated testing of browser fingerprint resilience[C] //Proc of the 3rd IEEE European Symp on Security and Privacy Workshops. Piscataway, NJ: IEEE, 2018: 103−107

    [119]

    Vastel A, Laperdrix P, Rudametkin W, et al. FP-scanner: The privacy implications of browser fingerprint inconsistencies[C] //Proc of the 27th USENIX Security Symp. Berkeley, CA: USENIX Association, 2018: 135−150

    [120]

    Queiroz J S, Feitosa E L. A Web browser fingerprinting method based on the Web audio API[J]. The Computer Journal, 2019, 62(8): 1106−1120 doi: 10.1093/comjnl/bxy146

    [121]

    Gulyas G, Some D F, Bielova N, et al. To extend or not to extend: On the uniqueness of browser extensions and Web logins[C] //Proc of the 17th Workshop on Privacy in the Electronic Society. New York: ACM, 2018: 14−27

    [122]

    Abouollo A, Almuhammadi S. Detecting malicious user accounts using canvas fingerprint[C] //Proc of the 8th Int Conf on Information and Communication Systems. Piscataway, NJ: IEEE, 2017: 358−361

    [123]

    Unger T, Mulazzani M, Frühwirt D, et al. Shpf: Enhancing http (s) session security with browser fingerprinting[C] //Proc of the 8th Int Conf on Availability, Reliability and Security. Piscataway, NJ: IEEE, 2013: 255−261

    [124]

    Jonker H, Krumnow B, Vlot G. Fingerprint surface-based detection of Web bot detectors[C] //Proc of the 24th European Symp on Research in Computer Security. Berlin: Springer, 2019: 586−605

    [125]

    Vastel A, Rudametkin W, Rouvoy R, et al. FP-Crawlers: Studying the resilience of browser fingerprinting to block crawlers[C/OL] //Proc of the NDSS Workshop on Measurements, Attacks, and Defenses for the Web. 2020[2021-04-20]. https://www.ndss-symposium.org/wp-content/uploads/2020/02/23010.pdf

    [126]

    Agarwal V, Vekaria Y, Agarwal P, et al. Under the spotlight: Web tracking in Indian partisan news websites[J]. arXiv preprint, arXiv: 2102.03656, 2021

    [127]

    Takahashi T, Kruegel C, Vigna G, et al. Tracing and analyzing Web access paths based on user-side data collection: How do users reach malicious URLs?[C] //Proc of the 23rd Int Symp on Research in Attacks, Intrusions and Defenses (RAID 2020). Berlin: Springer, 2020: 93−106

    [128]

    Jia Zhaopeng, Cui Xiang, Liu Qixu, et al. Micro-honeypot: Using browser fingerprinting to track attackers[C] //Proc of the 3rd IEEE Int Conf on Data Science in Cyberspace (DSC). Piscataway, NJ: IEEE, 2018: 197−204

    [129]

    Li Bo, Vadrevu P, Lee K H, et al. JSgraph: Enabling reconstruction of Web attacks via efficient tracking of live in-browser Javascript executions[C/OL] //Proc of the 25th Network and Distributed System Security Symp (NDSS). 2018[2020-05-11]. https://web.archive.org/web/20180307204133id_/http://wp.internetsociety.org/ndss/wp-content/uploads/sites/25/2018/02/ndss2018_07B-4_Li_paper.pdf

    [130]

    Oh J, Lee S, Lee S. Advanced evidence collection and analysis of web browser activity[J/OL]. Digital Investigation, 2011[2021-04-20]. https://dl.acm.org/doi/abs/10.1016/j.diin.2011.05.008

    [131]

    Winter P, Edmundson A, Roberts L M, et al. How do Tor users interact with onion services?[C] //Proc of the 27th USENIX Security Symp. Berkeley, CA: USENIX Association, 2018: 411−428

    [132]

    Fiore U, Castiglione A, De Santis A, et al. Countering browser fingerprinting techniques: Constructing a fake profile with Google Chrome[C] //Proc of the 17th Int Conf on Network-Based Information Systems. Piscataway, NJ: IEEE, 2014: 355−360

    [133]

    Luangmaneerote S, Zaluska E, Carr L. Survey of existing fingerprint countermeasures[C] //Proc of the 2016 Int Conf on Information Society. Piscataway, NJ: IEEE, 2016: 137−141

    [134]

    Samarasinghe N, Mannan M. Towards a global perspective on Web tracking[J]. Computers & Security, 2019(87): 101569

    [135]

    Luangmaneerote S, Zaluska E, Carr L. Inhibiting browser fingerprinting and tracking[C] //Proc of the IEEE 3rd Int Conf on Big Data Security on Cloud (BigDataSecurity), IEEE Int Conf on High Performance and Smart Computing (HPSC), and IEEE Int Conf on Intelligent Data and Security (IDS). Piscataway, NJ: IEEE, 2017: 63−68

Publication history
  • Received: 2021-06-10
  • Revised: 2022-06-22
  • Published online: 2023-02-26
  • Issue date: 2023-04-17
