Application of Electronic Technique (电子技术应用)
HTTP/HTTPS protocol asset discovery based on clustering
Ma Yan1,2, Su Majing1,2, Yao Wangjun1,2, Quan Xiaowen3, Liu Hong1,2
1. China Information Security Research Institute Co., Ltd.; 2. National Computer System Engineering Research Institute of China; 3. WebRAY Tech (Beijing) Co., Ltd.
Abstract: Network probing and scanning is an important method for discovering network assets. HTTP/HTTPS responses account for a large share of probe results and are a key source for identifying Internet assets. As network environments grow more complex, the variety and number of assets that use the HTTP/HTTPS protocols are increasing rapidly. Traditional network asset identification methods based on fingerprint rules therefore suffer from low recognition efficiency and poor adaptability, and cannot meet the needs of HTTP/HTTPS protocol identification. This paper proposes a new method for discovering HTTP/HTTPS protocol assets. The method processes HTTP/HTTPS response data with an automated rule generator, pre-filters the raw data using term frequency statistics and similarity information, and applies a text encoding model to encode the HTTP/HTTPS response body and fuse the resulting features. Combined with an unsupervised clustering algorithm, this enables the discovery of HTTP/HTTPS protocol assets. Experimental results show that the proposed method significantly improves the efficiency of HTTP/HTTPS protocol asset discovery, speeds up asset labeling, and can discover unknown assets without prior knowledge.
Key words: network asset discovery; HTTP/HTTPS protocols; automated rule generation; unsupervised clustering; Word2Vec; DBSCAN
CLC number: TP393.08 Document code: A DOI: 10.16157/j.issn.0258-7998.256341
Citation: Ma Yan, Su Majing, Yao Wangjun, et al. HTTP/HTTPS protocol asset discovery based on clustering[J]. Application of Electronic Technique, 2025, 51(11): 98-106.

Introduction

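Driven by digital transformation, the variety and number of network assets are growing exponentially, and network security faces increasingly complex challenges. Network assets include not only traditional network devices (such as network cameras and firewalls) but also a wide range of content management systems and network services. At present, network asset identification relies mainly on matching against static fingerprint rules. This approach performs well on known asset types, but its limitations are equally clear. First, building and maintaining fingerprint rules depends on expert experience and substantial manual effort. Second, methods based on a static fingerprint library respond slowly to new types of devices, so the identification rate for unknown asset types drops significantly. These shortcomings limit the effectiveness and adaptability of current asset identification techniques based on fingerprint rule matching.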

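To address these problems, this paper proposes a discovery method for network assets that use the HTTP/HTTPS protocols. An automated rule generator produces fingerprint rules from the HTTP/HTTPS data collected by active probing and filters that data; an unsupervised clustering method then partitions the network asset data by shared features, so that HTTP/HTTPS protocol assets are discovered automatically. The method can discover unknown assets and improve labeling efficiency. The proposed automated rule generator is based on a hierarchical grouping strategy: it refines the dataset step by step, extracts highly discriminative feature fields, and builds fingerprint rules capable of coarse classification, so that data lacking common asset features is filtered out. To handle the diversity of HTTP/HTTPS response header fields, the authors statistically analyzed a large-scale dataset of probe results and, drawing on expert experience, selected 21 response header fields for generating automated filtering rules, and designed the automated rule generator around them. On this basis, a multi-feature fusion asset clustering algorithm oriented to HTTP/HTTPS response body information is designed for the pre-filtered data. The algorithm uses Word2Vec[1] for feature encoding, converts the processed data into feature vectors, and combines feature fusion with DBSCAN[2] clustering to cluster efficiently in a multi-dimensional feature space and discover potential assets. Finally, experiments verify the effectiveness of the proposed method. The method not only improves the efficiency of HTTP/HTTPS protocol asset discovery but also effectively discovers unknown assets, which in turn improves the efficiency of fingerprint labeling and rule extraction.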
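The two-stage idea described above (pre-filter by response headers, then cluster response bodies) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single hypothetical header field (`Server`) in place of the 21 selected fields, uses normalized term-count vectors as a stand-in for Word2Vec embeddings, omits feature fusion, and implements a small textbook DBSCAN over cosine distance. The sample responses, function names, and the `eps` and `min_pts` values are all illustrative assumptions.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical probe results: (response headers, response body text).
RESPONSES = [
    ({"Server": "nginx"}, "acme cms admin login page"),
    ({"Server": "nginx"}, "acme cms admin login portal"),
    ({"Server": "camd"}, "ip camera live view stream"),
    ({"Server": "camd"}, "ip camera live view player"),
    ({"Server": "nginx"}, "welcome to nginx default index"),
    ({"Server": "oddball"}, "hello world"),
]


def group_by_headers(indices, responses, fields, min_group=2):
    """Split indices one header field at a time; drop groups below min_group."""
    if len(indices) < min_group:
        return []
    if not fields:
        return [indices]
    buckets = defaultdict(list)
    for i in indices:
        buckets[responses[i][0].get(fields[0], "")].append(i)
    groups = []
    for bucket in buckets.values():
        groups.extend(group_by_headers(bucket, responses, fields[1:], min_group))
    return groups


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text, vocab):
    """Unit-length term-count vector (assumed stand-in for Word2Vec)."""
    counts = Counter(tokenize(text))
    vec = [float(counts[t]) for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine_distance(a, b):
    # Inputs are unit vectors, so the dot product is the cosine similarity.
    return 1.0 - sum(x * y for x, y in zip(a, b))


def dbscan(vectors, eps, min_pts):
    """Return one cluster label per vector; -1 marks noise."""
    labels = [None] * len(vectors)  # None means "not visited yet"

    def neighbors(i):
        return [j for j in range(len(vectors))
                if cosine_distance(vectors[i], vectors[j]) <= eps]

    cluster = -1
    for i in range(len(vectors)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


def discover(responses, fields, eps, min_pts):
    """Pre-filter by headers, then cluster the surviving response bodies."""
    groups = group_by_headers(list(range(len(responses))), responses, fields)
    kept = sorted(i for group in groups for i in group)
    vocab = sorted({t for i in kept for t in tokenize(responses[i][1])})
    vectors = [embed(responses[i][1], vocab) for i in kept]
    return dict(zip(kept, dbscan(vectors, eps, min_pts)))


print(discover(RESPONSES, ["Server"], eps=0.3, min_pts=2))
```

On this toy data, the two CMS-like pages and the two camera-like pages should land in separate clusters, the nginx default page should be labeled noise (-1), and the `oddball` response should be dropped before clustering because no other response shares its header value. In practice a trained embedding model and a library DBSCAN implementation would likely replace the hand-written pieces.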


The full text of this paper can be downloaded from:

https://www.chinaaet.com/resource/share/2000006847


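Author information:

Ma Yan1,2, Su Majing1,2, Yao Wangjun1,2, Quan Xiaowen3, Liu Hong1,2

(1. China Information Security Research Institute Co., Ltd., Beijing 102200;

2. National Computer System Engineering Research Institute of China, Beijing 100083;

3. WebRAY Tech (Beijing) Co., Ltd., Beijing 100084)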



This content is original to the AET website. Reproduction without authorization is prohibited.