CLC number: TP393 Document code: A DOI: 10.19358/j.issn.2097-1788.2026.05.001
Citation format: Chen Qianyi, Su Majing, Chen Zixuan, et al. Device type identification of HTTP/HTTPS network asset based on large language models[J]. Cyber Security and Data Governance, 2026, 45(5): 1-10.
Device type identification of HTTP/HTTPS network asset based on large language models
Chen Qianyi1, Su Majing1,2, Chen Zixuan1,2, Zhang Yongqi1,2, Ma Yan1,2
1.National Computer System Engineering Research Institute of China; 2.China Information Security Research Institute Co., Ltd.
Abstract: Traditional network asset identification methods based on static fingerprint rules and discriminative models generalize poorly in complex, open environments. To address this, this paper proposes a device type identification method for HTTP/HTTPS network assets based on instruction fine-tuning of a large language model. A multi-source data collection scheme with multi-platform label aggregation is designed to construct the original network asset dataset. A data preprocessing strategy that prioritizes retention of key features is applied to reduce redundant noise in the model inputs. Multiple heterogeneous features, including HTTP/HTTPS response bodies, response headers, SSL certificates, ports, and protocols, are then integrated into a unified serialized representation. Based on this representation, LoRA is used to perform parameter-efficient fine-tuning of the LLaMA-3-8B-Instruct model, enabling the model to learn the semantic associations between network asset characteristics and device types. Experimental results on a test dataset of 380,000 real-world network assets show that the proposed method maintains stable performance under highly imbalanced samples and long-tail device scenarios, achieving a weighted F1-score of 0.9591, which significantly outperforms the base model without fine-tuning. In addition, model inference throughput is improved by 62.81%. These results verify the effectiveness and practicality of the proposed method for large-scale automated identification of network asset device types.
Keywords: network asset identification; large language models; instruction tuning; multi-source heterogeneous features