An Algorithm for Identifying the Same User Across Social Networks

Application of Electronic Technique, 2022, Issue 1

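Shen Jiaqi1, Zhou Guomin2

1. College of Information Engineering, Zhejiang University of Technology, Hangzhou, Zhejiang 310023; 2. Department of Computer and Information Technology, Zhejiang Police College, Hangzhou, Zhejiang 310053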

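Abstract: To address the problem of identifying the same user across social networks, this paper proposes an identification method that combines user interests, writing style, and profile attributes. User relationships are judged separately in each of these three feature dimensions, and the individual judgments are then combined to improve the accuracy of same-user identification. User interests are divided into static and dynamic interests: static interests are extracted from users' background information with the TextRank algorithm, while dynamic interests, that is, interest points that change over time, are mined with a topic model from the text that users publish. Writing style is identified with the One-Class SVM algorithm, and the similarity of profile attributes is compared using an information-entropy weighting method. Experimental results show that the proposed algorithm improves both precision and recall compared with traditional machine learning algorithms.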

Keywords: cross-social-network; user identification; user interest; writing style; profile attributes

CLC number: TN01; TP391
Document code: A
DOI: 10.16157/j.issn.0258-7998.211518
Chinese citation format: 沈佳琪，周国民. 跨社交网络的同一用户识别算法[J].电子技术应用，2022，48(1)：109-114.
English citation format: Shen Jiaqi, Zhou Guomin. User alignment across social networks[J]. Application of Electronic Technique, 2022, 48(1): 109-114.

User alignment across social networks

Shen Jiaqi1, Zhou Guomin2

1. College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China; 2. Department of Computer and Information Security, Zhejiang Police College, Hangzhou 310053, China

Abstract: For the problem of identifying the same user across social networks, a recognition method that integrates user interests, writing style and profile attributes is proposed. User relationships are determined separately under these three feature dimensions, and the results are then combined to improve the accuracy of same-user identification. User interest is divided into static interest and dynamic interest: static interest is extracted from user background information by the TextRank algorithm, while dynamic interest is mined from the text that users publish by using a topic model to find interest points that change over time. User writing style is identified by the One-Class SVM algorithm, and finally the information entropy weighting method is used to compare the similarity of user profile attributes. The experimental results show that the proposed algorithm improves precision and recall compared with traditional machine learning algorithms.

Key words: cross-social-network; user identification; user interest; writing style; profile attribute

0 Introduction

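In recent years, personal data has grown ever richer as social networks have become widespread. Current user analysis of social networks mostly targets a single platform, but because single-platform data has limitations^[1], mining the multiple accounts that one user holds on different social networks can provide data support for social network analysis^[2].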

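Identification based on user profile attributes is the most widely studied approach. Zafarani et al.^[3] judged whether accounts belong to the same user by comparing the similarity of behavioral features in how users choose usernames; Zhang et al.^[4] combined multiple attributes such as username and avatar and performed identification with naive Bayes. However, the features used in these studies are easily missing or forged^[5]. Some studies therefore start from published text, mining user interests and comparing interest similarity to determine user relationships^[6]. He Li et al.^[7] used the LDA model to mine user interests from text content; Lü Zhiquan et al.^[8] introduced a time factor on top of the LDA model. These studies, however, consider only the dynamic interests reflected in text content and do not incorporate static interests. Moreover, even the same user may follow and post quite different content on different social platforms, which degrades identification performance.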
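To make the interest-based comparison described above more concrete, the following is a minimal sketch, not the method of any cited work. It assumes each post has already been mapped to a topic distribution (for example by an LDA model), averages those distributions with an exponential time decay, and compares two accounts by cosine similarity. The function names, the `half_life_days` parameter, and the exponential form of the time factor are all illustrative assumptions.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length topic-weight vectors."""
    if len(p) != len(q):
        raise ValueError("topic vectors must have the same length")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # an empty interest profile matches nothing
    return dot / (norm_p * norm_q)


def time_weighted_interest(posts, now_day, half_life_days=30.0):
    """Aggregate per-post topic distributions into one interest vector.

    posts: list of (day, topic_distribution) pairs, where `day` is the
    posting time in days on the same clock as `now_day`.
    Assumption: older posts are down-weighted by an exponential decay
    with a hypothetical half-life; the source does not specify the form
    of the time factor.
    """
    if not posts:
        raise ValueError("at least one post is required")
    n_topics = len(posts[0][1])
    totals = [0.0] * n_topics
    weight_sum = 0.0
    for day, dist in posts:
        weight = 0.5 ** ((now_day - day) / half_life_days)
        for k, value in enumerate(dist):
            totals[k] += weight * value
        weight_sum += weight
    return [t / weight_sum for t in totals]


# Hypothetical topic distributions for two accounts on different platforms.
account_a = [(100, [0.7, 0.2, 0.1]), (40, [0.1, 0.8, 0.1])]
account_b = [(95, [0.6, 0.3, 0.1]), (90, [0.8, 0.1, 0.1])]

interest_a = time_weighted_interest(account_a, now_day=100)
interest_b = time_weighted_interest(account_b, now_day=100)
similarity = cosine_similarity(interest_a, interest_b)
```

A similarity score like this would typically be compared against a threshold, or combined with other evidence, before two accounts are judged to belong to the same user; how to set that threshold is not specified in this excerpt.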

To read the full text of this article, download it from: http://www.chinaaet.com/resource/share/2000003919
