
Improved Multi-dimensional Apriori Algorithm Based on Constraints

Application of Electronic Technique

Wang Zhihao, Su Mingyue, Li Dongfang, Shen Wei, Yang Guang

(Beijing Institute of Computer Technology and Application, Beijing 100854)

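Abstract: To address the low execution efficiency of classical multi-dimensional association rule mining algorithms and the redundant rules they produce, an improved multi-dimensional Apriori algorithm based on constraints is proposed. Building on the multi-dimensional Apriori algorithm, it introduces user constraints into the mining process: it generates the frequent predicate sets of interest to the user according to constraints on predicates, and prunes the transaction set on that basis. On the one hand, the user constraints greatly reduce the number of candidate predicate sets generated; on the other hand, the pruned transaction set lowers the cost of scanning the database. The result is higher mining efficiency and fewer redundant rules. Comparative experiments on an FPGA code defect transaction set show that, compared with the multi-dimensional Apriori algorithm, the proposed algorithm improves both search efficiency and the accuracy of the mining results, effectively improving the accuracy of FPGA code defect analysis.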

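Key words: association rule mining; multi-dimensional association rules; Apriori algorithm; frequent predicate set; predicate constraint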

CLC number: TP311  Document code: A  DOI: 10.16157/j.issn.0258-7998.233873
Chinese citation format: 王志昊，苏明月，李东方，等. 基于约束的多维Apriori改进算法[J]. 电子技术应用，2023，49(10)：100-105.
English citation format: Wang Zhihao, Su Mingyue, Li Dongfang, et al. Algorithm of multi-dimensional Apriori with constraints[J]. Application of Electronic Technique, 2023, 49(10): 100-105.

Algorithm of multi-dimensional Apriori with constraints

Wang Zhihao, Su Mingyue, Li Dongfang, Shen Wei, Yang Guang

(Institute 706, Second Academy of China Aerospace Science and Industry Corporation, Beijing 100854, China)

Abstract: To address the low efficiency of classical multi-dimensional association rule mining algorithms and the redundant rules they produce, an algorithm of multi-dimensional Apriori with constraints is proposed. Building on the multi-dimensional Apriori algorithm, the proposed algorithm uses user constraints to control the mining process: according to the predicate constraints, it generates the frequent predicate sets that interest the user and prunes the transaction set accordingly. On the one hand, the user constraints greatly reduce the number of candidate predicate sets generated; on the other hand, the pruned transaction set reduces the cost of scanning the database. As a result, mining efficiency is improved and redundant rules are reduced. Comparative experiments were conducted on FPGA code defect transaction sets. The results show that, compared with the multi-dimensional Apriori algorithm, the proposed algorithm improves both the search efficiency for frequent predicate sets and the accuracy of the mining results.

Key words: association rules mining; multi-dimensional association rule; Apriori; frequent predicate set; predicate constraint; data mining
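The abstract states the idea only at a high level; the exact constraint language and pruning rules are in the full text. The sketch below is one possible reading under stated assumptions, not the authors' algorithm: each transaction is a set of predicates encoded as hypothetical `"dimension=value"` strings, the user constraint is "the predicate set must contain `required` and may only use dimensions in `allowed_dims`", and support is counted by brute-force enumeration rather than level-wise Apriori, to keep it short. The function name `constrained_frequent_sets`, the encoding, and the FPGA-defect sample data are all illustrative.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# Hypothetical encoding: a predicate is a "dimension=value" string.
Predicate = str


def constrained_frequent_sets(
    transactions: List[Set[Predicate]],
    required: Predicate,
    allowed_dims: Set[str],
    min_support: float,
    max_size: int = 3,
) -> Dict[FrozenSet[Predicate], float]:
    """Frequent predicate sets that contain `required` and use only `allowed_dims`."""
    n = len(transactions)  # support stays relative to the full transaction set
    # Transaction pruning: a transaction without `required` cannot support any
    # set that satisfies the constraint, so it is dropped before counting.
    # Predicates from dimensions outside `allowed_dims` are dropped as well.
    kept = [
        [p for p in t if p != required and p.split("=", 1)[0] in allowed_dims]
        for t in transactions
        if required in t
    ]
    counts: Counter = Counter()
    for others in kept:
        # Each counted set is `required` plus 0..max_size-1 other predicates.
        for size in range(max_size):
            for combo in combinations(sorted(others), size):
                counts[frozenset(combo) | {required}] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


# Illustrative FPGA code-defect transactions (made-up data).
transactions = [
    {"defect=latch", "module=uart", "style=async_reset", "team=A"},
    {"defect=latch", "module=uart", "style=sync_reset", "team=B"},
    {"defect=cdc", "module=dma", "style=async_reset", "team=A"},
    {"defect=latch", "module=spi", "style=async_reset", "team=A"},
]
result = constrained_frequent_sets(
    transactions, "defect=latch", {"defect", "module", "style"}, min_support=0.5
)
for preds, supp in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(preds), supp)
```

Because every counted set contains `required`, any transaction that supports it also contains `required`, so dropping the other transactions leaves these support values exact while shrinking what has to be scanned. This is consistent with the abstract's claim that pruning the transaction set lowers scanning cost, although the paper's actual pruning rules may differ.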

0 Introduction

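In modern society, productivity has developed rapidly. Through continual advances in information technology, people's ability to create and collect data has greatly increased, and the volume of data has expanded quickly. The explosive growth of data and databases has forced people to adopt new techniques and tools that can process massive amounts of data and automatically manage, extract, and analyze useful information, so as to discover valuable knowledge and support decision-making. Data mining [1] emerged against this background. Fully applying data mining to real-world production can improve enterprise efficiency and reduce production costs. Data mining has a wide range of applications, such as clustering, prediction, classification, anomaly analysis, and association analysis.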

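Association rules are one of the main research subjects in data mining, and generating frequent itemsets is the core and most studied problem within this area. An association rule reflects the interdependence and correlation between one item and others [2]. In other words, association rules are a knowledge model implicit in the data: they quantify the valuable correlations between data items and extract them from massive data [3].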
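To make "quantify" concrete: the standard measures for a rule X ⇒ Y over a transaction set are support and confidence, where supp(X) is the fraction of transactions containing X and conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). The minimal Python sketch below computes both on a made-up market-basket example; the function names and data are illustrative.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support(itemset: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x: Itemset, y: Itemset, transactions: List[Itemset]) -> float:
    """conf(X => Y) = supp(X | Y) / supp(X); returns 0.0 if X never occurs."""
    supp_x = support(x, transactions)
    return support(x | y, transactions) / supp_x if supp_x else 0.0


# Made-up example transactions.
transactions: List[Itemset] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer"}),
    frozenset({"milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
]

rule_x, rule_y = frozenset({"diapers"}), frozenset({"beer"})
print(support(rule_x | rule_y, transactions))    # 0.75
print(confidence(rule_x, rule_y, transactions))  # 1.0
```

In practice a rule is usually reported only when both values reach user-chosen thresholds (minimum support and minimum confidence).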

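Association rule mining was first proposed by Agrawal et al. [4] in 1993. Mining association rules reveals the relationships hidden among the attributes of a database, helping people make better-informed decisions in business, finance, production, and daily life.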

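At present, the typical algorithm for mining association rules is the Apriori algorithm [5], whose core task is to find all frequent itemsets in a database. Apriori generates frequent itemsets level by level and uses the a priori property (every subset of a frequent itemset is also frequent) to reduce the number of candidate itemsets generated. For the dataset-scanning stage, Hossain proposed using automatic recursive joins to generate candidate itemsets [6], which are then pruned to obtain the frequent itemsets. In 2021, Li et al. proposed association rule mining based on temporal constraints, which reduced system overhead [7]. Wang et al. improved the Apriori algorithm using the MapReduce paradigm, effectively improving search efficiency [8]. In 2022, Dhinakaran et al. combined the Apriori algorithm with a bio-inspired algorithm, addressing the frequent itemset problem by improving Apriori's poor runtime performance on large datasets [9].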
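The level-wise search and a priori pruning described above can be sketched as follows. This is a textbook-style Python illustration, not the implementation of any of the cited works: candidates of size k are joined from frequent (k-1)-itemsets, any candidate with an infrequent (k-1)-subset is pruned without being counted, and the survivors are counted in one pass over the transactions. The sample data is made up.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]


def apriori(transactions: List[Itemset], min_support: float) -> Dict[Itemset, float]:
    """Return every itemset whose support is at least `min_support`."""
    n = len(transactions)

    def count(candidates: Set[Itemset]) -> Dict[Itemset, float]:
        # One scan of the data per level: count each candidate's occurrences.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    level = count({frozenset([item]) for t in transactions for item in t})
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune (a priori property): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent


transactions: List[Itemset] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer"}),
    frozenset({"milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
]
result = apriori(transactions, min_support=0.5)
print(len(result))  # 12
```

With these four transactions and min_support = 0.5, the 4-item candidate {bread, milk, diapers, beer} is discarded by the prune step without another scan, because its subset {bread, milk, diapers} (support 0.25) is not frequent.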

The full text of this article can be downloaded from: https://www.chinaaet.com/resource/share/2000005721
