CLC number: F49 Document code: A DOI: 10.19358/j.issn.2097-1788.2026.04.002
Citation format (Chinese): 涂群,耿贵宁,张茜茜. 数据工厂的构成、建设模式和运营机制研究[J].网络安全与数据治理,2026,45(4):9-16.
Citation format (English): Tu Qun, Geng Guining, Zhang Qianqian. Research on the composition, construction models and operation mechanisms of data factories[J]. Cyber Security and Data Governance, 2026, 45(4): 9-16.
Research on the composition, construction models and operation mechanisms of data factories
Tu Qun1, Geng Guining2, Zhang Qianqian3
1. School of Economics and Management, Beijing University of Chemical Technology; 2. 360 Digital Security Technology Group Co., Ltd.; 3. School of Computer Science and Artificial Intelligence
Abstract: High-quality datasets are the core fuel for training large AI models. At present, such datasets are built mainly by AI enterprises themselves, in a manner that is fragmented, workshop-style, and non-standardized, which makes it difficult to keep pace with the rapid development of large AI models. Drawing on the development patterns of resource-based infrastructure such as water plants and power plants, and on domestic and international best practices in facility-based production, this paper proposes the concept of the "data factory", defined as a production facility dedicated to large AI model applications that builds high-quality datasets in a facility-based, large-scale manner. The paper systematically describes the three-level architecture of the data factory, which consists of a storage workshop, a production workshop, and a pilot workshop. It also proposes four construction models and four operation mechanisms, providing theoretical support and practical reference for promoting the facility-based, large-scale supply of high-quality datasets.
Key words: data factory; high-quality dataset; data infrastructure; data element