Big-Data Platform Based on Open Source Ecosystem

Lei Jun, Ye Hangjun, Wu Zesheng, Zhang Peng, Xie Long, He Yanxiang

Citation: Lei Jun, Ye Hangjun, Wu Zesheng, Zhang Peng, Xie Long, He Yanxiang. Big-Data Platform Based on Open Source Ecosystem[J]. Journal of Computer Research and Development, 2017, 54(1): 80-93. DOI: 10.7544/issn1000-1239.2017.20150492
CSTR: 32373.14.issn1000-1239.2017.20150492

Funding: This work was supported by the National Natural Science Foundation of China (91118003, 61373039, 61170022).
  • CLC number: TP391

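  • Abstract: The collection and processing of large-scale data has been a research hotspot in recent years. Industry has proposed several platform-level designs that rely heavily on open source software as data collection and processing components. However, truly meeting the complex requirements of enterprise applications, such as massive data storage, diverse business processing, cross-business analysis and cross-environment deployment, still calls for a big data platform that is complete, general-purpose and supports management of the entire data life cycle, together with extensive feature development, customization and improvement of the open source software itself. Drawing on Xiaomi's industrial applications and practice, and building on an in-depth study of existing platforms, we propose a new big data collection and processing platform based on the open source ecosystem. The platform incorporates extensive optimizations in load balancing, failure recovery, data compression and multi-dimensional scheduling. Along the way, we identified and fixed defects in existing open source software affecting data collection, storage and processing, as well as software consistency, availability and efficiency. The platform has been successfully deployed at Xiaomi, where it provides data collection and processing services for each of the company's business lines.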
    Abstract: As large-scale data collection and processing have been widely studied in recent years, several released big data processing platforms have come to play increasingly important roles in the operations of many Internet businesses. Open source ecosystems, the engine of big data innovation, have evolved so rapidly that a number of their projects have been successfully adopted as components of mainstream data processing platforms. In practice, however, open source software is still far from perfect when dealing with real-world large-scale data. Based on industrial practice at Xiaomi Inc., this paper proposes an improved platform for collecting and processing large-scale data in the face of varied business requirements. We focus on problems with the functionality, consistency and availability of the software when it is used for data collection, storage and processing. In addition, we propose a series of optimizations for load balancing, failover, data compression and multi-dimensional scheduling that significantly improve the efficiency of the current system. All the designs and optimizations described in this paper have been implemented and deployed to support various Internet services provided by Xiaomi Inc.

Publication history
  • Published: 2016-12-31
