Abstract: As the “last mile” of Internet content delivery, terminal networks seem simple yet account for 99% of performance bottlenecks. Classic designs target typical devices in regular environments and therefore struggle to accommodate diverse scenarios, which leads to severe performance gaps. By gathering and deeply diagnosing the anomalies of large-scale terminal networks in the cloud, we reveal several important defects of the classic design along three key dimensions: availability, reliability, and credibility. We fix these defects with a cross-layer, cross-generation collaborative reinforcement methodology (e.g., a time-inhomogeneous 4G/5G dual connectivity management method that minimizes the probability of network disconnection), achieving a self-regulating mechanism design that requires no scenario presets. The results have been applied to the high-speed network of the Ministry of Public Security, 17 million UUSpeedTest App users, 70 million Xiaomi mobile phones, 100 million Baidu PhoneGuard users, and 900 million WiFi devices. In recent years, we have further carried out forward-looking network design based on cloud-hosted emulators, which discovers and fixes potential defects without involving real user devices, so that terminal network design is “born in the cloud and grows in the cloud”. These results have been applied to Huawei DevEco Studio IDE (Integrated Development Environment), Tencent App Market, Google Android Emulator, and multiple popular ByteDance apps (such as Douyin and Toutiao).
-
Keywords:
- terminal network
- network measurement
- network design
- cloud native
- network emulation
-
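Terminal networks are an important part of the Internet: they connect the backbone network to end devices and affect user experience most directly. With the development of 5G/6G, the Internet of Things, and related technologies, the performance demands on terminal networks keep rising. Terminal networks carry emerging applications such as smart cities and the industrial Internet, serve as key infrastructure for society's digital transformation, and are a research subject that future network evolution cannot ignore. By analyzing the user confusion and technical gaps found in terminal networks, the team of Professor Li Zhenhua at Tsinghua University studies the problem along three key dimensions (availability, reliability, and credibility), proposes the idea of cloud-native reinforced design, realizes large-scale measurement, analysis, and design optimization of terminal networks, and achieves good results in multiple industrial systems. The article highlights a design philosophy that starts from the user's perspective and systematically explores how to improve the availability, reliability, and security of terminal networks. It makes three core points:
1) Addressing the main sources of user confusion about terminal networks, the article gives a comprehensive analysis from the angles of network speed, disconnection, security, and network generations, and explains the motivation for overcoming latent defects of the classic design pattern. By dissecting the performance gaps of large-scale industrial terminal networks across diverse usage scenarios, it summarizes the R&D gaps in motivation, scenarios, resources, and knowledge, and points out directions for overcoming existing technical challenges.
2) Centered on the innovative model of cloud-native reinforced design, the work weighs both technical and non-technical factors, uses serverless infrastructure to measure and analyze large-scale terminal networks in the form of microservices, and applies cross-layer, cross-generation collaborative reinforcement to heterogeneous performance defects in complex scenarios, adaptively improving terminal network design. The end result is an overall improvement and comprehensive evolution of terminal networks, making their services more efficient, secure, and reliable. These methods offer valuable lessons for real-world network operation and evolution.
3) In practice, the team combines theoretical design with industrial application. Across multiple industrial systems of different scales and requirements (including government-run private networks, commercial systems of large enterprises, and network applications of startups), it carried out investigation and analysis, deployment, and production retrofitting. This work solved the systems' key problems effectively and efficiently, improved service quality, and set an example for technical innovation in large-scale complex terminal networks.
Overall, this work systematically and comprehensively analyzes the problems facing terminal networks, makes useful explorations in both theory and practice, and forms a methodology for improving network performance. It is of considerable reference value for advancing cloud-native network technologies. Future work could extend to technical generality and user perception so as to build a more intelligent and autonomous network system, which would matter greatly for the progress of the digital society in the era of the Internet of Everything.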
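Commentary expert
Luo Junzhou, professor and doctoral supervisor. Main research interest: computer networks.
Highlighted paper
Li Zhenhua, Wang Hongyi, Li Yang, Lin Hao, Yang Xinlei. Cloud-native reinforced design for large-scale complex terminal networks[J]. Journal of Computer Research and Development, 2024, 61(1): 2−19. DOI: 10.7544/issn1000-1239.202330726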
Taking mainland China as an example, there are 8 core IXPs, located in Beijing, Shanghai, Guangzhou, Nanjing, Shenyang, Wuhan, Chengdu, and Xi'an.
https://FastBTS.github.io
http://FastBTS.thucloud.com
http://uuspeed.uutest.cn
https://MobileBandwidth.github.io
https://SipLoader.github.io
https://CellularReliability.github.io
https://dl.acm.org/doi/10.1145/3452296.3472908
https://10046.mi.com
https://mvno-optimization.github.io
https://shoujiweishi.baidu.com
https://www.wifi.com
https://syzs.qq.com
https://DevEcoStudio.huawei.com
https://TrinityEmulator.github.io
https://HoneyCloud.github.io
https://sj.qq.com
https://APIChecker.github.io
https://issuetracker.google.com/issues/262255458