A Plan Reuse Mechanism for LLM-Driven Agent

Li Guopeng; Wu Ruiqi; Tan Haisheng; Chen Guoliang

doi:10.7544/issn1000-1239.202440380

Journal of Computer Research and Development > 2024 > 61(11): 2706-2720. > DOI: 10.7544/issn1000-1239.202440380 CSTR: 32373.14.issn1000-1239.202440380

Li Guopeng, Wu Ruiqi, Tan Haisheng, Chen Guoliang. A Plan Reuse Mechanism for LLM-Driven Agent[J]. Journal of Computer Research and Development, 2024, 61(11): 2706-2720. DOI: 10.7544/issn1000-1239.202440380

Citation:

PDF (1588 KB)

A Plan Reuse Mechanism for LLM-Driven Agent

School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027

Funds: This work was supported by the 2030 National Key AI Program of China (2021ZD0110400), the Key Program of the National Natural Science Foundation of China (62132009), and the Fundamental Research Funds for the Central Universities.

More Information

Author Bio:
Li Guopeng: born in 1997. PhD candidate. His main research interests include edge intelligence, large language model-driven agent, and machine learning system

Wu Ruiqi: born in 2003. Master candidate. His main research interests include edge computing, large language model, and machine learning system

Tan Haisheng: born in 1981. PhD, professor. Member of CCF. His main research interests include edge intelligence, and system and networking for AI

Chen Guoliang: born in 1938. Professor. Fellow of CCF. His main research interests include parallel algorithms, computer architectures, and computational intelligence
Received Date: May 30, 2024
Revised Date: July 10, 2024
Available Online: July 16, 2024

Graphical Abstract

Abstract

Abstract

Integrating large language models (LLMs) into personal assistants, like Xiao Ai and Blue Heart V, effectively enhances their ability to interact with humans, solve complex tasks, and manage IoT devices. Such assistants are also termed LLM-driven agents. Upon receiving user requests, the LLM-driven agent generates plans using an LLM, executes these plans through various tools, and then returns the response to the user. During this process, the latency for generating a plan with an LLM can reach tens of seconds, significantly degrading user experience. Real-world dataset analysis shows that about 30% of the requests received by LLM-driven agents are identical or similar, which allows the reuse of previously generated plans to reduce latency. However, it is difficult to accurately define the similarity between the request texts received by the LLM-driven agent through directly evaluating the original request texts. Moreover, the diverse expressions of natural language and the unstructured format of plan texts make implementing plan reuse challenging. To address these issues, we present and implement a plan reuse mechanism for LLM-driven agents called AgentReuse. AgentReuse leverages the similarities and differences among requests’ semantics and uses intent classification to evaluate the similarities between requests and enable the reuse of plans. Experimental results based on a real-world dataset demonstrate that AgentReuse achieves a 93% effective plan reuse rate, an F1 score of 0.9718, and an accuracy of 0.9459 in evaluating request similarities, reducing latency by 93.12% compared with baselines without using the reuse mechanism.
- artificial intelligence of things,
- large language models (LLMs),
- agent,
- semantic cache,
- similarity evaluation

FullText(HTML)

References (64)

References

[1]	光明日报. 抢抓机遇,加快发展新质生产力[EB/OL]. [2024-03-14]. https://news.gmw.cn/2024-03/14/content_37202598.htm Guangming Daily. Seize the opportunity, accelerate the development of new quality productive forces [EB/OL]. [2024-03-14]. https://news.gmw.cn/2024-03/14/content_37202598.htm (in Chinese)
[2]	刘云浩. 物联网导论[M]. 北京:科学出版社,2017 Liu Yunhao. Introduction to Internet of Things[M]. Beijing: Sciences Press, 2017 (in Chinese)
[3]	郭斌,刘思聪,刘琰,等. 智能物联网:概念、体系架构与关键技术[J]. 计算机学报,2023,46(11):2259−2278 doi: 10.11897/SP.J.1016.2023.02259 Guo Bin, Liu Sicong, Liu Yan, et al. AIoT: The concept, architecture and key techniques[J]. Chinese Journal of Computers, 2023, 46(11): 2259−2278 (in Chinese) doi: 10.11897/SP.J.1016.2023.02259
[4]	小米集团. 小爱同学[EB/OL]. [2024-03-15]. https://xiaoai.mi.com/ Xiaomi Corporation. Xiaoai tongxue [EB/OL]. [2024-03-15]. https://xiaoai.mi.com/(in Chinese)
[5]	华为终端有限公司. 智慧唤醒语音[EB/OL]. [2024-03-15]. https://consumer.huawei.com/cn/emui-11/tips/smart-home-list/article5/ Huawei Device Co., Ltd. Wake up to intelligent voice [EB/OL]. [2024-03-15]. https://consumer.huawei.com/cn/emui-11/tips/smart-home-list/article5/ (in Chinese)
[6]	李戈,彭鑫,王千祥,等. 大模型:基于自然交互的人机协同软件开发与演化工具带来的挑战[J]. 软件学报,2023,34(10):4601−4606 Li Ge, Peng Xin, Wang Qianxiang, et al. Challenges from LLMs as a natural language based human-machine collaborative tool for software development and evolution[J]. Journal of Software, 2023, 34(10): 4601−4606 (in Chinese)
[7]	Dong Luna Xin, Moon Seungwhan, Xu Ethan Yifan, et al. Towards next-generation intelligent assistants leveraging LLM techniques[C]//Proc of the 29th ACM SIGKDD Conf on Knowledge Discovery and Data Mining. New York: ACM, 2023: 5792−5793
[8]	Li Yuanchun, Wen Hao, Wang Weijun, et al. Personal LLM agents: Insights and survey about the capability, efficiency and security[J]. arXiv preprint, arXiv: 2401.05459, 2024
[9]	Weng Lilian. LLM powered autonomous agents [EB/OL]. [2024-3-30]. https://lilianweng.github.io/posts/2023-06-23-agent
[10]	Wang Lei, Ma Chen, Feng Xueyang, et al. A survey on large language model based autonomous agents[J]. Frontiers of Computer Science, 2024, 18(6): 1−26
[11]	Xi Zhiheng, Chen Wenxiang, Guo Xin, et al. The rise and potential of large language model based agents: A survey[J]. arXiv preprint, arXiv: 2309.07864, 2023
[12]	Wu Qingyun, Bansal Gagan, Zhang jieyu, et al. AutoGen: Enabling next-gen LLM applications via multi-agent conversation[J]. arXiv preprint, arXiv: 2308.08155, 2023
[13]	Open-Assistant. Open assistant conversations dataset release 2 [EB/OL]. [2024-04-20]. https://huggingface.co/datasets/OpenAssistant/oasst2
[14]	Gill W, Elidrisi M, Kalapatapu P, et al. Privacy-aware semantic cache for large language models[J]. arXiv preprint, arXiv: 2403.02694, 2024
[15]	Zhu Banghua, Sheng Ying, Zheng Lianmin, et al. On optimal caching and model multiplexing for large model inference[C]//Advances in Neural Information Processing Systems. Cambridge, MA: MIT, 2023: 59062−59094
[16]	Fu Bang, Feng Di. GPTCache: An open-source semantic cache for LLM applications enabling faster answers and cost savings[C]//Proc of the 3rd Workshop for Natural Language Processing Open Source Software. Stroudsburg, PA: ACL, 2023: 212−218
[17]	Zhao Wayne Xin, Zhou Kun, Li Junyi, et al. A survey of large language models[J]. arXiv preprint, arXiv: 2303.18223, 2023
[18]	Lin Chaofan, Han Zhenhua, Zhang Chengruidong, et al. Parrot: Efficient serving of LLM-based applications with semantic variable[C]//Proc of the 18th SENIX Symp on Operating Systems Design and Implementation. Berkeley, CA: USENIX Association, 2024: 929−945
[19]	Mindstream. AutoGPT [EB/OL]. [2024-04-30]. https://autogpt.net/
[20]	Shen Yongliang, Song Kaitao, Tan Xu, et al. Hugging GPT: Solving AI tasks with chatgpt and its friends in hugging face[C]//Advances in Neural Information Processing Systems. Cambridge, MA: MIT, 2023: 38154−38180
[21]	Hong Sirui, Zhuge Mingchen, Chen Jonathan, et al. MetaGPT: Meta programming for a multi-agent collaborative framework[J]. arXiv preprint, arXiv: 2308.00352, 2023
[22]	Bram M A, Cox S, Schilter O, et al. Augmenting large language models with chemistry tools[J]. Nature Machine Intelligence, 2024(6): 525−535
[23]	高云帆,郁董卿,王思琪,等. 大语言模型驱动的选址推荐系统[J]. 计算机研究与发展,2024,61(7):1681-1696 Gao Yunfan, Yu Dongqing, Wang Siqi, et al. Large language model powered site selection recommender system[J]. Journal of Computer Research and Development, 2024, 61(7): 1681-1696(in Chinese)
[24]	Baek J, Jauhar K S, Cucerzan S, et al. ResearchAgent: Iterative research idea generation over scientific literature with large language models[J]. arXiv preprint, arXiv: 2404.07738, 2024
[25]	VIVO. BlueLM-7B-Chat [EB/OL]. [2024-05-01]. https://huggingface.co/vivo-ai/BlueLM-7B-Chat
[26]	Google. Google pixel 8 pro [EB/OL]. [2024-05-21]. https://store.google.com/product/pixel_8_pro
[27]	Hsiao S. Assistant with bard: A step toward a more personal assistant [EB/OL]. [2024-05-21]. https://blog.google/products/assistant/google-assistant-bard-generative-ai/
[28]	Microsoft. Microsoft copilot [EB/OL]. [2024-05-01]. https://copilot.microsoft.com/
[29]	Mehdi Y. Introducing Copilot+ PCs [EB/OL]. [2024-05-25]. https://blogs.microsoft.com/blog/2024/05/20/introducing-copilot-pcs/
[30]	Zhang Chi, Yang Zhao, Liu Jiaxuan, et al. AppAgent: Multimodal agents as smartphone users[J]. arXiv preprint, arXiv: 2312.13771, 2023
[31]	Wen Hao, Li Yuanchun, Liu Guohong, et al. AutoDroid: LLM-powered task automation in Android[C]//Proc of the 30th Annual Int Conf on Mobile Computing and Networking. New York: ACM, 2024: 543−557
[32]	王恩东,唐士斌,陈继承,等. 多核处理器目录缓存结构设计[J]. 计算机研究与发展,2015,52(6):1242−1253 doi: 10.7544/issn1000-1239.2015.20150140 Wang Endong, Tang Shibin, Chen Jicheng, et al. Directory cache design for multi-core processor[J]. Journal of Computer Research and Development, 2015, 52(6): 1242−1253 (in Chinese) doi: 10.7544/issn1000-1239.2015.20150140
[33]	Sedaghati A, Hakimi M, Hojabr R, et al. X-cache: A modular architecture for domain-specific caches[C]//Proc of the 49th Annual Int Symp on Computer Architecture. New York: ACM, 2022: 396–409
[34]	Bhatla A, Navneet, Panda B. The Maya cache: A storage-efficient and secure fully-associative last-level cache[C]/Proc of the 51st Int Symp on Computer Architecture. New York: ACM, 2024: 32−44
[35]	Wong L D, Wu Hao, Molder C, et al. Baleen: ML admission & prefetching for flash caches[C]//Proc of the 22nd USENIX Conf on File and Storage Technologies. Berkeley, CA: USENIX Association, 2024: 347−371
[36]	Liu Yubo, Ren Yuxin, Liu Mingrui, et al. Optimizing file systems on heterogeneous memory by integrating dram cache with virtual memory management[C]//Proc of the 22nd USENIX Conf on File and Storage Technologies. Berkeley, CA: USENIX Association, 2024: 71−87
[37]	McAllister S, Berg B, Tutuncu-Macias J, et al. Kangaroo: Caching billions of tiny objects on flash[C]//Proc of the 28th Symp on Operating Systems Principles. New York: ACM, 2023: 243−262
[38]	Chen Jiayi, Sharma N, Khan T, et al. Darwin: Flexible learning-based CDN caching[C]//Proc of the 37th ACM Special Interest Group on Data Communication. New York: ACM, 2024: 981−999
[39]	Yang Juncheng, Zhang Yazhuo, Qiu Ziyue, et al. FIFO queues are all you need for cache eviction [C]//Proc of the 29th Symp on Operating Systems Principles. New York: ACM, 2023: 130−149
[40]	Yan Gang, Li Jian. Towards latency awareness for content delivery network caching[C]//Proc of the 2022 USENIX Annual Technical Conf. Berkeley, CA: USENIX Association, 2022: 789−804
[41]	Mirheidari A S, Arshad S, Onarlioglu K, et al. Cached and confused: Web cache deception in the wild[C]//Proc of the 29th USENIX Security Symp. Berkeley, CA: USENIX Association, 2020: 665−682
[42]	马郓,刘譞哲,梅宏. 面向移动Web应用的浏览器缓存性能度量与优化[J]. 软件学报,2020,31(7):1980−1996 Ma Yun, Liu Xuanzhe, Mei Hong. Measurement and optimization of browser cache performance for mobile Web applications[J]. Journal of Software, 2020, 31(7): 1980−1996(in Chinese)
[43]	Wang Huan, Wu Kui, Wang Jianping, et al. Rldish: Edge-assisted QoE optimization of HTTP live streaming with reinforcement learning[C]// Proc of the 43rd IEEE Conf on Computer Communications. Piscataway, NJ: IEEE 2020: 706−715
[44]	Fuerst A, Sharma P. Faascache: Keeping serverless computing alive with greedy-dual caching[C]//Proc of the 26th ACM Int Conf on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2021: 386−400
[45]	Roy B R, Patel T, Tiwari D. Icebreaker: Warming serverless functions better with heterogeneity[C]//Proc of the 27th ACM Int Conf on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2022: 753−767
[46]	Li Guopeng, Tan Haisheng, Zhang Xuan, et al. Online container caching with late-warm for IoT data processing[C]//Proc of the 40th Int Conf on Data Engineering. Piscataway, NJ: IEEE 2024: 1547−1560
[47]	Traverso S, Ahmed M, Garetto M, et al. Temporal locality in today’s content caching: Why it matters and how to model it[J]. ACM SIGCOMM Computer Communication Review, 2013, 43(5): 5−12 doi: 10.1145/2541468.2541470
[48]	Kwon W, Zhuohan Li, Zhuang Siyuan, et al. Efficient memory management for large language model serving with Paged Attention[C]//Proc of the 29th Symp on Operating Systems Principles. New York: ACM, 2023: 611−626
[49]	Zhang Zhenyu, Sheng Ying, Zhou Tianyi, et al. H2o: Heavy-hitter oracle for efficient generative inference of large language models[C]//Advances in Neural Information Processing Systems. Cambridge, MA: MIT, 2023: 34661−34710
[50]	Liu Yuhan, Li Hanchen, Cheng Yihua, et al. Cachegen: KV cache compression and streaming for fast language model serving[J]. arXiv preprint, arXiv: 2310.07240, 2023
[51]	Agarwal S, Mitra S, Chakraborty S, et al. Approximate caching for efficiently serving text-to-image diffusion models[C]//Proc of the 21st USENIX Symp on Networked Systems Design and Implementation. Berkeley, CA: USENIX Association, 2024: 1173−11189
[52]	Ma Ziyu, Sun Bin, Li Shutao. A two-stage selective fusion framework for joint intent detection and slot filling[J]. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(3): 3874−3885 doi: 10.1109/TNNLS.2022.3202562
[53]	NVIDIA. Intelligent virtual assistant [EB/OL]. [2024-05-21]. https://www.nvidia.com/en-us/ai-data-science/ai-workflows/intelligent-virtual-assistant/
[54]	Kinza Y. Virtual assistant (ai assistant) [EB/OL]. [2024-05-21]. https://www.techtarget.com/searchcustomerexperience/definition/virtual-assistant-AI-assistant
[55]	Google. bert-base-chinese [EB/OL]. [2024-04-01]. https://huggingface.co/google-bert/bert-base-chinese
[56]	MokaAI. M3E models [EB/OL]. [2024-04-01]. https://huggingface.co/moka-ai/m3e-small
[57]	Meta. FAISS. [EB/OL]. [2024-04-01]. https://ai.meta.com/tools/faiss/
[58]	Patil G. Shishir, Zhang Tianjun, Fang Vivian, et al. Goex: perspectives and designs towards a runtime for autonomous llm applications[J]. arXiv preprint, arXiv: 2404.06921, 2024
[59]	Chu A, Shoemaker C. Tutorial: Use code interpreter sessions in semantic kernel with azure container apps [EB/OL]. [2024-05-24]. https://learn.microsoft.com/en-us/azure/container-apps/sessions-tutorial-semantic-kernel
[60]	LangChain. Security[EB/OL]. [2024-05-24]. https://python.langchain.com/v0.1/docs/security/
[61]	中国中文信息学会. 中文人机对话技术评测 [EB/OL]. [2024-02-01]. https://conference.cipsc.org.cn/smp2019/evaluation.html Chinese information processing society of china. The evaluation of chinese human-computer dialogue technology [EB/OL]. [2024-02-01]. https://conference.cipsc.org.cn/smp2019/evaluation.html (in Chinese)
[62]	Mehta S, Sekhavat M, Cao Q, et al. OpenELM: An efficient language model family with open training and inference framework [EB/OL]. [2024-05-01]. https://machinelearning.apple.com/research/openelm
[63]	Beatty S. Tiny but mighty: The Phi−3 small language models with big potential [EB/OL]. [2024-05-01]. https://news.microsoft.com/source/features/ai/the-phi-3-small-language-models-with-big-potential/
[64]	Lee S, Choi J, Lee J, et al. Explore, select, derive, and recall: Augmenting LLM with human-like memory for mobile task automation [J]. arXiv preprint, arXiv: 2312.03003, 2023