Abstract:
This paper rethinks the nature of AI safety through the lens of logical complexity, establishing a three-level classification of safety issues: the R1 level (decidable propositions, which admit formal pre-verification), the R2 level (semi-decidable propositions, which admit only post-hoc evidence discovery), and the R3 level (non-recursively-enumerable propositions, for which even the identification of insecurity cannot be guaranteed). Distinguishing R1 from R2 is critical: all engineering-solvable security issues reside at the R1 level, so achieving security requires a dual-track effort combining correctness verification with institutional safety nets. Regarding the high-profile field of AI security, current risks have not yet escalated to the R3 level, but the governance trajectory must urgently shift from "pre-verification" to "runtime governance", prioritizing external monitoring mechanisms such as gating, rollback, isolation, human-in-the-loop oversight, and permission hierarchies. This in turn necessitates a dual-sovereignty system that is embedded in technology and externalized in institutions, centered on a "civilization-level kill-switch", thereby preserving human corrective sovereignty and civilizational security within a logically incomplete reality.
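The correspondence below is a minimal formal sketch, not the paper's own notation: it assumes the R-levels align with the standard arithmetical-hierarchy classes, and it introduces U as a hypothetical symbol for the set of unsafe instances of a given system.

\[
\begin{aligned}
\textbf{R1:}\quad & U \in \Delta^{0}_{1} && \text{(unsafety decidable: formal pre-verification is possible)}\\
\textbf{R2:}\quad & U \in \Sigma^{0}_{1}\setminus\Delta^{0}_{1} && \text{(unsafety semi-decidable: violations confirmable post hoc, never excluded in advance)}\\
\textbf{R3:}\quad & U \notin \Sigma^{0}_{1} && \text{(unsafety not recursively enumerable: even detecting violations is not guaranteed)}
\end{aligned}
\]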