
    A Safety Risk Taxonomy of AI Systems Based on Decidability Theory

    • Abstract: This paper rethinks the nature of AI safety through the lens of logical complexity, establishing a three-level taxonomy of safety problems for intelligent systems: the R1 level (decidable propositions, whose safety can be formally proven in advance), the R2 level (semi-decidable propositions, whose unsafety can only be discovered after the fact), and the R3 level (non-recursively-enumerable propositions, where even the discovery of unsafety cannot be guaranteed). Correctly distinguishing R1 from R2 is critical: every safety problem that is solvable by engineering lies at the R1 level, and achieving safety therefore requires a dual-track effort of correctness verification and institutional safety nets. As for the much-discussed field of AI safety, although current AI risks have not yet reached the R3 level, the governance path urgently needs to shift from “pre-verification” to “runtime governance”, prioritizing external monitoring such as gating, rollback, isolation, human-in-the-loop oversight, and tiered permissions, and building a “technology-embedded plus institution-external” system centered on a civilization-level kill-switch, so as to preserve humanity's corrective sovereignty and civilizational safety within a logically incomplete reality.
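For readers less familiar with the terminology, the three levels can be phrased in standard computability-theoretic notation. The following is a minimal sketch of one such reading (the notation and the predicate name Unsafe are added here for illustration, not taken from the paper), assuming Unsafe denotes the set of system behaviours that violate a given safety property:

```latex
% A hedged formal sketch (one possible reading, not the paper's own notation).
% R1: Unsafe is decidable, so safety can be certified before deployment.
% R2: Unsafe is only semi-decidable, so violations can be witnessed at
%     runtime but safety cannot be proven a priori.
% R3: Unsafe is not recursively enumerable, so even witnessing a violation
%     is not guaranteed.
\begin{align*}
  \text{R1:} &\quad \mathrm{Unsafe} \in \Delta^0_1 && \text{(decidable)}\\
  \text{R2:} &\quad \mathrm{Unsafe} \in \Sigma^0_1 \setminus \Delta^0_1 && \text{(semi-decidable only)}\\
  \text{R3:} &\quad \mathrm{Unsafe} \notin \Sigma^0_1 && \text{(not recursively enumerable)}
\end{align*}
```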
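The runtime-governance mechanisms listed in the abstract (gating, rollback, isolation, human-in-the-loop, tiered permissions, kill-switch) can be pictured as an external wrapper placed around every model action. The sketch below is purely illustrative and uses hypothetical names such as ActionGate and Action; it is not an implementation described in the paper:

```python
# Hypothetical sketch of runtime governance as an external gate around model
# actions: permission tiers, human-in-the-loop approval, a global kill-switch,
# and an audit log that could later support rollback. Names are illustrative.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Action:
    name: str
    risk_tier: int                        # 0 = low risk ... 2 = high risk (hypothetical scale)
    execute: Callable[[], None]

@dataclass
class ActionGate:
    kill_switch_engaged: bool = False
    max_autonomous_tier: int = 0          # tiers above this require a human decision
    audit_log: List[str] = field(default_factory=list)

    def submit(self, action: Action, human_approves: Callable[[Action], bool]) -> bool:
        """Run `action` only if all governance checks pass; record the outcome."""
        if self.kill_switch_engaged:
            self.audit_log.append(f"BLOCKED (kill-switch): {action.name}")
            return False
        if action.risk_tier > self.max_autonomous_tier and not human_approves(action):
            self.audit_log.append(f"BLOCKED (human veto): {action.name}")
            return False
        action.execute()
        self.audit_log.append(f"EXECUTED: {action.name}")
        return True

# Example: a high-risk action is held for human review; once the kill-switch
# is engaged, everything is blocked regardless of approval.
gate = ActionGate()
risky = Action("modify production config", risk_tier=2, execute=lambda: print("applied"))
gate.submit(risky, human_approves=lambda a: False)   # vetoed by the human
gate.kill_switch_engaged = True
gate.submit(risky, human_approves=lambda a: True)    # blocked by the kill-switch
print(gate.audit_log)
```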

       
