Abstract:
With the explosive growth of scientific literature and the increasing depth and specialization of research fields, researchers face substantial information-processing challenges when attempting to formulate novel scientific hypotheses. Although Large Language Models (LLMs) hold considerable potential for data processing and knowledge integration, they remain limited in their ability to generate original and insightful scientific hypotheses. Existing research predominantly uses LLMs to accelerate and refine established theories and technologies, often overlooking the early stage of scientific inquiry in which novel hypotheses are proposed and new theories are developed, a stage vital to scientific advancement. This study, grounded in the principles of divergent and convergent thinking from the theory of structured intelligence, proposes a Human-in-the-loop Multi-agent Framework (HILMA) for the reliable generation of scientific hypotheses. HILMA incorporates a real-time, systematic knowledge-retrieval augmentation mechanism that dynamically integrates the latest research advances into citation-network subgraphs, providing LLMs with comprehensive and up-to-date surveys of scientific knowledge. In addition, the framework strengthens hypothesis generation through a multi-agent argumentation approach that simulates the scientific peer-review process, while leveraging the intuition and expertise of human experts to further refine and diversify the generated hypotheses. A series of human-machine evaluations shows that HILMA offers clear advantages over existing baselines in generating high-quality scientific hypotheses and holds promise as a key facilitator of technological innovation.