Large Language Models | Sven LI's Homepage

KnowMT-Bench: Benchmarking Knowledge-Intensive Long-Form Question Answering in Multi-Turn Dialogues

Fri, 26 Sep 2025 00:00:00 +0000

Abstract

Multi-Turn Long-Form Question Answering (MT-LFQA) is a key application paradigm of Large Language Models (LLMs) in knowledge-intensive domains. However, existing long-form QA benchmarks are limited to single-turn dialogue, while multi-turn dialogue benchmarks typically assess other, orthogonal capabilities rather than knowledge-intensive factuality.

This paper introduces KnowMT-Bench, the first benchmark designed to systematically evaluate MT-LFQA for LLMs across knowledge-intensive fields, including medicine, finance, and law.

Interpreting Fedspeak with Confidence: An LLM-Based Uncertainty-Aware Framework Guided by Monetary Policy Transmission Paths

Tue, 12 Aug 2025 00:00:00 +0000

Abstract

This paper proposes an LLM-based uncertainty-aware framework for interpreting Federal Reserve communications (Fedspeak) and classifying monetary policy stance. The framework incorporates domain-specific reasoning grounded in monetary policy transmission mechanisms and introduces dynamic uncertainty decoding to assess prediction confidence.

Methodology

Domain Knowledge Integration: Incorporates monetary policy transmission mechanism knowledge
Uncertainty Quantification: Decomposes perceptual uncertainty into cognitive risk and environmental ambiguity
Dynamic Decoding: Adaptively selects decoding strategies based on model confidence levels
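To make the dynamic decoding idea above concrete, here is a minimal sketch of confidence-gated strategy selection. It is not the paper's implementation: the entropy-based confidence score, the 0.5 threshold, and the strategy names "greedy" and "self_consistency" are all assumptions chosen for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def normalized_confidence(probs):
    """Return 1 - H(p) / log(K): 1.0 is fully confident, 0.0 is uniform.

    Assumption: entropy over stance probabilities stands in for the
    framework's uncertainty measure, which the summary does not specify.
    """
    k = len(probs)
    if k < 2:
        return 1.0
    return 1.0 - entropy(probs) / math.log(k)


def select_decoding(probs, threshold=0.5):
    """Pick a decoding strategy name from the confidence level.

    Both strategy names and the threshold are hypothetical placeholders.
    """
    if normalized_confidence(probs) >= threshold:
        return "greedy"            # confident: take the top prediction
    return "self_consistency"      # uncertain: sample and aggregate


# Hypothetical stance probabilities over three classes.
confident = [0.90, 0.05, 0.05]
uncertain = [0.40, 0.30, 0.30]
print(select_decoding(confident))
print(select_decoding(uncertain))
```

The design choice in this sketch is that a cheap decoding path handles confident predictions, while a more expensive path is reserved for inputs where the probability mass is spread across stances.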

Results

The framework achieves competitive performance on policy stance analysis tasks, with uncertainty measures providing reliability indicators for predictions.

Compliance-to-Code: Enhancing Financial Compliance Checking via Code Generation

Mon, 19 May 2025 00:00:00 +0000

Abstract

This paper presents Compliance-to-Code, a large-scale Chinese dataset for financial regulatory compliance, containing 1,159 annotated clauses from 361 regulations across ten categories. Each clause is structured with four logical elements: subject, condition, constraint, and contextual information. The dataset includes deterministic Python code mappings and detailed reasoning to facilitate automated compliance checking.

Dataset Overview

Scale: 1,159 annotated regulatory clauses
Coverage: 361 regulations across ten financial categories
Structure: Modular compliance units with logical elements
Code Mappings: Python implementations for automated checking
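As a rough illustration of how a clause with the four logical elements might map to deterministic Python, consider the sketch below. The class name, field names, result labels, and the example rule are hypothetical and are not taken from the dataset; they only show the general shape of a modular compliance unit.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Facts = Dict[str, Any]


@dataclass
class ComplianceUnit:
    """One clause split into the four logical elements plus code mappings."""
    subject: str       # who the clause applies to
    condition: str     # when the clause is triggered
    constraint: str    # what must hold once it is triggered
    context: str       # contextual information (definitions, scope)
    applies: Callable[[Facts], bool]     # code mapping for the condition
    satisfied: Callable[[Facts], bool]   # code mapping for the constraint

    def check(self, facts: Facts) -> str:
        """Deterministically classify a set of facts against this clause."""
        if not self.applies(facts):
            return "not_applicable"
        return "compliant" if self.satisfied(facts) else "non_compliant"


# Hypothetical rule, invented for illustration only.
unit = ComplianceUnit(
    subject="shareholder",
    condition="holding ratio reaches 5%",
    constraint="disclose within 3 days",
    context="days are counted from the date the threshold is reached",
    applies=lambda f: f["holding_ratio"] >= 0.05,
    satisfied=lambda f: f["days_to_disclosure"] <= 3,
)

print(unit.check({"holding_ratio": 0.06, "days_to_disclosure": 2}))
print(unit.check({"holding_ratio": 0.06, "days_to_disclosure": 5}))
print(unit.check({"holding_ratio": 0.01, "days_to_disclosure": 0}))
```

Keeping the condition and the constraint as separate callables means a clause that does not apply is reported differently from one that is violated, which is one plausible reason to structure clauses as modular units.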

FinCheck Pipeline

The paper introduces FinCheck, a pipeline system for automated compliance checking that processes natural language regulations and generates executable compliance code.