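ICLR Flawed-Proof Question Bank · Data Report

Starting from 54,851 ICLR papers (2016–2026), we use peer-review signals to mine "flawed proofs acknowledged by the authors". The candidates are refined to 96 hard-flaw items, of which 49 questions are finalized for the bank.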

54,851	Total ICLR papers
9,363	Substantive theory papers
255	Papers whose authors acknowledged an error
49	Final question bank (finalized)

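1. Screening funnel

Stage	Papers	Method	Retained from previous stage
All ICLR papers (2016–2026)	54,851	11 yearly files, including full peer reviews	n/a
Regex prefilter: contains theory signals	19,218	Keyword/regex scoring; the STRONG/MEDIUM/WEAK tiers are kept	35.0%
Topic classification: contains substantive theory	9,363	gpt-5.4-nano labels pure_theory / theory_core	48.7%
First review pass: reviewers question proof correctness	5,868	explicit_error 1,421 + soft_concern 4,447	62.7%
Second review pass: authors acknowledged an error	255	Reads the rebuttal thread; the authors concede the error	4.3%
Eligible: error confirmed or recoverable	131	confirmed 58 + recoverable 73	51.4%
Hard flaw: errors that require reasoning to find	96	Codex triages each item, removing 35 obvious typos	73.3%
Final question bank: finalized	49	Removes remaining at-a-glance errors and noise; strict re-review before inclusion	51.0%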

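Branch notes: the 5,868 papers from the first review pass are the union of the explicit_error and soft_concern labels, i.e. every paper where a reviewer questioned correctness. The second pass keeps only those where the authors conceded, which narrows the set to 255. The 131 eligible papers are the 58 where the error is confirmed plus the 73 where it is fully recoverable from the rebuttal thread. Codex then removes 35 "obvious typo" items one by one, leaving 96 hard flaws. A final pass removes at-a-glance errors and noise and applies a strict re-review, leaving 49 finalized questions.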
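The retention percentages in the funnel can be re-derived from the stage counts alone. The following is a minimal sketch, assuming only the counts reported above; the names `FUNNEL`, `retention_rates`, and `check_branches` are illustrative and not part of the original pipeline.

```python
# Stage counts copied from the screening funnel; the stage names are shortened.
FUNNEL = [
    ("All ICLR papers (2016-2026)", 54_851),
    ("Regex prefilter", 19_218),
    ("Topic classification", 9_363),
    ("First review pass", 5_868),
    ("Second review pass", 255),
    ("Eligible", 131),
    ("Hard flaw", 96),
    ("Final question bank", 49),
]


def retention_rates(stages):
    """Return (name, count, percent kept relative to the previous stage)."""
    rows = []
    for (_, prev), (name, count) in zip(stages, stages[1:]):
        rows.append((name, count, round(100 * count / prev, 1)))
    return rows


def check_branches():
    """Check that the sub-counts quoted in the report add up to the stage totals."""
    assert 1_421 + 4_447 == 5_868  # explicit_error + soft_concern
    assert 58 + 73 == 131          # confirmed + recoverable
    assert 131 - 35 == 96          # eligible minus obvious typos
    return True


if __name__ == "__main__":
    check_branches()
    for name, count, pct in retention_rates(FUNNEL):
        print(f"{name:<22}{count:>8,}  {pct:5.1f}%")
```

Each percentage is measured against the immediately preceding stage, not against the full corpus, which is why the last row can show about half retained while the bank is a tiny fraction of all papers.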

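2. Analysis of the final question bank (N=49)

Error type distribution

Error type	Count	Share
Invalid bounding / inequality step	15	30.6%
Missing / too-weak assumption	14	28.6%
Circular argument / unproven claim	8	16.3%
Wrong constant / coefficient / rate	7	14.3%
Ill-posed / pathological definition	2	4.1%
Quantifier / scope error	1	2.0%
Non-interchangeable order of operations	1	2.0%
Missed boundary / degenerate case	1	2.0%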

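Most common: invalid bounding, missing assumptions, circular arguments, and wrong rates. These are structural defects that can only be found by understanding the proof, not slips of the pen. After refinement, invalid bounding and missing assumptions together account for nearly 60% of the bank (29 of 49).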
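The shares quoted for the error types follow directly from the counts. As a small sketch (the names `ERROR_TYPES`, `shares`, and `top_k_share` are illustrative; the counts are the ones listed above):

```python
# Error-type counts from the final question bank (N=49).
ERROR_TYPES = {
    "Invalid bounding / inequality step": 15,
    "Missing / too-weak assumption": 14,
    "Circular argument / unproven claim": 8,
    "Wrong constant / coefficient / rate": 7,
    "Ill-posed / pathological definition": 2,
    "Quantifier / scope error": 1,
    "Non-interchangeable order of operations": 1,
    "Missed boundary / degenerate case": 1,
}


def shares(counts):
    """Percentage share of each category, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


def top_k_share(counts, k):
    """Combined percentage share of the k largest categories."""
    total = sum(counts.values())
    largest = sorted(counts.values(), reverse=True)[:k]
    return round(100 * sum(largest) / total, 1)


if __name__ == "__main__":
    for name, pct in shares(ERROR_TYPES).items():
        print(f"{name:<42}{pct:5.1f}%")
    print("Top two categories combined:", top_k_share(ERROR_TYPES, 2), "%")
```

The same two helpers apply unchanged to the topic and year distributions, since each of those also sums to 49.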

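Topic distribution (6 categories)

Topic	Count	Share
Optimization theory	19	38.8%
Reinforcement learning theory	11	22.4%
Generative models / sampling	6	12.2%
Probability / concentration inequalities	6	12.2%
Learning theory / generalization	5	10.2%
Linear algebra / spectral	2	4.1%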

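Optimization theory and reinforcement learning theory have the largest shares (30 of 49 combined), consistent with the overall makeup of ICLR theory papers.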

Year distribution

Year	Count	Share
2017	1	2.0%
2018	4	8.2%
2019	1	2.0%
2020	5	10.2%
2021	5	10.2%
2022	7	14.3%
2023	6	12.2%
2024	14	28.6%
2025	6	12.2%

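Source & impact

Source	Count	Share
Error confirmed (PDF still contains the error)	22	44.9%
Fixed but recoverable (reconstructed from the thread)	27	55.1%

Affects the main result? (Codex review)	Count	Share
Affects the main result	42	85.7%
Affects only an auxiliary lemma	7	14.3%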

Mean Codex review confidence is 0.90 (0.84–0.98); more than 80% of the flaws (42 of 49) directly affect the paper's main result.

3. High-confidence examples (affects_main_result = yes)

Year	Paper	Topic	Error type	Author acknowledgment (excerpt)
2022	A Novel Convergence Analysis for the Stochastic Proximal Point Alg…	Optimization theory	Circular argument / unproven claim	No, Lemma 1 can't be fixed. The result simply does not hold…
2024	Reward Adaptation Via Q-Manipulation	Reinforcement learning theory	Wrong constant / coefficient / rate	“we thank the reviewers for pointing out the error in Lemma 5.”…
2024	Farzi Data: Autoregressive Data Distillation	Generative models / sampling	Missing / too-weak assumption	Thank you for catching our honest mistake at the end of Theorem 3.1…
2018	From Information Bottleneck To Activation Norm Penalty	Optimization theory	Circular argument / unproven claim	As you rightly pointed out, the lack of treatment for the second term (log-determinant) is unsatisfying…
2020	Policy Optimization with Stochastic Mirror Descent	Reinforcement learning theory	Circular argument / unproven claim	The reviewer's comment of Eq.(40) is correct…
2020	An Information Theoretic Perspective on Disentangled Representatio…	Probability / concentration inequalities	Missing / too-weak assumption	As you point out, now we realize that the factorized noise is sufficient for conditional independence…
2018	Residual Loss Prediction: Reinforcement Learning With No Incremen…	Reinforcement learning theory	Missing / too-weak assumption	indeed, you're right, there's a missing term. In going from Eq 7 to Eq 8…
2018	Three factors influencing minima in SGD	Optimization theory	Missing / too-weak assumption	We agree there is a mathematical mistake in allowing sigma to vary with theta…
2023	Shuffle Gaussian Mechanism for Differential Privacy	Probability / concentration inequalities	Quantifier / scope error	Thanks for the insightful comment! We have indeed overlooked the fact that normalization and shuffling…
2023	Improving Out-of-distribution Generalization with Indirection Repr…	Linear algebra / spectral	Invalid bounding / inequality step	In the revision, we provided the definition for the ‖·‖∞ norm in Definition A.3…
2024	Learning multi-modal generative models with permutation-invariant…	Generative models / sampling	Missing / too-weak assumption	We agree that Proposition 1 relies on a strong assumption about the encoding distribution…
2024	Are Transformers with One Layer Self-Attention Using Low-Rank Weig…	Learning theory / generalization	Invalid bounding / inequality step	As for the bugs in equation (10), you are absolutely right…

Data pipeline: regex prefilter → gpt-5.4-nano topic classification → two review passes (nano) → gpt-5.4-mini reconstruction → Codex difficulty triage · total cost ≈ $12.9
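The first pipeline stage scores papers with keywords/regexes and keeps three tiers. The report does not list the actual patterns, weights, or thresholds, so the sketch below is purely illustrative: every pattern, weight, and cutoff in it is a hypothetical stand-in, and only the tier names STRONG/MEDIUM/WEAK come from the report.

```python
import re

# Hypothetical (pattern, weight) pairs; the real prefilter's rules are not given.
PATTERNS = [
    (re.compile(r"\b(?:theorem|lemma|proposition|corollary)\b", re.I), 3),
    (re.compile(r"\b(?:proof|we prove|convergence rate)\b", re.I), 2),
    (re.compile(r"\b(?:bound|assumption|inequality)\b", re.I), 1),
]

# Hypothetical score cutoffs, checked from the strongest tier down.
THRESHOLDS = [("STRONG", 6), ("MEDIUM", 3), ("WEAK", 1)]


def theory_score(text):
    """Sum weight * number of matches over all patterns."""
    return sum(weight * len(pattern.findall(text)) for pattern, weight in PATTERNS)


def theory_tier(text):
    """Return 'STRONG', 'MEDIUM', 'WEAK', or None if the paper is filtered out."""
    score = theory_score(text)
    for name, cutoff in THRESHOLDS:
        if score >= cutoff:
            return name
    return None


if __name__ == "__main__":
    samples = [
        "We prove Theorem 1 and Lemma 2.",
        "See Lemma 3.",
        "We tighten the bound.",
        "A new image dataset.",
    ]
    for text in samples:
        print(f"{theory_tier(text)!s:<8}{text}")
```

A cheap scorer like this only has to reduce the volume sent to the model-based stages; in the report, correctness judgments come from the later review passes, not from the regex tier.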