DHBW theorem prover scores at World Championship for automated reasoning systems

29.09.2022

Author Prof. Dr. Stephan Schulz:

“Who killed Aunt Agatha?” – Popular board games played by millions of fans are an application of logical puzzles. There are a number of possible answers, and using some hints, humans can typically solve this questions by reasoning. However, not all reasoning problems are that simple. Some are very large and need computing power for considering all available information. Reasoning is one of the lesser known fields of artificial intelligence, letting computers and algorithms do the reasoning work that in the past has been done by humans.

Theorem provers: start with a real world problem, formalize it, and then let the computer find a proof, or a countermodel.

In computer science, so-called theorem provers can be applied to logical puzzles like the aforementioned, but also for proving mathematical theorems, answering common-sense questions, demonstrating biological relationships, or showing properties like zero-defect software. The idea is to start with a real world problem, formalize it, and then let the computer find a proof, or a countermodel.

Poster of the world championship
Each year, the world’s best scientists in the field of automated reasoning meet at the CADE ATP System Competition (CASC). It is associated with the Conference on Automated Deduction and regarded as the "World Championship" for automated reasoning systems. This year, it was held at the Federated Logic Conference, an event that co-located many conferences concerned with logic, formal methods, and automated reasoning, at the Technion in Haifa, Israel.

The 2022 edition of CASC saw 18 competitors working on problems from 7 different so-called “divisions". In each division, systems try to solve as many logical problems as possible within a given time limit, usually by proving a conjecture from a given set of axioms. One of the Divisions, FNT, instead challenges systems to demonstrate that the conjecture is false (e.g. by providing a counterexample, or, in more technical language, a counter-model).

The theorem prover E, primarily developed by DHBW Professor and ZfKI member Stephan Schulz, competed in six divisions. It achieved a first place in the SLH division (proving "real user problems" within a 30s time limit) and came in second in three more divisions: proving theorems in higher-order logic, proving theorems in first-order logic (often considered the premiere category with the most mature theorem provers), and proving purely equational problems. In the LTB division (proving a large number of problems from one application domain), it came in third. This continues the successes of the last 3 years, where E (or it’s higher-order variant) also won first places and good placements in various divisions.

Details on E

The provers and the competition has been the subject of a number of publications. The foundations of E can be found in Schulz:AICOM-2002
Recent developments are described in
SCV:CADE-2019 and VBCS:STTT-2021 A publication on the latest support for full higher-order logic is forthcoming.
The setup of the competition and the competitors are described in Sutcliffe:CASC-2022.

E as basis for other tools

In the higher-order logic division of the contest, six of the eight provers rely on E in the back-end. This includes the winner Zipperposition, an experimental system developed in the Matryoshka project, hosted at the Vrije Universiteit Amsterdam in cooperation with Inria Nancy in France, the Université de Liège in Belgium, the MPI Saarbrücken, and DHBW Stuttgart from Germany. The successful performance of both Zipperposition and E was largely due to the work of Petar Vukmirović, a PhD candidate jointly supervised by Jasmin Blanchette and Stephan Schulz.

Figure 4 shows the performance graphs (CPU time needed over number of problems solved within that limit) for the provers in the THF division. Typically, many problems are easy, but each system has a point at which the time for further solutions tends sharply upwards. Zipperposition and E solve most problems.

Schulz, Stephan (2002): E ‐ A Brainiac Theorem Prover. In: Journal of AI Communications 15 (2/3), S. 111–126.
Schulz, Stephan; Cruanes, Simon; Vukmirović, Petar: Faster, Higher, Stronger: E 2.3. In: (LNAI), S. 495–507.
Sutcliffe, Geoff (2022): Proceedings of the 11th IJCAR ATP System Competition. Haifa, Israel. Online verfügbar unter http://www.tptp.org/CASC/J11/Proceedings.pdf.
Vukmirović, Petar; Blanchette, Jasmin Christian; Cruanes, Simon; Schulz, Stephan (2021): Extending a Brainiac Prover to Lambda-free Higher-Order Logic. In: International Journal on Software Tools for Technology Transfer. DOI: 10.1007/s10009-021-00639-7.

Blog von Prof. Dr. Stephan Schulz

Prof. Dr. rer. nat. Stephan Schulz / Tel.: 0711/1849-4519 / E-Mail: stephan.schulz@dhbw-stuttgart.de

DHBW theorem prover scores at World Championship for automated reasoning systems

Details on E

E as basis for other tools

Blog von Prof. Dr. Stephan Schulz

Quicklinks

Informationen für

Portale

Kontaktinfo