Advanced Text Understanding for Budget Auditing Using Domain-Specific Languages
This bachelor thesis investigates the development of a domain-specific language (DSL) for automating budget compliance verification at the Schleswig-Holstein State Audit Office (Landesrechnungshof). A working prototype already uses YAML-based rule patterns and Prolog to check budget overruns against complex legal regulations. Building on it, the thesis aims to improve the system's ability to interpret complex legal texts and budget annotations. The work focuses on how a specialized DSL can formalize budget regulations more precisely than the current pattern-matching approach while being easier to maintain.
Problem Statement
The State Audit Office (Landesrechnungshof) annually examines budget overruns and verifies whether they are properly covered according to legal regulations. A prototype system has demonstrated that rule-based approaches can partially automate this verification process, but several challenges remain:
- Coverage annotations are textually complex and contain numerous special cases and conditional rules that are difficult to capture with simple pattern matching
- YAML-based rule patterns lack the expressiveness needed for more sophisticated language understanding
- The current approach struggles with context-dependent information across different sections of budget documentation
- Budget plans from different departments often use heterogeneous annotation styles, requiring significant manual adaptation of rules
A specialized domain-specific language tailored to budget regulations could increase expressiveness while enabling better maintainability.
Goals
- Design a domain-specific language for describing coverage annotations and rules
- Implement a parser for this language using appropriate tools (e.g., ANTLR, Python parsing libraries)
- Develop a translation layer to convert DSL expressions into Prolog code
- Evaluate expressiveness and maintainability compared to the existing system
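To make the parser and translation goals more concrete, the following is a minimal sketch of what such a pipeline could look like. Everything in it is an assumption made for illustration: the rule syntax (`cover ... by ... up to ...%`), the dotted budget title numbers, the `CoverageRule` class, and the Prolog predicate `coverable_by/3` are hypothetical and are not taken from the existing prototype or from actual budget regulations. A real implementation would likely use a grammar-based parser rather than a single regular expression.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical DSL syntax (illustration only, not the prototype's format):
#   cover <title> by <title>[, <title>...] [up to <percent>%]
RULE_RE = re.compile(
    r"^cover\s+(?P<target>[\w.]+)\s+by\s+"
    r"(?P<sources>[\w.]+(?:\s*,\s*[\w.]+)*)"
    r"(?:\s+up\s+to\s+(?P<limit>\d+)%)?$"
)


@dataclass(frozen=True)
class CoverageRule:
    """Parsed form of one hypothetical coverage rule."""
    target: str                    # budget title that may overrun
    sources: Tuple[str, ...]       # titles whose savings may cover it
    limit_percent: Optional[int]   # optional cap, None if unrestricted


def parse_rule(line: str) -> CoverageRule:
    """Parse a single DSL line into a CoverageRule or raise ValueError."""
    match = RULE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid coverage rule: {line!r}")
    sources = tuple(s.strip() for s in match.group("sources").split(","))
    limit = match.group("limit")
    return CoverageRule(
        target=match.group("target"),
        sources=sources,
        limit_percent=int(limit) if limit else None,
    )


def to_prolog(rule: CoverageRule) -> List[str]:
    """Translate a rule into Prolog facts, one per covering title.

    The predicate name coverable_by/3 is an assumption for this sketch.
    """
    limit = "none" if rule.limit_percent is None else str(rule.limit_percent)
    return [
        f"coverable_by('{rule.target}', '{src}', {limit})."
        for src in rule.sources
    ]


if __name__ == "__main__":
    example = parse_rule("cover 0401.511 by 0401.512, 0401.513 up to 20%")
    for fact in to_prolog(example):
        print(fact)
```

Separating the parsed representation (`CoverageRule`) from the Prolog output keeps the translation layer independent of the concrete syntax, so the grammar could later be replaced by an ANTLR-generated parser without touching the code generation step.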
Requirements
- Solid programming skills in Python
- Basic understanding of formal languages, grammars, and parsing techniques
- Familiarity with logic programming concepts, ideally some experience with Prolog
- Interest in natural language processing and rule-based systems
- Comfort working with German-language documentation (all budget materials are in German)