Advanced Text Understanding for Budget Auditing Using Domain-Specific Languages

Status: Available

This bachelor thesis investigates the development of a domain-specific language (DSL) for automating budget compliance verification at the Schleswig-Holstein State Audit Office (Landesrechnungshof). We have already built a successful prototype that uses YAML-based rule patterns and Prolog to validate budget overruns against complex legal regulations. The thesis aims to improve the system's ability to interpret legal texts and budget annotations. It focuses on how a specialized DSL can formalize budget regulations more precisely than the current pattern-matching approach while also being easier to maintain.

Problem Statement

The State Audit Office (Landesrechnungshof) annually examines budget overruns and verifies whether they are properly covered according to legal regulations. A prototype system has demonstrated that rule-based approaches can partially automate this verification process, but several challenges remain:

  1. Coverage annotations are textually complex and contain numerous special cases and conditional rules that are difficult to capture with simple pattern matching
  2. YAML-based rule patterns lack the expressiveness needed for more sophisticated language understanding
  3. The current approach struggles with context-dependent information across different sections of budget documentation
  4. Budget plans from different departments often use heterogeneous annotation styles, requiring significant manual adaptation of rules

A specialized domain-specific language tailored to budget regulations could increase expressiveness while enabling better maintainability.

Goals

  • Design a domain-specific language for describing coverage annotations and rules
  • Implement a parser for this language using appropriate tools (e.g., ANTLR, Python parsing libraries)
  • Develop a translation layer that converts DSL expressions into Prolog code
  • Evaluate expressiveness and maintainability compared to the existing system
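
To make the parse-and-translate pipeline in these goals more concrete, the following Python sketch shows one possible shape of it. Everything specific here is an assumption made for illustration and is not taken from the existing prototype: the one-line statement form ("<title> covers <title>" and "<title> mutual <title>"), the title number format, the CoverageRule class, and the Prolog predicate covers/2. A real implementation would likely replace the regular expression with a grammar-based parser such as one generated by ANTLR.

```python
import re
from dataclasses import dataclass

# Hypothetical statement form, invented for this sketch:
#   "<title> covers <title>"  one-way coverage
#   "<title> mutual <title>"  coverage in both directions
# The keywords and the "NNN.NN" title format are assumptions.
STATEMENT = re.compile(
    r"^(?P<left>\d{3}\.\d{2})\s+(?P<kind>covers|mutual)\s+(?P<right>\d{3}\.\d{2})$"
)


@dataclass(frozen=True)
class CoverageRule:
    """One parsed DSL statement (illustrative data model)."""
    source: str
    target: str
    mutual: bool


def parse(text: str) -> list[CoverageRule]:
    """Parse DSL text line by line; '#' starts a comment."""
    rules = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # skip blank and comment-only lines
        match = STATEMENT.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: cannot parse {raw!r}")
        rules.append(
            CoverageRule(match["left"], match["right"], match["kind"] == "mutual")
        )
    return rules


def to_prolog(rules: list[CoverageRule]) -> str:
    """Translate rules into Prolog facts for the assumed predicate covers/2."""
    facts = []
    for rule in rules:
        facts.append(f"covers('{rule.source}', '{rule.target}').")
        if rule.mutual:
            # A mutual rule expands into a second fact for the reverse direction.
            facts.append(f"covers('{rule.target}', '{rule.source}').")
    return "\n".join(facts)


if __name__ == "__main__":
    example = """
    526.01 covers 526.02   # one-way
    527.01 mutual 527.02   # both directions
    """
    print(to_prolog(parse(example)))
```

Keeping the parsed rules in an intermediate data structure, rather than emitting Prolog directly from the parser, is one way to keep the DSL syntax and the Prolog encoding independently changeable, which bears on the maintainability goal.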

Requirements

  • Solid programming skills in Python
  • Basic understanding of formal languages, grammars, and parsing techniques
  • Familiarity with logic programming concepts, ideally some experience with Prolog
  • Interest in natural language processing and rule-based systems
  • Comfort working with German-language documentation (all budget materials are in German)