Declarative static analysis for multilingual programs using CodeQL
Abstract
Declarative static program analysis has become one of the most widely used program analysis techniques. Declarative static analyzers perform three steps: creating databases of facts from program source code, evaluating rules to derive new facts, and running queries over the facts, through a query system, to extract information related to specific properties. Declarative static analyzers can easily target diverse programming languages by modifying only the databases and rules for each new language. Because query systems are independent of programming languages, they can be reused for new languages as-is. However, even when declarative analyzers support multiple programming languages, they do not currently support the analysis of multilingual programs written in two or more programming languages. We propose a systematic methodology that extends a declarative static analyzer supporting multiple languages so that it also supports multilingual programs. The main idea is to reuse the analyzer's existing components. Our approach first generates a merged database of facts consisting of multiple logical language spaces. This allows the existing language-specific rules to derive new facts for each language from the facts in that language's space. Then, it defines language-interoperation rules that capture the semantics of language interoperation. Finally, it uses the same query system to obtain analysis results that take this interoperation semantics into account. We develop a proof-of-concept declarative static analyzer for multilingual programs by extending CodeQL; the resulting analyzer can track dataflows across language boundaries. Our evaluation shows that the analyzer successfully tracks dataflows across Java-C and Python-C language boundaries and detects genuine interoperation bugs in real-world multilingual programs.
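The pipeline summarized above (a merged fact database split into language spaces, reused language-specific rules, an added interoperation rule, and one shared query) can be illustrated with a toy Python sketch. This is not CodeQL and not the authors' implementation: the relation names (`local_flow`, `native_call`, `function_param`, `source`, `sink`), the node names, and the single JNI rule are hypothetical and chosen only for illustration. The rule relies on two JNI conventions: a native method `Class.method` is implemented by a C function named `Java_Class_method` (package and special-character mangling is ignored here), and that C function receives `JNIEnv*` and `jobject`/`jclass` before the Java arguments.

```python
from collections import defaultdict, deque

# Hypothetical merged fact database. Each top-level key is a logical
# language space; relation and node names are illustrative only.
FACTS = {
    "java": {
        # local_flow(from_node, to_node): intraprocedural flow in Java
        "local_flow": {("readInput", "data"), ("data", "arg0")},
        # native_call(arg_node, java_arg_index, class_name, method_name)
        "native_call": {("arg0", 0, "Native", "process")},
        "source": {"readInput"},
    },
    "c": {
        # function_param(function_name, c_param_index, param_node)
        "function_param": {("Java_Native_process", 2, "buf")},
        "local_flow": {("buf", "cmd")},
        "sink": {"cmd"},
    },
}


def local_flow_closure(space):
    """Language-specific rule, evaluated inside one language space:
    flow(a, b) :- local_flow(a, b).
    flow(a, d) :- flow(a, b), local_flow(b, d)."""
    edges = set(space.get("local_flow", set()))
    flow = set(edges)
    while True:  # naive fixpoint iteration
        new = {(a, d) for (a, b) in flow for (c, d) in edges if b == c} - flow
        if not new:
            return flow
        flow |= new


def jni_interop_edges(db):
    """Hypothetical interoperation rule for Java-to-C calls via JNI.
    Java argument i of Class.method flows into parameter i + 2 of the
    C function Java_Class_method (params 0 and 1 are JNIEnv* and
    jobject/jclass). Package/character name mangling is ignored."""
    params = {(fn, idx): node for (fn, idx, node) in db["c"].get("function_param", set())}
    edges = set()
    for arg, i, cls, method in db["java"].get("native_call", set()):
        target = params.get((f"Java_{cls}_{method}", i + 2))
        if target is not None:
            edges.add((("java", arg), ("c", target)))
    return edges


def tainted_sinks(db):
    """Query: sinks reachable from sources over the derived per-language
    flow facts plus cross-language edges. Nodes become (language, name)
    pairs so that names from different spaces never collide."""
    graph = defaultdict(set)
    for lang, space in db.items():
        for a, b in local_flow_closure(space):
            graph[(lang, a)].add((lang, b))
    for a, b in jni_interop_edges(db):
        graph[a].add(b)
    sources = [(lang, s) for lang, sp in db.items() for s in sp.get("source", set())]
    sinks = {(lang, s) for lang, sp in db.items() for s in sp.get("sink", set())}
    seen, queue = set(sources), deque(sources)
    while queue:  # breadth-first reachability from all sources
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen & sinks


if __name__ == "__main__":
    print(tainted_sinks(FACTS))  # {('c', 'cmd')}
```

Keeping the language spaces separate lets the same `local_flow_closure` rule run unchanged on both the Java and the C facts; only `jni_interop_edges` is new. Without it, the query finds no path from the Java source to the C sink, which is the gap the abstract describes for analyzers that support several languages but not their interaction.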