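One advantage of generating a parser automatically is that certain properties of it can be checked. Once people are allowed to modify things by hand, including through the semantic actions that parser generators commonly offer, those properties can no longer be guaranteed (it is like type-checking the arguments and then casting all over the place inside the function). So I would rather aim for a parser generator that turns out well without any customization.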

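Of course, that raises the question of what "turns out well" even means. The author of arxiv.org/abs/2209.08383, which I mentioned earlier, wrote a tool, github.com/jzimmerman/langcc, that takes a grammar definition and automatically generates both the AST data structures and the tree builder (or perhaps I should say that the rough intended shape of the AST is itself part of the grammar definition), and that has caught my interest. I don't understand its underlying principles yet, though.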
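
To make "the shape of the AST is part of the grammar definition" concrete, here is a minimal sketch in Python. It is not langcc's algorithm or its grammar syntax: the `GRAMMAR` format, the node names, and the `parse` helper are all hypothetical, and the parser is a naive backtracking interpreter with ordered choice rather than a generated LR parser. The point is only that the node classes and the tree builder both fall out of the grammar, with no hand-written semantic actions.

```python
import re
from dataclasses import make_dataclass

# Hypothetical grammar format (an assumption for this sketch):
# nonterminal -> list of alternatives, each (AST node name, fields),
# where fields are (field_name or None, symbol). None means "match but drop".
TOKENS = {"NUM": r"\d+", "PLUS": r"\+", "LP": r"\(", "RP": r"\)"}
GRAMMAR = {
    "Expr": [("Add", [("left", "Atom"), (None, "PLUS"), ("right", "Expr")]),
             ("Single", [("value", "Atom")])],
    "Atom": [("Num", [("text", "NUM")]),
             ("Paren", [(None, "LP"), ("inner", "Expr"), (None, "RP")])],
}

# AST data structures derived from the grammar: one dataclass per alternative.
NODES = {name: make_dataclass(name, [f for f, _ in fields if f])
         for alts in GRAMMAR.values() for name, fields in alts}

def tokenize(src):
    # Characters that match no token pattern (e.g. spaces) are silently skipped.
    pattern = "|".join(f"(?P<{k}>{v})" for k, v in TOKENS.items())
    return [(m.lastgroup, m.group()) for m in re.finditer(pattern, src)]

def parse(symbol, toks, pos):
    """Return (value, next_pos) on success, or None on failure."""
    if symbol in TOKENS:  # terminal: compare against the current token kind
        if pos < len(toks) and toks[pos][0] == symbol:
            return toks[pos][1], pos + 1
        return None
    for node_name, fields in GRAMMAR[symbol]:  # try alternatives in order
        values, cur = {}, pos
        for field, sub in fields:
            result = parse(sub, toks, cur)
            if result is None:
                break  # this alternative failed; backtrack to the next one
            value, cur = result
            if field:
                values[field] = value
        else:
            # Tree builder: the node is constructed purely from the grammar.
            return NODES[node_name](**values), cur
    return None

tree, end = parse("Expr", tokenize("1 + (2 + 3)"), 0)
print(tree)
```

Because nothing between the grammar and the tree is written by hand, any property checked on `GRAMMAR` also holds for what the builder produces, which is the guarantee that hand-edited semantic actions would lose.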

arXiv.org: Practical LR Parser Generation

Parsing is a fundamental building block in modern compilers, and for industrial programming languages, it is a surprisingly involved task. There are known approaches to generate parsers automatically, but the prevailing consensus is that automatic parser generation is not practical for real programming languages: LR/LALR parsers are considered to be far too restrictive in the grammars they support, and LR parsers are often considered too inefficient in practice. As a result, virtually all modern languages use recursive-descent parsers written by hand, a lengthy and error-prone process that dramatically increases the barrier to new programming language development. In this work we demonstrate that, contrary to the prevailing consensus, we can have the best of both worlds: for a very general, practical class of grammars -- a strict superset of Knuth's canonical LR -- we can generate parsers automatically, and the resulting parser code, as well as the generation procedure itself, is highly efficient. This advance relies on several new ideas, including novel automata optimization procedures; a new grammar transformation ("CPS"); per-symbol attributes; recursive-descent actions; and an extension of canonical LR parsing, which we refer to as XLR, which endows shift/reduce parsers with the power of bounded nondeterministic choice. With these ingredients, we can automatically generate efficient parsers for virtually all programming languages that are intuitively easy to parse -- a claim we support experimentally, by implementing the new algorithms in a new software tool called langcc, and running them on syntax specifications for Golang 1.17.8 and Python 3.9.12. The tool handles both languages automatically, and the generated code, when run on standard codebases, is 1.2x faster than the corresponding hand-written parser for Golang, and 4.3x faster than the CPython parser, respectively.