mirror of
https://github.com/apple/pkl.git
synced 2026-01-13 15:13:38 +01:00
[PR #1251] [MERGED] Fix Lexer EOF sentinel collision with valid Unicode code points #968
📋 Pull Request Information
Original PR: https://github.com/apple/pkl/pull/1251
Author: @spyoungtech
Created: 10/22/2025
Status: ✅ Merged
Merged: 10/26/2025
Merged by: @bioball
Base: main ← Head: fix-lexer-unicode-collision
📝 Commits (5)
30456b1 use out-of-band checks for EOF detection
06ea6a6 add tests for EOF sentinel handling
a645496 apply spotless fixes
ab9895a remove use of EOF const now that atEOF is used
424e4b9 Switch to using -1 for EOF
📊 Changes
2 files changed (+79 additions, -14 deletions)
📝 pkl-parser/src/main/java/org/pkl/parser/Lexer.java (+13 -14)
📝 pkl-parser/src/test/kotlin/org/pkl/parser/LexerTest.kt (+66 -0)
📄 Description
Summary
A change introduced in 0.28.0 reimplemented the Lexer so that EOF detection relies on an in-band sentinel value, Short.MAX_VALUE (0x7FFF), which collides with the valid Unicode code point U+7FFF. When this character appears in source code, the lexer terminates early and treats all subsequent content as if it did not exist.
Although this is probably rare in typical use, the issue presents an EOF injection vulnerability, very similar to a null byte injection vulnerability. This vulnerability enables steganographic configuration manipulation with the following characteristics:
This affects (at least) all versions since 0.28.0: [0.28.0, 0.28.1, 0.28.2, 0.29.0, 0.29.1]
This fix removes reliance on lookahead/sentinel comparisons to prevent early parser termination.
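The collision described in this summary can be reproduced in isolation. The following is a minimal sketch, not the code from Lexer.java: the class `SentinelCollisionDemo` and its methods are hypothetical names chosen for illustration, and the only detail taken from the description is the sentinel value Short.MAX_VALUE (0x7FFF).

```java
// Hypothetical illustration only; this is not the code from Lexer.java.
final class SentinelCollisionDemo {
    // In-band sentinel: a char value reserved to mean "end of input".
    // 0x7FFF is also the valid code point U+7FFF, which is the root of the problem.
    static final char EOF = (char) Short.MAX_VALUE;

    // Returns the char at index i, or the sentinel once the input is exhausted.
    static char charAt(String source, int i) {
        return i < source.length() ? source.charAt(i) : EOF;
    }

    // Counts chars until the scanner believes it has reached end of input.
    static int countUntilEof(String source) {
        int i = 0;
        while (charAt(source, i) != EOF) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        String plain = "abc";
        String injected = "a" + (char) 0x7FFF + "bc";
        System.out.println(countUntilEof(plain));    // prints 3
        System.out.println(countUntilEof(injected)); // prints 1: "bc" is never seen
    }
}
```

In this sketch the scanner cannot tell a genuine end of input from a U+7FFF character in the text, so everything after that character is dropped without any error.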
Proof of Concept
Consider a base network policy with subsequent hardening:
base.pkl:
production.pkl (legitimate):
production.pkl (compromised):
Result:
The configuration appears syntactically valid but the hardening directive is silently ignored.
Proposed Remediation
The fix implements out-of-band EOF signaling using a dedicated atEOF boolean flag, eliminating the sentinel value collision:
This ensures no valid Unicode codepoint can be misinterpreted as EOF.
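As a rough sketch of what out-of-band signaling can look like, the class below tracks end of input in a separate boolean instead of comparing chars against a reserved value. Only the flag name `atEOF` comes from the description; the class `OutOfBandEofDemo`, its fields, and its methods are assumptions made for illustration and do not reproduce the actual Lexer.java change.

```java
// Hypothetical sketch; names other than atEOF are assumptions, not the real Lexer.java.
final class OutOfBandEofDemo {
    private final char[] source;
    private int cursor = 0;
    // Out-of-band signal: end of input is tracked separately from char values.
    private boolean atEOF;

    OutOfBandEofDemo(String text) {
        this.source = text.toCharArray();
        this.atEOF = source.length == 0;
    }

    boolean isAtEOF() {
        return atEOF;
    }

    // Returns the current char and advances; callers check isAtEOF() first.
    char next() {
        if (atEOF) {
            throw new IllegalStateException("read past end of input");
        }
        char c = source[cursor++];
        atEOF = cursor >= source.length;
        return c;
    }

    // Counts every char in the input, whatever its value.
    static int countChars(String text) {
        OutOfBandEofDemo lexer = new OutOfBandEofDemo(text);
        int n = 0;
        while (!lexer.isAtEOF()) {
            lexer.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String injected = "a" + (char) 0x7FFF + "bc";
        System.out.println(countChars(injected)); // prints 4: U+7FFF is ordinary data
    }
}
```

Because the end-of-input state never shares a value space with the characters themselves, no char in the range 0x0000 to 0xFFFF can trigger it.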
Verification
The existing test suite gains a test that validates that Unicode code points, including the boundary values (U+7FFF, U+FFFF), are processed correctly without premature termination.
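A boundary check of this kind might look like the sketch below. It is not the test added to LexerTest.kt: the class `BoundaryCodePointCheck` is hypothetical, and it exercises `java.io.StringReader.read()`, which returns an `int` and uses -1 for end of stream, as a stand-in for a lexer. The commit list mentions switching to -1 for EOF; whether Lexer.java does so in the same way is an assumption here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical check; StringReader stands in for a lexer's input cursor.
final class BoundaryCodePointCheck {
    // read() returns an int in 0..0xFFFF for data and -1 at end of stream,
    // so no char value can be mistaken for end of input.
    static int countChars(String text) {
        try (StringReader reader = new StringReader(text)) {
            int n = 0;
            while (reader.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        char[] boundaries = {(char) 0x7FFF, (char) 0xFFFF};
        for (char boundary : boundaries) {
            String text = "x" + boundary + "y";
            // prints 3 for each boundary value: nothing after it is dropped
            System.out.println(countChars(text));
        }
    }
}
```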
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.