[PR #1251] [MERGED] Fix Lexer EOF sentinel collision with valid Unicode code points #968

Closed
opened 2025-12-30 01:28:01 +01:00 by adam · 0 comments

📋 Pull Request Information

Original PR: https://github.com/apple/pkl/pull/1251
Author: @spyoungtech
Created: 10/22/2025
Status: Merged
Merged: 10/26/2025
Merged by: @bioball

Base: main ← Head: fix-lexer-unicode-collision


📝 Commits (5)

  • 30456b1 use out-of-band checks for EOF detection
  • 06ea6a6 add tests for EOF sentinel handling
  • a645496 apply spotless fixes
  • ab9895a remove use of EOF const now that atEOF is used
  • 424e4b9 Switch to using -1 for EOF

📊 Changes

2 files changed (+79 additions, -14 deletions)


📝 pkl-parser/src/main/java/org/pkl/parser/Lexer.java (+13 -14)
📝 pkl-parser/src/test/kotlin/org/pkl/parser/LexerTest.kt (+66 -0)

📄 Description

Summary

A change introduced in 0.28.0 (https://github.com/apple/pkl/pull/917) reimplemented the Lexer to detect EOF using an in-band sentinel value, Short.MAX_VALUE (0x7FFF), which collides with the valid Unicode code point U+7FFF, a CJK Unified Ideograph. When this character appears in source code, the lexer terminates prematurely and treats all subsequent content as nonexistent.
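
To illustrate the collision, here is a minimal Java sketch of an in-band sentinel. It is a simplified model, not the actual Lexer.java code; the class SentinelLexer and its methods are hypothetical. Because the sentinel is an ordinary char value, a literal U+7FFF in the input cannot be told apart from end of input.

```java
// Hypothetical, simplified model of an in-band EOF sentinel.
// Not the pkl-parser Lexer; names are illustrative only.
final class SentinelLexer {
  // The sentinel is an ordinary char: (char) 0x7FFF is the code point U+7FFF.
  static final char EOF = (char) Short.MAX_VALUE;

  private final String source;
  private int pos = 0;

  SentinelLexer(String source) {
    this.source = source;
  }

  // Returns the next char, or the EOF sentinel once the input is exhausted.
  char next() {
    return pos < source.length() ? source.charAt(pos++) : EOF;
  }

  // Counts chars until next() reports "EOF". A literal U+7FFF in the input
  // compares equal to the sentinel, so counting stops there.
  int countChars() {
    int count = 0;
    while (next() != EOF) {
      count++;
    }
    return count;
  }

  public static void main(String[] args) {
    System.out.println(new SentinelLexer("abc").countChars());       // 3
    System.out.println(new SentinelLexer("a\u7FFFbc").countChars()); // 1, not 4
  }
}
```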

Although this character is unlikely to appear in typical use, the bug amounts to an EOF injection vulnerability, closely analogous to null byte injection. It enables steganographic configuration manipulation with the following characteristics:

  1. Silent Failure: No parse errors, warnings, or diagnostic output
  2. Code Review Evasion: The attack character is visually innocuous and easily overlooked
  3. Policy Bypass: Security controls defined after the injection point are never evaluated

This affects (at least) all versions since 0.28.0: [0.28.0, 0.28.1, 0.28.2, 0.29.0, 0.29.1]

This fix removes reliance on lookahead/sentinel comparisons to prevent early parser termination.

Proof of Concept

Consider a base network policy with subsequent hardening:

base.pkl:

  allowedPorts = new Listing {
      80
      443
      22
  }

production.pkl (legitimate):

  amends "base.pkl"

  // Production security: remove SSH access
  allowedPorts = new Listing {
      80
      443
  }

production.pkl (compromised):

  amends "base.pkl"

  // Production security: remove port 22; SSH not allowed in China env  翿!
  allowedPorts = new Listing {
      80
      443
  }

Result:

  • Expected: SSH port 22 blocked
  • Actual: SSH port 22 remains open (security bypass)

The configuration appears syntactically valid, but the lexer stops at the U+7FFF character in the comment, so the hardening directive that follows is silently ignored.
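
As a rough model of the effect (not of the lexer itself), the hypothetical Java helper VisiblePrefix.visibleTo below returns the portion of a source string that a lexer treating U+7FFF as EOF would actually process. Applied to the compromised production.pkl, that portion ends inside the comment, before the allowedPorts override.

```java
// Illustrative model only: a lexer that treats U+7FFF as EOF effectively
// sees just the text before the first U+7FFF. Names are hypothetical.
final class VisiblePrefix {
  static String visibleTo(String source) {
    int cut = source.indexOf('\u7FFF');
    return cut < 0 ? source : source.substring(0, cut);
  }

  public static void main(String[] args) {
    // The compromised production.pkl from the proof of concept.
    String compromised = String.join("\n",
        "amends \"base.pkl\"",
        "",
        "// Production security: remove port 22; SSH not allowed in China env \u7FFF!",
        "allowedPorts = new Listing {",
        "    80",
        "    443",
        "}");
    String seen = visibleTo(compromised);
    // The allowedPorts override lies past the cut, so it is never lexed.
    System.out.println(seen.contains("allowedPorts")); // false
  }
}
```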

Proposed Remediation

The fix implements out-of-band EOF signaling using a dedicated atEOF boolean flag, eliminating the sentinel value collision:

  1. EOF state tracked via explicit boolean flag rather than reserved character value
  2. All EOF detection logic migrated to flag-based checks
  3. EOF sentinel changed to Character.MAX_VALUE (0xFFFF) with proper flag handling

This ensures that no valid Unicode code point can be misinterpreted as EOF.
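
The steps above can be sketched in Java as follows. This is a hypothetical illustration, not the merged Lexer.java: FlagLexer and its methods are invented names. The commit list also includes "Switch to using -1 for EOF", so the sketch widens lookahead to int and reports EOF as -1, a value outside the char range that no input character can equal. The merged code may differ in its details.

```java
// Hypothetical sketch of out-of-band EOF handling; not the merged Lexer.java.
final class FlagLexer {
  // -1 lies outside the char range 0x0000..0xFFFF, so it never equals input.
  static final int EOF = -1;

  private final String source;
  private int pos = 0;
  // Out-of-band EOF state: a dedicated flag, never derived from a char value.
  private boolean atEOF;

  FlagLexer(String source) {
    this.source = source;
    this.atEOF = source.isEmpty();
  }

  // Lookahead widened to int; returns EOF only when the flag is set.
  int peek() {
    return atEOF ? EOF : source.charAt(pos);
  }

  // Consumes one char and updates the flag from the position alone.
  char next() {
    if (atEOF) {
      throw new IllegalStateException("read past end of input");
    }
    char c = source.charAt(pos++);
    atEOF = pos >= source.length();
    return c;
  }

  int countChars() {
    int count = 0;
    while (!atEOF) {
      next();
      count++;
    }
    return count;
  }

  public static void main(String[] args) {
    System.out.println(new FlagLexer("a\u7FFFbc").countChars()); // 4
    System.out.println(new FlagLexer("\uFFFF").peek());          // 65535
  }
}
```

Keeping EOF state in a flag means that even a lookahead helper returning a char-typed value stays safe, because no comparison against a character value ever decides when input has ended.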

Verification

Verification is added to the existing test suite in the form of a test that checks that Unicode code points, including the boundary values U+7FFF and U+FFFF, are processed correctly without premature termination.
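
In the same spirit, a hypothetical exhaustive check can scan "a" + c + "b" for every UTF-16 char value c and collect the values whose scan stops early. The real tests are Kotlin and target org.pkl.parser.Lexer; the class EofCollisionCheck and both scanner methods here are illustrative stand-ins for the old in-band and the new out-of-band EOF checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical exhaustive EOF-collision check; names are illustrative only.
final class EofCollisionCheck {
  // Old-style in-band sentinel: U+7FFF doubles as "end of input".
  static final char SENTINEL = (char) Short.MAX_VALUE;

  // Counts chars, stopping at end of input OR at a char equal to the sentinel.
  static int scanInBand(String s) {
    int i = 0;
    while (i < s.length() && s.charAt(i) != SENTINEL) {
      i++;
    }
    return i;
  }

  // Counts chars, stopping only when a position-based atEOF flag is set.
  static int scanOutOfBand(String s) {
    int i = 0;
    boolean atEOF = s.isEmpty();
    while (!atEOF) {
      i++;
      atEOF = i >= s.length();
    }
    return i;
  }

  // Returns every char value c for which scanning "a" + c + "b" stops early.
  static List<Integer> failures(ToIntFunction<String> scanner) {
    List<Integer> bad = new ArrayList<>();
    for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
      String input = "a" + (char) c + "b";
      if (scanner.applyAsInt(input) != input.length()) {
        bad.add(c);
      }
    }
    return bad;
  }

  public static void main(String[] args) {
    System.out.println(failures(EofCollisionCheck::scanInBand));    // [32767]
    System.out.println(failures(EofCollisionCheck::scanOutOfBand)); // []
  }
}
```

Sweeping all 65,536 char values confirms that the in-band version fails for exactly one input, U+7FFF (32767), while the flag-based version fails for none.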

References

  • https://cwe.mitre.org/data/definitions/147.html
  • https://cwe.mitre.org/data/definitions/115.html
  • https://www.unicode.org/reports/tr55/

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
adam added the pull-request label 2025-12-30 01:28:01 +01:00
adam closed this issue 2025-12-30 01:28:01 +01:00
Reference: starred/pkl#968