[PR #924] [CLOSED] [RFC] Add properties to List for byte-based encodings/hashes #808

Closed
opened 2025-12-30 01:26:56 +01:00 by adam · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/apple/pkl/pull/924
Author: @HT154
Created: 2/2/2025
Status: Closed

Base: mainHead: list-byte-properties


📝 Commits (1)

  • 3821909 Add properties to List for byte-based encodings/hashes

📊 Changes

5 files changed (+98 additions, -3 deletions)

View changed files

📝 pkl-core/src/main/java/org/pkl/core/runtime/VmList.java (+10 -1)
📝 pkl-core/src/main/java/org/pkl/core/runtime/VmUtils.java (+8 -0)
📝 pkl-core/src/main/java/org/pkl/core/stdlib/base/ListNodes.java (+42 -1)
📝 pkl-core/src/main/resources/org/pkl/core/errorMessages.properties (+3 -0)
📝 stdlib/base.pkl (+35 -1)

📄 Description

The primary motivation here is to allow Pkl to properly work with arbitrary binary data. String can be abused to some extent, but because text encoding (and unicode normalization) is involved, what you see isn't what you get:

❯ pkl eval pkl:base -x 'let (_ = trace(0xff.toRadixString(16))) trace(0xff.toChar())' | hexdump                 
pkl: TRACE: 0xff.toRadixString(16) = "ff" (repl:text)
pkl: TRACE: 0xff.toChar() = "ÿ" (repl:text)
0000000 bfc3                                   
0000002

I'm most focused on base64 here, but I don't see an reason to omit the hashes either.

This can actually be implemented in-langage, but it's far from performant:

const local base64Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".chars
const function base64Bytes(bytes: List<UInt8>|Listing<UInt8>): String =
  let (padding = if (bytes.length % 3 == 0) "" else "=".repeat(3 - (bytes.length % 3)))
    bytes.fold(List(List()), (acc, elem) ->
      if (acc.last.length == 3) acc + List(List(elem))
      else acc.dropLast(1) + List(acc.last + List(elem))
    )
      .map((tuple) ->
      let (n = tuple[0].shl(16) + (tuple.getOrNull(1)?.shl(8) ?? 0) + (tuple.getOrNull(2) ?? 0))
        "\(base64Chars[n.ushr(18).and(63)])\(base64Chars[n.ushr(12).and(63)])\(base64Chars[n.ushr(6).and(63)])\(base64Chars[n.and(63)])"
    )
      .join("") + padding

Here's a test encoding a 16kB buffer:

❯ time ../pkl/pkl-cli/build/executable/jpkl eval test.pkl > /dev/null
../pkl/pkl-cli/build/executable/jpkl eval test.pkl > /dev/null  19.55s user 0.36s system 125% cpu 15.862 total

And here's the exact same evaluation switched to use the List.base64 property introduced in this PR:

❯ time ../pkl/pkl-cli/build/executable/jpkl eval test.pkl > /dev/null
../pkl/pkl-cli/build/executable/jpkl eval test.pkl > /dev/null  4.22s user 0.19s system 325% cpu 1.356 total

This is roughly a 12x speedup!

Methods on generic types that only work for some type arguments don't really exist in Pkl yet. I considered an entirely separate stdlib class (possibly even a List subclass?) that forces the element type to UInt8 I'm interested in hearing thoughts on how to best approach this, but this was the most expedient path to prove the concept.

As for why I'm doing this at all, here's a teaser:
Screenshot 2025-02-01 at 16 58 38


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/apple/pkl/pull/924 **Author:** [@HT154](https://github.com/HT154) **Created:** 2/2/2025 **Status:** ❌ Closed **Base:** `main` ← **Head:** `list-byte-properties` --- ### 📝 Commits (1) - [`3821909`](https://github.com/apple/pkl/commit/38219094e6bf5fe8dabb4259697ce3af5c148555) Add properties to `List` for byte-based encodings/hashes ### 📊 Changes **5 files changed** (+98 additions, -3 deletions) <details> <summary>View changed files</summary> 📝 `pkl-core/src/main/java/org/pkl/core/runtime/VmList.java` (+10 -1) 📝 `pkl-core/src/main/java/org/pkl/core/runtime/VmUtils.java` (+8 -0) 📝 `pkl-core/src/main/java/org/pkl/core/stdlib/base/ListNodes.java` (+42 -1) 📝 `pkl-core/src/main/resources/org/pkl/core/errorMessages.properties` (+3 -0) 📝 `stdlib/base.pkl` (+35 -1) </details> ### 📄 Description The primary motivation here is to allow Pkl to properly work with arbitrary binary data. String can be abused to some extent, but because text encoding (and unicode normalization) is involved, what you see isn't what you get: ``` ❯ pkl eval pkl:base -x 'let (_ = trace(0xff.toRadixString(16))) trace(0xff.toChar())' | hexdump pkl: TRACE: 0xff.toRadixString(16) = "ff" (repl:text) pkl: TRACE: 0xff.toChar() = "ÿ" (repl:text) 0000000 bfc3 0000002 ``` I'm most focused on `base64` here, but I don't see an reason to omit the hashes either. This can actually be implemented in-langage, but it's far from performant: ```pkl const local base64Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".chars const function base64Bytes(bytes: List<UInt8>|Listing<UInt8>): String = let (padding = if (bytes.length % 3 == 0) "" else "=".repeat(3 - (bytes.length % 3))) bytes.fold(List(List()), (acc, elem) -> if (acc.last.length == 3) acc + List(List(elem)) else acc.dropLast(1) + List(acc.last + List(elem)) ) .map((tuple) -> let (n = tuple[0].shl(16) + (tuple.getOrNull(1)?.shl(8) ?? 0) + (tuple.getOrNull(2) ?? 0)) "\(base64Chars[n.ushr(18).and(63)])\(base64Chars[n.ushr(12).and(63)])\(base64Chars[n.ushr(6).and(63)])\(base64Chars[n.and(63)])" ) .join("") + padding ``` Here's a test encoding a 16kB buffer: ``` ❯ time ../pkl/pkl-cli/build/executable/jpkl eval test.pkl > /dev/null ../pkl/pkl-cli/build/executable/jpkl eval test.pkl > /dev/null 19.55s user 0.36s system 125% cpu 15.862 total ``` And here's the exact same evaluation switched to use the `List.base64` property introduced in this PR: ``` ❯ time ../pkl/pkl-cli/build/executable/jpkl eval test.pkl > /dev/null ../pkl/pkl-cli/build/executable/jpkl eval test.pkl > /dev/null 4.22s user 0.19s system 325% cpu 1.356 total ``` This is roughly a 12x speedup! Methods on generic types that only work for some type arguments don't really exist in Pkl yet. I considered an entirely separate stdlib class (possibly even a `List` subclass?) that forces the element type to `UInt8` I'm interested in hearing thoughts on how to best approach this, but this was the most expedient path to prove the concept. As for why I'm doing this at all, here's a teaser: <img width="789" alt="Screenshot 2025-02-01 at 16 58 38" src="https://github.com/user-attachments/assets/97828383-d261-4b52-80ce-a68e4f8afc56" /> --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
adam added the pull-request label 2025-12-30 01:26:56 +01:00
adam closed this issue 2025-12-30 01:26:56 +01:00
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/pkl#808