Support cloning the project on Windows #45

Closed
opened 2025-12-30 01:19:54 +01:00 by adam · 17 comments

Originally created by @odenix on GitHub (Feb 8, 2024).

This is the first step in #20 and critical to unlock outside contributions.

adam closed this issue 2025-12-30 01:19:54 +01:00

@odenix commented on GitHub (Mar 30, 2024):

Is there a timeline to fix this? Developing on WSL is really painful. Gradle and esp. IntelliJ are extremely slow (30s pause is normal), and important IntelliJ features such as showing the JDK source code don't work.
I think fixing this might require changing the package cache file layout (no `:` character in filenames).


@odenix commented on GitHub (Apr 5, 2024):

@bioball Windows support is very important to me. I'd be willing to work on this first step, but I'd need some guidance wrt. changing the cache file layout.


@bioball commented on GitHub (Apr 5, 2024):

How about we percent-encode them?

According to the Windows docs (https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file#naming-conventions), these characters are reserved:

  • < (less than)
  • > (greater than)
  • : (colon)
  • " (double quote)
  • / (forward slash)
  • \ (backslash)
  • | (vertical bar or pipe)
  • ? (question mark)
  • * (asterisk)

Forward slash is also reserved on macOS and Linux, so we don't need to worry about encoding it (Pkl filenames cannot contain forward slashes, and forward slashes in URIs are treated as path separators).

For Windows only, it probably makes sense to percent-encode all other characters. Or, are there other encodings that people use?

Here's some literature:

https://stackoverflow.com/questions/1184176/how-can-i-safely-encode-a-string-in-java-to-use-as-a-filename
https://stackoverflow.com/questions/1077935/will-urlencode-fix-this-problem-with-illegal-characters-in-file-names-c

I think we'd want to take the same approach for writing output files. For instance, what will be written when you run `pkl eval -m . foo.pkl` here?

// foo.pkl
output {
  files {
    ["foo:bar.txt"] { text = "foo:bar" }
  }
}

@odenix commented on GitHub (Apr 5, 2024):

Don't we want the exact same layout across operating systems? Seems desirable for portability/tooling/etc.
Gradle's dependency cache used to be non-portable, and it was causing a lot of pain.
https://github.com/gradle/gradle/issues/1338


@bioball commented on GitHub (Apr 6, 2024):

I can see wanting that, especially if you are using the cache dir as a way to vendor dependencies in a repo.

In that case, it might make more sense to treat this as a separate problem from writing file paths with `-o` or `-m`. There, it'd be surprising if `-o "foo:bar.txt"` somehow mangled that output file on Linux/macOS.

So maybe it makes the most sense to always percent-encode those characters when writing to the cache dir.


@odenix commented on GitHub (Apr 6, 2024):

Is mangling required for -o/-m? Can’t -o "foo:bar.txt" just fail on Windows? Another option would be to make it fail everywhere (“non-portable output path”). Anyway, I agree that this is a separate concern.
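The "fail everywhere" option could look roughly like the following Java sketch. It is only an illustration: the class and method names are hypothetical, it checks a single path segment rather than a whole path, and the reserved set is simply the list of Windows-reserved characters quoted earlier in this thread.

```java
import java.util.Optional;

public class PortableNames {
    // Characters Windows reserves in file names, per the list earlier in this thread.
    private static final String WINDOWS_RESERVED = "<>:\"/\\|?*";

    /** Returns the first reserved character found in a single path segment, if any. */
    public static Optional<Character> firstReservedChar(String segment) {
        for (int i = 0; i < segment.length(); i++) {
            char c = segment.charAt(i);
            if (WINDOWS_RESERVED.indexOf(c) >= 0) {
                return Optional.of(c);
            }
        }
        return Optional.empty();
    }

    /** Hypothetical check: rejects non-portable names on every OS, not only on Windows. */
    public static void requirePortable(String segment) {
        Optional<Character> bad = firstReservedChar(segment);
        if (bad.isPresent()) {
            throw new IllegalArgumentException(
                "Non-portable output path: '" + segment
                    + "' contains reserved character '" + bad.get() + "'");
        }
    }

    public static void main(String[] args) {
        requirePortable("foo-bar.txt"); // accepted
        try {
            requirePortable("foo:bar.txt");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing on every platform keeps behavior identical across operating systems, at the cost of rejecting names that Linux and macOS would accept.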


@bioball commented on GitHub (Apr 6, 2024):

What does Windows do right now if you try to create a filename with a reserved character? E.g. if you do `echo "hello" > foo:bar.txt`?

Note: for percent-encoding, we will also need to percent-encode the literal %.
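A minimal Java sketch of why the literal `%` has to be encoded too. The class name and the exact character set are assumptions for illustration (the Windows-reserved characters listed earlier, minus `/`, plus `%`); this is not the actual Pkl implementation. Without escaping `%`, the names `foo:bar.txt` and `foo%3Abar.txt` would both end up as `foo%3Abar.txt`.

```java
public class PercentFileNames {
    // Assumption: Windows-reserved characters from the list above (minus '/'),
    // plus '%' itself so that encoded names cannot collide with literal ones.
    private static final String MUST_ENCODE = "<>:\"\\|?*%";

    /** Percent-encodes every character of MUST_ENCODE in a single path segment. */
    public static String encode(String segment) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < segment.length(); i++) {
            char c = segment.charAt(i);
            if (MUST_ENCODE.indexOf(c) >= 0) {
                // All characters in MUST_ENCODE are ASCII, so two hex digits suffice.
                sb.append('%').append(String.format("%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("foo:bar.txt"));   // foo%3Abar.txt
        System.out.println(encode("foo%3Abar.txt")); // foo%253Abar.txt
    }
}
```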


@odenix commented on GitHub (Apr 6, 2024):

> What does Windows do right now if you try to create a filename with a reserved character?

PowerShell:

echo "hello" > foo:bar.txt
Out-File: Cannot find drive. A drive with the name 'foo' does not exist.
echo "hello" > $home\foo:bar.txt   # creates a file named "foo", but it's empty

Java 21:

jshell> new File("foo:bar.txt").createNewFile()   // creates a file named "foo"
jshell> Files.createFile(Path.of("foo:bar.txt"))
|  Exception java.nio.file.InvalidPathException: Illegal char <:> at index 3: foo:bar.txt

File explorer:
Can't create file with invalid path


@odenix commented on GitHub (Apr 6, 2024):

I assume the package URL -> file path conversion should be unique, to rule out name collisions. Should it also be reversible?


@bioball commented on GitHub (Apr 6, 2024):

Yeah, ideally reversible, which is why percent-encoding seems like a good choice here.
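To illustrate the reversibility, here is a small Java sketch of the decoding direction. The class name is hypothetical and the sketch assumes that only ASCII characters were encoded, each as `%` followed by two hex digits.

```java
public class PercentDecode {
    /** Reverses percent-encoding: each "%XX" becomes the character with hex code XX. */
    public static String decode(String encoded) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i);
            if (c != '%') {
                sb.append(c);
                i++;
                continue;
            }
            if (i + 3 > encoded.length()) {
                throw new IllegalArgumentException("Truncated escape in: " + encoded);
            }
            int hi = Character.digit(encoded.charAt(i + 1), 16);
            int lo = Character.digit(encoded.charAt(i + 2), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Invalid escape in: " + encoded);
            }
            sb.append((char) (hi * 16 + lo));
            i += 3;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("localhost%3A0"));   // localhost:0
        System.out.println(decode("foo%253Abar.txt")); // foo%3Abar.txt
    }
}
```

Because `%` is itself always written as `%25`, every encoded name maps back to exactly one original name.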


@mitchcapper commented on GitHub (Apr 24, 2024):

> echo "hello" > $home\foo:bar.txt
> jshell> new File("foo:bar.txt").createNewFile() // creates a file named "foo"

Note that the colon here is only sort of an invalid character. What you are really doing is writing to an alternate data stream, which is why the file appears but is "empty". For details: https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-fscc/c54dec26-1551-4d3a-a0ea-4fa40f848eb3

The colon is just one character, but it behaves slightly differently from other path characters that are truly invalid (in terms of whether errors are thrown, and which ones).

This is also why the following works from a command line:

> echo "hey" > "somefile.txt:alt_stream" && cat somefile.txt && echo "..." && cat "somefile.txt:alt_stream"
"..."
"hey"

@bioball commented on GitHub (Apr 26, 2024):

Hm... this impacts pkldoc too. This means that generated pkldoc websites should also use percent encoding, otherwise we can't write directories or filenames onto Windows.

@mitchcapper good to know. I don't think that changes the fact that we should encode these characters.


@mitchcapper commented on GitHub (Apr 26, 2024):

> I don't think that changes the fact that we should encode these characters.

Right, the colon should certainly not be on the whitelist: it is not valid in a path, only in some APIs for alternate data stream access.


@bioball commented on GitHub (Apr 26, 2024):

WRT encoding:

  • Scala does not do any encoding; generating scaladoc on Windows will error if a class name contains a reserved character.
  • Kotlin's Dokka wraps reserved characters in square brackets, then the character code inside. For example, : becomes [58].
  • Doesn't seem like Swift's DocC has figured out a solution here yet: https://forums.swift.org/t/docc-colons-in-filenames/57917/27

Using percent-encoding is pretty ugly, because you get doubly-encoded URL paths.

Kotlin's encoding seems nicer. A nice benefit here is that the file paths match the URL paths. However, it works for them because square bracket chars aren't allowed as identifier names. In Pkl, the only character not allowed in an identifier is the backtick literal.

| Encoding | file path | URL path |
| --- | --- | --- |
| Percent | localhost%3A0 | localhost%253A0 |
| Dokka | localhost[58]0 | localhost[58]0 |
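The double encoding in the percent row can be reproduced with a short Java sketch. The helper names are hypothetical and the file-name encoder is simplified to handle only `:` and `%`; the point is that once a file on disk is literally named `localhost%3A0`, a URL addressing it has to escape that `%` again.

```java
public class DoubleEncoding {
    /** Simplified file-name encoding: escapes '%' first, then ':'. */
    public static String toFileName(String name) {
        return name.replace("%", "%25").replace(":", "%3A");
    }

    /** URL path segment for a file on disk: the literal '%' in the file name is escaped again. */
    public static String toUrlPathSegment(String fileName) {
        return fileName.replace("%", "%25");
    }

    public static void main(String[] args) {
        String fileName = toFileName("localhost:0");
        System.out.println(fileName);                   // localhost%3A0
        System.out.println(toUrlPathSegment(fileName)); // localhost%253A0
    }
}
```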

@bioball commented on GitHub (Apr 29, 2024):

Maybe we can copy Dokka. We can simply represent verbatim [ as [[. Multiple [ literals would just be double the amount of [.

I don't think we need a special way to represent `]`; it only has meaning if it is preceded by a match for the regex `\[\d{2}`.

| literal | encoded |
| --- | --- |
| `foo:bar` | `foo[58]bar` |
| `foo[58]bar` | `foo[[58]bar` |
| `foo[[` | `foo[[[[` |
| `foo[:bar` | `foo[[[58]bar` |
| illegal | `foo[bar` |
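The scheme in the table above can be sketched in Java as follows. This is an illustration under assumptions, not the implementation that was eventually adopted: the class name is hypothetical, and the reserved set is taken from the Windows list earlier in the thread (minus `/`). A reserved character becomes its decimal character code in square brackets, and a literal `[` is doubled.

```java
public class BracketEncoding {
    // Assumption: Windows-reserved characters from earlier in the thread, minus '/'.
    private static final String RESERVED = "<>:\"\\|?*";

    public static String encode(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '[') {
                sb.append("[[");                             // literal '[' is doubled
            } else if (RESERVED.indexOf(c) >= 0) {
                sb.append('[').append((int) c).append(']');  // e.g. ':' -> "[58]"
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static String decode(String s) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '[') {
                sb.append(c);
                i++;
            } else if (i + 1 < s.length() && s.charAt(i + 1) == '[') {
                sb.append('[');                              // "[[" -> literal '['
                i += 2;
            } else {
                int close = s.indexOf(']', i);
                if (close < 0) {
                    throw new IllegalArgumentException("Illegal encoding: " + s);
                }
                String digits = s.substring(i + 1, close);
                if (!digits.matches("[0-9]+")) {
                    throw new IllegalArgumentException("Illegal encoding: " + s);
                }
                sb.append((char) Integer.parseInt(digits));
                i = close + 1;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("foo[:bar"));     // foo[[[58]bar
        System.out.println(decode("foo[[[58]bar")); // foo[:bar
    }
}
```

A lone `[` that is neither doubled nor followed by digits and `]`, as in `foo[bar`, is rejected by `decode`, matching the "illegal" row.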

One downside is that we now use four bytes to represent one byte, which might be a problem with URL addresses. But this is an edge case that maybe we can live with. Also, this is still better than percent-encoding, which uses five bytes for URL paths.

Another thought: I think this new encoding necessitates packages-2 (we use packages-1 as our cache dir right now).


@bioball commented on GitHub (May 1, 2024):

Proposal here: https://github.com/apple/pkl-evolution/pull/3


@bioball commented on GitHub (May 14, 2024):

This works now per https://github.com/apple/pkl/pull/489.

Cloning might need `--depth=1` in order to work.

Reference: starred/pkl#45