Glob reading with the file: schema fails on some paths #184

Open
opened 2025-12-30 01:21:50 +01:00 by adam · 5 comments
Owner

Originally created by @thomaspurchas on GitHub (Jul 17, 2024).

The handling of files reads using read* is a bit broken at the moment. Behaviour changes depending on whether or not the file: schema is used, and behaviour in a pkl vs the REPL doesn't always seem consistent. Additionally there are a number of simple glob patterns that cause NullPointerExceptionss or similar incorrect behaviour.

Pkl REPL:

An unexpected error has occurred. Would you mind filing a bug report?
Cmd+Double-click the link below to open an issue.
Please copy and paste the entire error output into the issue's description, provided you can share it.

https://github.com/apple/pkl/issues/new

java.lang.NullPointerException

–– Pkl Error ––
None (cause has no message)

1 | read*("file:**/*.txt")
    ^^^^^^^^^^^^^^^^^^^^^^
at  (repl:pkl0)

Pkl 0.27.0-dev+7917ddb0c (macOS 15.0, native)

java.lang.NullPointerException
	at org.pkl.core.util.GlobResolver.splitGlobPatternIntoBaseAndWildcards(GlobResolver.java:452)
	at org.pkl.core.util.GlobResolver.resolveGlob(GlobResolver.java:483)

This error is is caused by the unsafe assumption that .getPath always returns a value. We should probably be using . getSchemeSpecificPart instead.

Evaling the following will work

a = read*("**/*.txt")

but not (addition of the file: schema)

a = read*("file:**/*.txt")

and running read*("**/*.txt") in the REPL results in

read*("**/*.txt")
–– Pkl Error ––
No resource reader is registered for scheme null.

1 | read*("**/*.txt")
    ^^^^^^^^^^^^^^^^^
at  (repl:pkl0)
Originally created by @thomaspurchas on GitHub (Jul 17, 2024). The handling of files reads using `read*` is a bit broken at the moment. Behaviour changes depending on whether or not the `file:` schema is used, and behaviour in a `pkl` vs the REPL doesn't always seem consistent. Additionally there are a number of simple glob patterns that cause `NullPointerExceptions`s or similar incorrect behaviour. Pkl REPL: ```pkl An unexpected error has occurred. Would you mind filing a bug report? Cmd+Double-click the link below to open an issue. Please copy and paste the entire error output into the issue's description, provided you can share it. https://github.com/apple/pkl/issues/new java.lang.NullPointerException –– Pkl Error –– None (cause has no message) 1 | read*("file:**/*.txt") ^^^^^^^^^^^^^^^^^^^^^^ at (repl:pkl0) Pkl 0.27.0-dev+7917ddb0c (macOS 15.0, native) java.lang.NullPointerException at org.pkl.core.util.GlobResolver.splitGlobPatternIntoBaseAndWildcards(GlobResolver.java:452) at org.pkl.core.util.GlobResolver.resolveGlob(GlobResolver.java:483) ``` This error is is caused by the [unsafe assumption that `.getPath` always returns a value](https://sourcegraph.com/github.com/apple/pkl/-/blob/pkl-core/src/main/java/org/pkl/core/util/GlobResolver.java?L448). We should probably be using [`. getSchemeSpecificPart`](https://docs.oracle.com/javase/8/docs/api/java/net/URI.html#getSchemeSpecificPart--) instead. Evaling the following will work ```pkl a = read*("**/*.txt") ``` but not (addition of the `file:` schema) ```pkl a = read*("file:**/*.txt") ``` and running `read*("**/*.txt")` in the REPL results in ``` read*("**/*.txt") –– Pkl Error –– No resource reader is registered for scheme null. 1 | read*("**/*.txt") ^^^^^^^^^^^^^^^^^ at (repl:pkl0) ```
Author
Owner

@bioball commented on GitHub (Jul 17, 2024):

The underlying issue here is that file: URIs are hierarchical; it's invalid for a file URI to not have a hierarchical path part (a path that starts with /); see RFC-8089.

Right now, read("file:foo.txt") will cause Pkl to read foo.txt relative to the process CWD, but file:foo.txt is not a valid file URI. This might be changed to a throw in the future. For similar reasons, read*("file:*") is also not supported, and would also be changed to a throw.

When globbing using a file scheme, the path part must be a hierarchical path. For example, read*("file:/path/to/*.txt") (or read("file:///path/to/*.txt" for a more canonical representation; file URIs typically have a host section.)

@bioball commented on GitHub (Jul 17, 2024): The underlying issue here is that `file:` URIs are _hierarchical_; it's invalid for a file URI to not have a hierarchical path part (a path that starts with `/`); see [RFC-8089](https://datatracker.ietf.org/doc/html/rfc8089). Right now, `read("file:foo.txt")` will cause Pkl to read `foo.txt` relative to the process CWD, but `file:foo.txt` is not a valid file URI. This might be changed to a throw in the future. For similar reasons, `read*("file:*")` is also not supported, and would also be changed to a throw. When globbing using a file scheme, the path part must be a hierarchical path. For example, `read*("file:/path/to/*.txt")` (or `read("file:///path/to/*.txt"` for a more canonical representation; file URIs typically have a host section.)
Author
Owner

@thomaspurchas commented on GitHub (Jul 17, 2024):

In that situation, how are we meant to use read() with a relative path, in particular, how can we use read() with a relative in the REPL where the schema is mandatory?

Also these nuances aren't described in the docs. The docs mention the file: scheme, and under the Globbed Read section provides examples of using read without the file: scheme. But I can't see anywhere that explicitly states relative reads shouldn't use the file: scheme.

Also for the read API, do we have to stick so closely to the RFC? Obviously we should correctly handle any RFC-8089 valid URI. But there's no reason we can't also accept technically invalid URI, but where the intent is still obvious and useful (e.g. using the file: scheme with a relative path)

@thomaspurchas commented on GitHub (Jul 17, 2024): In that situation, how are we meant to use `read()` with a relative path, in particular, how can we use `read()` with a relative in the REPL where the schema is mandatory? Also these nuances aren't described in the docs. The docs mention the `file:` scheme, and under the Globbed Read section provides examples of using `read` without the `file:` scheme. But I can't see anywhere that explicitly states relative reads shouldn't use the `file:` scheme. Also for the `read` API, do we have to stick so closely to the RFC? Obviously we should correctly handle any RFC-8089 valid URI. But there's no reason we can't also accept technically invalid URI, but where the intent is still obvious and useful (e.g. using the `file:` scheme with a relative path)
Author
Owner

@bioball commented on GitHub (Jul 17, 2024):

I don't see a strong need for relative paths with file URIs. If this is declared from within a file-based module, it should use a relative path without the scheme part. If declared within a non file-based module (e.g. from a package), what should the path be relative to? It can't be relative to the enclosing module, and also shouldn't be relative to the CWD, because we want imports to be statically analyzable.

In particular, how can we use read() with a relative in the REPL where the schema is mandatory

I think this is a tangential but definitely a problem. Ideally, you should be able to use relative file imports from within the REPL.

@bioball commented on GitHub (Jul 17, 2024): I don't see a strong need for relative paths with `file` URIs. If this is declared from within a file-based module, it should use a relative path without the scheme part. If declared within a non file-based module (e.g. from a package), what should the path be relative to? It can't be relative to the enclosing module, and also shouldn't be relative to the CWD, because we want imports to be statically analyzable. > In particular, how can we use read() with a relative in the REPL where the schema is mandatory I think this is a tangential but definitely a problem. Ideally, you should be able to use relative file imports from within the REPL.
Author
Owner

@thomaspurchas commented on GitHub (Jul 17, 2024):

That sounds reasonable. I guess then my issue is that docs aren't very clear. It's not obvious that using read() with file: and without are different. It also creates a slightly odd situation where file reading has a "special" status, in that it supports relative reads is you don't use the file: scheme, but no other scheme has that capability. Although it doesn't really make sense for other schemes to support relative paths in any situation, it still creates a slightly inconsistent read() API, that's a little surprising.

@thomaspurchas commented on GitHub (Jul 17, 2024): That sounds reasonable. I guess then my issue is that docs aren't very clear. It's not obvious that using `read()` with `file:` and without are different. It also creates a slightly odd situation where file reading has a "special" status, in that it supports relative reads is you don't use the `file:` scheme, but no other scheme has that capability. Although it doesn't really make sense for other schemes to support relative paths in any situation, it still creates a slightly inconsistent `read()` API, that's a little surprising.
Author
Owner

@bioball commented on GitHub (Jul 18, 2024):

Agree that we should improve our docs here. We have a section on relative paths, but it's easily missed and a common source of confusion.

@bioball commented on GitHub (Jul 18, 2024): Agree that we should improve our docs here. We have a [section on relative paths](https://pkl-lang.org/main/current/language-reference/index.html#relative-uris), but it's easily missed and a common source of confusion.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/pkl#184