mirror of
https://github.com/apple/pkl.git
synced 2026-01-11 22:30:54 +01:00
Inconsistent handling of HTTP redirects and file symlinks #93
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @odenix on GitHub (Feb 26, 2024).
I just did some manual testing. Currently, Pkl handles HTTP redirects and file symlinks differently:
A program imports both
https://example.com/bar.pklandhttps://example.com/redirect-to-bar.pkl.A program imports both
path/to/bar.pklandpath/to/symlink-to-bar.pkl.In other words: Files are canonicalized, HTTP URLs aren't. I think both should be.
@bioball commented on GitHub (Feb 27, 2024):
Hm, good catch. I think the correct behavior here is to treat them as different modules.
Let's say you have a hard link:
These are treated as different modules:
It doesn't make sense to me that making one a symbolic link for the other will make this error go away. The symlink is an implementation detail of the file system, and should be opaque to the language itself.
@odenix commented on GitHub (Feb 27, 2024):
You actually mean hard link, i.e.,
a.pklandb.pklpoint to the same inode? It's normal and expected for hard links and symbolic links to have different semantics. On the other hand, not canonicalizing symbolic links and HTTP URLs will result in unnecessary work, unnecessary inconsistencies (e.g., file/URL content changes while program is running), and unnecessary errors (e.g., incompatible types). I feel it's just asking for trouble for no real gain, especially in large, distributed programs. If canonicalization of symbolic links caused major confusion, you'd have heard of it already.Here is another question: Should https://pkg.pkl-lang.org/github.com/jamesward/pklamper/pklamper@0.3.0
and https://github.com/jamesward/pklamper/releases/download/pklamper@0.3.0/pklamper@0.3.0 be treated as unrelated packages?
Somewhat related: https://go.dev/doc/go1.4#canonicalimports
@bioball commented on GitHub (Feb 29, 2024):
Yeah, I think those should be different packages.
The core problem that I see here is that, just by looking at source code, you should be able to tell whether something is treated as the same module (and the same type) or not. If we treated symlinks as in-language aliases, you'd have a pretty hard time figuring out whether the snippet above would error or not.
@odenix commented on GitHub (Feb 29, 2024):
I can't think of a concrete problem caused by having one module per source of truth. I can think of several problems caused by having multiple modules per source of truth, especially at scale and in a nominally typed language. Users won't be surprised if
foo.bar.Bazandfoo.bar.Bazare compatible, they will be surprised if they aren't. Aliases are already everywhere in the language, and tooling can help to make sense of them.Looking at the current code, if the goal is to treat
redirect-to-urlandurlas independent modules, it doesn't make sense to do an additional security check forurlwhen accessingredirect-to-url. And the additional resolved URI cache does nothing but create the very problem you're trying to avoid, namely to have the same module for different unresolved URIs.@bioball commented on GitHub (Feb 29, 2024):
FYI: the module name is not actually used for type checking. The module URI is the actual "name" of the type.
You can test this via:
FooA.pkl
FooB.pkl (no symlink)
test.pkl
Produces error:
I can't think of a real-world scenario where treating symlinks as a different module is actually a problem. Can you illustrate one?
This is a security concern; Pkl shouldn't import sources that aren't allowed by
--allowed-modules. If there is a redirect, we want to check the redirect as well.Yeah, maybe the modulesByResolvedUri cache should go away.
@bioball commented on GitHub (Feb 29, 2024):
Some tests:
Python
a/__init__.py
b/__init__.py (symlink)
test.py
Node.js
lib/a.mjs
lib/b.mjs (symlink)
test.mjs
Go
a/a.go
b/b.go (symlink)
main.go
go.mod
Ruby
lib/a.rb
lib/b.rb (symlink)
test.rb
@bioball commented on GitHub (Feb 29, 2024):
Seems like a toss-up; Go and Python treat symlinks as their own files; Node and Ruby treat them as the same file.
@odenix commented on GitHub (Feb 29, 2024):
Comparing with other languages is a good idea. Ideally we could compare with other nominally typed config languages.
I think symlinks and redirects are ultimately the same problem. Apart from the incompatible types problem that I'll discuss further below, I think that seeing multiple "versions" of a module in a single program run because the module's source code changes externally is never beneficial. Another concern is the time and memory cost of downloading and evaluating the same source of truth multiple times in a single program run.
I feel that these two decisions don't go well together:
redirect-to-urlshould not be the same module asurlbecause that would make things difficult to understand.redirect-to-urlredirects tourlwhen specifying--allowed-modules.Here is the kind of problem I expect to see at scale when not canonicalizing module/package URIs:
package-a, which importsmodule-xviamodule-urland exports values ofmodule-xtypes.package-b, which importsmodule-xviaredirect-to-module-urland exports values ofmodule-xtypes.Result:
You have no way of exchanging values of
module-xtypes betweenpackage-aandpackage-b, and no way to enforce a single version formodule-x, only because the two packages, which may not be aware of each other, don't use the same canonical URL formodule-x(or its package). You spend the night debugging and trying to work around this.I feel it's unnecessary to invite this kind of problem. (In a structurally typed language, you wouldn't notice as long as
package-aandpackage-bimport structurally compatible versions ofmodule-x.)Whether to canonicalize module and package URIs feels like a critical decision with far-reaching consequences. Would be great to hear more opinions.