UTF16 (Swedish grapheme cluster) issues #293

Closed
opened 2025-12-29 15:28:17 +01:00 by adam · 4 comments

Originally created by @rivera-ernesto on GitHub (Nov 1, 2019).

I have described this issue here https://stackoverflow.com/q/58660170/1049134

Basically I store an `identifier` string with UTF16 Swedish characters:

`"/EXTERNAL/Gemensam RUN/FileCloud Test/Test folder åäö/Test with Swedish characters - åäö.xlsx"`

which then doesn't match in a fetch:

`CoreStore.fetchOne(From<Record>().where(\.identifier == identifier)) // Fails`
adam closed this issue 2025-12-29 15:28:17 +01:00

@JohnEstropia commented on GitHub (Nov 5, 2019):

What type is `identifier`?

@JohnEstropia commented on GitHub (Nov 5, 2019):

Also, what happens if you use `NSPredicate` instead?

```swift
CoreStore.fetchOne(
    From<Record>().where(NSPredicate(format: "%K == %@", #keyPath(Record.identifier), identifier))
)
```

@rivera-ernesto commented on GitHub (Nov 6, 2019):

`identifier` is a Swift string. I found the problem and updated the StackOverflow question.

In short, I had two `identifier` instances whose Swedish characters are written with different grapheme clusters, so they are two binary-different but equally valid representations of the same String.
Swift considers them equal, but Core Data (I guess because of SQLite underneath) does not, so they don't match the predicate.

It's not an encoding problem, as both representations use the same encoding; it's a problem of grapheme clusters being valid in more than one way, with no way to "normalize" them that I could find.

Again, refer to https://stackoverflow.com/q/58660170/1049134 if you are interested in knowing more.
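
A minimal sketch of the mismatch described above, assuming the two representations differ in precomposed versus decomposed form. The thread does not show the exact code points, so the string literals here are illustrative only:

```swift
import Foundation

// Illustrative literals (assumption): "å" as one precomposed scalar, U+00E5.
let precomposed = "Test folder \u{00E5}"
// The same visible text with "å" as "a" (U+0061) + COMBINING RING ABOVE (U+030A).
let decomposed = "Test folder a\u{030A}"

// Swift's String equality is based on canonical equivalence,
// so both spellings compare as equal.
let swiftEqual = (precomposed == decomposed)

// The underlying UTF-16 code units differ, which is what a
// binary (code-unit-wise) comparison would see.
let sameCodeUnits = Array(precomposed.utf16) == Array(decomposed.utf16)

print("Swift ==:", swiftEqual, "| same UTF-16 code units:", sameCodeUnits)
```

If the store compares the stored value code unit by code unit, as guessed above, this would explain why Swift reports a match while the fetch does not.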

@JohnEstropia commented on GitHub (Nov 8, 2019):

@rivera-ernesto I see. If that's the case, I'm not sure this is something that should be "fixed" at the library level. Here are some common practices for such cases:

  1. Store a canonical string in addition to your actual property value and use the canonical strings as search strings (see: [example1](https://stackoverflow.com/questions/17418443/normalize-or-canonicalize-string-for-core-data), [example2](https://stackoverflow.com/questions/16741632/how-to-properly-convert-to-a-canonical-string-for-searching-in-cocoa))
  2. Try diacritic-insensitive query operators:

```swift
.where(format: "%K CONTAINS[cd] %@", #keyPath(Record.identifier), identifier)
```
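
The first suggestion could be sketched roughly as follows. This is not code from the thread or from CoreStore: `canonicalKey(for:)` is a hypothetical helper name, and the sketch assumes that Foundation's `precomposedStringWithCanonicalMapping` (Unicode NFC) is an acceptable canonical form for the identifiers in question:

```swift
import Foundation

/// Hypothetical helper (not part of CoreStore): returns the NFC
/// (precomposed) form of a string, meant to be stored in a separate
/// attribute and used on both sides of a lookup.
func canonicalKey(for raw: String) -> String {
    return raw.precomposedStringWithCanonicalMapping
}

// Decomposed input: base letters followed by combining marks.
let stored = canonicalKey(for: "Test folder a\u{030A}a\u{0308}o\u{0308}")
// Precomposed input: "åäö" as single scalars.
let query = canonicalKey(for: "Test folder \u{00E5}\u{00E4}\u{00F6}")

// After normalization both keys consist of identical UTF-16 code units,
// so a binary comparison of the keys would also succeed.
let keysMatch = Array(stored.utf16) == Array(query.utf16)
print("canonical keys match:", keysMatch)
```

With this approach the fetch would compare a separate canonical attribute (name up to the model) against `canonicalKey(for: identifier)` rather than the raw value. The second suggestion is looser by design: `[cd]` makes the comparison case- and diacritic-insensitive, so it also treats `å` and `a` as equal, and `CONTAINS` matches substrings rather than whole values.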
Reference: starred/CoreStore#293