Some sophisticated main thread deadlock #258

Closed
opened 2025-12-29 15:27:44 +01:00 by adam · 6 comments
Owner

Originally created by @iby on GitHub (Jan 16, 2019).

I feel like have discovered a treasure and can't figure out where this is coming from… Everything worked fine yesterday, rolling to earlier commits doesn't help nor deleting all the data. I've narrowed this down to the following block:

DispatchQueue.global().async(execute: {
    Logger.default.log(info: "Found \(CoreStore.defaultStack.fetchCount(From<Recording>()) as Any) existing recordings.")
})

More specifically to the logger, which accepts info message as @autoclosure and does this:

lock.lock() ; defer { lock.unlock() }
logMessage(message(), with: logLevel)

One problem I see is that the message gets acquired within that lock, which might lead to all sort of troubles. I'll report this anyway, but this doesn't seem to be the main issue here.

Another message being logged later in the code on the main thread fails to acquire that lock, because it's still waiting on the fetchCount.

Now, what is strange about this, if I do prefetch or dump core store string (either one will do) just prior logging, everything works fine without causing a deadlock:

DispatchQueue.global().async(execute: {
    _ = CoreStore.defaultStack.fetchCount(From<Recording>())
    _ = CoreStore.defaultStack.coreStoreDumpString
    Logger.default.log(info: "Found \(CoreStore.defaultStack.fetchCount(From<Recording>()) as Any) existing recordings.")
})

This makes me think there's some initialization going on (for a particular thread?) or internal CoreData bug or both, as the execution stops at the [NSManagedObjectContext performBlockAndWait:] line:

__ulock_wait 0x00007fff596e894e
_dispatch_ulock_wait 0x00007fff5955f5ac
_dispatch_thread_event_wait_slow 0x00007fff5955f68b
__DISPATCH_WAIT_FOR_QUEUE__ 0x00007fff5956ac8a
_dispatch_sync_f_slow 0x00007fff5956a8cf
-[NSManagedObjectContext performBlockAndWait:] 0x00007fff2bf053e7
(extension in CoreStore):__C.NSManagedObjectContext.fetchCount(__C.NSFetchRequest<__C.NSFetchRequestResult>) -> Swift.Optional<Swift.Int> NSManagedObjectContext+Querying.swift:444
(extension in CoreStore):__C.NSManagedObjectContext.fetchCount<A where A: CoreStore.DynamicObject>(CoreStore.From<A>, Swift.Array<CoreStore.FetchClause>) -> Swift.Optional<Swift.Int> NSManagedObjectContext+Querying.swift:179
merged CoreStore.DataStack.fetchCount<A where A: CoreStore.DynamicObject>(CoreStore.From<A>, Swift.Array<CoreStore.FetchClause>) -> Swift.Optional<Swift.Int> 0x000000010147737f
CoreStore.DataStack.fetchCount<A where A: CoreStore.DynamicObject>(CoreStore.From<A>, CoreStore.FetchClause...) -> Swift.Optional<Swift.Int> 0x0000000101477272
implicit closure #1 : @autoclosure () -> Swift.String in closure #4 () -> () in Gifox.Application.applicationDidFinishLaunching(Foundation.Notification) -> () 0x000000010006bd43
implicit closure #1 : @autoclosure () -> Swift.String in Internal.Logger.log(info: @autoclosure () -> Swift.String) -> () 0x00000001000352ec
partial apply forwarder for implicit closure #1 : @autoclosure () -> Swift.String in Internal.Logger.log(info: @autoclosure () -> Swift.String) -> () 0x0000000100035351
Willow.Logger.logMessage(_: () -> Swift.String, with: Willow.LogLevel) -> () Logger.swift:259
merged Willow.Logger.debugMessage(@autoclosure () -> Swift.String) -> () 0x0000000101e80a63
Willow.Logger.infoMessage(@autoclosure () -> Swift.String) -> () 0x0000000101e8099e
Internal.Logger.log(info: @autoclosure () -> Swift.String) -> () Log.Logger.swift:29

Is this something you came across or can recommend the best way around this problem? Updating the logger locking fixes this, but there's a deeper issue, which might occur elsewhere. I'm trying to understand and foresee such cases. 👍

P. S. And it worked perfectly fine up until today! I feel like looking at a ghost! What are the odds…

Originally created by @iby on GitHub (Jan 16, 2019). I feel like have discovered a treasure and can't figure out where this is coming from… Everything worked fine yesterday, rolling to earlier commits doesn't help nor deleting all the data. I've narrowed this down to the following block: ```swift DispatchQueue.global().async(execute: { Logger.default.log(info: "Found \(CoreStore.defaultStack.fetchCount(From<Recording>()) as Any) existing recordings.") }) ``` More specifically to the logger, which accepts `info` message as `@autoclosure` and [does this](https://github.com/Nike-Inc/Willow/blob/a8750591c8b5df1fabe7fe02baa28734305d512f/Source/Logger.swift#L168-L169): ```swift lock.lock() ; defer { lock.unlock() } logMessage(message(), with: logLevel) ``` One problem I see is that the message gets acquired within that lock, which might lead to all sort of troubles. I'll report this anyway, but this doesn't seem to be the main issue here. Another message being logged later in the code on the main thread fails to acquire that lock, because it's still waiting on the `fetchCount`. Now, what is strange about this, if I do prefetch or dump core store string (either one will do) just prior logging, everything works fine without causing a deadlock: ```swift DispatchQueue.global().async(execute: { _ = CoreStore.defaultStack.fetchCount(From<Recording>()) _ = CoreStore.defaultStack.coreStoreDumpString Logger.default.log(info: "Found \(CoreStore.defaultStack.fetchCount(From<Recording>()) as Any) existing recordings.") }) ``` This makes me think there's some initialization going on (for a particular thread?) or internal CoreData bug or both, as the execution stops at the `[NSManagedObjectContext performBlockAndWait:]` line: ```log __ulock_wait 0x00007fff596e894e _dispatch_ulock_wait 0x00007fff5955f5ac _dispatch_thread_event_wait_slow 0x00007fff5955f68b __DISPATCH_WAIT_FOR_QUEUE__ 0x00007fff5956ac8a _dispatch_sync_f_slow 0x00007fff5956a8cf -[NSManagedObjectContext performBlockAndWait:] 0x00007fff2bf053e7 (extension in CoreStore):__C.NSManagedObjectContext.fetchCount(__C.NSFetchRequest<__C.NSFetchRequestResult>) -> Swift.Optional<Swift.Int> NSManagedObjectContext+Querying.swift:444 (extension in CoreStore):__C.NSManagedObjectContext.fetchCount<A where A: CoreStore.DynamicObject>(CoreStore.From<A>, Swift.Array<CoreStore.FetchClause>) -> Swift.Optional<Swift.Int> NSManagedObjectContext+Querying.swift:179 merged CoreStore.DataStack.fetchCount<A where A: CoreStore.DynamicObject>(CoreStore.From<A>, Swift.Array<CoreStore.FetchClause>) -> Swift.Optional<Swift.Int> 0x000000010147737f CoreStore.DataStack.fetchCount<A where A: CoreStore.DynamicObject>(CoreStore.From<A>, CoreStore.FetchClause...) -> Swift.Optional<Swift.Int> 0x0000000101477272 implicit closure #1 : @autoclosure () -> Swift.String in closure #4 () -> () in Gifox.Application.applicationDidFinishLaunching(Foundation.Notification) -> () 0x000000010006bd43 implicit closure #1 : @autoclosure () -> Swift.String in Internal.Logger.log(info: @autoclosure () -> Swift.String) -> () 0x00000001000352ec partial apply forwarder for implicit closure #1 : @autoclosure () -> Swift.String in Internal.Logger.log(info: @autoclosure () -> Swift.String) -> () 0x0000000100035351 Willow.Logger.logMessage(_: () -> Swift.String, with: Willow.LogLevel) -> () Logger.swift:259 merged Willow.Logger.debugMessage(@autoclosure () -> Swift.String) -> () 0x0000000101e80a63 Willow.Logger.infoMessage(@autoclosure () -> Swift.String) -> () 0x0000000101e8099e Internal.Logger.log(info: @autoclosure () -> Swift.String) -> () Log.Logger.swift:29 ``` Is this something you came across or can recommend the best way around this problem? Updating the logger locking fixes this, but there's a deeper issue, which might occur elsewhere. I'm trying to understand and foresee such cases. 👍 P. S. And it worked perfectly fine up until today! I feel like looking at a ghost! What are the odds…
adam closed this issue 2025-12-29 15:27:44 +01:00
Author
Owner

@JohnEstropia commented on GitHub (Jan 17, 2019):

Not at all surprising for a couple of reasons:

One, always remember that the DataStack expects to runs things on the main thread. In fact, if you're on a DEBUG build it should raise an assertion here:
screen shot 2019-01-17 at 11 08 22

By design, any fetches from the DataStack should be done from the main queue or the main thread, otherwise all bets are off.

Two, if I remember correctlyDispatchQueue.global() creates a concurrent queue, which means all assumptions about which threads they run will be wrong. I see Willow uses NSRecursionLock, which ALLOWS lock reentry if lock() is called from similar threads.

@JohnEstropia commented on GitHub (Jan 17, 2019): Not at all surprising for a couple of reasons: One, always remember that the `DataStack` expects to runs things on the main thread. In fact, if you're on a DEBUG build it should raise an assertion here: <img width="712" alt="screen shot 2019-01-17 at 11 08 22" src="https://user-images.githubusercontent.com/3029684/51290814-3cb56880-1a48-11e9-95e9-84e86f7aa47a.png"> By design, any fetches from the `DataStack` should be done from the main queue or the main thread, otherwise all bets are off. Two, if I remember correctly`DispatchQueue.global()` creates a concurrent queue, which means all assumptions about which threads they run will be wrong. I see Willow uses `NSRecursionLock`, which ALLOWS lock reentry if `lock()` is called from similar threads.
Author
Owner

@JohnEstropia commented on GitHub (Jan 17, 2019):

Regarding DataStack having main-thread affinity, it is always better to create a floating transaction if you really need to fetch something from an uncontrolled (ex: concurrent) queue:

DispatchQueue.global().async(execute: {
    let transaction = CoreStore.beginUnsafe()
    _ = transaction.fetchCount(From<Recording>())
    // ...
})

Or better yet, if the data to be fetched is static then just fetch them outside the closure:

let dump = CoreStore.defaultStack.coreStoreDumpString
DispatchQueue.global().async(execute: {
    print(dump)
    // ...
})
@JohnEstropia commented on GitHub (Jan 17, 2019): Regarding DataStack having main-thread affinity, it is always better to create a floating transaction if you really need to fetch something from an uncontrolled (ex: concurrent) queue: ```swift DispatchQueue.global().async(execute: { let transaction = CoreStore.beginUnsafe() _ = transaction.fetchCount(From<Recording>()) // ... }) ``` Or better yet, if the data to be fetched is static then just fetch them outside the closure: ```swift let dump = CoreStore.defaultStack.coreStoreDumpString DispatchQueue.global().async(execute: { print(dump) // ... }) ```
Author
Owner

@iby commented on GitHub (Jan 17, 2019):

Great, thanks for the detailed response! So, am I correct to think that if one simply wants an asynchronous fetch of any kind then perform(asynchronous: …) would be a better way to do this all together?

CoreStore.perform(asynchronous: { transaction -> Int? in
    transaction.fetchCount(From<Recording>()) ?? 0
}, success: {
    Logger.default.log(info: "Found \($0) existing recordings.")
}, failure: {
    Logger.default.log(error: "Failed to fetch entity counts with error: \($0)")
})
@iby commented on GitHub (Jan 17, 2019): Great, thanks for the detailed response! So, am I correct to think that if one simply wants an asynchronous fetch of any kind then `perform(asynchronous: …)` would be a better way to do this all together? ```swift CoreStore.perform(asynchronous: { transaction -> Int? in transaction.fetchCount(From<Recording>()) ?? 0 }, success: { Logger.default.log(info: "Found \($0) existing recordings.") }, failure: { Logger.default.log(error: "Failed to fetch entity counts with error: \($0)") }) ```
Author
Owner

@JohnEstropia commented on GitHub (Jan 18, 2019):

@ianbytchek perform(asynchronous:...) would be the safest, but it would also wait for current running transactions so you might not want that if you have long running imports. So unless you want that and if you are fetching just once (that is, unaffected by race conditions), using a quick fetch with .beginUnsafe()would be fine.

@JohnEstropia commented on GitHub (Jan 18, 2019): @ianbytchek `perform(asynchronous:...)` would be the safest, but it would also wait for current running transactions so you might not want that if you have long running imports. So unless you want that and if you are fetching just once (that is, unaffected by race conditions), using a quick fetch with `.beginUnsafe()`would be fine.
Author
Owner

@JohnEstropia commented on GitHub (Jan 18, 2019):

In the future we might benefit from built-in asynchronously-fetching methods, but that's another feature for another day :)

@JohnEstropia commented on GitHub (Jan 18, 2019): In the future we might benefit from built-in asynchronously-fetching methods, but that's another feature for another day :)
Author
Owner

@iby commented on GitHub (Jan 18, 2019):

Fantastic. Thanks for the details!

@iby commented on GitHub (Jan 18, 2019): Fantastic. Thanks for the details!
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: starred/CoreStore#258