mirror of
https://github.com/JohnEstropia/CoreStore.git
synced 2026-01-12 04:10:36 +01:00
Investigate Changing Core Data Stack Structure #125
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @colinmorelli on GitHub (Jan 26, 2017).
CoreStore is currently, to my knowledge, using a fairly common Core Data stack setup of:
PSC -> Background Saving Context -> Main Context
I'm proposing a change (or investigation) into:
PSC -> Background Saving Context
PSC -> Main Context
I've got two primary reasons for this:
Performance
If this post is to be trusted, it looks like you can score considerably improved performance using this layout. At the very least, I can't imagine it will cost any performance, but investigation is warranted here.
Fixing Core Data Bugs
There's an fairly significant and incredibly annoying that has been in Core Data for who knows how long.
NSFetchedResultsControllerswhich are managed by a context that is not directly connected to a PSC will completely ignorefetchBatchSizeand load all objects directly in memory at the time the FRC is instantiated.This can be avoided if the main context is attached directly to the PSC, giving back some considerably performance benefits to
NSFetchedResultsControllersthat are fronting large data sets.@colinmorelli commented on GitHub (Jan 26, 2017):
Perhaps a less-risky approach could be to optionally expose a separate main queue context for
NSFRCs to be created from, keeping the existing structure as-is.@JohnEstropia commented on GitHub (Jan 30, 2017):
Hi, thanks for the proposal :)
The setup you propose is effectively similar to the one CoreStore uses (see implementation below) because CoreStore enforces a read-only main context, which means its parent-child relationship with the background saving context is one-way: it's only used for fetching. Theoretically, there should be an advantage to the current setup as objects already in the parent context's memory should not be fetched from disk again. (Although, this behavior is most likely the cause of the FRC bug you mentioned).
The change you propose actually just needs a few lines to implement:
You can go ahead and test benchmarks if you wish. In any case, I'll need to run this change first in a few test runs because I worry this may affect deadlock frequency (or maybe reduce it, I'm not sure)
@MattNewberry commented on GitHub (Feb 8, 2017):
@colinmorelli Any performance data / feedback would be great as we're also running into this issue.
@JohnEstropia commented on GitHub (Feb 9, 2017):
@colinmorelli @MattNewberry I prepared
prototype/mainContextToPSCbranch we can use to test this change. The unit test errors can be ignored, they are caused by ambiguous configurations which are not allowed in the first place.@MattNewberry commented on GitHub (Feb 9, 2017):
@JohnEstropia Thanks, we'll check it out. In our case, we're not using CoreStore as we've long rolled our own stack. That said, we handle very large initial data set downloads, and have tried to shift these downloads to the background so our users aren't stuck at a loading screen. However, as @colinmorelli spoke about as well, we've had issue with merging background changes into the main thread, and largely believe this is due to
NSFetchedResultsControllerdelegate methods constantly re-firing and reloading its entire datasets per save. We import in batches of 1,000, and have yet to come up with an architecture that facilitates speedy background -> main thread merges.@JohnEstropia commented on GitHub (Feb 23, 2017):
@colinmorelli Hi, did you get the chance to test this change? So far there doesn't seem to be any effects (good or bad) on the apps we work on. Our use case may be limited so I hope others can try this out as well.
@ruslanskorb commented on GitHub (Mar 24, 2017):
I started testing this change and can confirm that PSC -> Main Context Core Data stack setup prevents the loading all objects directly in memory at the time the FRC is instantiated.
More details over time.
@JohnEstropia commented on GitHub (Mar 25, 2017):
@ruslanskorb Thanks! I really appreciate all inputs about this change.
I haven't seen any problem with this so far as well. If it proves stable I'll include this in the CoreStore 4.0 milestone.
@ruslanskorb commented on GitHub (Mar 27, 2017):
I found two problems at the moment. The more the number of fetched objects, the more noticeable each of them. Each of these problems belongs to
ListMonitor.rootSavingContextthere are noticeable lags when accessing a particular fetched object. It looks likeNSFetchedResultsControllerworks incorrectly with the changedmainContext. To fix it we can recreateListMonitorand everything will work fine again.@JohnEstropia commented on GitHub (Mar 28, 2017):
@ruslanskorb Thanks for trying this out!
Thanks for the PR! I merged #157. It looks like we can use the same optimization in other places.
Does this happen with the develop branch as well?
@ruslanskorb commented on GitHub (Mar 28, 2017):
No, this only happens when Main Context is connected directly to PSC.
@JohnEstropia commented on GitHub (Mar 30, 2017):
@ruslanskorb I have a hunch that this is caused by the NSFetchedResultsController workaround done during merging. Are you saving transactions asynchronously or synchronously?
@ruslanskorb commented on GitHub (Mar 30, 2017):
@JohnEstropia It works the same even if we replace
observerForDidSaveNotificationbyautomaticallyMergesChangesFromParent(available on iOS 10).And, I tested saving transactions asynchronously.
@JohnEstropia commented on GitHub (Mar 30, 2017):
@ruslanskorb Thanks for the info! I'm pretty sure that those lags are caused by this.
In theory,
NSManagedObjectContext.object(with:)is designed to fetch updated objects if the main context don't have them yet (so it can let NSFetchedResultsControllers process diffs), but "fetch" here means traversing the parent tree for those objects' info. In our case, the parent is the NSPersistentStoreCoordinator, and any fetch from there will trigger locking.iOS 10 was supposed to improve this by moving those locks to SQLite-level. But it turns out that concurrency on those locks have a limited pool size so your use case of thousands of objects are causing hiccups.
In any case, I think this stack rearchitecture is not feasible yet. With regards to NSFRCs, lags are harder to manage than memory usage, so I think it's better to err on the previous behavior.
@colinmorelli @ruslanskorb @MattNewberry Thank you for your input on this. If there are no other observations/solutions I'm closing this issue in a couple of days. Feel free to comment anytime.
@ruslanskorb commented on GitHub (Mar 30, 2017):
@JohnEstropia In my previous comment I meant that
NSManagedObjectContext.object(with:)call inNSManagedObjectContext+Setupdoes not cause those lags. I tried to comment out this code but nothing has changed. After that I tried to comment outobserverForDidSaveNotificationand setautomaticallyMergesChangesFromParenttotrue. And in this case also nothing has changed. Do you mean thatNSManagedObjectContext.object(with:)is called by NSFRC internally?As I can see in Instruments after merging when we are accessing an object in NSFRC it performs a heavy request to the database. And so for each object. Even for the one to which we just accessed. It looks like NSFRC ignores cached objects after merging.
If we recreate NSFRC it will again perform only lightweight requests to the database.
Also I noticed a bunch of random memory crashes on iOS 9 with PSC -> Main Context Core Data stack setup. Recreating NSFRC in
listMonitorDidChange(_:)also fixes them.@JohnEstropia commented on GitHub (Mar 30, 2017):
I'm inclined to think so. Actually I don't see how NSFRC will find updates to its objects without any sort of fetching. Let's suppose that 100 objects satisfy an NSFRC's predicate but it initially fetches only
fetchBatchSize = 20in memory. If an external transaction updates any object from those 80, the NSFRC needs to know if the 20 it knows should be the same 20 it needs to keep (sort keys could have been updated for example). The only way to get those info is to fetch on the main queue (because of NSMainQueueConcurrencyType) from the NSPSC (which locks the store) .On the original CoreStore stack, the context can get those info from the cached objects from it's parent root context. If a fetch is required, the fetch runs from the root context, which is on a background queue (NSPrivateQueueConcurrencyType), so the only danger of lags are from NSPSC locks.
(Please correct me if anything about how I understood this works is wrong)
I was wondering why after a month of using this branch we didn't notice any effects in our debug builds. Then I realized that we fetch incremental data from our server only 20~50 records at a time, probably not enough to push limits of the SQLite locking pools. Anyway, for me I/O will generally be a bigger performance cost compared to memory usage (that's what caches are for after all) so I think we'd rather keep the original stack setup.
@ruslanskorb Thanks again for the detailed investigation :)
@ruslanskorb commented on GitHub (Apr 13, 2017):
I'm here again :-]
NSFRC gets those updates from
NSManagedObjectContextDidSavenotification.In most of cases this information is enough to update the fetched objects and their order.
If you add
-com.apple.CoreData.SQLDebug 1to Arguments Passed on Launch you will not see any sort of fetching from the database when updating objects.I do not see the same problems in CoreStoreDemo project. Probably, there is a dependence on the complexity of the database schema.
In any case, it would be nice to be able to choose one of the above discussed configurations in the next version of CoreStore when initializing the stack.
@JohnEstropia commented on GitHub (Apr 13, 2017):
@ruslanskorb Thank you for taking time to test this :) Sorry I'm not sure I understood which side you were explaining. All your observations are from the
prototype/mainContextToPSCbranch, is that correct?In your previous post you have mentioned:
But you mentioned that
so I'm a little confused where the performance drops.
Yeah, the CoreStoreDemo app doesn't have a demo for multiple, reasonably sized imports (sorry).
Heavy lags on each update defeats the purpose of changing the stack structure for performance, and I feel the degradation more than outweighs the benefit (fixing
the fetchBatchSizebug). Are you experiencing a point where NSFRC's memory usage reaches intolerable levels?@ruslanskorb commented on GitHub (Apr 13, 2017):
Yes, it is.
I see this behavior in a real application with a complex data model.
At the same time, I see a different behavior in the test application with a simple data model.
These lags are easily eliminated by re-creating the NSFRC.
At the moment, no :-] But, the level of memory usage affects the overall performance of the application. And when you have 100 000 objects in memory it is noticeable.
Anyway, there are other cases when
NSManagedObjectContextwill load all objects in memory at the same time, not only when NSFRC is instantiated.