ImportableUniqueObject takes a long time (on the first load) #430

Open
opened 2025-12-29 15:31:40 +01:00 by adam · 4 comments

Originally created by @loicleser on GitHub (Apr 11, 2024).

Hello,

I'm testing your solution in my app. Everything looks really good and could be very helpful for me.

However, I'm running into an issue when loading the data. I'm trying to import a large set of movies (40,000) from a JSON file. Everything works well with ImportableObject, but it duplicates the objects on each download. So I switched to ImportableUniqueObject, and now saving the data takes a very long time.

With ImportableObject it takes 1 or 2 seconds; with ImportableUniqueObject it takes 4 to 5 minutes.

Am I doing something wrong?

Here is my code:

```swift
class CoreDataManager {

    static let shared = CoreDataManager()

    let dataStack = DataStack(
        xcodeModelName: "___",
        migrationChain: []
    )

    init() {
        do {
            try dataStack.addStorageAndWait(
                SQLiteStore(
                    fileName: "MyDataModel.sqlite",
                    configuration: "Default",
                    localStorageOptions: .recreateStoreOnModelMismatch
                )
            )
        } catch {
            print("Error configuring CoreStore: \(error)")
        }
    }
}
```

```swift
func downloadMovies() {
    let url = URL(string: "MYJSONURL")!
    let task = URLSession.shared.dataTask(with: url) { (data, response, error) in
        guard let data = data, error == nil else {
            print("error: \(error)")
            return
        }

        do {
            let decoder = JSONDecoder()
            let movies = try decoder.decode([MovieJSON].self, from: data)
            self.dataStack.perform(
                asynchronous: { (transaction) -> Void in
                    try! transaction.importUniqueObjects(
                        Into<Movie>(),
                        sourceArray: movies
                    )
                },
                completion: { (result) -> Void in
                    switch result {
                    case .success:
                        print("success")
                    case .failure(let error):
                        print("error: \(error)")
                    }
                }
            )
        } catch {
            print("error: \(error)")
        }
    }
    task.resume()
}
```

```swift
class Movie: NSManagedObject, ImportableUniqueObject {
    static func uniqueID(from source: MovieJSON, in transaction: CoreStore.BaseDataTransaction) throws -> String? {
        return source.name
    }

    func update(from source: MovieJSON, in transaction: CoreStore.BaseDataTransaction) throws {

    }

    func didInsert(from source: ImportSource, in transaction: BaseDataTransaction) throws {
        self.name = source.name
    }

    typealias UniqueIDType = String

    static var uniqueIDKeyPath: String {
        return #keyPath(Movie.name)
    }

    typealias ImportSource = MovieJSON
}
```

Thank you for your help.


@JohnEstropia commented on GitHub (Apr 12, 2024):

That sounds about reasonable, since uniquing requires a fetch operation in addition to the write operations. Note that `importUniqueObjects()` already optimizes this for you by doing only one fetch operation to map existing unique IDs before doing the writes, but a large dataset is a large dataset. I suspect the performance bottleneck you're seeing is the high volume of memory use and disk I/O being done in one single transaction.
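To illustrate the fetch-once uniquing pattern described above, here is a simplified, CoreStore-free sketch; the `Source` and `Managed` types are hypothetical stand-ins for the import source and the managed object (the real implementation works against the Core Data store):

```swift
struct Source { let id: String; let title: String }

final class Managed {
    var id: String
    var title: String
    init(id: String, title: String) { self.id = id; self.title = title }
}

func importUnique(sources: [Source], existing: [Managed]) -> [Managed] {
    // One "fetch": map every existing unique ID to its object up front,
    // instead of querying the store once per imported source.
    var byID = Dictionary(uniqueKeysWithValues: existing.map { ($0.id, $0) })
    for source in sources {
        if let object = byID[source.id] {
            object.title = source.title                     // update in place
        } else {
            byID[source.id] = Managed(id: source.id, title: source.title) // insert
        }
    }
    return Array(byID.values)
}
```

The single dictionary lookup per source is what keeps uniquing from turning into one fetch per object.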

My suggestion would be to split the decoded movies array into batches. Something like this:

```swift
extension Array {
    func chunked(into size: Int) -> [[Element]] {
        return stride(from: 0, to: count, by: size).map {
            Array(self[$0 ..< Swift.min($0 + size, count)])
        }
    }
}
```

```swift
...

            let movies = try decoder.decode([MovieJSON].self, from: data)
            for chunk in movies.chunked(into: 100) { // adjust depending on how large each object import is
                self.dataStack.perform(
                    asynchronous: { (transaction) -> Void in
                        try! transaction.importUniqueObjects(
                            Into<Movie>(),
                            sourceArray: chunk
                        )
                    },
                    completion: { (result) -> Void in
                        switch result {
                        case .success:
                            print("success")
                            // Merge your completion blocks somehow with something like a DispatchGroup
                        case .failure(let error):
                            print("error: \(error)")
                        }
                    }
                )
            }
...
```
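The DispatchGroup merging hinted at in the comment could look like the sketch below; `importInChunks` and `performImport` are hypothetical names, and `performImport` stands in for the `dataStack.perform` call (any async per-chunk operation that reports success would do):

```swift
import Foundation

func importInChunks<T>(
    _ chunks: [[T]],
    notifyOn queue: DispatchQueue = .main,
    performImport: @escaping ([T], @escaping (Bool) -> Void) -> Void,
    completion: @escaping (Bool) -> Void
) {
    let group = DispatchGroup()
    let lock = NSLock()
    var allSucceeded = true
    for chunk in chunks {
        group.enter()
        performImport(chunk) { success in
            // The per-chunk completions may run on different queues,
            // so guard the shared flag with a lock.
            lock.lock()
            allSucceeded = allSucceeded && success
            lock.unlock()
            group.leave()
        }
    }
    // Fires exactly once, after every chunk has reported back.
    group.notify(queue: queue) {
        completion(allSucceeded)
    }
}
```

The caller then gets one merged success/failure callback for the whole import instead of one per chunk.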

@loicleser commented on GitHub (Apr 12, 2024):

Thank you for your response. I will run a test, but since I have other data to load as well, I don't think this solution will be sufficient; I need it to stay very fast. I will try to load the existing data in bulk and filter the new data before saving it, to avoid using ImportableUniqueObject.

Thanks


@JohnEstropia commented on GitHub (Apr 12, 2024):

> I will try to load the existing data in bulk and filter the new data before saving it

If you check the implementation of `importUniqueObjects(...)`, I think that's already what it does for you.

Let me know if you find an approach that works for your case so I can consider adopting it internally.


@loicleser commented on GitHub (Apr 12, 2024):

Hello @JohnEstropia ,

I've started running some tests, but I've encountered an issue with the following code:

```swift
class Movie: NSManagedObject, ImportableUniqueObject {
    static func uniqueID(from source: MovieJSON, in transaction: CoreStore.BaseDataTransaction) throws -> String? {
        return source.name
    }

    func shouldUpdate(from source: ImportSource, in transaction: BaseDataTransaction) -> Bool {
        return false
    }

    func shouldInsert(from source: ImportSource, in transaction: BaseDataTransaction) -> Bool {
        return false
    }

    func update(from source: MovieJSON, in transaction: CoreStore.BaseDataTransaction) throws {

    }

    func didInsert(from source: ImportSource, in transaction: BaseDataTransaction) throws {
        //self.name = source.name
    }

    typealias UniqueIDType = String

    static var uniqueIDKeyPath: String {
        return #keyPath(Movie.name)
    }

    typealias ImportSource = MovieJSON
}
```

Even though I've set shouldUpdate and shouldInsert to return false, sources are still being inserted and didInsert is still called. Is this behavior normal? I'd like precise control over when objects are added or updated.

Regarding my problem, I've come up with an initial, faster solution:


```swift
let decoder = JSONDecoder()
let movies = try decoder.decode([MovieJSON].self, from: data)
DispatchQueue.main.async {
    do {
        let existingMovies = try self.dataStack.fetchAll(From<Movie>())
        let existingMovieNames = Set(existingMovies.map { $0.name })
        let filteredMovies = movies.filter { !existingMovieNames.contains($0.name) }

        self.dataStack.perform(
            asynchronous: { (transaction) -> Void in
                try! transaction.importUniqueObjects(
                    Into<Movie>(),
                    sourceArray: filteredMovies
                )
            },
            completion: { (result) -> Void in
                switch result {
                case .success:
                    print("success")
                case .failure(let error):
                    print("error: \(error)")
                }
            }
        )
    } catch {
        print("error: \(error)")
    }
}
```
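The bulk-filter idea above boils down to a single `Set` membership test. A minimal CoreStore-free sketch (`MovieJSON` and `newMovies` here are hypothetical stand-ins used only for illustration):

```swift
struct MovieJSON { let name: String }

// Keep only the sources whose unique ID (the name) is not already
// present, using one Set built from a single bulk fetch of names.
func newMovies(from sources: [MovieJSON], existingNames: [String]) -> [MovieJSON] {
    let existing = Set(existingNames)
    return sources.filter { !existing.contains($0.name) }
}
```

Building the `Set` once makes each membership check O(1), so the whole filter stays linear in the number of sources.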

However, in some cases, I will need to update the names of some movies, so this solution doesn't completely resolve my problem.

Thank you for your assistance.


Reference: starred/CoreStore#430