Netbox API returns duplicate resources during paging with offset #10806

Closed
opened 2025-12-29 21:36:06 +01:00 by adam · 5 comments

Originally created by @dankotrajkovic on GitHub (Feb 25, 2025).

Originally assigned to: @bctiemann on GitHub.

Deployment Type

Self-hosted

NetBox Version

v4.2.3

Python Version

3.12

Steps to Reproduce

Use Python or Postman to GET the clusters from the NetBox API by paging through the resources.
The real-world use case is to load a larger list of clusters using the paging mechanism.

To reproduce, run the following code:

import requests


def main():
    """
    Pull netbox clusters for demo.netbox.dev using limit=5
    The idea of small limit is just to simulate a bigger database where we have to do multiple
    requests of 50 items to load a larger list of for example 250 clusters.

    The idea is to show that within the returned items few duplicates appear.
    :return:
    """

    cluster_list = []  # List that accumulates the clusters returned by each request
    cluster_unique_names = set()  # Names of clusters already seen, used to spot duplicates


    # Collect the clusters
    headers = {
        'Accept': 'application/json',
        'Authorization': 'Token 6a768e6363830a536ffa07abf261c1d64d365b9a'
    }

    parameters = {
        'limit': 5,
        'offset': 0
    }
    while True:
        response = requests.get('https://demo.netbox.dev/api/virtualization/clusters/', headers=headers,
                                params=parameters)
        response.raise_for_status()  # Fail loudly on any non-200 response
        data = response.json()
        print(f'Collected clusters from Netbox with offset: {parameters["offset"]}, limit: {parameters["limit"]}')
        cluster_list.extend(data['results'])
        parameters['offset'] += parameters['limit']
        if not data['next']:
            break

    # Check if there are any duplicates in the clusters
    for cluster in cluster_list:
        if cluster['name'] not in cluster_unique_names:
            cluster_unique_names.add(cluster['name'])
        else:
            print(f'Duplicate Cluster. Name: {cluster["name"]}, ID: {cluster["id"]}')


if __name__ == '__main__':
    main()

To reproduce in Postman, issue a GET request with the following path:
https://demo.netbox.dev/api/virtualization/clusters/?limit=5&offset=5
and then issue it again with:
https://demo.netbox.dev/api/virtualization/clusters/?limit=5&offset=30

You will see that the cluster with ID 10 appears in both responses. The specific ID may differ between runs.

The example above uses demo.netbox.dev, but we see the same behavior on our on-prem, self-hosted instance. With 150 clusters we see about 30-35 duplicates.

Expected Behavior

No duplicates should be returned as we page through the clusters.

Observed Behavior

Clusters with duplicate IDs are present in the responses.

We see that a few IDs are duplicated in the responses:

Collected clusters from Netbox with offset: 0, limit: 5
Collected clusters from Netbox with offset: 5, limit: 5
Collected clusters from Netbox with offset: 10, limit: 5
Collected clusters from Netbox with offset: 15, limit: 5
Collected clusters from Netbox with offset: 20, limit: 5
Collected clusters from Netbox with offset: 25, limit: 5
Collected clusters from Netbox with offset: 30, limit: 5
Duplicate Cluster. Name: gc-us-west1, ID: 10
Duplicate Cluster. Name: gc-europe-west4, ID: 22
adam added the type: bug, status: accepted, severity: medium labels 2025-12-29 21:36:07 +01:00
adam closed this issue 2025-12-29 21:36:07 +01:00

@bctiemann commented on GitHub (Feb 27, 2025):

This seems fairly high severity as the API pagination ought to be predictable and orderly. Is this reproducible in any other models?

@dankotrajkovic commented on GitHub (Feb 27, 2025):

In our local environment, we can reproduce this with the IPAddress model. That is where we initially found it, but we have not been able to reproduce it on the public NetBox instance, so we held back from raising the issue.

We thought the duplicates were caused by our having 500,000 IPAddresses in NetBox spread across various VRFs. But even then the duplicates were not severe: fetching with limit=1000 (so 500 pages), we were getting only about 10 duplicates. That is still very problematic for our code, and we had to write methods to recover from it, but luckily the issue is severe enough on the model in this issue that it should lead to quicker discovery of the problem. It's possible we are doing something wrong, but either way, knowing how to fix this would really help us.
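
Pending a fix, a client-side recovery of the kind described above might look like the sketch below. It is only a sketch: `fetch_all` is a hypothetical helper, and the endpoint and token are the demo values reused from the reproduction script.

```
import requests

BASE_URL = 'https://demo.netbox.dev/api/virtualization/clusters/'
HEADERS = {
    'Accept': 'application/json',
    'Authorization': 'Token 6a768e6363830a536ffa07abf261c1d64d365b9a',
}


def fetch_all(limit=50):
    """Page through all clusters, silently dropping any repeated IDs."""
    seen_ids = set()
    results = []
    params = {'limit': limit, 'offset': 0}
    while True:
        response = requests.get(BASE_URL, headers=HEADERS, params=params)
        response.raise_for_status()
        payload = response.json()
        for item in payload['results']:
            if item['id'] not in seen_ids:  # keep only the first occurrence of each ID
                seen_ids.add(item['id'])
                results.append(item)
        if not payload['next']:
            break
        params['offset'] += limit
    return results
```

De-duplicating only masks the symptom: when an item appears twice across pages, another item has typically been skipped entirely, so forcing a deterministic ordering (as discussed later in this thread) is the more reliable workaround.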

@atownson commented on GitHub (Feb 27, 2025):

I have seen this issue as well, when performing GET requests for Services.

@cruse1977 commented on GitHub (Mar 4, 2025):

https://demo.netbox.dev/api/virtualization/clusters/?offset=5&limit=5
https://demo.netbox.dev/api/virtualization/clusters/?offset=5&limit=30

ID 10 shown in both

@bctiemann commented on GitHub (Mar 4, 2025):

It looks like the issue is just that Django isn't obeying the model's ordering setting when annotation is applied to the queryset, i.e. in the case of ClusterViewSet:

https://github.com/netbox-community/netbox/blob/913405a3ae93ec28b8970a2dbdd81c99508dd557/netbox/virtualization/api/views.py#L37-L41

Note that ordering = ["name"] for Cluster:

In [29]: queryset = Cluster.objects.all()

In [30]: [(r.id, r.name) for r in queryset[0:10]]
Out[30]: 
[(9, 'DO-AMS3'),
 (8, 'DO-BLR1'),
 (7, 'DO-FRA1'),
 (6, 'DO-LON1'),
 (1, 'DO-NYC1'),
 (2, 'DO-NYC3'),
 (3, 'DO-SFO3'),
 (5, 'DO-SGP1'),
 (4, 'DO-TOR1'),
 (36, 'gc-asia-east1')]
In [27]: queryset = Cluster.objects.prefetch_related('virtual_machines').annotate(
    ...:         allocated_vcpus=Sum('virtual_machines__vcpus'),
    ...:         allocated_memory=Sum('virtual_machines__memory'),
    ...:         allocated_disk=Sum('virtual_machines__disk'),
    ...:     )

In [28]: [(r.id, r.name) for r in queryset[0:10]]
Out[28]: 
[(4, 'DO-TOR1'),
 (34, 'gc-asia-southeast1'),
 (40, 'gc-asia-northeast3'),
 (10, 'gc-us-west1'),
 (9, 'DO-AMS3'),
 (7, 'DO-FRA1'),
 (35, 'gc-asia-southeast2'),
 (38, 'gc-asia-northeast1'),
 (15, 'gc-us-east1'),
 (6, 'DO-LON1')]

If .order_by("name") is added to the custom queryset, it sorts predictably. Same if you add &ordering=name to the query parameters on the API call.

https://code.djangoproject.com/ticket/32811

We may need to identify all the ViewSets that use annotation in this way and add explicit ordering to the queryset statement.
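
For illustration, a sketch of the explicit ordering being suggested, applied to the annotated queryset shown above (a sketch under the assumptions in this comment, not the actual patch):

```
from django.db.models import Sum

from virtualization.models import Cluster

# Sketch only: re-apply the model's Meta.ordering after annotation so that
# LIMIT/OFFSET pagination sees a stable, deterministic row order.
queryset = Cluster.objects.prefetch_related('virtual_machines').annotate(
    allocated_vcpus=Sum('virtual_machines__vcpus'),
    allocated_memory=Sum('virtual_machines__memory'),
    allocated_disk=Sum('virtual_machines__disk'),
).order_by(*Cluster._meta.ordering)
```

Reusing Cluster._meta.ordering rather than hard-coding "name" would keep the same pattern applicable to other annotated ViewSets; appending a final "pk" term would also guarantee a total order where the ordering field is not unique.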

Reference: starred/netbox#10806