Non-unique interface-connection IDs from API #1990

Closed
opened 2025-12-29 17:21:14 +01:00 by adam · 13 comments
Owner

Originally created by @pm17788 on GitHub (Sep 7, 2018).

Environment

  • Python version: 3.5.2
  • NetBox version: 2.4.4 (incremental upgrades from 2.2.10)

Steps to Reproduce

Gather all the interface connections into a set of files:

curl --silent \
	-X GET --header 'Accept: application/json' \
	--header "Authorization: Token $(cat netbox_api_key)" \
        'https://netbox.DOMAIN:443/api/dcim/interface-connections/?limit=1000' > interface-connections-0-1000.json
curl --silent \
	-X GET --header 'Accept: application/json' \
	--header "Authorization: Token $(cat netbox_api_key)" \
        'https://netbox.DOMAIN/api/dcim/interface-connections/?limit=1000&offset=1000' > interface-connections-1000-2000.json
curl --silent \
	-X GET --header 'Accept: application/json' \
	--header "Authorization: Token $(cat netbox_api_key)" \
         'https://netbox.DOMAIN/api/dcim/interface-connections/?limit=1000&offset=2000'> interface-connections-2000-3000.json
curl --silent \
	-X GET --header 'Accept: application/json' \
	--header "Authorization: Token $(cat netbox_api_key)" \
         'https://netbox.DOMAIN/api/dcim/interface-connections/?limit=1000&offset=3000'> interface-connections-3000-4000.json
curl --silent \
	-X GET --header 'Accept: application/json' \
	--header "Authorization: Token $(cat netbox_api_key)" \
        'https://netbox.DOMAIN/api/dcim/interface-connections/?limit=1000&offset=4000' > interface-connections-4000-5000.json
curl --silent \
	-X GET --header 'Accept: application/json' \
	--header "Authorization: Token $(cat netbox_api_key)" \
         'https://netbox.DOMAIN/api/dcim/interface-connections/?limit=1000&offset=5000'> interface-connections-5000-6000.json
curl --silent \
	-X GET --header 'Accept: application/json' \
	--header "Authorization: Token $(cat netbox_api_key)" \
        'https://netbox.DOMAIN/api/dcim/interface-connections/?limit=1000&offset=6000' > interface-connections-6000-7000.json
curl --silent \
	-X GET --header 'Accept: application/json' \
	--header "Authorization: Token $(cat netbox_api_key)" \
         'https://netbox.DOMAIN/api/dcim/interface-connections/?limit=1000&offset=7000'> interface-connections-7000-8000.json
curl --silent \
	-X GET --header 'Accept: application/json' \
	--header "Authorization: Token $(cat netbox_api_key)" \
         'https://netbox.DOMAIN/api/dcim/interface-connections/?limit=1000&offset=8000'> interface-connections-8000-9000.json
curl --silent \
	-X GET --header 'Accept: application/json' \
	--header "Authorization: Token $(cat netbox_api_key)" \
         'https://netbox.DOMAIN/api/dcim/interface-connections/?limit=1000&offset=9000'> interface-connections-9000-10000.json
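The ten requests above differ only in their `offset`, so the same collection can be sketched as a loop. This is a minimal Python sketch and not what was actually run for this report: `collect_ids` and `fetch_page` are hypothetical names, and `fetch_page(limit, offset)` is assumed to return the decoded JSON body of one request, containing the `results` array that the jq step reads.

```python
def collect_ids(fetch_page, limit=1000):
    """Walk limit/offset pages until a short page, returning every id seen.

    fetch_page is a hypothetical callable: fetch_page(limit, offset) -> dict
    with a "results" list, standing in for one curl request.
    """
    ids = []
    offset = 0
    while True:
        results = fetch_page(limit, offset)["results"]
        ids.extend(item["id"] for item in results)
        if len(results) < limit:
            # A short (or empty) page means there is nothing left to fetch.
            break
        offset += limit
    return ids


if __name__ == "__main__":
    # Stand-in data source so the sketch runs without a NetBox instance.
    rows = [{"id": n} for n in range(1, 2501)]

    def fake_fetch(limit, offset):
        return {"results": rows[offset:offset + limit]}

    ids = collect_ids(fake_fetch)
    print(len(ids), len(set(ids)))
```

Stopping on a short page avoids hard-coding the number of requests, which matters once the total grows past 10000.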

Iterate through the JSON outputs from above, and isolate just the id field from each interface-connection member of the results array:

[pm17788@gunboat-diplomat curls]$ for file in interface-connections*json; do echo "Processing: $file"; jq ".results[].id" $file >> interface-connections_ids.out; done
Processing: interface-connections-0-1000.json
Processing: interface-connections-1000-2000.json
Processing: interface-connections-2000-3000.json
Processing: interface-connections-3000-4000.json
Processing: interface-connections-4000-5000.json
Processing: interface-connections-5000-6000.json
Processing: interface-connections-6000-7000.json
Processing: interface-connections-7000-8000.json
Processing: interface-connections-8000-9000.json
Processing: interface-connections-9000-10000.json
[pm17788@gunboat-diplomat curls]$ wc -l interface-connections_ids.out
9827 interface-connections_ids.out
[pm17788@gunboat-diplomat curls]$ sort -u interface-connections_ids.out | wc -l
8252
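With the numbers above, 9827 - 8252 = 1575 of the collected lines are repeats of an ID already seen. To see which IDs repeat and how often, rather than only the two totals, something like the following could be used. It is a minimal sketch: `summarize_ids` is a hypothetical helper, and the IDs in the example are made up.

```python
from collections import Counter


def summarize_ids(ids):
    """Return (total, unique, repeated) for a list of collected ids."""
    counts = Counter(ids)
    repeated = {i: n for i, n in counts.items() if n > 1}
    return len(ids), len(counts), repeated


if __name__ == "__main__":
    # Made-up ids; real input would be the lines of interface-connections_ids.out.
    total, unique, repeated = summarize_ids([101, 102, 102, 103, 103, 103])
    print(total, unique, repeated)
```

The first two values correspond to the `wc -l` and `sort -u | wc -l` counts; the third maps each repeated ID to the number of times it appeared.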

Expected Behavior

Expected that the raw gathered count and the sort -u output be the same.

Observed Behavior

Apparently duplicated interface IDs? Not quite sure.
Duplicated IDs aren't the only thing which seems odd. If we trust that 9827 is the accurate count of overall interface connections in my instance, the bulk API-based get seems to be missing data for ~1500 interface connections.

(Edit): Formatting, clarity

adam closed this issue 2025-12-29 17:21:14 +01:00

@pm17788 commented on GitHub (Sep 7, 2018):

This is a re-work of #2391 using just the API and jq/sort rather than a custom script. As @jeremystretch pointed out, there was a decent possibility I cocked something up in my script. I really hoped (and still do) that this is some sort of brain-fart on my side.


@pm17788 commented on GitHub (Sep 7, 2018):

Vexingly, I am unable to replicate this behaviour using nbshell:


root@netbox-1:/opt/netbox# python3 netbox/manage.py nbshell
### NetBox interactive shell (dlt-netbox-1)
### Python 3.5.2 | Django 2.0.8 | NetBox 2.4.4
### lsmodels() will show available models. Use help(<model>) for more info.
>>> connections = InterfaceConnection.objects.filter()
>>> len(connections)
9827
>>> foo = {}
>>> for connection in connections:
...     if connection.id in list(foo.keys()):
...             print("ERROR! Already seen connection ID: {}".format(connection.id))
...     else:
...             foo[connection.id] = connection
... 
>>> len(foo)
9827
>>> 


@pm17788 commented on GitHub (Sep 10, 2018):

D'Oh. Wrong button. Sorry about that.


@jeremystretch commented on GitHub (Sep 13, 2018):

I'm not able to replicate this behavior. Note that InterfaceConnection IDs are enforced as primary keys at the database level, so it should not be possible to have duplicates.

What does the API return for the total object count at the /api/dcim/interface-connections/ endpoint? And what does InterfaceConnection.objects.count() return in nbshell?


@pm17788 commented on GitHub (Sep 13, 2018):

Both nbshell and the API's count variable return, right now, "9863".

I, too, am confused by the fact that my API calls seem to return duplicates, since, as you point out, the InterfaceConnection ID is a PK.
If it wasn't for Occam's razor looking menacingly at me from a corner of my desk, I'd start to suspect DB corruption. <GRIN>

I even wrote a quick thing in nbshell which wrote out each connection's .id and .to_csv() to a file, and looked at what was showing up as "duplicate" to see if there was, perhaps, a pattern to what is shown as "dupes". Nope. All sorts of things seem to be mix-n-matched - things like my Core <=> AggSwitch connections or TOR <=> ServerNIC connections. Nothing jumped out, sadly.

I've just re-run the test on my test instance, which is based on a pg_dump-based restore, and I am able to replicate the disparity, which means I have an instance where I can turn the debugging to 11 or thereabout to see what's going on. Which knobs can I tweak to make it tell us useful things?


@pm17788 commented on GitHub (Oct 1, 2018):

@jeremystretch: Is there anything else I can do to help with this?


@LBegnaud commented on GitHub (Oct 1, 2018):

Should be relatively easy to find the actual dupes, no? Admittedly I haven't spent too much time thinking about it, but wouldn't you be duplicating things by the way you collect the items? Getting the first thousand items, then doing an offset of a thousand?


@DanSheps commented on GitHub (Oct 2, 2018):

> I even wrote a quick thing in nbshell which wrote out each connection's .id and .to_csv() to a file, and looked at what was showing up as "duplicate" to see if there was, perhaps, a pattern to what is shown as "dupes". Nope. All sorts of things seem to be mix-n-matched - things like my Core <=> AggSwitch connections or TOR <=> ServerNIC connections. Nothing jumped out, sadly.

This is a bit hacky, but have you thought of dumping all rows to a CSV (maintaining all data) then looking at the rows and highlighting duplicates to see if you actually have duplicates?


@pm17788 commented on GitHub (Oct 2, 2018):

@DanSheps: Sorry, not sure I fully understand what you're suggesting. Using nbshell to make queries, there were no duplicates at all. The duplicates only showed up using the API to enumerate all connections.


@LBegnaud: You have a good point, and if I had 10 dupes, it'd make sense (because in my example, I made 10 "paged" calls). But since I saw ~1500 duplicates, I do not believe that offset/pagination overlap accounts for the dupes.
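To illustrate the reasoning about overlap: with a fixed row order, consecutive limit/offset windows do not overlap at all, so pagination by itself should produce no duplicates. In general, both repeats and omissions become possible if the order of rows differs from one request to the next. Whether that happens in this instance is not established anywhere in this thread; the sketch below is a toy model on made-up data with hypothetical names, not a diagnosis.

```python
import random


def paged_ids(order_for_request, total, limit):
    """Collect ids page by page; order_for_request(k) gives the row order
    seen by the k-th request (a toy stand-in for the server's ordering)."""
    ids = []
    for k, offset in enumerate(range(0, total, limit)):
        rows = order_for_request(k)
        ids.extend(rows[offset:offset + limit])
    return ids


if __name__ == "__main__":
    base = list(range(100))

    # Same order on every request: every id appears exactly once.
    stable = paged_ids(lambda k: base, total=100, limit=10)
    print(len(stable), len(set(stable)))

    # A different order on each request: repeats and gaps become possible.
    rng = random.Random(0)
    unstable = paged_ids(lambda k: rng.sample(base, len(base)), total=100, limit=10)
    print(len(unstable), len(set(unstable)))
```

In the second case the number of collected IDs still equals the total, while the number of distinct IDs can be lower, which is the same shape as the `wc -l` versus `sort -u` disparity reported above.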


@DanSheps commented on GitHub (Oct 2, 2018):

What about doing what I suggested with the API then?


@pm17788 commented on GitHub (Oct 3, 2018):

@DanSheps: Something roughly like this:

nbshell

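# Run inside nbshell, where InterfaceConnection is already available.
# The with-block closes (and flushes) the file once the dump is complete.
with open('/tmp/interface-connections_from_nbshell.out', 'w') as fp:
    connections = InterfaceConnection.objects.filter()
    for connection in connections:
        dump_line = "{} {}\n".format(connection.id, connection.to_csv())
        fp.write(dump_line)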

Using nbshell-generated "master" to cross-check what I got out of the API:

for interface_connection in $(awk '{print $1}' interface-connections_from_nbshell.out); do if grep -q "^$interface_connection$" interface-connections_ids.out; then echo "In API: $interface_connection!"; else echo "Not in API: $(grep -h $interface_connection interface-connections_from_nbshell.out)"; fi; done | grep Not | wc -l
1583

Sample output:

for interface_connection in $(awk '{print $1}' interface-connections_from_nbshell.out); do if grep -q "^$interface_connection$" interface-connections_ids.out; then echo "In API: $interface_connection!"; else echo "Not in API: $(grep -h $interface_connection interface-connections_from_nbshell.out)"; fi; done | grep Not
Not in API: 331690 ['cr1-cg-r02-s02.slc01', 'Ethernet1/3', 'agg1-cg-r02-s02.slc01', '1/0/49', 'Connected']
Not in API: 331693 ['cr2-cg-r02-s02.slc01', 'Ethernet1/2', 'agg1-cg-r01-s02.slc01', '2/0/50', 'Connected']
Not in API: 331696 ['cr2-cg-r02-s02.slc01', 'Ethernet1/1', 'agg1-cg-r01-s02.slc01', '1/0/50', 'Connected']
Not in API: 331697 ['cr2-cg-r02-s02.slc01', 'Ethernet1/4', 'agg1-cg-r02-s02.slc01', '2/0/50', 'Connected']
Not in API: 331145 ['asw1-c15-rb-s05.use01', 'Ethernet1/11', 'server1252', 'eth0', 'Connected']
Not in API: 331146 ['asw1-c15-rb-s05.use01', 'Ethernet1/12', 'server1252', 'eth1', 'Connected']
Not in API: 331172 ['asw1-c31-ra-s05.use01', 'Ethernet1/16', 'server1346', 'eth1', 'Connected']
Not in API: 331189 ['asw1-c28-rb-s05.use01', 'Ethernet1/9', 'server1539', 'eth0', 'Connected']
Not in API: 331215 ['asw1-c31-ra-s05.use01', 'Ethernet1/18', 'server2075', 'eth0', 'Connected']
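The same cross-check can be expressed as a set lookup, which compares whole IDs only; the unanchored `grep -h $interface_connection` in the loop above can also match an ID that appears as a substring of another line. This is a minimal sketch under assumptions: `missing_from_api` is a hypothetical helper, and the sample rows are made up in the same "<id> <fields>" shape as the nbshell dump.

```python
def missing_from_api(master_lines, api_ids):
    """Return master lines whose leading id is absent from api_ids.

    master_lines: lines shaped like "<id> <csv fields>", as in the nbshell dump.
    api_ids: iterable of ids collected from the API (strings or ints).
    """
    seen = {str(i).strip() for i in api_ids}
    return [line for line in master_lines
            if line.strip() and line.split(None, 1)[0] not in seen]


if __name__ == "__main__":
    # Made-up sample rows in the same shape as the dump above.
    master = ["1 ['sw1', 'eth0']", "2 ['sw1', 'eth1']", "3 ['sw2', 'eth0']"]
    print(missing_from_api(master, ["1", "3", "3"]))
```

Applied to the real files, the length of the returned list would correspond to the count produced by the shell loop.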


@pm17788 commented on GitHub (Oct 3, 2018):

@DanSheps: Re-read your question with more attention (am on-call this week, so a bit hectic). I misread it initially. The dupes from the API side of things are not just in connection IDs, but in the actual data, too.


@jeremystretch commented on GitHub (Nov 6, 2018):

Not sure what else there is we can do with this. No one else has been able to replicate the reported issue. I'm going to close this out, but please feel free to continue the discussion on the mailing list if you're still trying to track this down. If we can achieve reproduction I'm happy to re-open this as a bug.

Reference: starred/netbox#1990