Add Prometheus metrics about cache #8401

adam · 2025-12-29T20:36:16+01:00

adam commented

2025-12-29 20:36:16 +01:00

Originally created by @tobikris on GitHub (Aug 2, 2023).

NetBox version

v3.5.7

Feature type

Change to existing functionality

Proposed functionality

Currently the Prometheus metrics do not include metrics about the cache. The documentation suggests otherwise: https://github.com/netbox-community/netbox/blob/0b10131564dc16138b9b7c7cd869d705771c229e/docs/integrations/prometheus-metrics.md?plain=1#L18

I could imagine that the change of the cache backend (https://github.com/netbox-community/netbox/pull/6716) removed this as a side effect.
As those metrics are pretty useful, I propose to add them back.

Use case

Without the metrics it is pretty hard to determine the effectiveness of the caching. I would like to see if the cache is working as expected or if the general performance is worse than it should be.

Database changes

No response

External dependencies

No response

adam added the type: feature label 2025-12-29 20:36:16 +01:00

adam closed this issue

2025-12-29 20:36:16 +01:00

adam commented

2025-12-29 20:36:16 +01:00

@kkthxbye-code commented on GitHub (Aug 2, 2023):

What caching though, there's basically none in use. Specifically what do you want to monitor?

adam commented

2025-12-29 20:36:16 +01:00

@jsenecal commented on GitHub (Aug 2, 2023):

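> What caching though, there's basically none in use. Specifically what do you want to monitor?

> Cache hit, miss, and invalidation counters

These guys: Diff (https://github.com/netbox-community/netbox/pull/6716/files#diff-15ee226a2efe0a66ca7c3e44694d85c323157f8a38d64e681032f926de642ab0L147-L164)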
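
As a rough illustration of what hit, miss, and invalidation counters track, here is a minimal sketch in plain Python. The `CountingCache` class and its method names are hypothetical and are not NetBox's implementation; a real exporter would presumably publish these numbers as Prometheus counters rather than plain integers.

```python
class CountingCache:
    """Dictionary-backed cache that counts hits, misses, and invalidations.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0
        self.invalidations = 0

    def get(self, key, default=None):
        # A lookup is a hit if the key is present, otherwise a miss.
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        return default

    def set(self, key, value):
        self._store[key] = value

    def invalidate(self, key):
        # Count only invalidations that actually removed an entry.
        if key in self._store:
            del self._store[key]
            self.invalidations += 1

    def hit_ratio(self):
        # Fraction of lookups served from the cache; 0.0 before any lookup.
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A hit ratio near zero on a cache that was expected to be busy is the kind of signal the original request was after.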

adam commented

2025-12-29 20:36:17 +01:00

@kkthxbye-code commented on GitHub (Aug 2, 2023):

@jsenecal - We don't really cache anything, which is why I asked.

adam commented

2025-12-29 20:36:17 +01:00

@jsenecal commented on GitHub (Aug 2, 2023):

@kkthxbye-code yeah, I get that, I was replying to the "what". Perhaps the question to @tobikris is more "Why" :)

adam commented

2025-12-29 20:36:17 +01:00

@kkthxbye-code commented on GitHub (Aug 2, 2023):

I just want to know what specific objects he needs the cache hit, miss and invalidation counters for, because we cache the release check, config revisions and the RSS widget and that's pretty much it. Nothing where metrics would actually be of use imo.

I'm assuming the why is because he thinks we still cache stuff, so that's not really interesting.

adam commented

2025-12-29 20:36:17 +01:00

@tobikris commented on GitHub (Aug 2, 2023):

Your assumption is correct - based on the docs I was expecting to see cache hits etc for modelled objects.
We are having some issues with long response times and wanted to dig into different aspects. This is why we tried to use the Prometheus metrics and realized the cache metrics mentioned in docs were missing.

Thanks for the explanation. I guess this feature request can be closed again as you are right - those metrics are not important if only minor things are cached anyway.

adam commented

2025-12-29 20:36:17 +01:00

@kkthxbye-code commented on GitHub (Aug 2, 2023):

@tobikris - Long response times are usually caused by a high pagination size. Page load time scales pretty linearly with pagination size, so if someone sets it to 1000, which is the default max, page load time can be multiple seconds.

Other things that slow down load times include custom link columns, custom fields of the object type and, for prefixes specifically, the utilization column.

adam commented

2025-12-29 20:36:18 +01:00

@tobikris commented on GitHub (Aug 3, 2023):

Thanks again. We are aware of the page size and its impact on response times. However, we are a little bit forced to load all elements in one request. This is because the current pagination implementation without cursors does not ensure correctness in case of parallel changes.

But even in our tests using pagination with different page sizes it took about 40 seconds in total to retrieve all 7000 IP addresses. The sweet spot was at about 250 items per request.
Does that sound about right/expected? Please note that we are currently not running the latest version (v3.2.8) for unrelated reasons.
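
For reference, the paginated retrieval described above can be sketched roughly as follows. This is a minimal sketch, assuming a limit/offset list endpoint that returns `count` and `results` in its JSON body, as NetBox's REST API does. `fetch_page` is a hypothetical callable standing in for the actual HTTP request, and `iter_all` and `requests_needed` are illustrative names.

```python
import math
from typing import Callable, Iterator


def iter_all(fetch_page: Callable[[int, int], dict], limit: int = 250) -> Iterator[dict]:
    """Yield every object from a limit/offset paginated list endpoint.

    fetch_page(limit, offset) is a hypothetical callable that performs the
    request, e.g. GET /api/ipam/ip-addresses/?limit=<limit>&offset=<offset>,
    and returns the decoded JSON body with "count" and "results" keys.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        results = page["results"]
        yield from results
        offset += len(results)
        # Stop on an empty page or once the reported total has been reached.
        if not results or offset >= page["count"]:
            break


def requests_needed(total: int, limit: int) -> int:
    """Number of requests needed to page through `total` objects."""
    return math.ceil(total / limit)
```

With 7000 objects and 250 items per request, `requests_needed(7000, 250)` gives 28 requests, so a 40 second total works out to roughly 1.4 seconds per request. As noted above, offset-based paging like this does not guarantee a consistent result if objects are created or deleted while the loop is running.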

adam commented

2025-12-29 20:36:18 +01:00

@jsenecal commented on GitHub (Aug 3, 2023):

It obviously depends on how fast your CPU cores are, but yeah, 7000 IPs is a lot to retrieve. Maybe there is a better way, but a discussion (https://github.com/netbox-community/netbox/discussions) would be better suited for this.

Reference: starred/netbox#8401