decreasing counters in deployed environments #305

Answered by dmagliola
coldnebo asked this question in Q&A
Mar 5, 2024 · 1 comment · 3 replies

coldnebo
Mar 5, 2024

We're seeing decreasing counters on our deployed servers. We're running Kubernetes pods with Puma set to 2 workers, and it appears that Prometheus::Client.registry is returning two different registries (one per worker process). Our Rack exporter requests get routed to whichever worker Puma has available, so the reported count varies. Here's how that looks when monitoring locally under controlled conditions:

user@user2-l:../~$ curl -sL http://user2-l.dhcp.company.com:38080/app/metrics
# TYPE http_request_count counter
# HELP http_request_count The count of HTTP requests handled by the Rack application.
http_request_count 11.0
user@user2-l:../~$ curl -sL http://user2-l.dhcp.company.com:38080/app/metrics
# TYPE http_request_count counter
# HELP http_request_count The count of HTTP requests handled by the Rack application.
http_request_count 12.0
user@user2-l:../~$ curl -sL http://user2-l.dhcp.company.com:38080/app/metrics
# TYPE http_request_count counter
# HELP http_request_count The count of HTTP requests handled by the Rack application.
http_request_count 9.0

Notice that the counter drops back to 9. If we keep watching, we see it bounce up to 13, then down to 10. The total number of requests is always higher than either of these counters, which suggests there are two registry instances (each worker process has its own singleton).
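This hypothesis is easy to reproduce with plain Ruby and no gems. A minimal sketch (the Counter class below is a hypothetical stand-in for a registry counter, not the prometheus-client API): after fork, each child keeps its own copy of any in-process counter, so scrapes routed to different children return different totals.

```ruby
# A stand-in for an in-memory registry counter: one instance per process.
class Counter
  attr_reader :value

  def initialize
    @value = 0.0
  end

  def increment
    @value += 1
  end
end

readers = []
2.times do |worker_id|
  reader, writer = IO.pipe
  readers << reader
  fork do                                  # simulate a Puma worker
    reader.close
    counter = Counter.new                  # fresh "registry" in this child
    (worker_id + 1).times { counter.increment }
    writer.puts counter.value              # what a scrape of this worker sees
    writer.close
    exit!
  end
  writer.close
end

values = readers.map { |r| Float(r.read) }
Process.waitall
puts values.inspect                        # prints [1.0, 2.0]
```

Two scrapes of "the same" counter return different totals depending on which child answers, which is exactly the bouncing seen in the curl output above.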

I tried to move the registry creation out of the worker processes using Puma's before_fork hook, but that didn't change the behavior, which raises the question of whether the client is fork-safe.

I'm not sure about my diagnosis, so this could have another cause. Are there any recommended guides for setting up the Prometheus client correctly with multiple processes? (We haven't moved to multithreaded Rails yet, but if someone has hit this there, any ideas are welcome.)

AFAIK there is no way to scrape specific workers in Puma, but if there were, we could work around this by scraping each worker separately instead of just the pod.



dmagliola
Mar 5, 2024
Maintainer

Are you using the DirectFileStore to store your metrics?

Read more here. Make sure to read the caveats!
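For reference, enabling DirectFileStore looks roughly like this (a sketch based on the gem's README; the directory path is an arbitrary choice, must be writable by all workers, and the store must be configured before any metric is registered):

```ruby
# config.ru or an initializer, loaded before any metrics are registered.
require 'prometheus/client'
require 'prometheus/client/data_stores/direct_file_store'

# Each process writes its observations to files in this directory; the
# exporter aggregates across them at scrape time, so all Puma workers
# report one consistent counter value instead of per-process totals.
Prometheus::Client.config.data_store =
  Prometheus::Client::DataStores::DirectFileStore.new(dir: '/tmp/prometheus_metrics')
```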

3 replies

coldnebo Mar 5, 2024
Author

We avoided DirectFileStore because one of our apps exploded into thousands of files. Indeed, I've read this part of the docs before:

Large numbers of files: Because there is an individual file per metric and per process (which is done to optimize for observation performance), you may end up with a large number of files. We don't currently have a solution for this problem, but we're working on it.

So I assume this, combined with the other open issue about prefork servers, means the Prometheus client doesn't support multiple processes/multithreading in Ruby without DirectFileStore. Good to know; I'll take a deeper look at DirectFileStore. In pods there shouldn't be much of a difference, AFAIK, since the filesystem is a memory filesystem I think?


SuperQ Mar 5, 2024
Maintainer

The typical solution for puma is to ID the workers by their worker ID number, instead of the OS PID. This way the number of files is limited to the number of Puma workers. The files are re-used between worker process restarts.
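A sketch of that approach (assuming a Puma version that yields the worker index to on_worker_boot, and using the configurable pid_provider that DirectFileStore uses to key its per-process files; the label string is an arbitrary choice):

```ruby
# puma.rb
workers 2

on_worker_boot do |worker_index|
  require 'prometheus/client'
  # Key DirectFileStore files by worker number instead of OS PID, so the
  # file count stays bounded by `workers` and files are reused when a
  # worker process is restarted under a new PID.
  Prometheus::Client.config.pid_provider = -> { "puma_worker_#{worker_index}" }
end
```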


dmagliola Mar 5, 2024
Maintainer

Prometheus client doesn't support multiple-processes/multithreading in Ruby without DirectFileStore

Right. The only reason DirectFileStore exists is precisely to support multiprocess. Multithreading you can totally do with the default store, but not multiprocess.

one of our apps exploded into thousands of files

Yeah, this, together with the performance of exports, is the #1 thing we want to fix. We're struggling a bit to find time to dedicate to it, but it's at the top of our list.

Answer selected by coldnebo