Light Mode

Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

feat(provider): add External Metrics provider#1863

Open
jlore-decathlon wants to merge 6 commits intofluxcd:mainfrom
jlore-decathlon:feat/externalmetrics
Open

feat(provider): add External Metrics provider#1863
jlore-decathlon wants to merge 6 commits intofluxcd:mainfrom
jlore-decathlon:feat/externalmetrics

Conversation

Copy link

jlore-decathlon commented Nov 20, 2025 *
edited
Loading

Proposed addition

The current Datadog metric provider relies on their Metric API.
However, this API has pretty low rate limits, and people with a moderately sized infrastructure tend to reach these limits quite easily when scaling their usage of Flagger or datadog-based autoscaling (like KEDA).

Datadog offers a more scalable alternative by making its Cluster Agent batch requests by groups of 35 see Cluster Agent Autoscaling Metrics. It then makes these metrics available within the cluster by exposing an endpoint following Kubernetes External Metrics API.

Note

This endpoint is not documented by Datadog, as they expect people to have the agent register against the control plane as the cluster's external metrics provider and then making these metrics available through k8s API Server, removing the need to query the endpoint directly.
However, by implementing a kubernetes API, its behavior is predictable and stable enough to be used directly.

We've relied on the way KEDA implemented a similar feature during design and implementation. However, Flagger is not an autoscaling solution so we're not going to mimic the metric proxy Keda operates. We simply propose to query the external metric server directly. By doing this, we also chose to make the provider generic and compatible with any external metrics server. The downside is that we cannot abstract the way datadog names its metrics which isn't trivial.

fix: #1235

Any alternatives you've considered?

We've pondered modifying the Datadog metric provider instead of making an external metrics provider. But we felt that this had the benefit of making other external metric providers compatible and kept the code datadog-agnostic.

We could theoretically make it even more generic and use any kubernetes metric API (standard, Custom or External), but I think Flagger already offers this

Disclaimer

  • This PR was peer programmed with my colleague @mveroone.
  • We're not Go developers, yet we did our best to follow the project's coding conventions and guidelines. Any feedback is welcome and we'll be happy to rework any part of that contribution you think needs it.
  • AI disclosure : AIL 1 (see https://danielmiessler.com/blog/ai-influence-level-ail). Minor code autocomplete, but mostly manual coding and writing.
  • We've built the docker image and tested end-to-end on one of our GKE clusters against Datadog cluster agent endpoint. it seems to work as expected, but the lack of feedback from Flagger leaves room for some doubts, but we were unsure if adding some logging was a good idea.

stephenwilliams and mveroone reacted with thumbs up emoji
jlore-decathlon requested review from aryan9600 and stefanprodan as code owners November 20, 2025 15:48
jlore-decathlon marked this pull request as draft November 20, 2025 16:00
jlore-decathlon force-pushed the feat/externalmetrics branch 2 times, most recently from 85c595d to 139a34a Compare November 24, 2025 16:33
jlore-decathlon changed the title feat(externalmetrics): implement ExternalMetricsProvider for querying... feat(externalmetrics): implement ExternalMetricsProvider for querying external metrics Dec 1, 2025
jlore-decathlon marked this pull request as ready for review December 1, 2025 12:39
jlore-decathlon force-pushed the feat/externalmetrics branch 2 times, most recently from 3757b5a to 72ad54a Compare December 1, 2025 12:48
jlore-decathlon changed the title feat(externalmetrics): implement ExternalMetricsProvider for querying external metrics feat(provider): add External Metrics provider Dec 1, 2025
jlore-decathlon force-pushed the feat/externalmetrics branch from 72ad54a to 9925699 Compare December 1, 2025 13:10
mveroone reviewed Dec 1, 2025
mveroone reviewed Dec 1, 2025
Datadog provider is often meeting API rate limits on bigger
implementations. Datadog Cluster Agent can batch metric queries
and expose them through an endpoint compatible with Kubernetes External
Metrics API.

This implementations allows to use this endpoint and any other server
implementing Kubernetes External Metrics API. Including k8s API server
itself.

Co-authored-by: Johan Lore
Co-authored-by: Maxime Veroone
Signed-off-by: Johan Lore
jlore-decathlon force-pushed the feat/externalmetrics branch from 9925699 to 2ee47e0 Compare December 2, 2025 16:17
mveroone force-pushed the feat/externalmetrics branch from 31f6e7d to eeeccfc Compare December 2, 2025 16:25
Signed-off-by: Maxime Veroone
mveroone force-pushed the feat/externalmetrics branch from eeeccfc to 86cc361 Compare December 3, 2025 08:28
aryan9600 reviewed Jan 2, 2026
Copy link
Member

aryan9600 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for opening this PR!

mveroone reacted with heart emoji
mveroone and others added 2 commits January 8, 2026 23:59
Co-authored-by: Sanskar Jaiswal
Signed-off-by: Maxime Veroone
Quite an extensive refactor thanks to using native k8s packages. Lots more tests and more coverage.
mveroone force-pushed the feat/externalmetrics branch from eb8b59f to 2e0a69c Compare January 19, 2026 10:32
Copy link

mveroone commented Jan 20, 2026

Note : if that's okay, we'll squash commits after a few rounds of review so we can fix the DCO

aryan9600 reviewed Mar 2, 2026
Comment on lines +63 to +65
restConfig.TLSClientConfig = rest.TLSClientConfig{
Insecure: provider.InsecureSkipVerify,
}
Copy link
Member

aryan9600 Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason why we override the entire tls client config here with an insecure tls configuration?

Copy link

mveroone Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None, just lack of golang skills from my part. let me know if the fix is the right one.

if tokenBytes, ok := credentials["token"]; ok {
restConfig.BearerToken = string(tokenBytes)
}
// TODO: handle user name/password auth if needed
Copy link
Member

aryan9600 Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets address this TODO

Copy link

mveroone Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm choosing to just remove it.
ExternalMetrics Servers use the same principle as Kubernetes API Server, where Bearer Token auth is standard. No known need for user/password auth so far, so let's not handle needs we don't know we have.

// at the ExternalMetricsProvider's Address, using the provided query string,
// and returns the *first* result as a float64.
func (p *ExternalMetricsProvider) RunQuery(query string) (float64, error) {
// The Provider interface only allows a plain string query so decode it
Copy link
Member

aryan9600 Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// The Provider interface only allows a plain string query so decode it

mveroone reacted with thumbs up emoji
Copy link

mveroone Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in my commit

// where only the metricName is required.
// and returns the namespace, metricName, and labelSelector separately.
func parseExternalMetricsQuery(query string) (namespace string, metricName string, labelSelector labels.Selector, err error) {
u, err := url.Parse("dummy:///" + query)
Copy link
Member

aryan9600 Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a comment explaining why we add a dummy scheme here?

Copy link

mveroone Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


// ExternalMetricsProvider fetches metrics from an ExternalMetricsProvider.
type ExternalMetricsProvider struct {
timeout time.Duration
Copy link
Member

aryan9600 Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to set the rest.Config's Timeout field to this value - this timeout gets used internally for the http client used to send requests.

Copy link

mveroone Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I overlooked it.
However, I'm a little uncomfortable in hardcoding it, although since we're limited by the Provider Interface / API, which I tried not to extend in this PR, that seems like an acceptable trade-off. WDYT?
Also I'm not sure if I should test it ? if it's part of the restClient, it's already tested ?

Copy link
Member

aryan9600 Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to test it. i understand your qualms about hardcoding it but as you said we're limited by the Provider interface so we just need to swallow this poor design. if you're up for it, please feel free to raise another PR which adds a timeout field to the Provider APi and modify all other providers to use that over the hardcoded default.

mveroone and others added 2 commits March 2, 2026 15:20
Co-authored-by: Sanskar Jaiswal
Signed-off-by: Maxime Veroone
* Moved the timeout config so it's correctly handled by the REST Client
* Adjusted a few comments
* Fixed a wrongful override of TLSClient
aryan9600 reviewed Mar 3, 2026
Copy link
Member

aryan9600 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good - a few nits

Comment on lines +54 to +56
if err != nil || restConfig == nil {
return nil, fmt.Errorf("Not in a kubernetes cluster: %w", err)
}
Copy link
Member

aryan9600 Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if err != nil || restConfig == nil {
return nil, fmt.Errorf("Not in a kubernetes cluster: %w", err)
}
if err != nil {
return nil, fmt.Errorf("not in a kubernetes cluster: %w", err)
}
if restConfig == nil {
return nil, fmt.Errorf("not in a kubernetes cluster: rest config is nil")
}

InsecureSkipVerify bool
creds map[string][]byte
builderFunc func() (*rest.Config, error)
wantErr bool
Copy link
Member

aryan9600 Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
wantErr bool

no point of this field if we are not testing the error path

client: &fakeExternalMetricsClient,
}

tests := []struct {
Copy link
Member

aryan9600 Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets add one test for the error path

metadata:
name: external-metric-server-token
namespace: default
data:
Copy link
Member

aryan9600 Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
data:
stringData:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

aryan9600 aryan9600 left review comments

stefanprodan Awaiting requested review from stefanprodan stefanprodan is a code owner

+1 more reviewer

mveroone mveroone left review comments

Reviewers whose approvals may not affect merge requirements

At least 1 approving review is required to merge this pull request.

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

Using external metrics from the Kubernetes API server

3 participants