Sep 12, 2022

Gauges and Kubernetes Namespaces

I didn’t think I’d be writing this

So you’ve got an app and you want to monitor how many streaming connections you have. You’ve got Datadog as a metrics collector, so it feels like you should be a line or two of code away from a solution.

But here we are.

A Basic Gauge

Micrometer has various types of metrics: Counters, Timers, etc. But if you want a basic “what is the level of X over time?” measurement, a Gauge is your answer.

Here’s a basic example of using a Gauge. This is a Micronaut example, but it’s pretty generalizable.

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Tags;
import io.micronaut.scheduling.annotation.Scheduled;
import jakarta.inject.Inject;
import jakarta.inject.Singleton;
import java.util.concurrent.atomic.AtomicInteger;

@Singleton
public class ConfigStreamMetrics {

  // Keep a strong reference to the gauge's backing value;
  // Micrometer only holds a weak reference to it.
  private final AtomicInteger projectConnections;

  @Inject
  public ConfigStreamMetrics(MeterRegistry meterRegistry) {
    // Register the gauge once; it reports whatever the AtomicInteger holds.
    projectConnections =
      meterRegistry.gauge(
        "config.broadcast.project-connections",
        Tags.empty(),
        new AtomicInteger()
      );
  }

  @Scheduled(fixedDelay = "1m")
  public void recordConnections() {
    // calculateConnections() is the app-specific count of open streams.
    projectConnections.set(calculateConnections());
  }
}

Ok, with that code in place, and feeling pretty sure that calculateConnections() was returning a consistent value, you can imagine how I felt looking at:

Why is my gauge not working?

What is happening here? The gauge is all over the place. It made sense to me that taking the average was going to be wrong: if I have two servers, I don’t want the average of the gauge on each of them, I want the sum. But that doesn’t explain what’s happening here.

The Key

The key is remembering how statsd with tagging works and discovering some surprising behavior in a default Datadog setup.

Metrics from Micrometer come out looking like config.broadcast.project-connections.connections:0|g|#statistic:value,type:grpc. As an aside, while you’re trying to get this all working, I’d highly recommend setting up a quick local clone of git@github.com:etsy/statsd.git that just outputs to stdout.
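
If it’s helpful, a minimal config for that local statsd clone might look something like this (the file name config.js and the exact values are just illustrative); the bundled console backend prints every flush to stdout:

// config.js, run with: node stats.js config.js
{
  port: 8125,                        // default statsd UDP port
  backends: ["./backends/console"],  // console backend dumps each flush to stdout
  flushInterval: 1000                // flush every second so you see values quickly
}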

The “aha” is that all of these metrics get aggregated based on just that string. So if you have

Server 1: config.broadcast.project-connections.connections:99|g|#statistic:value,type:grpc

Server 2: config.broadcast.project-connections.connections:0|g|#statistic:value,type:grpc

A gauge is expecting a single value at any given point in time, so what we end up with here is a heisengauge that could be either 0 or 99. Our sum doesn’t work because we don’t have two data points to sum across; we just have one value that is flapping back and forth.
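
To make that concrete, here’s a rough, purely illustrative sketch of what keying on that single string amounts to: both servers write into the same slot, and whichever write lands last before the flush is the value that gets reported.

import java.util.HashMap;
import java.util.Map;

class GaugeClobberSketch {
  public static void main(String[] args) {
    // Gauges are effectively keyed on the full metric string (name + tags),
    // so two hosts reporting the same string share one slot.
    Map<String, Integer> gauges = new HashMap<>();
    String key = "config.broadcast.project-connections.connections|#statistic:value,type:grpc";

    gauges.put(key, 99); // server 1 reports 99
    gauges.put(key, 0);  // server 2 reports 0 and clobbers it

    System.out.println(gauges.get(key)); // prints 0; the 99 is simply gone
  }
}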

The gotcha

Now we know what’s up, but it’s a sad state of affairs. This is definitely not what we want; the expected behavior is that each host reports its own value.

It turns out that the Micronaut Datadog registry (https://micronaut-projects.github.io/micronaut-micrometer/latest/guide/#metricsAndReportersDatadog) hits Datadog directly, not my local Datadog agent.

Since it goes straight there and we aren’t explicitly sending a host tag, these metrics are clobbering each other.
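
For reference, the setup that produces this looks roughly like the following (a sketch based on the micronaut-micrometer Datadog registry properties; the DATADOG_APIKEY placeholder is just an example): an API key, and nothing that identifies the host.

# application.yml (sketch): metrics go straight to the Datadog API
micronaut:
  metrics:
    enabled: true
    export:
      datadog:
        enabled: true
        apiKey: ${DATADOG_APIKEY}  # placeholder; ships metrics directly to Datadog
        step: PT1M                 # note: no host tag is attached by default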

Two solutions

1) Point your metrics at your local Datadog agent and get the host tags that way.
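
I didn’t end up going this route, but the idea is to swap the Datadog registry for the StatsD one, aimed at the agent’s DogStatsD port. A sketch, assuming the micronaut-micrometer StatsD registry and a DD_AGENT_HOST environment variable that points at the node’s agent:

# application.yml (sketch): report through the local Datadog agent instead
micronaut:
  metrics:
    enabled: true
    export:
      statsd:
        enabled: true
        flavor: datadog          # speak DogStatsD so tags come through
        host: ${DD_AGENT_HOST}   # assumption: the agent's address, injected per node
        port: 8125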

2) Set CommonTags Yourself

The other solution is to calculate the same hostname that the Datadog agent uses and manually add it as a common tag on our MeterRegistry.

import io.micrometer.core.instrument.Tag;
import io.micrometer.datadog.DatadogMeterRegistry;
import io.micronaut.configuration.metrics.aggregator.MeterRegistryConfigurer;
import io.micronaut.configuration.metrics.annotation.RequiresMetrics;
import io.micronaut.context.annotation.Property;
import io.micronaut.core.annotation.Order;
import io.micronaut.core.order.Ordered;
import jakarta.inject.Singleton;
import java.util.ArrayList;
import java.util.List;

@Order(Integer.MAX_VALUE)
@Singleton
@RequiresMetrics
public class MetricFactory
  implements MeterRegistryConfigurer<DatadogMeterRegistry>, Ordered {

  @Property(name = "gcp.project-id")
  protected String projectId;

  @Override
  public void configure(DatadogMeterRegistry meterRegistry) {
    List<Tag> tags = new ArrayList<>();
    addIfNotNull(tags, "env", "MICRONAUT_ENVIRONMENTS");
    addIfNotNull(tags, "service", "DD_SERVICE");
    addIfNotNull(tags, "version", "DD_VERSION");

    // Rebuild the hostname the Datadog agent would report: <node name>.<GCP project id>
    if (System.getenv("SPEC_NODENAME") != null) {
      final String hostName =
        "%s.%s".formatted(System.getenv("SPEC_NODENAME"), projectId);
      tags.add(Tag.of("host", hostName));
    }

    meterRegistry.config().commonTags(tags);
  }

  private void addIfNotNull(List<Tag> tags, String tagName, String envVar) {
    if (System.getenv(envVar) != null) {
      tags.add(Tag.of(tagName, System.getenv(envVar)));
    }
  }

  @Override
  public Class<DatadogMeterRegistry> getType() {
    return DatadogMeterRegistry.class;
  }
}

Passing the node name in required some Kubernetes YAML work, using the Downward API to expose spec.nodeName as an environment variable.

    spec:
      containers:
      - image: gcr.io/-----
        name: -----------        
        env:        
        - name: SPEC_NODENAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName

Wrap