After being quiet for a decade, I would love to start sharing some experiences again!
My new posts will focus on Kong API Gateway Enterprise Edition. We started years ago with Oracle API Gateway, but two years ago we switched and migrated all our APIs to Kong 2.3.
This year I've had many interesting journeys with Kong: I successfully
migrated the datastore from Cassandra to PostgreSQL, upgraded Kong to 3.4,
performed some POCs with Grafana and Dynatrace, and updated and improved our
CI/CD pipelines.
This post is about monitoring: how to monitor Kong API Gateway with Grafana Cloud.
Grafana is the dashboard that visualizes all metrics (Mimir), logs (Loki) and traces (Tempo).
Metrics:
Prometheus: Kong offers the Prometheus plugin, which exposes metrics on the
Kong /metrics endpoint so they can be scraped by an agent such as the Grafana Agent.
- Advantages: works out of the box; an official Kong dashboard is available in Grafana.
- Disadvantages: an agent is needed; the Prometheus plugin may create
high-cardinality metrics, which may cause performance issues.
- Links:
https://docs.konghq.com/hub/kong-inc/prometheus/
https://docs.konghq.com/gateway/latest/production/monitoring/prometheus/
https://grafana.com/grafana/dashboards/7424-kong-official/
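To get a feeling for the cardinality concern mentioned above, it can help to count how many series each metric name produces in a scrape. The following is a minimal sketch that works on the plain-text output of a /metrics endpoint. The sample text, metric names and label values are made up for illustration and are not taken from a real Kong instance.

```python
from collections import Counter

# Hypothetical sample of the text a /metrics endpoint might return.
# Metric names and labels are illustrative assumptions only.
SAMPLE = """\
# HELP kong_http_requests_total HTTP requests per service/route/code
# TYPE kong_http_requests_total counter
kong_http_requests_total{service="orders",route="orders-v1",code="200"} 120
kong_http_requests_total{service="orders",route="orders-v1",code="500"} 3
kong_http_requests_total{service="billing",route="billing-v1",code="200"} 42
kong_nginx_connections_total{state="active"} 7
"""


def series_per_metric(exposition_text: str) -> Counter:
    """Count the number of series (distinct label sets) per metric name."""
    counts = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comments.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or first space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        counts[name] += 1
    return counts


if __name__ == "__main__":
    for name, count in series_per_metric(SAMPLE).most_common():
        print(f"{count:5d}  {name}")
```

Metrics at the top of this list are the ones whose labels (service, route, status code, consumer) multiply into many series, so they are the first candidates to look at when the scrape gets heavy.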
StatsD: StatsD with Kong 2.8 didn't work smoothly and required a lot of field
mappings. In 2.8 there were both a StatsD and a StatsD Advanced plugin; StatsD
should work better from 3.x onwards, see
https://konghq.com/blog/engineering/how-to-use-prometheus-to-monitor-kong-gateway
Logs:
Http-log: with the http-log plugin, Kong offers the possibility to send the
log of each request to Grafana.
- Advantages: all request meta-information is available in Grafana, from
latencies to upstream IP. Minimal performance impact, as the HTTP Log plugin
uses internal queues to decouple the production of log entries from their
transmission to the upstream log server.
- Disadvantages: no out-of-the-box dashboards available.
- Links:
https://docs.konghq.com/hub/kong-inc/http-log/
Note that this plugin works better when a custom field is added by Lua
(the plugin's custom_fields_by_lua setting): a field named Streams with the
following value:
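local cjson = require "cjson"
-- timestamp as a string of nanoseconds since the epoch
local ts = string.format('%18.0f', os.time() * 1000000000)
-- serialize the request/response log and encode it as the log line
local log_payload = kong.log.serialize()
local json_payload = cjson.encode(log_payload)
local service = log_payload['service']
-- one stream, labelled with a fixed marker and the Kong service name
local t = {
  {
    stream = { kong_http_log = 'log-payload', service = service['name'] },
    values = { { ts, json_payload } }
  }
}
return t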
When using cjson, the following should be added to kong.conf:
untrusted_lua_sandbox_requires=cjson
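To make the shape of that custom field easier to see, here is a minimal Python sketch that builds the same structure as the Lua snippet: one stream with two labels and a single [timestamp, log line] pair. The function name build_streams, the now_ns parameter and the "unknown" fallback for a payload without a service are my own assumptions for illustration; the Lua version above has no such fallback.

```python
import json
import time


def build_streams(log_payload: dict, now_ns=None) -> list:
    """Mirror the Lua custom field: a list with one labelled stream."""
    # Timestamp as a string of nanoseconds since the epoch.
    ts = str(time.time_ns() if now_ns is None else now_ns)
    # Assumption: fall back to "unknown" when the payload has no service.
    service = log_payload.get("service") or {}
    return [
        {
            "stream": {
                "kong_http_log": "log-payload",
                "service": service.get("name", "unknown"),
            },
            # The whole serialized request log becomes the log line.
            "values": [[ts, json.dumps(log_payload)]],
        }
    ]


if __name__ == "__main__":
    # Hypothetical, heavily trimmed log payload.
    example = {"service": {"name": "demo-service"}, "response": {"status": 200}}
    print(json.dumps(build_streams(example), indent=2))
```

The two stream labels are what later makes the selectors {kong_http_log="log-payload"} and {service="..."} work, while everything else stays inside the JSON log line.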
Now we can explore the logs more easily in Grafana, for example:
- all requests as they come in, simply select log-payload: {kong_http_log="log-payload"} |= ``
- parsed as JSON: {kong_http_log="log-payload"} | json
- filtered on service, as Kong sends the service name in the custom Lua field Streams: {service="ASW_Europe_Standards_API"} | json
Some example dashboard queries:
- Workspace requests per minute: sum by(workspace_name) (count_over_time({kong_http_log="log-payload"} | json [1m]))
- Response status per minute: sum by(response_status) (count_over_time({kong_http_log="log-payload"} | json [1m]))
- Service per minute: sum by(service) (count_over_time({kong_http_log="log-payload"} | json [1m]))
- Service status per minute: sum by(service, response_status) (count_over_time({kong_http_log="log-payload"} | json [1m]))
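The label names in these queries, such as response_status, come from the nested JSON log line: nested keys are joined with an underscore. The sketch below is a rough local emulation of that flattening plus a per-minute count, just to show what the queries compute. It uses fixed one-minute buckets instead of a range evaluated at every step, it ignores arrays, and all function names and sample entries are hypothetical.

```python
import json
from collections import Counter

NS_PER_MINUTE = 60_000_000_000


def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested objects into underscore-joined keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        elif not isinstance(value, list):  # arrays are ignored in this sketch
            flat[name] = value
    return flat


def count_per_minute(entries, by) -> Counter:
    """Count log lines per (minute, *by) from (ts_ns, labels, json_line) tuples."""
    counts = Counter()
    for ts_ns, labels, line in entries:
        # Simplification: stream labels win over extracted fields on a name clash.
        fields = {**flatten(json.loads(line)), **labels}
        minute = ts_ns // NS_PER_MINUTE
        counts[(minute,) + tuple(str(fields.get(k)) for k in by)] += 1
    return counts


if __name__ == "__main__":
    t0 = 1000 * NS_PER_MINUTE
    labels = {"kong_http_log": "log-payload", "service": "orders"}
    entries = [
        (t0, labels, '{"response": {"status": 200}}'),
        (t0 + 1_000_000_000, labels, '{"response": {"status": 200}}'),
        (t0 + 61_000_000_000, labels, '{"response": {"status": 500}}'),
    ]
    for key, count in sorted(count_per_minute(entries, ("service", "response_status")).items()):
        print(key, count)
```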
File logs: the Grafana Agent can monitor the access and error logs. The
access logs in particular provide useful information about total latency and
upstream path, but the whole set of information is smaller than what http-log provides. The format of the log lines also needs to be defined before they can be parsed.
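As an illustration of what "defining the format" means in practice, here is a minimal sketch that parses one access log line with a regular expression. It assumes a hypothetical format: the nginx combined layout with the request time appended at the end. Your own log format will likely differ, so the pattern and field names are assumptions to adapt, not Kong's actual format.

```python
import re

# Assumed (hypothetical) format: nginx "combined" plus a trailing request time.
LINE_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)" '
    r'(?P<request_time>[\d.]+)'
)


def parse_access_line(line: str):
    """Return the parsed fields as a dict, or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    fields["request_time"] = float(fields["request_time"])
    return fields


if __name__ == "__main__":
    sample = (
        '10.0.0.5 - - [12/Dec/2023:10:15:32 +0000] '
        '"GET /orders/42 HTTP/1.1" 200 512 "-" "curl/8.4.0" 0.123'
    )
    print(parse_access_line(sample))
```

The same idea applies when the parsing is done by the agent or in the query instead of in a script: every field you want to filter or aggregate on has to be named in the pattern first.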
Traces:
OpenTelemetry: OpenTelemetry (OTel) is an observability (o11y) framework
that can send spans and traces from Kong to Grafana, or to any other OTLP-enabled application.
- Advantages: a lot of information available, although it won't add much if the other systems (client/upstream) don't use OpenTelemetry.
- Disadvantages: not fully ready yet; a request to Grafana support was
needed in order to get metrics from traces. Kong also mentions that a high
tracing instrumentation sampling rate has an impact on Kong Gateway's proxy
performance in production.
- Links:
https://docs.konghq.com/hub/kong-inc/opentelemetry/
https://grafana.com/docs/grafana-cloud/send-data/otlp/send-data-otlp/
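Since the sampling rate is the main knob for the performance impact mentioned above, here is a minimal sketch of how ratio-based sampling works in general: a trace is kept when the lower 64 bits of its trace ID fall below a threshold derived from the rate. This is a generic illustration and not Kong's implementation; the function name should_sample is hypothetical.

```python
def should_sample(trace_id_hex: str, rate: float) -> bool:
    """Decide deterministically whether a trace is kept for a given rate.

    trace_id_hex: 32-character hex trace ID.
    rate: fraction of traces to keep, between 0.0 and 1.0.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    # Use the lower 64 bits of the trace ID as a uniformly distributed number.
    value = int(trace_id_hex[-16:], 16)
    return value < int(rate * 2**64)


if __name__ == "__main__":
    import secrets

    rate = 0.01
    kept = sum(should_sample(secrets.token_hex(16), rate) for _ in range(100_000))
    print(f"kept {kept} of 100000 traces at rate {rate}")
```

Because the decision depends only on the trace ID, every component that applies the same rule to the same trace reaches the same decision, and lowering the rate reduces the number of spans that have to be built and exported.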
Currently I'm exploring this feature to see what kind of extra insights it gives
us when, for example, our upstream microservices also enable OpenTelemetry.