Wednesday, April 10, 2024

Continuous Deployment API Pipeline

This blog post explains our CD (Continuous Deployment) pipeline. Use it at your own convenience and, as always, tips and remarks are welcome!

We're using the Kong API Gateway to handle all the REST and SOAP calls to middleware and back-end (micro-)services.

Each API should be described in an Open API Specification file with all the details of the API, the Kong plugins and, in case it's a REST service, also the request and response schemas.

All these Open API Specifications (OAS files) are stored in our on-premise Gitlab Repository server, combined with a Gitlab Runners (agents) server to execute the pipelines for deployment.

Now let me describe our pipeline set-up. It consists of eight steps:

1. Get the Open API Specification
2. Validate the Open API Specification
3. Generate the Kong decK file
4. Replace project specific variables
5. Validate Kong decK file
6. Synchronize (deploy) Kong artefacts
7. Remove Kong plugins from Open API Specification
8. Deploy Open API Specification to Portal or API Marketplace/Platform

An API project in Gitlab consists of the pipeline file (.gitlab-ci.yml), which includes the file variables.yml and the actual pipeline from the project "library", plus the API design (OAS).

The OAS is either a yml file in the Gitlab project, specified in the file variables.yml, or included in the Insomnia project in directory .insomnia. With Insomnia 2023.5.8 our teams can still use Git Sync for free.

Step 1) Get the OAS: either the variable OAS is set in the file variables.yml, or the spec is exported from Insomnia by executing inso export spec within docker image kong-inso-8.4.5.

Results from this step are the oas spec name and the actual oas.yml file.

Step 2) Validate the OAS: within docker image stoplight/spectral we download the .spectral.yml ruleset from our library project using curl and execute spectral lint.

Our .spectral.yml extends: [[spectral:oas, all], [spectral:asyncapi, all]] and we've added some specific errors, like:

  • we need contact name and email present, email should be a company email address
  • the OAS file needs x-kong-plugin-application-registration present with value auto_approve set to false
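To make the two custom rules a bit more tangible, here is a minimal Python sketch of the same checks on an already parsed OAS. It is not our Spectral ruleset: the company domain example.com, the function name check_oas and the assumption that the plugin sits at the root of the OAS with a config.auto_approve field are all hypothetical.

```python
import re

# Hypothetical company domain; the real pattern in .spectral.yml is not shown in this post.
COMPANY_DOMAIN = "example.com"


def check_oas(oas):
    """Return a list of error messages for the two custom rules described above."""
    errors = []
    contact = (oas.get("info") or {}).get("contact") or {}
    if not contact.get("name"):
        errors.append("info.contact.name is required")
    email = contact.get("email") or ""
    if not re.fullmatch(r"[^@\s]+@" + re.escape(COMPANY_DOMAIN), email):
        errors.append("info.contact.email must be a company email address")
    # Assumption: the plugin is defined at the root of the OAS with a 'config' object.
    plugin = oas.get("x-kong-plugin-application-registration")
    if not isinstance(plugin, dict):
        errors.append("x-kong-plugin-application-registration is required")
    elif (plugin.get("config") or {}).get("auto_approve") is not False:
        errors.append("auto_approve must be set to false")
    return errors
```

An empty list means the spec passes both checks; in the pipeline itself these checks stay in Spectral.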

Step 3) Generate the Kong decK file (decK is derived from the combination of the words ‘declarative’ and ‘Kong’) within docker image kong-deck-1.36.1. After creating the decK file we set the service protocol, host and port from the environment specific variables in the variables.yml file, and we add the project tag:

only:
- development
script:
- deck file openapi2kong --inso-compatible -s oas.yaml -o kong.yaml
- deck file patch -s kong.yaml -o kong.yaml --selector="$..services[*]" --value='protocol:"'"$DEV_PROTOCOL"'"'
- deck file patch -s kong.yaml -o kong.yaml --selector="$..services[*]" --value='host:"'"$DEV_HOST"'"'
- deck file patch -s kong.yaml -o kong.yaml --selector="$..services[*]" --value='port:'$DEV_PORT''
- deck file add-tags -s kong.yaml -o kong.yaml $projectname
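For readers who don't know decK, the effect of the patch and add-tags commands above can be pictured on the parsed kong.yaml. The Python sketch below is only an illustration of that effect, not how decK works internally: the function name patch_services is made up, and it only covers top-level services and their routes.

```python
def patch_services(deck, protocol, host, port, tag):
    """Set protocol, host and port on every service and tag services and routes.

    Illustration only: 'deck' is the parsed kong.yaml as a plain dict.
    """
    for service in deck.get("services", []):
        service["protocol"] = protocol
        service["host"] = host
        service["port"] = port
        # Add the project tag to the service and to each of its routes, once.
        for entity in [service, *service.get("routes", [])]:
            tags = entity.setdefault("tags", [])
            if tag not in tags:
                tags.append(tag)
    return deck
```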

Some years ago we were adding upstreams with targets, but as we have a dedicated load balancer we don't need Kong to balance over targets. Setting the endpoint on the service also gives a better overview of the upstream systems in Kong Manager.

Step 4) If the decK file contains project specific placeholders which should be replaced by environment specific values we add replace steps to the project .gitlab-ci.yml file.

The replacements can be done with simple Linux commands within the basic docker image linux-alpine-3.18.

Step 5) DecK validation: this step validates the Kong decK file within image kong-deck-1.36.1, executing both deck gateway validate and deck gateway diff.

We noticed that if a service with an existing application_instance is renamed, this leads to deletion of the old service and creation of a new one. The validation step will pass, but the sync fails due to the existing reference.

Step 6) Synchronize (deploy) to Kong, executing deck gateway sync within docker image kong-deck-1.36.1

Step 7) As the OAS contains Kong specific plugins that we don't want to expose in the Developer Portal, or any API Platform, we remove all the plugins.

For now we use hashtags within the OAS to mark the begin and end of a plugin, and a Linux script within docker image linux-alpine-3.18 removes everything between and including the hashtags.

In the future we might add smarter plugin removal using yq, as Kong plugins are well defined objects starting with x-kong-plugin.
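A minimal sketch of what such a smarter removal could look like, written in plain Python on an already parsed OAS instead of yq. The function name strip_kong_extensions is hypothetical. Note that it only drops keys starting with x-kong-plugin, so other Kong extensions and duplicated paths would still need separate handling.

```python
def strip_kong_extensions(node, prefix="x-kong-plugin"):
    """Return a copy of the parsed OAS without keys that start with the prefix."""
    if isinstance(node, dict):
        return {
            key: strip_kong_extensions(value, prefix)
            for key, value in node.items()
            if not str(key).startswith(prefix)
        }
    if isinstance(node, list):
        return [strip_kong_extensions(item, prefix) for item in node]
    # Scalars (strings, numbers, booleans, None) are returned unchanged.
    return node
```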

Step 8) Using docker image linux-alpine-3.18 with curl included we can post the censored OAS to our Developer Portal.

Time for a small example, see the following snapshot of a single API design, where there is a single path on Kong: /orders with the following rules:

  • If the optional HTTP Header field X-Order-Version contains v2 the request should be routed to upstream system ORDER_V2_HOST with path v2
  • Else the request should be routed to default upstream system with path v1

This can be achieved with different projects/OAS, but sometimes this is requested within the same spec:

paths:
  /orders:
    get:
#BEGIN_KONG_PLUGINS_1
      x-kong-plugin-request-transformer-advanced:
        name: request-transformer-advanced
        config:
          replace:
            uri: /v1/orders
#END_KONG_PLUGINS_1
...
#BEGIN_KONG_PLUGINS_2
  /orders[REMOVEME]:
    get:
      x-kong-route-defaults:
        headers:
          X-Order-Version:
          - v2
      x-kong-plugin-route-transformer-advanced:
        name: route-transformer-advanced
        config:
          host: ORDER_V2_HOST
          path: /v2/orders
#END_KONG_PLUGINS_2

This OAS is valid in all OAS editors like Insomnia, Swagger etc.

The duplicated path is extended with [REMOVEME] to make the design a valid Open API Specification. After creating the decK file this [REMOVEME] is removed in step 4. The resulting decK file remains valid for Kong and contains identical paths, but with different HTTP Header configuration. ORDER_V2_HOST is an environment specific placeholder which is replaced in step 4 by the value set in variables.yml.
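The replacements of step 4 are done with simple Linux commands in our pipeline; the Python sketch below only shows the idea. The function name apply_replacements and the host value in the usage example are hypothetical, and the sketch assumes the marker appears literally as [REMOVEME] in the generated decK file.

```python
def apply_replacements(deck_text, values, marker="[REMOVEME]"):
    """Drop the marker and replace placeholders by environment specific values."""
    # Assumption: the marker is present in the decK file exactly as typed in the OAS.
    text = deck_text.replace(marker, "")
    for placeholder, value in values.items():
        text = text.replace(placeholder, value)
    return text
```

For example, apply_replacements(kong_yaml, {"ORDER_V2_HOST": "orders-v2.example.internal"}) returns the decK text with both /orders routes on the same path and the real host filled in.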

In step 6 the service is deployed to Kong.

In step 7 the script removes everything between #BEGIN_KONG_PLUGINS_1 and #END_KONG_PLUGINS_1, which is the Kong plugin changing the upstream uri. Everything between #BEGIN_KONG_PLUGINS_2 and #END_KONG_PLUGINS_2, which is the duplicated path, is removed as well. The removal includes the marker lines themselves and any additional comment lines starting with #.
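The removal logic is simple enough to sketch in a few lines of Python. This is not our actual Linux script, just a minimal illustration; the function name strip_marked_blocks is made up, and "starting with #" is taken literally, so indented comments are kept.

```python
def strip_marked_blocks(oas_text):
    """Remove marked Kong blocks, the marker lines and other lines starting with '#'."""
    kept = []
    inside = False
    for line in oas_text.splitlines():
        if line.startswith("#BEGIN_KONG_PLUGINS"):
            inside = True
            continue
        if line.startswith("#END_KONG_PLUGINS"):
            inside = False
            continue
        # Skip everything inside a marked block and every line starting with '#'.
        if inside or line.startswith("#"):
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"
```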

After deployment in step 8 the result is a single path visible in the design on the Developer Portal. The API consumers won't see the upstream systems and the technical routing based on X-Order-Version. Of course the HTTP Header X-Order-Version should be described as an optional header field with its purpose.

Friday, January 26, 2024

Finding a new Kong slogan with AI

When configuring SMTP in Gitlab a value can be set for gitlab_email_subject_suffix; emails such as those for new users or password resets get this text appended to the subject. Initially I had:

apes should manage your APIs

This comes from my LinkedIn profile, where I added the sentence some time ago as a reaction to Gravitee's title "Gravitee vs Kong | Apes shouldn't manage APIs"; see the html title of https://www.gravitee.io/comparison/gravitee-vs-kong, which is not shown on the page but is visible in the html source and when you google it.

For the ones who do or explicitly don't identify them/theirselves with apes this might be a little offensive, to avoid DEI police I had to find a new slogan. That's when I thought of AI, with the following prompt: 

have a break, then give me a nice slogan which contains humor full sentence about why developers should use Kong inc API Gateway. You get $200 is the slogan is really good

The reply from Copilot (AI engine 4?) got no further than

Sure, here’s a slogan that might make you chuckle: “Kong Gateway: Because APIs are like bananas - they’re best when they’re peeled, sliced, and served with Kong!” 😂

On a more serious note, ... and then came the marketing stuff and sales pitch

The reply from chatGPT (AI engine 3.5) was

Sure, let's give this a shot:

"Kong API Gateway: Because even developers need a smooth ride in the data traffic jam. Avoid coding traffic tickets and let Kong be your code traffic cop – making API management so smooth, it's like merging onto the information superhighway in a convertible with the top down. Your APIs deserve a Kong, not a traffic con!"

I hope this brings a smile to your face! If you have any specific preferences or adjustments, feel free to let me know.

This result was above my expectations, and my new slogan is now

your APIs deserve a Kong

See also this page header and my LinkedIn profile 👍

As I'm a newbie with AI I realize the hardest part is creating the right prompt (command). Some tips that I use at random can be found at https://arxiv.org/pdf/2312.16171v1.pdf, which lists 26 prompt principles on page 5, the first being:

No need to be polite with LLM so there is no need to add phrases like “please”, “if you don’t mind”, “thank you”, “I would like to”, etc., and get straight to the point.

Adding to the list is tip #27, according to Google DeepMind researchers: start with 

Take a deep breath and work on this problem step by step, ...

Which I changed to 

Have a break, then ...

#KitKat from the article and ad on https://the-message.ca/2024/01/19/kitkat-shows-how-even-ai-is-better-after-a-break/

On a different AI note, a few weeks ago I was looking for a new Teams background, I ended up with the below image taken from https://www.bing.com/images/create/i-want-a-background-wallpaper-of-size-1920-by-1080/1-65a5348511c04c0f90def08c2baf34e3?id=D1d8mIGWjcRjEmb%2fFD43BA%3d%3d&view=detailv2&idpp=genimg

After removing the lower-half of the result, now my colleagues see me sitting between the Dragon and the Gorilla 🤣


Friday, January 19, 2024

Traces in Tempo vs logs in Loki

In my last post I mentioned how to use the http-log plugin in Kong to provide logs to Loki, and how we're going to use OpenTelemetry to provide traces to Tempo.

The OpenTelemetry plugin requires a change in the Kong config: enable tracing by setting tracing_instrumentations to all and restart the plane.

In the plugin configuration we had to raise queue.max_batch_size from the default 1 to 1000 to avoid full queue errors.

Without repeating my last post the http log provides valuable information like received time in milliseconds, source ip, incoming endpoint, method and http headers, authenticated id, Kong service and route invoked, upstream ip and port and http status code.

The traces provide similar information: the same start time in milliseconds, source ip, incoming endpoint and method, Kong route invoked, upstream name, ip and port and http status code.

In Grafana we can explore both logs from Loki and traces from Tempo, but we want to take advantage of the built-in Observability, which is now rebranded to Applications. Initially this looks promising, we have metrics generated from traces and see quickly the duration and duration distribution of all requests.

Traces: both in Explore (Tempo) and Application Kong we see all traces, each trace contains the set of spans. No further configuration needed, we have in Kong the sampling rate configured to 1, which is 100%, so far we see no reason to lower this.

Logs: in Explore (Loki) we see all logs, not in Application Kong. As Application Kong Log query is defaulted to {exporter="OTLP", job="${serviceName}"} we have to change our log stream from Kong towards Loki, new custom_fields_by_lua is Streams with value

local cjson = require "cjson"
local ts = string.format('%18.0f', os.time() * 1000000000)
local log_payload = kong.log.serialize()
local json_payload = cjson.encode(log_payload)
local service = log_payload['service']
local t = { {stream = {exporter='OTLP', job='kong', service=service['name']}, values={{ts, json_payload}}} }
return t

After this change all Kong http logs appear in Application Kong, of course we have to update our dashboards from kong_http_log="log-payload" to job="kong".

Now for the correlation between traces and logs: we learned that this doesn't work out-of-the-box with Kong version 3.4, we need to upgrade to 3.5 in order to have the field trace_id in the logs.

As a workaround we can use the timestamp up to milliseconds, this value is identical for the log and the trace for each request.
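A minimal sketch of this workaround in Python: cut the span start time from nanoseconds down to milliseconds and look for logs with the same value. The function names are made up, and I assume here that the log entry carries its millisecond timestamp in a field called started_at; check the field name in your own log payload.

```python
def start_ms(start_time_unix_nano):
    """Truncate a nanosecond timestamp to whole milliseconds."""
    return start_time_unix_nano // 1_000_000


def find_matching_logs(logs, span):
    """Return the logs whose millisecond timestamp equals the span start time."""
    wanted = start_ms(int(span["startTimeUnixNano"]))
    # Assumption: 'started_at' holds the request timestamp in milliseconds.
    return [log for log in logs if log.get("started_at") == wanted]
```

Requests that arrive within the same millisecond will of course match each other, so this stays a workaround until trace_id is available in the logs.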

For example I've exported a trace (5.0 kB, length 5102) containing 9 spans, the parent and 8 children from kong.router till kong.header_filter.plugin.opentelemetry, see below screenshot:

Surely this is just for fun, but we see that durations are reported down to a hundredth of a microsecond, e.g. for the key-auth plugin: Duration: 71.94μs, Start Time: 658.25μs (11:43:50.364)

In the span we find "startTimeUnixNano": 1705661030364658200, "endTimeUnixNano": 1705661030364730000

Now when I calculate the duration myself I come to 71.8 microseconds, googling both values with a minus in between returns 71936, and Grafana comes to 71.94μs.

All nano timestamps in the exported trace end with '00', exact to 100 nanoseconds, which is 0.1 microseconds.

Clever that Google and Grafana can get more precise, but yeah, this is already about a tenth of a thousandth of a thousandth of a second...
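One possible explanation for the three different numbers, which I have not verified against Google or Grafana themselves: the integer difference is 71800, but as soon as the two timestamps are treated as 64-bit floating point numbers they are rounded to multiples of 256, and the difference becomes 71936 (281 × 256). The small Python sketch below reproduces both results; if this is what happens, the extra digits are rounding rather than extra precision.

```python
import math

start_ns = 1705661030364658200
end_ns = 1705661030364730000

# Exact integer arithmetic on the exported values.
exact_ns = end_ns - start_ns

# The same subtraction after converting both values to 64-bit floats.
float_ns = float(end_ns) - float(start_ns)

# Distance between two neighbouring floats at this magnitude.
step_ns = math.ulp(float(start_ns))

print(exact_ns)  # 71800
print(float_ns)  # 71936.0
print(step_ns)   # 256.0
```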

Taking the milliseconds (1705661030364) the correlated log can be found easily. Saved as json to a file it is 3.3 kB (length 3390), roughly two thirds of the size of the trace. These numbers are interesting because the average ingestion rates of these logs and traces are the other way around:

1 log is 2/3 the size of the trace of the same request, while the average logs ingestion rate is more than 3 times the average traces ingestion rate, 14.5 GiB log versus 4.50 GiB traces. This seems like a mystery, which I leave unsolved for now.
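To put a number on the mystery, here is the arithmetic as a tiny Python sketch. It only uses the figures from this post and assumes that the size of the exported json is representative for what is ingested, which may well not be the case.

```python
log_bytes = 3390      # exported log, json length
trace_bytes = 5102    # exported trace, json length
logs_gib = 14.5       # average logs ingestion
traces_gib = 4.50     # average traces ingestion

size_ratio = log_bytes / trace_bytes   # expected ratio with one log per trace
ingest_ratio = logs_gib / traces_gib   # observed ratio
gap = ingest_ratio / size_ratio        # how far the two are apart

print(round(size_ratio, 2))    # 0.66
print(round(ingest_ratio, 2))  # 3.22
print(round(gap, 1))           # 4.8
```

So logs take almost five times more ingestion than the exported sizes suggest.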

As mentioned this exercise is more fun than practical, Grafana can provide insights on Kong latencies, number of errors, alerts and so on, but detailed information on sub-components is overkill. As soon as we have our landscape OpenTelemetry enabled, especially our upstream MicroServices, only then I expect to gain useful insights and nice service maps. Till that time I enjoy playing with dashboards on the http logs in Loki 🤣


Monday, December 18, 2023

Monitoring Kong with Grafana

After being quiet for a decade I'd love to start sharing some experiences again!


My new posts will focus on Kong API Gateway Enterprise Edition. We started years ago with Oracle API Gateway, then switched two years ago and migrated all our API's to Kong 2.3.

This year I've had many interesting journeys with Kong: I successfully migrated the datastore from Cassandra to PostgreSQL, upgraded Kong to 3.4, performed some POC's with Grafana and Dynatrace, and updated and improved our CI/CD pipelines.

 

This post is about monitoring: how to monitor Kong API Gateway with Grafana Cloud. Grafana is the dashboard to visualize all metrics (Mimir), logs (Loki) and traces (Tempo).

 

Metrics:

Prometheus: Kong offers the Prometheus plugin which exposes metrics on the Kong /metrics endpoint, to be scraped by an agent, like the Grafana agent.

 

Statsd: Statsd with Kong 2.8 didn't work smoothly and required a lot of field mappings. In 2.8 there were both a statsd and a statsd advanced plugin; statsd should work better from 3.x onwards, see https://konghq.com/blog/engineering/how-to-use-prometheus-to-monitor-kong-gateway

 

Logs:

Http-log: Kong offers with the http-log the possibility to send the log of each request to Grafana.

  • Advantages: all request meta-information is available in Grafana, from latencies to upstream IP. Minimal performance impact as the HTTP Log plugin uses internal queues to decouple the production of log entries from their transmission to the upstream log server.
  • Disadvantages: No out-of-the-box dashboards available
  • Links:
    https://docs.konghq.com/hub/kong-inc/http-log/

Note that this plugin works nicer with a custom field by lua added: Streams with the following value:

local cjson = require "cjson"
local ts = string.format('%18.0f', os.time() * 1000000000)
local log_payload = kong.log.serialize()
local json_payload = cjson.encode(log_payload)
local service = log_payload['service']
local t = { {stream = {kong_http_log='log-payload', service=service['name']}, values={{ts, json_payload}}} }
return t

When using cjson the following should be added to kong.conf: untrusted_lua_sandbox_requires=cjson
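For those who don't read Lua daily: the custom field builds the streams structure that Loki expects, with labels per stream and a list of [timestamp in nanoseconds, log line] pairs. A minimal Python sketch of the same structure, with a made-up function name and an optional now_s parameter that only exists to make the example repeatable:

```python
import json
import time


def build_streams(log_payload, now_s=None):
    """Build the same 'streams' value as the Lua custom field above."""
    if now_s is None:
        now_s = int(time.time())
    # Whole seconds converted to a nanosecond string, like os.time() * 1000000000 in Lua.
    ts = str(now_s * 1_000_000_000)
    line = json.dumps(log_payload)
    labels = {
        "kong_http_log": "log-payload",
        "service": log_payload["service"]["name"],
    }
    return [{"stream": labels, "values": [[ts, line]]}]
```

The labels are what we select on in Grafana, the json line is what gets parsed with | json.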

Now we can explore the logs in Grafana more easily, see

  • all requests as they come in, simply select log-payload: {kong_http_log="log-payload"} |= ``
  • parsed as json: {kong_http_log="log-payload"} | json
  • filter on service, as Kong sends the service in the custom field by lua Streams: {service="ASW_Europe_Standards_API"} | json

Some example dashboards:

  • Workspace request per minute: sum by(workspace_name) (count_over_time({kong_http_log="log-payload"} | json [1m]))
  • Response status per minute: sum by(response_status) (count_over_time({kong_http_log="log-payload"} | json [1m]))
  • Service per minute: sum by(service) (count_over_time({kong_http_log="log-payload"} | json [1m]))
  • Service status per minute: sum by(service, response_status) (count_over_time({kong_http_log="log-payload"} | json [1m]))

File logs: the Grafana agent can monitor the access and error logs. Especially the access logs provide useful information about total latency and upstream path, but the whole set of information is less than what the http-log provides. Also the format of the log lines needs to be defined in order to get them parsed...

 

Traces:

OpenTelemetry: an observability (o11y, pronounced "olly") framework to send spans and traces from Kong to Grafana, or any other OTLP enabled application.

Currently I'm exploring this feature to see what kind of extra insights this gives us when e.g. our upstream microservices also enable OpenTelemetry.

Monday, November 12, 2012

ESB's rated by analysts

After my post last year [ref: SOA Suite rated by analysts], where I looked into the evaluation of the Oracle SOA Suite by Gartner and Forrester, it's time to take a new look at what they say about ESB's (Enterprise Service Bus).
For this I took a look at some newer reports from Forrester and Gartner:
  • The Forrester Wave: Enterprise Service Bus, Q2 2011
    April 25, 2011 by Ken Vollmer
  • Magic Quadrant for Application Infrastructure for Systematic Application Integration Projects
    June 20, 2012 by Jess Thompson, Yefim Natis, Massimo Pezzini, Daniel Sholler, Ross Altman and Kimihiko Iijima

As shown in the graphs the leaders are clearly IBM, Oracle, Software AG and Tibco.
Software AG - worked with webMethods for many years, good product.
Oracle - worked the last couple of years with SOA Suite, the 11g and 12c are built upon WebLogic application server, hence the good score.
IBM - will be working with IBM in the near future, the multiple ESB offerings from IBM make me wonder.
Now let's see what is said about IBM's multiple ESB offering:
Forrester - IBM offers three ESB's: WebSphere Enterprise Service Bus (WESB), WebSphere Enterprise Service Bus Registry Edition (WESBRE) and WebSphere Message Broker (WMB). Funny in the Forrester wave is that the actual ESB from IBM, the WebSphere ESB, scored somewhat lower than the other two.
Gartner - Some caution for IBM as despite plans to rationalize and simplify the product portfolio (e.g., in ESB), the fine-grained differences, functional overlaps and product integration challenges — for example, among WMB, WESB, WebSphere Cast Iron and the WebSphere DataPower integration appliances — make it difficult for potential users to determine the best fit for their requirements.
IBM - the faqs on the IBM website mention three different ESB's: IBM WebSphere ESB, IBM WebSphere Message Broker and IBM WebSphere DataPower Integration Appliance XI50.

Here a selection from IBM [ref: faq] on when to use which ESB:

When to Use WebSphere ESB?
  • You use WebSphere Application Server and/or your team has skills with WAS Administration and Java coding
  • You are focused on standards based interactions using XML, SOAP, and WS
  • Reliability and extensive transactional support are key requirements
When to Use WebSphere Message Broker?
  • You are currently using WebSphere Message Broker but not as an ESB
  • You are using Industry formats such as SWIFT, EDI, HL7
  • You are implementing a wide range of messaging and integration patterns
  • You have very complex transformation needs
  • Reliability and extensive transactional support are key requirements
  • To achieve very high-performance with horizontal and vertical scaling
When To Use WebSphere DataPower?
  • Ease of use is a pre-dominant consideration
  • You are transforming between XML-and-XML or XML-and-any other format
  • Your interaction patterns are relatively simple
  • You are using XML-based or WS-Security extensively
  • You require use of advanced Web services standards
  • You need to minimize message latency when adding an ESB layer
  • Your ESB must be in production very quickly
What if you require an ESB from IBM for standards based integration, with complex integration needs, high performance, advanced web service standards and want to move to production quickly?

Tuesday, November 6, 2012

FOTY0001 and logfile rotation

A certain person named, or used the alias of, Vivek wrote a few years ago some interesting articles on his blog [ref: OracleD] about Oracle SOA Suite 10g. He experienced many flaws in Oracle OC4J Application Server and recommended the OTN discussion: Oracle BPEL + Oc4j + Jdeveloper = brain damage. Clearly he was not so happy with the features in the 10g version of Oracle SOA Suite...

He referred also to the so-called FOTY0001 errors, these often occur during XRef calls and XSL Transformations. More information about this FOTY0001 can be found in the OPMN log files. For example, a typical error is the following:
subLanguageExecutionFault - XPathExecutionError
  XPath expression failed to execute.
  Error while processing xpath expression, the expression is "ora:processXSLT('myTransformation.xsl',bpws:getVariableData('myInputMessage'))", the reason is FOTY0001: type error.
  Please verify the xpath query.

This error occurs e.g. when in JDeveloper the transform activity is opened and closed immediately. There was no time for the messagePart to be loaded, so it will be missing in the code, leading to this FOTY0001 error at runtime. The proper syntax in the code is ora:processXSLT('myTransformation.xsl',bpws:getVariableData('myInputMessage','myMessagePart'))
To prevent this either click cancel in JDeveloper or wait for the message parts to load completely.

To view the FOTY0001 errors in detail the obvious way is to use Enterprise Manager. But an easier way is to view them directly by opening the logfiles on the filesystem. Depending on the logging level the server logfiles can quickly become pretty huge. In the opmn configuration the rotation can be configured as follows: open $SOA_HOME/opmn/conf/opmn.xml and make the following changes:
<ias-component id="soa_group" status="enabled">
  <process-type id="oc4j_soa" module-id="OC4J" status="enabled">
    <module-data>
      <category id="start-parameters">
        <data id="java-options"
          value="-server

Add here the following parameters:
          -Dstdstream.filesize=10 - File size in MegaBytes
          -Dstdstream.filenumber=50 - Number of files
To separate the output and error messages add the following data element with oc4j-options inside the same category:
<data id="oc4j-options" value="-out $ORACLE_HOME/opmn/logs/oc4j_soa.out -err $ORACLE_HOME/opmn/logs/oc4j_soa.err"/>
To view the FOTY0001 error details simply open (in a good text-editor) the *.err file containing the timestamp of the error.

Monday, November 5, 2012

ORABPEL dehydration store purge scripts

By default all BPEL instances (messages) in the SOA Suite are persisted in the internal database, the so-called dehydration store. Good practice is to purge older messages, to avoid database sizing problems and to increase performance.

Unfortunately the original Oracle scripts were not sufficient to do the task, so many (consulting) companies created their own purge scripts. The good news is that Oracle realized the need and offers improved purge scripts, which are available to download from Oracle. Please take a look at the Oracle 10G FMW purge strategy whitepaper [ref: note ID 1286265.1], this document also contains the BPEL database schema.

For example, the second option from the whitepaper, multi-threaded purge can be found at Oracle Note: New BPEL 10g Purge Scripts From 10.1.3.5 MLR#2 [ref: note ID 1110833.1]. This script can be scheduled with e.g., crontab, or any other scheduling tool. The start parameters can be configured like
  • P_OLDER_THAN := sysdate-21; (purge instances older than 21 days)
  • P_ROWNUM := 10000000; (purge up to 10 million instances)
  • P_DOP := 3; (use three threads in parallel)
  • P_CHUNKSIZE := 1000; (commit per 1000 rows)
This last note contains all three scripts mentioned in the whitepaper:
  • SINGLE THREADED LOOPED PURGE PROCEDURE
  • MULTI THREADED LOOPED PURGE PROCEDURE
  • CTAS (Create Table As Select) PROCEDURE
Afterwards if needed you can fine-tune this script a little.
Tips:
  • Create a script for all state instances to purge dev and test environments, modify INSERT INTO temp_cube_instance, change WHERE state >= 5 into WHERE state >= 0
  • When purging all instances older than the configured days, it might be handy to keep the process history a little longer, like one year, modify DELETE FROM process_log, change WHERE event_date < p_older_than into WHERE event_date < SYSDATE - 365
  • When sensor data is used you might want to include this data in the purge script:
    DELETE FROM activity_sensor_values WHERE creation_date < p_older_than;
    DELETE FROM fault_sensor_values WHERE creation_date < p_older_than;
    DELETE FROM variable_sensor_values WHERE creation_date < p_older_than;
  • When using AIA ErrorHandler you might want to prevent the Cartesian product by adding a max to the script. The AIA Error Handler doesn't use unique conversation id's, modify INSERT INTO temp_invoke_message, change FROM temp_cube_instance tci into FROM (SELECT MAX (cikey) cikey, conversation_id FROM temp_cube_instance GROUP BY conversation_id) tci
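Why the MAX in the last tip helps can be shown with a few rows in Python. This is only an illustration with simplified, made-up rows and function names, not the purge script itself: when several instances share one conversation id, a plain join returns every message once per instance, while keeping one cikey per conversation id returns it once.

```python
def join_on_conversation(messages, instances):
    """Plain join: every message is paired with every instance of its conversation."""
    return [
        (message["id"], instance["cikey"])
        for message in messages
        for instance in instances
        if instance["conversation_id"] == message["conversation_id"]
    ]


def max_cikey_per_conversation(instances):
    """Equivalent of SELECT MAX(cikey), conversation_id ... GROUP BY conversation_id."""
    highest = {}
    for instance in instances:
        cid = instance["conversation_id"]
        highest[cid] = max(highest.get(cid, instance["cikey"]), instance["cikey"])
    return [{"cikey": cikey, "conversation_id": cid} for cid, cikey in highest.items()]
```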
For the ORAESB schema the scripts can simply be found at $SOA_HOME/integration/esb/sql/other.