Monday, December 18, 2023

Monitoring Kong with Grafana

After being quiet for a decade, I would love to start sharing some experiences again!


My new posts will focus on Kong API Gateway Enterprise Edition. We started years ago with Oracle API Gateway, switched two years ago, and migrated all our APIs to Kong 2.3.

This year I've had many interesting journeys with Kong: I successfully migrated the datastore from Cassandra to PostgreSQL, upgraded Kong to 3.4, performed some POCs with Grafana and Dynatrace, and updated and improved our CI/CD pipelines.

 

This post is about monitoring: how to monitor Kong API Gateway with Grafana Cloud. Grafana is the dashboard that visualizes all metrics (Mimir), logs (Loki) and traces (Tempo).

 

Metrics:

Prometheus: Kong offers the Prometheus plugin which exposes metrics on the Kong /metrics endpoint, to be scraped by an agent, like the Grafana agent.
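
To get a feel for what such an agent scrapes, here is a minimal Python sketch that parses a sample of the Prometheus text format. The sample lines, label values and the function name parse_metrics are made up for illustration, and the parser assumes samples without the optional trailing timestamp; it is not how the Grafana agent works internally.

```python
# Hypothetical sample of what a /metrics endpoint could return.
SAMPLE = """\
# HELP kong_http_requests_total HTTP status codes per service
# TYPE kong_http_requests_total counter
kong_http_requests_total{service="orders",code="200"} 1027
kong_http_requests_total{service="orders",code="500"} 3
"""


def parse_metrics(text):
    """Map each series (name plus labels) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comments.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token on the line.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


if __name__ == "__main__":
    for series, value in parse_metrics(SAMPLE).items():
        print(series, value)
```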

 

Statsd: Statsd with Kong 2.8 didn't work smoothly and required a lot of field mappings. In 2.8 there were both a statsd and a statsd advanced plugin; statsd should work better from 3.x onwards, see https://konghq.com/blog/engineering/how-to-use-prometheus-to-monitor-kong-gateway

 

Logs:

Http-log: with the http-log plugin Kong can send the log of each request to Grafana.

  • Advantages: all request meta-information is available in Grafana, from latencies to upstream IP. Minimal performance impact as the HTTP Log plugin uses internal queues to decouple the production of log entries from their transmission to the upstream log server.
  • Disadvantages: No out-of-the-box dashboards available
  • Links:
    https://docs.konghq.com/hub/kong-inc/http-log/

Note that this plugin works nicer when a custom field by Lua is added, named Streams, with the following value:

local cjson = require "cjson"
-- Loki expects the timestamp in nanoseconds, as a string
local ts = string.format('%18.0f', os.time() * 1000000000)
local log_payload = kong.log.serialize()
local json_payload = cjson.encode(log_payload)
-- requests that match no route carry no service, so guard against nil
local service = log_payload['service'] or {}
local t = { { stream = { kong_http_log = 'log-payload', service = service['name'] }, values = { { ts, json_payload } } } }
return t

When using cjson the following should be added to kong.conf: untrusted_lua_sandbox_requires=cjson
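
To make the shape of that value easier to see, here is a small Python mirror of the Lua snippet. It is only a sketch for illustration: the function name build_streams and the sample payload are my own, and leaving out the service label when a request matched no service is a choice made in this sketch.

```python
import json
import time


def build_streams(log_payload, now_seconds=None):
    """Build the list of Loki streams for one Kong log entry."""
    if now_seconds is None:
        now_seconds = int(time.time())
    # Nanoseconds since the epoch, passed as a string.
    ts = str(now_seconds * 1_000_000_000)
    labels = {"kong_http_log": "log-payload"}
    service = log_payload.get("service") or {}
    if service.get("name"):
        labels["service"] = service["name"]
    # One stream (a label set) carrying one [timestamp, line] pair.
    return [{"stream": labels, "values": [[ts, json.dumps(log_payload)]]}]


if __name__ == "__main__":
    entry = {"service": {"name": "orders"}, "response": {"status": 200}}
    print(json.dumps(build_streams(entry), indent=2))
```

The labels in stream are what the selectors in Grafana match on, while the JSON line itself is only parsed at query time.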

Now the logs are easier to explore in Grafana, for example:

  • all requests as they come in, simply select log-payload: {kong_http_log="log-payload"} |= ``
  • parsed as json: {kong_http_log="log-payload"} | json
  • filter on service, as Kong sends the service name in the Lua custom field Streams: {service="ASW_Europe_Standards_API"} | json

Some example dashboards:

  • Workspace request per minute: sum by(workspace_name) (count_over_time({kong_http_log="log-payload"} | json [1m]))
  • Response status per minute: sum by(response_status) (count_over_time({kong_http_log="log-payload"} | json [1m]))
  • Service per minute: sum by(service) (count_over_time({kong_http_log="log-payload"} | json [1m]))
  • Service status per minute: sum by(service, response_status) (count_over_time({kong_http_log="log-payload"} | json [1m]))
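
All four dashboard queries follow the same pattern and only differ in the labels they group by. A small Python helper (the name per_minute_count is mine, purely for illustration) makes that pattern explicit:

```python
# Stream selector used by all example queries in this post.
SELECTOR = '{kong_http_log="log-payload"}'


def per_minute_count(*labels):
    """Return the LogQL query that counts log lines per minute, grouped by labels."""
    grouping = ", ".join(labels)
    return f"sum by({grouping}) (count_over_time({SELECTOR} | json [1m]))"


if __name__ == "__main__":
    print(per_minute_count("workspace_name"))
    print(per_minute_count("service", "response_status"))
```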

File logs: the Grafana agent can monitor the access and error logs. The access logs in particular provide useful information about total latency and upstream path, but the whole set of information is less than what the http-log provides. Also, the format of the log lines needs to be defined before they can be parsed.
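
As an illustration of what defining the format means, the Python sketch below parses one access log line with a regular expression. It assumes the standard nginx combined log format and a made-up sample line; that is not necessarily the format in use here, and extra fields such as latency or upstream path would need extra groups in the pattern.

```python
import re

# Pattern for the nginx "combined" format (an assumption, see above).
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_access_line(line):
    """Return the named fields of one access log line, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = ('10.0.0.1 - - [18/Dec/2023:10:15:32 +0000] '
              '"GET /orders/42 HTTP/1.1" 200 512 "-" "curl/8.4.0"')
    print(parse_access_line(sample))
```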

 

Traces:

OpenTelemetry: an observability (o11y, pronounced "olly") framework to send spans and traces from Kong to Grafana, or to any other OTLP-enabled application.

Currently I'm exploring this feature to see what kind of extra insights this gives us when e.g. our upstream microservices also enable OpenTelemetry.

Monday, November 12, 2012

ESB's rated by analysts

After my post last year [ref: SOA Suite rated by analysts], where I looked into the evaluation of the Oracle SOA Suite by Gartner and Forrester, it's time to take a new look at what they say about ESBs (Enterprise Service Bus).
For this I took a look at some newer reports from Forrester and Gartner:
  • The Forrester Wave: Enterprise Service Bus, Q2 2011
    April 25, 2011 by Ken Vollmer
  • Magic Quadrant for Application Infrastructure for Systematic Application Integration Projects
    June 20, 2012 by Jess Thompson, Yefim Natis, Massimo Pezzini, Daniel Sholler, Ross Altman and Kimihiko Iijima

As shown in the graphs the leaders are clearly IBM, Oracle, Software AG and Tibco.
Software AG - worked with webMethods for many years, good product.
Oracle - worked the last couple of years with SOA Suite; 11g and 12c are built upon the WebLogic application server, hence the good score.
IBM - will be working with IBM in the near future; the multiple ESB offerings from IBM make me wonder.
Now let's see what is said about IBM's multiple ESB offering:
Forrester - IBM offers three ESBs: WebSphere Enterprise Service Bus (WESB), WebSphere Enterprise Service Bus Registry Edition (WESBRE) and WebSphere Message Broker (WMB). The funny thing in the Forrester Wave is that the actual ESB from IBM, the WebSphere ESB, scored somewhat lower than the other two.
Gartner - Some caution for IBM as despite plans to rationalize and simplify the product portfolio (e.g., in ESB), the fine-grained differences, functional overlaps and product integration challenges — for example, among WMB, WESB, WebSphere Cast Iron and the WebSphere DataPower integration appliances — make it difficult for potential users to determine the best fit for their requirements.
IBM - the FAQs on the IBM website mention three different ESBs: IBM WebSphere ESB, IBM WebSphere Message Broker and IBM WebSphere DataPower Integration Appliance XI50.

Here is a selection from IBM [ref: faq] on when to use which ESB:

When to Use WebSphere ESB?
  • You use WebSphere Application Server and/or your team has skills with WAS Administration and Java coding
  • You are focused on standards based interactions using XML, SOAP, and WS
  • Reliability and extensive transactional support are key requirements
When to Use WebSphere Message Broker?
  • You are currently using WebSphere Message Broker but not as an ESB
  • You are using Industry formats such as SWIFT, EDI, HL7
  • You are implementing a wide range of messaging and integration patterns
  • You have very complex transformation needs
  • Reliability and extensive transactional support are key requirements
  • To achieve very high-performance with horizontal and vertical scaling
When To Use WebSphere DataPower?
  • Ease of use is a pre-dominant consideration
  • You are transforming between XML-and-XML or XML-and-any other format
  • Your interaction patterns are relatively simple
  • You are using XML-based or WS-Security extensively
  • You require use of advanced Web services standards
  • You need to minimize message latency when adding an ESB layer
  • Your ESB must be in production very quickly
What if you require an ESB from IBM for standards based integration, with complex integration needs, high performance, advanced web service standards and want to move to production quickly?

Tuesday, November 6, 2012

FOTY0001 and log file rotation

A certain person named, or used the alias of, Vivek wrote a few years ago some interesting articles on his blog [ref: OracleD] about Oracle SOA Suite 10g. He experienced many flaws in Oracle OC4J Application Server and recommended the OTN discussion: Oracle BPEL + Oc4j + Jdeveloper = brain damage. Clearly he was not so happy with the features in the 10g version of Oracle SOA Suite...

He also referred to the so-called FOTY0001 errors, which often occur during XRef calls and XSL transformations. More information about this FOTY0001 can be found in the OPMN log files. For example, a typical error is the following:
subLanguageExecutionFault - XPathExecutionError
  XPath expression failed to execute.
  Error while processing xpath expression, the expression is "ora:processXSLT('myTransformation.xsl',bpws:getVariableData('myInputMessage'))", the reason is FOTY0001: type error.
  Please verify the xpath query.

This error occurs, for example, when the transform activity is opened in JDeveloper and closed immediately. There was no time for the messagePart to be loaded, so it will be missing in the code, leading to this FOTY0001 error at runtime. The proper syntax in the code is ora:processXSLT('myTransformation.xsl',bpws:getVariableData('myInputMessage','myMessagePart'))
To prevent this either click cancel in JDeveloper or wait for the message parts to load completely.
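
A quick way to catch this before deployment is to scan the BPEL source for processXSLT calls where getVariableData has only one argument. The Python sketch below does that with a regular expression. The function name is hypothetical, and it assumes the expressions appear in the file with plain single quotes exactly as shown above, so XML-escaped quotes would need an adapted pattern.

```python
import re

# Matches ora:processXSLT('x.xsl', bpws:getVariableData(<arguments>))
CALL = re.compile(
    r"ora:processXSLT\(\s*'[^']*'\s*,\s*bpws:getVariableData\(([^)]*)\)\s*\)"
)


def missing_message_part(bpel_source):
    """Return the processXSLT calls whose getVariableData lacks a message part."""
    suspects = []
    for match in CALL.finditer(bpel_source):
        args = [arg.strip() for arg in match.group(1).split(",") if arg.strip()]
        # A complete call names both the variable and the message part.
        if len(args) < 2:
            suspects.append(match.group(0))
    return suspects


if __name__ == "__main__":
    source = (
        "ora:processXSLT('myTransformation.xsl',"
        "bpws:getVariableData('myInputMessage'))"
    )
    print(missing_message_part(source))
```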

To view the FOTY0001 errors in detail the obvious way is to use Enterprise Manager, but an easier way is to open the log files directly on the filesystem. Depending on the logging level the server log files can quickly become pretty huge. Log rotation can be configured in the opmn configuration as follows: open $SOA_HOME/opmn/conf/opmn.xml and make the following changes:
<ias-component id="soa_group" status="enabled">
  <process-type id="oc4j_soa" module-id="OC4J" status="enabled">
    <module-data>
      <category id="start-parameters">
        <data id="java-options"
          value="-server

Add here the following parameters:
          -Dstdstream.filesize=10 - File size in MegaBytes
          -Dstdstream.filenumber=50 - Number of files
To separate the output and error messages add the following data element with oc4j-options inside the same category:
<data id="oc4j-options" value="-out $ORACLE_HOME/opmn/logs/oc4j_soa.out -err $ORACLE_HOME/opmn/logs/oc4j_soa.err"/>
To view the FOTY0001 error details simply open (in a good text-editor) the *.err file containing the timestamp of the error.

Monday, November 5, 2012

ORABPEL dehydration store purge scripts

By default all BPEL instances (messages) in the SOA Suite are persisted in the internal database, the so-called dehydration store. Good practice is to purge older messages, to avoid database sizing problems and to increase performance.

Unfortunately the original Oracle scripts were not sufficient to do the task, so many (consulting) companies created their own purge scripts. The good news is that Oracle realized the need and offers improved purge scripts, which are available to download from Oracle. Please take a look at the Oracle 10G FMW purge strategy whitepaper [ref: note ID 1286265.1], this document also contains the BPEL database schema.

For example, the second option from the whitepaper, multi-threaded purge can be found at Oracle Note: New BPEL 10g Purge Scripts From 10.1.3.5 MLR#2 [ref: note ID 1110833.1]. This script can be scheduled with e.g., crontab, or any other scheduling tool. The start parameters can be configured like
  • P_OLDER_THAN := sysdate-21; (purge instances older than 21 days)
  • P_ROWNUM := 10000000; (purge up to 10 million instances)
  • P_DOP := 3; (use three threads in parallel)
  • P_CHUNKSIZE := 1000; (commit per 1000 rows)
This last note contains all three scripts mentioned in the whitepaper:
  • SINGLE THREADED LOOPED PURGE PROCEDURE
  • MULTI THREADED LOOPED PURGE PROCEDURE
  • CTAS (Create Table As Select) PROCEDURE
Afterwards if needed you can fine-tune this script a little.
Tips:
  • Create a script for all state instances to purge dev and test environments, modify INSERT INTO temp_cube_instance, change WHERE state >= 5 into WHERE state >= 0
  • When purging all instances older than the configured days, it might be handy to keep the process history a little longer, like one year, modify DELETE FROM process_log, change WHERE event_date < p_older_than into WHERE event_date < SYSDATE - 365
  • When sensor data is used you might want to include this data in the purge script:
    DELETE FROM activity_sensor_values WHERE creation_date < p_older_than;
    DELETE FROM fault_sensor_values WHERE creation_date < p_older_than;
    DELETE FROM variable_sensor_values WHERE creation_date < p_older_than;
  • When using the AIA Error Handler you might want to prevent a Cartesian product by adding a max to the script, because the AIA Error Handler doesn't use unique conversation IDs: modify INSERT INTO temp_invoke_message, change FROM temp_cube_instance tci into FROM (SELECT MAX (cikey) cikey, conversation_id FROM temp_cube_instance GROUP BY conversation_id) tci
For the ORAESB schema the scripts can simply be found at $SOA_HOME/integration/esb/sql/other.
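
To illustrate how the purge parameters work together (an age cut-off, a maximum number of rows and a commit per chunk), here is a minimal Python sketch of a looped purge. It runs against an in-memory SQLite table that is only a simplified stand-in for the dehydration store; the table layout, column names and function names are assumptions for the example and this is not the Oracle script itself.

```python
import sqlite3
from datetime import date, timedelta


def purge_in_chunks(conn, older_than, max_rows, chunk_size):
    """Delete closed instances older than older_than, committing per chunk."""
    purged = 0
    while purged < max_rows:
        limit = min(chunk_size, max_rows - purged)
        cur = conn.execute(
            "DELETE FROM cube_instance WHERE cikey IN ("
            " SELECT cikey FROM cube_instance"
            " WHERE modify_date < ? AND state >= 5 LIMIT ?)",
            (older_than.isoformat(), limit),
        )
        conn.commit()  # keep each transaction small
        if cur.rowcount == 0:
            break  # nothing left to purge
        purged += cur.rowcount
    return purged


def demo_connection(today):
    """Create a stand-in table: 10 old closed, 5 old open, 10 recent closed."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cube_instance ("
        "cikey INTEGER PRIMARY KEY, state INTEGER, modify_date TEXT)"
    )
    old = (today - timedelta(days=30)).isoformat()
    recent = (today - timedelta(days=2)).isoformat()
    rows = [(5, old)] * 10 + [(1, old)] * 5 + [(5, recent)] * 10
    conn.executemany(
        "INSERT INTO cube_instance (state, modify_date) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    today = date.today()
    conn = demo_connection(today)
    cutoff = today - timedelta(days=21)
    print(purge_in_chunks(conn, cutoff, max_rows=10_000_000, chunk_size=4))
```

Committing per chunk keeps the transactions small, which is the point of the looped variants compared to one big delete.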

Tuesday, June 7, 2011

SOA Suite rated by analysts

As my focus moved in the last decade from BizTalk to webMethods to SAP PI and nowadays to Oracle integration, it's interesting for me to see how Oracle's SOA Suite is positioned by analyst firms. Of course the evaluation of Oracle's position is based on the functionality provided by the Oracle Fusion Middleware (OFM) 11gR1 family of products, as the 10g version is at its end, especially 10.1.3.4 [ref: note ID 1128203.1]:
This document intends to inform customers using SOA Suite 10.1.3.4.x components about the forthcoming end of the error correction period after August 31st, 2010
Let's see what Gartner says. Gartner uses magic quadrants to position the competing players in a specific technology market. Based on completeness of vision and ability to execute, Gartner rates vendors as Challengers, Leaders, Visionaries or Niche Players. The main magic quadrant I looked at is Magic Quadrant for Application Infrastructure for Systematic Application Integration Projects, dated October 2010. The same month Gartner also published three other magic quadrants where Oracle 11g was evaluated: Application Infrastructure for Systematic SOA-Style Application Projects, Business Process Management Suites and Shared SOA Interoperability Infrastructure Projects. In all these magic quadrants Oracle is positioned as a leader.

SOA Suite strengths:
  • Oracle Fusion Middleware (OFM) is a large and fast-growing business that positions Oracle as the second-largest application infrastructure middleware vendor in the market. The technology is supported by a vast network of partners, and thousands of organizations in virtually every geography and in multiple vertical industries have successfully deployed the current or previous versions of OFM, in a large number of cases to support large and business-critical application integration scenarios.
  • Synergies with large Oracle DBMSs and packaged application businesses could potentially create plenty of opportunities for cross-selling OFM technologies to support application integration projects.
  • OFM provides a comprehensive, integrated, and feature-rich application infrastructure offering, also providing leading technologies to support application integration requirements.
  • The OFM road map addresses key integration technologies (e.g., more-powerful mapping and transformation and new unified adapter architecture) and emerging requirements (e.g., support for integration of mobile applications).
SOA Suite cautions:
  • The relentless pace of Oracle's acquisitions in the packaged applications and application integration middleware markets (e.g., BEA Systems, Sun Microsystems and AmberPoint) requires further technology integration work, and poses migration and upgrade challenges for preacquisition product users.
  • Despite significant adoption, the OFM 11gR1 product set requires more proof points about its use in complex and large-scale, real-life deployments.
  • Oracle's campaign management features are weak. Oracle doesn't offer integration as a service, although it has partnerships in place for this market.
  • The migration path from prior-generation application integration technologies coming from acquisitions to the strategic Oracle SOA Suite 11gR1 is still onerous for some clients.
  • Some Oracle clients are experiencing licensing and pricing issues when upgrading from previous versions to SOA Suite 11gR1, due to the change in the underlying application server (from Oracle Internet Application Server to Oracle WebLogic Suite) that may imply higher licensing costs.

Let's take a look at another analyst firm, Forrester Research, Inc. Forrester evaluated 15 leading comprehensive integration solution (CIS) vendors against 137 criteria that reflect the requirements of application development and delivery professionals. This resulted in the Forrester Wave: Comprehensive Integration Solutions (CIS) dated November 2010:
Oracle delivers a well-integrated CIS solution. Oracle has been identified as a Leader. The Oracle solution, which enables rapid development of integration-related functionality, includes Oracle SOA Suite and Oracle BPM Suite as its key components. The vendor has the second-largest base of CIS customers (approximately 6,000) and has consistently achieved leadership status in this software category over the past five years. Oracle achieved very strong scores in four out of five product evaluation areas (architecture, integration server, application development framework, and business process management) and achieved an above-average score for its B2B features.

So the Oracle SOA Suite 11g is rated as a leader by both Gartner and Forrester, that's nice. I wonder how the 10g version would have been rated by them...

As I migrated a few integrations from 10g to 11g I can say that such a migration is pretty much doable; the 11g WebLogic server is straightforward and easy to use, at least compared to the 10g version. Just take some time to learn about the MDS (MetaData Service): it's like a version management system that is used all across the platform and that you can use to share common artifacts at design time and runtime. This MDS is a huge improvement, just like the central GUI with the whole integration scenario in one single overview instead of divided over the BPEL and ESB consoles. When migrating the components, plan some time to clean up the source code as well, by deleting obsolete files and applying new insights and best practices that were not known at the time of the original development.

Just one small point of attention when migrating the AIA EBO library (Enterprise Business Object). The AIA EBO customizations (every EBO has a custom xsd file for your own customizations) should be upgrade-safe, that's the whole purpose of these custom xsd's. But some have been extended by Oracle in 11g. A few AIA 11g custom xsd files contain more types than were present in AIA v2.5, thus in a migration the following files should be merged manually: CustomCommonComponents.xsd, CustomCustomerPartyEBO.xsd and CustomSalesOrderEBO.xsd.

As I mentioned Gartner's Magic Quadrant and Forrester's Wave, here they are:

Monday, June 6, 2011

Multi-tenancy

After visiting the Cloud Forum a few weeks ago [ref: cloud forum 2011] I was left with the question what multi-tenancy actually means. Multi-tenancy is a relatively new software architecture principle in the realm of the Software as a Service (SaaS) business model. It allows to make full use of the economy of scale, as multiple customers - "tenants" - share the same application and database instance. All the while, the tenants enjoy a highly configurable application, making it appear that the application is deployed on a dedicated server. The major benefits of multi-tenancy are increased utilization of hardware resources and improved ease of maintenance resulting in lower overall application costs. [ref: Multi-Tenant SaaS Applications: Maintenance Dream or Nightmare? (pdf), a report by Cor-Paul Bezemer and Andy Zaidman].

Basically multi-tenancy is the only proven SaaS delivery architecture that eliminates many of the problems created by the traditional software licensing and upgrade model, so it’s extremely valuable to know whether the cloud provider uses a multi-tenant architecture. Multi-tenancy ensures that every customer is on the same version of the software, the 1 in 0 - 1 - ∞: 0 investments, 1 version, ∞ scalability. See also the following whitepaper [ref: 10 Critical Requirements (pdf)].

As said before, multi-tenancy results in lower overall application costs. Together with the dramatic drop in infrastructure costs over the last five years, price is basically the main reason why everyone is now so convinced Cloud Computing will work.
Factors for driving companies toward the Cloud are
  • Cost-savings - smaller in-house IT staff, less software licensing, less hardware and having an easy and inexpensive way to store a redundant copy of their data.
  • Ease of management - customers no longer have to worry about software upgrades, hardware upgrades, migrations, or any of the management that comes with running a datacenter.
Barriers preventing some companies from moving to the Cloud are
  • Trust - if a company loses its data, it is very likely that this company will go out of business.
  • Losing their IT staff - many companies see their IT staff as a competitive advantage against other competitors in their field.
  • Lack of knowledge about Cloud Computing.

Basically Cloud Computing is not just the next step, but more a paradigm shift. To illustrate this a quote from the book The Big Switch by Nicholas Carr:
A hundred years ago, companies stopped generating their own power with steam engines and dynamos and plugged into the newly built electric grid. The cheap power pumped out by electric utilities didn’t just change how businesses operate. It set off a chain reaction of economic and social transformations that brought the modern world into existence. Today, a similar revolution is under way. Hooked up to the Internet’s global computing grid, massive information-processing plants have begun pumping data and software code into our homes and businesses. This time, it’s computing that’s turning into a utility.
Updated November 6, 2012 with the proper reference to the used definition of multi-tenancy.

Monday, May 30, 2011

Overload leads to timeout errors

Sometimes integration scenarios can receive a bulk load from an application, especially when the source application uses batches as the trigger for sending out messages. This load can lead to the following non-retryable SOAP errors in the ESB:
Fault message:
Failed to enqueue deferred event "oracle.tip.esb.server.dispatch.QueueHandlerException:
Context lookup failed "[CONFIGURATION ERROR] Invalid Destination "Topics/ESB_JAVA_DEFERRED" :
javax.jms.InvalidDestinationException: Looking up java:comp/resource/esbRP/Topics/ESB_JAVA_DEFERRED:
javax.naming.NameNotFoundException: No resource named 'esbRP/Topics/ESB_JAVA_DEFERRED'found.
Please verify configuration of adminobject or the lookup string."
Make sure the topic is mapped to a jndi tree"


This behaviour is described in an Oracle note [ref: note ID 1173584.1] with the following solution: increase the timeout settings. Together with two other Oracle notes describing how to avoid BPEL errors due to adapters response time and what timeout settings in SOA can impact AIA [ref: note ID 1074227.1 and 885114.1] this leads to the following timeout settings:

  • xa_timeout in $SOA_HOME/integration/esb/config/esb_config.ini
    Default: 60; recommended setting: 3600
  • jms_receive_timeout in $SOA_HOME/integration/esb/config/esb_config.ini
    Default: 30; recommended setting: 300
  • Also according to another Oracle note [ref: note ID 752385.1] you might want to set PingCount and PingInterval to 30 in $SOA_HOME/integration/esb/config/esb_config.ini
  • syncMaxWaitTime in $SOA_HOME/bpel/domains/default/config/domain.xml
    Default: 45; recommended setting: 600
  • transaction-timeout in $SOA_HOME/j2ee/oc4j_soa/application-deployments/orabpel/ejb_ob_engine/orion-ejb-jar.xml (change this value for all 6 occurrences of transaction-timeout in this file)
    Default: up to 3000; recommended setting: 3600
  • transaction-timeout in $SOA_HOME/j2ee/[container]/config/transaction-manager.xml (change this value for all containers: e.g. home, designtime and runtime [oc4j_soa])
    Default: from 300 to 3600; recommended setting: 7200
  • Timeout in $SOA_HOME/Apache/Apache/conf/httpd.conf
    Default: 300; recommended setting: 300, this option specifies the amount of time Apache will wait for a GET, POST, PUT request and ACKs on transmissions. The default is 300 (seconds) however this may need to be increased.
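
For the esb_config.ini part of this list, the change can be scripted. The Python sketch below rewrites the four settings in the text of such a file. It assumes the file consists of simple name=value lines, and the function name apply_settings is my own; please verify against your own file before using anything like this.

```python
# Values recommended above for esb_config.ini.
RECOMMENDED = {
    "xa_timeout": "3600",
    "jms_receive_timeout": "300",
    "PingCount": "30",
    "PingInterval": "30",
}


def apply_settings(ini_text, settings):
    """Return ini_text with the given settings updated or appended."""
    remaining = dict(settings)
    out = []
    for line in ini_text.splitlines():
        stripped = line.strip()
        name = stripped.split("=", 1)[0].strip() if "=" in stripped else None
        if name in remaining:
            # Replace the existing value with the recommended one.
            out.append(f"{name}={remaining.pop(name)}")
        else:
            # Keep comments and unrelated settings untouched.
            out.append(line)
    # Append settings that were not present yet.
    out.extend(f"{name}={value}" for name, value in remaining.items())
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    current = "xa_timeout = 60\njms_receive_timeout=30\n"
    print(apply_settings(current, RECOMMENDED), end="")
```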

The settings in the $ORACLE_HOME/integration/esb/config/esb_config.ini file apply to all ESB instances in the Oracle Home. One small bonus feature: if you have many BPEL processes deployed and you experience long waiting times in the ESB Console: put a lazyLoad property in the esb_config.ini file and the waiting time is gone [ref: Bug 7720420].
esb.console.services.lazyLoad.Allowed=true
The orion-application.xml file is a little confusing, as different notes say different things: one note mentions the location in application-deployments as the one that overrides esb_config.ini, another note mentions the location under applications. Just to be safe and consistent, check the orion-application.xml file in the following locations and, if the settings are present and not commented out, apply the values mentioned above for xa_timeout, jms_receive_timeout, PingCount and PingInterval:
$SOA_HOME/j2ee/[ESB_RUNTIME_CONTAINER]/application-deployments/esb-rt/orion-application.xml
$SOA_HOME/j2ee/[ESB_RUNTIME_CONTAINER]/applications/esb-rt/META-INF/orion-application.xml
$SOA_HOME/j2ee/[ESB_DESIGNTIME_CONTAINER]/application-deployments/esb-dt/orion-application.xml
$SOA_HOME/j2ee/[ESB_DESIGNTIME_CONTAINER]/applications/esb-dt/META-INF/orion-application.xml


The above changes also take care of the following errors in the container logfile from $SOA_HOME/opmn/logs:
ORABPEL-05002
ORABPEL-02182
JTA transaction is not present the transaction is not in active state
Message handle error


Instead of increasing the timeout settings for the BPEL processes, a more durable solution would be to throttle the inbound flow. There is an activation agent (bpel.xml) property (since 10.1.3.1), which can be used to control the speed at which the adapter posts messages to BPEL [ref: note ID 1178163.1 or Oracle® SOA Suite Best Practices Guide 10g Release 3 (10.1.3.3.0) E10971-01 December 2007]
...
    <activationAgents>
      <activationAgent partnerLink="JmsDequeuePL" ... >
          <property name="minimumDelayBetweenMessages">1000</property>
      </activationAgent>
    </activationAgents>
  </BPELProcess>
</BPELSuitcase>

This setting ensures that there will be a delay of at least 1000 milliseconds between two consecutive messages being posted to this BPEL process.
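
The effect of such a minimum delay on a burst of messages can be modelled in a few lines of Python. This is only a sketch of the resulting schedule with a hypothetical function name, not of how the adapter implements it.

```python
def schedule_posts(arrival_ms, minimum_delay_ms):
    """Return the times (ms) at which messages get posted, given arrival times."""
    posts = []
    for arrival in arrival_ms:
        # A message is posted on arrival, unless the previous post was too recent.
        earliest = posts[-1] + minimum_delay_ms if posts else arrival
        posts.append(max(arrival, earliest))
    return posts


if __name__ == "__main__":
    # A burst of three messages, followed by a late one.
    print(schedule_posts([0, 10, 20, 5000], 1000))
```

With a delay of 1000 milliseconds the burst is spread out to at most one message per second, while a message that arrives after a quiet period is posted immediately.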

Actually this last best practices guide provides many valuable tips, like
  • Performance tuning guidelines.
  • Answers to frequently asked questions about threading.
  • Answers to frequently asked questions about transactions.
  • How to optimize the JVM heap. The heap size controls the amount of memory the JVM can use.
  • How to relieve the dehydration store by making BPEL processes synchronous.
  • Description of performance persistence parameters in bpel.xml.
  • WSIF binding (localhost) and EJB binding optimization.
  • The relationship among some performance settings
  • The objectives and best practices for creating the BPEL cluster.
  • Increasing the instanceKeyBlockSize to 100,000. Doing so decreases the frequency at which Oracle BPEL Server visits the dehydration store.

Some best practices should be considered during design and development time, like the BPEL parameters that are configured per BPEL component, whether a BPEL process should be synchronous or asynchronous and whether it should participate in a transaction or not. Other best practices are SOA Suite wide and should be part of the overall configuration, like the start-up memory settings.

Updated June 6, 2011 with LazyLoad, PingCount and PingInterval, another location of the orion-application.xml file and jms_receive_timeout to 300.