Leveraging Athena with KNIME in a Robust Manner, Part 2

Paul Wisneskey

In my previous blog posting, I introduced an issue we were having with seemingly random intermittent failures using Amazon Web Services’ Athena backed by a large number of data files in S3. The issue was arising because S3 is eventually consistent and occasionally queries were being executed before their underlying data files were fully materialized in S3.

Our solution was to introduce try/catch KNIME nodes with a loop that retries failed queries a few times in case of intermittent failures. To do this we had to do our own flow variable resolution in the Athena SQL queries, since the standard KNIME database SQL executor node does not do variable substitution when the query is configured via a flow variable. My previous blog posting (link to part 1 here) covered how we resolved embedded flow variable references in a Java snippet node.

In this blog posting I am going to cover the workflow we use for the actual query retry logic, so that we retry each query up to a given number of times until it either succeeds or we have to abort the workflow processing. This workflow logic was built by my coworker, Alex To, and is based on a post he found in the KNIME community forums.

The snippet above shows the retry loop surrounding the try/catch block that is used to execute a single SQL statement at a time (the outer loop that executes all statements sequentially is not shown). The Generic Loop Start and Try (Variable Ports) nodes are set with their default configuration. The Database SQL Executor node is simply configured to use a flow variable for the statement to execute as follows:

For the Catch node, we configure it to always set the error variables so that they can be tested to determine whether the SQL execution succeeded or failed:

The core of this technique is the retry logic, which is built into the Java Edit Variable node that immediately follows the Catch node:

Any error that occurs (i.e. the Catch node variables are not set to their default values of “none”) will cause the if statement to trigger. The conditional block uses flow variables to keep count of the number of tries in the currentIteration flow variable (which was initialized to zero in an upstream node). The maximum number of retries is also configured in a previously initialized flow variable, maxTry.

If we have not reached the maximum number of tries, the node will sleep for the number of seconds configured in the waitTimeSecond flow variable before allowing execution to proceed to the next node. However, if we have reached the maximum number of attempts, the workflow will be aborted by throwing an Abort exception. It is important to note that the technique of throwing this exception to stop execution is only supported in KNIME 3.7.2 and later versions.
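As a rough illustration of the decision described above, here is a minimal standalone Java sketch. It is not the actual node code: the class name RetryStep, the Result holder, the AbortException stand-in, and the choice to increment the counter before comparing it to maxTry are all assumptions made for this example. Only the flow variable names (currentIteration, maxTry, waitTimeSecond, isContinue) and the “none” default come from the description above.

```java
/** Sketch of the per-attempt retry decision; names other than the flow variables are hypothetical. */
final class RetryStep {

    /** Hypothetical stand-in for the exception the KNIME snippet throws to abort the workflow. */
    static final class AbortException extends RuntimeException {
        AbortException(String message) {
            super(message);
        }
    }

    /** Values the step would write back to flow variables. */
    static final class Result {
        final int currentIteration;
        final boolean isContinue;

        Result(int currentIteration, boolean isContinue) {
            this.currentIteration = currentIteration;
            this.isContinue = isContinue;
        }
    }

    /**
     * Decides what happens after the Catch node.
     * errorMessage is assumed to be "none" when the SQL statement succeeded.
     */
    static Result afterCatch(String errorMessage, int currentIteration,
                             int maxTry, int waitTimeSecond) {
        if ("none".equals(errorMessage)) {
            // Success: leave the counter alone and stop looping.
            return new Result(currentIteration, false);
        }

        int tries = currentIteration + 1; // assumption: count the attempt that just failed
        if (tries >= maxTry) {
            throw new AbortException(
                "Giving up after " + tries + " attempts: " + errorMessage);
        }

        try {
            // Give the underlying data time to become visible before retrying.
            Thread.sleep(waitTimeSecond * 1000L);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new AbortException("Interrupted while waiting to retry");
        }
        return new Result(tries, true);
    }
}
```

With maxTry set to 3, a first failure would return a Result with currentIteration 1 and isContinue true, while a failure on the third attempt would throw instead of returning.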

The final piece of this workflow snippet is the Variable Loop Condition End node which is configured to use the value of the isContinue variable to decide if the loop needs to execute again. This variable is set in the Java snippet’s retry logic based on whether or not an error was detected from the SQL execution node.
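For readers more used to code than to workflow nodes, the loop as a whole behaves roughly like the plain Java do/while below. This is only an analogy under stated assumptions: RetryLoop, runWithRetry, and sleepQuietly are hypothetical names, the Callable stands in for the Database SQL Executor node, and IllegalStateException stands in for the workflow abort.

```java
import java.util.concurrent.Callable;

/** Plain-Java analogy of the retry loop; all names here are hypothetical. */
final class RetryLoop {

    /** Runs the statement until it succeeds or maxTry attempts have failed. */
    static <T> T runWithRetry(Callable<T> statement, int maxTry, long waitMillis) {
        int currentIteration = 0;
        boolean isContinue;
        T result = null;
        do {
            try {
                result = statement.call(); // the "Try" body
                isContinue = false;        // success ends the loop
            } catch (Exception e) {        // the "Catch" branch
                currentIteration++;
                if (currentIteration >= maxTry) {
                    throw new IllegalStateException(
                        "Giving up after " + currentIteration + " attempts", e);
                }
                sleepQuietly(waitMillis);
                isContinue = true;         // loop end condition: go around again
            }
        } while (isContinue);
        return result;
    }

    /** Sleeps between attempts, restoring the interrupt flag if interrupted. */
    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }
}
```

The same shape would apply to any operation that can fail transiently; only the Callable passed in changes.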

This completes the workflow snippet and shows how to support robust execution of AWS Athena queries using KNIME. Furthermore, it demonstrates the power of the Try/Catch KNIME nodes and the surrounding retry logic, which can be generalized to make KNIME workflows much more robust in any environment where there may be intermittent failures in things like file transfers, resource locking, etc.
