My colleague, Justin Rasband, and I have been working on an exciting new project. We are converting a legacy relational database to a NEO4J graph database. The challenge set before us by the customer was to create a cost-effective, flexible, and scalable ETL solution that will allow the user to move the table attribute data into customized node hierarchies and relationships with metadata at every level.
We started our research by looking at existing software tools. We assumed that many people had done this before, and we could just use an existing software tool to do the work for us. First, we looked at creating cypher scripts. This was quickly dismissed as there were over 660 tables and 7K attributes that need to be converted. Next, we looked at loading directly from Excel but the challenge here was all attributes became node properties and we could not customize the hierarchy. We then tested GraphQL but it too was limited to converting tables to nodes. Finally, we looked at other graph ETL solutions but the ones we tested were immature and costly.
We knew that we needed to think outside the box, or the node circle, to solve this challenge and came up with a novel solution using a customized CSV file and an open-source ETL tool called KNIME that allowed us to meet all our customer requirements.
We started by dumping the data model (table name, column name, data type) out into a CSV file. We then worked with our customer to define the data contained in each node. Each attribute was placed in a separate attribute state node so that we could capture history, a feature the customer has wanted to implement for several years. We then created a node hierarchy and decided where in this new hierarchy each attribute node would be created. Finally, we created a relationship section to address those nodes not in a hierarchy that need to be connected.
Once the CSV file was complete, we went looking for an ETL tool that would meet our needs. We looked at several tools, but the KNIME Analytics Platform (KNIME) stood out as the best solution to our challenge. KNIME is an open-source software platform that allows users to create visual workflows using thousands of intuitive, drag and drop nodes and components found in the KNIME Hub. This platform provided us with the flexibility to leverage the CSV file we just created and create the workflow we needed to meet the customer’s requirements. In fact, we were able to do this by using less than 50 KNIME nodes and components. Please watch our NODES 2021 presentation on Custom Conversion from a Relational Database to NEO4J for an overview of the project and a demo of this easy but powerful solution.
About the Author
Jim McHugh is the Vice President of National Intelligence Service – Emerging Markets Portfolio. Jim is responsible for the delivery of Analytics and Data Management to the Intelligence Community.