BigBear.ai

Property Graphs: Is it a Node, a Relationship, or a Property?

Justin Rasband
May 10, 2022

Introduction

One of the most frequent questions I get from new Database Developers and Administrators just beginning to look into Graph Databases is, “How should I model?” They then list some of their strange and specific data needs, hoping for a Magic Answer that fixes everything. Sometimes they already have ideas about how a Property Graph Database could accommodate those needs. Normally, these ideas are very literal, as they try to use nodes and arrows to express hierarchies, control, actions, etc.

I’ve found the most helpful direction I can offer these young Graph Database disciples is to reframe their question into, “Is it a Node, a Relationship, or a Property?” This question breaks things down into the smallest units that Property Graph databases, like Neo4j (which I will be using for all scripts and visualizations in this post), utilize. These data components aren’t as narrowly defined and rigidly utilized as their counterparts in many relational models, and some of their uses are more abstract. That’s why I focus on Nodes, Relationships, and Properties. Understanding when to use each of them is vital to anyone creating a new graph database.

 

Comprehending the Model

If you are new to Neo4j or Property Graphs, I suggest you learn a little more about these components. Neo4j’s Getting Started page has a lot of helpful information. Now let’s dive into a side-by-side comparison of these components.

Think about the following scenario. Alice works at Soda Co. Inc. Ltd. at their American River branch. She asks, “Justin, how can we model this?” Let’s see what we can identify as best being a node, what should be a relationship, and what should be a property. Here are three ways to look at it:

Left: Property-heavy Single Nodes

In the model on the left, all of Alice’s information is listed as properties on a single node. It looks like it handles all our information! We can certainly extend it! There could be Person nodes for Bob and Carol, each with properties for their own employment. This is a very Property-heavy approach.

Some graph newcomers might like this approach, especially if they are attempting to transition over from a relational database. Look at the mapping! You could have a very simple relational mapping:

  • Everything from the ‘Person’ relational table ends up as a ‘Person’ node in the graph.
  • Each column from the ‘Person’ table becomes a property on the ‘Person’ nodes in the graph.
  • Each row from the ‘Person’ table gets its own ‘Person’ node in the graph.
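
The mapping above can be sketched in a few lines. This is a toy illustration in plain Python, not a Neo4j API; the rows for Bob and Carol, and every value other than Alice's employer and branch, are made up for the example. The property names follow the ones used in the queries in this post.

```python
# Toy illustration only: plain Python data, not a Neo4j API.
# Hypothetical rows from a relational 'Person' table. Bob's and Carol's
# values are invented for this sketch.
person_rows = [
    {"name": "Alice", "employer": "Soda Co. Inc. Ltd.", "employed_location": "American River"},
    {"name": "Bob", "employer": "Soda Co. Inc. Ltd.", "employed_location": "American River"},
    {"name": "Carol", "employer": "Other Co.", "employed_location": "Elsewhere"},
]


def rows_to_nodes(table_name, rows):
    """Table name -> label, each row -> one node, each column -> one property."""
    return [{"labels": {table_name}, "properties": dict(row)} for row in rows]


nodes = rows_to_nodes("Person", person_rows)
relationships = []  # the direct mapping never creates any

print(len(nodes), "nodes,", len(relationships), "relationships")
```

Every row becomes a node, yet the list of relationships stays empty, which is exactly the "collection of unconnected nodes" problem.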

You might get to this point and think, “Congratulations! I’ve just converted my relational database into a graph!” While technically that is correct (although a collection of unconnected nodes being called a ‘graph’ is a taxonomic issue), this approach is not taking advantage of how graph databases gain speed over relational databases. Examine how this model gets queried:

  1. Query for all distinct employers that employ Persons in this graph.
    1. MATCH (p:Person) RETURN DISTINCT p.employer;
  2. Query for all employees of Soda Co. Inc. Ltd.
    1. MATCH (p:Person) WHERE p.employer = 'Soda Co. Inc. Ltd.' RETURN p.name;
  3. Query for all employees who work at the same location as Alice.
    1. MATCH (p:Person) WHERE p.name = 'Alice' MATCH (other:Person) WHERE other.employed_location = p.employed_location RETURN other.name;
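
To make the shape of these three queries concrete, here is a rough Python equivalent over a toy copy of the Left model. It only mirrors what each query asks for; it says nothing about how Neo4j executes them. Bob and Carol and their values are assumptions added for the example.

```python
# Toy stand-in for the Left model: every fact is a property on a Person node.
# Bob's and Carol's values are made up for illustration.
people = [
    {"name": "Alice", "employer": "Soda Co. Inc. Ltd.", "employed_location": "American River"},
    {"name": "Bob", "employer": "Soda Co. Inc. Ltd.", "employed_location": "American River"},
    {"name": "Carol", "employer": "Other Co.", "employed_location": "Elsewhere"},
]

# Query 1: distinct employers (read one property from every Person).
employers = sorted({p["employer"] for p in people})

# Query 2: employees of one company (filter every Person on a property).
soda_staff = [p["name"] for p in people if p["employer"] == "Soda Co. Inc. Ltd."]

# Query 3: find Alice by a property, then compare her location
# against the same property on every Person.
alice_sites = {p["employed_location"] for p in people if p["name"] == "Alice"}
same_site = [p["name"] for p in people if p["employed_location"] in alice_sites]

print(employers)
print(soda_staff)
print(same_site)
```

Note that, like the Cypher version of Query 3, this sketch also returns Alice herself, because nothing excludes her from the second match.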

How will a Graph Database like Neo4j perform on these queries? Overall, about the same as (if not worse than) a relational database would. Neo4j has the fastest access points on labels, then on relationships, and then on properties (this is an important sentence we’ll repeat later). What does this mean for our queries?

  1. Query #1 fares well. It is accessing the graph by the Person label, which means it’s going to be accessing the data in question quickly.
  2. Query #2 is alright, but not great. It’s filtering Person nodes by a property, which might mean it’s a bit slower. We can put an index on that property to help a bit.
  3. Query #3 is going to suffer since it is doing multiple filters based on a property, although it helps that we’re narrowing that ‘other’ node down to a ‘Person’. As soon as we start trying to compare Person nodes to other types of nodes in our data model (for example, there could be nodes labeled `Office Supplies` or `Delivery Truck` or `Country` that could all want some property to equal Alice’s employed_location), it becomes difficult to build the query and slower and slower to execute it.
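
The difference between filtering on an unindexed property and using an index can be sketched with a toy model. This is only an analogy in plain Python, not a description of Neo4j's storage or index internals; the helper names and the data for Bob and Carol are hypothetical.

```python
from collections import defaultdict

# Toy model only; this is not how Neo4j stores or indexes data internally.
people = [
    {"name": "Alice", "employer": "Soda Co. Inc. Ltd.", "employed_location": "American River"},
    {"name": "Bob", "employer": "Soda Co. Inc. Ltd.", "employed_location": "American River"},
    {"name": "Carol", "employer": "Other Co.", "employed_location": "Elsewhere"},
]


def scan_by_employer(nodes, employer):
    """Unindexed filter: every Person node is inspected once."""
    checked = 0
    hits = []
    for node in nodes:
        checked += 1
        if node["employer"] == employer:
            hits.append(node["name"])
    return hits, checked


def build_index(nodes, prop):
    """Map each property value to the names of the nodes that hold it."""
    index = defaultdict(list)
    for node in nodes:
        index[node[prop]].append(node["name"])
    return index


scan_hits, nodes_checked = scan_by_employer(people, "Soda Co. Inc. Ltd.")
employer_index = build_index(people, "employer")
index_hits = employer_index["Soda Co. Inc. Ltd."]

print(scan_hits, "after checking", nodes_checked, "nodes")
print(index_hits, "via one index lookup")
```

Both paths return the same names, but the scan touches every Person while the index goes straight to the matching ones. In Neo4j itself the counterpart would be an index on the employer property of Person nodes; the exact syntax depends on your Neo4j version.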

This approach is getting good performance for one out of three of our sample queries. You might say, “That’s good, right?” If you’re thinking like this, judging the model by the queries it has to serve, it’s a very solid mental approach. So here is the answer to “Should it be a Node, a Relationship, or a Property?” The answer depends on the kinds of queries you want to run against your model. If the only queries our database needs to answer are like #1 (queries that just access nodes via a label and return some properties), this Property-heavy approach works well. As soon as we need to go beyond that, our query performance suffers. Most databases will likely have to support querying beyond that.

Let’s say it once more, loud and clear: “A good model supports the queries it needs to answer with the best performance.”

Center: Relationships and Relationship Properties

Now we will begin examining graph advantages. This model begins to relate data together with a relationship and adds some definition to that relationship. Notice right away how some of the long property names of the Left approach become much simpler and more semantically meaningful here in the Center.

The question is, “Is it the best model?” Based on the guidance that the best model is the one that supports our query needs with top performance, let’s look at how we would query our data of interest on this model.

  1. Query for all distinct employers that employ Persons in this graph.
    1. MATCH (p:Person) -[:WORKS_FOR]-> (c:Company) RETURN DISTINCT c.name;
  2. Query for all employees of Soda Co. Inc. Ltd.
    1. MATCH (c:Company) <-[:WORKS_FOR]- (p:Person) WHERE c.name = 'Soda Co. Inc. Ltd.' RETURN p.name;
  3. Query for all employees who work at the same location as Alice.
    1. MATCH (p:Person) -[r1:WORKS_FOR]-> (c:Company) WHERE p.name = 'Alice' MATCH (other:Person) -[r2:WORKS_FOR]-> (c:Company) WHERE r1.location = r2.location RETURN other.name;
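
A toy version of the Center model shows why Query #2 gets easier while Query #3 does not. The sketch below uses plain Python dictionaries as a stand-in for nodes and relationships; the ids, the `incoming`/`outgoing` helper maps, and the data for Bob and Carol are all assumptions made for illustration.

```python
from collections import defaultdict

# Toy stand-in for the Center model. Data beyond Alice's is hypothetical.
companies = {"c1": {"name": "Soda Co. Inc. Ltd."}, "c2": {"name": "Other Co."}}
people = {"p1": {"name": "Alice"}, "p2": {"name": "Bob"}, "p3": {"name": "Carol"}}

# One (person id, company id, relationship properties) entry per WORKS_FOR.
works_for = [
    ("p1", "c1", {"location": "American River"}),
    ("p2", "c1", {"location": "American River"}),
    ("p3", "c2", {"location": "Elsewhere"}),
]

incoming = defaultdict(list)  # company id -> [(person id, relationship props)]
outgoing = defaultdict(list)  # person id -> [(company id, relationship props)]
for pid, cid, props in works_for:
    incoming[cid].append((pid, props))
    outgoing[pid].append((cid, props))

# Query 2: start at the Company node and follow its incoming relationships.
soda_ids = [cid for cid, c in companies.items() if c["name"] == "Soda Co. Inc. Ltd."]
soda_staff = [people[pid]["name"] for cid in soda_ids for pid, _ in incoming[cid]]

# Query 3: the location lives on the relationship, so after reaching the
# company we still have to compare relationship properties pair by pair.
alice_ids = [pid for pid, p in people.items() if p["name"] == "Alice"]
same_site = [
    people[pid]["name"]
    for alice in alice_ids
    for cid, r1 in outgoing[alice]
    for pid, r2 in incoming[cid]
    if r1["location"] == r2["location"]
]

print(soda_staff)
print(same_site)
```

Query 2 becomes a walk outward from one Company node, but Query 3 still ends in a property comparison between relationships, which is the weakness discussed next.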

How is performance looking here?

  1. Query #1 is still matching on a labeled pattern and getting a single property. It’s probably about as fast as the Left approach.
  2. Query #2 sees some significant gains, as we’re able to treat each Company as its own node and then access the graph from the point of view of a Company.
  3. Query #3 still seems to have some issues; a relationship property is not easy to treat as an access point into our graph.

Graph databases like Neo4j excel at relating data together and matching the patterns that exist between them. Now that Company is a node, it’s a much easier access point than when it was a property (it is always easier to access nodes by labels than by properties!).

The takeaway for this section:

When you think of how you will query your data, think of the data elements that drive access into your graph. Nodes, Labels, and Pattern matches are the best way to access a graph. Access to your graph is going to be the driving force for performance and should be the main deciding factor in whether you model data as nodes + labels, as relationships, or as properties.

Right: Single Node Myopia

You can likely already see the advantage of the model on the Right: Location suddenly becomes an access point into our graph. We’re going to use this section to talk about how far to take things with node-centricity, but before we do, let’s look at that model again and see how to query it.

The only query that this impacts is #3, which now looks like:

  • MATCH (p:Person) -[:WORKS_AT]-> (loc:Location) <-[:WORKS_AT]- (other:Person) WHERE p.name = 'Alice' RETURN other.name;

Look at the performance gain! We now are no longer comparing any properties between separate disjointed nodes/relationships, but we are using a single pattern match to find out everyone who works with Alice. Having Location as an access point into the graph lets us do a lot!
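
As a rough illustration of that single pattern match, here is a toy traversal in plain Python. The ids, the two lookup maps, the `works_with` helper, and the data for Bob and Carol are hypothetical; the sketch only mimics the shape of the query, not Neo4j's execution.

```python
from collections import defaultdict

# Toy stand-in for the Right model: Location is its own node.
# Everything except Alice and her branch is invented for this sketch.
people = {"p1": "Alice", "p2": "Bob", "p3": "Carol"}
locations = {"l1": "American River", "l2": "Elsewhere"}
works_at = [("p1", "l1"), ("p2", "l1"), ("p3", "l2")]  # (person id, location id)

person_to_locations = defaultdict(list)
location_to_people = defaultdict(list)
for pid, lid in works_at:
    person_to_locations[pid].append(lid)
    location_to_people[lid].append(pid)


def works_with(name):
    """Person -> Location -> other Persons, with no property comparisons."""
    found = []
    for pid in (p for p, n in people.items() if n == name):
        for lid in person_to_locations[pid]:
            for other in location_to_people[lid]:
                if other != pid:  # do not walk back over the same relationship
                    found.append(people[other])
    return found


print(works_with("Alice"))
```

The only property lookup left is finding Alice by name; everything after that is hopping across relationships. The `other != pid` check mirrors the fact that Cypher does not reuse the same relationship twice within one MATCH pattern, so Alice herself is not returned by the query above.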

Some readers may think, “Well, why not make everything its own node?” It’s a fair question. To illustrate where it leads, examine the following thought process:

If companies had four facts about them, there could be a node for the company name, a node for the company logo, a node for the company type, and a node for the company’s founding year. Consider the following models:

This shows two ways to model Companies: one where the Company is a single node laden with properties, and one where all properties are spread out into their own nodes. Now that we’ve looked at different ways of modeling, we can analyze these models more critically:

  • The Left Company model may seem as limited as the Left Person model was, but consider it as part of a larger graph where People are connecting to it. Let’s then take what we learned from the Left Person model and apply it here. If we never have to query our database to filter on or do equals logic between logos, types, or founding years, the Left Company model seems like a perfectly adequate way to model a Company. Some factors, like logo or year, seem very unlikely to be access points for queries, and therefore make good properties.
  • The Right Company Model provides access points to company names, types, years, or logos, but that may not be what we want.
    • How frequently are we going to be filtering or starting access into a company by its logo? One might imagine that’s not going to be frequent. Maybe we could walk the logo back into a property?
    • When looking at nodes for Company type and Founding Year, we may start to imagine how nice it could be to start queries on those. Saying, “Get me all Companies in a specific sector/type,” becomes a lot easier if Company Type is its own node. The same is true for a year, but how often will we need to ask, “Can you get me all companies that were founded in a specific year?” If Company Founding Year is a node, we’ll need to be ready to have one node for every year in our database as well (certainly a size consideration).
  • The Left Company model doesn’t allow us to further define logo, type, or founding year. If we need to add more metadata or connections to any of those data elements, treating them as properties is not sufficient.
  • One issue with the Right Company model is that whenever we query for company info, we need to write our queries carefully to make sure we don’t miss any information that should come back. Those queries will also be longer.
  • One advantage of the Right Company model is that it can support multiple logos and company types. The Left Company model couldn’t handle that without converting those values into Lists, which are among the least-performant kinds of graph values to filter on.
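
The trade-off between the two Company models can be sketched with toy data. Everything here is an assumption made for illustration: the logo file names, company types, founding years, and the idea of a company-to-type link are invented, and plain Python structures stand in for nodes and relationships.

```python
from collections import defaultdict

# Left Company model: one node carrying four properties. All values invented.
left_companies = [
    {"name": "Soda Co. Inc. Ltd.", "logo": "soda.png", "type": "Beverage", "founded": 1990},
    {"name": "Other Co.", "logo": "other.png", "type": "Logistics", "founded": 2005},
]

# Right Company model (type only): each type is its own node, linked to
# companies. One company can link to several type nodes.
company_has_type = [
    ("Soda Co. Inc. Ltd.", "Beverage"),
    ("Soda Co. Inc. Ltd.", "Retail"),
    ("Other Co.", "Logistics"),
]
type_to_companies = defaultdict(list)
for company, company_type in company_has_type:
    type_to_companies[company_type].append(company)

# Left: "all Beverage companies" filters a property on every Company node.
left_beverage = [c["name"] for c in left_companies if c["type"] == "Beverage"]

# Right: start at the type node and follow its links.
right_beverage = type_to_companies["Beverage"]
right_retail = type_to_companies["Retail"]

print(left_beverage, right_beverage, right_retail)
```

The Left model answers the question by filtering, and gives each company room for only one type unless the value becomes a list. The Right model starts from the type and allows several types per company, at the cost of having to gather each company's facts from several places.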

These kinds of thoughts begin to reflect an answer to the question: What should be a Property? The best answer I can come up with is: Any data element that isn’t frequently used to access the graph directly, has no internal object complexity/multiplicity, and/or needs to always return with the rest of the data in the node is a good candidate for a Property.

As a side note, if you really enjoyed the idea of separating every fact about a Company into its own node, I recommend you check out the other major type of graph database, the RDF graph model. RDF models attempt to disassemble data facts very much like that Right Company model while still being performant.

 

Conclusion

This blog looked at the fundamental pieces of a Property Graph database model and the drivers one should consider when employing them in a model. Overall, it demonstrated:

  • Nodes and Labels are the best way to access a graph when querying and represent distinct entities.
  • Relationships are powerful ways to access different types of nodes at the same time, move through the graph, and filter data. Yet relationship properties can be difficult to utilize for other logic (such as attempting to find equality among different relationships).
  • Properties are suited for data that isn’t a point of access directly, can’t be broken down further or duplicated, and/or needs to be considered a component of a Node.

These principles, internalized through practice and application, help graph modelers understand how to model their data. If you’re just starting to build a graph database, start by listing as many of the questions you want your database to answer as you can, then consider how those queries will be crafted and how they will perform.

Join me on future blog posts where we take the next steps beyond these fundamentals, where we look at Intermediate/Abstract nodes, Multiple-Relationships and Hierarchies, and beyond.

 

About the Author

Justin Rasband is a Full-Stack Developer at BigBear.ai. He has six years of experience in API development, JavaScript/HTML5 coding, and Neo4j data design.

Posted in Artificial Intelligence, Knowledge Graphs, Machine Learning.