Taxonomies, Ontologies, Semantic Models & Knowledge Graphs

Jim McHugh

Several people have recently asked me about taxonomies, ontologies, and semantic models and why they are important. In this blog post, I hope to show you why these are foundational steps to Knowledge Graphs, and by extension, to AI/ML solutions.

Taxonomies

A taxonomy is a hierarchical framework, or schema, for the organization of organisms, inanimate objects, events, and/or concepts. We see taxonomies daily as humans, and we don’t give them much thought. Taxonomies are the facets, filters, and search suggestions commonly seen on modern websites.

For example, books can be categorized as fiction and nonfiction at a high level. That may work in some instances, but in most cases that grouping is too coarse, so we further subdivide each high-level category until we reach an appropriate level of detail. Figure 1 shows an example of a taxonomy for books.

Figure 1. – Book Taxonomy
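To make the idea concrete, here is a minimal sketch of a book taxonomy as a nested Python dictionary, with a helper that returns the path from the root to a category. The category names and the find_path helper are my own illustrative assumptions, not the contents of Figure 1.

```python
# Hypothetical book taxonomy; the category names are illustrative only.
BOOK_TAXONOMY = {
    "Books": {
        "Fiction": {"Mystery": {}, "Science Fiction": {}},
        "Nonfiction": {"Biography": {}, "History": {}},
    }
}


def find_path(tree, target, trail=()):
    """Return the categories from the root down to target, or None if absent."""
    for name, children in tree.items():
        current = trail + (name,)
        if name == target:
            return list(current)
        found = find_path(children, target, current)
        if found is not None:
            return found
    return None


print(find_path(BOOK_TAXONOMY, "Biography"))
# prints ['Books', 'Nonfiction', 'Biography']
```

The path is what a website would show as a breadcrumb or use as a facet filter. Moving "Biography" elsewhere only changes the data, which is why the right question is whether the grouping meets your needs.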

Another taxonomy example is how you sort your documents on your computer. For example, some may choose to start with a subject and then sub-divide by year, while others may do the opposite.

There is no absolute right or wrong with taxonomies, just degrees of appropriateness. The most important question to ask when creating a taxonomy is, “Does this hierarchical grouping meet my needs?”

Ontologies

According to Wikipedia, an ontology “encompasses a representation, formal naming, and definition of the categories, properties, and relations between the concepts, data, and entities that substantiate one, many, or all domains of discourse.” In other words, ontologies allow us to organize the jargon of a subject area into a controlled vocabulary, thereby decreasing complexity and confusion. Without ontologies, you have no frame of reference, and understanding is lost. As Robert Engles states in his blog post On the role and the whatabouts of Ontology, ontologies are “essential in modern architectural patterns to ensure data quality, governance, findability, interoperability, accessibility, and reusability.”

For example, an ontology will allow one to associate the Book taxonomy with the Customer taxonomy via relationships.

An ontology is more challenging to create than a taxonomy because it needs to capture the interrelationships between business objects/concepts by encapsulating the language and terminology of the business area you are modeling.
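As a rough sketch of what those interrelationships look like, an ontology can be written down as a list of subject, predicate, object statements between concepts. The concept names, the predicates, and the relations_between helper below are hypothetical and chosen only to connect a Book concept to a Customer concept.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """One statement in the ontology: subject --predicate--> obj."""
    subject: str
    predicate: str
    obj: str


# Hypothetical relationships between concepts (not actual data records).
ONTOLOGY = [
    Relation("Customer", "purchases", "Book"),
    Relation("Customer", "reviews", "Book"),
    Relation("Book", "belongs_to", "Category"),
]


def relations_between(ontology, subject, obj):
    """List the predicates that link one concept to another."""
    return [r.predicate for r in ontology if r.subject == subject and r.obj == obj]


print(relations_between(ONTOLOGY, "Customer", "Book"))
# prints ['purchases', 'reviews']
```

Note that these statements describe the kinds of things in the business area and how they can relate. No individual customer or book appears yet.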

Figure 2. – Ontologies (Stephen DeAngelis, Ontology Power of Understanding)

A properly created ontology will expose the understanding of how the elements in the model relate to each other. Based on this understanding, one can infer intent via the relationships. A virtual assistant like Alexa uses these relationships, together with phrases and synonyms of those phrases, to determine the user’s intention.

Semantic Data Model

A Semantic Data Model is a method of organizing data that reflects the basic meaning of data items and the relationships among them. An example of a semantic model is a conceptual data model. This model has enough information to convey meaning to someone who may not know or understand the subject area.

Semantic models contain the ontology and the factual knowledge in one large, combined model, with definitions added to concepts, links, and facts based on business needs.

Knowledge Graphs

Knowledge graphs are models that instantiate the taxonomy and ontology via a semantic model using the actual data and associated relationships. These graphs are the foundation for us to realize the promise of Artificial Intelligence (AI) and Machine Learning (ML) capabilities by capturing and exposing the relationships between nodes. These relationships contain data and metadata about the relationship between nodes, which is very different from the inferred relationships between columns of data in a relational database.
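The difference from a relational table is easier to see in a small example. The sketch below is a toy in-memory graph, not a real graph database: the KnowledgeGraph class, the node identifiers, and the metadata keys (date, channel) are all assumptions made for illustration. Each edge carries its own data about the relationship.

```python
class KnowledgeGraph:
    """Toy directed graph whose edges carry a relation name and metadata."""

    def __init__(self):
        self._out = {}  # source node -> list of (relation, target, metadata)

    def add_edge(self, source, relation, target, **metadata):
        self._out.setdefault(source, []).append((relation, target, metadata))

    def neighbors(self, source, relation):
        """Return (target, metadata) pairs reachable from source via relation."""
        return [(t, m) for r, t, m in self._out.get(source, []) if r == relation]


graph = KnowledgeGraph()
# Hypothetical facts: actual instances, unlike the concept-level ontology.
graph.add_edge("customer:alice", "purchases", "book:dune",
               date="2023-01-15", channel="web")
graph.add_edge("customer:alice", "purchases", "book:emma",
               date="2023-02-02", channel="store")
graph.add_edge("book:dune", "belongs_to", "category:science-fiction")

for target, meta in graph.neighbors("customer:alice", "purchases"):
    print(target, meta["channel"])
# prints:
# book:dune web
# book:emma store
```

Here the "purchases" relationship is a first-class item with its own attributes, rather than something a reader has to infer from a foreign key column.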

This relationship data and metadata is critical to successful Machine Learning (ML) solutions. By creating an understanding of the relationships between the nodes, we can progressively improve the data model without creating and injecting new code. These incremental improvements to the knowledge graph are critical to implementing Artificial Intelligence (AI) because this mimics how the human brain can reassess a concept or situation based on new data and derive a course correction.
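A deliberately simple sketch of that point follows. It is a plain set lookup, not a machine learning model, and the also_bought function and the customer and book names are hypothetical. The same function gives a different answer once a new relationship is added to the data, with no change to the code.

```python
def also_bought(purchases, customer):
    """Books bought by customers who share at least one purchase with customer."""
    mine = purchases.get(customer, set())
    suggestions = set()
    for other, theirs in purchases.items():
        if other != customer and mine & theirs:
            suggestions |= theirs - mine
    return sorted(suggestions)


# Hypothetical purchase relationships: customer -> set of books.
purchases = {
    "alice": {"Dune"},
    "bob": {"Dune", "Neuromancer"},
    "carol": {"Emma", "Persuasion"},
}

before = also_bought(purchases, "alice")
purchases["alice"].add("Emma")  # a new relationship arrives; the code is unchanged
after = also_bought(purchases, "alice")

print(before)  # prints ['Neuromancer']
print(after)   # prints ['Neuromancer', 'Persuasion']
```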

AI/ML solutions like this already exist and are used every day. Fraud detection solutions, virtual assistant tools like Alexa, Netflix recommendations, and the “someone you may know” features on Facebook or LinkedIn use AI/ML, built on taxonomies, ontologies, and semantic data models.

Figure 3. – Fraud Detection Knowledge Graph

In conclusion, don’t try to skip steps. Instead, start any AI/ML solution by ensuring you have laid a good foundation of understanding via taxonomies and ontologies that will create a robust and flexible knowledge graph.

About the Author

Jim McHugh is the Vice President of National Intelligence Service – Emerging Markets Portfolio. Jim is responsible for the delivery of Analytics and Data Management to the Intelligence Community.
