World Of Taxonomy

Web Vocabularies - schema.org, SKOS, and Related Type Systems

Web Vocabularies

Web vocabularies are the type and concept systems used by AI assistants, search engines, and structured-data crawlers to label real-world entities on the public web. World Of Taxonomy hosts the type-tree subset of these vocabularies (the part that is enumerated and hierarchical, per the Inclusion Policy). Property and relation vocabularies are out of scope.

schema.org

Field Value
System ID schema_org
Total types 926 (rdfs:Class entries with the schema: prefix)
Authority schema.org consortium (Google, Microsoft, Yahoo, Yandex)
License CC BY-SA 3.0
Source https://schema.org/version/latest/schemaorg-current-https.jsonld

schema.org publishes a vocabulary of types and properties that web pages and APIs use to mark up structured data so search engines and AI assistants can understand what a page is about. WoT ingests the type tree only:

  • Rooted at Thing. Every type is a subclass of Thing (transitively).
  • ~926 types organized via rdfs:subClassOf chains (CreativeWork, Person, Place, Action, etc.).
  • ~57 types have multiple parents; WoT keeps the first listed parent as the canonical hierarchy edge and notes the alternative parents in the description.
  • 100% description coverage native to the source (every class has rdfs:comment).

Why this matters for AEO and SEO: schema.org type tags are the single most important signal AI search overviews, ChatGPT browsing, and Google Knowledge Graph use to identify what a page is about. Hosting the full type tree in WoT means downstream products (WoO agent runtime, classification/crosswalk APIs) can cite schema.org URIs natively as the canonical anchor for "what kind of thing is this." It also enables crosswalks between schema.org types and domain classifications: schema:Restaurant to NAICS 7225, schema:Hotel to NAICS 7211, schema:MedicalSpecialty to MeSH, and so on.

What WoT does not host: the ~1,676 schema.org rdf:Property entries (name, address, priceRange, etc.) are property definitions, not classification categories. Per the inclusion policy, pure property vocabularies are out of scope. Consumers who need the full property surface should hit schema.org directly.

Related vocabularies in WoT

System Codes Role
wordnet_nouns 82,115 Princeton WordNet noun hypernym tree (entity.n.01 root); lexical semantic anchor
skos 17 W3C Simple Knowledge Organization System (metamodel for thesauri)
w3c_standards 16 W3C standards index
iab_content 21 IAB Tech Lab content taxonomy for advertising

These sit alongside schema.org as web-adjacent classification systems.

WordNet (Nouns)

WordNet is the canonical lexical-semantic database for English, originally built at Princeton in 1985 and now the de-facto standard for "what is the common-sense type of this concept" in NLP. WoT ingests the noun hypernym tree (82,115 synsets rooted at entity.n.01); verbs / adjectives / adverbs are not in scope for this PR (verbs may follow if downstream demand surfaces).

Where WordNet differs from schema.org: schema.org names types of publishable web entities (Restaurant, Article, MedicalCondition); WordNet names common-sense concept categories (dog.n.01, physician.n.01, wedding.n.01). Both are valid concept anchors at different abstractions:

  • A Wikipedia / encyclopedia article on "dogs" would carry a wordnet_nouns:dog.n.01 anchor for content classification.
  • A pet-store website page about a specific dog would carry a schema:Product anchor for SEO.
  • A retail product feed listing dog food would carry a gs1_gpc:10000091 brick anchor for product identification.

WoT lets downstream products (WoO agent runtime, classification APIs) cite the right anchor for the right surface without picking a winner.

License: WordNet License (BSD-style). Sourced via NLTK's WordNet 3.1 corpus.

What's not yet ingested

The following web-vocabulary candidates have been audited against the inclusion policy and are queued for follow-up PRs:

  • WordNet verbs (~13K synsets) - hypernym tree exists for verbs too; deferred until downstream demand justifies it.
  • DBpedia ontology (~700 classes) - Wikipedia-derived class hierarchy.
  • SUMO / BFO / DOLCE - upper ontologies (small, peripheral relevance).
  • FOAF classes, DCMI Type Vocabulary - small auxiliary vocabularies.

The full Wikidata Q-number space (~100M entities) and the DBpedia instance set (~5M) are entity registries above WoT's size cap and out of scope. They belong in a sister product (a hypothetical "World of Registries" or directly inside WoO).

How to use schema.org from WoT

# Look up a specific schema.org type
GET /api/v1/systems/schema_org/nodes/Restaurant

# Browse children of CreativeWork
GET /api/v1/systems/schema_org/nodes/CreativeWork/children

# Search across the type tree
GET /api/v1/search?q=medical&systems=schema_org

# Crosswalk to a domain classification (when wired)
GET /api/v1/systems/schema_org/nodes/Restaurant/equivalences

The MCP server exposes the same data via get_industry, browse_children, and search_classifications tools, so AI assistants can resolve a schema.org type to its WoT counterparts without leaving the chat.

Related reading