Key Concepts

Core Data Architecture

The CMDB uses Single-Table Inheritance with Partitioning to manage Configuration Items (CIs) and their relationships.

  • Configuration Items (CIs): Stored in cmdb_base_element, they combine strict core attributes (Name, Status, Class) with fully flexible, schema-less extended attributes stored in a JSONB column.
  • Directional Relationships: Connections between CIs (e.g., RUNS_ON, DEPENDENCY) are tracked in cmdb_base_relationship, enforcing strict Source-to-Target cardinality.
  • Ontology & Schema Dictionary: A native management system for CI Classes, custom attributes, and a taxonomy (Category, Type, Item, Model) to standardise infrastructure definitions.
  • Dataset Isolation: Data is segregated into datasets (e.g., PRODUCTION, SANDBOX) via PostgreSQL table partitioning. This ensures massive read/write scalability and prevents data corruption during ingestion.

CI Class Inheritance

All CIs derive from a foundational hierarchy based on the DMTF CIM Schema. Below are the core properties of the base classes.

Foundation Classes (ManagedElement, LogicalElement, etc.)

Property Data Type Description
InstanceID string Opaque unique identifier (OrgID:LocalID pattern)
Caption string Short textual description (max 64 chars)
Description string Detailed textual description
ElementName string User-friendly name for the instance
InstallDate datetime When the object was installed
Name string Label by which the object is known (max 256 chars)
Status string Current status: OK, Error, Degraded, Pred Fail…

PhysicalElement

Property Data Type Description
Manufacturer string Organization that produced the device
Model string Name by which the physical element is known
SerialNumber string Manufacturer-allocated serial number
Tag string Uniquely identifies the physical element
Version string Version of the physical element

ComputerSystem

Property Data Type Description
NameFormat string How the ComputerSystem Name is generated (e.g., IP, UUID)
Dedicated[] uint16 array Purpose(s) of the system (Switch, Firewall, …)
ResetCapability uint16 Hardware reset capability

UnitaryComputerSystem

Property Data Type Description
InitialLoadInfo[] string array Data to locate boot device
LastLoadInfo string Identifies device that last loaded the OS
PowerManagementCapabilities[] uint16 array Power management capabilities
PowerState uint16 Current power state (Full Power, Standby, …)

The full DMTF CIM schema (v2.38) is used as a reference. MOF files are located under the Schemas/CIM238/DMTF/ tree.

Reconciliation & Merge Engine

The Merge Engine is the “brain” of the CMDB, ensuring data from multiple, often conflicting sources is deduplicated and merged into a single “Golden Record”. It operates in three distinct phases.

Phase 1 – Identification (Match)

The engine uses configurable Identification Rules (stored in cmdb_ident_rules) evaluated in order of priority.

  • If a match is found (e.g., on unique_identifier, serial_number + model, or name + class_id), the staging CI is linked to the production CI via a shared reconciliation_identity (UUID).
  • If no match is found, the CI is treated as new and a fresh Golden Record is created.

Phase 2 – Dynamic Precedence (Win)

Conflicts are resolved using a granular hierarchy:

  1. Dataset Level (Global): e.g., AWS_DISCOVERY (Priority 80) beats MANUAL_IMPORT (50).
  2. Class Level (Override): e.g., for NETWORKROUTER, CISCO_API (90) may override the global default.
  3. Attribute Level (Micro-Override): A specific attribute, like cpu_count, can be owned by VMWARE_DISCOVERY (100) regardless of any other priority.

The engine builds a “Frankenstein” record using the most trusted source for each attribute.

Phase 3 – Time-Decay Aging (Freshness)

To prevent stale data from a once-trusted source from permanently locking attributes, the engine applies Time Decay.

When enabled (enable_time_decay = true), the effective priority of a source is calculated as:

  1. Days Old = Current Date – Last Update Timestamp

  2. If Days Old <= grace_period_days, Effective Priority = Original Priority.

  3. If Days Old > grace_period_days:

    Penalty = (Days Old - grace_period_days) * decay_rate_per_day

  4. Effective Priority = MAX(Calculated Priority, priority_floor)

Example Scenario

Parameter Value
Original Priority 800
Grace Period 14 days
Decay Rate 10/day
Priority Floor 400
Incoming Source Priority 500
Day Days Past Grace Penalty Effective Priority Overwrite by Source (500)?
1–14 0 0 800 No
24 10 100 700 No
45 31 310 490 Yes

This mechanism allows fresh data from a lower-priority source to naturally take over when the authoritative source goes silent.

Graph Traversal & Service Mapping

PostgreSQL is used as a graph database via Adjacency List modelling and Recursive Common Table Expressions (CTEs).

  • Nodes: Configuration Items stored in cmdb_base_element.
  • Edges: Directional relationships in cmdb_base_relationship with attributes like source_instance_id, target_instance_id, class_id (verb), and dataset_id (always ‘PRODUCTION’).

Impact Analysis (Top-Down)

When a server fails, this walk finds everything that depends on it.

WITH RECURSIVE graph_walk(root_id, parent_id, child_id, depth) AS (
    SELECT
        r.source_instance_id,
        r.source_instance_id,
        r.target_instance_id,
        1
    FROM cmdb_base_relationship r
    JOIN cmdb_base_element ci ON ci.instance_id = r.source_instance_id
    WHERE ci.name ILIKE '%SRV-01%'
      AND ci.class_id = 'ComputerSystem'
      AND ci.dataset_id = 'PRODUCTION'
      AND ci.is_deleted = false
    UNION
    SELECT
        gw.root_id,
        r.source_instance_id,
        r.target_instance_id,
        gw.depth + 1
    FROM cmdb_base_relationship r
    JOIN graph_walk gw ON r.source_instance_id = gw.child_id
    WHERE gw.depth < 6
      AND r.dataset_id = 'PRODUCTION'
)
SELECT DISTINCT ON (gw.root_id, gw.child_id)
    gw.depth,
    gw.child_id as instance_id,
    ci.name,
    ci.class_id,
    ci.ci_status
FROM graph_walk gw
JOIN cmdb_base_element ci ON gw.child_id = ci.instance_id
WHERE ci.is_deleted = false
ORDER BY gw.root_id, gw.child_id, gw.depth ASC;

Dependency Mapping (Bottom-Up)

When an application is slow, this walk identifies the foundational infrastructure supporting it.

WITH RECURSIVE graph_walk(root_id, parent_id, child_id, depth) AS (
    SELECT
        r.target_instance_id,
        r.target_instance_id,
        r.source_instance_id,
        1
    FROM cmdb_base_relationship r
    JOIN cmdb_base_element ci ON ci.instance_id = r.target_instance_id
    WHERE ci.name ILIKE '%DIG-BKS-RAC%'
      AND ci.class_id = 'BusinessService'
      AND ci.dataset_id = 'PRODUCTION'
      AND ci.is_deleted = false
    UNION
    SELECT
        gw.root_id,
        r.target_instance_id,
        r.source_instance_id,
        gw.depth + 1
    FROM cmdb_base_relationship r
    JOIN graph_walk gw ON r.target_instance_id = gw.child_id
    WHERE gw.depth < 6
      AND r.dataset_id = 'PRODUCTION'
)
SELECT DISTINCT ON (gw.root_id, gw.child_id)
    gw.depth,
    gw.child_id as instance_id,
    ci.name,
    ci.class_id,
    ci.ci_status
FROM graph_walk gw
JOIN cmdb_base_element ci ON gw.child_id = ci.instance_id
WHERE ci.is_deleted = false
  AND ci.dataset_id = 'PRODUCTION'
ORDER BY gw.root_id, gw.child_id, gw.depth ASC;

Architectural Safeguards:

  • Depth limiter (``gw.depth < 6``): Prevents infinite loops in cyclic topologies.
  • DISTINCT ON: Returns only the shortest path to each CI.
  • Outer-join metadata: Class names and statuses are fetched outside the recursive loop to avoid query plan degradation.

Security, Identity & IAM

The IAM engine is built on four pillars: Hybrid Authentication, Stateless Tokenization, Role-Based Access Control (RBAC), and Graph-Based Row-Level Security.

Hybrid Authentication

  • Local Auth: Passwords are verified against Bcrypt hashes stored in sys_user.
  • LDAP/AD Auth: If the user is flagged as LDAP, the system binds to the enterprise directory over LDAPS, validates credentials, and dynamically syncs group memberships.

LDAP Configuration (``sys_config.ldap_config``)

{
  "enabled": true,
  "url": "ldaps://ad.company.internal:636",
  "bind_dn": "cn=cmdb_service_account,ou=Services,dc=company,dc=internal",
  "bind_password": "YourSecurePassword",
  "user_base_dn": "ou=Users,dc=company,dc=internal",
  "user_search_filter": "(sAMAccountName={0})",
  "group_base_dn": "ou=Groups,dc=company,dc=internal",
  "group_search_filter": "(member={0})"
}
  • User Base DN restricts login scans to a specific organisational unit.
  • Group Base DN tells the backend where to search for security roles.

Just-In-Time (JIT) Provisioning: On first login, a “Shadow Account” is created in sys_user (with an empty password hash and auth_source = LDAP). Group memberships in sys_user_group_member are wiped and re-inserted at every login, ensuring real-time synchronisation with the directory.

Stateless JWT Sessions

After successful authentication, the server returns a signed JWT containing:

{
  "sub": "jdoe",
  "role": "editor",
  "groups": ["GRP-HR-Admins", "GRP-IT-Viewers"],
  "exp": 1711814400
}

Because the token carries all required claims, backend nodes never need to query the database for the user’s identity.

Role-Based Access Control (RBAC)

System roles are stored in the JWT and enforced on every request.

Role Capabilities
admin Full control: manage config, rules, users; bypasses all Row-Level Security.
editor CRUD on CIs and relationships if Row-Level access permits.
reader Read-only access to authorised CIs, audit logs, and graphs.

Multi-Tenant Row-Level Security

Access to CIs is granted by linking a user’s LDAP/Local group to an Organisation CI via the sys_group_org_access table.

To avoid calculating graph dependencies on every query, a Visibility Caching Engine runs asynchronously:

  1. When a group is linked to an Organisation, a background worker performs a top-down graph walk.
  2. Every CI discovered beneath the Organisation is written to cache_group_visibility along with the group name.
  3. All API queries are transparently joined with this cache:
SELECT ci.* FROM cmdb_base_element ci
JOIN cache_group_visibility v ON ci.instance_id = v.instance_id
WHERE ci.class_id = 'ComputerSystem'
  AND v.group_name = 'GRP-HR-Admins';

This delivers sub-millisecond responses while guaranteeing strict multi-tenant isolation.

Artificial Intelligence (AIOps)

A dual-model cognitive layer translates natural language into database queries and visual summaries.

  • Heavy Engine (NL2SQL): Typically a cloud-based reasoning model (e.g., DeepSeek, GPT-4). It receives the user’s question and a dynamically recompiled system prompt describing the current CMDB schema. It generates a read-only SQL query.
  • Fast Engine (Summarisation & Visualisation): A local model (e.g., Ollama Gemma) that consumes raw database results and produces a human-readable summary plus a widget configuration (SCALAR, PIE_CHART, BAR_CHART, DATA_TABLE). This keeps sensitive data on-premises.

Dynamic Schema Awareness

Whenever a new CI class or custom attribute is added, a cmdb.schema.updated event is broadcast. Every AI Service node intercepts it, rebuilds the system prompt, and stores the new version in sys_ai_prompt_history. The AI is therefore always aware of the latest ontology without a restart.

Example NL2SQL Interaction

User: “Show me a breakdown of all Physical Servers running in Production that currently have Critical vulnerabilities.”

Generated SQL:

SELECT ci.instance_id, ci.name, ci.ci_status, ci.vulnerability, ci.vulnerability_description, ci.class_id, ci.category, ci.type, ci.item, ci.model
FROM cmdb_base_element ci
WHERE ci.dataset_id = 'PRODUCTION'
 AND ci.is_deleted = false
 AND ci.class_id = 'COMPUTERSYSTEM'
 AND ci.vulnerability = 'Yes'
ORDER BY ci.name
LIMIT 500;

AI Response (JSON sent to frontend):

 {
      ...
      "summary": "There is 1 server running in Production that currently has vulnerabilities.",
      ...
      "data": [
              {
                      "instance_id": "08b78964-f4ad-4fc8-bd0e-86b9c15558a7",
                      "name": "Test1",
                      "class_id": "COMPUTERSYSTEM",
                      "type": "Container",
                      "item": "",
                      "model": "AWS",
                      "ci_status": "UP",
                      "vulnerability": "Yes",
                      "vulnerability_description": "Vulnerability CVE-12345"
              }
      ]
}

Stateful Chat Memory

Conversations are stored in sys_ai_chat_sessions and sys_ai_chat_messages.

Event-Driven Automation (Trigger Engine)

The CMDB acts as a real-time automation orchestrator. It listens to the clustered Event Bus for CI/Relationship changes, evaluates rules, and dispatches actions.

Event Bus Architecture

Whenever a CI is created, updated, or deleted (even by background merges), a JSON message is published on the Event Bus:

{
  "operation": "UPDATE",
  "target_type": "CI",
  "class_id": "COMPUTERSYSTEM",
  "instance_id": "ce5de307-87ac-4fb1-9353-9608c19e72c5",
  "payload": {
    "source": "merge-engine",
    "changes": {
      "vulnerability": { "old": "No", "new": "Yes" },
      "ci_status": { "old": "UP", "new": "MAINTENANCE" }
    }
  }
}

The TriggerEngine consumes these events and evaluates matching rules.

Rule Evaluation

Rules are defined per CI class and contain condition arrays. Supported operators:

  • EQUALS / NOT_EQUALS
  • CHANGED
  • CHANGED_TO
  • CONTAINS / REGEX

If all conditions evaluate to TRUE, the configured actions are executed.

Action Dispatcher

The engine supports multiple native action types:

Type Description
WEBHOOK POST/PUT/PATCH JSON payloads to external REST APIs (e.g., Ansible Tower, ServiceNow). Variable injection (${instance_id}) is supported.
SLACK_TEAMS Posts formatted, colour-coded messages to collaboration channels.
EMAIL Sends SMTP emails using the Vert.x Mail Client. Recipients can be dynamically pulled from CI attributes.

Example Ransomware Containment Workflow:

  1. A Tenable Nessus scan updates a server’s vulnerability to “Yes” in SANDBOX.
  2. The Merge Engine publishes an UPDATE event.
  3. The trigger rule vulnerability CHANGED_TO "Yes" fires.
  4. A Slack alert is sent and a webhook posts the instance_id to Ansible Tower.
  5. Ansible isolates the server from the network.

All of this happens in milliseconds, outside the user’s request context.

Audit, Compliance & History

The CMDB captures an immutable, millisecond-precise record of every meaningful change.

Attribute-Level Diff Engine

Instead of copying an entire CI row for every update, the engine calculates a precise JSON diff. It ignores ephemeral fields like last_seen_date and only records real changes.

"changes": {
  "ci_status": { "old": "MAINTENANCE", "new": "UP" },
  "attr_ram_gb": { "old": "16", "new": "32" }
}

If no data actually changes, the audit entry is aborted.

Relational Neighborhood Snapshots

At the moment an audit record is written, the engine captures a snapshot of all active inbound and outbound relationships (via the reconciliation_identity), together with the health status of neighbouring CIs. This “Topology Time Machine” allows engineers to see exactly what depended on a failed CI at the time of the incident, without complex temporal joins.

Example snapshot payload:

"neighborhood_snapshot": [
  {
    "relationship_id": "f50cd8f0-...",
    "class_id": "RUNS_ON",
    "direction": "INBOUND",
    "related_ci_name": "Payroll-App-Prod",
    "related_ci_status": "UP"
  }
]

Immutable Metadata

Every audit record carries:

  • audit_id (UUID)
  • target_instance_id, target_type (CI or RELATION)
  • operation (CREATE, UPDATE, DELETE)
  • changed_by (username or service account)
  • dataset_id
  • change_time (timestamp)

Automated Data Retention

A background watchdog (runLogRetention) periodically purges aged records from:

  • sys_access_log (access logins)
  • sys_ai_chat_sessions (AI conversations, cascading to messages)
  • sys_activity_logs (background job histories)

Retention periods are configurable in sys_config.

Enterprise High Availability & Operations

The platform is engineered for Active-Active High Availability with self-healing capabilities.

Hazelcast Clustering

Vert.x nodes discover each other (via multicast or explicit TCP/IP) and form a unified mesh. This enables:

  • Distributed Event Bus: A trigger event generated on Node A can be consumed by Node B.
  • Distributed Caching: The IAM Visibility Cache is shared across all cluster members.

If a node crashes, the remaining nodes instantly absorb its workload.

Self-Healing Watchdogs

Long-running jobs periodically write heartbeats. A watchdog sweeps the execution tables:

  • If a job is RUNNING but its heartbeat is older than staleJobTimeoutMinutes (default 30), it is forcefully killed and set to FAILED.
  • Orphaned queues (jobs stuck for 24+ hours) are cleaned up, allowing the engine to resume processing.

Zero-Downtime Hot Reloading

Configuration changes (e.g., toggling audit logging, adding a new CI class) are broadcast via the clustered Event Bus. All nodes intercept the message and silently reload their in-memory settings without dropping requests.

Cryptographic Licensing

The platform expects a JWT-signed license key. Every 60 seconds it validates:

  • Maximum HA nodes
  • Maximum CIs in PRODUCTION
  • Number of allowed Tenants and Write-Users
  • AI entitlement (enables/disables the NL2SQL endpoints)

A node attempting to join a fully licensed cluster is gracefully rejected.