Key Concepts¶

Core Data Architecture¶

The CMDB uses Single-Table Inheritance with Partitioning to manage Configuration Items (CIs) and their relationships.

Configuration Items (CIs): Stored in cmdb_base_element, they combine strict core attributes (Name, Status, Class) with fully flexible, schema-less extended attributes stored in a JSONB column.
Directional Relationships: Connections between CIs (e.g., RUNS_ON, DEPENDENCY) are tracked in cmdb_base_relationship, enforcing strict Source-to-Target cardinality.
Ontology & Schema Dictionary: A native management system for CI Classes, custom attributes, and a taxonomy (Category, Type, Item, Model) to standardise infrastructure definitions.
Dataset Isolation: Data is segregated into datasets (e.g., PRODUCTION, SANDBOX) via PostgreSQL table partitioning. This ensures massive read/write scalability and prevents data corruption during ingestion.

CI Class Inheritance¶

All CIs derive from a foundational hierarchy based on the DMTF CIM Schema. Below are the core properties of the base classes.

Foundation Classes (ManagedElement, LogicalElement, etc.)

Property	Data Type	Description
InstanceID	string	Opaque unique identifier (OrgID:LocalID pattern)
Caption	string	Short textual description (max 64 chars)
Description	string	Detailed textual description
ElementName	string	User-friendly name for the instance
InstallDate	datetime	When the object was installed
Name	string	Label by which the object is known (max 256 chars)
Status	string	Current status: OK, Error, Degraded, Pred Fail…

PhysicalElement

Property	Data Type	Description
Manufacturer	string	Organization that produced the device
Model	string	Name by which the physical element is known
SerialNumber	string	Manufacturer-allocated serial number
Tag	string	Uniquely identifies the physical element
Version	string	Version of the physical element

ComputerSystem

Property	Data Type	Description
NameFormat	string	How the ComputerSystem Name is generated (e.g., IP, UUID)
Dedicated[]	uint16 array	Purpose(s) of the system (Switch, Firewall, …)
ResetCapability	uint16	Hardware reset capability

UnitaryComputerSystem

Property	Data Type	Description
InitialLoadInfo[]	string array	Data to locate boot device
LastLoadInfo	string	Identifies device that last loaded the OS
PowerManagementCapabilities[]	uint16 array	Power management capabilities
PowerState	uint16	Current power state (Full Power, Standby, …)

The full DMTF CIM schema (v2.38) is used as a reference. MOF files are located under the Schemas/CIM238/DMTF/ tree.

Reconciliation & Merge Engine¶

The Merge Engine is the “brain” of the CMDB, ensuring data from multiple, often conflicting sources is deduplicated and merged into a single “Golden Record”. It operates in three distinct phases.

Phase 1 – Identification (Match)¶

The engine uses configurable Identification Rules (stored in cmdb_ident_rules) evaluated in order of priority.

If a match is found (e.g., on unique_identifier, serial_number + model, or name + class_id), the staging CI is linked to the production CI via a shared reconciliation_identity (UUID).
If no match is found, the CI is treated as new and a fresh Golden Record is created.

Phase 2 – Dynamic Precedence (Win)¶

Conflicts are resolved using a granular hierarchy:

Dataset Level (Global): e.g., AWS_DISCOVERY (Priority 80) beats MANUAL_IMPORT (50).
Class Level (Override): e.g., for NETWORKROUTER, CISCO_API (90) may override the global default.
Attribute Level (Micro-Override): A specific attribute, like cpu_count, can be owned by VMWARE_DISCOVERY (100) regardless of any other priority.

The engine builds a “Frankenstein” record using the most trusted source for each attribute.

Phase 3 – Time-Decay Aging (Freshness)¶

To prevent stale data from a once-trusted source from permanently locking attributes, the engine applies Time Decay.

When enabled (enable_time_decay = true), the effective priority of a source is calculated as:

Days Old = Current Date – Last Update Timestamp
If Days Old <= grace_period_days, Effective Priority = Original Priority.
If Days Old > grace_period_days:

Penalty = (Days Old - grace_period_days) * decay_rate_per_day
Effective Priority = MAX(Calculated Priority, priority_floor)

Example Scenario

Parameter	Value
Original Priority	800
Grace Period	14 days
Decay Rate	10/day
Priority Floor	400
Incoming Source Priority	500

Day	Days Past Grace	Penalty	Effective Priority	Overwrite by Source (500)?
1–14	0	0	800	No
24	10	100	700	No
45	31	310	490	Yes

This mechanism allows fresh data from a lower-priority source to naturally take over when the authoritative source goes silent.

Graph Traversal & Service Mapping¶

PostgreSQL is used as a graph database via Adjacency List modelling and Recursive Common Table Expressions (CTEs).

Nodes: Configuration Items stored in cmdb_base_element.
Edges: Directional relationships in cmdb_base_relationship with attributes like source_instance_id, target_instance_id, class_id (verb), and dataset_id (always ‘PRODUCTION’).

Impact Analysis (Top-Down)¶

When a server fails, this walk finds everything that depends on it.

WITH RECURSIVE graph_walk(root_id, parent_id, child_id, depth) AS (
    SELECT
        r.source_instance_id,
        r.source_instance_id,
        r.target_instance_id,
        1
    FROM cmdb_base_relationship r
    JOIN cmdb_base_element ci ON ci.instance_id = r.source_instance_id
    WHERE ci.name ILIKE '%SRV-01%'
      AND ci.class_id = 'ComputerSystem'
      AND ci.dataset_id = 'PRODUCTION'
      AND ci.is_deleted = false
    UNION
    SELECT
        gw.root_id,
        r.source_instance_id,
        r.target_instance_id,
        gw.depth + 1
    FROM cmdb_base_relationship r
    JOIN graph_walk gw ON r.source_instance_id = gw.child_id
    WHERE gw.depth < 6
      AND r.dataset_id = 'PRODUCTION'
)
SELECT DISTINCT ON (gw.root_id, gw.child_id)
    gw.depth,
    gw.child_id as instance_id,
    ci.name,
    ci.class_id,
    ci.ci_status
FROM graph_walk gw
JOIN cmdb_base_element ci ON gw.child_id = ci.instance_id
WHERE ci.is_deleted = false
ORDER BY gw.root_id, gw.child_id, gw.depth ASC;

Dependency Mapping (Bottom-Up)¶

When an application is slow, this walk identifies the foundational infrastructure supporting it.

WITH RECURSIVE graph_walk(root_id, parent_id, child_id, depth) AS (
    SELECT
        r.target_instance_id,
        r.target_instance_id,
        r.source_instance_id,
        1
    FROM cmdb_base_relationship r
    JOIN cmdb_base_element ci ON ci.instance_id = r.target_instance_id
    WHERE ci.name ILIKE '%DIG-BKS-RAC%'
      AND ci.class_id = 'BusinessService'
      AND ci.dataset_id = 'PRODUCTION'
      AND ci.is_deleted = false
    UNION
    SELECT
        gw.root_id,
        r.target_instance_id,
        r.source_instance_id,
        gw.depth + 1
    FROM cmdb_base_relationship r
    JOIN graph_walk gw ON r.target_instance_id = gw.child_id
    WHERE gw.depth < 6
      AND r.dataset_id = 'PRODUCTION'
)
SELECT DISTINCT ON (gw.root_id, gw.child_id)
    gw.depth,
    gw.child_id as instance_id,
    ci.name,
    ci.class_id,
    ci.ci_status
FROM graph_walk gw
JOIN cmdb_base_element ci ON gw.child_id = ci.instance_id
WHERE ci.is_deleted = false
  AND ci.dataset_id = 'PRODUCTION'
ORDER BY gw.root_id, gw.child_id, gw.depth ASC;

Architectural Safeguards:

Depth limiter (``gw.depth < 6``): Prevents infinite loops in cyclic topologies.
DISTINCT ON: Returns only the shortest path to each CI.
Outer-join metadata: Class names and statuses are fetched outside the recursive loop to avoid query plan degradation.

Security, Identity & IAM¶

The IAM engine is built on four pillars: Hybrid Authentication, Stateless Tokenization, Role-Based Access Control (RBAC), and Graph-Based Row-Level Security.

Hybrid Authentication¶

Local Auth: Passwords are verified against Bcrypt hashes stored in sys_user.
LDAP/AD Auth: If the user is flagged as LDAP, the system binds to the enterprise directory over LDAPS, validates credentials, and dynamically syncs group memberships.

LDAP Configuration (``sys_config.ldap_config``)

{
  "enabled": true,
  "url": "ldaps://ad.company.internal:636",
  "bind_dn": "cn=cmdb_service_account,ou=Services,dc=company,dc=internal",
  "bind_password": "YourSecurePassword",
  "user_base_dn": "ou=Users,dc=company,dc=internal",
  "user_search_filter": "(sAMAccountName={0})",
  "group_base_dn": "ou=Groups,dc=company,dc=internal",
  "group_search_filter": "(member={0})"
}

User Base DN restricts login scans to a specific organisational unit.
Group Base DN tells the backend where to search for security roles.

Just-In-Time (JIT) Provisioning: On first login, a “Shadow Account” is created in sys_user (with an empty password hash and auth_source = LDAP). Group memberships in sys_user_group_member are wiped and re-inserted at every login, ensuring real-time synchronisation with the directory.

Stateless JWT Sessions¶

After successful authentication, the server returns a signed JWT containing:

{
  "sub": "jdoe",
  "role": "editor",
  "groups": ["GRP-HR-Admins", "GRP-IT-Viewers"],
  "exp": 1711814400
}

Because the token carries all required claims, backend nodes never need to query the database for the user’s identity.

Role-Based Access Control (RBAC)¶

System roles are stored in the JWT and enforced on every request.

Role	Capabilities
admin	Full control: manage config, rules, users; bypasses all Row-Level Security.
editor	CRUD on CIs and relationships if Row-Level access permits.
reader	Read-only access to authorised CIs, audit logs, and graphs.

Multi-Tenant Row-Level Security¶

Access to CIs is granted by linking a user’s LDAP/Local group to an Organisation CI via the sys_group_org_access table.

To avoid calculating graph dependencies on every query, a Visibility Caching Engine runs asynchronously:

When a group is linked to an Organisation, a background worker performs a top-down graph walk.
Every CI discovered beneath the Organisation is written to cache_group_visibility along with the group name.
All API queries are transparently joined with this cache:

SELECT ci.* FROM cmdb_base_element ci
JOIN cache_group_visibility v ON ci.instance_id = v.instance_id
WHERE ci.class_id = 'ComputerSystem'
  AND v.group_name = 'GRP-HR-Admins';

This delivers sub-millisecond responses while guaranteeing strict multi-tenant isolation.

Artificial Intelligence (AIOps)¶

A dual-model cognitive layer translates natural language into database queries and visual summaries.

Heavy Engine (NL2SQL): Typically a cloud-based reasoning model (e.g., DeepSeek, GPT-4). It receives the user’s question and a dynamically recompiled system prompt describing the current CMDB schema. It generates a read-only SQL query.
Fast Engine (Summarisation & Visualisation): A local model (e.g., Ollama Gemma) that consumes raw database results and produces a human-readable summary plus a widget configuration (SCALAR, PIE_CHART, BAR_CHART, DATA_TABLE). This keeps sensitive data on-premises.

Dynamic Schema Awareness¶

Whenever a new CI class or custom attribute is added, a cmdb.schema.updated event is broadcast. Every AI Service node intercepts it, rebuilds the system prompt, and stores the new version in sys_ai_prompt_history. The AI is therefore always aware of the latest ontology without a restart.

Example NL2SQL Interaction¶

User: “Show me a breakdown of all Physical Servers running in Production that currently have Critical vulnerabilities.”

Generated SQL:

SELECT ci.instance_id, ci.name, ci.ci_status, ci.vulnerability, ci.vulnerability_description, ci.class_id, ci.category, ci.type, ci.item, ci.model
FROM cmdb_base_element ci
WHERE ci.dataset_id = 'PRODUCTION'
 AND ci.is_deleted = false
 AND ci.class_id = 'COMPUTERSYSTEM'
 AND ci.vulnerability = 'Yes'
ORDER BY ci.name
LIMIT 500;

AI Response (JSON sent to frontend):

 {
      ...
      "summary": "There is 1 server running in Production that currently has vulnerabilities.",
      ...
      "data": [
              {
                      "instance_id": "08b78964-f4ad-4fc8-bd0e-86b9c15558a7",
                      "name": "Test1",
                      "class_id": "COMPUTERSYSTEM",
                      "type": "Container",
                      "item": "",
                      "model": "AWS",
                      "ci_status": "UP",
                      "vulnerability": "Yes",
                      "vulnerability_description": "Vulnerability CVE-12345"
              }
      ]
}

Stateful Chat Memory¶

Conversations are stored in sys_ai_chat_sessions and sys_ai_chat_messages.

Event-Driven Automation (Trigger Engine)¶

The CMDB acts as a real-time automation orchestrator. It listens to the clustered Event Bus for CI/Relationship changes, evaluates rules, and dispatches actions.

Event Bus Architecture¶

Whenever a CI is created, updated, or deleted (even by background merges), a JSON message is published on the Event Bus:

{
  "operation": "UPDATE",
  "target_type": "CI",
  "class_id": "COMPUTERSYSTEM",
  "instance_id": "ce5de307-87ac-4fb1-9353-9608c19e72c5",
  "payload": {
    "source": "merge-engine",
    "changes": {
      "vulnerability": { "old": "No", "new": "Yes" },
      "ci_status": { "old": "UP", "new": "MAINTENANCE" }
    }
  }
}

The TriggerEngine consumes these events and evaluates matching rules.

Rule Evaluation¶

Rules are defined per CI class and contain condition arrays. Supported operators:

EQUALS / NOT_EQUALS
CHANGED
CHANGED_TO
CONTAINS / REGEX

If all conditions evaluate to TRUE, the configured actions are executed.

Action Dispatcher¶

The engine supports multiple native action types:

Type	Description
WEBHOOK	POST/PUT/PATCH JSON payloads to external REST APIs (e.g., Ansible Tower, ServiceNow). Variable injection (`${instance_id}`) is supported.
SLACK_TEAMS	Posts formatted, colour-coded messages to collaboration channels.
EMAIL	Sends SMTP emails using the Vert.x Mail Client. Recipients can be dynamically pulled from CI attributes.

Example Ransomware Containment Workflow:

A Tenable Nessus scan updates a server’s vulnerability to “Yes” in SANDBOX.
The Merge Engine publishes an UPDATE event.
The trigger rule vulnerability CHANGED_TO "Yes" fires.
A Slack alert is sent and a webhook posts the instance_id to Ansible Tower.
Ansible isolates the server from the network.

All of this happens in milliseconds, outside the user’s request context.

Audit, Compliance & History¶

The CMDB captures an immutable, millisecond-precise record of every meaningful change.

Attribute-Level Diff Engine¶

Instead of copying an entire CI row for every update, the engine calculates a precise JSON diff. It ignores ephemeral fields like last_seen_date and only records real changes.

"changes": {
  "ci_status": { "old": "MAINTENANCE", "new": "UP" },
  "attr_ram_gb": { "old": "16", "new": "32" }
}

If no data actually changes, the audit entry is aborted.

Relational Neighborhood Snapshots¶

At the moment an audit record is written, the engine captures a snapshot of all active inbound and outbound relationships (via the reconciliation_identity), together with the health status of neighbouring CIs. This “Topology Time Machine” allows engineers to see exactly what depended on a failed CI at the time of the incident, without complex temporal joins.

Example snapshot payload:

"neighborhood_snapshot": [
  {
    "relationship_id": "f50cd8f0-...",
    "class_id": "RUNS_ON",
    "direction": "INBOUND",
    "related_ci_name": "Payroll-App-Prod",
    "related_ci_status": "UP"
  }
]

Immutable Metadata¶

Every audit record carries:

audit_id (UUID)
target_instance_id, target_type (CI or RELATION)
operation (CREATE, UPDATE, DELETE)
changed_by (username or service account)
dataset_id
change_time (timestamp)

Automated Data Retention¶

A background watchdog (runLogRetention) periodically purges aged records from:

sys_access_log (access logins)
sys_ai_chat_sessions (AI conversations, cascading to messages)
sys_activity_logs (background job histories)

Retention periods are configurable in sys_config.

Enterprise High Availability & Operations¶

The platform is engineered for Active-Active High Availability with self-healing capabilities.

Hazelcast Clustering¶

Vert.x nodes discover each other (via multicast or explicit TCP/IP) and form a unified mesh. This enables:

Distributed Event Bus: A trigger event generated on Node A can be consumed by Node B.
Distributed Caching: The IAM Visibility Cache is shared across all cluster members.

If a node crashes, the remaining nodes instantly absorb its workload.

Self-Healing Watchdogs¶

Long-running jobs periodically write heartbeats. A watchdog sweeps the execution tables:

If a job is RUNNING but its heartbeat is older than staleJobTimeoutMinutes (default 30), it is forcefully killed and set to FAILED.
Orphaned queues (jobs stuck for 24+ hours) are cleaned up, allowing the engine to resume processing.

Zero-Downtime Hot Reloading¶

Configuration changes (e.g., toggling audit logging, adding a new CI class) are broadcast via the clustered Event Bus. All nodes intercept the message and silently reload their in-memory settings without dropping requests.

Cryptographic Licensing¶

The platform expects a JWT-signed license key. Every 60 seconds it validates:

Maximum HA nodes
Maximum CIs in PRODUCTION
Number of allowed Tenants and Write-Users
AI entitlement (enables/disables the NL2SQL endpoints)

A node attempting to join a fully licensed cluster is gracefully rejected.