Key Concepts¶
Core Data Architecture¶
The CMDB uses Single-Table Inheritance with Partitioning to manage Configuration Items (CIs) and their relationships.
- Configuration Items (CIs): Stored in
cmdb_base_element, they combine strict core attributes (Name, Status, Class) with fully flexible, schema-less extended attributes stored in a JSONB column. - Directional Relationships: Connections between CIs (e.g.,
RUNS_ON,DEPENDENCY) are tracked incmdb_base_relationship, enforcing strict Source-to-Target cardinality. - Ontology & Schema Dictionary: A native management system for CI Classes, custom attributes, and a taxonomy (Category, Type, Item, Model) to standardise infrastructure definitions.
- Dataset Isolation: Data is segregated into datasets (e.g.,
PRODUCTION,SANDBOX) via PostgreSQL table partitioning. This ensures massive read/write scalability and prevents data corruption during ingestion.
CI Class Inheritance¶
All CIs derive from a foundational hierarchy based on the DMTF CIM Schema. Below are the core properties of the base classes.
Foundation Classes (ManagedElement, LogicalElement, etc.)
| Property | Data Type | Description |
|---|---|---|
| InstanceID | string | Opaque unique identifier (OrgID:LocalID pattern) |
| Caption | string | Short textual description (max 64 chars) |
| Description | string | Detailed textual description |
| ElementName | string | User-friendly name for the instance |
| InstallDate | datetime | When the object was installed |
| Name | string | Label by which the object is known (max 256 chars) |
| Status | string | Current status: OK, Error, Degraded, Pred Fail… |
PhysicalElement
| Property | Data Type | Description |
|---|---|---|
| Manufacturer | string | Organization that produced the device |
| Model | string | Name by which the physical element is known |
| SerialNumber | string | Manufacturer-allocated serial number |
| Tag | string | Uniquely identifies the physical element |
| Version | string | Version of the physical element |
ComputerSystem
| Property | Data Type | Description |
|---|---|---|
| NameFormat | string | How the ComputerSystem Name is generated (e.g., IP, UUID) |
| Dedicated[] | uint16 array | Purpose(s) of the system (Switch, Firewall, …) |
| ResetCapability | uint16 | Hardware reset capability |
UnitaryComputerSystem
| Property | Data Type | Description |
|---|---|---|
| InitialLoadInfo[] | string array | Data to locate boot device |
| LastLoadInfo | string | Identifies device that last loaded the OS |
| PowerManagementCapabilities[] | uint16 array | Power management capabilities |
| PowerState | uint16 | Current power state (Full Power, Standby, …) |
The full DMTF CIM schema (v2.38) is used as a reference. MOF files are
located under the Schemas/CIM238/DMTF/ tree.
Reconciliation & Merge Engine¶
The Merge Engine is the “brain” of the CMDB, ensuring data from multiple, often conflicting sources is deduplicated and merged into a single “Golden Record”. It operates in three distinct phases.
Phase 1 – Identification (Match)¶
The engine uses configurable Identification Rules (stored in
cmdb_ident_rules) evaluated in order of priority.
- If a match is found (e.g., on
unique_identifier,serial_number + model, orname + class_id), the staging CI is linked to the production CI via a sharedreconciliation_identity(UUID). - If no match is found, the CI is treated as new and a fresh Golden Record is created.
Phase 2 – Dynamic Precedence (Win)¶
Conflicts are resolved using a granular hierarchy:
- Dataset Level (Global): e.g.,
AWS_DISCOVERY(Priority 80) beatsMANUAL_IMPORT(50). - Class Level (Override): e.g., for
NETWORKROUTER,CISCO_API(90) may override the global default. - Attribute Level (Micro-Override): A specific attribute, like
cpu_count, can be owned byVMWARE_DISCOVERY(100) regardless of any other priority.
The engine builds a “Frankenstein” record using the most trusted source for each attribute.
Phase 3 – Time-Decay Aging (Freshness)¶
To prevent stale data from a once-trusted source from permanently locking attributes, the engine applies Time Decay.
When enabled (enable_time_decay = true), the effective priority of a
source is calculated as:
Days Old = Current Date – Last Update Timestamp
If Days Old <= grace_period_days, Effective Priority = Original Priority.
If Days Old > grace_period_days:
Penalty = (Days Old - grace_period_days) * decay_rate_per_dayEffective Priority = MAX(Calculated Priority, priority_floor)
Example Scenario
| Parameter | Value |
|---|---|
| Original Priority | 800 |
| Grace Period | 14 days |
| Decay Rate | 10/day |
| Priority Floor | 400 |
| Incoming Source Priority | 500 |
| Day | Days Past Grace | Penalty | Effective Priority | Overwrite by Source (500)? |
|---|---|---|---|---|
| 1–14 | 0 | 0 | 800 | No |
| 24 | 10 | 100 | 700 | No |
| 45 | 31 | 310 | 490 | Yes |
This mechanism allows fresh data from a lower-priority source to naturally take over when the authoritative source goes silent.
Graph Traversal & Service Mapping¶
PostgreSQL is used as a graph database via Adjacency List modelling and Recursive Common Table Expressions (CTEs).
- Nodes: Configuration Items stored in
cmdb_base_element. - Edges: Directional relationships in
cmdb_base_relationshipwith attributes likesource_instance_id,target_instance_id,class_id(verb), anddataset_id(always ‘PRODUCTION’).
Impact Analysis (Top-Down)¶
When a server fails, this walk finds everything that depends on it.
WITH RECURSIVE graph_walk(root_id, parent_id, child_id, depth) AS (
SELECT
r.source_instance_id,
r.source_instance_id,
r.target_instance_id,
1
FROM cmdb_base_relationship r
JOIN cmdb_base_element ci ON ci.instance_id = r.source_instance_id
WHERE ci.name ILIKE '%SRV-01%'
AND ci.class_id = 'ComputerSystem'
AND ci.dataset_id = 'PRODUCTION'
AND ci.is_deleted = false
UNION
SELECT
gw.root_id,
r.source_instance_id,
r.target_instance_id,
gw.depth + 1
FROM cmdb_base_relationship r
JOIN graph_walk gw ON r.source_instance_id = gw.child_id
WHERE gw.depth < 6
AND r.dataset_id = 'PRODUCTION'
)
SELECT DISTINCT ON (gw.root_id, gw.child_id)
gw.depth,
gw.child_id as instance_id,
ci.name,
ci.class_id,
ci.ci_status
FROM graph_walk gw
JOIN cmdb_base_element ci ON gw.child_id = ci.instance_id
WHERE ci.is_deleted = false
ORDER BY gw.root_id, gw.child_id, gw.depth ASC;
Dependency Mapping (Bottom-Up)¶
When an application is slow, this walk identifies the foundational infrastructure supporting it.
WITH RECURSIVE graph_walk(root_id, parent_id, child_id, depth) AS (
SELECT
r.target_instance_id,
r.target_instance_id,
r.source_instance_id,
1
FROM cmdb_base_relationship r
JOIN cmdb_base_element ci ON ci.instance_id = r.target_instance_id
WHERE ci.name ILIKE '%DIG-BKS-RAC%'
AND ci.class_id = 'BusinessService'
AND ci.dataset_id = 'PRODUCTION'
AND ci.is_deleted = false
UNION
SELECT
gw.root_id,
r.target_instance_id,
r.source_instance_id,
gw.depth + 1
FROM cmdb_base_relationship r
JOIN graph_walk gw ON r.target_instance_id = gw.child_id
WHERE gw.depth < 6
AND r.dataset_id = 'PRODUCTION'
)
SELECT DISTINCT ON (gw.root_id, gw.child_id)
gw.depth,
gw.child_id as instance_id,
ci.name,
ci.class_id,
ci.ci_status
FROM graph_walk gw
JOIN cmdb_base_element ci ON gw.child_id = ci.instance_id
WHERE ci.is_deleted = false
AND ci.dataset_id = 'PRODUCTION'
ORDER BY gw.root_id, gw.child_id, gw.depth ASC;
Architectural Safeguards:
- Depth limiter (``gw.depth < 6``): Prevents infinite loops in cyclic topologies.
- DISTINCT ON: Returns only the shortest path to each CI.
- Outer-join metadata: Class names and statuses are fetched outside the recursive loop to avoid query plan degradation.
Security, Identity & IAM¶
The IAM engine is built on four pillars: Hybrid Authentication, Stateless Tokenization, Role-Based Access Control (RBAC), and Graph-Based Row-Level Security.
Hybrid Authentication¶
- Local Auth: Passwords are verified against Bcrypt hashes
stored in
sys_user. - LDAP/AD Auth: If the user is flagged as LDAP, the system binds to the enterprise directory over LDAPS, validates credentials, and dynamically syncs group memberships.
LDAP Configuration (``sys_config.ldap_config``)
{
"enabled": true,
"url": "ldaps://ad.company.internal:636",
"bind_dn": "cn=cmdb_service_account,ou=Services,dc=company,dc=internal",
"bind_password": "YourSecurePassword",
"user_base_dn": "ou=Users,dc=company,dc=internal",
"user_search_filter": "(sAMAccountName={0})",
"group_base_dn": "ou=Groups,dc=company,dc=internal",
"group_search_filter": "(member={0})"
}
- User Base DN restricts login scans to a specific organisational unit.
- Group Base DN tells the backend where to search for security roles.
Just-In-Time (JIT) Provisioning: On first login, a “Shadow Account”
is created in sys_user (with an empty password hash and
auth_source = LDAP). Group memberships in
sys_user_group_member are wiped and re-inserted at every login,
ensuring real-time synchronisation with the directory.
Stateless JWT Sessions¶
After successful authentication, the server returns a signed JWT containing:
{
"sub": "jdoe",
"role": "editor",
"groups": ["GRP-HR-Admins", "GRP-IT-Viewers"],
"exp": 1711814400
}
Because the token carries all required claims, backend nodes never need to query the database for the user’s identity.
Role-Based Access Control (RBAC)¶
System roles are stored in the JWT and enforced on every request.
| Role | Capabilities |
|---|---|
| admin | Full control: manage config, rules, users; bypasses all Row-Level Security. |
| editor | CRUD on CIs and relationships if Row-Level access permits. |
| reader | Read-only access to authorised CIs, audit logs, and graphs. |
Multi-Tenant Row-Level Security¶
Access to CIs is granted by linking a user’s LDAP/Local group to an
Organisation CI via the sys_group_org_access table.
To avoid calculating graph dependencies on every query, a Visibility Caching Engine runs asynchronously:
- When a group is linked to an Organisation, a background worker performs a top-down graph walk.
- Every CI discovered beneath the Organisation is written to
cache_group_visibilityalong with the group name. - All API queries are transparently joined with this cache:
SELECT ci.* FROM cmdb_base_element ci
JOIN cache_group_visibility v ON ci.instance_id = v.instance_id
WHERE ci.class_id = 'ComputerSystem'
AND v.group_name = 'GRP-HR-Admins';
This delivers sub-millisecond responses while guaranteeing strict multi-tenant isolation.
Artificial Intelligence (AIOps)¶
A dual-model cognitive layer translates natural language into database queries and visual summaries.
- Heavy Engine (NL2SQL): Typically a cloud-based reasoning model (e.g., DeepSeek, GPT-4). It receives the user’s question and a dynamically recompiled system prompt describing the current CMDB schema. It generates a read-only SQL query.
- Fast Engine (Summarisation & Visualisation): A local model (e.g., Ollama Gemma) that consumes raw database results and produces a human-readable summary plus a widget configuration (SCALAR, PIE_CHART, BAR_CHART, DATA_TABLE). This keeps sensitive data on-premises.
Dynamic Schema Awareness¶
Whenever a new CI class or custom attribute is added, a
cmdb.schema.updated event is broadcast. Every AI Service node
intercepts it, rebuilds the system prompt, and stores the new version in
sys_ai_prompt_history. The AI is therefore always aware of the
latest ontology without a restart.
Example NL2SQL Interaction¶
User: “Show me a breakdown of all Physical Servers running in Production that currently have Critical vulnerabilities.”
Generated SQL:
SELECT ci.instance_id, ci.name, ci.ci_status, ci.vulnerability, ci.vulnerability_description, ci.class_id, ci.category, ci.type, ci.item, ci.model
FROM cmdb_base_element ci
WHERE ci.dataset_id = 'PRODUCTION'
AND ci.is_deleted = false
AND ci.class_id = 'COMPUTERSYSTEM'
AND ci.vulnerability = 'Yes'
ORDER BY ci.name
LIMIT 500;
AI Response (JSON sent to frontend):
{
...
"summary": "There is 1 server running in Production that currently has vulnerabilities.",
...
"data": [
{
"instance_id": "08b78964-f4ad-4fc8-bd0e-86b9c15558a7",
"name": "Test1",
"class_id": "COMPUTERSYSTEM",
"type": "Container",
"item": "",
"model": "AWS",
"ci_status": "UP",
"vulnerability": "Yes",
"vulnerability_description": "Vulnerability CVE-12345"
}
]
}
Stateful Chat Memory¶
Conversations are stored in sys_ai_chat_sessions and
sys_ai_chat_messages.
Event-Driven Automation (Trigger Engine)¶
The CMDB acts as a real-time automation orchestrator. It listens to the clustered Event Bus for CI/Relationship changes, evaluates rules, and dispatches actions.
Event Bus Architecture¶
Whenever a CI is created, updated, or deleted (even by background merges), a JSON message is published on the Event Bus:
{
"operation": "UPDATE",
"target_type": "CI",
"class_id": "COMPUTERSYSTEM",
"instance_id": "ce5de307-87ac-4fb1-9353-9608c19e72c5",
"payload": {
"source": "merge-engine",
"changes": {
"vulnerability": { "old": "No", "new": "Yes" },
"ci_status": { "old": "UP", "new": "MAINTENANCE" }
}
}
}
The TriggerEngine consumes these events and evaluates matching
rules.
Rule Evaluation¶
Rules are defined per CI class and contain condition arrays. Supported operators:
EQUALS/NOT_EQUALSCHANGEDCHANGED_TOCONTAINS/REGEX
If all conditions evaluate to TRUE, the configured actions are
executed.
Action Dispatcher¶
The engine supports multiple native action types:
| Type | Description |
|---|---|
| WEBHOOK | POST/PUT/PATCH JSON payloads to external REST APIs (e.g., Ansible Tower, ServiceNow). Variable injection (${instance_id}) is supported. |
| SLACK_TEAMS | Posts formatted, colour-coded messages to collaboration channels. |
| Sends SMTP emails using the Vert.x Mail Client. Recipients can be dynamically pulled from CI attributes. |
Example Ransomware Containment Workflow:
- A Tenable Nessus scan updates a server’s
vulnerabilityto “Yes” in SANDBOX. - The Merge Engine publishes an
UPDATEevent. - The trigger rule
vulnerability CHANGED_TO "Yes"fires. - A Slack alert is sent and a webhook posts the
instance_idto Ansible Tower. - Ansible isolates the server from the network.
All of this happens in milliseconds, outside the user’s request context.
Audit, Compliance & History¶
The CMDB captures an immutable, millisecond-precise record of every meaningful change.
Attribute-Level Diff Engine¶
Instead of copying an entire CI row for every update, the engine
calculates a precise JSON diff. It ignores ephemeral fields like
last_seen_date and only records real changes.
"changes": {
"ci_status": { "old": "MAINTENANCE", "new": "UP" },
"attr_ram_gb": { "old": "16", "new": "32" }
}
If no data actually changes, the audit entry is aborted.
Relational Neighborhood Snapshots¶
At the moment an audit record is written, the engine captures a snapshot
of all active inbound and outbound relationships (via the
reconciliation_identity), together with the health status of
neighbouring CIs. This “Topology Time Machine” allows engineers to see
exactly what depended on a failed CI at the time of the incident,
without complex temporal joins.
Example snapshot payload:
"neighborhood_snapshot": [
{
"relationship_id": "f50cd8f0-...",
"class_id": "RUNS_ON",
"direction": "INBOUND",
"related_ci_name": "Payroll-App-Prod",
"related_ci_status": "UP"
}
]
Immutable Metadata¶
Every audit record carries:
audit_id(UUID)target_instance_id,target_type(CI or RELATION)operation(CREATE, UPDATE, DELETE)changed_by(username or service account)dataset_idchange_time(timestamp)
Automated Data Retention¶
A background watchdog (runLogRetention) periodically purges aged
records from:
sys_access_log(access logins)sys_ai_chat_sessions(AI conversations, cascading to messages)sys_activity_logs(background job histories)
Retention periods are configurable in sys_config.
Enterprise High Availability & Operations¶
The platform is engineered for Active-Active High Availability with self-healing capabilities.
Hazelcast Clustering¶
Vert.x nodes discover each other (via multicast or explicit TCP/IP) and form a unified mesh. This enables:
- Distributed Event Bus: A trigger event generated on Node A can be consumed by Node B.
- Distributed Caching: The IAM Visibility Cache is shared across all cluster members.
If a node crashes, the remaining nodes instantly absorb its workload.
Self-Healing Watchdogs¶
Long-running jobs periodically write heartbeats. A watchdog sweeps the execution tables:
- If a job is
RUNNINGbut its heartbeat is older thanstaleJobTimeoutMinutes(default 30), it is forcefully killed and set toFAILED. - Orphaned queues (jobs stuck for 24+ hours) are cleaned up, allowing the engine to resume processing.
Zero-Downtime Hot Reloading¶
Configuration changes (e.g., toggling audit logging, adding a new CI class) are broadcast via the clustered Event Bus. All nodes intercept the message and silently reload their in-memory settings without dropping requests.
Cryptographic Licensing¶
The platform expects a JWT-signed license key. Every 60 seconds it validates:
- Maximum HA nodes
- Maximum CIs in PRODUCTION
- Number of allowed Tenants and Write-Users
- AI entitlement (enables/disables the NL2SQL endpoints)
A node attempting to join a fully licensed cluster is gracefully rejected.