Chapter 7 Component architecture

7.1 Principles of modular architecture

The Enterprise Digital Twin (EDT) platform builds on the principles of modular architecture, which gives it flexibility, scalability, and the ability to roll out in stages. The modular approach lets you tailor the system to a specific enterprise by connecting only the components you need.

Key principles of modularity:

1. Module independence

Each module is a self-contained functional unit with a clearly defined interaction interface. Teams can develop, test, and update modules independently of one another, which lowers the risk when you introduce changes.

2. Loose coupling

Modules talk to each other through standardized APIs, which minimizes the dependencies between components. Changing the internal implementation of one module does not force you to modify the others, as long as the interface stays the same.

3. High cohesion

Each module concentrates logically related functionality in one place. The forecasting module, for example, contains everything you need to build and run forecast models.

4. Reusability

We design modules as general-purpose components that work across different projects and industries. The balance-calculation module, for example, serves both the chemical industry and the agro-industrial sector.

5. Scalability

Modular architecture lets you scale individual components horizontally based on load. You can deploy compute-intensive modules on dedicated, high-performance servers.

Standard platform modules:

Data management module — collects, normalizes, stores, and serves access to data;
modeling module — builds structural-technological schemes and balance models;
optimization module — solves linear and nonlinear programming problems;
forecasting module — time series, machine learning, scenario analysis;
visualization module — dashboards, reports, interactive charts;
integration module — REST API, ETL processes, connectors to external systems;
administration module — manages users, access rights, and configuration.

Modular architecture supports a staged rollout: the customer can start with the basic modules (data management, visualization) and gradually connect advanced functionality (optimization, machine learning) as readiness and need grow.

7.2 Technical components of the system

The component architecture of the EDT rests on a microservice approach, where each component owns its own area of functionality. The software runs on open-source R libraries and includes high-performance computing modules written in Java and C++.

Figure 37 — Component architecture of the EDT

The microservice architecture delivers scalability and flexibility, letting you add or remove components on the fly without disrupting the other parts of the system.

7.3 Data processing stages

7.3.1 Data ingestion and processing stage

ETL (Extract, Transform, Load): ETL technology handles collecting data from various sources, transforming it, and loading it into the central repository. Data can come from the enterprise’s internal systems as well as from external sources (for example, Rosstat, the Federal Tax Service, APIs).

API integration: Lets the system pull data in real time from various sources, which keeps calculations and analysis current.

Data normalization: The system normalizes data automatically, bringing it to a single standard and removing duplicates, gaps, and format errors.

In practice, the system combines open and closed data sources: Rosstat, the Federal Tax Service, the Federal Customs Service, the FGIS “Zerno” grain system, exchange data (FOREX, CME, CBOT), the WTO, and socio-economic development data for countries worldwide. The figure below shows the data collection pipeline for a deployment in the agro-industrial sector.

Figure 38 — Data sources and collection pipeline: open and closed sources

7.3.2 Calculation and analytics stage

Java and C++: High-performance calculations run on Java and C++ modules, which lets the system process large volumes of data with minimal latency.

R (Posit): Supports the complex statistical and mathematical models needed for scenario modeling, factor analysis, and forecasting.

Targets: Organizes calculation nodes and manages their dependencies. This technology tracks whether data and calculations are current, supports versioning of calculation models, and enables repeated parallel computations.

Balance models: The system supports intersectoral and interterritorial balance models, as well as a resource-balance model, which serve to assess the enterprise’s resource needs and allocate them optimally.

Take wheat production as an example: the technological scheme covers the full cycle — storing imported products, growing and harvesting the crop, sales, storage, livestock feeding, export, and processing. Each block ties to physical, cost, and forecast indicators.

Figure 39 — Technological scheme of wheat production with resource flows

7.4 Data layer

7.4.1 PostgreSQL — relational DBMS

Purpose: Stores transactional data and system metadata.

Stored data:

Reference books and classifiers
User data and access rights
System configuration
Transactional data
Model metadata

Characteristics:

Version: PostgreSQL 14+
Replication: Master-Slave
Backup: Daily
Performance: Up to 10,000 TPS

7.4.2 ClickHouse — analytical DBMS

Purpose: Stores large volumes of analytical data.

Stored data:

Time series (production, sales, prices)
Aggregated indicators
System logs and events
Calculation results

Characteristics:

Version: ClickHouse 22+
Data compression: Up to 10x
Query speed: Millions of rows/sec
Partitioning: By time

7.4.3 File storage

Purpose: Stores files and documents.

File types:

Reports (PDF, Excel)
Images and schemes
Backups
Imported files

Technologies:

MinIO (S3-compatible storage)
File versioning
Data encryption

7.5 Integration layer

7.5.1 ETL module

Purpose: Extracts, transforms, and loads data.

Components:

1. Connectors:

REST API clients
ODBC/JDBC drivers
File parsers (Excel, CSV, XML)
Web scraping

2. Transformers:

Data normalization
Validation and cleaning
Data enrichment
Aggregation

3. Loaders:

Batch loading
Streaming loading
Incremental loading

4. Technologies:

Apache NiFi for visual flow design
Python for custom transformations
Apache Kafka for stream processing

7.5.2 API Gateway

Purpose: A single entry point for all API requests.

Functions:

Authentication and authorization
Load balancing
Rate limiting
Request logging
Routing to microservices

Technologies:

Kong / NGINX
OAuth 2.0 / JWT
Redis for caching

7.6 Business logic layer

7.6.1 Analytical module (R/Posit)

Purpose: Statistical analysis and modeling.

Capabilities:

Descriptive statistics
Time series (forecast, prophet)
Correlation analysis
Regression analysis
Cluster analysis

Libraries:

tidyverse — data processing
forecast — forecasting
targets — computation management
shiny — interactive applications
ggplot2 — visualization

Performance:

Parallel computing
Result caching
Incremental calculations

7.6.2 Machine learning and optimization module (Python)

Purpose: Machine learning and optimization.

The platform organizes the use of artificial intelligence technologies in analytical calculations into 5 stages: preprocessing and building a single resource-balance model (NLP/BERT, Anomaly Detection, AutoML), fine-tuning the model (LSTM/GRU, XGBoost/CatBoost, Graph Neural Networks), forecasting (Meta-Learning/Stacking, Bayesian Inference), scenarios and risks (Reinforcement Learning, VAE/GAN, Monte Carlo), and decision optimization (SHAP/LIME, Real-time Monitoring).

Figure 40 — Use of AI technologies for analytical calculations: 5 stages from preprocessing to optimization

Capabilities:

Machine learning:

Supervised learning — regression, classification
Unsupervised learning — clustering
Time series forecasting
Deep learning (optional)

Optimization:

Linear programming
Integer programming
Nonlinear optimization
Genetic algorithms

Libraries:

scikit-learn — ML algorithms
xgboost, lightgbm — gradient boosting
scipy.optimize — optimization
pulp, pyomo — mathematical programming
pandas, numpy — data processing

7.6.3 Computing module (Java/C++)

Purpose: High-performance computing.

Use cases:

Simulations of production processes
Resource-balance calculations
Processing large volumes of data
Time-critical calculations

Technologies:

Java 17+ for business logic
C++ for compute-heavy tasks
Apache Spark for distributed computing

7.6.4 Calculation orchestration service

Purpose: Manages the sequence of calculations.

Functions:

Task scheduling
Dependency management
Parallel execution
Progress monitoring
Error handling

Technologies:

Apache Airflow
Celery for asynchronous tasks
Redis for queues

7.7 API layer

7.7.1 REST API services

Purpose: Exposes functionality through an HTTP API.

The EDT platform provides a full-featured REST API for integrating with external systems, automating processes, and accessing data and calculations programmatically. The API follows RESTful architecture principles and supports the standard HTTP methods (GET, POST, PUT, DELETE).

API documentation:

The full interactive API documentation is available at: https://api.dtwin.ru/docs/

The documentation includes:

A description of every endpoint with sample requests and responses;
an interactive console for testing the API;
data schemas and object models;
error codes and how to handle them;
integration examples in various programming languages.

Main endpoint groups:

# Data and reference books
GET    /api/v1/data/documents          # Document list
GET    /api/v1/data/kpi                # KPI indicators
POST   /api/v1/data/import             # Data import
GET    /api/v1/data/dictionaries       # Reference books

# Analytics and calculations
POST   /api/v1/analytics/forecast      # Forecasting
POST   /api/v1/analytics/optimize      # Optimization
GET    /api/v1/analytics/results       # Calculation results
POST   /api/v1/analytics/scenarios     # Scenario modeling

# External factors
GET    /api/v1/factors/list            # External factor list
POST   /api/v1/factors/update          # Factor updates
GET    /api/v1/factors/history         # Change history

# Monitoring and visualization
GET    /api/v1/monitor/dashboard       # Dashboard data
GET    /api/v1/monitor/alerts          # Alerts
GET    /api/v1/monitor/sts             # STS data

# Administration
GET    /api/v1/admin/users             # Users
POST   /api/v1/admin/config            # Configuration
GET    /api/v1/admin/logs              # System logs

System of monitored external factors:

The EDT platform includes a subsystem that manages the external factors affecting the enterprise’s production and economic indicators. The system lets you:

Track changes in macroeconomic indicators (exchange rates, raw material prices, indices);
automatically update data from external sources through APIs;
assess how external factors affect the enterprise’s key indicators;
run scenario analysis when external conditions change;
generate early warnings about critical changes.

External factors feed into the calculation models and automatically factor into forecasting and optimization.

Technologies and standards:

Node.js / Express for the API server;
Swagger/OpenAPI 3.0 for documentation;
JWT (JSON Web Tokens) for authentication;
OAuth 2.0 for authorizing external applications;
Rate limiting to guard against overload;
CORS for cross-domain requests.

7.7.2 WebSocket service

Purpose: Two-way real-time communication.

Use cases:

Dashboard updates
Push notifications
Streaming data
Support chat

Technologies:

Socket.io
Redis Pub/Sub

7.7.3 GraphQL API (optional)

Purpose: Flexible data queries.

Advantages:

Request only the fields you need
One request instead of many
Typed schema
Traffic optimization

7.8 Presentation layer

7.8.1 Web application

Technologies:

Frontend Framework:

React 18+ with TypeScript
Redux for state management
React Router for navigation

UI Components:

Material-UI for components
Styled Components for styles
Formik for forms

Visualization:

D3.js for custom charts
Plotly.js for interactive charts
Recharts for simple diagrams

Figure 41 — Dynamic graph of production flows: visualizing the enterprise’s resource connections

Build Tools:

Vite for fast builds
ESLint for linting
Prettier for formatting

7.8.2 Administrative panel

Functions:

User management
Access rights setup
System configuration
Performance monitoring
Log viewing

Technologies:

React Admin
Material-UI

7.9 Platform architecture

The platform architecture includes two main contours:

1. City digital twin (external factor analysis, macroeconomic modeling); 2. Enterprise digital twin (production balances, optimization).

Figure 42 — Platform architecture: City digital twin and Enterprise digital twin

The technology stack of the Unified Digital Platform includes subsystems for authentication (Nginx, Blitz Identity Provider), administration (RabbitMQ, Apache Tomcat, Spring Boot), data collection (Vue.js, PHP, PostgreSQL), analytical calculations (C++, jQuery, a proprietary-format database), integration (GosJava), and storage (Angular, Golang, PostgreSQL, Redis, Nginx).

Figure 43 — Technology stack of the Unified Digital Platform

7.10 Infrastructure components

7.10.1 Monitoring and logging

Prometheus + Grafana:

Performance metrics
Resource usage
API response time
Custom metrics

ELK Stack (Elasticsearch, Logstash, Kibana):

Centralized logging
Log search
Event visualization
Error analysis

7.10.2 Caching

Redis:

API response cache
User sessions
Temporary data
Task queues

Memcached:

Calculation result cache
Database query cache

7.10.3 Load balancing

NGINX:

Load balancing
SSL termination
Reverse proxy
Static files

HAProxy (optional):

TCP/HTTP load balancing
Health checks
Statistics

7.10.4 Containerization

Docker:

Containers for all services
Dependency isolation
Fast deployment
Image versioning

Docker Compose:

Local development orchestration
Declarative configuration

Kubernetes (optional):

Production orchestration
Auto-scaling
Rolling updates
Self-healing

7.11 Security

7.11.1 Authentication and authorization

OAuth 2.0 / OpenID Connect:

Single sign-on (SSO)
Access tokens
Refresh tokens

JWT (JSON Web Tokens):

Stateless authentication
Claims for access rights
Token expiration

RBAC (Role-Based Access Control):

User roles
Access rights
Role hierarchy

7.11.2 Encryption

TLS/SSL:

HTTPS for all connections
Let’s Encrypt certificates

Data encryption:

Database encryption (at rest)
Encryption of sensitive fields
Key management

7.11.3 Audit

Action logging:

All user actions
Data changes
Configuration changes
Unauthorized access attempts

7.12 Deployment and CI/CD

7.12.1 Continuous Integration

GitLab CI:

Automatic builds
Test runs
Code analysis
Docker image builds

Tests:

Unit tests (Jest, pytest)
Integration tests
E2E tests (Playwright)
Performance tests

7.12.2 Continuous Deployment

Deployment strategies:

Blue-Green deployment
Canary releases
Rolling updates

Environments:

Development
Testing
Staging
Production

7.13 Scaling

7.13.1 Horizontal scaling

Stateless services:

Adding API instances
Load balancing
Auto-scaling

Databases:

Read replicas for PostgreSQL
Sharding for ClickHouse
Redis cluster

7.13.2 Vertical scaling

Increasing resources:

CPU for computation
RAM for caching
Storage for data
Network for traffic