{{indexmenu_n>6}}

====== ePDG Monitoring ======

===== Integrated VoWiFi Gateway Monitoring System (ePDG) =====

===== 1. Solution overview =====

The VAS Experts ePDG Monitoring system provides full operational control of the **fast-epdg** component, the VoWiFi (Voice over WiFi) gateway operating according to 3GPP TS 29.273 and TS 24.302. The gateway provides secure transport of voice and packet traffic over untrusted Wi-Fi access using IPSec/IKEv2 tunneling and integrates with the EPC core through the SWu, SWm, SWx, S2b and S6b interfaces. The solution gives the mobile operator's operations teams a single monitoring platform, from the IPSec SA level (L3 security) up to VoWiFi subscriber-experience KPIs.

==== Key advantages ====

  * **Real-time monitoring** — metrics update every 10-15 seconds; the status of IKE SA / Child SA and GTP tunnels is shown directly on NOC dashboards without delayed aggregation (NOC — Network Operations Center).
  * **Proactive anomaly detection** — 20+ alerts with automatic severity-based escalation. PGW/AAA unreachability, rising IKEv2 latency and a growing EAP-AKA error rate are detected before subscribers notice call problems.
  * **Open integration interfaces** — Prometheus, SNMP v2c, Alertmanager webhooks, Grafana. Integrates into the existing NMS/OSS infrastructure without vendor lock-in.
  * **Minimal external dependencies** — a built-in ''/metrics'' endpoint in fast-epdg: no Java, no JMX, no external agents.
  * **Coverage of the entire SWu → S2b stack** — IKEv2 (SWu), Diameter SWm/SWx/S6b, GTPv2-C (S2b) and the GTP-U data plane, all in one place. The 33 metrics cover both the control plane and the data plane.

===== 2. Monitoring system architecture =====

flowchart TB subgraph DataPlane["Data Plane"] IPSEC["IPSec ESP
IKEv2 SA / Child SA
Kernel xfrm"] GTPU["GTP-U Tunneller
S2b Data
ePDG ↔ PGW"] end subgraph ControlPlane["Control Plane"] IKE["IKEv2 SWu
EAP-AKA' auth"] DIAM["Diameter Client
SWx/SWm/S6b"] GTPC["GTPv2-C S2b
to PGW/SMF"] CTRL["ePDG Controller
Attach/Detach FSM"] end subgraph Collection["Metrics Collection"] PROMEXP["fast-epdg
/metrics endpoint
:9817"] end subgraph Storage["Storage"] PROM["Prometheus
TSDB
15-day retention"] end subgraph Visualization["Visualization"] GRAF["Grafana
4 dashboards, 35+ panels"] end subgraph Alerting["Alerting"] AM["Alertmanager
Routing / Inhibition"] EMAIL["Email SMTP"] SNMPGW["SNMP Trap Sender
Webhook → Trap gateway"] NMS["External NMS
SNMP v2c UDP/162"] WH["Webhooks
Telegram / PagerDuty"] end IKE --> PROMEXP IPSEC --> PROMEXP GTPC --> PROMEXP GTPU --> PROMEXP DIAM --> PROMEXP CTRL --> PROMEXP PROMEXP --> PROM PROM --> GRAF PROM --> AM AM --> EMAIL AM --> SNMPGW SNMPGW --> NMS AM --> WH
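The collection path shown above (exporter → Prometheus) can be wired up with an ordinary scrape job. A minimal sketch — the job name and target hostname are illustrative; port 9817 and the 10-second interval are the values described in this document:

```yaml
# prometheus.yml — scrape the built-in fast-epdg exporter.
# The job name and target host are placeholders; :9817 and the
# 10 s interval match the values stated in this document.
scrape_configs:
  - job_name: "fast-epdg"
    scrape_interval: 10s
    metrics_path: /metrics
    static_configs:
      - targets: ["epdg-host:9817"]
        labels:
          component: epdg
```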
==== Four-level monitoring architecture ====

^ Level ^ Component ^ Technology ^
| **Collection** | Built-in ''/metrics'' endpoint in fast-epdg | Prometheus text format over HTTP |
| **Storage** | Prometheus TSDB | Local storage, 15-day retention by default |
| **Visualization** | Grafana + JSON provisioning | Auto-provisioning of 4 dashboards |
| **Alerting** | Alertmanager + SNMP Trap Sender | PromQL rules → webhook → SNMP v2c trap |

===== 3. Components and indicators =====

==== Monitoring coverage ====

flowchart LR EXP["fast-epdg
/metrics :9817"] EXP --> CFG["Config
2 metrics"] EXP --> NET["Network
1 metric"] EXP --> PROTO["Protocols L5-L7
15 metrics"] EXP --> SVC["Service KPI
4 metrics"] EXP --> SESS["Session State
4 metrics"] EXP --> APP["Application
3 metrics"] EXP --> SYS["System
4 metrics"] PROTO --> IKEV2["IKEv2
SWu — 3"] PROTO --> GTPC["GTPv2-C
S2b — 4"] PROTO --> GTPU["GTP-U
S2b data — 3"] PROTO --> DIA["Diameter
SWm/SWx/S6b — 5"]
==== Quantitative overview by category ====

^ Category ^ Metrics ^ Scrape interval ^ Key indicators ^
| **Config** | 2 | 10 s | Configuration status, reload counter |
| **Network** | 1 | 10 s | Peer connection status (PGW/AAA/HSS) |
| **IKEv2 (SWu)** | 3 | 10 s | Messages by type (IKE_SA_INIT, IKE_AUTH, CREATE_CHILD_SA), latency histogram, errors |
| **GTPv2-C (S2b)** | 4 | 10 s | Messages (Create/Modify/Delete Session), latency, errors, retransmissions |
| **GTP-U data plane** | 3 | 10 s | Packets/bytes, tunneling errors |
| **Diameter (SWm/SWx/S6b)** | 5 | 10 s | Messages by command code (DER/DEA, MAR/MAA, AAR/AAA), latency, errors, watchdog, connection status |
| **Service KPI** | 4 | 10 s | Attach success rate, duration histogram, service availability, uptime |
| **Session State** | 4 | 10 s | IKE SA, Child SA, GTP sessions, total subscribers |
| **Application** | 3 | 10 s | Thread count, memory, log messages by level |
| **System** | 4 | 10 s | CPU utilization, memory usage, disk usage, open FDs |
| **Total** | **33 metrics** | | |

==== Naming principles ====

All metrics share the ''epdg_'' prefix and are organized in a hierarchy:

  epdg_
  ├── config_*    # Configuration
  ├── network_*   # Network layer
  ├── ikev2_*     # SWu (IKEv2/IPSec)
  ├── gtp_*       # S2b control-plane GTPv2-C
  ├── gtpu_*      # S2b data-plane GTP-U
  ├── diameter_*  # SWm/SWx/S6b
  ├── service_*   # Service KPIs (attach, availability, uptime)
  ├── session_*   # Session state (IKE SA, Child SA, GTP, subscribers)
  ├── app_*       # Application metrics (memory, threads, logs)
  └── system_*    # System metrics (CPU, disk, network)

===== 4. List of metrics =====

All metrics are exported through a single ''/metrics'' endpoint in Prometheus text format. Names follow the Prometheus conventions — ''epdg_<subsystem>_<metric>[_unit]'' — where Counter metrics carry the ''_total'' suffix and Histogram metrics carry a ''_seconds''/''_bytes'' unit suffix.
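As an illustration of these naming conventions, a fragment of ''/metrics'' output could look like the following. The label names (''apn'', ''result'', ''exchange'') and all values are illustrative, not captured from a live system:

```
# HELP epdg_service_availability Service availability flag (0=down, 1=up)
# TYPE epdg_service_availability gauge
epdg_service_availability 1
# TYPE epdg_service_attach_total counter
epdg_service_attach_total{apn="ims",result="success"} 18423
# TYPE epdg_ikev2_request_duration_seconds histogram
epdg_ikev2_request_duration_seconds_bucket{exchange="IKE_AUTH",le="0.5"} 17902
epdg_ikev2_request_duration_seconds_sum{exchange="IKE_AUTH"} 912.4
epdg_ikev2_request_duration_seconds_count{exchange="IKE_AUTH"} 18100
```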
==== 4.1 Config (2) ====

^ Name ^ Type ^ Purpose ^
| ''epdg_config_status'' | Gauge | Component configuration status (0=error, 1=ok) |
| ''epdg_config_reload_total'' | Counter | Configuration reload counter (success/failure) |

==== 4.2 Network (1) ====

^ Name ^ Type ^ Purpose ^
| ''epdg_network_connection_status'' | Gauge | TCP/UDP connection status to a peer (0=down, 1=up) — applies to PGW (S2b), AAA (SWm), HSS (SWx) |

==== 4.3 IKEv2 SWu (3) ====

^ Name ^ Type ^ Purpose ^
| ''epdg_ikev2_messages_total'' | Counter | IKEv2 message counter (IKE_SA_INIT / IKE_AUTH / CREATE_CHILD_SA / INFORMATIONAL) |
| ''epdg_ikev2_request_duration_seconds'' | Histogram | IKEv2 response time |
| ''epdg_ikev2_errors_total'' | Counter | IKEv2 errors (NO_PROPOSAL_CHOSEN, AUTHENTICATION_FAILED, INVALID_SYNTAX, etc.) |

==== 4.4 GTPv2-C S2b (4) ====

^ Name ^ Type ^ Purpose ^
| ''epdg_gtp_messages_total'' | Counter | GTPv2-C messages (Create/Modify/Delete Session, Echo) |
| ''epdg_gtp_request_duration_seconds'' | Histogram | Request → response latency |
| ''epdg_gtp_errors_total'' | Counter | GTP-C errors by Cause Code |
| ''epdg_gtp_retransmissions_total'' | Counter | Retransmitted GTP-C requests |

==== 4.5 GTP-U data plane (3) ====

^ Name ^ Type ^ Purpose ^
| ''epdg_gtpu_packets_total'' | Counter | Packets through GTP-U tunnels (uplink/downlink) |
| ''epdg_gtpu_bytes_total'' | Counter | Bytes through GTP-U tunnels |
| ''epdg_gtpu_errors_total'' | Counter | Tunneling errors (TEID mismatch, decap fail) |

==== 4.6 Diameter SWm/SWx/S6b (5) ====

^ Name ^ Type ^ Purpose ^
| ''epdg_diameter_messages_total'' | Counter | DER/DEA (SWm), MAR/MAA (SWx), AAR/AAA (S6b), STR/STA |
| ''epdg_diameter_request_duration_seconds'' | Histogram | Request → response latency per Diameter peer |
| ''epdg_diameter_errors_total'' | Counter | Errors by Experimental-Result-Code |
| ''epdg_diameter_watchdog_status'' | Gauge | DWR/DWA watchdog status per peer (0=timeout, 1=ok) |
| ''epdg_diameter_connection_status'' | Gauge | Diameter connection status per peer (0=disconnected, 1=connected) |

==== 4.7 Service KPI (4) ====

^ Name ^ Type ^ Purpose ^
| ''epdg_service_attach_total'' | Counter | Attach attempts (success/failure) per APN |
| ''epdg_service_attach_duration_seconds'' | Histogram | Attach duration (IKE_SA_INIT → session ready) |
| ''epdg_service_availability'' | Gauge | Availability flag (0=down, 1=up) |
| ''epdg_service_uptime_seconds'' | Gauge | Service uptime |

==== 4.8 Session State (4) ====

^ Name ^ Type ^ Purpose ^
| ''epdg_session_ike_sa_total'' | Gauge | Active IKE SA |
| ''epdg_session_child_sa_total'' | Gauge | Active Child SA (IPSec tunnels) |
| ''epdg_session_gtp_sessions_total'' | Gauge | Active GTP-C sessions on S2b |
| ''epdg_session_subscribers_total'' | Gauge | Unique subscribers (connected UE) |

==== 4.9 Application (3) ====

^ Name ^ Type ^ Purpose ^
| ''epdg_app_threads_total'' | Gauge | Total worker threads |
| ''epdg_app_memory_bytes'' | Gauge | Process memory by type |
| ''epdg_app_log_messages_total'' | Counter | Log messages by level (debug/info/warn/error/fatal) |

==== 4.10 System (4) ====

^ Name ^ Type ^ Purpose ^
| ''epdg_system_cpu_usage_percent'' | Gauge | CPU utilization |
| ''epdg_system_memory_bytes'' | Gauge | System memory |
| ''epdg_system_disk_bytes'' | Gauge | Disk space |
| ''epdg_system_open_fds'' | Gauge | Open file descriptors |

==== Metric types (reminder) ====

^ Type ^ Purpose ^
| **Counter** | Monotonically increasing counter (messages, errors, reloads) |
| **Gauge** | Current value (active sessions, memory, status) |
| **Histogram** | Distribution of values with automatic bucketing over intervals (durations, lifetimes) |

===== 5. Integration interfaces =====

flowchart LR CORE["VAS Experts
ePDG Monitoring"] CORE --> P["Prometheus
CNCF / OpenMetrics"] CORE --> S["SNMP v2c
EPDG-MIB"] CORE --> G["Grafana
JSON Provisioning"] CORE --> W["Webhooks
ChatOps"] CORE --> AM["Alertmanager
Routing"] P --> P1["Cloud-native NMS
Thanos / Cortex / Mimir"] S --> S1["Legacy NMS
HP OpenView, NetAct
IBM Tivoli"] G --> G1["NOC Wall Displays
Drill-down Analytics"] W --> W1["Telegram / Slack
PagerDuty / OpsGenie"] AM --> AM1["Smart routing
Severity-based"]
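The long-term storage branch of the diagram above (Thanos / Cortex / Mimir) is normally fed through Prometheus ''remote_write''. A minimal sketch, where the endpoint URL is a placeholder and the relabeling is one possible way to limit volume to the ePDG namespace:

```yaml
# prometheus.yml — forward ePDG metrics to long-term storage
# (Thanos Receive / Cortex / Mimir). The URL is a placeholder.
remote_write:
  - url: "http://mimir.example.internal/api/v1/push"
    write_relabel_configs:
      # keep only the epdg_ metric namespace
      - source_labels: [__name__]
        regex: "epdg_.*"
        action: keep
```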
==== 5.1 Prometheus (CNCF standard) ====

The native ''/metrics'' endpoint on port **9817** is built into fast-epdg. The output uses the standard Prometheus text format v0.0.4 (OpenMetrics-compatible). Aggregation to a central Prometheus is supported, and ''remote_write'' can forward metrics to long-term storage in Thanos, Cortex or Grafana Mimir.

==== 5.2 SNMP v2c — EPDG-MIB ====

**47 OIDs** cover the Prometheus metrics, plus **14 trap notifications** (raise/clear pairs per RFC 3877 ALARM-MIB). Compatible with HP OpenView, IBM Tivoli NetCool, Nokia NetAct, Huawei U2000.

flowchart TB IANA["IANA PEN
enterprises
.1.3.6.1.4.1"] VAS["VAS Experts
.1.3.6.1.4.1.43823
(vas.expert)"] EPDG["EPDG-MIB
.43823.1"] EPC["EPC Monitoring
.43823.100"] IANA --> VAS VAS --> EPDG VAS --> EPC EPDG --> OBJ["epdgObjects
.43823.1.1"] EPDG --> NOTIF["epdgNotifications
.43823.1.2
14 trap types"] EPDG --> CONF["epdgConformance
.43823.1.3"] OBJ --> SERVICE["service .1.1.1
4 OID"] OBJ --> IKE["ikev2 .1.1.2
6 OID"] OBJ --> GTP["gtp .1.1.3
8 OID"] OBJ --> DIAM["diameter .1.1.4
7 OID"] OBJ --> SESS["sessions .1.1.5
8 OID"] OBJ --> SYS["system .1.1.6
8 OID"] OBJ --> NET["network .1.1.7
6 OID"] NOTIF --> TRAPAGR["7 raise / 7 clear
pairs"]
Examples of SNMP requests (''<epdg-host>'' is a placeholder for the agent address):

  # The entire ePDG subtree
  snmpwalk -v2c -c public <epdg-host> .1.3.6.1.4.1.43823.1

  # Service availability (Gauge 0..1)
  snmpget -v2c -c public <epdg-host> .1.3.6.1.4.1.43823.1.1.0

==== 5.3 Grafana ====

**4 JSON dashboards are provided** (35+ panels in total):

  * **ePDG Overview** — availability, attach KPIs, sessions, interface states
  * **IKEv2 Details** — messages, performance, errors, IKE SA lifecycle
  * **GTP Details** — GTPv2-C + GTP-U data per PGW peer
  * **Diameter Details** — messages per application, latency, watchdog

Dashboards are installed automatically through the Grafana provisioning API. The layout is adapted for Network Operations Center (NOC) status monitors with auto-refresh every 15 seconds.

==== 5.4 Alertmanager Webhooks ====

Webhook interface for integration with any notification system: Telegram Bot, Slack, PagerDuty Events API v2, OpsGenie, Microsoft Teams. A separate **SNMP Trap Sender** service converts Alertmanager webhooks into SNMP v2c traps with the Enterprise OID.

===== 6. Alerting system =====

==== Alert categories ====

^ Severity ^ Alerts ^ Description ^ Reaction ^
| **Critical** | ''ePDG_Service_Down'', ''ePDG_High_Attach_Failure_Rate'', ''ePDG_PGW_Unreachable'', ''ePDG_AAA_Unreachable'', ''ePDG_Diameter_Watchdog_Timeout'' | Component unavailable, widespread attach failures, peers unreachable | Immediate escalation: Email + SNMP Trap + Webhook. Repeated every hour |
| **Warning** | ''ePDG_High_IKEv2_Latency'', ''ePDG_High_GTP_Latency'', ''ePDG_High_IKEv2_Error_Rate'', ''ePDG_High_GTP_Error_Rate'', ''ePDG_High_Memory_Usage'', ''ePDG_High_CPU_Usage'', ''ePDG_Low_Disk_Space'', ''ePDG_High_Error_Log_Rate'' | Performance degradation, resource anomalies | Email. Resent every 4 hours. Suppressed while a Critical alert is active on the same component |

==== Complete list of alerts (20+ rules) ====

flowchart LR AL["ePDG Alert Rules
20+"] AL --> CR["Critical
5 rules"] AL --> WR["Warning
8 rules"] AL --> INFO["Recording
34 rules"] CR --> C1["Service_Down
availability == 0"] CR --> C2["Attach_Failure_Rate
> 10%"] CR --> C3["PGW_Unreachable
connection_status{s2b} == 0"] CR --> C4["AAA_Unreachable
connection_status{swm} == 0"] CR --> C5["Diameter_Watchdog_Timeout
watchdog_status == 0"] WR --> W1["High_IKEv2_Latency
p95 > 1.0 s"] WR --> W2["High_GTP_Latency
p95 > 0.5 s"] WR --> W3["High_IKEv2_Error_Rate
> 5%"] WR --> W4["High_GTP_Error_Rate
> 5%"] WR --> W5["High_Memory_Usage
> 80%"] WR --> W6["High_CPU_Usage
> 80%"] WR --> W7["Low_Disk_Space
< 10%"] WR --> W8["High_Error_Log_Rate
> 10/s"] INFO --> I1["attach_success_rate
preaggregated"] INFO --> I2["p95_p99_latency
preaggregated"] INFO --> I3["throughput
preaggregated"]
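Two of the Critical rules above can be sketched as Prometheus alerting rules. The PromQL expressions are an interpretation of the thresholds shown in the diagram; the ''result'' label name and the ''for'' durations are assumptions:

```yaml
groups:
  - name: epdg-critical
    rules:
      - alert: ePDG_Service_Down
        expr: epdg_service_availability == 0
        for: 1m
        labels:
          severity: critical
          component: epdg
        annotations:
          summary: "ePDG service is down"
      - alert: ePDG_High_Attach_Failure_Rate
        # failure share over 5 minutes; the `result` label is an assumption
        expr: |
          sum(rate(epdg_service_attach_total{result="failure"}[5m]))
            / sum(rate(epdg_service_attach_total[5m])) > 0.10
        for: 5m
        labels:
          severity: critical
          component: epdg
        annotations:
          summary: "VoWiFi attach failure rate above 10%"
```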
==== Alert processing flow ====

sequenceDiagram
  participant M as Metric (Prometheus)
  participant R as Alert Rule (PromQL)
  participant AM as Alertmanager
  participant E as Email (SMTP)
  participant SG as SNMP Trap Gateway
  participant NMS as External NMS
  participant W as Webhook (ChatOps)
  M->>R: Value exceeds the threshold
  R->>R: Pending (for: 1-10 min)
  R->>AM: Alert FIRING
  AM->>AM: Group by [alertname, component]
  AM->>AM: Inhibition check (critical overrides warning)
  alt severity = critical
    AM->>E: Email [CRITICAL]
    AM->>SG: Webhook → SNMP Trap
    SG->>NMS: SNMP v2c Trap (OID .1.3.6.1.4.1.43823.1.2.X)
    AM->>W: Webhook (Telegram / PagerDuty)
  else severity = warning
    AM->>E: Email [WARNING]
  end
  Note over M,R: The metric returns to normal
  R->>AM: Alert RESOLVED
  R->>SG: clear-trap (paired notification)
  AM->>E: Email [RESOLVED]

==== Features ====

  * **Inhibition**: Critical alerts automatically suppress Warning alerts for the same component
  * **Grouping**: alerts are grouped by ''alertname'' + ''component'' with a 30-second window
  * **Dead time / hysteresis**: a ''for'' delay of 1 to 10 minutes prevents false positives
  * **Trap pairing**: raise/clear event pairs for RFC 3877 ALARM-MIB compliance

===== 7. Visualization and operational dashboards =====

==== Dashboard composition ====

^ Dashboard ^ Panels ^ Purpose ^
| **ePDG Overview** | 10 | Service availability, attach success rate, active session counts, SWu/SWm/S2b status, interface throughput |
| **IKEv2 Details** | 10 | Messages per second by type, request duration histogram, 95th-percentile latency, errors by type, IKE SA lifecycle |
| **GTP Details** | 8 | GTPv2-C messages per PGW, retransmissions, errors by cause code, GTP-U traffic (uplink/downlink) |
| **Diameter Details** | 7 | Messages per application (SWm/SWx/S6b), request durations, watchdog timer state, result-code distribution, connection state timeline |

==== Design for the Network Operations Center (NOC) ====

flowchart TB NOC["NOC Dashboard Layer"] NOC --> OVER["ePDG Overview
KPI Summary"] NOC --> IKE["IKEv2 Details
Drill-down"] NOC --> GTP["GTP Details
Drill-down"] NOC --> DIA["Diameter Details
Drill-down"] OVER -->|Click attach KPI| IKE OVER -->|Click session count| GTP OVER -->|Click peer status| DIA
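One way to auto-load this dashboard set is Grafana's file-based provisioning. A sketch under assumptions: the provider name, folder and filesystem path are illustrative, not documented fast-epdg artifacts:

```yaml
# /etc/grafana/provisioning/dashboards/epdg.yaml
# Loads the four ePDG dashboard JSON files from a local directory.
apiVersion: 1
providers:
  - name: epdg-dashboards
    folder: "ePDG"
    type: file
    updateIntervalSeconds: 30
    options:
      path: /var/lib/grafana/dashboards/epdg
```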
  * **Auto-refresh**: 15-second refresh interval
  * **Adaptive color scheme**: green → yellow → red by threshold values
  * **Drill-down**: from Overview to Details to Component
  * **Time-range selector**: from 5 minutes to 30 days of history
  * **JSON provisioning**: dashboards are deployed automatically

===== 8. Integration into a unified EPC Monitoring stack =====

ePDG monitoring is fully integrated into the overall packet core monitoring:

flowchart TB subgraph Common["Unified Monitoring Stack"] PROM["Prometheus"] GRAF["Grafana"] AM["Alertmanager"] end subgraph Sources["EPC metric sources"] DPI["FastDPI
:9110"] SMF["SMF /metrics
:9090"] PCEF["fast-pcef /metrics
:9090"] PCRF["FastPCRF"] EPDG["fast-epdg
:9817"] end DPI --> PROM SMF --> PROM PCEF --> PROM PCRF --> PROM EPDG --> PROM PROM --> GRAF PROM --> AM
The NOC operator sees **all EPC components** (DPI, SMF, PCEF, FastPCRF, ePDG) in a single Grafana interface, with a unified alerting system and notification routing through one Alertmanager.

===== 9. Metric coverage by OSI layer =====

graph LR L1["L1 Physical
NIC counters via system"] L2["L2 Data Link
MAC, VLAN"] L3["L3 Network
IP, IPSec ESP, GTP-U"] L4["L4 Transport
TCP/UDP/SCTP"] L5["L5 Session
GTPv2-C, IKEv2"] L6["L6 Presentation
IKEv2/IPSec encryption, EAP-AKA'"] L7["L7 Application
Diameter, service bearer ops"] Operations["Operations
KPI, SLA, Capacity"] CX["CX Level
Subscriber Experience"] L1 --> L2 --> L3 --> L4 --> L5 --> L6 --> L7 --> Operations --> CX style L1 fill:#e74c3c,color:#fff style L2 fill:#e67e22,color:#fff style L3 fill:#f39c12,color:#fff style L4 fill:#2ecc71,color:#fff style L5 fill:#1abc9c,color:#fff style L6 fill:#3498db,color:#fff style L7 fill:#9b59b6,color:#fff style Operations fill:#34495e,color:#fff style CX fill:#2c3e50,color:#fff
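The Operations and CX indicators at the top of this stack are the "preaggregated" values referenced in the alert-rule diagram, and are typically computed as Prometheus recording rules. A hedged sketch: the rule names and the ''result'' label are assumptions, while the metrics and p95 target come from this document:

```yaml
groups:
  - name: epdg-kpi
    interval: 15s
    rules:
      # attach success rate over 5 minutes (the `result` label is assumed)
      - record: epdg:attach_success_rate:ratio_5m
        expr: |
          sum(rate(epdg_service_attach_total{result="success"}[5m]))
            / sum(rate(epdg_service_attach_total[5m]))
      # p95 attach duration estimated from histogram buckets
      - record: epdg:attach_duration_seconds:p95_5m
        expr: |
          histogram_quantile(0.95,
            sum(rate(epdg_service_attach_duration_seconds_bucket[5m])) by (le))
```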
==== Metric breakdown by layer ====

OSI model:

^ Layer ^ Metrics ^ Examples ^
| **L1/L2 Physical / Data Link** | - | Covered by a separate node_exporter or equivalent at the OS level (not included in the ePDG metrics list) |
| **L3 Network / IPSec tunnels** | 3 | ''epdg_gtpu_packets_total'', ''epdg_gtpu_bytes_total'', ''epdg_gtpu_errors_total'' — GTP-U data plane |
| **L4 Transport** | 1 | ''epdg_network_connection_status'' — TCP connections to peers (PGW/AAA/HSS) |
| **L5 Session** | 3 | ''epdg_session_ike_sa_total'', ''epdg_session_child_sa_total'', ''epdg_session_gtp_sessions_total'' |
| **L6 Presentation/Security** | 3 | ''epdg_ikev2_messages_total'', ''epdg_ikev2_request_duration_seconds'', ''epdg_ikev2_errors_total'' — IKEv2/IPSec encryption and EAP-AKA' authentication |
| **L7 Application** | 9 | ''epdg_diameter_*'' (SWm/SWx/S6b, 5 metrics), ''epdg_gtp_*'' (GTPv2-C, 4 metrics) |

Operator levels:

^ Level ^ Metrics ^ Examples ^
| **Operations** | 11 | ''epdg_service_availability'', ''epdg_service_uptime_seconds'', ''epdg_app_*'' (3), ''epdg_system_*'' (4), ''epdg_config_*'' (2) |
| **Customer Experience** | 3 | ''epdg_service_attach_duration_seconds'' p95, ''epdg_service_attach_total'' (success rate), ''epdg_ikev2_request_duration_seconds'' p99 |

==== CX level: perceived VoWiFi service quality ====

^ QoE indicator ^ Source metrics ^ Interpretation ^
| **VoWiFi attach time** | ''epdg_service_attach_duration_seconds'' p95 | > 3 seconds — the subscriber notices the delay when switching to WiFi |
| **Service continuity** | ''epdg_session_ike_sa_total'' delta | A mass drop of > 50 IKE SA = availability issue |
| **Authentication success** | ''ePDG_High_Attach_Failure_Rate'' alert rate | > 5% = HSS/AAA peer problem |
| **Bearer setup latency** | ''epdg_gtp_request_duration_seconds{msg=create-session}'' p99 | > 500 ms — delayed availability of the voice channel |
| **GTP-U tunnel health** | ''epdg_gtpu_errors_total'' rate / ''epdg_gtpu_packets_total'' | > 0.1% = voice quality degradation |
| **IKEv2 reliability** | ''epdg_ikev2_errors_total'' by type | NO_PROPOSAL_CHOSEN / AUTHENTICATION_FAILED — certificate / UE problems |

===== 10. Standards and compatibility =====

^ Standard ^ Area ^ Application ^
| **3GPP TS 29.273** | SWx/S6b/SWm | Methodology for accounting Diameter messages and result codes |
| **3GPP TS 24.302** | SWu (IKEv2) | Definition of IKEv2 message types and error codes |
| **3GPP TS 33.402** | 3GPP security for non-3GPP access | EAP-AKA'/IKEv2 security parameters |
| **3GPP TS 23.402** | Non-3GPP access architecture | Interface structure (SWu/SWm/SWx/S6b/S2b) |
| **3GPP TS 32.421** | Performance measurement | KPI collection methodology |
| **3GPP TS 32.409** | Performance measurement charging | Counter structure |
| **IETF RFC 7296** | IKEv2 | Message types, error notifications, SA states |
| **IETF RFC 6733** | Diameter | Command codes, Result-Codes |
| **IETF RFC 4187** | EAP-AKA | SIM-based authentication |
| **IETF RFC 3877** | ALARM MIB | Enterprise MIB structure for alarms |
| **IETF RFC 3418** | SNMPv2 MIB | SNMP v2c compatibility |
| **Prometheus exposition format** | Metrics (v0.0.4) | Metric export format |
| **OpenMetrics** | CNCF standard | Forward compatibility |

===== 11. Deployment model =====

flowchart TB subgraph Host1["ePDG Server"] EPDG["fast-epdg
(VoWiFi gateway)"] PLUGIN["/metrics endpoint
:9817"] EPDG -.-> PLUGIN end subgraph Host2["Monitoring server"] PROM["Prometheus"] GRAF["Grafana"] AM["Alertmanager"] SNMPTRAP["SNMP Trap Sender
(webhook gateway)"] PROM --> GRAF PROM --> AM AM --> SNMPTRAP end subgraph Host3["External systems"] NMS["Operator NMS
(HP OpenView /
NetAct / Tivoli)"] CHAT["ChatOps
(Telegram / PagerDuty)"] end PLUGIN -->|HTTP :9817/metrics| PROM SNMPTRAP -->|UDP 162| NMS AM -->|Webhook| CHAT
==== Deployment characteristics ====

^ Parameter ^ Value ^
| **Metrics footprint** | Built-in (~2 MB memory overhead) |
| **External dependencies** | None — self-contained within the ''fast-epdg'' package (rpm) |
| **Management** | ''fast-epdg.service'' systemd unit |
| **Configuration** | The ''monitoring'' section in ''fast-epdg.conf'' |
| **Update** | Configuration reload without service interruption |
| **OS** | Linux (RHEL/CentOS 8+, Ubuntu 22.04+) |
| **Port** | 9817 TCP (listens on 0.0.0.0, configurable) |
| **Deployment time** | < 5 minutes (enable the plugin in the config file + restart) |

==== Placement options ====

  * **On-premise** — the exporter runs in the fast-epdg address space, negligible resource consumption
  * **Co-located Prometheus** — Prometheus scrapes metrics from an instance on the same host
  * **Centralized** — a single Prometheus scrapes all ePDG nodes

===== 12. Metric exporter configuration =====

The ''monitoring'' section in ''fast-epdg.conf'':

  monitoring {
      enabled = yes
      listen_port = 9817
      listen_address = 0.0.0.0
      update_interval = 10
      metrics {
          ikev2 = yes
          gtp = yes
          diameter = yes
          service = yes
          session = yes
          app = yes
          system = yes
      }
  }

Each group of metrics can be independently enabled or disabled without recompilation.