MongoDB Monitoring: 12 Metrics You Can’t Ignore

What Is MongoDB Monitoring?

MongoDB is a distributed database for document-oriented data. Monitoring is crucial to keep track of MongoDB database health and identify security incidents. 

MongoDB provides reports and basic monitoring features as part of its Community Edition. However, when running MongoDB in production, these will typically not be enough, and you should leverage MongoDB Cloud Manager, MongoDB Ops Manager, or a third-party monitoring tool. Monitoring should be a central part of your MongoDB security strategy.

MongoDB Monitoring Options

MongoDB provides a number of ways to collect data about the state of a running MongoDB instance.

MongoDB Free Monitoring

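MongoDB Community Edition offered free cloud monitoring for both standalone instances and replica sets. Free monitoring provided metrics including operation execution times, operation counts, memory usage, and CPU utilization, with a data retention period of only 24 hours. Note that MongoDB decommissioned the free monitoring service in August 2023, so the methods below are relevant only as background for older deployments.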

To enable free monitoring at runtime, call this method:

db.enableFreeMonitoring()

To disable it, call this method:

db.disableFreeMonitoring()

To check the monitoring status and retrieve the URL where the metrics are displayed, call this method:

db.getFreeMonitoringStatus()

Alternatively, when free monitoring is enabled, the serverStatus command and its helper db.serverStatus() include free monitoring statistics in the freeMonitoring field of their output.

Retrieving Stats via MongoDB Commands and Logs

The mongostat and mongotop command-line utilities provide performance data you can use for monitoring purposes. MongoDB also has database commands and shell helpers that return metric data, such as db.serverStatus(), db.stats(), and the replSetGetStatus admin command.

For these commands, you will either have to collect and manage the data yourself or find a third-party tool for historical monitoring and visualization. Another option is to collect and analyze MongoDB server logs, which contain information about system performance and historical query performance.
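
As a minimal sketch of collecting the data yourself, the function below reduces a serverStatus document to a small sample that could be appended to a log or time-series store. The function name summarizeServerStatus and the output field names are illustrative assumptions, not part of MongoDB; the input fields (uptime, connections, opcounters, mem.resident) are ones reported by db.serverStatus().

```javascript
// Hypothetical helper: keep only a few commonly watched serverStatus fields.
// In the mongo shell the input would come from: const status = db.serverStatus();
function summarizeServerStatus(status, sampledAt = new Date()) {
  const conn = status.connections || {};
  const ops = status.opcounters || {};
  const mem = status.mem || {};
  return {
    sampledAt: sampledAt.toISOString(),
    uptimeSeconds: Number(status.uptime ?? 0),
    connectionsCurrent: Number(conn.current ?? 0),
    connectionsAvailable: Number(conn.available ?? 0),
    // mem.resident is reported in mebibytes (MiB).
    residentMemoryMiB: Number(mem.resident ?? 0),
    // opcounters are cumulative since the server process started.
    opcounters: {
      insert: Number(ops.insert ?? 0),
      query: Number(ops.query ?? 0),
      update: Number(ops.update ?? 0),
      delete: Number(ops.delete ?? 0),
      getmore: Number(ops.getmore ?? 0),
      command: Number(ops.command ?? 0),
    },
  };
}
```

Calling this on a schedule and storing each result gives you the history that the built-in commands do not retain.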

Production Monitoring

The utilities and commands above are typically not sufficient on their own for monitoring production databases. You have the following options for setting up full-featured monitoring for production MongoDB instances:

  • Use MongoDB Cloud Manager, a cloud-based toolset that provides monitoring, backup, and process automation capabilities.
  • Use MongoDB Ops Manager, the on-premises version of Cloud Manager.
  • Use third-party MongoDB monitoring tools.

12 MongoDB Metrics to Watch

MongoDB supports several techniques for monitoring clusters. You can configure log verbosity, from terse entries to detailed output suitable for development or debugging sessions.

However, not all metrics are appropriate for production environments. Finding the most suitable metrics requires analyzing each use case on its own merits.

Generally speaking, you should incorporate the following basic metrics:

  • Health checks—for availability and uptime counters.
  • Current load and limits—check the current load against the specified limits (e.g., approaching a connection or storage limit).
  • Collection-level performance—check how long the server takes to process read/write queries for certain collections.
  • Server/shard performance—inspect the server daemon’s actual performance and the cluster’s components.
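
To illustrate the "current load and limits" check, here is a small sketch that compares open connections against the connection limit. It assumes the connections sub-document of serverStatus, where current is the number of open connections and available is the number of unused ones, so their sum approximates the limit. The function name connectionUtilization and the 0.8 warning threshold are hypothetical choices.

```javascript
// Hypothetical helper: how close is the instance to its connection limit?
// In the mongo shell the input would come from: db.serverStatus().connections
function connectionUtilization(connections, warnThreshold = 0.8) {
  const current = Number(connections.current);
  // current + available approximates the maximum number of incoming connections.
  const limit = current + Number(connections.available);
  const ratio = limit > 0 ? current / limit : 0;
  return { current, limit, ratio, nearLimit: ratio >= warnThreshold };
}
```

The same pattern (current value divided by limit, compared against a threshold) applies to other limits such as storage capacity.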

MongoDB Cluster’s Operations and Connection Metrics

Pay attention to the following basic metrics for operations and connection monitoring:

  1. Opcounters—view the average rate of operations executed per second over a given timeframe. See the opcounter graph to check the velocity and breakdown of various operations for a specific instance.
  2. Operation execution times—check the average read/write operation time over a given timeframe. 
  3. Query executors and query targeting—check the average per-second rate of documents scanned during queries and query plan evaluation over a given timeframe. Query targeting describes the ratio of documents scanned to documents returned. A high ratio means far more documents are scanned than returned, which often points to a missing or poorly chosen index.
  4. Connections—the number of open connections to an instance. Higher values or spikes may indicate an inefficient client-side connection strategy or an unresponsive server.
  5. Queues—monitor the number of read and write operations waiting for locks. Long queues might indicate a suboptimal schema design or conflicting write paths that compete for the same database resources.
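
The metrics above can be derived from raw counters with a little arithmetic. The sketch below is one possible way to do it, assuming two serverStatus samples taken a known number of seconds apart. The function names are hypothetical; the inputs correspond to the serverStatus fields opcounters, metrics.queryExecutor.scannedObjects, metrics.document.returned, and globalLock.currentQueue.

```javascript
// Per-second rate of each operation type between two opcounters samples.
function opcounterRates(prev, curr, elapsedSeconds) {
  if (elapsedSeconds <= 0) {
    throw new RangeError("elapsedSeconds must be positive");
  }
  const rates = {};
  for (const op of Object.keys(curr)) {
    const now = Number(curr[op]);
    const delta = now - Number(prev[op] ?? 0);
    // Counters restart from zero when the server restarts, so a negative
    // delta is treated as "counted from zero" rather than a negative rate.
    rates[op] = (delta >= 0 ? delta : now) / elapsedSeconds;
  }
  return rates;
}

// Query targeting: documents scanned per document returned.
function queryTargetingRatio(docsScanned, docsReturned) {
  if (docsReturned === 0) {
    return docsScanned === 0 ? 0 : Infinity;
  }
  return docsScanned / docsReturned;
}

// Operations currently queued waiting for a lock.
function queuedOperations(currentQueue) {
  return Number(currentQueue.readers) + Number(currentQueue.writers);
}
```

A targeting ratio close to 1 means most scanned documents are returned; a ratio in the hundreds or thousands is usually worth investigating.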

MongoDB Hardware Metrics

Pay attention to the following metrics for hardware monitoring:

  1. Normalized system CPU—check how much time (as a percentage) the CPU spends servicing the MongoDB process. This metric scales the value to a range of 0–100% by dividing it by the number of CPU cores. It breaks CPU usage down by category, including user, kernel, steal, and iowait. High values for user or kernel CPU may indicate that MongoDB has exhausted CPU resources. High values for iowait usually indicate exhaustion of storage resources, which in turn contributes to CPU exhaustion.
  2. Disk latency—check the read/write latency, in milliseconds, of the disk partition used by MongoDB. High values (>500 ms) indicate that the storage layer may be impacting MongoDB performance.
  3. System memory—check the amount of physical memory used (in bytes) vs the amount of unused, available memory. The available memory bytes metric indicates the system memory available for running a new application.
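
The hardware checks above reduce to simple calculations. The following sketch shows them under stated assumptions: the raw CPU percentage, latency, and memory figures come from whatever host monitoring agent you use, and the function names are hypothetical. The 500 ms default mirrors the threshold mentioned in the list.

```javascript
// Scale a raw CPU percentage (which can exceed 100 on multi-core hosts)
// to the 0-100 range by dividing by the number of cores.
function normalizeCpuPercent(rawPercent, cores) {
  if (!Number.isInteger(cores) || cores < 1) {
    throw new RangeError("cores must be a positive integer");
  }
  return Math.min(100, Math.max(0, rawPercent / cores));
}

// Flag disk latency above the threshold (in milliseconds).
function isDiskLatencyHigh(latencyMs, thresholdMs = 500) {
  return latencyMs > thresholdMs;
}

// Fraction of physical memory that is still available.
function availableMemoryFraction(availableBytes, totalBytes) {
  return totalBytes > 0 ? availableBytes / totalBytes : 0;
}
```

For example, a raw reading of 260% on a 4-core host normalizes to 65%.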

MongoDB Replication Metrics

Pay attention to the following metrics for monitoring replication:

  1. Replication lag—check the approximate time (in seconds) by which a secondary node trails the primary node in applying write operations. A high replication lag indicates that the secondary is struggling to keep up with the primary. This might increase the latency of your operations, depending on the read and write concerns your connections use.
  2. Replication oplog window—check the approximate number of available hours in the primary node’s replication oplog. Secondary nodes that lag beyond a certain threshold cannot catch up and may require full resynchronization.
  3. Replication headroom—gauge the difference between the replication oplog window of a primary node and the replication lag of a secondary node. Secondaries can go into RECOVERING mode if this metric reaches zero.
  4. Opcounters-repl—check the average per-second rate of replication operations over a given timeframe. The opcounters-repl metric reveals each instance’s operational velocity and operation type breakdown.
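
As a rough sketch of how lag and headroom relate, the functions below compute per-secondary lag from the members array of a replSetGetStatus result and subtract it from the oplog window. They assume each member has name, stateStr, and optimeDate fields, and that the oplog window in seconds is obtained separately (for example from the timeDiff field of db.getReplicationInfo()). The function names are hypothetical.

```javascript
// Lag in seconds of each secondary behind the primary.
// In the mongo shell the input would come from: rs.status().members
function replicationLagSeconds(members) {
  const primary = members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) {
    return {}; // No primary (e.g. during an election): lag is undefined.
  }
  const lags = {};
  for (const m of members) {
    if (m.stateStr !== "SECONDARY") continue;
    // Subtracting two Date objects yields milliseconds.
    const lagMs = primary.optimeDate - m.optimeDate;
    lags[m.name] = Math.max(0, lagMs / 1000);
  }
  return lags;
}

// Headroom: oplog window minus replication lag. A secondary whose
// headroom reaches zero can no longer catch up from the oplog.
function replicationHeadroomSeconds(oplogWindowSeconds, lagSeconds) {
  return oplogWindowSeconds - lagSeconds;
}
```

Alerting when headroom falls below a margin you are comfortable with gives you time to act before a full resynchronization becomes necessary.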

MongoDB Security with Satori

To learn more about how Satori helps secure your cloud data, go here. MongoDB support is coming soon, and we’d love to understand your needs.