Server Monitoring Tools - Complete Guide
Published: September 25, 2024 | Reading time: 20 minutes
Server Monitoring Overview
Effective server monitoring helps ensure system reliability and performance by surfacing problems before users notice them.
Monitoring Benefits
# Key Benefits
- Proactive issue detection
- Performance optimization
- Capacity planning
- SLA compliance
- Security monitoring
- Cost optimization
- User experience insights
Prometheus Setup
Prometheus Installation
# Install Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar xvfz prometheus-2.45.0.linux-amd64.tar.gz
cd prometheus-2.45.0.linux-amd64
# Create a systemd service (a minimal example unit follows; adjust paths and user to your install)
sudo tee /etc/systemd/system/prometheus.service > /dev/null <<'EOF'
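# NOTE: the unit body below is a sketch to complete the truncated heredoc above.
# It assumes the prometheus binary was copied to /usr/local/bin, the config lives
# at /etc/prometheus/prometheus.yml, and a dedicated "prometheus" user owns
# /var/lib/prometheus. Adjust these to match your installation.
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus/
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Reload systemd and start Prometheus
sudo systemctl daemon-reload
sudo systemctl enable --now prometheus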
Prometheus Configuration
prometheus.yml
# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "rules/*.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']
  - job_name: 'nginx-exporter'
    static_configs:
      - targets: ['localhost:9113']
  - job_name: 'mysql-exporter'
    static_configs:
      - targets: ['localhost:9104']
  - job_name: 'redis-exporter'
    static_configs:
      - targets: ['localhost:9121']
  - job_name: 'docker-exporter'
    static_configs:
      - targets: ['localhost:9323']
  - job_name: 'blackbox-exporter'
    static_configs:
      - targets: ['localhost:9115']
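Each scrape job above assumes the corresponding exporter is installed and listening on the default port shown; remove the jobs you do not use. After editing, the file can be validated with promtool (shipped in the Prometheus tarball extracted above) before restarting:

# Validate the configuration, then apply it
./promtool check config /etc/prometheus/prometheus.yml
sudo systemctl restart prometheus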
# Alert rules
# /etc/prometheus/rules/alerts.yml
groups:
  - name: system_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
          description: "CPU usage is above 80% for more than 5 minutes"
      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High memory usage detected"
          description: "Memory usage is above 90% for more than 5 minutes"
      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 < 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space"
          description: "Disk space is below 10%"
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service is down"
          description: "Service {{ $labels.instance }} is down"
Grafana Setup
Grafana Installation
Grafana Configuration
# Install Grafana
sudo apt-get install -y software-properties-common
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"
sudo apt-get update
sudo apt-get install grafana
# Start Grafana
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
# Configure Grafana
sudo nano /etc/grafana/grafana.ini
# Key settings:
[server]
http_port = 3000
domain = localhost
root_url = http://localhost:3000/
[security]
admin_user = admin
admin_password = your_secure_password
[users]
allow_sign_up = false
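Grafana only picks up changes to grafana.ini after a restart:

# Apply the configuration changes
sudo systemctl restart grafana-server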
# Add Prometheus data source
curl -X POST \
  http://admin:your_secure_password@localhost:3000/api/datasources \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "Prometheus",
    "type": "prometheus",
    "url": "http://localhost:9090",
    "access": "proxy",
    "isDefault": true
  }'
# Import dashboard
curl -X POST \
  http://admin:your_secure_password@localhost:3000/api/dashboards/db \
  -H 'Content-Type: application/json' \
  -d @node-exporter-dashboard.json
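Here node-exporter-dashboard.json is assumed to be a dashboard definition you have downloaded locally (for example, an exported community Node Exporter dashboard). To confirm the data source was registered, the Grafana HTTP API can be queried with the same credentials:

# List configured data sources
curl -s http://admin:your_secure_password@localhost:3000/api/datasources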
Node Exporter
System Metrics Collection
Node Exporter Setup
# Install Node Exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz
tar xvfz node_exporter-1.6.1.linux-amd64.tar.gz
cd node_exporter-1.6.1.linux-amd64
# Create a systemd service (a minimal example unit follows; adjust paths and user to your install)
sudo tee /etc/systemd/system/node_exporter.service > /dev/null <<'EOF'
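# NOTE: the unit body below is a sketch to complete the truncated heredoc above.
# It assumes the node_exporter binary was copied to /usr/local/bin and that a
# dedicated "node_exporter" user exists.
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

# Reload systemd, start Node Exporter, and verify it is serving metrics
sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
curl -s http://localhost:9100/metrics | head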
ELK Stack
Elasticsearch Setup
Elasticsearch Configuration
# Install Elasticsearch
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt-get update
sudo apt-get install elasticsearch
# Configure Elasticsearch
sudo nano /etc/elasticsearch/elasticsearch.yml
# Key settings:
cluster.name: my-cluster
node.name: node-1
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.type: single-node
xpack.security.enabled: false
# Start Elasticsearch
sudo systemctl start elasticsearch
sudo systemctl enable elasticsearch
# Test Elasticsearch
curl -X GET "localhost:9200/"
# Install Logstash
sudo apt-get install logstash
# Configure Logstash
sudo nano /etc/logstash/conf.d/logstash.conf
# Logstash configuration:
input {
  beats {
    port => 5044
  }
}

filter {
  # Filebeat (configured below) sets log_type at the event root via fields_under_root: true
  if [log_type] == "nginx" {
    grok {
      # %{NGINXACCESS} assumes an nginx access-log pattern is available;
      # the default combined format can also be parsed with %{HTTPD_COMBINEDLOG}
      match => { "message" => "%{NGINXACCESS}" }
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
  }
  if [log_type] == "application" {
    json {
      source => "message"
    }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logs-%{+YYYY.MM.dd}"
  }
}
# Start Logstash
sudo systemctl start logstash
sudo systemctl enable logstash
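The pipeline syntax can be checked at any time before restarting, assuming the stock package layout (binary under /usr/share/logstash, settings under /etc/logstash):

# Test the pipeline configuration and exit
sudo /usr/share/logstash/bin/logstash --path.settings /etc/logstash --config.test_and_exit -f /etc/logstash/conf.d/logstash.conf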
Kibana Setup
Kibana Configuration
# Install Kibana
sudo apt-get install kibana
# Configure Kibana
sudo nano /etc/kibana/kibana.yml
# Key settings:
server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://localhost:9200"]
logging.appenders.file.type: file
logging.appenders.file.fileName: /var/log/kibana/kibana.log
logging.appenders.file.layout.type: json
logging.root.appenders: [default, file]
# Start Kibana
sudo systemctl start kibana
sudo systemctl enable kibana
# Install Filebeat
sudo apt-get install filebeat
# Configure Filebeat
sudo nano /etc/filebeat/filebeat.yml
# Filebeat configuration:
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/nginx/*.log
    fields:
      log_type: nginx
    fields_under_root: true
  - type: log
    enabled: true
    paths:
      - /var/log/myapp/*.log
    fields:
      log_type: application
    fields_under_root: true

output.logstash:
  hosts: ["localhost:5044"]
# Start Filebeat
sudo systemctl start filebeat
sudo systemctl enable filebeat
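Filebeat ships with built-in checks that are worth running after any configuration change:

# Validate the configuration file and test connectivity to Logstash
sudo filebeat test config
sudo filebeat test output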
Application Monitoring
Custom Metrics
Node.js Application Metrics
# Install prom-client
npm install prom-client
# metrics.js
const client = require('prom-client');

// Create a Registry
const register = new client.Registry();

// Add default metrics (CPU, memory, event loop lag, etc.)
client.collectDefaultMetrics({ register });

// Create custom metrics
const httpRequestDuration = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7, 10]
});

const httpRequestTotal = new client.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code']
});

const activeConnections = new client.Gauge({
  name: 'active_connections',
  help: 'Number of active connections'
});

const databaseConnections = new client.Gauge({
  name: 'database_connections',
  help: 'Number of database connections',
  labelNames: ['state']
});

// Register metrics
register.registerMetric(httpRequestDuration);
register.registerMetric(httpRequestTotal);
register.registerMetric(activeConnections);
register.registerMetric(databaseConnections);

// Express app
const express = require('express');
const app = express();

// Metrics middleware: record duration and count for every finished response
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    const labels = {
      method: req.method,
      route: req.route ? req.route.path : req.path,
      status_code: res.statusCode
    };
    httpRequestDuration.observe(labels, duration);
    httpRequestTotal.inc(labels);
  });
  next();
});

// Metrics endpoint scraped by Prometheus
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

// Health check endpoint
app.get('/health', (req, res) => {
  res.json({
    status: 'healthy',
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    memory: process.memoryUsage(),
    version: process.version
  });
});

module.exports = { app, register, httpRequestDuration, httpRequestTotal, activeConnections, databaseConnections };
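To expose these metrics, the exported app has to listen on a port and Prometheus has to scrape it. A minimal sketch follows; the port 3000 and the 'nodejs-app' job name are illustrative assumptions, not part of the setup above:

# server.js
const { app } = require('./metrics');
app.listen(3000, () => console.log('app with /metrics and /health listening on :3000'));

# Added to /etc/prometheus/prometheus.yml under scrape_configs:
  - job_name: 'nodejs-app'
    static_configs:
      - targets: ['localhost:3000']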
Alerting
Alertmanager Setup
Alertmanager Configuration
# Install Alertmanager
wget https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz
tar xvfz alertmanager-0.25.0.linux-amd64.tar.gz
cd alertmanager-0.25.0.linux-amd64
# Create a systemd service (a minimal example unit follows; adjust paths and user to your install)
sudo tee /etc/systemd/system/alertmanager.service > /dev/null <<'EOF'
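# NOTE: the unit body below is a sketch to complete the truncated heredoc above.
# It assumes the alertmanager binary was copied to /usr/local/bin, its config
# lives at /etc/alertmanager/alertmanager.yml, and an "alertmanager" user owns
# /var/lib/alertmanager.
[Unit]
Description=Alertmanager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/usr/local/bin/alertmanager --config.file=/etc/alertmanager/alertmanager.yml --storage.path=/var/lib/alertmanager/
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now alertmanager

The referenced alertmanager.yml is not shown above; a minimal example that routes all alerts to a single Slack receiver looks like this (the webhook URL and channel are placeholders to replace):

# /etc/alertmanager/alertmanager.yml
route:
  receiver: 'team-notifications'
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

receivers:
  - name: 'team-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/REPLACE/ME'
        channel: '#alerts'
        send_resolved: true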
Monitoring Best Practices
Monitoring Strategy
Key Metrics
- CPU utilization
- Memory usage
- Disk I/O
- Network traffic
- Response times
- Error rates
- Throughput
Alerting Rules
- Set appropriate thresholds
- Use multiple alert levels (see the sketch after this list)
- Avoid alert fatigue
- Test alerting regularly
- Document alert procedures
- Use runbooks
- Define escalation policies
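A common way to implement multiple alert levels is to define the same expression twice with different thresholds and severities, so warnings page less aggressively than critical alerts. A minimal sketch, reusing the HTTP latency histogram from the application section (file name and thresholds are illustrative):

# /etc/prometheus/rules/latency.yml
groups:
  - name: latency_alerts
    rules:
      - alert: HighRequestLatencyWarning
        expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p95 request latency above 500ms"
      - alert: HighRequestLatencyCritical
        expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 2
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "p95 request latency above 2s"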
Summary
Server monitoring involves several key components:
- Metrics Collection: Prometheus, Node Exporter
- Visualization: Grafana dashboards
- Logging: ELK stack, centralized logging
- Alerting: Alertmanager, notification channels
- Application Metrics: Custom metrics, health checks
- Best Practices: Key metrics, alerting rules
Need More Help?
Struggling with server monitoring setup or need help implementing comprehensive monitoring solutions? Our DevOps experts can help you set up robust monitoring systems.
Get Monitoring Help