
Linux Performance Analysis Guide (v2)

1. Introduction

This guide provides a comprehensive framework for diagnosing performance issues on a Linux server. By systematically analyzing the four core pillars of system performance—CPU, Memory, Disk I/O, and Network—you can effectively identify and resolve bottlenecks. This version includes specific guidance for web servers like NGINX.


2. Quick Command Reference

Category   Command             Purpose
--------   -------             -------
CPU        top / htop          View live process activity and system load.
           mpstat -P ALL 1     Check utilization for each individual CPU core.
           uptime              See system load averages (1, 5, 15 min).
Memory     free -h             View used, free, and cached memory.
           vmstat 1 5          See memory usage, swap activity, and I/O wait.
Disk I/O   iostat -xz 1 5      Analyze disk activity, utilization (%util), and latency (await).
           iotop               Identify which processes are causing high disk I/O.
           df -h               Check disk space usage.
Network    iftop / nload       Monitor real-time network bandwidth usage.
           netstat -i          Check for network interface errors or dropped packets.
           ss -s               Get a summary of socket statistics (connections).
Combined   dstat -cdngy 1 5    Get a unified view of CPU, disk, network, and memory stats.

3. Deep Dive Analysis

3.1. CPU Analysis

  • Goal: Determine if the CPU is overloaded or waiting excessively.
  • Key Command: mpstat -P ALL 1
  • What to Look For:
    • %usr: High user time often points to a specific application consuming CPU.
    • %sys: High system time can indicate kernel-level issues, driver problems, or heavy I/O management.
    • %iowait: Crucial metric. High values mean the CPU is idle but waiting for slow disk or network I/O to complete. This is often a disk problem, not a CPU problem.
    • %idle: If this is low across all cores, your CPU is a bottleneck.
    • Uneven Core Usage: If one core is at 100% while others are idle, it suggests a single-threaded application is the bottleneck.
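When mpstat isn't installed, the same per-core picture can be read directly from /proc/stat; a minimal sketch (field layout per proc(5), sampled over one second):

```shell
#!/bin/sh
# Per-core busy% and iowait% from two 1-second samples of /proc/stat.
# Columns per proc(5): cpuN user nice system idle iowait irq softirq steal ...
a=$(grep '^cpu[0-9]' /proc/stat)
sleep 1
b=$(grep '^cpu[0-9]' /proc/stat)
cpu_report=$(printf '%s\n%s\n' "$a" "$b" | awk '
{
    tot = 0
    for (i = 2; i <= NF; i++) tot += $i
    if (seen[$1]++) {                       # second sample for this core
        dt = tot - t[$1]; di = $5 - idl[$1]; dw = $6 - iow[$1]
        if (dt > 0)
            printf "%s busy %.1f%% iowait %.1f%%\n",
                   $1, 100 * (dt - di - dw) / dt, 100 * dw / dt
    }
    t[$1] = tot; idl[$1] = $5; iow[$1] = $6
}')
echo "$cpu_report"
```

One core pinned near 100% busy while the rest idle points at the single-threaded bottleneck described above; high iowait with low busy time points at storage instead.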

3.2. Memory Analysis

  • Goal: Check for memory exhaustion or excessive swapping.
  • Key Command: free -h and vmstat 1
  • What to Look For:
    • available memory (free -h): This is the most important number. If it's very low, the system is under memory pressure.
    • swap used (free -h): Any significant swap usage indicates the system has run out of physical RAM and is using the much slower disk as memory. This is a major performance killer.
    • si/so columns (vmstat): Non-zero values for swap-in (si) or swap-out (so) confirm that the system is actively swapping.
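Both checks can be scripted from /proc/meminfo alone, which is what free itself reads; a small sketch (the 10% threshold is an arbitrary choice for illustration):

```shell
#!/bin/sh
# Report MemAvailable as a fraction of MemTotal, plus current swap usage.
mem_report=$(awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    /^SwapTotal:/    { stot  = $2 }
    /^SwapFree:/     { sfree = $2 }
    END {
        printf "available %.0f%% of RAM, swap used %d kB", 100 * avail / total, stot - sfree
    }' /proc/meminfo)
echo "$mem_report"
# Flag pressure when less than 10% is available (illustrative threshold)
case "$mem_report" in
    "available "[0-9]"% of RAM"*) echo "memory pressure: low MemAvailable" ;;
esac
```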

3.3. Disk I/O Analysis

  • Goal: Find slow storage or processes causing heavy disk activity.
  • Key Command: iostat -xz 1
  • What to Look For:
    • %util: If this approaches 100%, the disk is saturated and cannot handle more requests. This is a clear bottleneck.

    • await: The average time (in milliseconds) for a request to complete.

      • Good: a few milliseconds (SSD), < 20 ms (HDD)
      • Bad: Consistently high values mean storage is slow.
    • r/s and w/s: Use these to identify if the workload is read-heavy or write-heavy. Use iotop to see which process is responsible.
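iotop needs root; for a rough read/write split without it, /proc/diskstats can be read directly. A sketch (the device-name pattern is an assumption; adjust it for your disks):

```shell
#!/bin/sh
# Reads/writes completed per device since boot, from /proc/diskstats.
# Field 4 = reads completed, field 8 = writes completed (kernel iostats doc).
disk_report=$(awk '$3 ~ /^(sd[a-z]+|vd[a-z]+|nvme[0-9]+n[0-9]+)$/ {
    printf "%-10s reads=%d writes=%d\n", $3, $4, $8
}' /proc/diskstats)
echo "$disk_report"
```

Sampling this twice and diffing gives a per-interval rate, which is essentially how iostat computes r/s and w/s.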

3.4. Network Analysis

  • Goal: Identify saturated links, packet loss, or connectivity issues.
  • Key Command: netstat -i and iftop
  • What to Look For:
    • RX-ERR / TX-ERR / drop (netstat -i): These columns should be 0. Any value greater than zero indicates problems with the network hardware, cables, or drivers.
    • Bandwidth Saturation (iftop): Check if traffic is hitting the limit of your network interface (e.g., 1 Gbit/s, 10 Gbit/s).
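netstat is deprecated on some distributions; the same error and drop counters live in /proc/net/dev and can be printed with a few lines of awk:

```shell
#!/bin/sh
# Error/drop counters per interface from /proc/net/dev.
# After stripping the two header lines, fields are:
# iface rx_bytes rx_pkts rx_errs rx_drop ... tx_bytes tx_pkts tx_errs tx_drop ...
net_report=$(tail -n +3 /proc/net/dev | sed 's/^ *//; s/:/ /' | awk '{
    printf "%-10s rx_errs=%s rx_drop=%s tx_errs=%s tx_drop=%s\n", $1, $4, $5, $12, $13
}')
echo "$net_report"
```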

4. Specific Guide: Analyzing NGINX Performance

When your web server is slow, apply the general analysis above, but also check these NGINX-specific areas.

  • Check NGINX Worker Processes:

    • Use ps -ef | grep nginx to see the worker processes.
    • Use top or htop and filter for "nginx". Are any workers stuck at 100% CPU? This could indicate a bad configuration (e.g., a regex loop) or a single-threaded bottleneck in a backend service.
  • Analyze Disk I/O for Logs and Cache:

    • NGINX writes access.log and error.log for every request. If traffic is high, this can cause a disk I/O bottleneck.
    • Action: Use iotop to see if NGINX processes are causing high write I/O. Consider disabling access logs for static assets or using buffered logging.
    • If you use NGINX caching, ensure the cache directory is on a fast disk (SSD is ideal).
  • Check File Descriptors:

    • NGINX uses a file descriptor for each client connection and each connection to an upstream server.
    • Symptom: "Too many open files" in error.log.
    • Action: Check your shell's per-process limit with ulimit -n (note this is not a system-wide value; /proc/sys/fs/file-max holds that) and the limit a running worker actually has with cat /proc/$(pgrep -f "nginx: worker" | head -1)/limits. You may need to increase the worker_rlimit_nofile directive in nginx.conf.
  • Review Upstream Connections:

    • If NGINX is acting as a reverse proxy, slowness is often caused by the upstream service (e.g., a PHP, Node.js, or Python application).
    • Action: Check netstat -tuna | grep ESTABLISHED to see the state of connections. Check your upstream application's logs and performance.
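The worker and file-descriptor checks above can be combined into one loop over /proc; a sketch (assumes NGINX is running, otherwise the loop simply prints nothing):

```shell
#!/bin/sh
# For each NGINX worker: open descriptors vs. the soft limit it runs under.
for pid in $(pgrep -f 'nginx: worker'); do
    open=$(ls -1 "/proc/$pid/fd" 2>/dev/null | wc -l)
    limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
    printf 'worker %s: %s of %s fds in use\n' "$pid" "$open" "$limit"
done
```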

5. Troubleshooting Flow

  1. Is it a CPU, Memory, Disk, or Network problem? Start with dstat or run top, free, and iostat simultaneously to get a quick overview.
  2. Focus on the bottleneck. If %iowait is high, investigate disk with iotop. If CPU is high, use mpstat to check cores. If memory is low, check for processes with high RES in top.
  3. Identify the responsible process. top, htop, and iotop will show you which application is consuming the resources.
  4. Analyze the application. Once the process is identified (e.g., nginx, mysqld, php-fpm), move to application-specific diagnostics (logs, configuration, etc.).
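The four steps above can be collapsed into a one-shot triage snapshot; a minimal sketch using commonly installed tools, with fallbacks where a tool may be missing:

```shell
#!/bin/sh
# Quick triage: one snapshot per pillar, in the order of the flow above.
echo "== load =="
uptime
echo "== memory =="
free -h 2>/dev/null || head -3 /proc/meminfo
echo "== disk =="
iostat -xz 1 2 2>/dev/null | tail -n 20 || echo "iostat not installed (sysstat package)"
echo "== network =="
ss -s 2>/dev/null || echo "ss not installed (iproute2 package)"
```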

System Tuning & Limits Guide

Table of Contents


Understanding System Limits

System Limit Hierarchy

Hardware Limits

Kernel Limits (sysctl)

Process Limits (ulimit/systemd)

Application Limits

User Limits

Types of Limits

  • Hard Limits: Maximum values that cannot be exceeded
  • Soft Limits: Current effective limits (can be increased up to hard limit)
  • System-wide Limits: Apply to entire system
  • Per-process Limits: Apply to individual processes
  • Per-user Limits: Apply to specific users
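The soft/hard distinction is easy to demonstrate in a shell: a process may lower either limit, or raise its soft limit up to the hard limit, but only root can raise a hard limit. A sketch:

```shell
#!/bin/sh
# Soft vs. hard nofile limits, and how a child can tighten its own soft limit.
echo "soft nofile: $(ulimit -Sn)"
echo "hard nofile: $(ulimit -Hn)"
# Lower the soft limit in a subshell; the parent shell is unaffected.
( ulimit -Sn 256; echo "subshell soft nofile: $(ulimit -Sn)" )
echo "parent soft nofile still: $(ulimit -Sn)"
```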

File Descriptor Limits

Understanding File Descriptors

File descriptors are used for:

  • Regular files
  • Network sockets
  • Pipes
  • Device files
  • Directories
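All of these appear the same way under /proc/&lt;pid&gt;/fd, where each descriptor is a symlink naming the object it refers to; a quick way to see the mix for any process:

```shell
#!/bin/sh
# List the current shell's descriptors and what each one points at:
# regular files show a path, sockets "socket:[inode]", pipes "pipe:[inode]".
for fd in /proc/$$/fd/*; do
    printf '%s -> %s\n' "${fd##*/}" "$(readlink "$fd")"
done
```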

Current Limits Check

# Check current limits for running process
ulimit -n # Soft limit
ulimit -Hn # Hard limit

# Check system-wide limits
cat /proc/sys/fs/file-max # Maximum file descriptors system-wide
cat /proc/sys/fs/file-nr # Current usage: allocated, unused, maximum

# Check per-process limit
cat /proc/sys/fs/nr_open # Maximum FDs per process

# Check current usage for specific process
ls -la /proc/PID/fd | wc -l # Count open FDs for process PID

Production File Descriptor Configuration

System-wide Configuration

# /etc/sysctl.conf or /etc/sysctl.d/99-file-limits.conf
sudo tee /etc/sysctl.d/99-file-limits.conf << 'EOF'
# Maximum number of file descriptors system-wide
fs.file-max = 2097152

# Maximum number of file descriptors per process
fs.nr_open = 1048576

# Inotify limits (for applications that watch files)
fs.inotify.max_user_watches = 524288
fs.inotify.max_user_instances = 256
fs.inotify.max_queued_events = 16384

# Note: fs.dentry-state is read-only (it reports dcache status) and cannot be set here
EOF

# Apply immediately
sudo sysctl -p /etc/sysctl.d/99-file-limits.conf

Per-user Limits Configuration

# /etc/security/limits.conf
sudo tee -a /etc/security/limits.conf << 'EOF'
# File descriptor limits
* soft nofile 65536
* hard nofile 65536
root soft nofile 65536
root hard nofile 65536

# Process limits
* soft nproc 32768
* hard nproc 32768

# Memory limits (in KB)
* soft memlock unlimited
* hard memlock unlimited

# Core dump size
* soft core unlimited
* hard core unlimited

# Stack size (8MB default, increase for some applications)
* soft stack 8192
* hard stack 8192

# Real-time priority
* soft rtprio 0
* hard rtprio 0

# Nice priority
* soft nice 0
* hard nice 0

EOF

Service-Specific Limits

# For high-performance applications
sudo tee /etc/security/limits.d/nginx.conf << 'EOF'
nginx soft nofile 100000
nginx hard nofile 100000
nginx soft nproc 32768
nginx hard nproc 32768
EOF

sudo tee /etc/security/limits.d/mysql.conf << 'EOF'
mysql soft nofile 65536
mysql hard nofile 65536
mysql soft memlock unlimited
mysql hard memlock unlimited
EOF

sudo tee /etc/security/limits.d/redis.conf << 'EOF'
redis soft nofile 65536
redis hard nofile 65536
redis soft memlock unlimited
redis hard memlock unlimited
EOF

Systemd Service Limits

# Example systemd service with custom limits
sudo tee /etc/systemd/system/high-performance-app.service << 'EOF'
[Unit]
Description=High Performance Application
After=network.target

[Service]
Type=simple
User=app
Group=app
ExecStart=/usr/local/bin/high-performance-app

# Resource limits
LimitNOFILE=100000
LimitNPROC=32768
LimitMEMLOCK=infinity
LimitCORE=infinity
LimitSTACK=8388608

# Memory settings
MemoryAccounting=true
MemoryMax=8G
MemorySwapMax=0

# CPU settings
CPUAccounting=true
CPUQuota=400%

# IO settings
IOAccounting=true
IOWeight=1000

# Process settings
TasksMax=16384

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload


Process & Memory Limits

Process Limits Deep Dive

Understanding Process Limits

# Check current process limits
cat /proc/PID/limits

# Check system-wide process limits
cat /proc/sys/kernel/pid_max # Maximum PID value
cat /proc/sys/kernel/threads-max # Maximum threads system-wide

# Check current usage
ps aux | wc -l # Current process count
cat /proc/loadavg # Load average

Advanced Process Configuration

# /etc/sysctl.d/99-process-limits.conf
sudo tee /etc/sysctl.d/99-process-limits.conf << 'EOF'
# Process limits
kernel.pid_max = 4194304
kernel.threads-max = 1048576

# Process scheduling (these sched_* knobs moved to debugfs in kernel 5.13+;
# sysctl warns and skips them there)
kernel.sched_min_granularity_ns = 2000000
kernel.sched_wakeup_granularity_ns = 3000000
kernel.sched_migration_cost_ns = 500000

# Process memory
kernel.shmmni = 4096
kernel.shmmax = 68719476736
kernel.shmall = 4294967296

# Semaphores (semmsl, semmns, semopm, semmni)
kernel.sem = 250 32000 100 128

# Message queues
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048

# Core dumps (make sure /var/crash exists and is writable)
kernel.core_pattern = /var/crash/core.%e.%p.%h.%t
kernel.core_uses_pid = 1
fs.suid_dumpable = 0
EOF

sudo sysctl -p /etc/sysctl.d/99-process-limits.conf

Memory Management Tuning

Virtual Memory Configuration

# /etc/sysctl.d/99-memory-tuning.conf
sudo tee /etc/sysctl.d/99-memory-tuning.conf << 'EOF'
# Virtual memory settings
vm.swappiness = 10 # Reduce swap usage (0-100)
vm.dirty_ratio = 15 # Percentage of memory that can be dirty
vm.dirty_background_ratio = 5 # Background writeback threshold
vm.dirty_expire_centisecs = 12000 # Time before dirty data is written (1/100 sec)
vm.dirty_writeback_centisecs = 1500 # Interval between writeback daemon runs

# Memory overcommit
vm.overcommit_memory = 1 # 1 = always allow overcommit
vm.overcommit_ratio = 50 # Only consulted when overcommit_memory = 2

# Out of Memory (OOM) killer
vm.panic_on_oom = 0 # Don't panic on OOM
vm.oom_kill_allocating_task = 0 # 0 = kill the highest-scoring task, not the allocating one

# Transparent Huge Pages (for databases, disable for better latency)
# echo never > /sys/kernel/mm/transparent_hugepage/enabled
# echo never > /sys/kernel/mm/transparent_hugepage/defrag

# Memory zone reclaim
vm.zone_reclaim_mode = 0 # Disable zone reclaim

# Page cache and buffer cache
vm.vfs_cache_pressure = 100 # Pressure on VFS cache (default 100)
vm.min_free_kbytes = 65536 # Minimum free memory to maintain

# Shared memory
kernel.shmmax = 68719476736 # Maximum shared memory segment size
kernel.shmall = 4294967296 # Maximum shared memory pages
EOF

sudo sysctl -p /etc/sysctl.d/99-memory-tuning.conf

Memory Limits for Applications

# Configure systemd-logind for user sessions
sudo tee /etc/systemd/logind.conf.d/limits.conf << 'EOF'
[Login]
UserTasksMax=16384
EOF

# Configure cgroup memory limits
sudo tee /etc/systemd/system.conf.d/limits.conf << 'EOF'
[Manager]
DefaultTasksMax=16384
DefaultLimitNOFILE=65536
DefaultLimitNPROC=32768
DefaultLimitMEMLOCK=infinity
EOF


Network Stack Tuning

TCP/UDP Buffer Tuning

# /etc/sysctl.d/99-network-performance.conf
sudo tee /etc/sysctl.d/99-network-performance.conf << 'EOF'
# Network buffer sizes
net.core.rmem_default = 262144
net.core.rmem_max = 134217728
net.core.wmem_default = 262144
net.core.wmem_max = 134217728

# TCP buffer sizes (min, default, max)
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.ipv4.tcp_mem = 786432 1048576 26777216

# Packet backlog (per-CPU queue of received packets awaiting processing)
net.core.netdev_max_backlog = 30000
net.core.netdev_budget = 600

# Connection tracking (the second key is a legacy alias kept for older kernels)
net.netfilter.nf_conntrack_max = 1048576
net.nf_conntrack_max = 1048576

# TCP connection handling
net.ipv4.tcp_max_syn_backlog = 8192
net.core.somaxconn = 65535
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_tw_buckets = 1440000

# TCP performance
net.ipv4.tcp_congestion_control = bbr # requires kernel 4.9+ with the tcp_bbr module loaded
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.tcp_keepalive_probes = 7
net.ipv4.tcp_keepalive_intvl = 30

# Network security and performance
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_rfc1337 = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_fack = 1 # no-op since kernel 4.15 (FACK was folded into RACK)

# IP forwarding and routing
net.ipv4.ip_forward = 0
net.ipv6.conf.all.forwarding = 0

# ICMP settings
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1

# ARP settings
net.ipv4.neigh.default.gc_thresh1 = 1024
net.ipv4.neigh.default.gc_thresh2 = 4096
net.ipv4.neigh.default.gc_thresh3 = 8192
EOF

sudo sysctl -p /etc/sysctl.d/99-network-performance.conf

Network Interface Tuning

# Check current network interface settings
ethtool -g eth0 # Ring buffer parameters
ethtool -c eth0 # Coalescing parameters
ethtool -k eth0 # Offload features

# Optimize network interface (example for eth0)
sudo tee /etc/systemd/network/99-eth0-tune.link << 'EOF'
[Match]
MACAddress=aa:bb:cc:dd:ee:ff

[Link]
# Ring buffer sizes
RxBufferSize=4096
TxBufferSize=4096

# Interrupt coalescing (option names per systemd.link(5))
CoalescePacketRateLow=32
CoalescePacketRateHigh=128
EOF

# Alternative: Create script for interface optimization
sudo tee /usr/local/bin/tune-interface.sh << 'EOF'
#!/bin/bash

INTERFACE=${1:-eth0}

# Increase ring buffer sizes
ethtool -G "$INTERFACE" rx 4096 tx 4096

# Enable offload features (note: leave LRO off on hosts that forward or bridge traffic)
ethtool -K "$INTERFACE" gro on
ethtool -K "$INTERFACE" gso on
ethtool -K "$INTERFACE" tso on
ethtool -K "$INTERFACE" lro on

# Set interrupt coalescing
ethtool -C "$INTERFACE" adaptive-rx on adaptive-tx on

# Steer receive packet processing (RPS) to CPU1 (mask 0x2) on each RX queue
for q in /sys/class/net/$INTERFACE/queues/rx-*; do
    echo 2 > "$q/rps_cpus"
done

echo "Interface $INTERFACE tuned for performance"
EOF

sudo chmod +x /usr/local/bin/tune-interface.sh


Kernel Parameters

Advanced Kernel Tuning

# /etc/sysctl.d/99-kernel-tuning.conf
sudo tee /etc/sysctl.d/99-kernel-tuning.conf << 'EOF'
# Kernel scheduler (sched_compat_yield was removed in kernel 2.6.38;
# sysctl -p warns about unknown keys and continues)
kernel.sched_autogroup_enabled = 0
kernel.sched_child_runs_first = 1
kernel.sched_compat_yield = 0

# Kernel security
kernel.dmesg_restrict = 1
kernel.kptr_restrict = 2
kernel.yama.ptrace_scope = 1
kernel.randomize_va_space = 2

# Core dump handling (Ubuntu's apport pipe handler; this overrides the
# core_pattern set in 99-process-limits.conf — sysctl files apply in lexical order)
kernel.core_pattern = |/usr/share/apport/apport %p %s %c %d %P %E
kernel.core_uses_pid = 1
fs.suid_dumpable = 0

# Random number generation
kernel.random.read_wakeup_threshold = 64
kernel.random.write_wakeup_threshold = 128

# System V IPC
kernel.shmmni = 4096
kernel.shmmax = 68719476736
kernel.shmall = 4294967296

# Semaphore limits (semmsl, semmns, semopm, semmni)
kernel.sem = 250 32000 100 128

# Message queue limits
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048

# User namespace
user.max_user_namespaces = 15000

# Watch limits
fs.inotify.max_user_watches = 1048576
fs.inotify.max_user_instances = 1024
fs.inotify.max_queued_events = 32768

# File system limits
fs.file-max = 2097152
fs.nr_open = 1048576

# AIO limits
fs.aio-max-nr = 1048576
EOF

sudo sysctl -p /etc/sysctl.d/99-kernel-tuning.conf

CPU Scaling and Power Management

# Check current CPU governor
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# Set performance governor for all CPUs
sudo tee /usr/local/bin/set-cpu-performance.sh << 'EOF'
#!/bin/bash

# Set CPU governor to performance
for cpu in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$cpu"
done

# Disable CPU idle states for lowest latency (optional)
# for s in /sys/devices/system/cpu/cpu*/cpuidle/state*/disable; do echo 1 > "$s"; done

# Route newly registered interrupts to CPU1 by default (mask 0x2)
echo 2 > /proc/irq/default_smp_affinity

echo "CPU performance settings applied"
EOF

sudo chmod +x /usr/local/bin/set-cpu-performance.sh

# Create systemd service to apply at boot
sudo tee /etc/systemd/system/cpu-performance.service << 'EOF'
[Unit]
Description=Set CPU Performance Settings
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/set-cpu-performance.sh
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl enable cpu-performance.service


Application-Specific Tuning

Nginx Optimization

# /etc/security/limits.d/nginx.conf
sudo tee /etc/security/limits.d/nginx.conf << 'EOF'
nginx soft nofile 100000
nginx hard nofile 100000
nginx soft nproc 32768
nginx hard nproc 32768
nginx soft memlock unlimited
nginx hard memlock unlimited
EOF

# Nginx systemd override
sudo mkdir -p /etc/systemd/system/nginx.service.d
sudo tee /etc/systemd/system/nginx.service.d/limits.conf << 'EOF'
[Service]
LimitNOFILE=100000
LimitNPROC=32768
LimitMEMLOCK=infinity
EOF

# Nginx configuration optimization
# Note: worker_processes and the events/http blocks below are main-context
# directives; merge them into /etc/nginx/nginx.conf itself (the stock conf.d
# include sits inside the http block and cannot hold them).
sudo mkdir -p /etc/nginx/nginx.conf.d
sudo tee /etc/nginx/nginx.conf.d/performance.conf << 'EOF'
# Worker process optimization
worker_processes auto;
worker_rlimit_nofile 100000;
worker_cpu_affinity auto;

events {
    worker_connections 4096;
    use epoll;
    multi_accept on;
    accept_mutex off;
}

http {
    # Connection optimization
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    keepalive_requests 1000;

    # Buffer optimization
    client_body_buffer_size 128k;
    client_max_body_size 100m;
    client_header_buffer_size 3m;
    large_client_header_buffers 4 256k;

    # Output buffer optimization
    output_buffers 1 32k;
    postpone_output 1460;
}
EOF

MySQL/MariaDB Optimization

# /etc/security/limits.d/mysql.conf
sudo tee /etc/security/limits.d/mysql.conf << 'EOF'
mysql soft nofile 65536
mysql hard nofile 65536
mysql soft nproc 32768
mysql hard nproc 32768
mysql soft memlock unlimited
mysql hard memlock unlimited
mysql soft stack 8192
mysql hard stack 8192
EOF

# MySQL systemd override
sudo mkdir -p /etc/systemd/system/mysql.service.d
sudo tee /etc/systemd/system/mysql.service.d/limits.conf << 'EOF'
[Service]
LimitNOFILE=65536
LimitNPROC=32768
LimitMEMLOCK=infinity
EOF

# MySQL configuration optimization
sudo tee -a /etc/mysql/mysql.conf.d/performance.conf << 'EOF'
[mysqld]
# Connection limits
max_connections = 1000
max_connect_errors = 1000000
max_user_connections = 900

# Memory settings
innodb_buffer_pool_size = 4G
innodb_log_file_size = 256M
innodb_log_buffer_size = 64M
innodb_sort_buffer_size = 2M

# Thread settings
thread_cache_size = 100
thread_stack = 256K

# Query cache (removed in MySQL 8.0; these apply to MariaDB / MySQL 5.7 only)
query_cache_size = 128M
query_cache_limit = 2M

# Table settings
table_open_cache = 4000
table_definition_cache = 2000

# MyISAM settings
key_buffer_size = 256M
myisam_sort_buffer_size = 128M

# InnoDB settings
innodb_flush_method = O_DIRECT
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 2
EOF

Redis Optimization

# /etc/security/limits.d/redis.conf
sudo tee /etc/security/limits.d/redis.conf << 'EOF'
redis soft nofile 65536
redis hard nofile 65536
redis soft memlock unlimited
redis hard memlock unlimited
EOF

# Redis systemd override
sudo mkdir -p /etc/systemd/system/redis.service.d
sudo tee /etc/systemd/system/redis.service.d/limits.conf << 'EOF'
[Service]
LimitNOFILE=65536
LimitMEMLOCK=infinity
EOF

# Redis configuration optimization
# Note: Redis does not read a conf.d directory by default; add
# "include /etc/redis/redis.conf.d/performance.conf" to redis.conf.
sudo mkdir -p /etc/redis/redis.conf.d
sudo tee /etc/redis/redis.conf.d/performance.conf << 'EOF'
# Memory optimization
maxmemory 4gb
maxmemory-policy allkeys-lru

# Network optimization
tcp-keepalive 300
tcp-backlog 511

# Performance optimization
hz 10
rdbcompression yes
rdbchecksum yes

# Client optimization
timeout 300
EOF

# Disable transparent huge pages for Redis
sudo tee /etc/systemd/system/disable-thp.service << 'EOF'
[Unit]
Description=Disable Transparent Huge Pages
After=sysinit.target local-fs.target
Before=redis.service

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/defrag'

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl enable disable-thp.service

Apache Optimization

# /etc/security/limits.d/apache.conf
sudo tee /etc/security/limits.d/apache.conf << 'EOF'
www-data soft nofile 65536
www-data hard nofile 65536
www-data soft nproc 32768
www-data hard nproc 32768
EOF

# Apache configuration optimization
sudo tee /etc/apache2/conf-available/performance.conf << 'EOF'
# MPM Prefork optimization
<IfModule mpm_prefork_module>
    StartServers 8
    MinSpareServers 5
    MaxSpareServers 20
    MaxRequestWorkers 400
    MaxConnectionsPerChild 10000
</IfModule>

# MPM Worker optimization
<IfModule mpm_worker_module>
    StartServers 4
    MinSpareThreads 25
    MaxSpareThreads 75
    ThreadsPerChild 25
    MaxRequestWorkers 400
    MaxConnectionsPerChild 10000
</IfModule>

# MPM Event optimization (recommended)
<IfModule mpm_event_module>
    StartServers 4
    MinSpareThreads 25
    MaxSpareThreads 75
    ThreadsPerChild 25
    MaxRequestWorkers 400
    MaxConnectionsPerChild 10000
    AsyncRequestWorkerFactor 2
</IfModule>

# Keep alive optimization
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 5
EOF

sudo a2enconf performance


Monitoring & Validation

Comprehensive Monitoring Script

sudo tee /usr/local/bin/system-limits-monitor.sh << 'EOF'
#!/bin/bash

REPORT_FILE="/var/log/system-limits-report-$(date +%Y%m%d-%H%M%S).log"

exec > >(tee -a "$REPORT_FILE")
exec 2>&1

echo "System Limits Monitoring Report"
echo "==============================="
echo "Date: $(date)"
echo "Hostname: $(hostname)"
echo ""

# File descriptor usage
echo "=== File Descriptor Usage ==="
echo "System-wide limit: $(cat /proc/sys/fs/file-max)"
echo "Current usage: $(cat /proc/sys/fs/file-nr | awk '{print $1}')"
echo "Allocated but unused: $(cat /proc/sys/fs/file-nr | awk '{print $2}')"
echo "Percentage used: $(cat /proc/sys/fs/file-nr | awk '{printf "%.2f%%", ($1/$3)*100}')"
echo ""

# Process limits
echo "=== Process Limits ==="
echo "Maximum PID: $(cat /proc/sys/kernel/pid_max)"
echo "Maximum threads: $(cat /proc/sys/kernel/threads-max)"
echo "Current processes: $(ps aux | wc -l)"
echo "Current threads: $(ps -eLf | wc -l)"
echo ""

# Memory usage
echo "=== Memory Usage ==="
free -h
echo ""
echo "Memory overcommit: $(cat /proc/sys/vm/overcommit_memory)"
echo "Overcommit ratio: $(cat /proc/sys/vm/overcommit_ratio)%"
echo ""

# Network limits
echo "=== Network Limits ==="
echo "Connection tracking max: $(cat /proc/sys/net/netfilter/nf_conntrack_max 2>/dev/null || echo 'N/A')"
echo "Current connections: $([ -r /proc/net/nf_conntrack ] && wc -l < /proc/net/nf_conntrack || echo 'N/A')"
echo "TCP memory: $(cat /proc/sys/net/ipv4/tcp_mem)"
echo "Socket buffer max: $(cat /proc/sys/net/core/rmem_max) / $(cat /proc/sys/net/core/wmem_max)"
echo ""

# Top processes by file descriptor usage
echo "=== Top Processes by File Descriptor Usage ==="
echo "PID FDs COMMAND"
for pid in $(ps -eo pid --no-headers); do
    if [ -d "/proc/$pid/fd" ]; then
        fd_count=$(ls -1 /proc/$pid/fd 2>/dev/null | wc -l)
        cmd=$(ps -p $pid -o comm= 2>/dev/null)
        printf "%-8s %-8s %s\n" "$pid" "$fd_count" "$cmd"
    fi
done | sort -k2 -nr | head -10
echo ""

# Ulimit for current user
echo "=== Current User Limits ==="
ulimit -a
echo ""

# System load
echo "=== System Load ==="
uptime
echo ""
cat /proc/loadavg
echo ""

echo "Report saved to: $REPORT_FILE"
EOF

sudo chmod +x /usr/local/bin/system-limits-monitor.sh

# Add to cron for regular monitoring (append to root's crontab rather than replacing it)
( sudo crontab -l 2>/dev/null; echo "0 */6 * * * /usr/local/bin/system-limits-monitor.sh" ) | sudo crontab -

Performance Validation Script

sudo tee /usr/local/bin/validate-system-tuning.sh << 'EOF'
#!/bin/bash

echo "System Tuning Validation"
echo "========================"

# Test file descriptor limits
test_fd_limits() {
    echo "Testing file descriptor limits..."

    # Create test script that opens many files
    cat > /tmp/fd_test.py << 'PYTHON'
import sys
import os

max_fds = 0
files = []

try:
    for i in range(100000):
        f = open('/dev/null', 'r')
        files.append(f)
        max_fds = i + 1
except OSError as e:
    print(f"Reached limit at {max_fds} file descriptors")
    print(f"Error: {e}")
finally:
    for f in files:
        f.close()
PYTHON

    python3 /tmp/fd_test.py
    rm -f /tmp/fd_test.py
}

# Test network performance
test_network_performance() {
    echo "Testing network performance..."

    # Check that many listening sockets can be created concurrently
    python3 - << 'PYTHON'
import socket
import threading
import time

def create_socket():
    try:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind(('localhost', 0))
        s.listen(1)
        time.sleep(0.1)
        s.close()
    except Exception as e:
        print(f'Socket error: {e}')

threads = []
for i in range(100):
    t = threading.Thread(target=create_socket)
    threads.append(t)
    t.start()

for t in threads:
    t.join()

print('Network socket test completed')
PYTHON
}

# Test memory allocation
test_memory_allocation() {
    echo "Testing memory allocation..."

    python3 - << 'PYTHON'
# Try to allocate 1 GB in 1 MB chunks and report where allocation stops
memory_mb = 0
data = []

try:
    for i in range(1000):
        chunk = bytearray(1024 * 1024)  # 1 MB
        data.append(chunk)
        memory_mb += 1
        if memory_mb % 100 == 0:
            print(f'Allocated {memory_mb} MB')
except MemoryError:
    print(f'Memory allocation limit reached at {memory_mb} MB')
except Exception as e:
    print(f'Error: {e}')
PYTHON
}

# Run tests
echo "Running validation tests..."
echo ""

test_fd_limits
echo ""

test_network_performance
echo ""

test_memory_allocation
echo ""

echo "Validation completed"
EOF

sudo chmod +x /usr/local/bin/validate-system-tuning.sh


Production Use Cases

Use Case 1: High-Traffic Web Server

# Web server optimization profile
sudo tee /etc/sysctl.d/99-webserver.conf << 'EOF'
# Network optimization for web servers
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 30000
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_congestion_control = bbr
net.ipv4.tcp_slow_start_after_idle = 0

# Memory optimization
vm.swappiness = 1
vm.dirty_ratio = 10
vm.dirty_background_ratio = 3

# File descriptor limits
fs.file-max = 1000000
fs.nr_open = 1000000
EOF

# Nginx limits for high traffic
sudo tee /etc/security/limits.d/webserver.conf << 'EOF'
nginx soft nofile 200000
nginx hard nofile 200000
www-data soft nofile 200000
www-data hard nofile 200000
EOF

Use Case 2: Database Server

# Database server optimization
sudo tee /etc/sysctl.d/99-database.conf << 'EOF'
# Memory optimization for databases
vm.swappiness = 1
vm.dirty_ratio = 3
vm.dirty_background_ratio = 1
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100

# Shared memory for databases
kernel.shmmax = 137438953472 # 128GB
kernel.shmall = 33554432 # 128GB in pages
kernel.shmmni = 4096

# Semaphore limits for databases
kernel.sem = 250 32000 100 128

# File system optimization
fs.file-max = 6815744
fs.aio-max-nr = 1048576
EOF

# Database process limits
sudo tee /etc/security/limits.d/database.conf << 'EOF'
mysql soft nofile 65536
mysql hard nofile 65536
mysql soft memlock unlimited
mysql hard memlock unlimited
mysql soft stack 10240
mysql hard stack 32768
postgres soft nofile 65536
postgres hard nofile 65536
postgres soft memlock unlimited
postgres hard memlock unlimited
EOF

Use Case 3: Container Host

# Container host optimization
sudo tee /etc/sysctl.d/99-container-host.conf << 'EOF'
# Network optimization for containers
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
net.ipv6.conf.all.forwarding = 1

# Container networking
net.netfilter.nf_conntrack_max = 2097152
net.core.netdev_max_backlog = 16384

# Memory and process limits
vm.max_map_count = 262144
kernel.pid_max = 4194304
kernel.threads-max = 4194304

# File descriptor limits
fs.file-max = 9223372036854775807
fs.nr_open = 1048576
fs.inotify.max_user_watches = 1048576
fs.inotify.max_user_instances = 8192
EOF

# Container runtime limits
sudo tee /etc/security/limits.d/container.conf << 'EOF'
root soft nofile 1048576
root hard nofile 1048576
root soft nproc unlimited
root hard nproc unlimited
EOF

Use Case 4: High-Performance Computing (HPC)

# HPC node optimization
sudo tee /etc/sysctl.d/99-hpc.conf << 'EOF'
# CPU and scheduler optimization
kernel.sched_min_granularity_ns = 1000000
kernel.sched_wakeup_granularity_ns = 1500000
kernel.numa_balancing = 0

# Memory optimization for HPC
vm.zone_reclaim_mode = 1
vm.swappiness = 1
vm.dirty_ratio = 40
vm.dirty_background_ratio = 10

# Network optimization for HPC
net.core.rmem_max = 268435456
net.core.wmem_max = 268435456
net.ipv4.tcp_rmem = 4096 65536 268435456
net.ipv4.tcp_wmem = 4096 65536 268435456

# Process limits for parallel computing
kernel.pid_max = 4194304
kernel.threads-max = 2097152
EOF

# HPC user limits
sudo tee /etc/security/limits.d/hpc.conf << 'EOF'
* soft memlock unlimited
* hard memlock unlimited
* soft stack unlimited
* hard stack unlimited
* soft nofile 1048576
* hard nofile 1048576
* soft nproc unlimited
* hard nproc unlimited
EOF


Troubleshooting

Common Issues and Solutions

"Too many open files" Error

# Diagnosis (set PID to the affected process ID first)
echo "Current process file descriptor usage:"
lsof -p $PID | wc -l

echo "Process limit:"
cat /proc/$PID/limits | grep "Max open files"

echo "System-wide usage:"
cat /proc/sys/fs/file-nr

# Solutions
# 1. Increase limits in /etc/security/limits.conf
# 2. Increase systemd service limits
# 3. Check for file descriptor leaks in application

"Cannot fork" Error

# Diagnosis
echo "Current process count:"
ps aux | wc -l

echo "Maximum processes:"
cat /proc/sys/kernel/pid_max

echo "Per-user process limits:"
ulimit -u

# Solutions
# 1. Increase kernel.pid_max
# 2. Increase per-user nproc limits
# 3. Check for process leaks

Memory Allocation Failures

# Diagnosis
echo "Memory usage:"
free -h

echo "Memory overcommit settings:"
cat /proc/sys/vm/overcommit_memory
cat /proc/sys/vm/overcommit_ratio

echo "OOM killer activity:"
dmesg | grep -i "killed process"

# Solutions
# 1. Increase physical memory
# 2. Adjust overcommit settings
# 3. Optimize application memory usage

System Limits Debugging Tool

sudo tee /usr/local/bin/debug-limits.sh << 'EOF'
#!/bin/bash

PID=${1:-$$}

echo "Debugging system limits for PID: $PID"
echo "======================================"

if [ ! -d "/proc/$PID" ]; then
    echo "Process $PID not found"
    exit 1
fi

echo "Process information:"
ps -p $PID -o pid,ppid,cmd

echo ""
echo "Current limits:"
cat /proc/$PID/limits

echo ""
echo "File descriptor usage:"
echo "Open FDs: $(ls -1 /proc/$PID/fd 2>/dev/null | wc -l)"
echo "FD limit: $(cat /proc/$PID/limits | grep "Max open files" | awk '{print $4}')"

echo ""
echo "Memory usage:"
cat /proc/$PID/status | grep -E "(VmPeak|VmSize|VmRSS|VmSwap)"

echo ""
echo "Thread usage:"
echo "Threads: $(cat /proc/$PID/status | grep Threads | awk '{print $2}')"

echo ""
echo "System-wide limits:"
echo "File descriptors: $(cat /proc/sys/fs/file-nr)"
echo "Processes: $(ps aux | wc -l) / $(cat /proc/sys/kernel/pid_max)"
echo "Memory: $(free -h | grep Mem)"

echo ""
echo "Network connections (if applicable):"
netstat -an | grep -E "(LISTEN|ESTABLISHED)" | wc -l

echo ""
echo "Load average:"
cat /proc/loadavg
EOF

sudo chmod +x /usr/local/bin/debug-limits.sh

Performance Impact Assessment

sudo tee /usr/local/bin/assess-tuning-impact.sh << 'EOF'
#!/bin/bash

BASELINE_FILE="/tmp/baseline-performance.txt"
CURRENT_FILE="/tmp/current-performance.txt"

# Function to collect performance metrics
collect_metrics() {
    local output_file="$1"

    {
        echo "=== System Load ==="
        cat /proc/loadavg

        echo "=== Memory Usage ==="
        free

        echo "=== Network Connections ==="
        netstat -an | grep -c ESTABLISHED

        echo "=== File Descriptor Usage ==="
        cat /proc/sys/fs/file-nr

        echo "=== Context Switches ==="
        grep ctxt /proc/stat

        echo "=== Interrupts ==="
        grep intr /proc/stat | awk '{print $2}'

        echo "=== Network Interface Stats ==="
        grep eth0 /proc/net/dev

    } > "$output_file"
}

# Helper: print the first field of the line following a section header
metric_after() {
    awk -v h="$2" '$0 == h { getline; print $1; exit }' "$1"
}

# Collect baseline if not exists
if [ ! -f "$BASELINE_FILE" ]; then
    echo "Collecting baseline metrics..."
    collect_metrics "$BASELINE_FILE"
    echo "Baseline collected. Run again after system changes to compare."
    exit 0
fi

# Collect current metrics
echo "Collecting current metrics..."
collect_metrics "$CURRENT_FILE"

# Compare metrics
echo "Performance comparison:"
echo "======================="

# Load average comparison (1-minute value)
baseline_load=$(metric_after "$BASELINE_FILE" "=== System Load ===")
current_load=$(metric_after "$CURRENT_FILE" "=== System Load ===")
echo "Load average: $baseline_load → $current_load"

# Memory comparison (used column from free)
baseline_mem=$(grep "^Mem:" "$BASELINE_FILE" | awk '{print $3}')
current_mem=$(grep "^Mem:" "$CURRENT_FILE" | awk '{print $3}')
echo "Memory used: $baseline_mem → $current_mem"

# Established connections (the count stored under its section header)
baseline_conn=$(metric_after "$BASELINE_FILE" "=== Network Connections ===")
current_conn=$(metric_after "$CURRENT_FILE" "=== Network Connections ===")
echo "Network connections: $baseline_conn → $current_conn"

echo ""
echo "Files saved:"
echo "Baseline: $BASELINE_FILE"
echo "Current: $CURRENT_FILE"
EOF

sudo chmod +x /usr/local/bin/assess-tuning-impact.sh

This comprehensive guide provides system engineers with the deep knowledge needed to properly tune system limits for production environments. Each section includes practical examples, monitoring tools, and troubleshooting approaches specific to different use cases.