Investigating and Resolving Disk Pressure Caused by Docker [Ubuntu 22.04 Case Study]

Introduction

This article documents a case in which Docker containers and images filled the disk on an Ubuntu 22.04 server and caused Elasticsearch errors, and describes how we investigated and resolved the problem. We hope it serves as a reference for those facing similar issues.

Problem Occurrence

The following error occurred in a running Elasticsearch instance.

An initial investigation showed that indices were in a closed state, and we suspected insufficient disk space.
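A first check of this kind can be sketched as follows. The function queries two standard Elasticsearch _cat endpoints: one listing each index with its open/close status, and one showing disk usage per node. The function name es_disk_check and the default URL http://localhost:9200 (without authentication) are assumptions for illustration, not details from the original incident.

```shell
# Hypothetical helper; the default URL is an assumed, unauthenticated local endpoint.
es_disk_check() (
  es_url="${1:-http://localhost:9200}"
  # Index name, open/close status, and health for every index
  curl -s "${es_url}/_cat/indices?v&h=index,status,health"
  # Disk used and available on each data node
  curl -s "${es_url}/_cat/allocation?v"
)

# Example: es_disk_check http://localhost:9200
```

If the second call shows a node close to full, the disk is a more likely culprit than Elasticsearch itself.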

Investigating Disk Usage

Checking Root Directory Usage

First, we checked the overall disk usage of the system.
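A check along these lines can be sketched with GNU du, as shipped with Ubuntu. The helper name top_dirs and its defaults are assumptions for illustration; the exact command used in the original investigation is not preserved here.

```shell
# Hypothetical helper: list the N largest entries directly under a directory.
top_dirs() {
  # -x stays on one filesystem, -h prints human-readable sizes,
  # --max-depth=1 reports only immediate children plus the total (GNU du).
  du -xh --max-depth=1 "${1:-/}" 2>/dev/null | sort -rh | head -n "${2:-10}"
}

# Example (as root, so that every directory is readable):
#   df -h /        # overall usage of the root filesystem
#   top_dirs / 10  # ten largest top-level directories
```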

The output showed that the /var directory was abnormally large at 50GB.

Detailed Investigation of /var Directory

Next, we broke down the usage inside /var. Since /var/lib occupied nearly all of the capacity, we investigated further.

Root cause identified: Docker data under /var/lib was occupying 49GB.
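The step-by-step narrowing described above (from / to /var to /var/lib) can also be automated. The following is a minimal sketch, assuming GNU du; the function name drill_down and its default of three steps are hypothetical. At each step it prints the largest immediate subdirectory and then descends into it.

```shell
# Hypothetical helper: follow the largest subdirectory downwards, one level per step.
drill_down() (
  dir="$1"
  steps="${2:-3}"
  i=0
  while [ "$i" -lt "$steps" ]; do
    # Largest immediate child of $dir; sizes are in KiB so the numeric sort is exact.
    next=$(du -xk --max-depth=1 "$dir" 2>/dev/null | sort -rn |
      awk -F'\t' -v d="$dir" '$2 != d { print; exit }')
    # Stop when the current directory has no subdirectories.
    [ -n "$next" ] || break
    printf '%s\n' "$next"
    dir=$(printf '%s\n' "$next" | cut -f2-)
    i=$((i + 1))
  done
)

# Example (as root): drill_down /var 2
```

In a situation like the one described here, two steps starting from /var would be expected to report /var/lib first and then the Docker data directory beneath it.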

Analyzing Docker Disk Usage

We then checked Docker's own breakdown of its disk usage. The output is summarized below.

Analysis Results

Images: 33 of 38 were unused (approximately 36GB)
Build Cache: All 3GB was reclaimable
Containers: Most were active and not eligible for deletion
Volumes: Most were in use
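A breakdown by images, containers, volumes, and build cache like the one above is what docker system df reports. The wrapper below is a sketch; the function name docker_usage is hypothetical, and we assume the current user is allowed to talk to the Docker daemon.

```shell
# Hypothetical wrapper around Docker's built-in disk usage report.
docker_usage() {
  # Summary per resource type, including a RECLAIMABLE column
  docker system df
  # Verbose form: one row per image, container, volume, and cache entry
  docker system df -v
}

# Example: docker_usage | less
```

The verbose output is useful for confirming which images are unused before deleting anything.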

Performing Cleanup

Bulk Cleanup Command

We deleted unused resources in bulk with docker system prune, passing the -a and --volumes options.

With these options, the command deletes:

Stopped containers
Unused images (with the -a option, all images not used by any container, not only untagged ones)
Unused networks
Unused volumes (with the --volumes option)
Build cache
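The cleanup described above can be sketched as a small wrapper that reports usage before and after pruning. The function name docker_bulk_cleanup is hypothetical; the prune options are the ones named in the list. Note that on Docker 23.0 and later, --volumes removes only anonymous volumes, so named volumes would need to be removed separately.

```shell
# Hypothetical wrapper around the bulk cleanup.
docker_bulk_cleanup() {
  # Usage before the cleanup
  docker system df
  # -a also removes unused tagged images; --volumes also prunes unused volumes.
  # Without -f, Docker asks for confirmation before deleting anything.
  docker system prune -a --volumes
  # Usage after the cleanup
  docker system df
}

# Example: docker_bulk_cleanup
```

Because the command is destructive, leaving out -f keeps the confirmation prompt as a last safeguard, and it is worth checking that no stopped container or unused volume holds data that is still needed.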

Results

Approximately 39GB of free disk space was recovered.

Prevention Measures

Configuring Docker Log Rotation

To prevent Docker container logs from accumulating indefinitely, we edited /etc/docker/daemon.json.

Configuration explanation:

max-size: Maximum size of a single log file
max-file: Number of log files to retain
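As an illustration of these two options, the sketch below writes a minimal log rotation configuration for the json-file logging driver. The values 10m and 3 and the function name write_log_rotation_config are assumptions, not the settings used in this case. Both values must be written as JSON strings.

```shell
# Hypothetical helper: write a log rotation config to the given path.
# The values 10m and 3 are illustrative assumptions.
write_log_rotation_config() {
  cat > "$1" <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
EOF
}

# Example: write to a scratch file first, then merge it into /etc/docker/daemon.json
#   write_log_rotation_config /tmp/daemon.json.example
```

Writing to a scratch file first avoids overwriting other settings that may already exist in /etc/docker/daemon.json. With these example values, each container would keep at most three log files of up to 10 MB each.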

Applying the Configuration
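Changes to /etc/docker/daemon.json take effect only after the Docker daemon is restarted. The following is a minimal sketch, assuming a systemd-managed Docker service as on a default Ubuntu 22.04 installation; the function name apply_docker_config is hypothetical.

```shell
# Hypothetical helper: restart Docker and confirm the active logging driver.
apply_docker_config() {
  # Restart the daemon so that it rereads /etc/docker/daemon.json (assumes systemd)
  sudo systemctl restart docker
  # Print the default logging driver the daemon now reports
  docker info --format '{{.LoggingDriver}}'
}

# Example: apply_docker_config
# Per-container check (replace NAME with a container name):
#   docker inspect -f '{{.HostConfig.LogConfig}}' NAME
```

Two caveats apply. Restarting the daemon stops running containers unless live restore is enabled, so the restart should be scheduled accordingly. The new log options also apply only to containers created after the change; existing containers keep their previous logging configuration until they are recreated.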

Considering Periodic Cleanup

In production environments, automating periodic cleanup can also be considered.
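One possible shape for such automation is sketched below. It is more conservative than the bulk cleanup: it removes only stopped containers, unused images, and build cache older than a given age, and it leaves volumes alone. The function name, the 168h (7 day) default, the script path, and the cron schedule are all assumptions for illustration.

```shell
# Hypothetical script body, e.g. saved as /usr/local/sbin/docker-periodic-prune.sh
docker_periodic_prune() (
  max_age="${1:-168h}"   # assumed retention period: 7 days
  # Stopped containers created more than max_age ago
  docker container prune -f --filter "until=${max_age}"
  # Unused images (tagged or not) created more than max_age ago
  docker image prune -a -f --filter "until=${max_age}"
  # Build cache entries not used within max_age
  docker builder prune -f --filter "until=${max_age}"
)

# In a standalone script, end the file with:  docker_periodic_prune "$@"
# Example crontab entry (Sundays at 03:00):
#   0 3 * * 0 /usr/local/sbin/docker-periodic-prune.sh
```

Excluding volumes is a deliberate choice in this sketch, since an unattended job that deletes data volumes is hard to recover from.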

Results and Lessons Learned

Resolution Results

Elasticsearch errors were resolved
Disk usage was reduced from 60GB to 21GB
System stability improved

Lessons Learned

Importance of regular monitoring: Regular monitoring of disk usage is necessary
Docker operations management: Unused resources tend to accumulate, especially in development environments
Importance of log management: Log rotation configuration is essential
Preventive maintenance: Periodic cleanup before problems occur is effective

Summary

In environments using Docker, images, containers, and build cache tend to accumulate, making regular cleanup important. We recommend implementing proper operational management using the investigation methods and solutions introduced in this article.

With these measures, we were able to restore stable server operation.

Reference Command List
