Technical

Deployment

Docker Compose orchestration with certificate generation, JVM tuning, rolling restarts, and production-grade configuration.

Service Startup Order

1. es-setup
2. es01
3. es02 / es03
4. kibana

The es-setup init container generates TLS certificates and sets file permissions before any Elasticsearch node starts. Once es01 is healthy, es02 and es03 join the cluster. Kibana waits for es01 availability.
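The ordering above can be expressed with healthcheck-gated `depends_on` conditions. A sketch, assuming the healthcheck commands (the service names match this document, but the exact probe is an assumption modeled on the official Elastic compose example):

```yaml
# Startup ordering sketch (healthcheck probes are assumptions)
es01:
  depends_on:
    es-setup:
      condition: service_completed_successfully  # certs must exist first
  healthcheck:
    test: ["CMD-SHELL", "curl -s --cacert config/certs/ca/ca.crt https://localhost:9200 | grep -q 'missing authentication credentials'"]
    interval: 10s
    retries: 60

es02:
  depends_on:
    es01:
      condition: service_healthy  # join the cluster only after es01 is up

kibana:
  depends_on:
    es01:
      condition: service_healthy
```

The `grep` trick treats an authentication error as proof that Elasticsearch is up and answering TLS requests, which is sufficient for ordering purposes.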

Certificate Generation

es-setup init container (bash)
# Generate CA (if not exists)
elasticsearch-certutil ca \
  --silent --pem \
  -out /usr/share/elasticsearch/config/certs/ca.zip

# Generate node certificates
elasticsearch-certutil cert \
  --silent --pem \
  --ca-cert ca/ca.crt --ca-key ca/ca.key \
  -out /usr/share/elasticsearch/config/certs/certs.zip \
  --in instances.yml

# instances.yml defines:
#   - es01 (DNS: es01, localhost)
#   - es02 (DNS: es02, localhost)
#   - es03 (DNS: es03, localhost)

# Set permissions for elasticsearch user (uid 1000)
chown -R 1000:0 /usr/share/elasticsearch/config/certs

The certificate chain follows a standard CA hierarchy: a self-signed CA certificate signs individual node certificates. Each node's certificate includes its hostname as a Subject Alternative Name (SAN), which is required for TLS hostname verification during inter-node communication.
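The instances.yml referenced above might look like the following sketch, reconstructed from the comments in the script (this is the standard input format for elasticsearch-certutil; IP entries are omitted because the Compose network resolves service names via DNS):

```yaml
# instances.yml — input to elasticsearch-certutil cert (sketch)
instances:
  - name: es01
    dns:
      - es01        # SAN for inter-node TLS hostname verification
      - localhost   # SAN for local curl/debugging access
  - name: es02
    dns:
      - es02
      - localhost
  - name: es03
    dns:
      - es03
      - localhost
```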

Key Configuration

docker-compose.yml — es01 service (yaml)
es01:
  image: docker.elastic.co/elasticsearch/elasticsearch:8.17.0
  container_name: es01
  environment:
    - node.name=es01
    - cluster.name=maclab-es
    - discovery.seed_hosts=es02,es03
    - cluster.initial_master_nodes=es01,es02,es03
    - bootstrap.memory_lock=true
    - ES_JAVA_OPTS=-Xms1g -Xmx1g
    - xpack.security.enabled=true
    - xpack.security.http.ssl.enabled=true
    - xpack.security.http.ssl.key=certs/es01/es01.key
    - xpack.security.http.ssl.certificate=certs/es01/es01.crt
    - xpack.security.http.ssl.certificate_authorities=certs/ca/ca.crt
    - xpack.security.transport.ssl.enabled=true
    - xpack.security.transport.ssl.key=certs/es01/es01.key
    - xpack.security.transport.ssl.certificate=certs/es01/es01.crt
    - xpack.security.transport.ssl.certificate_authorities=certs/ca/ca.crt
    - xpack.security.transport.ssl.verification_mode=certificate
  mem_limit: 2147483648  # 2GB
  ulimits:
    memlock:
      soft: -1
      hard: -1
    nofile:
      soft: 65536
      hard: 65536
  volumes:
    - certs:/usr/share/elasticsearch/config/certs
    - es01_data:/usr/share/elasticsearch/data
  networks:
    - maclab
    - data

Critical Settings Explained

bootstrap.memory_lock: true

Prevents the JVM heap from being swapped to disk. Swapping is disastrous for Elasticsearch because the garbage collector must walk the entire heap; if heap pages have been swapped out, GC pauses that normally take milliseconds can stretch to minutes, long enough for the node to be dropped from the cluster. The memlock ulimit must be set to unlimited (-1) for the lock to succeed.
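Whether the lock actually took effect can be verified at runtime through the nodes info API. A sketch, assuming the same credentials and CA file as the other examples in this document:

```shell
# Verify bootstrap.memory_lock took effect on every node
curl -s --cacert ca.crt -u elastic:$PASSWORD \
  "https://localhost:9200/_nodes?filter_path=**.mlockall"
# A node reporting "mlockall": false means the memlock ulimit was not
# raised high enough and its heap can still be swapped out.
```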

vm.max_map_count = 262144

Elasticsearch uses memory-mapped files extensively for Lucene segment storage. The default Linux limit (65530) is insufficient. This must be set on the Docker host: sysctl -w vm.max_map_count=262144. On macOS with Docker Desktop, this is already set by the hypervisor.
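On a Linux host the setting should also be made persistent across reboots; a minimal sketch (the drop-in filename is an arbitrary choice):

```shell
# Apply immediately (requires root)
sysctl -w vm.max_map_count=262144

# Persist across reboots via a sysctl drop-in file
echo 'vm.max_map_count=262144' > /etc/sysctl.d/99-elasticsearch.conf
sysctl --system   # reload all sysctl configuration

# Verify the effective value
sysctl vm.max_map_count
```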

discovery.seed_hosts

Lists the addresses of other master-eligible nodes for cluster formation. During initial bootstrap, combined with cluster.initial_master_nodes, this allows the nodes to discover each other and elect an initial master.

cluster.initial_master_nodes

Only used during initial cluster bootstrap. Lists the node names (not hostnames) of master-eligible nodes that should participate in the first election. After the cluster forms, this setting is ignored — the voting configuration is stored in the cluster state.
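The voting configuration persisted in the cluster state can be inspected after bootstrap, which is a useful way to confirm the bootstrap setting is no longer in play (credentials and CA file assumed as elsewhere):

```shell
# Show the committed voting configuration stored in cluster state
curl -s --cacert ca.crt -u elastic:$PASSWORD \
  "https://localhost:9200/_cluster/state/metadata?filter_path=metadata.cluster_coordination.last_committed_config"
```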

nofile ulimit: 65536

Elasticsearch opens many file descriptors — one for every Lucene segment, plus network connections. The default limit (1024) causes 'too many open files' errors under load.
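Actual descriptor usage versus the limit can be monitored per node through the node stats API (same assumed credentials and CA file as the other examples):

```shell
# Compare open vs. maximum file descriptors on each node
curl -s --cacert ca.crt -u elastic:$PASSWORD \
  "https://localhost:9200/_nodes/stats/process?filter_path=**.open_file_descriptors,**.max_file_descriptors"
```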

Rolling Restart Procedure

When installing the Nori Korean analysis plugin, a rolling restart is required since plugins cannot be loaded at runtime. The procedure minimizes downtime:

Rolling restart for plugin installation (bash)
# 1. Disable shard allocation (prevents unnecessary rebalancing)
curl -X PUT "https://localhost:9200/_cluster/settings" \
  --cacert ca.crt -u elastic:$PASSWORD \
  -H 'Content-Type: application/json' -d '{
    "persistent": {
      "cluster.routing.allocation.enable": "primaries"
    }
  }'

# 2. Flush to persist in-memory operations (speeds up recovery;
#    synced flush was removed in 8.0 — a plain flush now fills this role)
curl -X POST "https://localhost:9200/_flush" \
  --cacert ca.crt -u elastic:$PASSWORD

# 3. Stop one node, install plugin, restart
docker compose stop es02
# NOTE: the install lands in /usr/share/elasticsearch/plugins inside the
# container. That directory must be a mounted volume (or the plugin baked
# into a custom image) — otherwise the --rm container discards the install.
docker compose run --rm es02 \
  elasticsearch-plugin install --batch analysis-nori
docker compose start es02

# 4. Wait for node to rejoin, repeat for es03, then es01

# 5. Re-enable shard allocation
curl -X PUT "https://localhost:9200/_cluster/settings" \
  --cacert ca.crt -u elastic:$PASSWORD \
  -H 'Content-Type: application/json' -d '{
    "persistent": {
      "cluster.routing.allocation.enable": null
    }
  }'
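Step 4 ("wait for node to rejoin") can be made explicit with the cluster health API. A sketch that blocks until all three nodes are back before moving on; note that waiting on node count, not on green status, is deliberate, because with allocation restricted to primaries the cluster stays yellow until allocation is re-enabled:

```shell
# Block until all 3 nodes are in the cluster (or the timeout elapses)
curl -s --cacert ca.crt -u elastic:$PASSWORD \
  "https://localhost:9200/_cluster/health?wait_for_nodes=3&timeout=120s"
```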

Interview Note

Rolling restarts are one of the most common operational tasks in Elasticsearch. The key insight is disabling allocation first: otherwise, the cluster starts copying shards to other nodes the moment you stop one, wasting I/O and network bandwidth. The flush step ensures that shard recovery after the restart only needs to replay the small tail of the transaction log rather than copying entire segments. (Before 8.0 this was done with a dedicated synced flush; since its removal, a plain flush serves the same purpose.)

JVM Heap Configuration

Environment variable approach (bash)
# Set via ES_JAVA_OPTS environment variable
ES_JAVA_OPTS=-Xms1g -Xmx1g

# -Xms and -Xmx should ALWAYS be equal
# This prevents the JVM from resizing the heap at runtime,
# which causes GC pauses and performance spikes.

# Rule of thumb:
#   Container RAM: 2GB → Heap: 1GB (50%)
#   Container RAM: 8GB → Heap: 4GB (50%)
#   Container RAM: 64GB → Heap: 31GB (stay below the ~32GB compressed-oops cutoff)
#   Never exceed 50% — Lucene needs the other half for caching
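The rule of thumb above can be captured in a small helper; compute_heap is a hypothetical name, and 31744 MB is used here as an approximation of the compressed-oops ceiling:

```shell
# compute_heap <container_ram_mb>: print a heap size that is 50% of
# container RAM, capped below the ~32GB compressed-oops threshold.
compute_heap() {
  local ram_mb=$1
  local heap_mb=$(( ram_mb / 2 ))
  if (( heap_mb > 31744 )); then
    heap_mb=31744
  fi
  echo "${heap_mb}m"
}

compute_heap 2048    # → 1024m
compute_heap 8192    # → 4096m
compute_heap 65536   # → 31744m (capped)
```

The result plugs directly into ES_JAVA_OPTS, e.g. `ES_JAVA_OPTS="-Xms$(compute_heap 2048) -Xmx$(compute_heap 2048)"`.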