Customer Support Perspective
KCS methodology, troubleshooting playbooks, customer communication patterns, and APJ-specific considerations for the Senior Support Engineer role.
KCS: Knowledge-Centered Service
KCS is a methodology for integrating knowledge creation and maintenance into the support workflow. Instead of treating knowledge base articles as a separate task, KCS makes knowledge capture a natural byproduct of solving customer issues. Elastic uses KCS as a core practice in its support organization.
Four KCS Principles
Abundance
Knowledge is not a scarce resource to be hoarded. The more you share, the more valuable it becomes. Everyone contributes, everyone benefits.
Create Value
Every interaction creates organizational value. Even if you can't solve the issue immediately, documenting the symptoms and investigation steps creates knowledge for the next engineer.
Demand-Driven
Create and maintain knowledge based on actual customer demand. Don't pre-write articles speculatively — write them when customers ask. The most-viewed articles are the most valuable.
Trust
Trust contributors to create and modify knowledge. Peer review happens naturally through reuse. Articles improve over time as more engineers encounter the same issue.
The Solve Loop
The Solve Loop is the operational heart of KCS. It happens with every customer interaction:
Capture
Document the customer's context and problem in their words as you work the case. Don't wait until resolution.
Structure
Use a consistent template (SPRE: Situation, Problem, Resolution, Environment) so articles are scannable.
Reuse
Before investigating, search the knowledge base. If an article exists, link it. If it's incomplete, improve it while solving.
Improve
Every touch improves an article. Add missing steps, correct errors, expand the environment section. Flag articles that are wrong.
The Evolve Loop
The Evolve Loop is the organizational layer that improves the KCS practice itself over time:
Content Health
Monitor article quality metrics: reuse rate, freshness, flagged articles. Retire stale content. Identify gaps in coverage.
Process Integration
KCS must be embedded in the workflow, not bolted on. Tools should prompt for knowledge capture. Search should surface articles in the case form.
Performance Assessment
Measure success by knowledge contribution quality, not just ticket count. Recognize engineers who improve the most articles.
Leadership & Communication
Leadership must model KCS behavior. Celebrate knowledge sharing. Invest in training. Communicate the business value of the knowledge base.
Article Lifecycle
SPRE Template
SITUATION
What is the customer experiencing? What are the symptoms?
Example: "Cluster health is RED after adding a 4th node"
PROBLEM
What is the root cause?
Example: "Disk watermark exceeded on new node, preventing shard allocation"
RESOLUTION
Step-by-step fix
Example:
1. Check disk watermarks: GET _cluster/settings
2. Free disk space or adjust watermarks
3. Re-enable allocation
4. Verify cluster health returns to GREEN
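The resolution steps above can be sketched as Dev Tools console requests; the watermark values shown are illustrative for a temporary workaround, not recommendations:

```
# 1. Check current disk watermark settings
GET _cluster/settings?include_defaults=true&filter_path=*.cluster.routing.allocation.disk*

# 2. Temporarily raise the watermarks while freeing disk space (revert afterwards)
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%"
  }
}

# 3. Re-enable allocation if it was disabled (null resets to the default)
PUT _cluster/settings
{
  "transient": { "cluster.routing.allocation.enable": null }
}

# 4. Verify cluster health returns to GREEN
GET _cluster/health
```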
ENVIRONMENT
Version, OS, deployment type, relevant configuration
Example: "Elasticsearch 8.12.0, RHEL 8, self-managed, 4 nodes"

Elastic + KCS + GenAI
Elastic has integrated GenAI into their KCS workflow with significant results:
6x
Increase in case deflection through AI-powered knowledge base search. Customers find answers before opening tickets.
23%
Improvement in Mean Time to First Response (MTFR). AI suggests relevant articles to support engineers, reducing research time.
KCS Terminology Cheat Sheet
| Term | Definition |
|---|---|
| Solve Loop | Capture-Structure-Reuse-Improve cycle during case work |
| Evolve Loop | Organizational improvement of KCS practices |
| SPRE | Situation-Problem-Resolution-Environment article template |
| Reuse | Using existing knowledge to solve new cases |
| Flagging | Marking articles as incorrect or incomplete |
| Content Standard | Quality criteria for articles at each lifecycle stage |
| KCS Coach | Peer mentor who helps engineers improve KCS practices |
| Deflection | Customer self-serves using knowledge base without opening a ticket |
Troubleshooting Methodology: RED THEN GREEN
Cluster is RED
```shell
# Check cluster health
curl -s "https://localhost:9200/_cluster/health?pretty" \
  --cacert ca.crt -u elastic:$PASSWORD

# Find unassigned shards
curl -s "https://localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason" \
  --cacert ca.crt -u elastic:$PASSWORD | grep UNASSIGNED

# Get the allocation explanation for a specific shard
curl -s "https://localhost:9200/_cluster/allocation/explain?pretty" \
  --cacert ca.crt -u elastic:$PASSWORD -H 'Content-Type: application/json' -d '{
  "index": "support-tickets",
  "shard": 0,
  "primary": true
}'
```

Common Causes of RED
Disk watermark exceeded (default: 85% low, 90% high, 95% flood)
Node holding the only copy of a primary shard is down
Corrupted shard data (Lucene segment corruption)
Insufficient master-eligible nodes for quorum
Allocation filtering rules preventing shard assignment
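To narrow down which of these causes applies, a few quick read-only checks (console requests) usually suffice:

```
# Disk usage per node (watermark breaches show up here)
GET _cat/allocation?v

# Node list (did a node holding the only copy of a primary drop out?)
GET _cat/nodes?v

# Any allocation filtering or allocation-disable settings in effect?
GET _cluster/settings?filter_path=*.cluster.routing.allocation*
```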
Search is Slow
```
# Enable slow logs
PUT /support-tickets/_settings
{
  "index.search.slowlog.threshold.query.warn": "5s",
  "index.search.slowlog.threshold.query.info": "2s",
  "index.search.slowlog.threshold.fetch.warn": "1s"
}

# Profile a specific query ("클러스터 상태" = "cluster status")
POST /support-tickets/_search
{
  "profile": true,
  "query": {
    "match": { "description": "클러스터 상태" }
  }
}

# Check hot threads (find CPU-heavy operations)
GET _nodes/hot_threads
```

Data is Missing
Investigation Checklist
1. Check if the ingest pipeline is running: GET _ingest/pipeline/maclab-logs
2. Check for pipeline errors: GET _nodes/stats/ingest
3. Verify the index template matches the index pattern
4. Check if the index exists and is writable (not read-only from a watermark breach)
5. Verify the mapping accepts the field types being sent
6. Check for bulk indexing rejections in node stats
7. Verify the refresh interval — documents are not searchable until refreshed
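Several of the checklist steps above map directly to console requests (maclab-logs is the example index from step 1):

```
# Step 4: is the index read-only after a flood-stage watermark breach?
GET maclab-logs/_settings?filter_path=*.settings.index.blocks*

# Step 6: bulk indexing rejections appear in the write thread pool
GET _cat/thread_pool/write?v&h=node_name,active,queue,rejected

# Step 7: force a refresh so recently indexed documents become searchable
POST maclab-logs/_refresh
```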
Customer Communication
Lead with Empathy
Acknowledge the customer's frustration before diving into technical details. 'I understand this is impacting your production environment and I'm prioritizing this immediately.' In Korean: '프로덕션 환경에 영향을 주고 있다는 점 충분히 이해합니다. 즉시 최우선으로 대응하겠습니다.'
Set Expectations Early
Tell the customer what you're going to do, how long it might take, and when you'll next update them. Never leave a customer wondering if you're still working on their issue.
Explain, Don't Just Fix
A support engineer who fixes the issue AND explains what happened creates trust. 'The cluster went RED because disk usage exceeded the flood watermark at 95%. Here's how to prevent this in the future...'
Follow Up Proactively
After resolving the issue, check back in 24-48 hours. 'Hi, I wanted to confirm that your cluster health has remained GREEN since our last interaction. Did the disk monitoring alert we set up trigger correctly?'
APJ-Specific Considerations
Timezone Management
APJ spans UTC+5:30 (India) to UTC+13 (New Zealand). Korean business hours (KST, UTC+9) overlap well with Japan and Australia but require handoff coordination with India. As a Korea-based engineer, maintaining flexible hours for APAC-wide escalations is expected.
Cultural Sensitivity
Korean enterprise customers (Samsung, LG, SK, Hyundai) use formal honorific language (존댓말). Technical support in Korean requires proper formal register: "확인해 보겠습니다" (formal) not "확인해 볼게" (casual). Japanese customers similarly expect keigo (敬語). Understanding these nuances builds trust.
Korean Language Support
Providing support in Korean eliminates the translation barrier that adds resolution time. A Korean customer explaining "샤드가 할당되지 않습니다" (shards are not being allocated) should not need to translate their problem to English to get help. This is why the role requires native Korean.
Regional Compliance
Korean customers often operate under PIPA (개인정보보호법, Personal Information Protection Act). Data residency requirements may affect cluster architecture decisions — some customers require all data to remain within Korean borders, impacting snapshot repository locations and cross-region replication.
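As a sketch, meeting a data-residency requirement might mean registering the snapshot repository against an in-country bucket, for example in AWS ap-northeast-2 (Seoul); the repository name, bucket, and path below are hypothetical:

```
# Register a snapshot repository backed by a bucket in a Korean region
PUT _snapshot/korea_backups
{
  "type": "s3",
  "settings": {
    "bucket": "example-snapshots-kr",
    "base_path": "prod-cluster"
  }
}
```

Cross-region replication targets would need the same in-country treatment.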