Building a Claude API Load Indicator for Your Terminal

By Jorge Saud · April 8, 2026

If you spend a lot of time in Claude Code or the Anthropic API, you’ve probably wondered: is it slow right now, or is it just me? This post walks through how I built a small zsh plugin that answers that question directly in my terminal prompt.

What It Does

The final result is a Powerlevel10k segment on the right side of my prompt that shows:

Icon	Color	Meaning
	Green	API responding normally
	Yellow	Elevated latency (baseline + 1σ)
	Red	Peak load (baseline + 2σ)
	Grey	API unreachable

It also shows the raw latency in milliseconds. The icon updates every minute automatically via a background macOS launchd job — no shell slowdown, no tokens consumed.

Architecture Overview

The whole thing is packaged as an oh-my-zsh custom plugin with five files:

~/.oh-my-zsh/custom/plugins/claude-status/
├── claude-status.plugin.zsh          # entry point — loaded by oh-my-zsh
├── statusline.sh                     # Claude Code statusline renderer
├── latency-common.sh                 # shared cache reader (p10k + statusline)
├── latency-sampler.sh                # background ping script
└── com.giorgiosaud.claude.latency.plist  # launchd job template

Two JSON files are written at runtime:

~/.claude/latency_log.json       # rolling 30-day history
~/.claude/latency_cache.json     # latest result, read by the prompt

Part 1: Measuring Latency

The sampler script (latency-sampler.sh) makes an unauthenticated HTTP POST to api.anthropic.com/v1/messages using curl. The server returns a 401 immediately — no API key needed, no tokens consumed — but the full TCP + TLS + HTTP stack is exercised:

curl_time=$(curl -s -o /dev/null -w "%{time_total}" \
  --max-time "$TIMEOUT" \
  -X POST \
  -H "content-type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model":"claude-haiku-4-5-20251001","max_tokens":1,"messages":[{"role":"user","content":"h"}]}' \
  "https://api.anthropic.com/v1/messages" 2>/dev/null)

if [ $? -eq 0 ] && [ -n "$curl_time" ]; then
  ms=$(python3 -c "print(int(float('$curl_time') * 1000))")
fi

This measures the full HTTP stack — TCP handshake, TLS negotiation, and the server’s time-to-first-byte — making it a better proxy for API load than a bare TCP ping.

Part 2: Building a Baseline with Statistics

A single measurement tells you the current latency. To know if that’s fast or slow, you need context. The sampler builds a rolling baseline using the last 30 days of samples taken at the same hour (±1) on the same weekday:

samples = [
    e["ms"] for e in log
    if e["weekday"] == current_weekday
    and abs(e["hour"] - current_hour) <= 1
    and e["ts"] != current_ts
    and e["ms"] is not None
]

if len(samples) >= 30:
    baseline = statistics.median(samples)
    stdev = statistics.stdev(samples)
    warn_threshold = baseline + stdev       # ~84th percentile
    peak_threshold = baseline + 2 * stdev  # ~97.5th percentile

Rather than fixed ratio multipliers, the thresholds are expressed in standard deviations from the median. This adapts automatically to your network environment — a fast connection and a slow one will have appropriately different thresholds.

Level Thresholds

Condition	Level
ms ≤ baseline + σ	normal
baseline + σ < ms ≤ baseline + 2σ	warn
ms > baseline + 2σ	peak
curl timeout	unavailable
fewer than 30 samples	normal (no level indicator)

The minimum sample count is 30 (not 3) to ensure the standard deviation estimate is stable before it’s used for decisions.

Log Management

Every run appends one entry and prunes entries older than 30 days. At 1-minute intervals the log grows quickly, but the weekday+hour filter window means the effective working set for any baseline calculation is much smaller.

log.append({"ts": now, "ms": ms, "hour": hour, "weekday": weekday})
log = [e for e in log if e["ts"] >= cutoff]  # cutoff = now - 30 days

Part 3: Running It in the Background with launchd

On macOS, launchd is the right tool for recurring background jobs. It runs even when no terminal is open, survives reboots, and doesn’t require cron.

The plugin ships a plist template with __SAMPLER_PATH__ and __HOME__ as placeholders. The sampler runs every 60 seconds:

<key>Label</key>
<string>com.giorgiosaud.claude.latency</string>
<key>StartInterval</key>
<integer>60</integer>
<key>RunAtLoad</key>
<true/>

The plugin entry point renders the template with real paths and loads it silently on every shell start (idempotent — skips if already loaded):

sed \
  -e "s|__SAMPLER_PATH__|$_claude_sampler|g" \
  -e "s|__HOME__|$HOME|g" \
  "$_claude_plist_src" > "$_claude_plist_dest"

if ! launchctl list "$_claude_plist_label" &>/dev/null; then
  launchctl bootstrap "gui/$(id -u)" "$_claude_plist_dest" 2>/dev/null \
    || launchctl load "$_claude_plist_dest" 2>/dev/null \
    || true
fi

No manual setup required — install the plugin, reload your shell, and the sampler starts running.

Part 4: The Shared Cache Reader

Both the Powerlevel10k segment and the statusline renderer need to read the same cache file and resolve icons. Rather than duplicating that logic, a shared latency-common.sh sets three environment variables:

# latency-common.sh
claude_latency_read() {
  CLAUDE_LATENCY_LEVEL=$(jq -r '.level // "normal"' "$CLAUDE_LATENCY_CACHE")
  CLAUDE_LATENCY_MS=$(jq -r '.ms // ""' "$CLAUDE_LATENCY_CACHE")

  case "$CLAUDE_LATENCY_LEVEL" in
    normal)      CLAUDE_LATENCY_ICON=$(printf '\xef\x83\xa7') ;;   # U+F0E7 bolt
    warn)        CLAUDE_LATENCY_ICON=$(printf '\xef\x81\xb1') ;;   # U+F071 warning
    peak)        CLAUDE_LATENCY_ICON=$(printf '\xef\x88\x9e') ;;   # U+F21E heartbeat
    unavailable) CLAUDE_LATENCY_ICON=$(printf '\xef\xae\xa4') ;;   # U+FBA4 shield
  esac
}

Icons are encoded as raw UTF-8 byte sequences via printf rather than embedded literals — this avoids the invisible zero-width character issue that occurs when editors silently mangle Nerd Font codepoints.

Part 5: The Powerlevel10k Segment

The prompt segment sources the shared reader and uses foreground-only coloring (no background fill — cleaner on transparent terminals):

function prompt_claude_latency() {
  claude_latency_read || return

  local fg
  case "$CLAUDE_LATENCY_LEVEL" in
    normal)      fg=76  ;;   # green  (matches VCS_CLEAN)
    warn)        fg=178 ;;   # yellow (matches VCS_MODIFIED)
    peak)        fg=160 ;;   # red    (matches STATUS_ERROR)
    unavailable) fg=66  ;;   # grey   (matches TIME)
  esac

  if [[ -n "$CLAUDE_LATENCY_MS" && "$CLAUDE_LATENCY_MS" != "null" ]]; then
    p10k segment -f $fg -i "$CLAUDE_LATENCY_ICON" -t "${CLAUDE_LATENCY_MS}ms"
  else
    p10k segment -f $fg -i "$CLAUDE_LATENCY_ICON"
  fi
}

The segment auto-registers itself via a precmd hook — no manual .p10k.zsh edit required:

function _claude_status_register() {
  if (( ${#POWERLEVEL9K_RIGHT_PROMPT_ELEMENTS} )); then
    if [[ ${POWERLEVEL9K_RIGHT_PROMPT_ELEMENTS[(Ie)claude_latency]} -eq 0 ]]; then
      POWERLEVEL9K_RIGHT_PROMPT_ELEMENTS+=(claude_latency)
    fi
    add-zsh-hook -d precmd _claude_status_register
    unfunction _claude_status_register
  fi
}
autoload -Uz add-zsh-hook
add-zsh-hook precmd _claude_status_register

The hook fires once on the first prompt, appends the segment, then removes itself.

Color note: Use literal 256-color numbers with -f, not variables. Passing $POWERLEVEL9K_* color variables to p10k segment doesn’t work for custom segments.

Part 6: The Claude Code Statusline

Claude Code’s built-in statusline shows context window usage and rate limits. The plugin adds a richer statusline that integrates all of this alongside the latency indicator, git branch, and model name:

# Reads JSON piped from Claude Code on stdin
cwd=$(echo "$input"    | jq -r '.workspace.current_dir // .cwd // ""')
model=$(echo "$input"  | jq -r '.model.display_name // ""')
used_pct=$(echo "$input" | jq -r '.context_window.used_percentage // empty')
five_pct=$(echo "$input" | jq -r '.rate_limits.five_hour.used_percentage // empty')

Progress bars are rendered in pure bash:

make_bar() {
  local pct=$1
  local filled=$(( pct * 10 / 100 ))
  local empty=$(( 10 - filled ))
  local bar="" i
  for (( i=0; i<filled; i++ )); do bar="${bar}█"; done
  for (( i=0; i<empty;  i++ )); do bar="${bar}░"; done
  printf "%s" "$bar"
}

The final line looks like:

user@host  ~/projects/myapp  main  claude-sonnet-4-6 |  [████████░░] 82% |  [███░░░░░░░] 31% |  45ms

claude-status-register

This writes the statusLine key to ~/.claude/settings.json automatically via jq.

Gotchas I Hit

Nerd Font bytes not written correctly. When the agent wrote icon codepoints to the shell script, they were stored as invisible zero-width characters. The fix was using printf '\xNN\xNN\xNN' with explicit UTF-8 byte sequences instead of embedding the characters directly.

p10k segment colors need hardcoded numbers. Passing variable values to -f or -b doesn’t work for custom segments. Use literal 256-color numbers: -f 76.

Background vs. foreground coloring. Using -b (background fill) looks cluttered on transparent terminal backgrounds. Foreground-only (-f) blends better and still matches p10k’s standard color palette.

/dev/tcp is unreliable for timing in bash. curl -w "%{time_total}" is precise and portable across macOS bash versions.

TCP ping underestimates API load. A bare nc TCP handshake only measures network RTT — it doesn’t reflect TLS negotiation or server-side processing time. The curl approach captures the full stack and correlates much better with actual API latency under load.

Installation

Clone or copy the plugin into ~/.oh-my-zsh/custom/plugins/claude-status/
Add claude-status to your plugins=(...) list in your zshrc
Run exec zsh — the launchd job and p10k segment install themselves
To add the Claude Code statusline: run claude-status-register in your terminal

After ~60 seconds (RunAtLoad fires), ~/.claude/latency_cache.json will exist and both the p10k segment and statusline will show live data.

What’s Next

Color-coded ms text — yellow/red on the ms value itself when latency exceeds thresholds, even before a full baseline exists
Historical chart — claude-latency-history command to visualize the rolling log in the terminal
Notification on peak — osascript alert when ratio crosses 2σ during active work