Error Handling Policy
Exception Catching Rules
1. Catch Specific Exceptions for Known Operations
When calling a known external service or performing a specific I/O operation, catch the specific exception type for that operation:
| Operation | Specific Exception(s) |
|---|---|
| gRPC calls | `grpc.aio.AioRpcError` |
| Redis operations | `redis.RedisError` |
| HTTP requests (httpx) | `httpx.RequestError`, `httpx.HTTPStatusError` |
| WebSocket sends | `WebSocketDisconnect`, `ConnectionError`, `RuntimeError` |
| File I/O | `FileNotFoundError`, `OSError` |
| JSON parsing | `json.JSONDecodeError`, `ValueError`, `KeyError` |
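As a rough illustration of the last two rows, the sketch below reads a JSON settings file and catches only the exceptions those operations are documented to raise. The function `load_settings` and the `"port"` field are hypothetical, and the sketch uses stdlib `logging` rather than the keyword-argument logger used elsewhere in this document. Clause order matters: `FileNotFoundError` is a subclass of `OSError`, and `json.JSONDecodeError` is a subclass of `ValueError`, so the narrower clause must come first or it never runs.

```python
from __future__ import annotations

import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def load_settings(path: Path) -> dict[str, int] | None:
    """Hypothetical loader: returns the parsed settings, or None on a known failure."""
    try:
        raw = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        # Subclass of OSError, so it must be listed before the OSError clause.
        logger.warning("Settings file missing: %s", path)
        return None
    except OSError as e:
        # Permission denied, path is a directory, low-level I/O error, ...
        logger.error("Could not read %s: %s: %s", path, type(e).__name__, e)
        return None

    try:
        data = json.loads(raw)
        port = int(data["port"])
    except json.JSONDecodeError as e:
        # Subclass of ValueError; caught first so the parse position is logged.
        logger.error("Invalid JSON in %s at line %d, column %d: %s", path, e.lineno, e.colno, e.msg)
        return None
    except (KeyError, TypeError, ValueError) as e:
        # Missing "port" key, JSON that is not an object, or a non-numeric port.
        logger.error("Malformed settings in %s: %s: %s", path, type(e).__name__, e)
        return None
    return {"port": port}
```

Anything outside these known failure modes (for example a bug elsewhere in the function) still propagates, which is the point of catching specific types.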
2. Log with Exception Type Context
When catching specific exceptions, log details that aid diagnosis:
- gRPC: Log `e.code().name` and `e.details()` to capture the gRPC status code.
- Redis: Log the exception type and message.
- HTTP: Log the status code (`e.response.status_code`) for `HTTPStatusError` and the request URL (`e.request.url`) for `RequestError`.
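The exception type matters because the message alone can be empty: `str(TimeoutError())` is `""`, so a log line built only from `str(e)` says nothing about what failed. A minimal sketch of a helper that always pairs the type name with the message is below. It assumes stdlib `logging` with `extra=` in place of the keyword-argument logger used in this document's examples; `error_fields` and `log_failure` are hypothetical names.

```python
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)


def error_fields(exc: BaseException) -> dict[str, str]:
    """Hypothetical helper: the exception's type name and message as log fields."""
    return {"error_type": type(exc).__name__, "error": str(exc)}


def log_failure(operation: str, exc: BaseException) -> None:
    # `extra` attaches the fields to the LogRecord; a JSON formatter (or a format
    # string such as "%(error_type)s: %(error)s") is needed to see them in output.
    logger.warning("%s failed", operation, extra=error_fields(exc))
```

Inside an `except redis.RedisError as e:` block this would be called as `log_failure("Redis GET", e)`, so a bare timeout still shows up as `error_type=TimeoutError` rather than an empty string.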
3. Keep `except Exception` as Safety Nets Only
Generic `except Exception` catches are allowed only in:
- Event loop top-level: The main `while True` loop in tick processing, WebSocket event streaming, and similar long-running loops. These prevent a single unexpected error from crashing the service.
- Startup/shutdown cleanup: Where failure should be logged but not propagate (e.g., closing connections during shutdown).
- CLI boundaries: Top-level entry points where any unhandled exception should produce a user-friendly error.
In all other locations, use specific exception types.
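For the CLI-boundary case, a minimal sketch of a top-level entry point is shown below. The `run` body and its argument check are hypothetical placeholders; the point is that the generic catch lives only in `main`, logs the full traceback, and hands the user a one-line message and a non-zero exit code. Note that `KeyboardInterrupt` derives from `BaseException`, not `Exception`, so it needs its own clause.

```python
from __future__ import annotations

import logging
import sys

logger = logging.getLogger(__name__)


def run(args: list[str]) -> int:
    """Hypothetical command body; known failures are handled in here with specific types."""
    if not args:
        raise ValueError("expected an input file argument")
    print(f"processing {args[0]}")
    return 0


def main(argv: list[str] | None = None) -> int:
    """CLI boundary: the generic catch turns any unhandled error into a clean exit code."""
    try:
        return run(sys.argv[1:] if argv is None else argv)
    except KeyboardInterrupt:
        # BaseException, not Exception: handled separately so Ctrl-C exits quietly.
        print("Interrupted.", file=sys.stderr)
        return 130
    except Exception as e:
        # Full traceback goes to the log; the user sees a single readable line.
        logger.exception("Unhandled error in CLI")
        print(f"Error: {e}", file=sys.stderr)
        return 1
```

In a real script this would be wired up with `sys.exit(main())` under the usual `if __name__ == "__main__":` guard.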
4. Never Silently Swallow Exceptions
Every `except` block must do at least one of the following:
- Log the exception (at minimum with `logger.error` or `logger.warning`)
- Re-raise the exception
- Return an explicit error value
The sole exemption is cleanup code (e.g., closing WebSockets during shutdown), where logging would be noise.
This rule also applies to client-side JavaScript: a `.catch(() => {})` on a promise chain must at minimum call `console.warn` to log the failure.
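The three acceptable outcomes can be contrasted with the forbidden `except ValueError: pass` in a small sketch. The parsing functions and the `ConfigError` class are hypothetical, and stdlib `logging` stands in for the project logger. The re-raise variant uses `raise ... from e` so the original exception stays attached as `__cause__`.

```python
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)


class ConfigError(Exception):
    """Hypothetical domain error used for the re-raise case."""


def parse_ratio_logged(raw: str) -> float:
    # Option 1: log, then continue with a documented fallback.
    try:
        return float(raw)
    except ValueError:
        logger.warning("Invalid ratio %r; using 1.0", raw)
        return 1.0


def parse_ratio_strict(raw: str) -> float:
    # Option 2: re-raise, chaining the original so the traceback keeps the cause.
    try:
        return float(raw)
    except ValueError as e:
        raise ConfigError(f"invalid ratio: {raw!r}") from e


def parse_ratio_or_none(raw: str) -> float | None:
    # Option 3: return an explicit error value the caller must check.
    try:
        return float(raw)
    except ValueError:
        return None
```

Option 3 is only compliant if the caller actually handles `None`; passing it along unchecked just moves the silent failure somewhere else.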
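5. Preserve `except HTTPException: raise` Guards
In FastAPI route handlers, when `except Exception` follows code that may raise `HTTPException`, always include `except HTTPException: raise` before the generic catch. `HTTPException` is itself an `Exception` subclass, so without the guard the generic handler catches it and converts intended client errors (such as a 404) into 500s.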
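A sketch of the guard ordering is below. To keep it runnable without FastAPI installed, it defines a local stand-in class in place of `fastapi.HTTPException` (in a real handler, import the FastAPI class instead); `ITEMS`, `load_item`, and `get_item` are hypothetical.

```python
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)


class HTTPException(Exception):
    """Stand-in for fastapi.HTTPException so the sketch runs without FastAPI.
    Like the real class, it subclasses Exception."""

    def __init__(self, status_code: int, detail: str = "") -> None:
        super().__init__(status_code, detail)
        self.status_code = status_code
        self.detail = detail


ITEMS = {"a": {"name": "first"}}  # Hypothetical data store.


def load_item(item_id: str) -> dict[str, str]:
    if item_id not in ITEMS:
        raise HTTPException(status_code=404, detail="Item not found")
    return ITEMS[item_id]


def get_item(item_id: str) -> dict[str, str]:
    """Body of a hypothetical route handler."""
    try:
        return load_item(item_id)
    except HTTPException:
        # Guard: the intended 404 reaches the client unchanged.
        raise
    except Exception as e:
        # Safety net for genuinely unexpected failures only.
        logger.error("get_item failed: %s: %s", type(e).__name__, e)
        raise HTTPException(status_code=500, detail="Internal server error") from e
```

Deleting the `except HTTPException: raise` clause would make `get_item("missing")` surface as a 500 instead of a 404, because the generic clause would catch the 404 first.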
6. Guard Uninitialized Resources
Service classes that hold connection pools, clients, or other resources that require explicit initialization (e.g., `Database._pool`, gRPC channels) must guard access with an `is None` check. If the resource is `None`, raise a descriptive `RuntimeError` rather than allowing an opaque `AttributeError` to propagate.
```python
# Good: Explicit guard with descriptive error
if self._pool is None:
    raise RuntimeError("Database not connected. Call connect() first.")
async with self._pool.acquire(timeout=5) as conn:
    ...

# Bad: No guard; raises AttributeError: 'NoneType' object has no attribute 'acquire'
async with self._pool.acquire(timeout=5) as conn:
    ...
```
This applies to all methods that use the resource, not just the first one. A private helper method `_ensure_connected()` may be used to centralize the check.
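One way to centralize the check is sketched below, assuming `_ensure_connected()` returns the resource rather than just validating it. Returning it also narrows `FakePool | None` to `FakePool` for type checkers, so call sites never touch `self._pool` directly. `FakePool` is a hypothetical stand-in for a real pool (such as asyncpg's) so the sketch runs on its own.

```python
from __future__ import annotations

from typing import Any


class FakePool:
    """Stand-in for a real connection pool so the sketch runs without a database."""

    async def fetchval(self, query: str) -> Any:
        return 1


class Database:
    def __init__(self) -> None:
        self._pool: FakePool | None = None

    async def connect(self) -> None:
        self._pool = FakePool()

    def _ensure_connected(self) -> FakePool:
        """Return the pool, or raise a descriptive error if connect() was never awaited."""
        if self._pool is None:
            raise RuntimeError("Database not connected. Call connect() first.")
        return self._pool

    async def ping(self) -> bool:
        # Every method that touches the pool goes through the guard.
        pool = self._ensure_connected()
        return await pool.fetchval("SELECT 1") == 1
```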
Examples
Good: Specific gRPC catch
```python
import grpc
from fastapi import HTTPException

try:
    response = await stub.ProcessTick(request, timeout=5)
except grpc.aio.AioRpcError as e:
    logger.error("ProcessTick RPC failed", code=e.code().name, details=e.details())
    raise HTTPException(status_code=502, detail="Physics service unavailable") from e
```
Good: Safety net in event loop
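```python
import asyncio

while not shutdown_event.is_set():
    try:
        await process_next_event()
    except asyncio.CancelledError:
        # The task is being cancelled (e.g. during shutdown): leave the loop.
        break
    except Exception as e:
        # Safety net: log and keep the service alive instead of crashing it.
        logger.error("Event loop error", error_type=type(e).__name__, error=str(e))
        await asyncio.sleep(1)  # Back off so a persistent failure does not spin the loop.
```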
Good: Safe Redis value parsing with defaults
```python
# Redis returns all values as strings. Corrupted data must not crash the handler.
try:
    tick = int(maneuver.get("started_tick", 0))
except (ValueError, TypeError):
    logger.warning("Invalid Redis value for started_tick", raw=maneuver.get("started_tick"))
    tick = 0

try:
    # .get() rather than [] so a missing key becomes float(None), a TypeError
    # handled below, instead of an uncaught KeyError.
    inclination = float(maneuver.get("target_inclination"))
except (ValueError, TypeError):
    logger.warning("Invalid Redis value for target_inclination", raw=maneuver.get("target_inclination"))
    inclination = None
```
Rule: All `int()` and `float()` conversions of Redis string values must be wrapped in `try`/`except (ValueError, TypeError)` with:
- A sensible default (`0` for ints, `0.0` for floats, or `None` where the field is optional)
- A `logger.warning` call identifying the field and raw value
This applies to both `routes.py` (WebSocket message handlers) and `websocket_manager.py` (event loop and state broadcasting).
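Because the same try/except repeats for every field, a small helper can keep the rule consistent across both files. The sketch below is hypothetical (`parse_field` is not an existing function in this codebase) and uses stdlib `logging` instead of the project logger. It treats an absent field as "use the default" without a warning, matching the `started_tick` example, and warns only when a value is present but unparseable.

```python
from __future__ import annotations

import logging
from collections.abc import Callable, Mapping
from typing import TypeVar

logger = logging.getLogger(__name__)

T = TypeVar("T")


def parse_field(
    data: Mapping[str, str],
    field: str,
    convert: Callable[[str], T],
    default: T | None,
) -> T | None:
    """Hypothetical helper: convert one Redis string field, falling back to `default`."""
    raw = data.get(field)
    if raw is None:
        # Absent field: use the default silently (log here too if absence is suspicious).
        return default
    try:
        return convert(raw)
    except (ValueError, TypeError):
        # Present but corrupted: warn with the field name and the raw value.
        logger.warning("Invalid Redis value for %s: %r", field, raw)
        return default
```

In a handler this reads as `tick = parse_field(maneuver, "started_tick", int, 0)` or `inclination = parse_field(maneuver, "target_inclination", float, None)`.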
Bad: Generic catch hiding error type
```python
try:
    response = await stub.GetStatus(request, timeout=5)
except Exception as e:  # Hides whether this is a network error, timeout, or bug
    logger.error("GetStatus failed", error=str(e))
```