13 - Observability, Focus on What Matters

May 20, 2025 · 4 min read

In early-stage startups, engineers often wear multiple hats—acting as full-stack developers, DevOps, and even product managers. We build and test backend services, assuming everything works fine. But without proper observability, subtle issues like perceived API slowness or intermittent errors can go undetected.

That is, until customer support starts receiving complaints—about slow apps, random errors, or pages that fail to load. As engineers, we’re expected to identify the root cause and provide clarity to stakeholders. Sometimes, the problem isn’t even within our system—it could be the user’s internet connection or ISP.

Still, the most important thing is being able to quickly pinpoint the root cause and quantify the impact:
How many users are affected? Which features are failing? Is it hurting conversions or revenue?

The key here is measurability. Every deployment should come with a clear monitoring strategy—tracking not just technical performance, but real user and business outcomes.

Ideally, we should be alerted before customers ever notice an issue. That’s what true predictability looks like.

Measurability

You can’t improve what you can’t measure.
You need visibility into system and user behavior through quantifiable data. To achieve this level of insight, here are a few key areas that matter:

Business KPIs: conversion rate (CVR), add-to-cart rate, product impressions, etc.

Tools like Datadog or New Relic can help you instrument your code and set up custom metrics—both on the backend and frontend—to monitor what truly impacts your product and users.
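
To make the idea of custom business metrics concrete, here is a minimal sketch in plain Python. It does not use any vendor SDK: `FunnelMetrics`, the event names, and the traffic numbers are all hypothetical, and I am assuming CVR is measured per session (teams often pick a different denominator). In a real setup the `track` call would forward to whatever metrics client you use.

```python
from collections import Counter


class FunnelMetrics:
    """In-memory stand-in for a metrics client (hypothetical, not a vendor API)."""

    def __init__(self):
        self.counts = Counter()

    def track(self, event):
        """Record one occurrence of a named event."""
        self.counts[event] += 1

    def rate(self, numerator, denominator):
        """Ratio of two event counts; 0.0 when the denominator has no events."""
        total = self.counts[denominator]
        return self.counts[numerator] / total if total else 0.0


metrics = FunnelMetrics()
# Simulated traffic: 100 sessions, 200 impressions, 30 add-to-carts, 6 purchases.
for name, count in [("session_start", 100), ("product_impression", 200),
                    ("add_to_cart", 30), ("purchase", 6)]:
    for _ in range(count):
        metrics.track(name)

# Assumption: add-to-cart rate is per impression, CVR is per session.
add_to_cart_rate = metrics.rate("add_to_cart", "product_impression")
cvr = metrics.rate("purchase", "session_start")
print(f"add-to-cart rate: {add_to_cart_rate:.1%}, CVR: {cvr:.1%}")
```

The point is less the arithmetic than the habit: every funnel step emits a named event, so a drop in any ratio after a deployment is visible right away.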

Predictability

No surprises—systems behave as expected under varying conditions. You understand the baseline, detect anomalies early, and anticipate failures.

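System SLIs, SLOs, and SLAs:
An SLI (service level indicator) is the measurement itself: API latency (p50/p95/p99), error rate, availability (uptime %), etc. An SLO (service level objective) is the internal target you set for an SLI, e.g. 99.9% availability. An SLA (service level agreement) is the commitment made to customers, usually with consequences if it is missed.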
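
As a rough illustration of how these indicators can be derived from raw request data, the sketch below computes latency percentiles and an error rate from a list of samples. The `RequestSample` type and the sample data are made up for the example, the percentile uses the simple nearest-rank method (monitoring tools often interpolate or use histograms instead), and availability is approximated as the share of requests without a 5xx response rather than as time-based uptime.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestSample:
    latency_ms: float
    status: int  # HTTP status code


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def compute_slis(samples):
    """Summarize a window of requests into a few service level indicators."""
    latencies = [s.latency_ms for s in samples]
    server_errors = sum(1 for s in samples if s.status >= 500)
    error_rate = server_errors / len(samples)
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "error_rate": error_rate,
        "success_ratio": 1.0 - error_rate,  # request-based availability
    }


# 98 fast, successful requests plus two slow server errors.
samples = [RequestSample(latency_ms=float(i), status=200) for i in range(1, 99)]
samples += [RequestSample(900.0, 503), RequestSample(1200.0, 500)]
slis = compute_slis(samples)
print(slis)
```

Notice that the median looks healthy while p99 is dominated by the two slow failures, which is why the tail percentiles deserve their own targets.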

Set up alerts to detect problems early—before users start complaining. Ensure these alerts are tied to SLOs and SLAs, focused on real impact, and optimized to reduce noise so engineers can take meaningful action.
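
One common way to tie an alert to an SLO is to look at how fast the error budget is being consumed. This is a technique I am borrowing for illustration, not something specific to any tool mentioned here. The sketch assumes a 99.9% success target and a paging threshold of 10x the budgeted rate, both of which are placeholder values you would tune for your own service.

```python
def error_budget_burn_rate(error_rate, slo_target):
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, e.g. 0.999")
    budget = 1.0 - slo_target  # allowed failure ratio, e.g. about 0.001 for 99.9%
    return error_rate / budget


def should_page(error_rate, slo_target, threshold=10.0):
    """Page only when errors burn the budget at least `threshold` times too fast."""
    return error_budget_burn_rate(error_rate, slo_target) >= threshold


# Assumed example values: a 99.9% SLO and two observed error rates.
print(should_page(error_rate=0.02, slo_target=0.999))    # 2% errors: far over budget
print(should_page(error_rate=0.0005, slo_target=0.999))  # 0.05% errors: within budget
```

Alerting on burn rate instead of on every error spike is one way to cut noise: a brief blip that barely touches the budget stays quiet, while a sustained problem pages someone.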

These are standard monitoring practices that every team should have. However, in my experience, frontend monitoring is often underestimated. We tend to celebrate when backend latency is under 300ms, assuming that fast APIs automatically result in great user experience. But that’s not always the case.

Backend vs Frontend Latency

Let's take a look at the picture below:

[Figure: Backend vs Frontend Latency]

  • Backend (API) latency measures the time it takes for a request to be received, processed, and responded to by the server.
  • Frontend (client-side) latency covers a broader scope—from when the request is initiated by the client, through network conditions (like internet speed and routing), the backend API, and finally to when the response is received and rendered in the UI.
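
The two definitions above can be sketched as a simple breakdown. I am assuming the client measures its own round-trip time and render time, and that the backend reports its processing time back to the client (for example in a response header), so no clock synchronization between devices is needed. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LatencyBreakdown:
    backend_ms: float  # server processing time, as reported by the backend
    network_ms: float  # round trip minus backend time: transfer, routing, etc.
    render_ms: float   # time from response received to UI rendered

    @property
    def total_ms(self):
        """Frontend latency as the user perceives it."""
        return self.backend_ms + self.network_ms + self.render_ms

    def share(self, part_ms):
        """Fraction of the total spent in one part."""
        return part_ms / self.total_ms if self.total_ms else 0.0


def break_down(round_trip_ms, backend_ms, render_ms):
    """Split client-observed latency into backend, network, and render time."""
    network_ms = max(0.0, round_trip_ms - backend_ms)
    return LatencyBreakdown(backend_ms, network_ms, render_ms)


# Assumed measurements for one request on a slow mobile connection.
b = break_down(round_trip_ms=1400.0, backend_ms=250.0, render_ms=600.0)
print(f"total: {b.total_ms:.0f} ms, backend share: {b.share(b.backend_ms):.1%}")
```

With these made-up numbers the API responds in 250 ms, yet it accounts for only an eighth of the two seconds the user actually waits.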

Even if your backend is blazing fast, users can still experience slowness due to factors like:

  • Large payloads
  • Poor rendering performance
  • Network instability

This is why frontend monitoring is just as critical—it captures the actual user experience from their device, not just what's happening on your servers. From my point of view, frontend monitoring is key to building a great user experience. You can implement custom metrics to capture user behavior, and many analytics tools offer features like:

  • Heatmaps
  • Session replay videos

These help you gain deeper insight into how users interact with your app.

In Conclusion: Track What Truly Matters

To build a great app experience, observability must go beyond backend performance. A comprehensive monitoring strategy should cover all layers of your system and the user journey. Here are the key metrics every team should track:

App Loading Time

  • App Download Latency (e.g., images, JS files)
  • Backend Latency

Backend Errors

  • API Error Rate (4xx/5xx)
  • Timeout or Retry Rate
  • Dependency Failures (e.g., database, external APIs)
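
As a small sketch of how these backend error metrics could be computed from request logs, the example below summarizes a batch of records. `CallRecord` and its fields are hypothetical, and I am assuming each record already notes whether the call timed out, how many retries it needed, and which dependency (if any) failed.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallRecord:
    status: int                              # HTTP status returned to the client
    timed_out: bool = False
    retries: int = 0
    failed_dependency: Optional[str] = None  # e.g. "database"


def summarize(records):
    """Compute error, timeout, and retry rates plus failures per dependency."""
    total = len(records)
    if total == 0:
        raise ValueError("no records")
    return {
        "rate_4xx": sum(400 <= r.status < 500 for r in records) / total,
        "rate_5xx": sum(500 <= r.status < 600 for r in records) / total,
        "timeout_rate": sum(r.timed_out for r in records) / total,
        "retry_rate": sum(r.retries > 0 for r in records) / total,
        "dependency_failures": Counter(
            r.failed_dependency for r in records if r.failed_dependency
        ),
    }


# Made-up sample: six clean calls and four problematic ones.
records = [CallRecord(200) for _ in range(6)]
records += [
    CallRecord(404),
    CallRecord(500, failed_dependency="database"),
    CallRecord(504, timed_out=True, failed_dependency="payments-api"),
    CallRecord(200, retries=2),
]
summary = summarize(records)
print(summary)
```

Keeping 4xx and 5xx apart matters because they usually point at different owners: client errors often mean a broken app release or bad input, while server errors point back at us or our dependencies.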

Mobile Vitals

  • Application Not Responding (ANR) events
  • Crash-Free Sessions
  • Sessions with Blank Screens (UI failed to render)
  • Slow Rendering / Frozen Frames
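
To show how the mobile vitals above reduce to a few ratios, here is a minimal sketch over per-session data. The `Session` shape is hypothetical, and the 16 ms and 700 ms frame thresholds are assumptions loosely based on values commonly cited for Android; crash reporting and performance tools typically compute these for you.

```python
from dataclasses import dataclass, field

SLOW_FRAME_MS = 16.0     # assumed threshold for a slow frame
FROZEN_FRAME_MS = 700.0  # assumed threshold for a frozen frame


@dataclass
class Session:
    crashed: bool = False
    anr: bool = False
    blank_screen: bool = False
    frame_times_ms: list = field(default_factory=list)


def mobile_vitals(sessions):
    """Aggregate per-session flags and frame timings into rates."""
    n = len(sessions)
    if n == 0:
        raise ValueError("no sessions")
    frames = [t for s in sessions for t in s.frame_times_ms]
    frame_count = len(frames) or 1  # avoid dividing by zero when no frames were recorded
    return {
        "crash_free_sessions": sum(not s.crashed for s in sessions) / n,
        "anr_rate": sum(s.anr for s in sessions) / n,
        "blank_screen_rate": sum(s.blank_screen for s in sessions) / n,
        "slow_frame_rate": sum(t > SLOW_FRAME_MS for t in frames) / frame_count,
        "frozen_frame_rate": sum(t > FROZEN_FRAME_MS for t in frames) / frame_count,
    }


# Made-up sample of four sessions.
sessions = [
    Session(frame_times_ms=[8, 12, 15, 10]),
    Session(crashed=True, frame_times_ms=[9, 40]),
    Session(anr=True, frame_times_ms=[900, 14]),
    Session(blank_screen=True),
]
vitals = mobile_vitals(sessions)
print(vitals)
```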

In the next post, we'll discuss Feature Monitoring.


© 2025, Built with Gatsby by Andy Wiranata