Handling a 3am Orion alert from your phone

Last updated: 2026-05-24

The actual workflow when an Orion alert wakes you up: what to check in the first 60 seconds, what to defer to the laptop, and how to make the call without a desktop session.

The page wakes you up at 3:07am. Critical alert: a core switch interface is down. By the time you've found your phone, your brain has run through three scenarios — flapping link, hardware failure, somebody-tripped-over-a-cable — and you need data to pick between them. You have about 60 seconds before you commit to either "I'll fix this from here" or "I need to get to the laptop."

This post is about the 60 seconds.

Step 1: Open the alert in your monitoring tool

Whichever app gets there first — PocketNOC, the OnPage page that arrived a few seconds earlier, a Datadog notification, an email — the goal of step one is the same: read the actual alert text, not your prediction of it.

The alert text in Orion tells you which node, which interface, which alert rule triggered, and the timestamp. That's enough to start narrowing. If the alert is on a Gigabit Ethernet customer-facing port on an access switch in a remote office, the response is very different from the same alert on a 100G uplink between two core devices in the primary datacenter.

Step 2: Check the related node, not just the alerted object

The instinct is to open the failing interface immediately. Resist for 10 seconds. Open the node first. If the node itself shows critical, the interface is the wrong target — the node is. Whatever made the node unreachable also made every interface on it appear down. Don't waste a minute working an interface-level problem that's actually a node-level problem.

In PocketNOC: tap the alert → tap the node name in the alert detail → look at node status, response time, and the last 30 minutes of CPU/memory. In the web console: click through to the node detail page. Same shape, more clicks.
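
The node-first check can be written down as a tiny decision rule. The sketch below is only an illustration: `NodeSnapshot`, its fields, and the status strings are hypothetical names, not Orion's or PocketNOC's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeSnapshot:
    """Hypothetical summary of what the node detail screen shows."""
    name: str
    status: str                        # assumed values: "up", "warning", "critical", "down"
    response_time_ms: Optional[float]  # None when the node is not answering polls


def triage_target(node: NodeSnapshot) -> str:
    """Return "node" when the node itself is the problem, else "interface"."""
    node_unhealthy = (
        node.status in {"critical", "down"} or node.response_time_ms is None
    )
    return "node" if node_unhealthy else "interface"
```

If the function returns "node", every interface alert on that device is a symptom, and the interface you were paged about is not where to start.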

Step 3: Check correlated alerts

If three other nodes in the same rack also went unhealthy in the last 5 minutes, you're not looking at a single interface failure. You're looking at a rack-level problem — switch reboot, power event, top-of-rack fabric issue. The right response is "stop chasing the interface and find out what happened to the rack."

In Orion, the alerts list, sorted by recency, is where you spot this. PocketNOC's alerts screen does the same thing on a phone; scroll down a screen and a half. If the list shows "active alerts: 1" you're probably in a localized failure. If it shows "active alerts: 17, all in the last 4 minutes" you're in something bigger.
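
The same eyeball check, counting how many alerts fired in the last few minutes, can be sketched in a few lines of Python. The function names, the 5-minute window, and the threshold of 4 (the alert that paged you plus three others) are assumptions chosen to mirror the example above, not values taken from Orion.

```python
from datetime import datetime, timedelta


def recent_alert_count(alert_times, now, window_minutes=5):
    """Count alerts whose trigger time falls inside the trailing window."""
    cutoff = now - timedelta(minutes=window_minutes)
    return sum(1 for t in alert_times if cutoff <= t <= now)


def looks_like_wider_event(alert_times, now, window_minutes=5, threshold=4):
    """True when enough alerts cluster in the window to suggest a shared cause."""
    return recent_alert_count(alert_times, now, window_minutes) >= threshold
```

A single alert in the window points toward a localized failure. A cluster is the cue to stop chasing the interface and look at what the affected nodes have in common.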

Step 4: Make the deferral call

After 60 seconds you should know roughly which bucket this is in: something you can handle or acknowledge from the phone, or something that needs the laptop.

The point of having the tool on the phone is not "I can fix everything from bed." It's "I can decide quickly whether this needs me to get out of bed." The latter is much more valuable.

What this looks like in PocketNOC specifically

When the push fires:

  1. Tap the notification — opens the alert detail.
  2. Top of screen: severity, alert name, node, interface, timestamp, current acknowledgment state.
  3. Tap the node → node detail with status, response time, recent performance chart, related interfaces, recent alerts on this node.
  4. Tap "Acknowledge" if you've decided not to engage right now. This writes back to Orion via SWIS, so alerts stop re-firing for the same condition.
  5. Back to alerts list — confirm whether this is isolated or part of a larger event.
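
For readers curious what an acknowledgment written back through SWIS can look like, here is a minimal sketch. It is not PocketNOC's implementation. It assumes the `Acknowledge` verb on `Orion.AlertActive` as described in the public OrionSDK documentation (a list of alert object IDs plus a note), and a client object with an `invoke(entity, verb, *args)` method, which is the shape of `SwisClient` in the `orionsdk` Python package. Verify both against your Orion version before relying on them.

```python
def acknowledge_alert(swis, alert_object_id, note):
    """Acknowledge one active alert through a SWIS client.

    `swis` is any object exposing invoke(entity, verb, *args).
    The entity and verb names are assumptions taken from the OrionSDK docs.
    """
    # The verb takes a list of alert object IDs, so wrap the single ID.
    return swis.invoke("Orion.AlertActive", "Acknowledge", [alert_object_id], note)
```

With the `orionsdk` package installed, the call would presumably look like `acknowledge_alert(SwisClient(host, user, password), alert_object_id, "acked from phone, isolated link")`, where the host, credentials, and ID are your own.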

Whole flow under 90 seconds with practice. The reason the app exists is the difference between that 90 seconds and the 8-12 minutes it takes to boot the laptop, connect to VPN, log into the web console, navigate to alerts, and load the node detail page in a browser.

What this does NOT replace

You're still going to want a laptop for:

The phone is the triage tool. The laptop is the workshop. The on-call engineer who treats them as such — phone first, laptop only when the triage tells them they need it — sleeps more, makes better calls, and burns out slower than the one who reaches for the laptop on every page.

Closing

The point of mobile Orion access isn't to eliminate the laptop. It's to make the decision about whether you need the laptop a 90-second decision instead of a 10-minute one. PocketNOC, OnPage, and the Orion Web Console mobile view all serve that goal in different ways. Which one you pick depends on where your bottleneck is right now: the page never arriving (a paging tool closes that gap), or the page arriving with nothing to look at (a mobile monitoring viewer closes that one).

For most on-call rotations, the second one is the next gap to close.

Jason Lazerus — Founder, WeaveHub Technologies — 20+ years network and security engineering