We had a network issue on August 24 between 11:23 UTC and 11:28 UTC. One of the app servers lost connectivity to the database, causing ping processing delays and a dashboard outage for 33% of the traffic.
We had network issues on July 3 between 23:22 UTC and 23:58 UTC. Packet loss between some of the servers caused some traffic to be rejected and a queue of unprocessed pings to build up on two affected web servers. The rejected traffic and the delayed processing caused a spike of false "down" notifications.