Helping NYC Power Through Future Disasters

This past month marked the third anniversary of Hurricane Sandy, which "doubled failures in US Internet infrastructure", particularly in New York, where the "bullseye of the impact [was] the metro area". Four months after Sandy, 94 businesses in Lower Manhattan still lacked phone and Internet service.

Many tech companies were not prepared, forcing thousands of residents and businesses to go without Internet and phones for days. As an aerial network, however, Rainbow Broadband was unaffected, allowing all of its customers with power to continue working without interruption.

Here, Rainbow Broadband’s CTO Tom Martinson shares ways for businesses to be disaster-ready -- and how they can know if their communications providers will survive the next storm.

--

Why is it important for Rainbow Broadband to always be making progress on the strength and reliability of its network for disaster preparedness?

There are several things driving the continuous maintenance cycle that we go through. Most important is that we have this basically unattainable goal of perfection that we want to reach. Everything that we do is to get as close to perfect as possible.

So we get as close to perfect as we possibly can, and as technology and software advance that gives us an edge to get a little closer to perfect.

Can you define “perfect” as it pertains to a network?

A perfect network is a network that passes all traffic with no interruptions, no matter what happens. There can be multiple power outages, there might be a fire in one of the locations that damages or even destroys some of your equipment, lightning strikes – whatever you can think of – that causes issues. We want to make sure that those issues are never seen by a customer. We strive to be invisible.

Why is “perfect” impossible?

Because it’s not a perfect world. There’s the time it takes to detect an issue and find a way to bypass it, for example. A long time ago, technologies took 30 to 90 seconds to detect an issue and find a way around it. Now some of the technology we use, like Ethernet Automatic Protection Switching, averages in the tens of milliseconds.

That’s a slight disruption, and the closer we get to zero, the better it is, with less impact to a customer. The perfect network is one where we can have outages happen all the time: We can take half the network offline and work on it, and the customer doesn’t know that that’s happening. They’re never aware that we’re doing maintenance.
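To put those switchover times in perspective, here is a rough arithmetic sketch. The number of failover events per year (12) and the two sample switchover times (60 seconds and 50 milliseconds) are illustrative assumptions chosen from within the ranges mentioned above, not figures from Rainbow Broadband.

```python
def yearly_interruption_seconds(failovers_per_year: int, switchover_seconds: float) -> float:
    """Total time per year that traffic is interrupted while failing over."""
    return failovers_per_year * switchover_seconds

# Hypothetical values for illustration only.
FAILOVERS_PER_YEAR = 12      # assumed: one failover event per month
LEGACY_SWITCHOVER = 60.0     # assumed: middle of the 30-90 second range
FAST_SWITCHOVER = 0.05       # assumed: 50 ms, i.e. "tens of milliseconds"

legacy_total = yearly_interruption_seconds(FAILOVERS_PER_YEAR, LEGACY_SWITCHOVER)
fast_total = yearly_interruption_seconds(FAILOVERS_PER_YEAR, FAST_SWITCHOVER)

print(f"Legacy switchover: {legacy_total:.1f} s of interruption per year")
print(f"Fast switchover:   {fast_total:.1f} s of interruption per year")
```

Under these assumptions, the same twelve events add up to about twelve minutes of visible interruption with the older technology and well under a second with the faster one.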

So you’re talking not just about improvement of the network for disaster preparedness, but during normal conditions as well?

Those two topics are inseparable. How do you know if your disaster recovery is going to be efficient? You need to use the tools during your maintenance cycles, because that’s how we know if the tool is sharp and usable.

As an analogy, it would be terrible if you went to the trunk of your car and your spare tire was flat. So we’ll use the backups during normal conditions. All good networks use their backups periodically, because you want to make sure that your backups are working.

What are the parts of a carrier-class network like Rainbow Broadband’s that need to be made fail-safe?

Data centers are one important point. Every one of our data centers has at least two connections to another data center. Our primary connections to the Internet are connected to three different data centers. If need be we can run a protocol where, should there be a failure on one of our links to the customer, it automatically fails over to the other location, with no change to the customer network.

We also look at power a lot. Our data centers all get multiple grid feeds, and there are generator and battery backups. We have a monitoring system so that we get alerts should there be a power issue. If the issue is at the customer’s location, we’re able to call the customer and ask them about their power situation – often before they even know that they’re having a power problem. We monitor the customer’s power 24/7, as well as our own.

Then there’s the human point. We have multiple people who are cross-trained in different disciplines. So if someone becomes ill, goes on vacation, or is busy working on another issue, the work can be handed over to the next engineer in line to take care of.

What are the “aha” moments that you have, as a network engineer?

That comes when we get to be an adopter of a new service or technology, and we get to do these things faster than other companies because of our corporate culture and our size. We can do engineering very quickly because we have the ability to get all of the decision-makers in the conference room together, and they’re all technically-minded people, right up to our President and Founder, Russ Hamm.

We can go from concept to conceptual design to proof of concept within days. We have a complete replication of an entire data center in our lab. To test ideas, we make ourselves a customer in our lab – our first field trial is with our sales team. They have to really like it before we’re ever going to show it to a customer.

What are the benchmarks you set for yourselves at Rainbow Broadband for performance?

We do the nines: five nines, or 99.999% uptime, is our goal, and then we want to be at six nines. The ultimate goal, of course, is 100%. That’s the perfection that we work for, and there’s not one person here who’s ever satisfied with anything less. That’s a healthy environment to have.
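For readers unfamiliar with "the nines," the short sketch below converts an availability target into the downtime it allows per year. It assumes a 365-day year and counts any interruption as downtime; the function name is illustrative and is not part of any Rainbow Broadband tooling.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: 365-day year

def allowed_downtime_seconds(nines: int) -> float:
    """Yearly downtime budget for an availability of `nines` nines.

    Five nines means 99.999% uptime, i.e. an unavailability of 10^-5.
    """
    unavailability = 10 ** -nines
    return SECONDS_PER_YEAR * unavailability

for n in (5, 6):
    seconds = allowed_downtime_seconds(n)
    print(f"{n} nines: about {seconds:.1f} seconds of downtime per year")
```

By this arithmetic, five nines leaves a little over five minutes of downtime per year, and six nines leaves roughly half a minute.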