Automation

Network Automation – Shifting How We View Networks

John Partee, NTS Full Stack Machine Learning Engineer 

Cloud services are exciting and new, but many organizations running traditional on-premises networks feel left out. Software-defined networking is a potential solution for some folks, but network engineers have not seen the same sort of disruptive innovation become mainstream, particularly for those supporting air-gapped networks.

We’ve spent the last few months crafting a methodology that lets us scale our local networks as easily as the cloud. The process is documentation-driven, treats security as a default instead of a goal, and leaves far fewer permanent “temporary” solutions.

On top of the technical benefits, we can do most of this work with a small budget and a short learning curve.

First, we need to shift our perspective.

I was a network engineer. I spent eight years working across good and bad government networks, secure and nonsecure, with many “secure” networks that were secure in name only. There were a few big problems on our isolated networks, most of which came down to organizational culture.

Networks are precious! They are the backbone of IT, and we should treat them as members of the organization that we love and care for. In practice, that meant every device was a bit different, every device had slightly different security policies and controls, and every device existed in some degree of loving disrepair. When inspection time came, we would spend weeks meeting compliance requirements, which were forgotten the following week when urgent problems arose elsewhere.

This world is the norm for many infrastructure shops. This way of doing things is tenable until we need to hire more people to manage our increasingly complex network.

Does it have to be this way? Enter infrastructure as code.

“But my network engineers aren’t programmers!” I know, reader, and neither was I. In programming, the idea of “abstraction” loosely means hiding the details so that things are more straightforward to use.

There are two abstractions I’ll discuss specifically: NetBox and Ansible. Those two, in particular, made my team’s lives a lot easier. One of our clients manages a few hundred devices this way, with just a single actual Python script that their team doesn’t have to alter. The rest works via Ansible automation and only requires changing the documentation in NetBox to meet their customers’ needs!

Now for the shift in perspective. If we looked line-by-line, how different are your device configurations?

A lot of network engineers I’ve worked with scoff at this question. What do your devices share? Authentication configuration? Probably. Logging? Access control lists? A lot is shared from your data center to the farthest-flung customer access switch.

How about a building distribution switch and a local access switch? Most of the configuration is likely shared. Even so, chances are the amount truly shared between devices is lower than security would like.

How can we simplify the management of these devices?

We can start to solve problems by pulling out the parts of a configuration that are shared. Does your whole enterprise use the same authentication servers? Make a note. Does a branch office use the same access control list? Make a note.
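If you would rather not take those notes by hand, a few lines of Python can give you a first pass. The sketch below is only an illustration: the device names, addresses, and configuration lines are made up, and it compares stripped lines one at a time, so it ignores the nesting of indented sub-commands (an interface block, for example).

```python
def shared_lines(configs):
    """Return the lines present in every config, in the order of the first one."""
    normalized = {
        name: [line.strip() for line in text.splitlines() if line.strip()]
        for name, text in configs.items()
    }
    common = set.intersection(*(set(lines) for lines in normalized.values()))
    first = next(iter(normalized.values()))
    return [line for line in first if line in common]


# Hypothetical, heavily trimmed configurations for illustration only.
configs = {
    "dist-sw-01": """
        hostname dist-sw-01
        logging host 10.0.0.50
        ntp server 10.0.0.10
        spanning-tree mode rapid-pvst
    """,
    "access-sw-17": """
        hostname access-sw-17
        logging host 10.0.0.50
        ntp server 10.0.0.10
    """,
}

common = shared_lines(configs)
print("shared:", common)

# Whatever is left over is what actually makes each device different.
for name, text in configs.items():
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    print(name, "unique:", [line for line in lines if line not in common])
```

The shared list is a candidate for an organization-wide standard; the leftovers are the per-device differences worth a closer look.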

Once all of these similarities are noted, we can start to document these standards in NetBox. The idea is to create a configuration hierarchy! With these overarching standards documented at the appropriate level (sites, tenants, tenant groups), we can spare ourselves from two headaches.

Once we know that the organizational policy is correct, we only have to document the differences between individual devices.
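To make the hierarchy idea concrete, here is a minimal sketch of how layered standards can resolve into one device's effective settings. It is plain Python and does not call NetBox; the layer names, keys, and addresses are assumptions chosen for illustration, and the rule "the more specific layer wins" is one reasonable choice rather than a description of any particular tool.

```python
import copy


def deep_merge(base, override):
    """Return a new dict where override wins over base, merging nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def effective_config(*layers):
    """Merge layers from least specific to most specific."""
    result = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


# Hypothetical standards documented at each level of the hierarchy.
tenant_group = {
    "aaa_servers": ["10.0.0.20", "10.0.0.21"],
    "logging": {"host": "10.0.0.50", "level": "informational"},
}
site = {
    "ntp_servers": ["10.1.0.10"],
    "logging": {"level": "warnings"},  # this site overrides only the log level
}
device = {"hostname": "access-sw-17"}  # the only truly device-specific value

config = effective_config(tenant_group, site, device)
print(config)
```

Notice how little lives at the device level. The site changes one value, and everything else is inherited from the standards above it.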

NetBox is your team’s new documentation source. We have no more Visio diagram version problems, no more digging through the shared drive to find the address of a troublesome device, and no more slow IPAM solutions. NetBox is our single source of truth.

With all of that information in one place, what is next? Automation.

The next step is to get the devices to look like the documentation. This may seem backward, but stick with me. Typically, we aim to pull all of that documentation into Ansible and build our configurations with it. Ansible can automatically do this on changes in NetBox, or on a set schedule to enforce configuration compliance. Automation pulls network engineers off the command line for simple changes while ensuring fewer problems from configuration mistakes.
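The "build from documentation, then enforce" loop can be sketched in a few lines. This is a stand-in for the idea, not our Ansible workflow: Python's built-in string.Template takes the place of a real templating engine, the documented values are hard-coded instead of being pulled from NetBox, and the running configuration is a made-up string instead of something read from a device.

```python
import difflib
from string import Template

# Hypothetical template; a real one would cover far more of the configuration.
TEMPLATE = Template(
    "hostname $hostname\n"
    "logging host $logging_host\n"
    "ntp server $ntp_server\n"
)


def render_intended(data):
    """Build the configuration the documentation says the device should have."""
    return TEMPLATE.substitute(data)


def compliance_diff(running, intended):
    """Return unified-diff lines; an empty list means the device is compliant."""
    return list(
        difflib.unified_diff(
            running.splitlines(),
            intended.splitlines(),
            fromfile="running",
            tofile="intended",
            lineterm="",
        )
    )


# Assumed values standing in for what the documentation would hold.
documented = {
    "hostname": "access-sw-17",
    "logging_host": "10.0.0.50",
    "ntp_server": "10.0.0.10",
}
# Assumed device state: someone pointed logging somewhere else by hand.
running = "hostname access-sw-17\nlogging host 10.9.9.9\nntp server 10.0.0.10\n"

intended = render_intended(documented)
drift = compliance_diff(running, intended)
print("\n".join(drift) if drift else "compliant")
```

Run on a schedule, a check like this turns compliance from a once-a-year scramble into a routine report: any line in the diff is a place where the device and the documentation disagree.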

This process might seem slower on the surface, but think about how much time is spent updating documentation, closing tickets, and sending emails. Once port activations are automated in the system, why not let users request port changes in Remedy or ServiceNow? The documentation gets updated for us, the ticket queue shrinks, and customers get results faster.

What about our network engineers? Don’t they get less experience without hands-on-keyboard time? Flatly, no. Do network engineers gain experience from repeating tasks?

We can better use our time by tackling harder problems or training junior members. One of our customers uses automation to stand up simulated network problems in a lab for their junior technicians, bringing them up to speed faster, with less senior input.

And what about legacy equipment? At the time of writing, I have managed devices as old as 12 years, with no real issues. It required the use of slightly different templates, but templates only have to be written once. Now, we can keep running those old switches until they die, with no real reason to replace them. If a device has an SSH connection, we can use this methodology to automate it!

WITH AUTOMATION, THE POSSIBILITIES ARE ENDLESS

At NTS, we recognize that automation is not a destination, but instead a journey. To learn more about NTS automation service offerings and the best practices for your organization to take on the road to automation success, please contact sales@nextechsol.com.