Just under a year ago my employer started a network refresh program on one of our internal core networks. We’re a type of ISP, and this network was our main management core, so all the fairly important traffic goes over it: billing stats, element management traffic (Telnet, SSH and RDP), alarms (Syslog, CORBA, SNMP), plus the management traffic from NMS systems communicating with their respective network elements for user service provisioning.
As we’re a service provider, the management core is actually very dynamic, because we’re constantly upgrading network capacity, hardware and features while normally also decommissioning the legacy equipment. As someone working on these projects, you often find yourself troubleshooting core management connectivity more than customer services. That’s good for the customers, but a pain when your real troubleshooting equipment isn’t aimed at this network and is instead invested in the revenue-generating networks.
The original design
About a year ago the topology was fairly simple. Each site generally had switches in multiples of two (Cisco Layer 2) to provide resilient local element access into the network, then two core firewalls also operating in Layer 2 mode. Finally, there were two core MPLS routers, which connected all sites together via a VPRN/VRF Layer 3 routed service and acted as the Layer 3 gateway for all the elements/VLANs at that site.
Our future plan
As with most network upgrades, we all sat down over a cup (or two) of coffee and decided what we wanted to get out of this network refresh. Management set aside a decent budget for the work, but there weren’t going to be any drastic changes or upgrades.
Access – The access layer is fairly simple, and that’s the way we wanted it to stay. The older switches were replaced to make all the sites capable of gigabit access, and all the configuration went through an intensive cleanup exercise to get rid of redundant configuration, plus a tightening of access security and a general sanity check (such as ensuring all the switches are running the same version of spanning tree! No names to be mentioned).
Firewalls – When looking at firewalls, there was a general decision to push them up the OSI stack to Layer 3, mainly because the MPLS core configuration was getting quite busy with sub-interfaces. With this change we also wanted some kind of routing protocol into the core to keep everything fairly dynamic (please, no more static routes!).
Vendor selection started off tricky, but there was a clear winner in the end. Historically we had always used Cisco PIX/ASAs; this time, however, we looked at Palo Alto, mainly for the added security features. A lot of the elements on this network run proprietary operating systems where you can’t install any third-party software, so having a firewall that could perform antivirus scanning and spot threats on the network (like brute-force detection) was very desirable. When the cost/performance turned out to be within ~4% of the Cisco alternative, we couldn’t really say no.
Core – As already mentioned, the main change here was removing all of the access interfaces from the configuration and setting up a routing adjacency (OSPF) with the firewalls, which now host all of the local gateway addresses for each VLAN. There were no changes to the hardware/vendor at this layer, as there were no real requirements.
The unexpected benefit
The original plan was fairly simple: keep the network in support by means of hardware upgrades, and clean up the design slightly by moving routing down one step to the firewalls. At the vendor selection phase we also realised that our security could be greatly improved by going with the Palo Alto firewalls.
However, by far the biggest advantage has been the logs that these firewalls create and push back to their management server (Panorama).
By design, the firewall creates a start and end log for every session that passes through it. With this you get all of the basic information like IP addresses, timestamps and ports, but you also get more, such as URL logging, accurate application identification, and packet and byte counts for both Rx and Tx. Plus, unlike pushing your logs to a syslog server, the management server has really good reporting and searching functionality built in.
The network engineers have fallen in love with these firewalls purely because of the logs. For example, say you have a server (10.1.1.1) at Site A that’s having problems communicating with a server (192.168.2.2) at Site B. In our design we know this traffic flow will pass through a minimum of two firewalls (one at the edge of each site). So you simply log onto the firewall management server and search for related sessions, “ip.addr in 10.1.1.1 and ip.addr 192.168.2.2”, and then, about as quickly as a Google search (well, maybe a few seconds slower), you get a detailed list of all the traffic flows between those two addresses.
In the above example it was fairly simple to see that traffic made it all the way through our core network to the other side and that the return traffic made it back into the MPLS core; however, it never hit the next firewall in the path (proven by a lack of Rx traffic on that first firewall). A quick look at the MPLS core found some old static routes that needed to be fixed, and 3 minutes later we were back in action, with no need for pings, traceroutes, packet captures, etc.!
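To make the idea of that search concrete, here is a minimal sketch of filtering session logs for every flow between two hosts, in either direction. It is not Panorama’s query engine or log schema: the `SessionLog` record and its field names are hypothetical, loosely modelled on the fields mentioned above (addresses, ports, timestamps, application, Tx/Rx byte counts).

```python
from dataclasses import dataclass
from datetime import datetime
from ipaddress import ip_address


@dataclass(frozen=True)
class SessionLog:
    """One session log entry; field names are hypothetical, not the PAN-OS schema."""
    firewall: str      # which firewall wrote the log
    start: datetime
    end: datetime
    src: str
    dst: str
    src_port: int
    dst_port: int
    application: str
    bytes_tx: int      # bytes sent by the initiating side
    bytes_rx: int      # bytes received back from the responder


def flows_between(logs: list[SessionLog], a: str, b: str) -> list[SessionLog]:
    """Return sessions between hosts a and b in either direction, oldest first."""
    wanted = {ip_address(a), ip_address(b)}
    matches = [
        log for log in logs
        # Compare as an unordered pair so either host can be the initiator.
        if {ip_address(log.src), ip_address(log.dst)} == wanted
    ]
    return sorted(matches, key=lambda log: log.start)
```

Since each firewall on the path writes its own log for a session, a healthy flow between the two sites should appear once per firewall it crosses, which is what makes a missing entry so easy to spot.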
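The reasoning used there (which firewall saw the flow, and which one saw no Rx bytes) can be sketched as a small function. This is an illustration under stated assumptions, not a Palo Alto feature: `FirewallSession` and `locate_break` are hypothetical names, the firewall names are made up, and it assumes symmetric routing, so replies cross the same firewalls as the forward traffic in reverse order (as with one firewall at each site edge).

```python
from dataclasses import dataclass


@dataclass
class FirewallSession:
    """Hypothetical summary of one firewall's log entry for a single flow."""
    bytes_tx: int  # bytes seen from the initiator (forward direction)
    bytes_rx: int  # bytes seen coming back from the responder (return direction)


def locate_break(path: list[str], sessions: dict[str, FirewallSession]) -> str:
    """Report where a flow stops, given firewalls in forward order (initiator side first).

    A firewall with no session never saw the forward traffic; a firewall whose
    session shows zero Rx bytes never saw the return traffic.
    Assumes symmetric routing through the same firewalls in both directions.
    """
    # Forward direction: walk from the initiator towards the responder.
    for i, fw in enumerate(path):
        if fw not in sessions:
            prev = path[i - 1] if i > 0 else "the initiator"
            return f"forward traffic lost between {prev} and {fw}"
    # Return direction: replies pass the same firewalls in reverse order.
    back = list(reversed(path))
    for i, fw in enumerate(back):
        if sessions[fw].bytes_rx == 0:
            prev = back[i - 1] if i > 0 else "the responder"
            return f"return traffic lost between {prev} and {fw}"
    return "traffic seen in both directions on every firewall"
```

For the case described above, the Site B firewall shows Rx bytes but the Site A firewall shows none, so `locate_break(["fw-site-a", "fw-site-b"], ...)` points at the segment between them, which is the MPLS core where the stale static routes were.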
Security that works
As mentioned, a big initial driver for using these Palo Alto firewalls was the extra security. Fortunately, when we turned them on we didn’t find a raft of security threats already in the network; in fact, the only threats we really see are people (from the internet, and normally in China) trying to breach our DMZ, where one of these firewall pairs sits.
We did once get an SSH brute-force alarm from the internal network, which the firewalls instantly filtered out. A quick search found that it came from one of our employees’ laptops (VPN client), and a phone call revealed that he was playing around with a Python script that logged into one of our core routers to process the configuration for an upcoming network migration. The wrong credentials in the script, plus some not-so-great coding (causing a login loop), triggered a brute-force alarm on the firewalls.
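Why a looping script trips this kind of alarm is easiest to see with a generic sliding-window counter. This is a minimal sketch of threshold-based brute-force detection in general, not how PAN-OS actually implements it; the class name and the default thresholds (20 attempts in 60 seconds) are hypothetical values chosen for illustration.

```python
from collections import defaultdict, deque


class BruteForceDetector:
    """Flag a source that makes too many login attempts to one target within a time window."""

    def __init__(self, max_attempts: int = 20, window_seconds: float = 60.0):
        # Hypothetical thresholds for illustration, not PAN-OS defaults.
        self.max_attempts = max_attempts
        self.window = window_seconds
        # (src, dst) -> timestamps of recent attempts, oldest first.
        self._attempts: defaultdict[tuple[str, str], deque[float]] = defaultdict(deque)

    def record_attempt(self, src: str, dst: str, timestamp: float) -> bool:
        """Record one login attempt and return True if the source should be flagged.

        Assumes timestamps arrive in non-decreasing order.
        """
        recent = self._attempts[(src, dst)]
        recent.append(timestamp)
        # Drop attempts that have fallen out of the sliding window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        return len(recent) > self.max_attempts
```

A script stuck in a login loop with bad credentials retries far faster than a person typing, so it crosses the threshold within seconds, whereas the same number of attempts spread out over a working day would never fill the window.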
It changed us!
We’re finding that network engineers actually prefer to put in firewalls now instead of “simple routers” because of the extra visibility you can get from the network logs, and the security capabilities are blowing our customers’ minds when we can track security breaches down in a matter of seconds.
As always, comments are welcomed and appreciated!