Ode to the External Node Classifier (ENC)
External Node Classifier, how do I love thee? Let me count... There has been a great deal of attention lately being paid to the backends that are available in my current configuration management tool of choice, Puppet. I'm sure Chef must have some similar types of constructs. The buzz is about Hiera, which is a pluggable hierarchical database for Puppet. Which means that when Puppet is looking up information about a node, it can look in multiple places. I think this is a great idea, and at Tagged we have been using something similar that we are in love with for a few years, the External Node Classifier (ENC).
What the ENC allows us to do is make a call to our centralized management database (CMDB) for each host that calls in for Puppet configuration. We return a bit of YAML from our Perl script, and Puppet uses that information to configure the node. Click on the link above to find out more about how it works. The powerful thing about this mechanism, is that we can return almost anything we want for Puppet to use. Each variable that we return in the YAML can be used as an actual variable in our Puppet manifests. This is what's so amazing about the ENC, it allows us to organize our network of hosts however we want it, with almost no preconceived notions of what we are going to want to build next (within reason of course).
---
classes:
- web
environment: production
parameters:
SpecId: 6
appType: web
cabLocation: 98b
cageLocation: 34
consolePort: 19
cores: 8
cpuSpeed: 2.53GHz
ganglia_cluster_name: Web
ganglia_ip: 172.16.11.34
ganglia_ip2: 172.16.11.35
ganglia_port: 8670
gen: 4
portNumber: 19
vendor: Dell
Sure we could pre-allocate large chunks of the network for web servers, and another chunk for security services. But what if we guess wrong? What if they don't all fit into a /24, or a /23. What if we allocate a /28 but it turns out we need a /25? This is a problem. It gets worse if you consider that you actually don't want your memcached servers to be in a different subnet than your web servers. In a datacenter environment, latency is important, and layer 2 is the only way to go for some applications. Routing will kill you.
So, what to do? Enter the ENC. Our ENC returns lots of information, but at a high level, it returns our Puppetclass and what we call an AppType, which is like a subclass. For example, the Puppetclass may be web and the AppType imageserver. Now, I can actually slice my hosts any way I want to. The same group of engineers should be able to login to the imageserver hosts? No problem, distribute an access control file based on AppType. All the imageservers should get the same Apache configuration? Again, not a problem. If an imageserver and a PHPserver are in sequential IPs, it does not matter. If they have a memcached host situated on an IP in between? Again, not a problem. Puppet will take care of ensuring each host gets the proper configuration.
But it actually get's better. Using the ENC, we can actually group hosts any way that we imagine. One thing we use very heavily at Tagged is Ganglia. Very simply, we could map Ganglia clusters to AppTypes. We don't even need to simply return the AppType, we could actually return the Ganglia configuration for each host and plug that into a Puppet ERB template. This is where it gets interesting. We actually combine multiple AppTypes into Ganglia clusters in some cases. For example, our security group has all kinds of different applications that they use to keep our users safe and secure. Some are on one server, some are on many, but it is very unlikely that our security group needs to get a large "cluster-wide" view of an application tier. Very often they are looking at the performance of individual hosts. If we were segmented by IP address, we would have to guess how many applications they would develop over some arbitrary time period. If we were segmented purely on AppType, we might have 10 different Ganglia clusters with one or two hosts each.
But because of the power of the External Node Classifier, we can actually slice and group our network of hosts any way that we choose, in ways that serve our purposes best. When we changed from collecting our system information from Ganglia gmond to Host sFlow, it was literally a change to a few variables and templates, and within 30 minutes, we had a completely different monitoring infrastructure. It was that simple.
If you haven't looked at the more capable backends to Puppet or your current configuration management tool of choice, you should. Just like "infrastructure as code", a little up front hacking, goes a long, long way.
Posted by Dave Mangot in Applications at 20120523Search This Site
Recent Entries
- DevOpsDays 2012: "Event Detection" Open Space
- DevOpsDays 2012: "Logging" Open Space
- Ode to the External Node Classifier (ENC)
- I'm speaking at Velocity 2012!
- Host-based sFlow: a drop-in cloud-friendly monitoring standard
- Graphite as presented to the LSPE Meetup 16 June 2011
- The Graphite CLI
- Back on the Blog Gang
- A framework for running anything on EC2: Terracotta tests on the Cloud - Part 1
- A Trade Show Booth: Part 2 - The Puppet Config
- Intstalling Fedora 10 on a Mac Mini
- A Trade Show booth with PF and OpenBSD
- EC2 Variability: The numbers revealed
- Linksys WET54G, a consumer product?
- Choosing Zimbra as told to ex-Taosers@groups.yahoo
- Information Security Magazine Chuckle
- A SysAdmin's impressions of MacOS Leopard
- Worlds collide: RMI vs. Linux localhost
- Hello World