Tag Archives: infrastructure

Nutanix Deployment Delight with “Zero-Touch” Foundation Central

With remote working now the new normal, it is challenging to send skilled IT professionals to data centers to install new equipment. Although Nutanix clusters have always been quick to install and configure, there was still a requirement to send a trained IT technician to site to run the Foundation software for deployment, usually connected to a local laptop. For large-scale or multiple sites, this can be a costly exercise.

How could we make this even easier in a ‘work from home’ world? 

With the launch of Nutanix Foundation Central 1.0 with Prism Central 5.17, this specialist requirement is now removed. 

Zero-touch deployments are now a reality for factory-shipped appliances from Nutanix, Lenovo, HPE, Dell, Fujitsu, and Inspur. All will be Foundation Central ready out of the box.

[Image: Nutanix Foundation Central is a service on Prism Central 5.17+]

Foundation Central (FC) is an at-scale orchestrator of Nutanix deployment and imaging operations. After the initial network prerequisites are met, new nodes ordered from the factory can be connected to the network and receive Foundation Central’s IP address via DHCP assignment. Since the nodes ship from the factory, they arrive with either a functional CVM (running the CVM Foundation service) or DiscoveryOS (for NX International shipments) built in.

The nodes (no matter the location) send their “I’m ready” heartbeat status to Foundation Central. 

A “location” can be within the same Data Center or at a remote site, for example.

Once the nodes are detected by Foundation Central, the administrator can create a deployment task and send it to the relevant locations; the configuration job itself is carried out by the nodes. The nodes send their job status back to Foundation Central for the administrator to monitor.

[Image: Foundation Central home screen with 15 discovered nodes. Nodes can be running different AOS and/or hypervisor versions and can be re-deployed into clusters with a new AOS/hypervisor of choice]

Foundation Central never initiates a connection to the nodes. The nodes initiate all communications and wait to receive their orders on what task to perform.

[Image: Unconfigured nodes send heartbeats to Foundation Central]

Foundation Central receives these heartbeats and then displays the nodes as available to be configured. By default, it can take up to 20 minutes for newly connected nodes to appear in the Foundation Central UI. Heartbeats continue to be sent until the nodes are configured and form part of a cluster.

[Image: Unconfigured nodes send requests to Foundation Central and receive their configuration orders]

From then on, Foundation Central only receives status updates until job completion; these updates come from the coordinating node.

Once the configuration/re-imaging process has completed successfully on one node, the original ‘coordinating node’ hands over to that freshly completed node, which takes over as the new (permanent) coordinating node for the remaining nodes.

If re-imaging to a different AOS or hypervisor is required, Foundation Central will ask for the URLs where the images can be found. These can be anywhere on the network, but given the file sizes it is recommended that they be hosted local to the nodes where possible; one simple way of doing that is sketched below.
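
As an unofficial illustration only, here is a minimal sketch of hosting the images on any machine in the same subnet as the nodes using Python’s built-in HTTP server; the directory path, port and URL pattern below are placeholders, not Nutanix requirements:

    # Minimal local HTTP server for hosting AOS/hypervisor images (Python 3.7+).
    # Assumes the image files have already been copied into /srv/images (placeholder path).
    from functools import partial
    from http.server import HTTPServer, SimpleHTTPRequestHandler

    handler = partial(SimpleHTTPRequestHandler, directory="/srv/images")
    HTTPServer(("0.0.0.0", 8000), handler).serve_forever()
    # Images are then reachable at http://<server-ip>:8000/<image-filename>,
    # and those URLs are what you would supply to Foundation Central.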

[Image: Changing AOS and hypervisor type if required]

Once the administrator has configured the Foundation Central jobs as desired, Foundation Central waits for the specified nodes to request their configuration tasks.

[Image: Imaging and configuration tasks are always local to the nodes/location]

Configuration tasks are then fully handed off to the local nodes, and Foundation Central becomes a ‘passive partner’ in the process from this point. The nodes elect a ‘coordinating node’ for the entire operation, which is responsible for keeping Foundation Central updated on the status of the tasks.

[Image: Deployment completes in parallel with different AOS/hypervisors no matter the location]

 

Foundation Central Requirements 

  • Nodes must be running Foundation 4.5.1 or higher (bundled with a CVM or DiscoveryOS). It is advisable to run the latest Foundation version on the nodes
    (upgrading via the Foundation APIs before imaging is straightforward; see the sketch after this list)
  • Networking requirements must be met (see below)
  • Prism Central must be minimum version 5.17  
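
As a purely illustrative sketch (not an official Nutanix reference), this is one way you might check the Foundation version a factory node is currently running before deciding to upgrade it; the CVM IP is a placeholder, and the port and endpoint assume the standard Foundation service defaults, so confirm them against the Foundation API documentation for your release:

    # Hypothetical check of the Foundation version running on a node's CVM.
    # Assumes the Foundation service is reachable on its default port (8000).
    import requests

    cvm_ip = "10.10.10.21"  # placeholder CVM/DiscoveryOS IP
    resp = requests.get(f"http://{cvm_ip}:8000/foundation/version", timeout=10)
    resp.raise_for_status()
    print(resp.text)  # e.g. a Foundation version string such as 4.5.x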

Networking and DHCP Requirements to use Foundation Central

The factory nodes need to be connected to a network that has a DHCP scope defined with specific vendor options. This ensures the nodes automatically receive the Foundation Central IP address and API key specific to your environment.

  • DHCP server must be present and must be configured with vendor-specific options (a sample configuration is sketched after this list):
    Vendor class: NutanixFC
    Vendor encapsulated options (DHCP option 43):

    • Option code: 200, Option name: fc_ip
    • Option code: 201, Option name: api_key
  • L3 connectivity from remote nodes to Foundation Central must be available 
  • L3 connectivity between ‘starting’ and ‘destination’ subnets if IP addresses are to be changed as part of the node configuration process
  • Remote CVMs must be configured in DHCP mode (default from factory) 
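
As an unofficial illustration of the DHCP requirement above, here is a minimal sketch of how the vendor class and option 43 sub-options might be defined on an ISC dhcpd server; the IP address and API key values are placeholders for your own environment, and other DHCP servers will have their own equivalent syntax:

    # Hypothetical dhcpd.conf fragment for Foundation Central discovery
    option space NutanixFC;
    option NutanixFC.fc_ip code 200 = ip-address;   # Foundation Central IP
    option NutanixFC.api_key code 201 = text;       # API key generated in Foundation Central

    class "nutanix-fc-nodes" {
        # Factory nodes identify themselves with the NutanixFC vendor class
        match if option vendor-class-identifier = "NutanixFC";
        vendor-option-space NutanixFC;
        option NutanixFC.fc_ip 10.10.10.50;                    # placeholder FC IP
        option NutanixFC.api_key "replace-with-your-api-key";  # placeholder API key
    }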

[Image: Option 1: Nodes are discovered and configured to remain in the same subnet]


[Image: Option 2: Nodes will be re-deployed to a different subnet. DHCP is not required for the second subnet]

For more information contact your Nutanix SE or check the Foundation Central Guide on the Nutanix Support portal.

Special thanks to Foundation Engineering (Toms, Monica, Toshik, YJ and extended team) for the development of Foundation Central. They are already working on some improvements for v1.1!

Please reach out with any feedback or suggestions and I trust this helps in making your working-from-home life a little easier.

The Boiled Frog in Enterprise IT

Ask yourself: when was the last time you went to www.google.com and the search results were “slow”? Indeed, have you noticed any real slowness over the last, say, 10 years?

I haven’t. Obviously the number of indexed web pages has gone from, say, 1 billion to well over a trillion in that time, but the end user is still presented with search results as fast as ever (if not faster!). Scale is not a problem for Google.

Now look at your own Data Centre. I bet over the same period you’ve experienced growth and therefore scaling issues, especially with regard to compute and storage resources. Lots of money has been spent, possibly across different vendors and into the millions of dollars, yet users still complain of ‘slowness’, the infrastructure gets blamed, and you realise these issues are not going away anytime soon!

So why can Google get it right but Enterprises continue to struggle with this fundamental problem?

I could be flippant and say that the traditional storage vendors (I’ll pick on them due to the massive costs they impose on Enterprises) have no interest in improving the situation. Sure, they can say they will add Flash drives (PCIe or SSD based) to their arrays and the performance of their arrays would no doubt improve, but this is the same old dance you’ve done before (and of course they’ll charge you for it). The real problem is that the old architecture is being re-sold to you each time, so the same problem will always reappear a few years down the track.

When your new flash array fills up or its controller hits its limit, you have to repeat the process again and buy newer drives or higher-capacity controllers (forklift upgrade!), and the storage vendors love it, don’t they? More of your cash goes on hardware and on professional services. The poor old Enterprise IT Manager has been the frog being boiled slowly over the years. Spending a LOT of money with your storage vendor has somehow become the norm and is expected.

Remember too that even if you upgrade your array, your servers (compute) are still separated from it by a network of some kind. Reads and writes have to go over this network. To quote an old adage from the network world, “It’s the latency, stupid!” Latency kills performance. That will never change. So the only real thing you can do is minimize it. And not only do you have a latency issue, you also have to re-buy servers when the time comes.

It does not have to be like this.

Google overcomes the latency issue by moving the storage and compute to the same ‘layer’. They ‘converged’ the compute and storage – essentially eliminating the network latency between those components. Clever. Then they scaled. Done.

Ask yourself why web players like Google don’t bother with SAN arrays. To put it simply, they could see a long time ago the limitations and failings of that architecture. The scaling problems and cost implications were obvious too.

With regard to your own Data Centre: do you want it to continue to look like 1990s technology like this?

[Image: olddc1]

…Or would you like it to look something along the lines of a Google-like Data Centre?

[Image: googledc]

[Source: http://www.google.com/about/datacenters/gallery/#/tech]

Google had some very smart little vegemites working for them to come up with a file system that could scale to whatever they needed, and they did it well. To make it work for them, they had to write their own custom applications to run on the infinitely scalable file system they created.

Of course, your own Enterprise apps within the Data Centre cannot simply be re-written, but there is another way to take advantage of this architecture, bring it to the Enterprise, and break out of the ‘boiled frog’ paradigm.

Imagine if someone could take the same Google-like idea, run it on commodity x86 servers, use a virtualisation hypervisor we all know (e.g. vSphere), and combine all the storage in each server into a single NFS datastore (if you want to) that scales simply by adding more servers when you need to. Then you could run your apps on it (via your virtual machines) and never have to worry about forklift upgrades ever again.

If you run out of compute, add more nodes. If you ever run out of storage, add more nodes. Because the storage controllers are now in software, they scale at the same time: if a controller seems overwhelmed, add more vCPU or vRAM, or add another node to share the burden.

Each node (or server) brings more resources to the cluster quickly, easily and at a known cost and performance point. This is what the Enterprise has been waiting for: a system built from the ground up for virtualisation that, once in place, will scale whenever you need it to. Suddenly, your engineers no longer need to worry about ‘business as usual’ tasks managing data centre infrastructure and can instead divert their brain power to innovating in other areas; your IT department could be viewed as an enabler to the business rather than a cost drain. Only when you don’t have to worry about how something runs can you improve it. You just want to ‘turn a key’ and have it ‘go’.

Suddenly your data centre will start to look like Google’s, and you will have jumped out of the boiling water.

So, if you are still investing in stand-alone storage arrays in 2013 then you are suffering on two fronts: 1) re-investing in old 1990s/2000s architecture, and 2) missing the performance benefits, cost savings and scalability that converged compute and storage solutions bring.

Also, don’t be fooled by the big vendors pushing their ‘converged’ solutions. If they have any component that is a ‘hardware SAN’, they are simply taking what they sell individually now and trying to sell it to you in a different form factor, bundled with other compute and network items. Remember: if the compute component is in a different part of the ‘rack’, then it isn’t really converged, is it?

If it still looks like the old array/data centre photo above, then it is the same old dog, and you will still be the same old boiled Enterprise IT frog, as I was in my previous IT life before joining Nutanix.

I should point out that individually those old components from the big vendors are fine (as per the old architecture model); my point is that when they shove it all together in one ‘rack’ and try to convince you it is ‘converged’, they are being sneaky and certainly not Google-like.

If you want to differentiate yourself from the competition and take advantage of the Software-Defined Data Centre (SDDC), mirroring the Google model, then you really need to look at Nutanix for any virtualisation project (server infrastructure, big data, VDI). By the way, some of those smart little Google vegemites I mentioned now work for Nutanix, creating the software smarts to make this a reality.

In 10 years’ time, you will look back and realise your scaling issues are gone, performance never dropped and you saved money. You will have become your own Google in your own Data Centre.

Awesome.

The Rise and Rise of Nutanix

It’s a pretty exciting time in IT right now, but it is fair to say it has been a struggle recently.

My place of work has gone through several restructures in recent years, resulting in headcount reductions. The standard rate of employee turnover was still there and continues to be. Additionally, the current economic environment means cost cutting is the new norm for IT budgets; and yet the modern data centre has some very complex and expensive systems that allow the business to function as it should, and they cost some serious coin to run!

All these things are combining to place pressure on the remaining staff to continue to deliver the IT outcomes the business has come to expect (perhaps even take for granted). But what about innovation? When is BAU and ‘keeping the lights on’ no longer going to be enough? Does IT management simply expect to keep the same old technology and renew maintenance contracts for these expensive systems without paying a much bigger price in, say, 2-3 years?

I was pondering this about 8 months ago when I saw where this trend was heading – and I didn’t like it. Something had to change.

The data centre world is converging in terms of technology *and* personnel, and both the employee and the business will benefit as a result. Good people are hard to find and even harder to keep. Consequently, the traditional idea of skill ‘silos’ must start to break down to drive efficiency (and to overcome the fact that there are fewer people to do the same work), with the side effect that staff who are keen to learn new skills will get the opportunity to expand their knowledge, perhaps in areas they had not previously considered. Personally, I’ve never wanted to be ‘siloed’ (is that even a word?), nor to be a one-skill engineer. All the best people I’ve worked with were of the same mindset.

While the people side of the equation is important, a lot of this is ultimately up to the individual having the drive and ability to improve. The business can certainly help by creating a good environment, but that is a deep subject for another time. The point is that as much as the technology will converge, so too must people. Perhaps this is the age of the technology ‘generalist’ who can get the best out of the equipment to deliver a reliable, functioning solution for the end-user within the business (so they can get on with making money).

But what about the technology?

I decided to find out about the technology options around ‘converged infrastructure’. The same old way of providing compute, storage and networking really wasn’t delivering benefits in terms of cost, and certainly not simplicity for the administrator. EMC/VMware have been saying that they want to see one sysadmin for every 10,000 VMs, but with the current crop of technology from the big vendors I really could not see how that could become a reality.

Then I stumbled on a company that I thought might have the answer.

I’ve been following the rise and rise of Nutanix for a while now. I think they are on a winner. I’m very partial to their primary message of “NO SAN”, if nothing else because it promises to help eliminate a massive, recurring and yet growing part of data centre budgets. No one is reducing their storage footprint even after staff are cut! When has your storage budget ever reduced in the last 10 years? Do you expect anything different? Didn’t think so. So when I saw a company saying they could ‘eliminate the SAN’ from your data centre, I was going to pay attention.

Nutanix offers an all-in-one appliance. Think Cisco UCS for the compute side, but also heaps of storage integrated into the same appliance. And it has been designed from the ground up for virtual environments. And it provides SAN-like distributed storage for that environment. And it is easy to administer (could this finally realise EMC/VMware’s vision?). And it all comes in a 2RU block. And it has inbuilt data redundancy. And, well… I was keen to check it out.

If you’ve never heard of Nutanix, head over to their web site and view their 90 second summary video.

Last week I got my hands on our first Nutanix block for our End User Computing trial – one of the first in Australia. I’m proud to be involved.

A few things have impressed me on the Nutanix journey thus far.  I’ll be going into detail here in the coming weeks, but as a summary:

  1. Simplicity of the administration/setup of the block – up and running in minutes
  2. From sales to support staff, there is a level of responsiveness and eagerness to please that I have not seen from a vendor in a long, long time
  3. The technology solution is elegant, simple and well priced, especially when you consider the storage is included. It is modular and scales out extremely well, if not infinitely (according to their whitepapers)

Nutanix is the duck’s nuts. Do yourself a favour and check them out. I’m lucky enough to be visiting their headquarters in San Jose to get the full story via some technical training and I’m looking forward to meeting some of the smart people behind it all.

I think it will be like visiting the future.