Tag Archives: nutanix

Nutanix Foundation 3.7

Here are some of the improvements in Nutanix Foundation 3.7 which was released on Feb 27, 2017.

Reorder of Blocks via Drag n Drop 

You can now reorder blocks so that they are in the desired positions whilst keeping the IP addressing sequence intact. This is especially useful for large deployments.

Screen Shot 2017-02-27 at 4.28.53 PM.png

Clicking ‘Reorder Blocks’ will show a drag n drop popup

A demo video illustrating the effects of reordering blocks (thanks YJ for the video):

IP Sequencing

Instead of automatic increments of IP addresses by 1, you can use a ‘+’ operand to create the sequence you desire. See below for an example.

Screen Shot 2017-02-27 at 4.33.40 PM.png

Using +2 to auto fill every 2nd IP address in the sequence

Timezone support

You can now optionally select the timezone for your cluster at the Global Configuration page.

Screenshot 2017-03-02 10.27.46.png

Timezone support

Image Selection Page Improvements

The Image Selection Page no longer forces you to upload a new images and instead use the ones already on the detected nodes.  You can also add/delete ISOs from the UI.

Screen Shot 2017-02-27 at 4.39.45 PM.png

Image Selection Page Redesigned

Updating Foundation from UI

You can now use the UI to upload a Foundation tarball and update Foundation itself when new versions are available. It is highly recommended to always use the latest version, so it is easier to update. You can get the latest release notes and Foundation files from the Nutanix Support Portal.

Screen Shot 2017-02-27 at 4.41.43 PM.png

screen-shot-2017-02-27-at-4-42-10-pm

Summary

This is a great release by the Foundation Engineering team at Nutanix with over 70 bug fixes and improvements such as the above included. There are several exciting features coming in future releases (approximately every month) so always check the Nutanix Support Portal for the latest updates. If there is a feature you’d like so see, feel free to contact me.

Nutanix 4.5 Cross-Hypervisor VM Conversion

One of the reasons people stay with a particular type of hypervisor is that it is too hard (or too costly) to migrate to another type. All that drama of converting, testing and making sure all is right and then the risk of having to move back if something went wrong.

Sure, there are separate software tools you can buy to do the conversion for you . . . but what if the virtualisation infrastructure itself – the thing that is actually providing your servers and storage – could do it as an in-built function? What if that could be done just by clicking a few buttons?

So in the demo video below, I take a running Windows VM on a Nutanix Cluster “A” running vSphere and then take a snapshot of it and send it to a second Nutanix Cluster “B” running Nutanix’s own free Hypervisor (AHV) and then start the VM. Job done. Easy.

Here’s the setup:

clustersetup

Basic lab setup using a flat L2 network. Production and DR deployments would use L3 networks – which is fine of course

..and here’s the demo:

For brevity, I cut out the initial one-off processes to set up the Replication. The full process was below (check out the Nutanix Index for articles describing setting up Replication):

1. Setup a Data Protection Remote Site ‘pair’ of clusters (so that they can replicate to each other) and test the connection.

Site A (ESXi cluster)
Site B (AHV cluster)

2. Set up a Protection Domain policy, add the VM you want to be a part of the replication policy and set a schedule.

3. On the Windows VM on ESXi on site A that you want to snap to Site B running AHV, make sure you install the Nutanix VM Mobility drivers MSI from the my.nutanix.com support portal. (These will soon be included in Nutanix Guest Tools (NGT) post Nutanix AOS 4.6 release, so by installing the NGT you will automatically get the VM Mobility drivers). The Nutanix VM Mobility installer deploys the drivers that are required at the destination AHV cluster. After you prepare the source VMs, they can be exported (snapped) to the AHV cluster.

4. Run the snapshot and restore operation as per the video. That’s it!

Word on keyboard

Almost as easy as clicking this button

A few points to note:

In the video I am just taking a crash-consistent snapshot, if you want a clean snap then shut down the source VM first, then snap, then restore. Live app-consistent snapshots will be coming in 4.6 for ESXi and AHV.

Obviously if your VMs have static IPs or to avoid computer naming issues, you should take care of these before joining the newly created AHV VM to the network. When you restore the VM on AHV, by default there is no virtual nic connected (so the risk is minimal if you just want to test). If you wanted it to connect to the network you would attach a nic to the restored VM on via Prism on the AHV cluster (go to the VM page).

Only 64-bit guest operating systems are supported at the time of writing (Nutanix AOS 4.5).

For Windows 7 and Windows 2008 R2 operating systems, you have to install SHA-2 code signing support patch before installing Nutanix VM Mobility installer. For more information, see https://technet.microsoft.com/en-us/library/security/3033929.

More info can be found in the Nutanix Prism Web Console Guide under the “Nutanix VM Mobility for Windows” section – which can be found on the Nutanix Support Portal.

Use cases:

A lot of people are trying AHV for the first time, and larger customers usually have a test/dev set of Nutanix nodes for testing. This method would be perfect to try snapping production VMs on AHV for testing and verify all is OK.

Also, I can see a use case where DR clusters could now use the in-built AHV on Nutanix clusters and save some licensing dollars.

It would also be possible to use Nutanix Community Edition as the AHV target – in case you had some spare hardware and wanted to just try this out without the need for a full Nutanix set of nodes.

Future software plans:

In a few weeks (early 2016), Nutanix will release AOS 4.6. With it, two-way VM conversion (ESXi<->AHV in either direction) should be included. In a future release AOS is expected to add support for Hyper-V, delta disks, and volume groups.

Yes, Nutanix will enable the ability to leave AHV and migrate your VMs *back* to ESXi (for example) should you choose. Put simply, the onus is on Nutanix to keep innovating to maintain your loyalty, rather than any technical or license ‘lock-in’. At the end of the day your workloads are just virtual machines – you should be free to move them wherever you see fit (even away from Nutanix if you choose).

There will be lots of improvement and extra features coming in future releases of course, which you will get by simply doing a standard Nutanix non-disruptive upgrade.

Conclusion: 

In essence, you can see why going Hyper-converged makes doing things like this almost trivial compared trying to do the same in a traditional 3-tier infrastructure (separate servers and storage layers). As the Nutanix software improves, your life gets easier each time. With each Nutanix release, more and more features like this will continue to be added and improved. Being 100% in-software is going to be a necessity in the next decade and beyond.

Thanks to @danmoz for letting me borrow his Dell XC cluster… and I treated it badly too (eg. multiple times I hard powered it off with no care – and it self-recovered every time).

shackles

Hypervisor lock-in is sooo 2007 :)

On Nutanix Resiliency – The Controller VM

saidnoceoever

Spot on! Invisible Infra means not babysitting old DC constructs, instead deliver *business* advantages. Source: https://twitter.com/vjswami/status/562298721942401026

If you make the claim that your DC infrastructure should be invisible, clearly you need to have a solid story around handling failures. Coping with failure scenarios is critical in any infrastructure. I get asked often on how Nutanix clusters cope in situations where things go wrong unexpectedly – and rightly so.

Nutanix has been designed to expect failures of hardware, hypervisor or Nutanix’s own NOS software. I don’t care how long it takes to deploy a Nutanix cluster (5 mins or 60 mins depending if you want to change hypervisor etc) – as to me that is a one-off occurrence and fairly uninteresting (although a lot faster than the old way of deploying servers and a SAN). What really matters is whether or not you are going to be called in at 2am or on weekends when things die – or even worse if something goes wrong at 9am on a weekday.

In my personal view, uptime trumps all else in this modern 24×7 world. Gone are the days where ‘outage windows’ are acceptable. Of course, every deployment is unique and depends on the requirements and circumstances.

I usually demo HA events (eg. motherboard failure, nic failure etc) but none of that is particularly exciting for virtualisation administrators – failures of this nature are *expected* to be handled OK in 2015. Nutanix is no different here as that is all handled by the hypervisor.

One demo that really gets people excited is what I want to talk about here.

I’m going to assume you know a little about the Nutanix terminology and architecture, if you don’t check out The Nutanix Bible by Steven Poitras : http://stevenpoitras.com/the-nutanix-bible/

To set the scene, we have a fully working NX-3450 block (4 nodes of 3050 series) – which is a 2U appliance. These four nodes will be connected to a standard 10GbE switch. The Nutanix architecture is such that there is a Controller VM (CVM) on each and *every* node in the cluster (here a NX-3450 would therefore have 4 CVMs). This setup gives approx 7.5TB usable (after replication) but before dedupe/compression/EC.

With Nutanix, the storage fabric and the compute fabric are independent of each other. For example, you can upgrade one without affecting the other. Therefore the failure domain is limited to one or the other.

Let’s focus on a single CVM on a node that is also hosting your guest VMs. Let’s assume that the hypervisor is healthy and OK. What if someone powers off the CVM hard by mistake, or deletes the CVM, or if the CVM kernel panics and simply “stops” in the middle of normal guest VM I/O operations? What are the effects? What will your users notice? Will all hell break loose? Let’s check out this worst case scenario up-front, where the Nutanix “Data Path Redundancy” feature kicks into action.

I’ll demo this via this short video:

So what happened here? The CVM was killed with live guest VMs reading and writing data. This is NOT a usual scenario – but it is a great example of the robustness of the Nutanix distributed fabric. What we see is that the hypervisor is trying to write to the NFS datastore. When the CVM dies, the hypervisor on that node can no longer communicate with the NFS datastore and re-tries again and again. The hypervisor will continue to do so as per normal NFS timeout values. But here, the CVMs on other nodes were already taking steps to fix the situation – before the hypervisor knows what’s going on! After 30 seconds, the hypervisor on the affected node is told by one of the other CVMs to redirect I/O to one of the remaining healthy CVMs and things carry on as normal. This is well within the hypervisor’s (in this case ESXi 5.5) default NFS timeout period before it marks the NFS datastore as unavailable (125 seconds is mentioned here: http://cormachogan.com/2012/11/27/nfs-best-practices-part-2-advanced-settings/). Cool.

So what does all that mean to you and the end users of the guest VMs when a CVM is killed?

No data loss.
No loss in guest VM network connectivity.
No guest VM BSOD.
No hypervisor PSOD.
No vMotion / No HA event required. 

….just a temporary 30 second ‘pause’ in I/O because the NFS datastore became unresponsive for 30 seconds – but it recovered before the hypervisor’s own timeout function – so the guests keep going from where they last tried to write data. Job done!

Obviously the same situation would be true if your Nutanix cluster was instead running Hyper-V (SMB) or Acropolis Hypervisor (iSCSI). Remember, to Nutanix the hypervisor itself is ‘above’ the distributed storage fabric (designed deliberately so). If it wasn’t then there would have needed to be a full outage of the VMs and a HA event.

nutanix-choice

Abstracting the physical disks away from the hypervisor means the Nutanix CVMs can present the protocol of choice to your hypervisor of choice. Treat storage just like a application VM….If virtualisation is good enough for production Oracle and SQL, then so it should be for storage (and it is :)

If you still are a bit hazy on how Data Path Redundancy works, check out this nu.school video: https://www.youtube.com/watch?v=9cigloapOXw

Please keep in mind that these I/O ‘pause’ effects would not be seen in a normal scenario such as a rolling one-click upgrade of Nutanix NOS software versions – where there would be no interruption of I/O at all (due to the fact the affected CVM can pro-actively redirect it’s hypervisor I/O to another CVM in preparation for a controlled reboot). The same is true for the 1-click hypervisor upgrades – because the guest VMs are vMotioned anyway before CVM shutdown (because the hypervisor itself is restarted).

This is why people can expand/upgrade their production Nutanix clusters in the middle of the day:

Very cool stuff. I remember wasting a Christmas Day in 2008 upgrading firmware on an old IBM blade chassis. I wish Nutanix had existed then…. I could have done the same on Nutanix nodes over Christmas lunch…from home!

BTW, don’t worry if you have more nodes (than the 4 shown in the video) or want more protection than simply handling a single CVM failure. If you have enough nodes or appliances you can lose entire blocks or racks of Nutanix and keep working (ie. lose more than one CVM simultaneously). Check out the Nutanix Bible for ‘Availability Domains’ and RF3 for examples and check the minimum requirements for these situations. Try that with a dual-controller SAN. 

I hope this post has shown you one of the most powerful aspects of a building your virtualisation infrastructure using Nutanix. In a lot of traditional environments, losing 6 disks at once would mean you’d be having a very bad day, maybe even having to invoke a DR strategy or restore from backups, perhaps with a lot of manual steps too. Who needs that drama in 2015? Go invisible and then go home and crack open a beer! Let the software do all the work for you, and with your free time you could maybe learn some new skills instead of babysitting infrastructure. Win/win!

Thanks to Josh Odgers (@josh_odgers), Matt Northam (@twickersmatt) and Matt Day (@idiomatically … a champion Nutanix customer!) for reviewing this post.