
Basic configuration of the Nutanix 2RU block

Winter is coming for the old SAN

Thought I’d describe what you get when the shiny new 2RU Nutanix block arrives at your door, and how to get it configured to a basic level from the point of view of a new user of the Nutanix solution. Obviously, talk to your Nutanix partner before diving in, but this should give you a bit more of an idea of what you need to do to prepare.

In my case the block comprised four nodes and I ordered it with 192 GB RAM per node (768 GB total). I won’t go into the tech specs in detail as you can get those from their web site: http://www.nutanix.com/products.html#Specs

The rail kit is a snap-on type, easy to install, but don’t be a goose (like I was) and try to lift/install it into your rack by yourself. It isn’t light as the disks are pre-installed. After a short trip to the emergency room I continued with the config… :)

The next thing I did was send an email to Nutanix support to get registered as a customer, which then allows you access to the knowledge base. Within about 10 minutes I had a reply and a login to their support portal. I searched for ‘setup’ and got the setup guide. If you aren’t sure which version to get, contact support – they are very fast to respond. In my case it was v2.6.2 (the latest for Oct 2012).

Physical cabling was easy: two power cables (split across separate UPS power feeds). Each node has 1 x 10 GigE network interface, 2 x 1 GigE interfaces for failover of the 10 GigE, as well as one IPMI (lights-out management) 1 GigE interface. That’s it. I assigned the IPMI ports as access ports and the 10 Gig and 1 Gig uplink-to-network ports as trunks.
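If your DC switches are Cisco IOS based, there is nothing exotic about the port config. As a rough sketch only – the VLAN numbers and interface names here are hypothetical, use whatever suits your environment:

    ! IPMI: plain access port on a management VLAN
    interface GigabitEthernet1/0/1
     description Nutanix Block1 NodeA IPMI
     switchport mode access
     switchport access vlan 10
    !
    ! 10GigE uplink (and its 1GigE failover partners): trunk ports
    interface TenGigabitEthernet1/0/1
     description Nutanix Block1 NodeA 10GigE
     switchport mode trunk
     switchport trunk allowed vlan 20,30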

The guide itself is pretty straightforward and easy to follow, and the easiest method is to use IPv6 to configure the cluster. I used a Win7 laptop with IPv6 enabled, Bonjour for Windows, Google Chrome (with the DNSSD extension) and a typical dumb gigabit switch to hook it all up, initially in an isolated environment (if you want to isolate it). The cluster uses the 1 Gig interfaces as a failover for the 10 Gig NICs, which is why it works on a basic 1 GigE switch. The setup process assigns IPv4 addresses to the cluster components, so don’t worry about needing to be an IPv6 guru – you don’t. You don’t have to use Win7; other OS options are ok too. I didn’t try any other OS/browser combo so YMMV.
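Before kicking off discovery it’s worth confirming the laptop actually has a usable IPv6 link-local address and can see the nodes. A quick check from the Win7 command prompt (the fe80:: address and the %11 interface index below are hypothetical):

    netsh interface ipv6 show addresses
    ping fe80::225:90ff:fe33:ab01%11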

In my case I’ve assigned a complete /24 subnet in the DC for Nutanix infrastructure. It is recommended that all the components are in the same L2 domain but it is not mandatory. Addresses will be split out between the ESXi hosts, IPMI adapters and Nutanix controller virtual machines. Do not use 192.168.5.x/24 as this is reserved for the cluster’s internal communication.

I reserved addresses in segments so that when I get more Nutanix blocks, the expansion is contiguous. You don’t have to follow my example of course, but here it is:

Block/Node ID      ESXi host     Controller VM    IPMI interface
Block 1 Node A     10.1.1.21     10.1.1.121       10.1.1.201
Block 1 Node B     10.1.1.22     10.1.1.122       10.1.1.202
Block 1 Node C     10.1.1.23     10.1.1.123       10.1.1.203
Block 1 Node D     10.1.1.24     10.1.1.124       10.1.1.204

As you can see, any future blocks can continue the sequence, e.g.:

Block 2 Node A     10.1.1.25     10.1.1.125       10.1.1.205   … and so on for any future nodes.

I don’t think I’ll ever have 54 nodes (that’s where the IPMI range runs out first: .201 to .254 gives 54 addresses), so that sequencing should be ok :) The Controller VM is where all the magic happens. There is one of these per node, and the key to the whole Nutanix solution is the software and processes that run within the Controller VM, keeping everything in check and optimised, even in the event of failure.

The block ships with evaluation versions of the vSphere ESXi hypervisor, vCenter Server, Windows 2008 R2 Enterprise (for vCenter) and MS SQL 2008 Enterprise (for the vCenter database). You can apply your own licenses as appropriate. Each host has its own local datastore (on the SATA SSD), and the distributed NFS datastore is comprised of the Fusion-io drive (PCIe SSD) and the SATA disks. There are also ‘diagnostic’ VMs pre-deployed, which are used to benchmark the performance of the cluster.
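Once a host is up, you can sanity-check the storage it can see from the ESXi shell – this should list the local SSD datastore, plus the Nutanix NFS datastore once it is mounted:

    esxcli storage filesystem list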

You do not have to use the bundled vCenter; you can use your pre-existing one instead (it will save you a license). At this stage I’ll probably keep a separate vCenter for the VDI deployment, but that comes down to your own deployment scenario.

Once the cluster is ‘up’ you can then follow the guide and configure NTP and DNS from the CLI, configure the vMA, configure the cluster and hosts in vCenter, install the VAAI plugin and set up the NFS storage.
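As an illustrative sketch of the NTP/DNS step only (follow the setup guide for the authoritative commands – the host, NTP and DNS addresses below are all hypothetical), the vCLI vicfg-* tools on the vMA can do it per host:

    vicfg-ntp --server 10.1.1.21 --username root --add 10.1.1.5
    vicfg-dns --server 10.1.1.21 --username root --dns 10.1.1.10,10.1.1.11 --domain example.internal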

I also added an external, non-Nutanix NFS datastore to all ESXi hosts so that I could use it as a transfer mechanism to get VMs and templates from existing vSphere infrastructure onto the Nutanix block should I want to. Note that there is a way to allow external-to-Nutanix ESXi hosts to connect to the internal Nutanix NFS datastore; however, I think it is easier and better to keep the Nutanix nodes themselves as the only hosts that can write to the Nutanix NFS datastore.
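Mounting such a transfer datastore is a one-liner per host. A minimal sketch, assuming a hypothetical NAS hostname and export path:

    esxcli storage nfs add --host=nas01.example.internal --share=/vol/transfer --volume-name=transfer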

When you take into account picking up the box from the loading dock, unpacking, lifting/racking, cabling, getting your Win7 laptop ready, cluster and vSphere configuration, DC network configuration, moving from isolated to production, installing the VAAI plugin, configuring the NFS storage and final checks, I’d say you’re looking at a few hours in total. Obviously adding more blocks will take significantly less time, given most of the clustering components are already done.

The ease of configuration and administration of the Nutanix block has been great. The other thing to keep in mind is that the support team from Nutanix (and their partners) can assist you with the above deployment process if you like.

So, at the end, you have a complete storage-and-compute modular building block that is easy to deploy and to scale out when you require. In the coming weeks I’ll provide updates on the experience of using the block for our VDI project, as well as going into some detail on how the block has been designed from the ground up to handle a lot of different failure scenarios.

Be sure to check out some of the Nutanix YouTube videos as well if you haven’t done so: http://www.youtube.com/user/Nutanix and get ready for a life without a SAN in your DC.

The Nutanix box arrives in Australia

WAAS and Isilon

While I’ve been in the USA there have been some interesting developments in an EMC Isilon/Cisco WAAS interoperability issue the guys back in Sydney have been dealing with.

A few weeks ago we upgraded Isilon from 6.0 to 6.5. Since then, users who saved files to Isilon shares and exited their application were presented with an error if they went straight back into the application and tried to open the file. Secondly, the directory where the file was located was accumulating multiple .tmp files per save that were never deleted, so free space was being consumed quickly. These .tmp files are expected behaviour in an MS Office save operation (see below).

We bypassed CIFS acceleration and the problem went away (native TFO, LZ and DRE were still active). So, as a workaround, the CIFS accelerator was disabled for the Isilon nodes only, to give Cisco/Isilon enough time to work out what was happening. That in itself took longer than it should have to get some action, but eventually some analysis of the Wireshark captures by the vendors’ support teams came to some interesting findings. Obviously, we would prefer the CIFS accelerator to work as we expect.

The analysis came back and noted a change in the file path passed during the save operation, from a backslash ‘\’ to a forward slash ‘/’.

I’ll quote the case notes here:

 ===Quoted from an Isilon support representative===

— Client deletes temp file

— Client renames xls to temp file

— Cisco core kicks in and does an enumeration for /temp file

— Cluster returns file not found (because Isilon doesn’t support the /)

— Client opens the file \temp

— Cisco returns file not found because they got back /temp did not exist on the cluster

— At this point, since the client failed to open \temp file, it stops working on the file and ends up not deleting the file.

So that brings us to what changed for the upgrade?

Well, according to Cisco, the reason the WAAS sends a / instead of a \ to Isilon is because during the Session Setup Response, Isilon returns the field “Native OS: Unix”.  Cisco have never tested against Isilon but they have tested against Samba which works when you pass /.

Since we upgraded from 6.0 (Samba implementation) to 6.5 (Likewise implementation), that is the change. Both our 6.0 and 6.5 releases set Native OS: Unix; however, while the 6.0 version supported / the 6.5 version does not.

As for why Windows works: when Windows sets the Session Setup Response of Native OS: Windows …, Cisco WAAS leaves the \ as \.

At this point, both Cisco and Isilon are investigating what can be changed from a code perspective.

===End of case note===
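For anyone wanting to hunt for this behaviour in their own captures, one rough approach (hedged – the field name is per the Wireshark SMB dissector) is a display filter that flags SMB requests carrying a forward slash in the file path:

    smb.file contains "/"

Comparing hits on the client side versus the server side of the WAAS devices should show where the separator gets flipped.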

Isilon upgrades are currently one-way; there is no rollback option. So, we must wait for the vendors to fix it. We are now in the fourth week since the upgrade, and it has taken a fair bit of ‘prodding’ by the storage engineer on our staff to get things this far along.

Not only has this highlighted the need for us to improve our UAT prior to deployment to production (e.g. testing saves, not just opens/copies – though in this case that is only possible if you have a second Isilon cluster available to test on!), but of course you cannot take what vendors say as gospel. That is, look at the following link: http://www.cisco.com/en/US/netsol/ns962/index.html and note the statement: “Isilon’s entire product line is certified to work with Cisco Wide Area Application Services (WAAS)”. You could argue that I am taking that line too literally, but that’s probably giving these vendors more credit than they deserve. You can’t say a product is certified and then never revisit testing between the products after a future software upgrade of either one. That’s ordinary.

The Rise and Rise of Nutanix

It’s a pretty exciting time in IT right now, though it’s fair to say it has been a struggle recently.

My place of work has gone through several restructures in recent years, resulting in headcount reductions. The standard rate of employee turnover was still there and continues to be. Additionally, the current economic environment means cost cutting is the new norm for IT budgets; and yet the modern data centre has some very complex and expensive systems that allow the business to function as it should – and they cost some serious coin to run!

All these things combine to place pressure on the remaining staff to continue to deliver the IT outcomes the business has come to expect (perhaps even take for granted). But what about innovation? When are BAU and ‘keeping the lights on’ no longer going to be enough? Does IT management simply expect to keep the same old technology and renew maintenance contracts for these expensive systems without paying a much bigger price in, say, 2-3 years?

I was pondering this about 8 months ago when I saw where this trend was heading – and I didn’t like it. Something had to change.

The data centre world is converging in terms of technology *and* personnel – and both the employee and the business will benefit as a result. Good people are hard to find and even harder to keep. Consequently, the traditional idea of skill ‘silos’ must start to break down to drive efficiency (and to overcome the fact that there are fewer people to do the same work), with the side effect that staff who are keen to learn new skills will get the opportunity to expand their knowledge – perhaps in areas they had not previously considered. Personally, I’ve never wanted to be ‘siloed’ (is that even a word?), nor wanted to be a one-skill engineer. All the best people I’ve worked with were of the same mindset.

While the people side of the equation is important, a lot of this is ultimately up to the individual having the drive and ability to improve. The business can certainly help by creating a good environment – but that is a deep subject for another time. The point is that as much as the technology converges, so too must people. Perhaps this is the age of the technology ‘generalist’, who can get the best out of the equipment and deliver a reliable, functioning solution to the end users within the business (so they can get on with making money).

But what about the technology?

I decided to find out about the technology options around ‘converged infrastructure’. The same-old way of providing compute, storage and networking really wasn’t delivering benefits in terms of cost, and certainly not simplicity for the administrator. EMC/VMware have been saying that they want to see one sysadmin for every 10,000 VMs, but with the current crop of technology from the big vendors I really could not see how that could become a reality.

Then I stumbled on a company that I thought might have the answer.

I’ve been following the rise and rise of Nutanix for a while now, and I think they are on a winner. I’m very partial to their primary message of “NO SAN”, if nothing else because it promises to help eliminate a massive, recurring and yet still-growing part of data centre budgets. No one is reducing their storage footprint, even after staff are cut! When has your storage budget ever reduced in the last 10 years? Do you expect anything different? Didn’t think so. So when I saw a company saying they could ‘eliminate the SAN’ from your data centre, I was going to pay attention.

Nutanix offers an all-in-one appliance. Think Cisco UCS for the compute side, but with heaps of storage integrated into the same appliance. And it has been designed from the ground up for virtual environments. And it provides SAN-like distributed storage for that environment. And it is easy to administer (could this finally realise EMC/VMware’s vision?). And it all comes in a 2RU block. And it has inbuilt data redundancy. And – well… I was keen to check it out.

If you’ve never heard of Nutanix, head over to their web site and view their 90 second summary video.

Last week I got my hands on our first Nutanix block for our End User Computing trial – one of the first in Australia. I’m proud to be involved.

A few things have impressed me on the Nutanix journey thus far.  I’ll be going into detail here in the coming weeks, but as a summary:

  1. Simplicity of the administration/setup of the block – up and running in minutes
  2. From sales to support staff, there is a level of responsiveness and eagerness to please that I have not seen from a vendor in a long, long time
  3. The technology solution is elegant, simple and priced well, especially when you consider the storage is included. It is modular and scales out extremely well, if not infinitely (according to their whitepapers)

Nutanix is the duck’s nuts. Do yourself a favour and check them out. I’m lucky enough to be visiting their headquarters in San Jose to get the full story via some technical training and I’m looking forward to meeting some of the smart people behind it all.

I think it will be like visiting the future.