Best practice for upgrading SPP and SPS

Recently I upgraded SPP and SPS following the admin guide. The standard process is to upload a file for both architectures, but we had an issue with a customer, and support did not know the root cause either. So, as a partner, we are looking for a much safer process. What is the best way to upgrade while doing everything possible to prevent issues during the upgrade? For SPP, for example, we are thinking of upgrading only one node (a replica, promoted to primary) and disconnecting the other two, then upgrading the other two after a couple of days, either together or one at a time. For SPS it would be the same: first upgrade the central management node, for example, with the managed host disconnected for a couple of days, then upgrade that node as well and join it again. Something like that. Is it possible? Is it something we can consider, or is it better not to, because the infrastructure needs all the joins between SPP and SPS and it is not possible to unjoin and join again many times, or in an improper way? Thanks a lot.

  • Hi Dario,

    I would recommend having a test environment where upgrades, or changes in general, can be tested before being implemented in the production environment. However, I do understand that this might not always be possible for every customer. We are aware of this particular issue and it is being addressed.

    Thanks!

  • While I can understand your wish to make the patching process as easy and safe as possible for your customer, and your suggested solution would work, personally I really do not like your idea.

    Now, based on your description, I am assuming that your customer has a 3-node cluster.

    So for me the first issue is that you are taking away solution resilience by reducing it to a 2-node cluster. There is a good reason that the minimum cluster configuration for SPP is 3 nodes. If a node in the 2-node cluster you create were to fail, you could potentially cause yourself a lot of problems and work. While it is not the best description of possible problems, if you look at the disaster recovery section of the SPP manual you can extrapolate a number of "what ifs" from the steps needed to recover.

    Second, how are you going to avoid a split brain?

    Once you upgrade an appliance to the new code, how will you test it? If you allow it to be used for password requests on the live network, how will you maintain the logs of these requests? You cannot re-join it to the remaining nodes until they are upgraded to the same release version, and when you do, any changes on the unit you made standalone will be lost.

    If you were to make the standalone node the Primary and power off the remaining 2 nodes, you avoid the split brain, but then again you are reducing the resilience of the solution even more.
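
    To make the resilience and split-brain points concrete, here is a minimal sketch of a generic strict-majority quorum model. It is an illustration only: the function names are hypothetical, and it is an assumption that this simple model is a fair approximation of how an SPP cluster decides consensus, so check the admin guide for the actual rules.

```python
from typing import Iterable, List


def has_quorum(online_nodes: int, cluster_size: int) -> bool:
    """True when a strict majority of the configured members is reachable."""
    return online_nodes > cluster_size // 2


def tolerable_failures(cluster_size: int) -> int:
    """Number of nodes that can fail while a strict majority still remains."""
    return (cluster_size - 1) // 2


def partitions_with_quorum(partition_sizes: Iterable[int], cluster_size: int) -> List[int]:
    """Return the partition sizes that would still hold a strict majority.

    Under a strict-majority rule at most one partition can qualify,
    which is what prevents two halves from both acting as primary.
    """
    return [size for size in partition_sizes if has_quorum(size, cluster_size)]


if __name__ == "__main__":
    # Hypothetical comparison of the original cluster and the reduced one.
    for size in (3, 2):
        print(f"{size}-node cluster tolerates {tolerable_failures(size)} failure(s)")
    # A 3-node cluster split into groups of 2 and 1: only the group of 2 keeps quorum.
    print(partitions_with_quorum([2, 1], 3))
```

    In this model a 3-node cluster survives the loss of one node, while a 2-node cluster survives none, which is the loss of resilience described above. A node that is detached and used standalone sits outside this protection entirely, so nothing in the model stops it from diverging from the remaining nodes.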

    In an ideal world the customer would have a test network that is a close mirror of their production network, with SPP appliances to match, so that they could carry out testing without risk to the production environment. Unfortunately, in the real world it is only the very big organisations that can afford this luxury.

    For me I need to be happy with the upgrade process before I get it anywhere near my customers.

    I run a virtual test lab on my laptop (i7, 32 GB RAM, external 2 TB USB 3 SSD holding the virtual machines) that allows me to carry out a lot of testing while I am onsite. I also have a server infrastructure that I can use to extend this when I am in the office. This lets me carry out a lot of testing and PoC work. It is also useful for training.

    I use this to prove the upgrade process and make sure I am happy with the steps and that it works.

    If your customer does not have a test network or deep enough pockets to afford a test appliance to use for this kind of testing, as a partner maybe you could look at obtaining an NFR virtual or physical appliance (or appliances) that you could use to verify the upgrade process for your customer.

    You could test the upgrade at your office and also take it to your customer site and test on their network. You could get a backup of their data and restore it to your NFR appliance so that they are testing with their data. Then, when they are happy, carry out the upgrade on their appliances.

    There is probably a service there that you could sell them...

    I have been working with SPP for quite a while now and have been lucky, as the only issue I have had with an upgrade was down to network latency to one of the nodes.

    Hope this helps a little

    Tim