Click here to read the first part of this series.
Availability in Oracle Engineered Systems
One of the prominent features of any Engineered System offered by Oracle is its exceptional availability. The products, especially the flagship “Exadata”, come with the label of Maximum Availability Architecture (MAA). The Oracle Engineered System (OES) products claim to have a fully fault-tolerant architecture with no single point of failure (SPOF). Each hardware layer is architected with redundancy, which is meant to offer extremely high availability to customers.
There is no doubt that MAA is an extremely attractive solution for any enterprise platform that runs mission-critical systems. The following are the key points decision makers need to keep in mind when selecting OES products.
- Clearly assess the end user / business availability requirements. If the availability requirement is anything below 99.9%, drop the plan for OES then and there. It is not worth the effort and investment. On the other hand, if your system is super critical and includes major revenue-earning modules for the customer, so that any unavailability carries a high risk, then OES is a good candidate to consider.
- If the availability requirement is greater than or equal to 99.9%, then the following are the key things you need to consider:
- Single-Box deployment (referred to by Oracle as Bronze / Silver): This is a model where you deploy one OES box (e.g. a single Exadata box) and move the consolidated workloads onto it. This is the most cost-effective model. As Oracle claims, the box has component-level redundancy at all levels except one switch (which is non-critical) and protects against SPOF to a certain level. However, you should seriously consider whether your systems can take quarterly, half-yearly or annual downtimes for maintenance of the OES. In other words, do your customers allow you to take such downtimes? The reason is that an OES deployed in a smaller configuration and later upgraded to the next level (e.g. a 1/8th Rack to a Quarter Rack of Exadata) requires considerable downtime for hardware upgrades, firmware upgrades, break-fix maintenance and so on. Whatever Oracle claims about its HA does not matter once the box is in your data center. Oracle dictates the terms for this high-priced machine, and you may have no voice to oppose them, because you will be under huge pressure to fix the issues or to meet customer deadlines for the required capacity.
- Multi-Box deployment (referred to by Oracle as Gold / Platinum): This is a model in which you purchase two or more boxes of the same kind of OES to build the Maximum Availability Architecture (MAA). An example is to deploy one Exadata machine in Site A and another in Site B, implement replication between the two, and add other OES products such as the Zero Data Loss Recovery Appliance (ZDLRA) to take care of backups and protect against data corruption. This is a very costly solution, as it puts a huge amount of money into infrastructure to attain an exceptional level of availability (>99.95%).
- Decision makers need to meticulously evaluate whether their solution can earn enough from customers to justify hosting it on such high-investment systems. If the investment has a high impact on margins, or if you foresee a shaky revenue stream, OES is NOT a good option to consider.
- During pre-sales presentations, Oracle executives will surely try to attract you with seemingly convincing data on ROI or reduced TCO. It is highly likely that they will bundle together multiple OES products such as Exadata, Private Cloud Appliance and ZDLRA and link the solution to your requirements. As a decision maker, you are strongly advised to build your own data and perspective on ROI and TCO rather than accepting the calculations given by Oracle executives. This will spare you countless arguments with your executive management later on about a high platform cost that eats into the margins of the solutions offered to customers. Use your wisdom to vet the solution, and meticulously decide on the types and number of boxes you need, the capacity needed at various phases and so on to get the required availability. Translate your requirements into the solution yourself, and get help from Oracle executives to arrive at it.
- One of the most important points! Once these high-cost systems are running in your data center, you may realize that each one is, by all means, a “Black Box”.
- Your engineers are not supposed to touch it, which means they cannot make any component-level configuration changes, since those would conflict with the responsibilities of the Oracle Platinum Support team. The OES is a highly integrated set of hardware components including servers, storage, network components, switches and complex wiring. Whatever happens to them needs to be fixed by Oracle support engineers. There is the so-called Platinum Support that comes along with OES, but sometimes the default Platinum Support is not enough to maintain high availability in many real-world scenarios. So, if you are really serious about availability, you may need to plan for some extra dollars (not a small sum) to provision Advanced Support from Oracle ACS. Otherwise, your team and Oracle Support will be pointing fingers at each other, and ultimately your customers will suffer.
- Sometimes, Oracle Support or Oracle Advanced Customer Services (ACS), who are responsible for troubleshooting and maintaining the OES, struggle to do so. They depend on runbooks and SOPs and follow them to troubleshoot, but often get stuck on certain exceptions. This is no different from the ordinary Oracle support pattern, and you may need to rely on various manual escalation mechanisms or procedures to get the right level of attention.
- Often, high availability on a high-cost system turns out to be a myth. Eventually,
- when you run your mission-critical systems on top of an OES like Exadata,
- when both of your Oracle Real Application Clusters (RAC) instances go down in the Exadata box, causing full downtime for the application,
- when Oracle Support either cites an undocumented bug for it or sits on the SR for several months despite your outcries,
you may wonder at what moment you committed to this investment, and may feel like pulling your hair out. Use your wisdom to evaluate all the scenarios, all the risks, all the mitigation steps and all the associated costs upfront when you assess investment versus qualitative benefits versus returns.
- Sometimes, decision makers tend to fall for specific features projected by pre-sales executives. What if, several months after purchasing the premium-priced engineered system boxes, you and Oracle realise that those specific features are in fact creating more issues?
That is also a reality. One such example is the hype around 40Gb/s connectivity between Exadata and Exalogic through an InfiniBand switch using the SDP protocol (rather than TCP). Later, when SDP caused extended downtimes in the systems, Oracle published a bug and instructed customers to migrate back to TCP (references available).
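The availability thresholds mentioned above (99.9% and >99.95%) are easier to weigh against maintenance downtimes once they are converted into a yearly downtime budget. The following is a minimal sketch of that arithmetic. The function names and the maintenance-window figures are hypothetical and chosen only for illustration; they are not Oracle numbers.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_budget_hours(availability_pct: float) -> float:
    """Maximum downtime per year (in hours) allowed by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def fits_budget(availability_pct: float,
                planned_hours: float,
                unplanned_hours: float = 0.0) -> bool:
    """True if planned plus unplanned downtime stays within the yearly budget."""
    total = planned_hours + unplanned_hours
    return total <= downtime_budget_hours(availability_pct)


if __name__ == "__main__":
    for target in (99.9, 99.95, 99.99):
        print(f"{target}% availability allows "
              f"{downtime_budget_hours(target):.2f} hours of downtime per year")

    # Hypothetical plan: four quarterly maintenance windows of 4 hours each.
    planned = 4 * 4
    print("Quarterly 4-hour windows fit a 99.9% target:",
          fits_budget(99.9, planned))
```

A 99.9% target leaves roughly 8.76 hours per year, and 99.95% leaves roughly 4.38 hours. Under these assumed figures, quarterly maintenance windows on a single box would already use up more than the 99.9% budget before any unplanned outage is counted.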
To be continued
Next: Consolidation in Oracle Engineered Systems (to be published)
Sethunath U N
This is Part 2 of many in this series, which discusses Oracle Engineered Systems. If you are considering adopting Oracle Engineered Systems or want to evaluate your decision, do contact us at email@example.com