Designing a Data Platform for the Future

Enterprises place high demands on the software running in their data centers. Performance, robustness, availability, scalability, and ease of use are important factors for any enterprise software, but it takes time to build them into a product. And unfortunately, the market isn’t kind enough to wait. As the years go by, the market can change unpredictably, and in the worst case, a product might be released to a market that has already moved past it. This problem exists in every industry, but it’s particularly acute in the technology startup field, which is littered with now-defunct companies that released products no longer relevant by launch day. When innovation and disruption are the names of the game, organizations that focus only on the present get left behind. That’s why it’s important to look many years ahead, to anticipate market trends and shifts in the landscape, and to design a product that will still be useful by the time it’s ready for release. That’s exactly what Xcalar committed to doing before writing the first line of Xcalar Data Platform’s code. This article discusses two aspects of our product that make it a perfect fit for the modern data center.

1. Separation of Storage and Compute

When Xcalar was in its conceptual phase, the prevailing view in the big data space was that Hadoop, and distributed file systems as a whole, were the inevitable solution to big data. But distributed file systems come with their own baggage: idle CPUs. These systems run on clusters of machines, each with its own CPUs. Those CPUs are often underutilized by the tasks asked of them (basic compute functions and storage I/O), so they tend to sit idle, wasting money and energy. Worse yet, the problem is magnified in real-world use. An organization’s workload requirements can change dramatically over time, and being prudent by provisioning for the worst-case scenario means keeping the maximum number of potentially needed CPUs available at all times. That’s a cost- and energy-inefficient approach that’s bad for both the bottom line and the environment. And with the amount of data in the world continuing to grow, we expected the problem to worsen over time.
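To make the effect concrete, here is a minimal back-of-the-envelope sketch in Python. The storage density, core counts, and workload sizes are assumptions chosen purely for illustration, not measurements from any real deployment; the point is simply that when a cluster must be sized for storage, CPU utilization falls as the dataset grows.

```python
# Illustrative only: the storage density, core counts, and compute demand
# below are assumptions chosen for this sketch, not measurements from any
# particular Hadoop deployment.

STORAGE_PER_NODE_TB = 8   # usable storage per cluster node (assumption)
CORES_PER_NODE = 16       # CPU cores per cluster node (assumption)

def coupled_cluster_utilization(dataset_tb, cores_needed):
    """Size a coupled cluster for storage and report how busy its CPUs are.

    In a coupled architecture every extra terabyte drags in more CPUs,
    whether the workload needs them or not.
    """
    nodes = -(-dataset_tb // STORAGE_PER_NODE_TB)  # ceiling division
    total_cores = nodes * CORES_PER_NODE
    return nodes, cores_needed / total_cores

# A workload whose compute demand stays flat while the dataset keeps growing.
for dataset_tb in (100, 500, 1000):
    nodes, utilization = coupled_cluster_utilization(dataset_tb, cores_needed=100)
    print(f"{dataset_tb:>5} TB -> {nodes:>4} nodes, CPU utilization {utilization:.0%}")
```

Under these assumptions, growing the dataset from 100 TB to 1 PB drops CPU utilization from roughly half to a few percent, even though the compute demand never changed.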

That’s why, when Xcalar was designed, we made sure to isolate the compute layer from the storage layer. This lets organizations independently scale their compute and storage resources to meet the demands of a given workload, and to take full advantage of the cloud. When compute is independent of storage, an Xcalar compute node on one cloud can access storage located on a different cloud, or even in an on-premises repository. Organizations thus have the freedom to pursue whatever data center configuration works best, a freedom we sought to expand by ensuring Xcalar is compatible with all three major clouds: Amazon Web Services, Google Cloud, and Microsoft Azure. With Xcalar, you can eliminate the risk of idle CPUs entirely by using the cloud to dynamically and separately adjust the size of your compute and storage clusters. Say one workload requires 2,000 nodes to store a petabyte of data and 100 nodes to process it, while another goes the opposite way and requires 256 nodes to process a terabyte of data. Instead of maintaining a single cluster big enough to cover the peak storage and peak compute of both workloads at once, you can use the cloud’s elasticity to expand either your compute or storage cluster as needed. You pay only for the exact amount of storage and compute resources each workload requires, so regardless of how much data there is to analyze, you stay cost and energy efficient.
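The sketch below works through that example. The node counts come from the hypothetical workloads above, while the job durations and the terabyte workload’s storage footprint are additional assumptions; it is not an Xcalar benchmark, just an illustration of why right-sizing compute and storage separately costs less than one cluster sized for the combined peak.

```python
# Illustrative sketch using the hypothetical workloads from the text. The
# durations and the terabyte workload's storage footprint are assumptions;
# this is not an Xcalar benchmark.

workloads = [
    # (name, storage_nodes, compute_nodes, hours_per_day)
    ("petabyte-scale storage-heavy job", 2000, 100, 8),
    ("terabyte-scale compute-heavy job",   10, 256, 8),  # storage need assumed small
]

# Coupled model: one cluster kept large enough for the peak storage and the
# peak compute of any workload, billed around the clock.
peak_nodes = (max(s for _, s, _, _ in workloads)
              + max(c for _, _, c, _ in workloads))
coupled_node_hours = peak_nodes * 24

# Decoupled model: storage and compute clusters are each sized to the
# workload at hand and released (or shrunk) when it finishes.
decoupled_node_hours = sum((s + c) * h for _, s, c, h in workloads)

print(f"coupled, worst-case cluster: {coupled_node_hours:>6} node-hours/day")
print(f"decoupled, scaled per job:   {decoupled_node_hours:>6} node-hours/day")
```

Even in this simplified model, the decoupled approach consumes roughly a third of the node-hours, and the compute tier can keep scaling up and down independently of whatever storage must persist.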

2. Frictionless Adoption

An enterprise can’t force widespread adoption of its product; there needs to be both an infrastructure to support it and people with the expertise to use it. If either is lacking, the product encounters friction when it enters the market: a lack of infrastructure limits use cases, while a lack of expertise necessitates user retraining. Yet many cutting-edge technologies require a rip and replace, the wholesale replacement of the existing infrastructure with a newer one that users may be less familiar with. Most organizations can’t afford the organization-wide impact of that process, and as a result the product’s rate of adoption slows. This is one of the main reasons existing data analytics platforms haven’t lived up to their full potential: organizations adopting them had to both redesign their data centers and retrain their data administrators in new programming languages, operational logic, and other subjects with steep learning curves.

Conversely, a product that leverages existing infrastructure and expertise can be adopted widely and with ease. With that in mind, we designed Xcalar Data Platform to utilize as much of an organization’s existing investments as possible. Instead of forcing a rip and replace, it fits seamlessly into existing big data infrastructures. And instead of mandating retraining, it lets DBAs and developers leverage the skill sets they already have in industry standards like SQL and Python. We foresaw that many organizations, having already invested heavily in their data centers, would hesitate to adopt rip-and-replace solutions. That’s why we built Xcalar to let you solve new problems with the infrastructure and expertise you already have. This ability to fit across an entire organization without displacing what is already present is one of the major reasons our product has gained traction with customers.
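As a concrete picture of what “existing skill sets” means here, the sketch below shows the ordinary SQL-plus-Python pattern a DBA or developer already knows. It is deliberately generic: sqlite3 is used as a stand-in data source, and the table and query are hypothetical placeholders, not Xcalar’s API; the point is only that these skills carry over without retraining.

```python
# A generic sketch of the SQL-plus-Python skills referred to in the text.
# sqlite3 is a stand-in data source; the table and query are hypothetical.
import sqlite3

QUERY = """
SELECT region, SUM(amount) AS total_sales
FROM sales
WHERE sale_date >= '2021-01-01'
GROUP BY region
ORDER BY total_sales DESC;
"""

def top_regions(connection):
    """Ordinary DB-API usage: the same pattern a DBA or Python developer
    already knows, applied unchanged."""
    cursor = connection.cursor()
    cursor.execute(QUERY)
    return cursor.fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # placeholder in-memory data source
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL, sale_date TEXT)")
    conn.execute("INSERT INTO sales VALUES ('west', 120.0, '2021-02-01')")
    print(top_regions(conn))
```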

The two concepts discussed above are not the only outcomes of our company’s habit of looking ahead, but separation of storage and compute and frictionless adoption both illustrate the growing importance of flexibility in today’s data centers. Organizations want to cost-efficiently scale their data centers to meet workload demands and to seamlessly incorporate new products into their data pipelines so they can get immediate insights from their data. Xcalar Data Platform was designed to meet these demands, making it the ideal operational data lake solution for the modern market.