Demystify Your Data Lake With Xcalar

Technology improves in cycles of innovation and refinement. The data lake was the innovation, and our product is the refinement.

The data lake has surged in popularity since its inception in 2011. To understand why, it’s helpful to examine its predecessor. The traditional data warehouse was conceived in the 1980s, the era of floppy disks and refrigerator-sized hard drives, when a gigabyte of storage cost tens of thousands of dollars and it made sense to store only the most important data. That may have meant sacrificing any insights the less important data could have revealed, but it was an acceptable loss given the cost per gigabyte. Fast forward a few decades, however, and the situation has changed. Cost per gigabyte has plummeted, missing out on potential insights is no longer acceptable, and there is more data, structured and unstructured, than ever before. This is not the environment the data warehouse was designed for, so it is understandably less effective. Hence the rise of a new kind of data repository to replace it: the data lake.



A full explanation of a data lake is beyond the scope of this article, but the gist is that it’s a single, shared repository for massive amounts and varieties of data. It’s typically stored in a distributed file system, which can take advantage of inexpensive storage and servers to hold petabyte-scale data. A data lake can ingest data at different stages of processing, eliminating the need for lengthy up-front data transformations. And barring security requirements, users have access to the entire shared pool of data at once, avoiding the data silo issue entirely. The end result is a sandbox that users can sift through to rapidly analyze information, discover insights, and solve problems. In short, data lakes make it convenient to store and access the exploding amounts of varied data that modern organizations mine for business insights.

But as helpful as the data lake may be, it has a drawback: it requires retraining the workforce. Where using a data warehouse required only the fairly straightforward task of learning SQL and BI tools, using a data lake requires acquiring, learning, and deploying a host of new technologies. A typical analytics team must be familiar with Pig, Hive, Scala, Kafka, and various other parts of the Apache Spark and Hadoop ecosystems. Considering that each of these technologies has its own steep learning curve, it’s understandable that there’s a severe shortage of developers trained in these new skills. As a result, many organizations have poorly organized, unusable data lakes, a problem prevalent enough that a term was coined for it: the data swamp.

Xcalar addresses the problem head-on. We provide a powerful visual design studio that lets you work with your data lake using visual programming, structured programming languages, and SQL. Whether the task is data orchestration, aggregation, assimilation, blending, preparation, or quality control, Xcalar lets you do it rapidly on large amounts of data, regardless of your skill set. Whether you’re a full-fledged DBA or a business analyst who knows how to work a spreadsheet, we let you apply the skills you already have to a data lake. This ease of use empowers large numbers of people to become data analysts and makes the dream of self-service analytics a reality. With our product, a front office administrator could find the insights they need directly in a data lake with no assistance from an engineer or a programmer. Xcalar’s data platform makes it possible for multiple users across many different use cases, lines of business, user personas, and skill sets to collaborate with a single set of powerful tools.

Technology improves in cycles of innovation and refinement. The data lake was the innovation, and our product is the refinement. Instead of worrying about how to use their data lakes, our customers can focus on actually using them. Our product demystifies the data lake and makes it the easily accessible sandbox it was conceived to be.