Data Prep and Quality

Xcalar Data Platform accelerates your analytics cycle from the first view of a data source, through prepping the data and refining data quality, to the operational use of the data across your organization.

Both data prep and data quality result from iterative analytics cycles. The analytics cycle for data prep consists of the following steps:

  • Profiling data fields

  • Cleansing data fields

  • Transforming data fields for use

  • Combining disparate fields and data sets
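As a rough illustration of these four steps, the sketch below walks a tiny data set through profiling, cleansing, transforming, and combining in plain Python. It is not Xcalar Data Platform's API; the records, field names, and helper functions are all hypothetical and chosen only to make the cycle concrete.

```python
from collections import Counter

# Hypothetical sample records; the field names are illustrative only.
orders = [
    {"order_id": "1", "customer_id": "c1", "amount": " 19.99 "},
    {"order_id": "2", "customer_id": "c2", "amount": "n/a"},
    {"order_id": "3", "customer_id": "c1", "amount": "5.00"},
]
customers = {"c1": {"region": "EMEA"}, "c2": {"region": "APAC"}}


def profile(rows, field):
    """Profile a field: count values that parse as numbers versus those that do not."""
    counts = Counter()
    for row in rows:
        try:
            float(row[field])
            counts["numeric"] += 1
        except ValueError:
            counts["non_numeric"] += 1
    return dict(counts)


def cleanse(rows, field):
    """Cleanse a field: trim whitespace and drop rows whose value is not numeric."""
    cleaned = []
    for row in rows:
        value = row[field].strip()
        try:
            float(value)
        except ValueError:
            continue
        cleaned.append({**row, field: value})
    return cleaned


def transform(rows, field):
    """Transform a field for use: convert the cleaned string to a float."""
    return [{**row, field: float(row[field])} for row in rows]


def combine(rows, lookup, key):
    """Combine data sets: enrich each row with attributes from a lookup table."""
    return [{**row, **lookup.get(row[key], {})} for row in rows]


profile_report = profile(orders, "amount")
prepared = combine(transform(cleanse(orders, "amount"), "amount"), customers, "customer_id")
```

In practice the profile report drives the next iteration: the non-numeric count shows how much cleansing is still needed before the field can be transformed and combined.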

In addition, enterprise data quality results from the following secondary cycle:

  • Applying your business rules to data at scale

  • Surfacing data anomalies

  • Using data lineage to diagnose anomaly sources

  • Refining business rules to resolve anomalies
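The quality cycle can be sketched the same way. The example below is a generic plain-Python illustration, not Xcalar Data Platform's rule engine: the records, the `source` and `line` lineage fields, and the two rules are assumptions made up for the example.

```python
# Hypothetical records, each tagged with where it came from (its lineage).
records = [
    {"source": "sales_eu.csv", "line": 2, "amount": 120.0, "currency": "EUR"},
    {"source": "sales_eu.csv", "line": 3, "amount": -5.0, "currency": "EUR"},
    {"source": "sales_us.csv", "line": 2, "amount": 80.0, "currency": ""},
]

# Business rules as (name, predicate) pairs; a record passes when the predicate is true.
rules = [
    ("amount_non_negative", lambda r: r["amount"] >= 0),
    ("currency_present", lambda r: bool(r["currency"])),
]


def find_anomalies(rows, rule_list):
    """Apply every rule to every row and report each failure with its lineage."""
    anomalies = []
    for row in rows:
        for name, predicate in rule_list:
            if not predicate(row):
                anomalies.append(
                    {"rule": name, "source": row["source"], "line": row["line"]}
                )
    return anomalies


anomalies = find_anomalies(records, rules)
for a in anomalies:
    print(f'{a["rule"]} failed at {a["source"]}:{a["line"]}')
```

Because each anomaly carries its source file and line, it can be traced back to where it entered the data, and the rule list can then be refined and reapplied.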

Typically, analytics departments use a suite of separate tools for these cycles. Moving data between tools slows each analytics cycle through unnecessary data copies, inefficient processing, and broken data lineage.

Xcalar Data Platform is a hybrid cloud and on-prem solution that provides users with visual programming tools to rapidly build a data model. Creating a model results in a dataflow, which provides the following:

  • The data’s lineage, which is the necessary information to follow schema evolution and trace data anomalies back to the source

  • The algorithm to apply the data model efficiently at scale to meet SLAs

    Xcalar Data Platform makes developers more productive by reserving custom code for new data formats, proprietary logic, and machine learning algorithms. By combining visual modeling tools with focused custom code, Xcalar Data Platform increases the velocity of each analytics cycle, resulting in more iterations in less time and, therefore, higher data quality.

    True Data in Place

    Xcalar Data Platform works directly with source data files using metadata, without copying data into an internal format.

    Data format-agnostic

    Xcalar Data Platform works with structured, semi-structured, or unstructured source data of any format, from file or streaming sources.

    Data profiling

    Xcalar Data Platform provides efficient data profiling on terabyte-scale data with statistical summaries, histograms, and data correlations.
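To make the three profiling outputs concrete, here is a small standard-library Python sketch of a statistical summary, a fixed-width histogram, and a Pearson correlation. It illustrates the concepts only and says nothing about how Xcalar Data Platform computes them at scale; the function names and sample columns are hypothetical.

```python
import math
import statistics
from collections import Counter


def summarize(values):
    """Basic statistical summary of a numeric column."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def histogram(values, bin_width):
    """Count values per fixed-width bin, keyed by the lower edge of each bin."""
    bins = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long columns."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical columns for illustration.
quantity = [1, 2, 3, 4, 5]
revenue = [10, 20, 30, 40, 50]

summary = summarize(quantity)
bins = histogram(quantity, 2)
correlation = pearson(quantity, revenue)
```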

    Visual programming with lineage

    Users interactively create data models using a spreadsheet-like user interface; the resulting models track the lineage of data from its sources through each transformation.
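One way to picture lineage tracking is as a graph in which every operation remembers its inputs. The following minimal Python sketch assumes a hypothetical `Node` class and made-up source names; it is a conceptual model, not a description of how Xcalar Data Platform stores dataflows.

```python
class Node:
    """One step in a dataflow: an operation applied to zero or more parent nodes."""

    def __init__(self, name, operation, parents=()):
        self.name = name
        self.operation = operation
        self.parents = list(parents)

    def lineage(self):
        """Return the names of all upstream nodes, ancestors before descendants."""
        ordered = []
        for parent in self.parents:
            for name in parent.lineage() + [parent.name]:
                if name not in ordered:
                    ordered.append(name)
        return ordered


# Hypothetical sources and operations.
orders = Node("orders.csv", "source")
customers = Node("customers.json", "source")
valid_orders = Node("valid_orders", "filter amount > 0", [orders])
joined = Node("orders_by_customer", "join on customer_id", [valid_orders, customers])

trace = joined.lineage()
```

Walking the graph from any result back to its sources is what makes it possible to diagnose where an anomaly was introduced.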

    Business rules

    Users apply business logic on data in Xcalar Data Platform by writing and applying a list of rules to surface anticipated data anomalies.

    Finding data anomalies

    While processing each data operation, Xcalar Data Platform surfaces the anomalies inherent in the data; users can triage these at any stage of analytics work.

    Ad hoc analytics/modeling

    Xcalar Data Platform’s responsive interface performs interactive analysis using relational operators on up to 100 billion rows.

    Operational machine learning and analytics

    Data scientists can train and deploy ML or predictive algorithms across petabytes of data at any stage of the data pipeline.


    The algorithm resulting from modeling is displayed as an auditable graphical diagram of operations; it can be saved, operationalized, and scheduled to run on production data.

    Exceptional scalability and performance

    Xcalar Data Platform processes read and write operations with near-linear scalability while maintaining strong data consistency across cloud-scale clusters.

    Integration with BI apps

    Analysts use BI applications such as Tableau, Qlik, and Power BI to pull data via optimized JDBC queries and visualize it.

    Structured programming

    Visual programming provides well-defined points at which to apply proprietary logic to data models using Python.

    Operational workload management

    Large-scale analytics workloads run in high-throughput mode to meet performance goals. Xcalar Data Platform provides dynamic skew detection and dynamic workload management.

    Security and authentication

    Xcalar Data Platform integrates with Kerberos, LDAP, OAuth, and custom authentication services for authentication and user management.


    Xcalar Data Platform users can easily share workbooks, datasets, and custom code to jointly solve problems.

    Reliability and fault tolerance

    All operations, including user code, run in separate containers for robustness and stability; workloads can be restarted using automated system recovery logs.


    Use Cases

    10X performance improvement by decoupling compute from storage

    Time to value reduced from 3 months to 4 days

    3X increase in analyst productivity