Creation date: 2017-8-16
Minimum version: 1.2.2

Sharing data modeling results between Xcalar users

Users often want to collaborate when modeling. For example, several users might each develop a portion of an overall model and need to pass intermediate results to one another. This article describes several methods that Xcalar Design provides for splitting the modeling workload among Xcalar users.

In this article, User A performs data cleansing on a dataset and passes the results to User B, who uses them to develop business insights.

Method 1: Sharing a batch dataflow

NOTE: User A and User B must be on the same cluster to use this method. When both users share a cluster, Xcalar recommends this method because the dataflow is available to User B as soon as User A creates it.

To share a batch dataflow:

  1. User A completes data cleansing and creates a batch dataflow.
  2. User A passes the name of the batch dataflow, which is accessible cluster-wide, to User B.
  3. User B, using the advanced option, runs the batch dataflow to export the results to a table.
  4. User B starts modeling from this point.


    Figure 1: Xcalar Design Batch Dataflow Example

Starting with Xcalar 1.2.2, data lineage is available for a table exported by a batch dataflow.

Method 2: Sharing a dataflow file

NOTE: User A and User B can be on different clusters. Xcalar recommends this method when the users are on different clusters, because it does not require creating an intermediate CSV file.

To share a dataflow file:

  1. User A completes data cleansing, creates a batch dataflow, and downloads the dataflow to a dataflow file.
  2. User A passes the dataflow file location to User B.
  3. User B uploads the dataflow file to a cluster and runs the batch dataflow, which exports the results to a table.
  4. User B starts modeling from this point.

Starting with Xcalar 1.2.2, data lineage is available for a table exported by a batch dataflow.

Method 3: Exporting a table to a CSV file

To share a CSV file exported from a table:

  1. User A completes data cleansing and exports the resultant table to a CSV file.
  2. User A gives the pathname of the CSV file to User B. The path to the exported file might vary from one Xcalar installation to another. To determine the location:
    1. In the Xcalar Design toolbar, click the Export Targets icon (under the Datasets icon).

    2. In the Export Targets panel, click the icon for the export target named Default.

  3. User B imports the CSV file into a dataset and starts modeling from this point.
NOTE:
  • This method cannot be used if the table has more than 128 columns, which is the export column limit.
  • User A and User B can be using different clusters.
  • If multiple users on the same cluster want to use the same CSV file, they can share the dataset imported from the CSV file.
  • User B does not have the lineage of the data in the CSV file.

Method 4: Converting a table to a dataset

To share a dataset converted from a table:

  1. User A completes data cleansing, exports the resultant table to a CSV file, and then imports the CSV file into a dataset.
  2. User A gives the name of the dataset to User B.
  3. User B starts modeling from this point.
NOTE:
  • User A and User B must be using the same cluster.
  • User B does not have the lineage of the data in the dataset.

Summary

Xcalar Design provides several ways for multiple users to pass intermediate results to one another while working on a single model.

Table 1 compares the methods.

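Method                 | Entity passed       | Supports sharing between clusters | Maintains data lineage
-----------------------+---------------------+-----------------------------------+-----------------------
Share a batch dataflow | Batch dataflow name | No                                | Yes
Share a dataflow file  | Dataflow file name  | Yes                               | Yes
Export to a CSV file   | CSV pathname        | Yes                               | No
Export to a dataset    | Dataset name        | No                                | No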

Table 1: Comparing the different methods of sharing a data model
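The trade-offs in Table 1 can be expressed as a small selection rule. The following Python sketch is illustrative only and does not call any Xcalar API: the names `SharingMethod`, `METHODS`, and `candidate_methods` are hypothetical. It encodes the rows of Table 1 plus the 128-column export limit from the Method 3 note. Applying that limit to the dataset method as well (because Method 4 also exports the table to a CSV file) is an assumption; the article states the limit only for Method 3.

```python
from __future__ import annotations

from dataclasses import dataclass

# Export column limit stated in the Method 3 note.
EXPORT_COLUMN_LIMIT = 128


@dataclass(frozen=True)
class SharingMethod:
    """One row of Table 1 (hypothetical model, not an Xcalar type)."""
    name: str
    entity_passed: str
    cross_cluster: bool    # "Supports sharing between clusters"
    keeps_lineage: bool    # "Maintains data lineage"
    uses_csv_export: bool  # assumption: Methods 3 and 4 both hit the export limit


# Rows of Table 1, in the order the article presents them.
METHODS = [
    SharingMethod("Share a batch dataflow", "Batch dataflow name", False, True, False),
    SharingMethod("Share a dataflow file", "Dataflow file name", True, True, False),
    SharingMethod("Export to a CSV file", "CSV pathname", True, False, True),
    SharingMethod("Export to a dataset", "Dataset name", False, False, True),
]


def candidate_methods(same_cluster: bool, need_lineage: bool, num_columns: int) -> list[str]:
    """Return the names of the Table 1 methods that fit the given situation."""
    result = []
    for m in METHODS:
        if not same_cluster and not m.cross_cluster:
            continue  # method requires both users on the same cluster
        if need_lineage and not m.keeps_lineage:
            continue  # User B would not have the data lineage
        if m.uses_csv_export and num_columns > EXPORT_COLUMN_LIMIT:
            continue  # table is too wide to export to CSV
        result.append(m.name)
    return result


if __name__ == "__main__":
    # Users on different clusters who need lineage: only the dataflow file works.
    print(candidate_methods(same_cluster=False, need_lineage=True, num_columns=40))
```

For example, `candidate_methods(same_cluster=False, need_lineage=True, num_columns=40)` returns `['Share a dataflow file']`, which matches the recommendation to share a dataflow file when users are on different clusters. Keeping the table as data rather than as nested conditionals makes it easy to update if a method's capabilities change in a later release.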
