Making source data accessible

Importing a data source means telling Xcalar Design to derive metadata from one or more files containing raw data. Xcalar Design can import these files only if they are stored at a location accessible to all Xcalar nodes.

IMPORTANT: After you import data from a file into Xcalar, changing the path to the file makes the data inaccessible after a cluster restart. For example, if you rename the source data file or the directory containing it and then restart the cluster, Xcalar Design can no longer locate the data source. As a result, you cannot use the workbook containing tables created from the data. Therefore, choose the name of each component of the path to the source data file carefully so that the path does not need to change after the import.

This topic describes the steps for ensuring that the source data files are accessible for Xcalar to import. Some of these steps require configuring the storage system where the source data files reside. If you do not have the appropriate access privileges, contact your system administrator to set up the source data files for you.

Protocols used for accessing source data files

You must prepare the data source so that all nodes of the cluster can access the same data. The exact procedure for preparing the data source depends on the protocol for data access.

NOTE: For some protocols, only the Xcalar cluster administrator, who has the appropriate access privileges to the cluster, can carry out the procedure described in this section.

Xcalar Design supports the protocols described in the following table.

Protocol shown in Xcalar Design    Description
file:///                           File
hdfs://                            Hadoop Distributed File System (HDFS)
s3://                              Amazon Simple Storage Service (S3)
azblob://                          Microsoft Azure
mapr://                            MapR

The following sections describe what you must do for your data to be accessible to all nodes of the Xcalar cluster.

Using file:///

If you select file:/// to access data in shared storage that is visible to all nodes (for example, through a Network File System (NFS) in a network attached storage (NAS) environment), verify that the directory structure for accessing the shared storage is identical across all nodes.

For example, if your data resides in an NFS file system, all nodes must mount the same file system using the same path name as the mount point. That is, the same file system must be mounted on all nodes, and the mount point on all nodes must be the same (for example, /mnt/datasets). Similarly, if the source data files are local files, they must be stored under the same name and they must have identical contents. For example, to import a local file named /datasets/file1, Xcalar Design must be able to find /datasets/file1 on each node, and /datasets/file1 must contain the same contents across all nodes.

The following list describes scenarios in which importing from a shared file system fails:

  • The nodes use different mount points for the same NFS file system. For example, some nodes mount the file system as /mnt/dataset1 and some nodes mount the file system as /mnt/dataset2.
  • All nodes have the same mount point (/mnt/datasets) but it points to different file systems. For example, /mnt/datasets on some nodes points to a file system on host1 and the same mount point on other nodes points to a file system on host2.
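One way to catch both failure scenarios is to collect the same fact from every node (for example, a checksum of a sample file, or the NFS source behind the mount point) and confirm that the values agree. The following is a minimal sketch, not part of Xcalar: the function name all_nodes_agree, the node names, and the use of passwordless SSH are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: confirm that every node reports the same value (for example, a
# checksum of /datasets/file1 or the file system mounted at /mnt/datasets).
# Input: lines of "<node> <value>" on stdin. Returns 0 only if at least one
# line was read and every value equals the first one.
all_nodes_agree() {
    awk '
        NR == 1      { first = $2 }
        $2 != first  { mismatch = 1; print "mismatch on " $1 > "/dev/stderr" }
        END          { exit (NR == 0 || mismatch) ? 1 : 0 }
    '
}

# Hypothetical collection step (node1..node3 and passwordless SSH are assumed):
#   for n in node1 node2 node3; do
#       echo "$n $(ssh "$n" sha256sum /datasets/file1 | cut -d' ' -f1)"
#   done | all_nodes_agree && echo "identical on all nodes"
```

The comparison is kept separate from the collection so that the same check works for checksums, mount sources, or any other per-node value.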

Using hdfs://

If you select hdfs:// to access data in an HDFS file system (for example, on a Hortonworks or Cloudera platform), verify that the host with the HDFS file system is accessible to all nodes of the cluster. Contact your Hadoop administrator to obtain the host name of the machine with the HDFS file system. You cannot use Xcalar Design to list all the hosts on which an HDFS file system is installed.
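To check that the HDFS host is reachable from a given Xcalar node, you can test whether the host accepts TCP connections on its NameNode port. The following is a minimal sketch under stated assumptions: the function name port_open, the host name, and the port number are placeholders, and the check relies on bash's /dev/tcp redirection and the coreutils timeout command being available on the node.

```shell
#!/usr/bin/env bash
# Sketch: test whether a TCP port on a host accepts connections from this node.
# Returns 0 if the connection opens within 3 seconds, nonzero otherwise.
port_open() {
    local host=$1 port=$2
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Hypothetical usage: hdfs-host.example.com and 8020 are placeholders;
# ask your Hadoop administrator for the real NameNode host and port.
#   port_open hdfs-host.example.com 8020 && echo "reachable from $(hostname)"
```

Run the check on every node of the cluster. A successful connection shows only that the port is reachable, not that the node is authorized to read the data.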

Using s3://

Each Xcalar cluster node must be configured with AWS credentials for S3 access, because Xcalar connects not only to your local S3 bucket but also to any other S3 accounts that you have access to.

To configure S3, establish an SSH connection with each Xcalar cluster node. Then follow these steps:

  1. Make sure that you have installed the Amazon Web Services (AWS) Command Line Interface (CLI). If not, follow the instructions to install it at https://aws.amazon.com/cli/.
  2. To set up your credentials and default region, enter the following command:

    aws configure

    The command displays several prompts for you to enter the access key, default region name, and so on. After you finish responding to the prompts, configuration files are generated in the correct locations for you.

For detailed information and other ways to configure S3, visit the following website:

http://boto3.readthedocs.io/en/latest/guide/configuration.html#configuring-credentials
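After running aws configure on a node, you can confirm that the credentials file was written as expected. By default, the AWS CLI stores the keys in ~/.aws/credentials under a [default] section. The following is a minimal sketch: the function name has_aws_credentials is hypothetical, and the sketch assumes that you run it as the same operating system account that ran aws configure.

```shell
#!/usr/bin/env bash
# Sketch: check that an AWS credentials file contains both keys for a profile.
# Arguments: [file] [profile]; defaults are ~/.aws/credentials and "default".
has_aws_credentials() {
    local file=${1:-$HOME/.aws/credentials} profile=${2:-default}
    [ -r "$file" ] || return 1
    awk -v want="[$profile]" '
        /^\[/                                      { in_section = ($0 == want) }
        in_section && /^aws_access_key_id[ \t]*=/     { id = 1 }
        in_section && /^aws_secret_access_key[ \t]*=/ { secret = 1 }
        END { exit (id && secret) ? 0 : 1 }
    ' "$file"
}

# Usage on each cluster node:
#   has_aws_credentials && echo "default profile is configured"
```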

Using azblob://

NOTE: The steps in this section are required only if you store your source data files in your own Azure Storage account. If you deploy Xcalar from Microsoft Azure Marketplace, Xcalar automatically creates an Azure Storage account for you, and you do not need to follow these steps for Xcalar Design to access the files in that Xcalar-created storage account.

If your data is in Microsoft Azure Storage, for each storage account, follow these steps to make the data in the account accessible to Xcalar:

  1. Use a web browser to go to your Azure Storage accounts management web page.
  2. Click your storage account name.
  3. Navigate to the shared access signature (SAS) page.
  4. Set the SAS parameters as follows to generate the SAS token. Leave all other parameters not listed in the following table at their defaults.

    Parameter                     Setting
    Start and expiry date/time    Start time and end time for the time period when your Xcalar cluster can access the data
    Allowed protocols             HTTPS only
  5. After the SAS token is generated, copy the token (not the signature) to your computer's clipboard. You will need to paste the token in the next step.

    The following is an example of a SAS token with a Start Time set to 2017-09-18 00:00:00 and End Time set to 2017-12-01 00:00:00:

    ?sv=2017-04-17&ss=b&srt=co&sp=rl&se=2017-12-01T00:00:00Z&st=2017-09-18T00:00:00Z&spr=https&sig=xO0OwmPaIe27S0jy%2Fqr%2FbRs6WXj3%2BP84fGr4%2Bh%2BE5Qs%3D

  6. Enter the following string in the /etc/xcalar/default.cfg file on each Xcalar cluster node. The SAS token is the token you copied in the preceding step.

    AzBlob.<Azure Storage account name>.sasToken=<SAS token>

    For example, enter the following string in the /etc/xcalar/default.cfg file:

    AzBlob.myaccount.sasToken=?sv=2017-04-17&ss=b&srt=co&sp=rl&se=2017-12-01T00:00:00Z&st=2017-09-18T00:00:00Z&spr=https&sig=xO0OwmPaIe27S0jy%2Fqr%2FbRs6WXj3%2BP84fGr4%2Bh%2BE5Qs%3D

    In this example, the Azure Storage account name is myaccount. Be sure to use the appropriate account name for your Azure Storage.

  7. After you finish adding the SAS token for each storage account, save the /etc/xcalar/default.cfg file.
  8. Restart the cluster to make the change in the /etc/xcalar/default.cfg file take effect. For information about restarting the cluster, see Using Setup (Xcalar admin only).
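Because the same line must be entered on every node, it can help to script the edit. The following is a minimal sketch, not an Xcalar tool: the function name set_sas_token is hypothetical, and replacing an existing entry for the same account (rather than adding a second one) is an assumption about what you want when a token is renewed. Editing /etc/xcalar/default.cfg typically requires administrator privileges.

```shell
#!/usr/bin/env bash
# Sketch: write "AzBlob.<account>.sasToken=<token>" into a configuration file,
# replacing any existing line for the same account and keeping all other lines.
# Arguments: <config file> <storage account name> <SAS token>
set_sas_token() {
    local cfg=$1 account=$2 token=$3
    local key="AzBlob.${account}.sasToken"
    local tmp
    tmp=$(mktemp) || return 1
    if [ -f "$cfg" ]; then
        # Copy every line that does not start with "<key>=".
        awk -v k="${key}=" 'index($0, k) != 1' "$cfg" > "$tmp"
    fi
    # printf with %s keeps the "%" and "&" characters in the token literal.
    printf '%s=%s\n' "$key" "$token" >> "$tmp"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$cfg" && rm -f "$tmp"
}

# Usage on each cluster node (single quotes stop the shell from
# interpreting "&" and "?" in the token):
#   set_sas_token /etc/xcalar/default.cfg myaccount '<SAS token>'
```

Restarting the cluster is still required after the file is updated on all nodes.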

Using mapr://

If the data source is on a MapR cluster, follow these steps to enable authorized MapR cluster users to access the data from Xcalar Design:

  1. Verify that the Xcalar users who want to access data on a MapR cluster have their usernames and passwords for the MapR cluster.
  2. For each Xcalar cluster node, follow these steps to set up the MapR client:
    1. Install the appropriate MapR client package from the MapR website. For example, install the following package from MapR:

      mapr-client-5.2.0.39122.GA-1.x86_64.rpm

      Make sure that the client package version matches your MapR cluster version.

    2. Enter the following command, which uses the MapR configure script, on the Xcalar cluster node to configure the MapR client against the MapR cluster:

      /opt/mapr/server/configure.sh -c -C mapr-cldb-node -secure -N my-mapr-cluster

      In the actual command, substitute your MapR cluster name for my-mapr-cluster.

    3. Enter the following command to copy the cluster ssl_truststore to the client node:

      scp admin@mapr-cldb-node:/opt/mapr/conf/ssl_truststore /opt/mapr/conf

  3. (Optional) Follow these steps to verify that each user can log in to the MapR cluster:

    1. Enter the maprlogin command. The following example is for a user named myuser:

      maprlogin password -user myuser

      In the actual command, substitute the MapR user's username for myuser.

    2. Enter the MapR user's password for the MapR cluster.
    3. If the login is successful, you can be sure that the user can access the data on the MapR cluster from Xcalar Design.
    4. Enter the following command to log out of the MapR cluster:

      maprlogin logout
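The client installation step above asks you to match the client package version to the MapR cluster version. The following is a minimal sketch of that comparison. The function names are hypothetical, and treating the first three numbers of the package name (for example, 5.2.0) as the version to compare is an assumption. Obtain the cluster version from your MapR administrator.

```shell
#!/usr/bin/env bash
# Sketch: extract the release number from a MapR client package file name.
# Prints the first three dot-separated numbers after "mapr-client-",
# or nothing if the name does not have that shape.
mapr_client_release() {
    printf '%s\n' "$1" |
        sed -n 's/^mapr-client-\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\)\..*$/\1/p'
}

# Returns 0 if the package release equals the given cluster release.
release_matches() {
    local package=$1 cluster_release=$2
    [ "$(mapr_client_release "$package")" = "$cluster_release" ]
}

# Usage:
#   release_matches mapr-client-5.2.0.39122.GA-1.x86_64.rpm 5.2.0 \
#       && echo "client package matches the cluster release"
```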

If the data source is not shared

If your data is stored in a shared-nothing environment, you can still create a dataset on Xcalar for the data by using the Xcalar SDK. For more information about the SDK, contact Xcalar technical support.

Next step

Prepare the data in the source data files by following the instructions in this topic:

Preparing the data source
