Making source data accessible
Importing a data source means telling Xcalar Design to derive metadata from one or more files containing raw data. For Xcalar Design to import these files, they must be stored in a location that is accessible to all Xcalar nodes.
This topic describes the steps for ensuring that the source data files are accessible for Xcalar to import. Some of these steps require configuring the storage system where the source data files reside. If you do not have the appropriate access privileges, contact your system administrator to set up the source data files for you.
You must prepare the data source so that all nodes of the cluster can access the same data. The exact procedure for preparing the data source depends on the protocol for data access.
Xcalar Design supports the protocols described in the following table.
| Protocol shown in Xcalar Design | Description |
|---|---|
| hdfs:// | Hadoop Distributed File System (HDFS) |
| s3:// | Amazon Simple Storage Service (S3) |
The following sections describe what you must do for your data to be accessible to all nodes of the Xcalar cluster.
If you select file:/// to access data in shared storage that is visible to all nodes (for example, through a Network File System (NFS) in a network attached storage (NAS) environment), verify that the directory structure for accessing the shared storage is identical across all nodes.
For example, if your data resides in an NFS file system, all nodes must mount the same file system using the same path name as the mount point. That is, the same file system must be mounted on all nodes, and the mount point on all nodes must be the same (for example, /mnt/datasets). Similarly, if the source data files are local files, they must be stored under the same name and they must have identical contents. For example, to import a local file named /datasets/file1, Xcalar Design must be able to find /datasets/file1 on each node, and /datasets/file1 must contain the same contents across all nodes.
The following list describes the possible scenarios in which importing from a shared file system fails:
- The nodes use different mount points for the same NFS file system. For example, some nodes mount the file system as /mnt/dataset1 and some nodes mount the file system as /mnt/dataset2.
- All nodes have the same mount point (/mnt/datasets), but it points to different file systems. For example, /mnt/datasets on some nodes points to a file system on host1, while the same mount point on other nodes points to a file system on host2.
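The mount-consistency requirements above can be spot-checked from any machine that can reach the cluster. The following is a sketch only: the node hostnames (node1 through node3), the mount point /mnt/datasets, and the sample file name file1 are placeholders for your own values.

```shell
# Hypothetical Xcalar node hostnames -- replace with your own.
NODES="node1 node2 node3"
MOUNT_POINT="/mnt/datasets"
SAMPLE_FILE="$MOUNT_POINT/file1"

for host in $NODES; do
  # Each node should report the same backing file system for the
  # mount point and the same checksum for the sample file.
  echo "--- $host ---"
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" \
    "findmnt -n -o SOURCE $MOUNT_POINT && md5sum $SAMPLE_FILE" \
    || echo "$host: unreachable or mount/file check failed"
done
```

If the reported file system source or the checksum differs between nodes, the import fails in one of the ways listed above.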
If you select hdfs:// to access data in an HDFS file system (for example, on a Hortonworks or Cloudera platform), verify that the host with the HDFS file system is accessible to all nodes of the cluster. Contact your Hadoop administrator to obtain the host name of the machine with the HDFS file system. You cannot use Xcalar Design to list all the hosts on which an HDFS file system is installed.
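As a quick connectivity check, you can confirm from each Xcalar node that the HDFS host is reachable. This is a sketch only: namenode.example.com, port 8020, and the node names are placeholders for the values your Hadoop administrator gives you.

```shell
# Placeholders: ask your Hadoop administrator for the real host and port.
HDFS_HOST="namenode.example.com"
HDFS_PORT=8020

for host in node1 node2 node3; do   # your Xcalar node hostnames
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" \
    "nc -z -w 5 $HDFS_HOST $HDFS_PORT" \
    && echo "$host: can reach $HDFS_HOST:$HDFS_PORT" \
    || echo "$host: cannot reach the HDFS host"
done
```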
Each Xcalar cluster node must be configured for S3 access. This is because Xcalar connects not only to your own S3 bucket but also to any other S3 accounts that you have access to.
To configure S3, establish an SSH connection with each Xcalar cluster node. Then follow these steps:
- Make sure that you have installed the Amazon Web Services (AWS) Command Line Interface (CLI). If not, follow the instructions to install it at https://aws.amazon.com/cli/.
- To set up your credentials and default region, enter the following command:
aws configure
The command displays several prompts for you to enter the access key, default region name, and so on. After you finish responding to the prompts, configuration files are generated in the correct locations for you.
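If you prefer to script the setup rather than answer the interactive prompts, the AWS CLI also accepts each value through aws configure set. In the sketch below, the key values, region, and bucket name are placeholders, not real credentials.

```shell
# Placeholder values -- substitute your own credentials and region.
aws configure set aws_access_key_id "AKIA_EXAMPLE_KEY_ID"
aws configure set aws_secret_access_key "EXAMPLE_SECRET_KEY"
REGION="us-west-2"   # assumption: the region of your bucket
aws configure set region "$REGION"

# Verify access by listing a bucket you expect to be able to read.
aws s3 ls "s3://my-bucket/" \
  && echo "S3 access OK" \
  || echo "S3 access check failed"
```

Repeat the check on every Xcalar cluster node so that all nodes can read the same bucket.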
For detailed information and other ways to configure S3 access, see the AWS CLI configuration documentation.
If your data is in Microsoft Azure Storage, for each storage account, follow these steps to make the data in the account accessible to Xcalar:
- Use a web browser to go to your Azure Storage accounts management web page.
- Click your storage account name.
- Navigate to the shared access signature (SAS) page.
Set the SAS parameters as follows to generate the SAS token. Leave all parameters not listed in the following table at their defaults.

| Parameter | Setting |
|---|---|
| Start and expiry date/time | Start time and end time for the period during which your Xcalar cluster can access the data |
| Allowed protocols | HTTPS only |
After the SAS token is generated, copy the token (not the signature) to your computer's clipboard. You will need to paste the token in the next step.
The following is an example of a SAS token with a Start Time set to 2017-09-18 00:00:00 and an End Time set to 2017-12-01 00:00:00:
Enter the following string in the /etc/xcalar/default.cfg file on each Xcalar cluster node. The SAS token is the token you copied in the preceding step.
AzBlob.<Azure Storage account name>.sasToken=<SAS token>
For example, if your Azure Storage account name is myaccount, enter a string of the following form in the /etc/xcalar/default.cfg file:
AzBlob.myaccount.sasToken=<SAS token>
Be sure to use the appropriate account name for your Azure Storage account.
- After you finish adding the SAS token for each storage account, save the /etc/xcalar/default.cfg file.
- Restart the cluster to make the change in the /etc/xcalar/default.cfg file take effect. For information about restarting the cluster, see Using Setup (Xcalar admin only).
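Appending the setting to every node's configuration file can be scripted. The sketch below assumes hypothetical node names, the account name myaccount, and that your SSH user has sudo rights; the <SAS token> placeholder stands for the token you copied earlier.

```shell
# Placeholders: replace the node names, account name, and token.
SAS_LINE='AzBlob.myaccount.sasToken=<SAS token>'

for host in node1 node2 node3; do
  # Append the setting to the Xcalar configuration file on each node.
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" \
    "echo '$SAS_LINE' | sudo tee -a /etc/xcalar/default.cfg >/dev/null" \
    || echo "$host: update failed"
done
```

Remember to restart the cluster afterward so that the new setting takes effect.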
If the data source is on a MapR cluster, follow these steps to enable authorized MapR cluster users to access the data from Xcalar Design:
- Verify that the Xcalar users who want to access data on a MapR cluster have their usernames and passwords for the MapR cluster.
- For each Xcalar cluster node, follow these steps to set up the MapR client:
Install the appropriate MapR client package from the MapR website. For example, install the following package from MapR:
Make sure that the client package version matches your MapR cluster version.
Enter the following command, which uses the MapR configure script, on the Xcalar cluster node to configure the MapR client against the MapR cluster:
/opt/mapr/server/configure.sh -c -C mapr-cldb-node -secure -N my-mapr-cluster
In the actual command, substitute your MapR cluster name for my-mapr-cluster.
Enter the following command to copy the cluster ssl_truststore to the client node:
scp admin@mapr-cldb-node:/opt/mapr/conf/ssl_truststore /opt/mapr/conf
(Optional) Follow these steps to verify that each user can log in to the MapR cluster:
Enter the maprlogin command. The following example is for a user named myuser:
maprlogin password -user myuser
In the actual command, substitute the MapR user's username for myuser.
- Enter the MapR user's password for the MapR cluster.
- If the login is successful, you can be sure that the user can access the data on the MapR cluster from Xcalar Design.
Enter the following command to log out of the MapR cluster:
maprlogin logout
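The verification steps above can be combined into a short check, run on any Xcalar node with the MapR client installed. The username myuser is a placeholder, and maprlogin prompts interactively for the user's MapR password.

```shell
# Placeholder username -- substitute the MapR user's actual username.
MAPR_USER="myuser"

# Obtain a MapR ticket (prompts for the user's MapR password).
if maprlogin password -user "$MAPR_USER"; then
  echo "$MAPR_USER can access the MapR cluster"
  # Log out again so no ticket is left behind.
  maprlogin logout
else
  echo "login failed for $MAPR_USER"
fi
```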
If the data source is not shared
If your data is stored in a shared-nothing environment, you can still create a dataset on Xcalar for the data by using the Xcalar SDK. For more information about the SDK, contact Xcalar technical support.
Prepare the data in the source data files by following the instructions in this topic.