Managing datasets

When you import data, Xcalar Design creates a dataset, typically in memory, which is a representation of the raw data and its metadata. You can preview the data in the dataset and create tables from all or selected fields in the dataset.

Understanding dataset names

Xcalar Design provides a default name for each dataset when you import data. The name is based on the name of the source file or directory. You can enter a different name or accept the default. You cannot rename the dataset at a later time.

Understanding how datasets are shared

Datasets are shared by all users. The Datasets panel displays all the datasets available for you to use. The following partial screenshot shows an example of the Datasets list.

The rest of this topic discusses what you can and cannot do with datasets created by other users.

Organizing datasets in folders

In the Datasets panel, you can organize the datasets you created into folders. For example, you can create a folder named Finance for datasets that import data sources from the Finance department, another folder named Engineering for datasets that import data sources from the Engineering department, and so on.

Understanding dataset locking

Any user can lock and unlock a dataset. A locked dataset cannot be deleted by any user, including its creator. The following icon is for a locked dataset named airlines:

While datasets are shared, users manage their own locks independently. For example, User1 can place a lock on the airlines dataset, but User2 can choose not to place a lock on it.

A dataset icon that does not include a lock means that you have not locked the dataset. It does not indicate whether other users have locked it.

Lock a dataset to prevent other users from deleting it. This is useful when you do not currently have tables based on the dataset but want to ensure that it remains available for future use.

When a dataset becomes locked

When a user creates a dataset, it is locked by default. The creator, who owns the lock, can keep or remove the lock.

Because datasets are shared by all users, other users can see and use this dataset, which is initially shown to them as unlocked. They can choose to lock it or keep it unlocked.

Xcalar automatically locks a dataset when a user creates a table from it. As long as you have a table (active, temporary, or hidden) or an aggregate dependent on a dataset, the dataset is automatically locked for you. You cannot remove the lock unless you remove all tables and aggregates that depend on the dataset.
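The locking rules above amount to a per-user set of locks on each dataset. The following Python sketch models those rules under stated assumptions; the `Dataset` class and its methods are hypothetical illustrations, not part of Xcalar Design or any Xcalar API. It also assumes that removing a table or aggregate does not release the lock by itself, which this topic does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical model of a shared dataset and the per-user locks on it."""

    name: str
    creator: str
    locks: set = field(default_factory=set)         # users who currently hold a lock
    dependents: dict = field(default_factory=dict)  # user -> count of tables/aggregates

    def __post_init__(self) -> None:
        # The creator holds a lock by default as soon as the dataset exists.
        self.locks.add(self.creator)

    def lock(self, user: str) -> None:
        # Any user may place their own lock; other users' locks are unaffected.
        self.locks.add(user)

    def add_dependent(self, user: str) -> None:
        # Creating a table or aggregate from the dataset locks it for that user.
        self.dependents[user] = self.dependents.get(user, 0) + 1
        self.locks.add(user)

    def remove_dependent(self, user: str) -> None:
        # Assumption: dropping a dependent does not remove the lock by itself.
        self.dependents[user] = max(0, self.dependents.get(user, 0) - 1)

    def unlock(self, user: str) -> None:
        # A user cannot release their lock while any of their tables or
        # aggregates still depend on the dataset.
        if self.dependents.get(user, 0) > 0:
            raise PermissionError(f"{user} still has objects dependent on {self.name}")
        self.locks.discard(user)

    def icon_shows_lock(self, user: str) -> bool:
        # The lock icon reflects only the viewing user's own lock.
        return user in self.locks


airlines = Dataset("airlines", creator="User1")
airlines.add_dependent("User2")   # User2 builds a table from airlines
print(sorted(airlines.locks))     # ['User1', 'User2']
```

In this model, `icon_shows_lock("User2")` returns False for a freshly created dataset even though User1 holds a lock, which mirrors why an unlocked icon says nothing about other users' locks.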

Determining who has a lock on a dataset

As long as one lock exists for a dataset, the dataset cannot be deleted by any user. To view who has a lock on a dataset, right-click the dataset icon and select Get info from the menu.

Understanding dataset deletion

You can delete a dataset whether or not you are the creator of the dataset. The only prerequisite for dataset deletion is that no one has a lock on the dataset.

Effects of workbook status on dataset locks

If the user who locked a dataset has only inactive workbooks, that user's lock is not in effect, and other users can delete the dataset. After the user with the lock activates a workbook, the lock is in effect again, provided the dataset has not been deleted in the meantime.

If the user who locked a dataset has a paused workbook, the lock is in effect as usual, and no user can delete the locked dataset.
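The deletion rule and the workbook-status exceptions can be condensed into one check. This Python sketch is a hypothetical model, not an Xcalar API: `WorkbookState`, `lock_in_effect`, and `can_delete` are illustrative names. It assumes that a lock holder with no workbooks at all is treated like one whose workbooks are all inactive, a case this topic does not address.

```python
from enum import Enum


class WorkbookState(Enum):
    ACTIVE = "active"
    PAUSED = "paused"
    INACTIVE = "inactive"


def lock_in_effect(workbooks: list) -> bool:
    """A holder's lock counts only if they have an active or paused workbook."""
    # Assumption: a holder with no workbooks is treated as all-inactive.
    return any(state is not WorkbookState.INACTIVE for state in workbooks)


def can_delete(lock_holders: set, workbooks_by_user: dict) -> bool:
    """Any user, creator or not, may delete once no lock is in effect."""
    return not any(
        lock_in_effect(workbooks_by_user.get(holder, []))
        for holder in lock_holders
    )


# User1 holds the only lock on airlines.
print(can_delete({"User1"}, {"User1": [WorkbookState.INACTIVE]}))  # True
print(can_delete({"User1"}, {"User1": [WorkbookState.PAUSED]}))    # False
```

Ownership does not appear anywhere in `can_delete`, reflecting that the only prerequisite for deletion is the absence of an effective lock.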

Locking and unlocking a dataset

Follow these steps to lock a dataset:

  1. In the Datasets panel, locate the dataset in your folder or another user's folder.
  2. If the Datasets panel is in grid view, right-click the dataset icon and click Lock dataset. If the Datasets panel is in list view, click the closed lock icon for the dataset.

Follow these steps to unlock a dataset:

  1. Verify that you do not have tables or aggregates dependent on the dataset.
  2. In the Datasets panel, locate the dataset in your folder or another user's folder.
  3. If the Datasets panel is in grid view, right-click the dataset icon and click Unlock dataset. If the Datasets panel is in list view, click the open lock icon for the dataset.

Deleting a dataset

Follow these steps to delete a dataset:

  1. In the Datasets panel, locate the dataset in your folder or another user's folder.
  2. Right-click the icon or name of the dataset and click Get info in the drop-down list to determine whether any user has locked the dataset.
  3. If no user has a lock, right-click the dataset again and click Delete in the drop-down list.
  4. When a confirmation message is displayed, click CONFIRM if you are sure that you no longer need the dataset.

What you cannot do to other users' datasets

You can preview and create tables from other users' datasets. You can also delete other users' datasets provided that they are not locked by anyone. However, you cannot perform the following tasks:

  • Renaming the folder containing other users' datasets.
  • Moving other users' datasets to another folder.
  • Creating a folder in another user's dataset folder.
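The permissions above can be summarized as a small lookup. This Python sketch is purely illustrative: the action names such as `"move_dataset"` and the `is_allowed` function are hypothetical and are not Xcalar Design identifiers.

```python
# Hypothetical action names; these are not Xcalar API identifiers.
OWNER_ONLY = {"rename_folder", "move_dataset", "create_folder"}
ANY_USER = {"preview", "create_table"}


def is_allowed(action: str, user: str, owner: str, locked: bool = False) -> bool:
    """Return True if `user` may perform `action` on a dataset owned by `owner`."""
    if action == "delete":
        # Deletion ignores ownership; it only requires that nobody holds a lock.
        return not locked
    if action in OWNER_ONLY:
        # Folder organization is limited to the user's own datasets.
        return user == owner
    return action in ANY_USER


print(is_allowed("create_table", "User2", owner="User1"))         # True
print(is_allowed("move_dataset", "User2", owner="User1"))         # False
print(is_allowed("delete", "User2", owner="User1", locked=True))  # False
```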

Effect of cluster restart on datasets

After the cluster is restarted, some datasets previously in your list of datasets might not be displayed. Immediately after the restart, only the datasets used in your active workbook are shown.

NOTE: The datasets still exist; they are available for you to use again when needed.

A dataset appears in the list again when:

  • you activate a workbook that uses the dataset, or
  • a batch dataflow is run against the dataset.

EXAMPLE: You create a dataset called airlines, which is used in workbook1, and a dataset called carrier, which is used in workbook2. Before a cluster restart, both datasets are visible in the list of datasets, regardless of which workbook is active. After a cluster restart, if workbook1 is active, only the dataset named airlines is displayed. To also display the dataset named carrier, activate workbook2.
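The visibility rule after a restart can be modeled as a set union over activated workbooks and batch dataflow runs. This Python sketch is a hypothetical model; `datasets_shown_after_restart` and the `used_by` mapping are illustrative names, not Xcalar APIs, and the sketch tracks only which datasets are listed, not whether they exist.

```python
from typing import Iterable


def datasets_shown_after_restart(
    used_by: dict,
    activated_workbooks: Iterable[str],
    batch_datasets: Iterable[str] = (),
) -> set:
    """Return the datasets listed again after a cluster restart.

    A dataset reappears when a workbook that uses it is activated or when a
    batch dataflow runs against it; every other dataset still exists but is
    hidden from the list.
    """
    shown = set(batch_datasets)
    for workbook in activated_workbooks:
        shown |= used_by.get(workbook, set())
    return shown


used_by = {"workbook1": {"airlines"}, "workbook2": {"carrier"}}
print(datasets_shown_after_restart(used_by, ["workbook1"]))  # {'airlines'}
print(sorted(datasets_shown_after_restart(used_by, ["workbook1", "workbook2"])))
# ['airlines', 'carrier']
```

Running this with the airlines and carrier example reproduces the behavior described: only airlines is listed until workbook2 is also activated.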
