Scheduling batch dataflows

Xcalar in modeling mode cannot run batch dataflows. This topic applies to Xcalar in operational mode only.

After you create a batch dataflow, you can run the dataflow at regular intervals.

NOTE: When you run a batch dataflow manually, the result can be either an export to a target or a table. However, a batch dataflow that runs automatically on a schedule can only export to a target; it cannot create a table in the active worksheet.

Preparing a batch dataflow to run on schedule involves two steps:

  1. Parameterize the export operation.
  2. Create a schedule for the batch dataflow.

Parameterizing the export operation

If Xcalar detects that an export file of the same name already exists, the batch dataflow fails to run. Therefore, you must parameterize the export operation so that repeated runs of the batch dataflow can create export files under new names.
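Python's exclusive-create file mode can illustrate why a repeated file name makes a run fail. This is a minimal sketch, not Xcalar's export code. The function `export_rows` and the sample rows are hypothetical, and a local temporary directory stands in for the export target.

```python
import os
import tempfile


def export_rows(directory, file_name, rows):
    """Hypothetical export: write rows to directory/file_name, never overwriting."""
    path = os.path.join(directory, file_name)
    # Mode "x" creates the file exclusively and raises FileExistsError if it exists.
    with open(path, "x") as f:
        f.write("\n".join(rows) + "\n")
    return path


with tempfile.TemporaryDirectory() as export_dir:
    # First scheduled run: the file does not exist yet, so the export succeeds.
    export_rows(export_dir, "export-airlines#XS366.csv", ["AA,100"])
    try:
        # Second scheduled run with the same, unparameterized name.
        export_rows(export_dir, "export-airlines#XS366.csv", ["AA,120"])
    except FileExistsError:
        print("second run fails: export file already exists")
```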

Follow these steps to parameterize the export operation:

  1. In the dataflow list, click the name of the batch dataflow to display it.
  2. Right-click the last icon of the batch dataflow to display a pop-up menu. This icon represents the result of the batch dataflow.
  3. Click Create Parameterized Operation in the pop-up menu.
  4. If the Export As field is not filled in, click the Default value button to fill in the default export file name, as shown in the following screenshot. Alternatively, you can copy and paste the original file name.

  5. In the Parameterize Operation window, click and drag the N parameter to the file name shown in the Export As field. This parameter is a system parameter. Its initial value is 0 and is incremented by 1 each time the operation is run.

    EXAMPLE: If the export file name is export-airlines#XS366.csv, you can drag the N parameter to the name so that the name becomes export-airlines<N>#XS366.csv. Running the batch dataflow the first time creates a file named export-airlines0#XS366.csv, running the batch dataflow the second time creates a file named export-airlines1#XS366.csv, and so on.
  6. If the Target field is empty, click the Default value button to fill in the default target name. Alternatively, you can copy and paste the original target name.

  7. Click SAVE.
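The way the N parameter changes the export file name on each run can be modeled as follows. `ParameterizedExport` and `next_name` are hypothetical names for illustration only. Xcalar performs this substitution itself, and the `<N>` notation simply mirrors the example in step 5.

```python
class ParameterizedExport:
    """Hypothetical model of an export whose file name contains the <N> system parameter."""

    def __init__(self, template):
        if "<N>" not in template:
            raise ValueError("export file name is not parameterized with <N>")
        self.template = template
        self.n = 0  # N starts at 0

    def next_name(self):
        """Return the file name for the current run, then increment N by 1."""
        name = self.template.replace("<N>", str(self.n))
        self.n += 1
        return name


export = ParameterizedExport("export-airlines<N>#XS366.csv")
print(export.next_name())  # export-airlines0#XS366.csv
print(export.next_name())  # export-airlines1#XS366.csv
```

Because every run gets a distinct value of N, no two runs write to the same file name.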

For more information about parameterization, see Parameterizing operations.

Specifying a schedule

Follow these steps to create a schedule for a batch dataflow:

  1. In the dataflow list, click for the selected batch dataflow as illustrated in the following screenshot.

    The Create New Schedule panel is displayed.

  2. You can specify a schedule in either simple mode, as described in this step, or in advanced mode, as described in step 3. (Advanced mode is recommended only for users familiar with the cron utility on UNIX systems.)

    Specify the following information in the Create New Schedule panel. For details, see the screenshot following the list.

    • The start time, which you set with a calendar and a 12-hour clock. It determines when the batch dataflow runs for the first time. The time is in UTC, not your browser's local time, and it must be in the future. To check the current UTC time, see the time displayed in the upper right corner of the window.
    • How often the batch dataflow is run.

    Go to step 4.

  3. To specify a schedule in advanced mode, enter a cron expression in the Schedule field. Then click SIMULATE. The Last Run and Next Run fields display the times when two consecutive runs can take place. These times help you visualize the interval between any two runs if the batch dataflow is executed according to your schedule.

    IMPORTANT: The cron expression in this window allows only asterisk, digit, forward slash, comma, and hyphen. Do not use other characters even though they are valid on other UNIX systems.

    In the following example, the batch dataflow runs every minute from 4:00 p.m. through 4:59 p.m., every day. The time specified in the cron expression is the server time.

  4. Click SAVE. The Schedule Detail window is displayed. This schedule is now associated with the batch dataflow, and it takes effect immediately. The schedule is persistent across cluster restarts, but is not retained when you download the batch dataflow.
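The simple-mode rules (start time in UTC, start time in the future, runs repeating at a fixed frequency) can be sketched in Python. `first_runs` is a hypothetical helper, the UTC-8 browser offset is only an example value, and representing "how often" as a `timedelta` is an assumption; the actual frequency choices are the ones the panel offers.

```python
from datetime import datetime, timedelta, timezone


def first_runs(start_utc, interval, now_utc, count=3):
    """Hypothetical helper: the first `count` run times of a simple-mode schedule.

    Both datetimes must be timezone-aware; the start time must be UTC and in the future.
    """
    if start_utc.utcoffset() != timedelta(0):
        raise ValueError("start time must be given in UTC")
    if start_utc <= now_utc:
        raise ValueError("start time must be in the future")
    return [start_utc + i * interval for i in range(count)]


# A user whose browser is at UTC-8 and wants the first run at 9:00 a.m. local time
# must enter 5:00 p.m. in the panel, because the panel uses UTC.
browser_tz = timezone(timedelta(hours=-8))  # fixed offset; ignores daylight saving time
start_utc = datetime(2024, 3, 1, 9, 0, tzinfo=browser_tz).astimezone(timezone.utc)
print(start_utc.strftime("%I:%M %p"))  # 05:00 PM

now_utc = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
for run in first_runs(start_utc, timedelta(days=1), now_utc):
    print(run.isoformat())
```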
NOTE: After you create a schedule, you cannot modify it. If you want to apply a different schedule to the batch dataflow, delete the schedule and create a new one.
IMPORTANT: The batch dataflow can run only if the export file name is parameterized. If you have not parameterized the name already, follow the instructions in Parameterizing the export operation.
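To see roughly what SIMULATE computes in advanced mode, the following sketch checks an expression against the restricted character set, expands it, and finds the next two matching minutes. It assumes the standard five-field order (minute, hour, day of month, month, day of week, with 0 meaning Sunday) and the common cron rule for combining the two day fields; Xcalar's scheduler may treat edge cases differently. The expression `* 16 * * *` is an assumed cron form of the 4:00 p.m. to 4:59 p.m. example, and all helper names are hypothetical. Times are naive datetimes interpreted as server time.

```python
from datetime import datetime, timedelta

# Characters allowed in the Schedule field (spaces separate the fields).
ALLOWED = set("0123456789*/,-")
# Assumed field order and ranges: minute, hour, day of month, month, day of week (0 = Sunday).
RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def parse_field(field, lo, hi):
    """Expand one field such as '*', '30', '1-5', '*/15', or '0,30' into a set of ints."""
    values = set()
    for part in field.split(","):
        part, sep, step_text = part.partition("/")
        step = int(step_text) if sep else 1
        if step < 1:
            raise ValueError(f"step must be positive in {field!r}")
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            first, last = part.split("-")
            start, end = int(first), int(last)
        elif sep:
            raise ValueError(f"a step needs '*' or a range in {field!r}")
        else:
            start = end = int(part)
        if not lo <= start <= end <= hi:
            raise ValueError(f"{field!r} is outside {lo}-{hi}")
        values.update(range(start, end + 1, step))
    return values


def parse_cron(expr):
    """Check the character set and field count, then expand every field."""
    bad = set("".join(expr.split())) - ALLOWED
    if bad:
        raise ValueError(f"unsupported characters: {''.join(sorted(bad))}")
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day month weekday")
    sets = [parse_field(f, lo, hi) for f, (lo, hi) in zip(fields, RANGES)]
    # Common cron rule: when both day fields are restricted, a match on either one counts.
    either_day = not fields[2].startswith("*") and not fields[4].startswith("*")
    return sets, either_day


def matches(parsed, t):
    """Return True if the minute-resolution time t satisfies the parsed expression."""
    (minutes, hours, days, months, weekdays), either_day = parsed
    if t.minute not in minutes or t.hour not in hours or t.month not in months:
        return False
    day_ok = t.day in days
    weekday_ok = (t.weekday() + 1) % 7 in weekdays  # Python: Monday=0; cron: Sunday=0
    return (day_ok or weekday_ok) if either_day else (day_ok and weekday_ok)


def next_runs(expr, after, count=2):
    """Return the next `count` run times strictly after `after`, scanning minute by minute."""
    parsed = parse_cron(expr)
    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    limit = t + timedelta(days=4 * 366)  # long enough to reach a Feb 29 match
    runs = []
    while len(runs) < count and t < limit:
        if matches(parsed, t):
            runs.append(t)
        t += timedelta(minutes=1)
    return runs


for run in next_runs("* 16 * * *", datetime(2024, 1, 1, 15, 30)):
    print(run)  # 2024-01-01 16:00:00, then 2024-01-01 16:01:00
```

An expression such as `0 16 * * MON` raises `ValueError` in this sketch, matching the rule that only asterisks, digits, forward slashes, commas, and hyphens are allowed.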

Pausing a batch dataflow schedule

After you create a batch dataflow schedule, you can click PAUSE in the Schedule Detail window to suspend the schedule. If you click RESUME after the schedule is paused, the schedule takes effect again.

Deleting a batch dataflow schedule

Follow these steps to delete a batch dataflow schedule:

  1. In the dataflow list, click the name of the batch dataflow.
  2. Click DELETE in the Schedule Detail window.
  3. When the confirmation message is displayed, click CONFIRM. (You cannot undo a schedule deletion.)

Displaying run information

If a schedule exists for a batch dataflow, clicking the batch dataflow name in the Dataflows panel displays a history of the runs.

The following screenshot shows an example of the run history. To update the history, click REFRESH.

For each run, a colored dot precedes the run number to indicate its status:

  • Green means the run succeeded.
  • Orange means the run is in progress.
  • Red means the run failed.
