Parameterizing operations

This topic applies to Xcalar in operational mode only.

Xcalar Design supports parameterization of operations. This feature enables you to re-use an existing dataflow with customized values, eliminating the need for building a dataflow from scratch. Specifically, you can take advantage of an existing batch dataflow to accomplish the following goals:

  • apply the entire series of operations defined by a batch dataflow to another dataset.
  • substitute a filter function for an existing one in a batch dataflow.
  • name the export file created by a batch dataflow.

This section assumes that you have already created a batch dataflow and that you are familiar with the concept of exporting a table. For background, see the topics about dataflows and exporting tables.

General steps for parameterizing an operation

If you want to use a system parameter, simply include the parameter in the operation. If you want to use a user-defined parameter, follow these general steps:

  1. Create a parameter.
  2. Pass the parameter to the operation.
  3. Assign a value to the parameter.

The exact steps might vary depending on the type of operation being parameterized.

Changing a batch dataflow without a parameter

Xcalar Design allows you to customize a batch dataflow without creating a parameter: you can type the desired value directly where a parameter would otherwise be placed. For information about customizing a batch dataflow through parameterization, see Passing a parameter.

Recommendation: Define a parameter instead of entering the value directly because having a parameter provides a convenient way to update a particular value. You can also re-use the same parameter for other items in the dataflow. For example, you can use the same parameter in the data source path name and in the export file name to maintain consistency.

Parameter naming guidelines

When creating a parameter, remember to choose a name that describes the meaning of the parameter.

Creating a parameter

Parameters are created for and associated with each batch dataflow. They are not shared among batch dataflows. You can create as many parameters as needed for each batch dataflow.

Follow these steps to create a parameter:

  1. Click to display the Dataflows panel. Then select the batch dataflow that you want to parameterize. The dataflow graph for the selected batch dataflow is displayed.
  2. In the dataflow graph, locate the operation you want to parameterize, right-click the icon, and select Create parameterized operation.

    The Parameterize Operation modal window is displayed, as shown in the following screenshot.

  3. In the section for the current parameter list, type the name of the new parameter.
  4. Click SAVE.

Initially, the value of the newly created parameter is undefined. When you pass the parameter to an operation, you must either assign a value to it or specify that it has no value. For information about how to use a parameter, see Passing a parameter.

Displaying parameters for a batch dataflow

In the Parameterize Operation modal window, you can see the list of parameters created for the batch dataflow. Alternatively, follow these steps to display the parameters and their values:

  1. Click to display the Dataflows panel. Then select the batch dataflow to display the dataflow graph.
  2. Click Parameters in the dataflow graph's title bar to display the parameters table.

Deleting a parameter

NOTE: You cannot delete a parameter that is currently used for parameterizing an operation in a batch dataflow. If a parameter is shown with a value in the Parameterize Operation modal window, it is being used in an operation.

You can delete a parameter from the parameters table or from the Parameterize Operation modal window.

From the parameters table

Follow these steps to delete a parameter from the parameters table:

  1. Follow the steps in Displaying parameters for a batch dataflow to display the parameters table.
  2. Locate the parameter to be deleted and then click the Delete icon in the Action column.

From the Parameterize Operation modal window

Follow these steps to delete a parameter from the Parameterize Operation modal window:

  1. Click to display the Dataflows panel. Then select the batch dataflow that contains the parameter. The dataflow graph for the selected batch dataflow is displayed.
  2. In the dataflow graph, locate an operation that can be parameterized, right-click the icon, and select Create parameterized operation.
  3. In the current parameters list, locate the parameter to be deleted.
  4. Move the cursor over the parameter name button, and then click the Delete icon in the upper right corner of the name button.

Passing a parameter

When you pass a parameter, you substitute the parameter value for the value originally used in the batch dataflow. You can pass a parameter at different stages of a batch dataflow:

  • when importing a data source
  • when performing a filter operation
  • when exporting a table to a file.

The exact steps for passing a parameter depend on what you want to parameterize.

Follow these steps to pass a parameter:

  1. In the batch dataflow graph, click one of the following icons and then click Create parameterized operation in the pop-up:

    • dataset icon.
    • filter operation icon.
    • export result icon.

    The Parameterize Operation modal window is displayed. The exact contents of the modal window depend on the icon from which you launch the modal window.

  2. Verify that the current parameter list contains the parameter you want to use. If not, create the parameter as described in Creating a parameter.
  3. In the Parameterize Operation section, drag and drop the parameter from the Parameter List. Where you drop the parameter depends on where you want the parameter value substitution to occur.
  4. Assign a value to each parameter that you drag and drop, as described in Assigning a value to a parameter.

    NOTE: Specifying an assigned value is mandatory for applying a parameter to an operation.
  5. Click SAVE.
TIP: You can drag and drop more than one parameter. Suppose you have a parameter named Month and a parameter named Year. To parameterize the export operation, you can drag and drop both parameters to the Export As field so that the month and year are concatenated to form a part of the export file name (such as to form a file name prefix).
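The substitution described in this section can be pictured with a short Python sketch. It is an illustration only, not Xcalar code: the substitute function and the flights- file name are hypothetical, and the sketch assumes that a parameter appears in a field as its name enclosed in angle brackets, such as <Month>.

```python
import re


def substitute(template: str, values: dict[str, str]) -> str:
    """Replace each <Name> placeholder in template with its assigned value."""

    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            # Mirrors the rule that every applied parameter needs a value.
            raise KeyError(f"no value assigned to parameter {name!r}")
        return values[name]

    return re.sub(r"<(\w+)>", lookup, template)


# Two parameters dropped side by side are concatenated in the result.
export_as = substitute(
    "flights-<Month><Year>.csv",  # hypothetical Export As field contents
    {"Month": "OCTOBER", "Year": "2016"},
)
print(export_as)  # flights-OCTOBER2016.csv
```

Because the sketch raises an error when a placeholder has no entry, it also shows why a value must be supplied for every parameter that you apply to an operation.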

Removing a parameter from an operation

Follow these steps to remove a parameter from an operation:

  1. Click to display the Dataflows panel. Then select the batch dataflow that contains the parameterized operation. The dataflow graph for the selected batch dataflow is displayed.
  2. In the dataflow graph, locate the operation that uses the parameter, right-click the icon, and select Create parameterized operation.

  3. In the Parameterize Operation modal window, locate the parameter in the Parameterize Operation section.
  4. Delete the parameter name in the operation. For example, to remove the Month parameter, delete the following string from the operation:

    <Month>

    After you delete the parameter from the operation, the parameter and its value are also removed from the parameters table in the modal window.

  5. Click SAVE.

Assigning a value to a parameter

The value assigned to a parameter is the value that will be used when the batch dataflow is run, instead of the original value used when the batch dataflow was defined.

EXAMPLE: Suppose you created a parameter named Month. You can assign a value to it depending on the month in which the data in the dataset was collected. If the data is from the month of October, you can assign the string OCTOBER to the Month parameter, and then apply the Month parameter in the export file name. In this way the export file name can contain the string OCTOBER (for example, export-airlines-OCTOBER.csv), which makes it obvious that the export file was created by a dataflow based on data collected in October.

Follow these steps to assign a value to a parameter:

  1. In the Parameterize Operation modal window, for each parameter that is applied to an operation, enter the parameter value in the Value column.

    For example, for the Month parameter, you can enter OCTOBER.

    IMPORTANT: When defining a parameter for a filter function, you can specify a string as the parameter value. For a string to be interpreted correctly, it must be enclosed in double quote marks. For example, if a column's data type is string and you want to filter for the string 10, you must enter "10" as the parameter value. Without the quote marks, the value is interpreted as an integer and the filter operation does not produce the expected results.

    If you want the parameter to have no value, leave the Value column blank and click the check box in the No Value column.

  2. Click SAVE.
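The following Python sketch illustrates why the quote marks matter for filter values. It does not reproduce how Xcalar parses parameter values; the interpret_filter_value function and the sample column are hypothetical and only model the rule that a double-quoted value is a string and an unquoted numeral is an integer.

```python
def interpret_filter_value(raw: str):
    """Return a str for a double-quoted value, otherwise an int."""
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        return raw[1:-1]  # strip the surrounding quote marks
    return int(raw)


# Hypothetical column whose data type is string.
column = ["10", "20", "10"]

quoted = interpret_filter_value('"10"')  # the string "10"
unquoted = interpret_filter_value("10")  # the integer 10

print([value == quoted for value in column])    # [True, False, True]
print([value == unquoted for value in column])  # [False, False, False]
```

In this model the unquoted value never equals any entry in the string column, which corresponds to a filter that does not produce the expected results.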

EXAMPLE: To parameterize the export folder name, you can define two parameters for the name suffix. If you create firstDigit and secondDigit as the parameters, you can specify the export folder name in the following format: 

airlines<firstDigit><secondDigit>

When the suffix requires only one digit, you assign no value to firstDigit and a numeral to secondDigit. Doing so creates folders named airlines1, airlines2, and so on. When the suffix requires two digits, you assign a numeral to both firstDigit and secondDigit. Doing so creates folders named airlines11, airlines12, and so on.
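As a rough illustration of the folder-name example, the Python sketch below treats a parameter with no value as an empty string during substitution. The folder_name function is hypothetical, and representing the No Value check box as None is an assumption made only for this sketch.

```python
import re


def folder_name(template: str, values: dict) -> str:
    """Substitute <Name> placeholders; None stands for the No Value check box."""
    return re.sub(r"<(\w+)>", lambda m: values[m.group(1)] or "", template)


template = "airlines<firstDigit><secondDigit>"

# One-digit suffix: firstDigit has no value.
print(folder_name(template, {"firstDigit": None, "secondDigit": "1"}))  # airlines1

# Two-digit suffix: both parameters have a numeral.
print(folder_name(template, {"firstDigit": "1", "secondDigit": "2"}))  # airlines12
```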

Example of parameterization

The following example illustrates how a batch dataflow is parameterized to create different export files based on the same batch dataflow.

The batch dataflow named FlightInfo consists of these operations:

  • Xcalar Design imports a data source containing flight information from various air carriers and creates a table with four columns, such as day of the week, air time, and so on.
  • Smart type casting is used to cast the correct data type for each column.
  • A filter is used to include only rows from Sunday (1 in the DayOfWeek_integer column).
  • An export file is created with all the columns.

Parameterizing the data source

You can parameterize the data source in a batch dataflow, which enables you to substitute a new pathname to the data source for the one currently used. You must, however, make sure that the parameterized data source has the same schema as the original data source.

Follow these steps to parameterize the data source:

  1. In the FlightInfo dataflow graph, click the first icon (dataset icon) and then click Create parameterized operation. The Parameterize Operation modal window is displayed.
  2. In the Parameterize Operation modal window, create a parameter named AnnualAirlinesData.
  3. In the Parameterize Operation modal window, drag and drop the AnnualAirlinesData parameter to the File Path field in the Parameterized Operation section. In this example, the parameter is used as the file name suffix. The file at the resulting path is the source file that Xcalar imports when running the batch dataflow.
  4. Assign an appropriate value to the AnnualAirlinesData parameter. For example, if you want to import data from a source file with 2016 as the file name suffix, enter 2016 as the parameter value.
  5. (Optional) If the original dataflow includes a matching pattern for selecting files in a data source folder during the import, the pattern is displayed in the Pattern field. You can remove or modify the pattern for the parameterized batch dataflow. To use a regular expression as the pattern, precede the pattern with the re: string.

    EXAMPLE: Suppose the batch dataflow imports from the folder at file:///datasets/flights and you want to import only from files with names matching a regular expression. Enter re:airlines(A|B) as the pattern. When the batch dataflow is run, only files named airlinesA.csv and airlinesB.csv are imported to Xcalar; files named airlinesU.csv and airlinesX.csv are not. The following screenshot illustrates how to enter a regular expression when parameterizing a data source.

  6. Click SAVE. The first icon in the batch dataflow graph is displayed with a green border to indicate that it is parameterized.
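To preview which file names a regular-expression pattern would select, you can try the expression outside Xcalar. The Python sketch below is one way to do so. The select_files function is hypothetical, and the sketch assumes that the expression is matched from the start of the file name; the matching rules that Xcalar applies might differ.

```python
import re


def select_files(names: list[str], pattern: str) -> list[str]:
    """Return the names matched by a pattern that uses the re: prefix."""
    prefix = "re:"
    if not pattern.startswith(prefix):
        # Patterns without the prefix are outside the scope of this sketch.
        raise ValueError("this sketch handles only patterns that start with re:")
    regex = re.compile(pattern[len(prefix):])
    # re.match anchors the expression at the start of each name (assumption).
    return [name for name in names if regex.match(name)]


files = ["airlinesA.csv", "airlinesB.csv", "airlinesU.csv", "airlinesX.csv"]
print(select_files(files, "re:airlines(A|B)"))  # ['airlinesA.csv', 'airlinesB.csv']
```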

Parameterizing the filter operation

To run the same batch dataflow but with other filtering criteria, follow these steps:

  1. In the FlightInfo dataflow graph, click the filter operation icon and then click Create parameterized operation. The Parameterize Operation modal window is displayed.
  2. In the Parameterize Operation modal window, create a parameter named Day.

  3. In the Parameterized Operation section, define the filter operation you want to use for this batch dataflow as follows:

    1. Select the table column for the filtering. In this example, use the default value, which is the same table column used by the current batch dataflow.
    2. Click the field following the word by. A list of filter functions is displayed. Select eq from the list.
    3. Drag the Day parameter from the parameter list and drop it in the last field of this section.

    This operation filters the DayOfWeek_integer column to keep only those rows with a value equal to the value of the Day parameter.

  4. For the Day parameter, type a number representing the day of week in the Value column. In this example, type 2 for Monday.
  5. Click SAVE. The filter operation icon in the batch dataflow graph is displayed with a green border to indicate that it is parameterized.
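The effect of the parameterized filter can be sketched in Python as follows. The rows, the AirTime values, and the filter_eq function are hypothetical stand-ins for the FlightInfo table and the eq filter function; only the DayOfWeek_integer column name and the Day parameter come from this example.

```python
def filter_eq(rows: list[dict], column: str, value) -> list[dict]:
    """Keep only the rows whose column equals value, like an eq filter."""
    return [row for row in rows if row[column] == value]


# Hypothetical sample of the FlightInfo table (1 = Sunday, 2 = Monday).
rows = [
    {"DayOfWeek_integer": 1, "AirTime": 95},
    {"DayOfWeek_integer": 2, "AirTime": 120},
    {"DayOfWeek_integer": 2, "AirTime": 47},
]

parameters = {"Day": 2}  # value typed in the Value column

monday_rows = filter_eq(rows, "DayOfWeek_integer", parameters["Day"])
print(len(monday_rows))  # 2
```

Changing the value of Day to 1 would keep the Sunday row instead, without any change to the filter definition itself.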

Parameterizing the export operation

You might want to parameterize the export operation so that the export file is named in a way meaningful to you.  You can name the export file based on the type of filter operation done in the batch dataflow. For example, if the exported table contains only rows for flight data from Monday, you can parameterize the export operation to include the string Monday in the export file name.

Follow these steps to parameterize the export operation:

  1. In the FlightInfo dataflow graph, create a parameter named DayOfWeek.
  2. In the FlightInfo dataflow graph, click the export result icon and then click Create parameterized operation. The following modal window is displayed:

  3. Drag and drop the DayOfWeek parameter to the Export As field, which determines the export file name. You can include other characters in this field in addition to the parameter. For example, you can specify the following in the Export As field:

    airlines-<DayOfWeek>.csv

  4. For the DayOfWeek parameter, type the string that will be used as a substitution in the export file name. In this example, type Monday.
  5. Click SAVE. The exported table icon in the batch dataflow graph is displayed with a green border to indicate that it is parameterized. When you run this parameterized batch dataflow, the resultant table contains only flight information for Monday, and the table is exported to the file with a parameterized name.
Recommendation: Before you run the batch dataflow, click the Create parameterized operation pop-up for each parameterized icon to verify that the parameters are defined properly.
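The verification suggested in this recommendation amounts to checking that every parameter used in an operation has an assigned value. The Python sketch below models that check. The unassigned_parameters function, the dataset path, and the textual form of the filter are hypothetical; they only imitate the three parameterized operations in the FlightInfo example.

```python
import re


def unassigned_parameters(operations: dict[str, str], values: dict[str, str]) -> list[str]:
    """Return the names of <Name> placeholders that have no assigned value."""
    missing = set()
    for text in operations.values():
        for name in re.findall(r"<(\w+)>", text):
            if name not in values:
                missing.add(name)
    return sorted(missing)


# Hypothetical text of each parameterized operation.
operations = {
    "dataset": "file:///datasets/flights/airlines<AnnualAirlinesData>.csv",
    "filter": "eq(DayOfWeek_integer, <Day>)",
    "export": "airlines-<DayOfWeek>.csv",
}

# DayOfWeek has deliberately been left without a value.
values = {"AnnualAirlinesData": "2016", "Day": "2"}

print(unassigned_parameters(operations, values))  # ['DayOfWeek']
```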
