User-defined functions (UDFs) are programs that accept parameters and perform an operation to produce a value. Xcalar includes some built-in UDFs in its default modules, which you can use to enhance your operations. You can create additional UDFs to further extend Xcalar's functionality.
When to run UDFs
You can run UDFs in these circumstances:
- When you execute a Map operation, you can specify a UDF as a Map function.
- When you import a data source, you run a UDF to parse the data. A UDF run during an import is called an import UDF.
- When you export data from Xcalar, you run a UDF to process the data as it is written to a target. A UDF run when data is exported is called an export UDF.
Understanding UDF names
Functions are defined in modules, so both the module name and the function name are needed to uniquely identify a UDF. Xcalar Design shows UDF names in the following format:
module_name:function_name
For example, if a module named calendar contains a function named getWeekday, the name for the UDF is calendar:getWeekday.
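As an illustration, the module below is a minimal sketch of what a module uploaded under the name calendar might contain so that Xcalar Design lists calendar:getWeekday. The function body, the YYYY-MM-DD input format, and the weekday names are assumptions made for this example, not a built-in Xcalar module. The code avoids Python 3-only syntax so it also runs under Python 2.7.

```python
# Hypothetical contents of a module uploaded under the name "calendar".
import datetime

# Index 0 is Monday, matching datetime.weekday().
WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def getWeekday(dateString):
    # Assumes dates arrive as strings such as "2024-01-01" (an assumption).
    parsed = datetime.datetime.strptime(dateString, "%Y-%m-%d")
    # weekday() returns 0 for Monday through 6 for Sunday.
    return WEEKDAY_NAMES[parsed.weekday()]
```

Applied in a Map operation to a column holding "2024-01-01", this sketch would produce "Monday".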
Xcalar Design does not restrict function names, but module names must follow these naming conventions:
- Characters must be a-z, 0-9, hyphen (-), and underscore (_).
- Letters must be lowercase.
- First character must be a letter or an underscore.
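The three naming rules above can be expressed as a single regular expression. The following is a hedged sketch for checking a module name before you upload it; is_valid_module_name is a hypothetical helper, not part of Xcalar, and Xcalar itself may apply additional checks (such as a length limit) that are not described here.

```python
import re

# First character: lowercase letter or underscore.
# Remaining characters: lowercase letters, digits, hyphens, or underscores.
# \Z (rather than $) rejects names with a trailing newline.
MODULE_NAME_PATTERN = re.compile(r"[a-z_][a-z0-9_-]*\Z")


def is_valid_module_name(name):
    """Return True if name follows the module naming conventions above."""
    return MODULE_NAME_PATTERN.match(name) is not None
```

For example, calendar, calendar2, and my-udfs pass the check, while Calendar (uppercase) and 2calendar (leading digit) do not.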
Limitations and restrictions
The following list describes the UDF limitations and restrictions:
- UDFs must be written in Python.
- The Python version supported by Xcalar is 2.7. Do not use code in the UDF that requires a Python 3 interpreter.
- The data type for values returned by a UDF is always String. (UDFs, however, can accept arguments of different data types.)
- Each UDF can have up to 16 parameters.
- Python code errors occurring during operations are not reported in Xcalar Design.
- UDFs cannot be used for aggregate operations.
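Because a UDF's result is always a String while its arguments can be other types, it can help to format the result explicitly. The following Map UDF is a minimal sketch under assumptions: the name ratio, the two-decimal format, and returning "undefined" for a zero denominator are illustrative choices, not Xcalar behavior. It uses float() so that division behaves the same under Python 2.7, where dividing two integers truncates.

```python
def ratio(numerator, denominator):
    # Arguments may arrive as integers or floats; convert explicitly.
    if float(denominator) == 0:
        # Design choice for this sketch: return a marker string instead of
        # raising, because Python errors are not reported in Xcalar Design.
        return "undefined"
    # float() avoids Python 2.7 integer division (1 / 4 == 0).
    value = float(numerator) / float(denominator)
    # Format the result as a string, the only type a UDF returns.
    return "%.2f" % value
```

With these assumptions, ratio(1, 4) returns "0.25" and ratio(3, 0) returns "undefined". The sketch uses two of the up to 16 parameters a UDF can accept.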
About importing Python libraries
Python libraries imported in the UDF must be installed on every node before you can upload the UDF. For example, if your UDF imports the PDFminer package, the package must be installed on every node of the cluster. Otherwise, your attempt to upload the UDF results in an upload error, indicating that there is no module named PDFminer.
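To catch a missing package before an upload fails, you could run a small check like the following on each node, using the same Python interpreter that Xcalar uses (which interpreter that is depends on your installation and is an assumption here). REQUIRED_MODULES is a hypothetical list; replace it with the packages your UDFs import. A module that imports successfully is both installed and on the Python search path.

```python
import importlib

# Hypothetical list of packages your UDFs import; adjust for your code.
REQUIRED_MODULES = ["json", "datetime", "pdfminer"]


def find_missing_modules(module_names):
    """Return the names that this Python interpreter cannot import."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # This is a standalone check script, not a UDF, so printing is fine here.
    missing = find_missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing on this node: " + ", ".join(missing))
    else:
        print("All required modules are importable.")
```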
About standard output in UDFs
Anything that a UDF writes to standard output (stdout) is discarded. Do not use the print statement or the sys.stdout.write function to print debugging information in your UDF. If you need to collect debugging information, Xcalar recommends using the Python logging facility, which is a standard Python library module. See your Python documentation for information about the Python logging facility.
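The following is a minimal sketch of a UDF module that logs to a file instead of printing. The log file location, the logger name my_udf, and the normalizeAirline function are assumptions for illustration; choose a path that is writable on every node. The handler is attached only once, so reloading the module does not duplicate log lines.

```python
import logging
import os
import tempfile

# Hypothetical log location; pick a directory writable on every node.
LOG_PATH = os.path.join(tempfile.gettempdir(), "my_udf_debug.log")

logger = logging.getLogger("my_udf")
if not logger.handlers:
    handler = logging.FileHandler(LOG_PATH)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)


def normalizeAirline(name):
    # Log the input instead of printing it, because stdout is discarded.
    logger.debug("normalizeAirline called with %r", name)
    result = name.strip().upper()
    logger.debug("normalizeAirline returning %r", result)
    return str(result)
```

Each call appends two DEBUG lines to the log file, which you can then inspect on the node where the UDF ran.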
Invoking modules or functions in a module
You can invoke third-party Python modules in UDFs, but they must be installed in a location that the Python interpreter can access; that is, they must be on the Python search path (sys.path).
The following list describes the restrictions on invoking UDFs:
- In a UDF, you cannot invoke another uploaded module. For example, if you created and uploaded a module named calendar, you cannot invoke calendar in another module named calendar2.
- You can nest functions in a Python module. However, the nested function cannot be called as a UDF in Xcalar Design. For example, if the calendar module defines a function named function1, which in turn defines a function named function2, the UDF named calendar:function1 is available for you to use in Xcalar Design, but function2 is not.
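The nesting rule can be illustrated with a module like the one below, a hypothetical sketch of the calendar module described above. Only the top-level function1 would be listed as calendar:function1; function2 exists only inside function1's body. The date-normalizing logic is an assumption chosen for the example.

```python
# Hypothetical contents of a module uploaded under the name "calendar".
def function1(dateString):
    # function2 is nested inside function1, so Xcalar Design lists
    # calendar:function1 but not function2.
    def function2(text):
        # Trim whitespace and normalize "/" separators to "-".
        return text.strip().replace("/", "-")

    return function2(dateString)
```

If you also need function2's logic as a UDF in its own right, defining it at module level, like function1, should make it available, because top-level functions are the ones Xcalar Design exposes.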
How to enter input for a UDF
Like other Map functions, a UDF accepts input that you enter in the MAP panel.
- By default, the name of the column from which you start the function is displayed as an argument. If you want to use another column as an argument, you can change the argument to the name of another column.
- When typing a column name, always start with the dollar sign ($).
- When typing a column name, always include the prefix if one exists. (A derived column does not have a column name prefix.)
- If the panel calls for the resultant column name, type the name without preceding it with the dollar sign ($).
- To avoid mistakes when typing a column name as an operand, place the insertion point in the argument field, move your cursor to the active table and click anywhere in the desired column. The column's name is automatically entered in the operand field.
- If an operand accepts a column name, it also accepts the aggregate name that you created when you calculated the aggregate value for a column.
- When typing a decimal number for an operand, do not start with the decimal point. For example, type 0.5 instead of .5.
- If necessary, click in the panel to bring the specified table into focus. You can then scroll the table to make the desired column visible.
- When typing a string, do not enclose it in double quote marks ("), unless the string could otherwise be interpreted as a numeric or Boolean value.
- When typing a string as input, escape the following characters with a backslash (\) if you want them to be interpreted literally:
  - backslash (\)
  - double quote mark (")
  - dollar sign ($) if it is the first character
  - caret (^) if it is the first character
- You can undo any typing with the undo key or key combination that you normally use in your browser.
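If you prepare literal strings outside Xcalar Design (for example, in a script that generates input values), the escaping rules above can be sketched as follows. escape_map_string is a hypothetical helper; it follows only the rules listed in this section and may not match every detail of how the MAP panel parses input.

```python
def escape_map_string(text):
    """Escape a literal string according to the MAP panel rules above."""
    # Escape backslashes first so the backslashes added next are not doubled.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    # A dollar sign or caret needs escaping only as the first character.
    # (A leading $ would otherwise be read as the start of a column name.)
    if text[:1] in ("$", "^"):
        escaped = "\\" + escaped
    return escaped
```

For example, $price becomes \$price, while cost in $ is left unchanged because its dollar sign is not the first character.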
When you enter a string that cannot be interpreted as another data type (for example, airlines), do not surround the string with quote marks. If the input can be interpreted as a numeric or Boolean value, surround the input with quote marks to indicate that the input is meant to be a string. In the following examples, the input is interpreted as a string:
"3"
"true"
Without the quote marks, 3 is interpreted as an integer, and true is interpreted as a Boolean value.
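The quoting rule can be approximated in code when you generate input values programmatically. The sketch below is hypothetical: it assumes that only plain integers and decimals (such as 3 or 0.5) are read as numbers and that only the lowercase literals true and false are read as Boolean values. Xcalar's actual parsing may recognize other forms, so treat this as an approximation of the rule.

```python
import re

# Assumption: plain integers and decimals with a leading digit count as numeric.
NUMERIC_PATTERN = re.compile(r"-?[0-9]+(\.[0-9]+)?\Z")
# Assumption: only lowercase true and false count as Boolean literals.
BOOLEAN_LITERALS = ("true", "false")


def quote_if_ambiguous(text):
    """Wrap text in double quotes if it could be read as a number or Boolean."""
    if text in BOOLEAN_LITERALS or NUMERIC_PATTERN.match(text):
        return '"' + text + '"'
    return text
```

Under these assumptions, 3 and true are returned quoted, while airlines is returned unchanged.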