Understanding UDFs

User-defined functions (UDFs) are programs that accept parameters and perform an operation to produce a value. The Xcalar default modules include some built-in UDFs that you can use in your operations. You can create additional UDFs to further extend the functions of Xcalar.

When to run UDFs

You can run UDFs in these circumstances:

  • When you execute a Map operation, you can specify a UDF as a Map function.
  • When you import a data source, you run a UDF to parse the data. A UDF run during an import is called an import UDF.
    EXAMPLE: If you want to import data in a format not natively supported by Xcalar, run an import UDF to convert that format into a natively supported format.
    EXAMPLE: If your data source uses a comma in field names, which is an illegal field name character in Xcalar Design, run an import UDF to replace the comma when importing the data source.
  • When you export data from Xcalar, you run a UDF to process the data as it is written to a target. A UDF run when data is exported is called an export UDF.
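The comma example above could look roughly like the following sketch. It is an illustration only: the function names clean_field_name and parse_records, the underscore replacement, and the way the data reaches the function (any iterable of text lines, such as an open file) are assumptions, because this section does not describe the signature Xcalar expects from an import UDF.

```python
import csv


def clean_field_name(name):
    # Replace the comma, which is illegal in Xcalar Design field names.
    # Using an underscore as the replacement is an arbitrary choice.
    return name.replace(",", "_")


def parse_records(in_stream):
    # Hypothetical parser: in_stream is any iterable of CSV text lines.
    reader = csv.reader(in_stream)
    header_row = next(reader, None)
    if header_row is None:
        return  # empty input, nothing to yield
    header = [clean_field_name(name) for name in header_row]
    for row in reader:
        # One dictionary per record, keyed by the cleaned field names.
        yield dict(zip(header, row))
```

Only the field names are rewritten; commas inside field values are left untouched.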

Understanding UDF names

Functions are defined in modules. Therefore, to uniquely identify a UDF, both the module name and function name are needed. Xcalar Design shows UDF names in the following format:

module_name:function_name

For example, if a module named calendar contains a function named getWeekday, the name for the UDF is calendar:getWeekday.
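As an illustration, a module saved under the name calendar might contain a function like the one below, which would then appear as calendar:getWeekday. The function body is a sketch; the YYYY-MM-DD input format and the English day names are assumptions, not part of Xcalar.

```python
import datetime

# Day names indexed by datetime.weekday(), where Monday is 0.
_DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
              "Friday", "Saturday", "Sunday"]


def getWeekday(date_text):
    # Assumes the input looks like "2020-01-01" (YYYY-MM-DD).
    parsed = datetime.datetime.strptime(str(date_text), "%Y-%m-%d")
    return _DAY_NAMES[parsed.weekday()]
```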

IMPORTANT: For an export UDF, the function you want to run during the export must be named main. For example, you can have a module named my_module containing several functions. But there must be a function named main in my_module that will be executed when you export data to the UDF. If the module does not contain a function named main, you cannot select the module as an export target.

UDF naming conventions

Xcalar Design does not have restrictions on function names, but modules must follow the module naming conventions.

Limitations and restrictions

The following list describes the UDF limitations and restrictions:

  • UDFs must be written in Python.
  • The Python version supported by Xcalar is 2.7. Do not use code in the UDF that requires a Python 3 interpreter.
  • The data type for values returned by a UDF is always String. (UDFs, however, can accept arguments of different data types.)
  • Each UDF can have up to 16 parameters.
  • Python code errors occurring during operations are not reported in Xcalar Design.
  • UDFs cannot be used for aggregate operations.

NOTE: UDFs have the same read and write privileges as the user that starts Xcalar. UDFs do not need to have read or write access to the file system, including the data source from which Xcalar imports data.
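The following sketch shows one way to write a UDF with these limitations in mind. The function name totalCents is hypothetical. It converts its result to a string explicitly, and it catches conversion errors itself because Python errors are not reported in Xcalar Design. Returning an empty string on bad input is a design choice made for this example, not an Xcalar requirement.

```python
def totalCents(dollars, cents):
    # Arguments may arrive as numbers or as numeric strings.
    try:
        return str(int(dollars) * 100 + int(cents))
    except (TypeError, ValueError):
        # Errors are not surfaced in Xcalar Design, so return a
        # recognizable value instead of raising.
        return ""
```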

About importing Python libraries

Python libraries imported in the UDF must be installed on every node before you can upload the UDF. For example, if your UDF imports the PDFminer package, the package must be installed on every node of the cluster. Otherwise, your attempt to upload the UDF results in an upload error, indicating that there is no module named PDFminer.
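Before uploading, you may want to confirm on each node that the libraries are importable. The helper below is a minimal sketch that uses the standard importlib module. The name missing_modules is hypothetical, and the script must be run with the same Python interpreter that Xcalar uses on that node.

```python
import importlib


def missing_modules(names):
    # Return the subset of module names that cannot be imported here.
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

An empty list returned for the names your UDF imports means that the interpreter on that node can find all of them.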

About standard output in UDFs

All functions and statements that involve writing to standard output (stdout) are ignored by the Python interpreter. Do not use the print statement or the sys.stdout.write function to print debugging information in your UDF. If collecting debugging information is important to you, Xcalar recommends that you use the Python logging facility, which is a standard Python library module. See your Python documentation for information about the Python logging facility.
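A minimal sketch of the logging approach follows. The log file location, the logger name, and the example UDF normalizeCode are all assumptions; choose a path that the user who starts Xcalar can write to.

```python
import logging
import os
import tempfile

# Hypothetical log location in the system temporary directory.
LOG_PATH = os.path.join(tempfile.gettempdir(), "udf_debug.log")


def get_logger(name, path=LOG_PATH):
    logger = logging.getLogger(name)
    if not logger.handlers:
        # Attach the file handler only once per logger.
        handler = logging.FileHandler(path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.DEBUG)
    return logger


def normalizeCode(code):
    # Example UDF: writes debugging information to the log file
    # instead of standard output.
    log = get_logger("normalize_udf")
    log.debug("normalizeCode called with %r", code)
    return str(code).strip().upper()
```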

Invoking modules or functions in a module

You can invoke third-party Python modules in UDFs, but they must be installed at a location accessible to the Python interpreter. That is, they must be on the Python search path.

The following list describes the restrictions on invoking UDFs:

  • In a UDF, you cannot invoke another uploaded module. For example, if you created and uploaded a module named calendar, you cannot invoke calendar in another module named calendar2.
  • You can nest functions in a Python module. However, the nested function cannot be called as a UDF in Xcalar Design. For example, if the calendar module defines a function named function1, which in turn defines a function named function2, the UDF named calendar:function1 is available for you to use in Xcalar Design, but function2 is not.
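The nesting case can be sketched as follows, using the function names from the example above. The function bodies are illustrative only.

```python
def function1(value):
    # Module-level function: available as calendar:function1.

    def function2(text):
        # Nested function: callable only from inside function1,
        # so it is not available as a UDF.
        return text.strip().lower()

    return function2(str(value))
```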

How to enter input for a UDF

A UDF accepts input in the same way as other Map functions: you enter the input in the MAP panel.

When you enter a string that cannot be interpreted as another data type (for example, airlines), do not surround the string with quote marks. If the input can be interpreted as a numeric or Boolean value, surround the input with quote marks to indicate that the input is meant to be a string. In the following examples, the input is interpreted as a string:

  • "3"
  • "true"

Without the quote marks, 3 is interpreted as an integer, and true is interpreted as a Boolean value.
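To see which type an argument has inside the function, a small diagnostic UDF such as the sketch below can help. The name describeArg is hypothetical, and the example assumes that Xcalar passes the value to Python with the data type described above.

```python
def describeArg(value):
    # Return the Python type name of the argument as a string.
    return type(value).__name__
```

Called directly in Python, describeArg("3") returns str, describeArg(3) returns int, and describeArg(True) returns bool.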

 
