User-defined function (UDF) examples
This section provides examples for these UDF typesf:
- UDF for a database operation.
- UDF for importing data, which is called an import UDF.
- UDF for exporting data, which is called an export UDF.
Example of a UDF for a database operation
Suppose you have the following table in Xcalar Design showing some events and their dates, and you want to add a column that shows the day of week for each event. You can write a UDF to create the Weekday column.
Coding the UDF
Write the following code in a module named calendar:
def getWeekday(column, format):
parsedTime = time.strptime(column, format)
weekday = time.strftime("%A", parsedTime)
The getWeekday function takes a column and a date-time format string as input. The Python built-in time module, which is imported into calendar, parses the input. The output from calendar is a new table with the Weekday column showing the day of week for each event.
After you upload the calendar module, it is available for all Xcalar Design users to use.
Running the UDF
In Xcalar Design, start a Map operation from the Date column. Select user-defined as the category and then select calendar:getWeekday as the function. Specify the Date column ($Date) as the first argument and "%Y-%m-%d" as the second argument.
The process for running a UDF is the same as other database functions. For more information about the Map operation, see Using the Map function to create new values.
Result of running the UDF
The following table is created with an additional column showing the day of week for each date:
Example of using an import UDF when importing
If you want to import data not natively supported by Xcalar, you can run an import UDF that converts the data format as Xcalar imports the data. In the following example, the data source is an XML file, which you can convert into JSON with an import UDF.
Coding the import UDF
Write the following code in a module named convertxmltojson. The source data to be converted is shown as comments in this example to illustrate how the UDF works.
# Sample XML document # <?xml version="1.0" encoding="UTF-8"?> # <bookstore> # <book category="cooking"> # <title lang="en">Everyday Italian</title> # <author>Giada De Laurentiis</author> # <year>2005</year> # <price>30.00</price> # </book> # <book category="children"> # <title lang="en">Harry Potter</title> # <author>J K. Rowling</author> # <year>2005</year> # <price>29.99</price> # </book> # <book category="web"> # <title lang="en">Learning XML</title> # <author>Erik T. Ray</author> # <year>2003</year> # <price>39.95</price> # </book> # </bookstore> import xmltodict import json def xmlToJson1(fullPath, xmlStream): allJson = xmltodict.parse(xmlStream.read()) for book in allJson["bookstore"]["book"]: yield book
This module imports a third-party module named xmltodict and defines a function named xmlToJson1. This function has two parameters:
- The xmlStream parameter is a string containing the full contents of the data source.
- The fullPath parameter is a string, which is the full pathname to the data source.
The return value of the function will be used in the place of the actual file contents.
Upload this module so that it is available for execution.
Running the import UDF
In the Import Data Source window, after you specify the pathname to the data source, follow these steps:
- Select the Parse Data With UDF check box.
- Select the convertxmltojson module.
- Select the xmlToJson1 function.
For more information about importing a data source, see Importing data from a data source.
Result of running the import UDF
Suppose the data source is a file named bookstore. Its path, which you specified in the Data Source Path field, is passed to the UDF as the fullPath parameter. The contents of bookstore are passed to the UDF in xmlStream. The function returns a string containing the corresponding JSON data being imported. If you create a table with all columns from the dataset, your worksheet contains the following table:
The following example illustrates how to create and run a UDF to export a table (either from a worksheet or a batch dataflow) to Excel files instead of csv files,
Coding the export UDF
The sample code provided in this section can be saved in a module (for example, write_to_excel). The following list provides information explaining the code, which helps you write your own export UDF:
- There must be a function named main in the module. This function is run when you export data to this UDF.
- The main function must take one argument, which is the stringified JSON with these fields: filePath and fileContents.
- /nas/exports_excel/carriers is the path specified in the Additional Information section of the Create Export Target modal window. It specifies the folder containing the export files.
- carriers (after the last path separator) is the same as the folder name.
- nXXXX-cXXXX-pXXXX is an autogenerated string to ensure that each export file name in the folder is unique. The X in this example is replaced with a numeral in the actual name.
- csv is the default file extension. The sample UDF in this section replaces the csv extension with xlsx.
- The fileContents parameter is a string representing the output string as though it were to be written as a csv.
The filePath parameter takes the export file system path specified during the export target creation and then creates an export file name based on the path. The following path is an example of an export file name:
The following list explains the components of the path:
For example, if you were to write a UDF that creates a file at filePath and writes the file contents into it without modifications, the file would be identical to the csv file exported directly without a UDF.
The line and field delimiters in the fileContents string obey the specifications in the Advanced Options section of the EXPORT TABLE panel, which you entered before starting the export operation.
import openpyxl import json import os def main(inStruct): # Get the fileName and fileContents from the argument s = json.loads(inStruct) fileName = s["filePath"] fileContents = s["fileContents"] # Create excel workbook wb = openpyxl.Workbook() # File name removes the .csv and replaces it with .xlsx folderName = fileName[:fileName.rfind("/")]; # Create the folder if it doesn't already exist if not os.path.exists(folderName): os.makedirs(folderName) fileName = fileName.replace(".csv", ".xlsx") ws = wb.active fieldDelim = "\t" # Change this to match the delimiter that you chose in # the export panel rowNum = 0 for row in fileContents.split("\n"): rowNum += 1 colNum = 0 for col in row.split(fieldDelim): colNum += 1 ws.cell(column=colNum, row=rowNum, value=str(col)) wb.save(filename=fileName)
Creating the export target
For detailed information about how to create an export target, see Creating an export target. The following list provides an outline of the steps:
- Click the Export Targets icon to display the Export Targets panel.
- Type the target name. For example, you can specify ExcelTarget as the target name.
- Select UDF as the export target.
- In the Additional Information section, enter the path to the file system in which the UDF will write its output. For example, enter /nas/exports_excel, which is a shared directory that
all nodes of the Xcalar clustercan write to.
- Select the write_to_excel module, and leave the UDF function name as main.
- Click ADD.
Running the export UDF
The UDF is run when you export data from Xcalar to the target named ExcelTarget.
As a result of exporting data from Xcalar to Exceltarget, files are written to your specified file system. In this example, one or multiple files with the .xlsx file extension are created in the /nas/exports_excel folder. Each Excel file contains a number of rows from the Xcalar table. (A Xcalar table can be much larger than a typical Excel file. Splitting the table into smaller files not only speeds up the export operation, but also makes the exported data easier to read.)