|Creation date||Minimum version|
Before the data modeling process, you might want to determine whether all records in your dataset comply with the business rules set forth by your organization. A business rule is a statement specifying the constraint that a record must not violate for the record to be considered valid. For example, for analyzing flight information, your organization might have a business rule stating that data collected on Sunday is not valid, or that a carrier code from a defunct airline must not be analyzed, and so on. A business rule inventory is a collection of rules.
After separating out the records that violate the business rules, you can focus your data modeling effort on records that are compliant. Alternatively, you can choose to fix the errors in the records that violate the rules.
The BRI extension creates two UDFs for you to use as Map functions. One UDF identifies the records in a table that violate the business rules defined in a data file. Another UDF displays a description for each violation according to the rule specifications. The latter helps you determine what further actions should be taken for the violations.
This document describes how to use the BRI extension available in the Extensions Marketplace.
The major steps for using a business rule inventory against your data are as follows:
- Create a BRI data file that specifies the business rules.
- Create a table in Xcalar Design from the BRI data file.
- Execute the BRI extension to create two UDFs.
- Apply the first UDF to a target table to locate the rule violations.
- (Optional) Apply the second UDF to obtain a description of each violated rule.
- Derive additional information from the UDF results.
Creating a BRI data file
Before invoking the BRI extension, create a file in which you define the business rules. Because this file will be used by Xcalar Design to create a table, save the file in a format that Xcalar Design can process, such as CSV or Excel.
This file must contain the fields shown in the following table. Enter the field names in the first row. It is important that you use the exact field names, which are not case-sensitive. Then enter one business rule per row in the file.
|Rule Description||A comment explaining the purpose of the rule (for example, "A person's age cannot be a negative number.")|
A Python expression that specifies the rule. You must use a syntax suitable for a string. For example, the following expression evaluates to True if the DayOfWeek argument is not equal to "7":
DayOfWeek != "7"
|Rule Id||An ID that references the rule (for example, R00001).|
|Argument Names||Name of argument used in the rule string, which will be replaced by the argument passed to the UDF when the UDF is executed.|
|Category||The category to which a group of rules belong (for example, a category named Currency is for all the rules that are related to currency).|
The following Excel file is an example of a BRI data file specifying the business rules for airlines data:
Creating a BRI table in Xcalar Design
After you are finished with all the rules, use Xcalar Design to point to the BRI data file to create a table. In the Point to Data Source window, follow these steps:
- Select Promote the first row as header so that the resulting BRI table has the Rule Id, Rule Category, Rule Description, Rule String, and Argument Names as column headers.
- If the BRI data file is a CSV file, you must leave the Quoting Character field empty.
- Click CREATE DATASET.
- In the Dataset Preview window, select all columns and then click CREATE TABLE. The BRI table is created in the active worksheet and is displayed with all the columns from the BRI data file.
Running the BRI extension
Follow these steps to create the business rules inventory UDF:
- From the BRI table, invoke the My Extensions panel. Then start the Business Rules Inventory extension in the BizRules module.
- In the My Extensions panel, click the arrow in the BRI Table field to display a drop-down menu, from which you can select the BRI table.
- Click APPLY. A message appears to show that the BRI extension successfully creates and uploads the business rules inventory UDF to the cluster. The module name for the UDF includes the user name to indicate who creates the UDF (for example, bri_user1:business_rules_inventory), as shown in the following screenshot. Click CLOSE to dismiss the message.
Now the UDF is ready for you to use on any target table in your workbook to check for business rule violations.
Applying the business rules inventory UDF
To determine whether records in a particular table violate any rules specified in a BRI table, apply the business rules inventory UDF as a Map function as follows:
- From a column drop-down menu, start the MAP panel.
- In the MAP panel, select user-defined as the category, and then select the business rules inventory UDF (for example, bri_user2:business_rules_inventory).
- For each column that you want to test against a business rule, enter the column name to map it to the argument name in the BRI table. For example, suppose an argument name in the BRI table is DayOfWeek, and a rule is specified to determine whether its value is equal to 7. You can map the column name DayOfWeek_integer to it. This means that the rule specified for the DayOfWeek argument in the BRI table will be applied to the column named DayOfWeek_integer.
- In the New Resultant Column Name field, accept the default name or enter a new name for the resultant column. This column contains the resultant values created by the UDF, which indicate whether a rule is violated.
- Click MAP.
The UDF creates a resultant column. For each row in this column, the results for applying all the rules are returned in a comma-delimited string. The results for rule 1 will be the first value in the string, the second at the next position, and so on. A 0 means that the row is valid. A violation is denoted with the Rule Id instead of a 0.
Applying the convert from rule ID UDF (optional)
If a row shows a rule violation, you can see the Rule Id in the column created by the business rules inventory UDF. However, it would be more useful if a description of the violated rule is displayed. Follow these steps to create another column to display information about the business rules:
- From the resultant column created by the business rules inventory UDF, start the MAP panel.
- In the MAP panel, select user-defined as the category, and then select the convert from rule ID UDF (for example, bri_user2:convertFromRuleId).
- In the rules field, make sure that the column is the one containing the results from the business rules inventory UDF.
- In the New Resultant Column Name field, accept the default name or enter a new name for the resultant column. This column contains the descriptions corresponding to the rules that are violated.
- For example, in a table for airlines data, if a value in the DayOfWeek_integer is 7, and a rule specifies that day 7 is invalid, the following description is displayed:
Planes don't fly on Sunday (day 7).
If more than one rule is violated, the descriptions for the corresponding rules are separated by a comma. For example:
Planes don't fly on Sunday (day 7).,The carrier code ZZ is not valid.
Deriving additional information from the UDF results
The resultant column created by the business_rules_inventory UDF might contain multiple Rule Ids indicating which rules are violated. It is desirable to separate the Rule Ids into individual columns so that you can perform further modeling on each rule violation. For example, the column might contain the following string, showing that the record violates one of two rules:
Suppose you are interested in records that violate rule R00002, you can create a column showing only the results of R00002 as follows:
- In the resultant column created by the business_rules_inventory UDF, start the MAP panel.
- Select the cut function in the string category.
- In the field number field, type 2 because the field you are interested in, R00002, is the second field.
- In the delimiter field, type a comma (,).
- In the New Resultant Column Name field, accept the default or enter a name for the resultant column.
- Click MAP.
A new column is created with some cells containing a 0, meaning that the record is valid, and some cells containing R00002, meaning that the record violates rule R00002.
Repeat this procedure for each rule ID as necessary.
For each column created, you can obtain different types of information in the following ways:
- Use Profile to see the number of violations for each rule. You can also see the percentage of records that violate the rule in the Profile modal window. Profile is available from the column drop-down menu.
- Filter or exclude the rows that violate the rule. You can access the Filter this value or Exclude this value option by right clicking a cell in the column.
Optionally, to display a description for each Rule Id, you can run the convertFromRuleId UDF on each column. For example, in the column showing the R00002 violations, you can invoke the convertFromRuleId UDF to display the description of R00002. The following is an example of the description:
The carrier code ZZ is not valid.