Grouping data in a table

You can use the Group By function in a similar way as you would with the Group By clause in SQL.

Understanding the Group By operation

Before starting the Group By operation, determine what aggregate functions to use for the calculation. For example, when you use a table containing airlines information, you might be interested in the average arrival delay time and the sum of distance. You can select more than one function on which Xcalar performs the Group By operation.

You also select the columns to group on. For example, from the airlines table, you might want to see the arrival delay time and the sum of distance, which are grouped by the day of the week and the carrier name. In the resultant table, you can see the aggregate function results (average arrival delay and total distance) for each column that is grouped on. For example, one row of the resultant table might show that on day 1, the total distance flown by the carrier named XE is 37845 with an average arrival delay of 58.2.

Aggregate functions supported

The following aggregate functions are supported:

  • Average
  • Count
  • List aggregate
  • Maximum
  • Maximum integer
  • Minimum
  • Minimum integer
  • Sum
  • Sum integer

Understanding the listAgg (list aggregate) function

The listAgg function is a string function. It concatenates all strings in a column. The maximum number of characters in a result is 16,000. If the result exceeds this limit, the following error message is displayed:

Error: Value too large for defined data type

This function ignores null strings. For example, if a null string is concatenated with the string ABC, the result is ABC.

Grouping data by columns

Follow these steps to group data by one or multiple columns:

  1. Right click the header of the column that you want to group by to display the drop-down menu. Then select Group By….

  2. (Optional) In the GROUP BY panel, click ADD ANOTHER COLUMN to group on an additional column. For example, in the table about airlines, you can start the Group By function from the Carrier column and add the DayOfWeek column as a second column to group with. In the resultant table, data is grouped by both the Carrier and DayOfWeek columns.
  3. In the GROUP BY panel, follow these steps to select the function to perform for the Group By operation:

    1. Select the aggregate function.

    2. Enter the field name to which the function applies. For example, in the table for airlines information, you can specify the avg function be applied to the Distance field. This function calculates the average distance for the column being grouped on.
  4. Accept the default name for the new column or enter a new name.
  5. (Optional) If you want to apply another aggregate function to the Group By operation, follow these steps:

    1. Click that precedes Apply Additional Functions.
    2. Specify the function name and the column to which the function applies.
    3. Accept the default name for the new column or enter a new name.

    Repeat this step for each additional function.

  6. In the New table name field, enter the name of the resultant table.
  7. (Optional) Click Advanced Options to further customize the result of the Group By operation. The following table describes the options:

    Advanced option Description
    Include only erroneous rows This option filters only rows where the function cannot produce a value.
    Join table back to original This option joins the resultant table to the original table. The operation is a left outer join, with the original table being the left table. Joining back the table provides a convenient way to compare the values in the column that results from the Group By operation to values in the original column.
    Keep some columns

    By default, the resultant table contains only the new column created from the Group By operation and the columns on which you group. All columns in the resultant table, including the columns on which you group, become derived columns. The Data Browser for the resultant table contains only the derived fields.

    This option enables you to select additional columns to include in the resultant table. The column created by the Group By function is a derived column. All prefixed columns retain the column prefix.

    NOTE: This option is not applicable if you select Join table back to original because the table created by the Join operation automatically includes all the columns from the original table.
  8. Click GROUP BY.

The following screenshot shows an example of a table created by a multi-column, multi-function Group By operation, without joining back to the original table.

Go to top