Segment¶

A Segment is a subset of the data distribution that identifies a sub-domain inside the data. It is defined by a set of rules over data dimensions and metadata. A Task can include several Segments. There are no constraints about how they are specified: they can be disjoint, overlapping, or even nested. In other words, a sample can belong to all the Segments, none of them, or just some of them.

When Segments are specified for a Task, monitoring is performed both on the whole data, called all population, and for each Segment. The objective of a Segment is to allow the analysis of specific groups of data, whose variations might go unnoticed if only the whole population is monitored.

Segments, similarly to the Data Schema and the Model, must be defined before sending any data to the Platform. They must be created all at once, as they can't be modified upon creation. Additionally, their definition needs to happen after the creation of the Data Schema, as the rules of the Segment are based on the columns defined there, and also after the creation of the Model.

Segment Structure¶

The Segment structure is very simple, as it only requires the definition of two fields:

Field	Description
Name	The name of the Segment. It is used to display information related to the segment in the Web App.
Rules	A list of rules defining the subset of the population. These rules are applied in AND between them, which means that a sample belongs to the Segment if all the conditions expressed by the rules are satisfied. To prevent potential conflicts, it is not possible to define two rules over the same column.

Segments can be created both through the Web App and the SDK.

Segment Rules¶

A rule is a condition over a single data dimension that a specific sample must match to be considered part of the Segment. Each Segment has from one to several rules, which are applied in AND between them. A rule is defined by the following fields:

Field	Description
Column name	The name of the column in the Data Schema that the rule is applied to. A rule can be applied only on columns of role `INPUT`, `TARGET` and `METADATA`
Operator	The operator defining the rule. It can be either `IN`, meaning that the element matches the criteria, or `OUT`, indicating it does not satisfy the criteria.
Values	This field can have two possible meaning, according to the data type of the column specified in the rule: The data type is float: Values is a series of ranges [a, b] that define the numeric intervals over which the operator is applied. The range is closed, meaning that the extremes are always included. When operator is `IN`, the ranges are in OR, whereas, when the operator is `OUT` they are in AND. It is possible to set an open extreme by not specifying it (At least one of the extremes needs to be specified.) The data type is categorical or string: Values is a list which elements must match the content of the column. When operator is `IN`, the column value must be one of the specified elements, while, when operator is `OUT` it must not be one of them.

Examples¶

Let's consider a simple example where we have a dataset with the following columns:

Sample ID	X_0	X_1	Y	Metadata_1	Metadata_2
id_0	10	20	class_0	A1	B2
id_1	11	21	class_0	A2	B1
id_2	12	22	class_1	A1	B2
id_3	13	23	class_1	A2	B1
id_4	14	24	class_0	A1	B2

It represents a binary classification problem where the target is the column Y and the input features are X_0 and X_1. Columns Metadata_1 and Metadata_2 are metadata columns.

Let's now define some possible segments of increasing complexity.

A Segment that includes all samples where the value of the column X_0 is between 10 and 12:

Field	Value
Name	Segment_1
Rules	Only 1 rule is needed: Column name: X_0 Operator: IN Values: [10, 12]

This segment would include the samples with Sample ID equal to id_0, id_1 and id_2.

SDK Example

You can define the previous segment using the SDK with the following code:

client.create_task_segments(
            task_id=task_id,
            segments=[
                Segment(name=f'Segment 1',
                        rules=[
                            NumericSegmentRule(
                                column_name='X_0',
                                operator=SegmentOperator.IN,
                                values=[SegmentRuleNumericRange(start_value=10, end_value=12)]
                            )
                        ]
                )
            ]
        )

A Segment that includes all samples where the value of the column X_0 is greater or equal than 13 and the value of the column X_1 is strictly less than 24:

Field	Value
Name	Segment_2
Rules	We need 2 rules: Column name: X_0 Operator: IN Values: [13, +inf] Column name: X_1 Operator: IN Values: [-inf, 23]

This segment would include the sample with Sample ID equal to id_3.

SDK Example

You can define the previous segment using the SDK with the following code:

client.create_task_segments(
            task_id=task_id,
            segments=[
                Segment(name=f'Segment 2',
                        rules=[
                            NumericSegmentRule(
                                column_name='X_0',
                                operator=SegmentOperator.IN,
                                values=[SegmentRuleNumericRange(start_value=13)]
                            ),
                            NumericSegmentRule(
                                column_name='X_1',
                                operator=SegmentOperator.IN,
                                values=[SegmentRuleNumericRange(end_value=23)]
                            )
                        ]
                )
            ]
        )

Notice how leaving one end of the interval empty means that the interval is unbounded in that direction.

A Segment that includes all samples where the value of the column X_0 either lower or equal than 10 or greater or equal than 14 and the value of the metadata column Metadata_1 is equal to A1 or A3:

Field	Value
Name	Segment_3
Rules	We need 2 rules: Column name: X_0 Operator: IN Values: [-inf, 10], [14, +inf] Column name: Metadata_1 Operator: IN Values: [A1, A3]

This segment would include the samples with Sample ID equal to id_0 and id_4. Notice that, even though there is no sample with the value of Metadata_1 equal to A3, there are still samples belonging to the segment because the values of the rules are applied in OR between them.

SDK Example

You can define the previous segment using the SDK with the following code:

client.create_task_segments(
            task_id=task_id,
            segments=[
                Segment(name=f'Segment 3',
                        rules=[
                            NumericSegmentRule(
                                column_name='X_0',
                                operator=SegmentOperator.IN,
                                values=[SegmentRuleNumericRange(end_value=10), SegmentRuleNumericRange(start_value=14)]
                            ),
                            CategoricalSegmentRule(
                                column_name='Metadata_1',
                                operator=SegmentOperator.IN,
                                values=['A1', 'A3']
                            )
                        ]
                )
            ]
        )

A Segment that includes all samples where the value of the column X_1 is not between 21 and 23, the value of the target y_0 is equal to class_0 and the value of the metadata column Metadata_2 is different from B1:

Field	Value
Name	Segment_4
Rules	We need 3 rules: Column name: X_1 Operator: OUT Values: [21, 23] Column name: y_0 Operator: IN Values: [class_0] Column name: Metadata_2 Operator: OUT Values: [B1]

This segment would include the samples with Sample ID equal to id_2 and id_4.

SDK Example

You can define the previous segment using the SDK with the following code:

client.create_task_segments(
            task_id=task_id,
            segments=[
                Segment(name=f'Segment 3',
                        rules=[
                            NumericSegmentRule(
                                column_name='X_1',
                                operator=SegmentOperator.OUT,
                                values=[SegmentRuleNumericRange(end_value=21), SegmentRuleNumericRange(start_value=23)]
                            ),
                            CategoricalSegmentRule(
                                column_name='y_0',
                                operator=SegmentOperator.IN,
                                values=['class_0']
                            ),
                            CategoricalSegmentRule(
                                column_name='Metadata_1',
                                operator=SegmentOperator.IN,
                                values=['A1']
                            )
                        ]
                )
            ]
        )