
Task

A Task is the third and last organizational entity in ML cube Platform. A Task represents a standard artificial intelligence task such as regression, classification, text generation, or object detection.

A Task is associated with a Model and a Data schema that describes all the information about the data in the Task.

Each Task has a unique identifier that the SDK uses to operate on it. The identifier can be retrieved from the Task homepage or from the URL.

A Task has a status that summarizes the health of its AI model. The status is determined by the Monitoring module and can be Ok, Warning, or Drift; it changes when the Monitoring module detects drift on the monitored quantities.

The Task homepage also contains a section named "Data events", which shows the most recent detection events generated by the Monitoring module. You can click View to see more details or discard the notification (the event remains available for future analysis on the Detection page).

Attributes

A Task is described by a set of attributes specified when it is created. Some attributes are common to every Task, while others depend on its type:

| Attribute | Description |
|---|---|
| Name | Name of the Task, unique within the Project. |
| Tags | Optional customizable list of tags, used to better describe the Task and to improve search. |
| Task type | Artificial intelligence type of the Task. Possible values are Regression, Classification (Binary, Multiclass, or Multilabel), Retrieval Augmented Generation (RAG), Object Detection, and Semantic Segmentation. |
| Data structure | Type of input data the Task uses. Possible values are: Tabular (standard table-based data used in contexts like regression or classification); Image (images in their different formats and channels); Text (textual data expressed as strings; when the data structure is Text, the Text language attribute is required); Embedding (arrays that can represent embeddings of either image or text data; this data structure is used when raw data are not shared with ML cube Platform). |
| Optional target | Boolean value that specifies whether the ground truth is always available. In some Tasks, the actual value is not present until explicit labeling is done. In these cases, the Task is marked as having an optional target so that ML cube Platform works accordingly. |
| Cost info | Optional information about costs; its form depends on the Task type. |
| Text language | Language used in the Task when the input data structure is Text. |
| Positive class | Required when the Task type is Binary Classification; it indicates the positive class to be predicted. |
| Context separator | Available when the Task type is RAG; it specifies the string separator used to split the retrieved context into different chunks. |
| Default answer | Available when the Task type is RAG; it specifies the default answer to be used when no retrieved context is available. |
| Target type | Available when the Task type is Semantic Segmentation; it specifies how targets and predictions are represented in this Task. Currently, the only available option is Polygon. See Semantic Segmentation details for more information. |

Warning

Some Task attributes are immutable: the Task type, data structure, and optional target flag cannot be modified after the Task is created.

Platform modules and Task Type compatibility

Most ML cube Platform operations are performed at the Task level: monitoring, retraining, analytics, and other features are specific to the AI models and data that belong to a Task. Each Task Type supports a specific set of ML cube Platform modules:

Module Regression Classification RAG Object Detection Semantic Segmentation
Monitoring
Explainability
Retraining
Topic Modeling
RAG Evaluation
LLM Security

Tip

The Task menu on the left side of the web app page contains links to the modules listed above and to the Task settings.

Task Type

ML cube Platform supports several Task Types and provides specific features for each of them. Not every Task Type is compatible with every data structure; the table below shows which data structures each Task Type supports:

Task Type Tabular Image Text Embedding
Regression
Classification
RAG
Object Detection
Semantic Segmentation

In the following sections, you can find a description of each Task Type with its specific information.

Regression

Supervised regression Task with continuous target.

Cost information

Cost information is expressed by two proportional coefficients \(c_{o}\) and \(c_{u}\):

  • \(c_{o}\) is the cost of overestimating the target value, i.e., when \(\hat{y} > y\)
  • \(c_{u}\) is the cost of underestimating the target value, i.e., when \(\hat{y} < y\)

Given a data batch of \(N\) samples, the mean cost \(\bar{C}\) is expressed as $$ \bar{C} = \frac{\sum_{i | \delta_i < 0} |\delta_i| \times c_{o} + \sum_{i | \delta_i > 0} \delta_i \times c_{u}}{N} $$ where \(\delta_i = y_i - \hat{y}_i\) is the difference between the target and the estimated value.
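The formula can be checked with a minimal Python sketch. The function name `regression_mean_cost` is hypothetical and not part of the ML cube Platform SDK; it only shows how \(c_{o}\) weights overestimates (\(\delta_i < 0\)) and \(c_{u}\) weights underestimates (\(\delta_i > 0\)).

```python
from typing import Sequence


def regression_mean_cost(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    c_o: float,
    c_u: float,
) -> float:
    """Mean cost of a batch: overestimates weighted by c_o, underestimates by c_u."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        delta = y - y_hat
        if delta < 0:    # overestimation: y_hat > y
            total += abs(delta) * c_o
        elif delta > 0:  # underestimation: y_hat < y
            total += delta * c_u
    return total / len(y_true)


# One overestimate by 2 (cost 2 * 1.0), one underestimate by 1 (cost 1 * 3.0), one exact
print(regression_mean_cost([10.0, 5.0, 7.0], [12.0, 4.0, 7.0], c_o=1.0, c_u=3.0))  # (2 + 3 + 0) / 3
```

Exact predictions (\(\delta_i = 0\)) add no cost but still count in \(N\), matching the denominator of the formula.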

Classification

Supervised classification Task with a discrete target. Classification Tasks are divided into:

  • Binary: when the target is a binary variable. For binary classification Tasks, the additional Positive class attribute must be specified to indicate which value is considered positive. For instance, in a fraud detection task, "1" can indicate that the sample is a fraud and "0" that it is not; in that case, the Positive class attribute is "1".
  • Multiclass: when the target is a categorical variable with more than two possible values, and exactly one value is assigned to each sample.
  • Multilabel: when the target is an array indicating which of the possible categories are present. In this case, each element can be either 0 or 1, and more than one element of the array can be 1.
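The three target forms can be illustrated with a short Python sketch. The helpers `to_binary` and `is_valid_multilabel` are hypothetical, not ML cube Platform SDK functions, and they assume labels are plain Python values; they only show how the Positive class attribute maps binary labels and what a valid multilabel array looks like.

```python
from typing import Hashable, List, Sequence


def to_binary(labels: Sequence[Hashable], positive_class: Hashable) -> List[int]:
    """Map raw binary labels to 1 (the positive class) or 0 (the other value)."""
    return [1 if label == positive_class else 0 for label in labels]


def is_valid_multilabel(target: Sequence[int], n_classes: int) -> bool:
    """A multilabel target has one 0/1 entry per category; several entries may be 1."""
    return len(target) == n_classes and all(v in (0, 1) for v in target)


# Binary: fraud detection where "1" marks a fraud, so the positive class is "1"
print(to_binary(["1", "0", "0", "1"], positive_class="1"))  # [1, 0, 0, 1]

# Multiclass: each sample carries exactly one label, e.g. "cat", "dog" or "bird"

# Multilabel: a sample can belong to categories 0 and 2 at the same time
print(is_valid_multilabel([1, 0, 1], n_classes=3))  # True
```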

Cost information

Cost information differs among the three classification types, but the concept is the same: a cost is associated with every kind of misclassification:

  • Binary:

    • \(c_{FP}\) is the cost of classifying a negative sample as positive
    • \(c_{FN}\) is the cost of classifying a positive sample as negative

    Given a data batch, the mean cost \(\bar{C}\) is expressed as $$ \bar{C} = \frac{N_{FP} \times c_{FP} + N_{FN} \times c_{FN}}{N} $$ where \(N_{FP}\) and \(N_{FN}\) are the number of false positives and false negatives respectively.

  • Multiclass:

    • \(c_{k}\) is the cost of misclassifying a sample whose actual class is \(k\) as another class

    Given a data batch, the mean cost \(\bar{C}\) is expressed as $$ \bar{C} = \frac{\sum_{k} N_{k} \times c_{k} }{N} $$ where \(N_{k}\) is the number of misclassified samples of class \(k\).

  • Multilabel:

    • \(c_{FP}^{k}\) is the cost of classifying a sample as class \(k\) when the actual class \(k\) is not present
    • \(c_{FN}^{k}\) is the cost of not classifying a sample as class \(k\) when the actual class \(k\) is present

    Given a data batch, the mean cost \(\bar{C}\) is expressed as $$ \bar{C} = \frac{\sum_{k} \left( N_{FP}^{k} \times c_{FP}^{k} + N_{FN}^{k} \times c_{FN}^{k} \right)}{N} $$ where \(N_{FP}^{k}\) and \(N_{FN}^{k}\) are the number of false positives and false negatives of class \(k\) respectively.
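The three mean-cost formulas can be sketched in Python as follows. The function names are hypothetical and do not belong to the ML cube Platform SDK; the sketch assumes binary and multiclass labels are plain Python values, multilabel targets are 0/1 lists, and per-class costs are passed as a mapping (multiclass) or as sequences indexed by class position (multilabel).

```python
from typing import Hashable, Mapping, Sequence


def binary_mean_cost(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    positive_class: Hashable,
    c_fp: float,
    c_fn: float,
) -> float:
    """(N_FP * c_FP + N_FN * c_FN) / N for a binary batch."""
    n_fp = sum(1 for y, p in zip(y_true, y_pred) if p == positive_class and y != positive_class)
    n_fn = sum(1 for y, p in zip(y_true, y_pred) if p != positive_class and y == positive_class)
    return (n_fp * c_fp + n_fn * c_fn) / len(y_true)


def multiclass_mean_cost(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    c: Mapping[Hashable, float],
) -> float:
    """sum_k N_k * c_k / N, where N_k counts misclassified samples of actual class k."""
    # Adding c[actual class] once per misclassified sample equals sum_k N_k * c_k
    total = sum(c[y] for y, p in zip(y_true, y_pred) if y != p)
    return total / len(y_true)


def multilabel_mean_cost(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[int]],
    c_fp: Sequence[float],
    c_fn: Sequence[float],
) -> float:
    """sum_k (N_FP^k * c_FP^k + N_FN^k * c_FN^k) / N over 0/1 label arrays."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        for k, (t_k, p_k) in enumerate(zip(t, p)):
            if p_k == 1 and t_k == 0:    # false positive for class k
                total += c_fp[k]
            elif p_k == 0 and t_k == 1:  # false negative for class k
                total += c_fn[k]
    return total / len(y_true)


print(binary_mean_cost(["1", "0", "0", "1"], ["1", "1", "0", "0"], "1", c_fp=1.0, c_fn=5.0))  # 1.5
print(multiclass_mean_cost(["a", "b", "c", "a"], ["a", "c", "c", "b"], {"a": 2.0, "b": 1.0, "c": 3.0}))  # 0.75
print(multilabel_mean_cost([[1, 0, 1], [0, 1, 0]], [[1, 1, 0], [0, 1, 1]], [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 5.5
```

In all three cases the denominator is the number of samples \(N\) in the batch, not the number of errors.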

Retrieval Augmented Generation

Retrieval Augmented Generation (RAG) is an AI task for Text data based on Large Language Models (LLMs), in which the model generates a response to a user query using a set of retrieved documents as context, so that the response is more precise and focused.

RAG Tasks do not have a target; therefore, the Optional target attribute is always set to True. In this Task the prediction is also text, while the input is composed of two entities:

  • User Input: the user query that the model needs to answer
  • Retrieved Context: the set of documents the retrieval engine selected to help the model

RAG Tasks have two additional attributes:

  • Context separator: a string used to split the retrieved context into chunks. Context data is sent as a single string, but in RAG settings multiple documents can be retrieved; the context separator is used to distinguish them. It is optional, since a single context can be provided.

    Example

    Context separator: <<sep>>

    Context data: The capital of Italy is Rome.<<sep>>Rome is the capital of Italy.<<sep>>Rome was the capital of Roman Empire.

    Contexts:

    - The capital of Italy is Rome.
    - Rome is the capital of Italy.
    - Rome was the capital of Roman Empire.
    
  • Default answer: a string used when no retrieved context is available. It is optional, since there are other ways to handle this situation.

    Example

    Default answer: "I am sorry, I cannot help you with that request."
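A minimal Python sketch of how the two RAG attributes could be interpreted on the client side. Both `split_context` and `respond` are hypothetical helpers, not ML cube Platform SDK functions, and discarding empty chunks (for example after a trailing separator) is an assumption rather than documented platform behavior.

```python
from typing import Callable, List, Optional


def split_context(context: str, separator: Optional[str] = None) -> List[str]:
    """Split the single context string into chunks using the context separator."""
    if not context:
        return []  # no retrieved context
    if separator is None:
        return [context]  # a single context, no separator configured
    # Assumption: empty chunks (e.g. from a trailing separator) are discarded
    return [chunk for chunk in context.split(separator) if chunk]


def respond(
    context: str,
    generate: Callable[[List[str]], str],
    default_answer: str,
    separator: Optional[str] = None,
) -> str:
    """Hypothetical wrapper: return the default answer when no context is available."""
    chunks = split_context(context, separator)
    if not chunks:
        return default_answer
    return generate(chunks)


data = ("The capital of Italy is Rome.<<sep>>Rome is the capital of Italy."
        "<<sep>>Rome was the capital of Roman Empire.")
print(split_context(data, "<<sep>>"))
print(respond("", lambda chunks: chunks[0], "I am sorry, I cannot help you with that request."))
```

Here `generate` stands in for whatever LLM call produces the answer; it is a placeholder so the fallback logic can run on its own.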

Object Detection

An Object Detection Task processes images and outputs a list of bounding boxes, each with an associated label indicating the type of the identified entity. Therefore, the target is a list of four-element tuples indicating the x_min, x_max, y_min, and y_max of each box, together with a string label for the entity type.
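The target structure described above can be sketched as a Python dataclass. `BoundingBox` is a hypothetical type, not part of the ML cube Platform SDK; it keeps the coordinate order stated in the text (x_min, x_max, y_min, y_max) and assumes coordinates are numbers in the image's coordinate system.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BoundingBox:
    """One entry of an Object Detection target: box coordinates plus entity label."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    label: str

    def __post_init__(self) -> None:
        # A well-formed box never has a min coordinate beyond its max coordinate
        if self.x_min > self.x_max or self.y_min > self.y_max:
            raise ValueError("min coordinates must not exceed max coordinates")

    def as_tuple(self) -> Tuple[float, float, float, float]:
        """The four-element tuple in the order x_min, x_max, y_min, y_max."""
        return (self.x_min, self.x_max, self.y_min, self.y_max)


# A target for one image: a list of labelled boxes
target = [
    BoundingBox(10, 50, 20, 80, "car"),
    BoundingBox(60, 90, 15, 40, "person"),
]
print([(box.as_tuple(), box.label) for box in target])
```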

Semantic Segmentation

A Semantic Segmentation Task processes images and outputs a list of the entities identified in the image, each with a label indicating its type. The target can assume different forms, so the Task attribute Target Type is used to specify it. When the target type is Polygon, each entity is represented as a list of vertices with x, y coordinates that define the polygon.
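A Polygon target can be sketched in Python as follows. `PolygonEntity` is a hypothetical type, not part of the ML cube Platform SDK, and it assumes the vertices are listed in boundary order. The `area` method uses the shoelace formula only to illustrate working with the vertex list; it is not something the platform is documented to compute.

```python
from dataclasses import dataclass
from typing import Tuple

Vertex = Tuple[float, float]


@dataclass(frozen=True)
class PolygonEntity:
    """One entity of a Semantic Segmentation target with Target Type Polygon."""
    label: str
    vertices: Tuple[Vertex, ...]  # (x, y) coordinates, listed in boundary order

    def __post_init__(self) -> None:
        if len(self.vertices) < 3:
            raise ValueError("a polygon needs at least three vertices")

    def area(self) -> float:
        """Area enclosed by the vertices, using the shoelace formula."""
        n = len(self.vertices)
        twice_area = 0.0
        for i in range(n):
            x1, y1 = self.vertices[i]
            x2, y2 = self.vertices[(i + 1) % n]  # wrap around to close the polygon
            twice_area += x1 * y2 - x2 * y1
        return abs(twice_area) / 2.0


# A target for one image: a list of labelled polygons
target = [PolygonEntity("road", ((0, 0), (4, 0), (4, 3), (0, 3)))]
print(target[0].area())  # 12.0
```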