# Derived ML Batch Feature

Derived Features are created by applying transformations to the data produced by raw features. They build on raw features to enable more complex data processing and feature engineering.

## Introduction

A Derived Feature is an additional layer on top of raw features:

* It can take multiple raw features as inputs and combine their values into a more meaningful result.
* It enables more sophisticated data transformations and enrichment in machine learning workflows.
* It applies built-in and advanced operations to the raw tabular data.

## Derived Feature Types

### Feature with built-in operations

Derived Features support a range of built-in operations such as `add`, `subtract`, `multiply`, and `safe_divide`. Each operation combines the raw data, applies the transformation, and adds a new column containing the transformed values to the dataframe.
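The behaviour described above can be sketched in plain Python. This is an illustration only, not the gru implementation: the `BUILT_IN_OPS` registry, the `apply_derived` helper, and the column names are hypothetical, and the rule that `safe_divide` returns `None` for a zero or missing denominator is an assumption.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]


def safe_divide(numerator: Optional[float], denominator: Optional[float]) -> Optional[float]:
    """Assumed semantics: return None instead of raising on a zero or missing denominator."""
    if numerator is None or denominator in (None, 0):
        return None
    return numerator / denominator


# Hypothetical registry of the four built-in operations named in the text.
BUILT_IN_OPS: Dict[str, Callable] = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "safe_divide": safe_divide,
}


def apply_derived(rows: List[Row], op: str, left: str, right: str, new_column: str) -> List[Row]:
    """Return copies of the rows with `new_column` set to op(row[left], row[right])."""
    fn = BUILT_IN_OPS[op]
    return [{**row, new_column: fn(row[left], row[right])} for row in rows]


# Two raw feature columns (clicks, impressions) combined into one derived column.
rows = [
    {"user_id": 1, "clicks": 30.0, "impressions": 120.0},
    {"user_id": 2, "clicks": 5.0, "impressions": 0.0},
]
result = apply_derived(rows, "safe_divide", "clicks", "impressions", "user_click_rate")
```

In this sketch the input rows are left unchanged and the derived values land in a new column, which mirrors the "adds a new column" behaviour described above.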

### Derived Feature Attributes

| Attribute                   | Description                                                                     | Example                           |
| --------------------------- | ------------------------------------------------------------------------------- | --------------------------------- |
| `name`                      | Unique name of the derived feature                                              | `user_click_rate`                 |
| `description`               | Human-readable description for easier understanding                             | `Click rate of users over time`   |
| `staging_sink`              | Data sink for staging the processed data                                        | `recommendation-data-sink-S3`     |
| `online_sink`               | Data sink for storing the processed data for online retrieval                   | `redis://192.168.1.100:6379`      |
| `data_type`                 | Data type of the derived feature                                                | `FLOAT`                           |
| `owners`                    | List of team members or teams responsible for the feature                       | `['data_team@company.com']`       |
| `schedule`                  | Schedule for computing the derived feature                                      | `daily`                           |
| `entity`                    | Entity to which the feature belongs                                             | `user_id`                         |
| `processing_engine`         | Engine used for processing the feature logic                                    | `Spark`                           |
| `processing_engine_configs` | Configuration options for the processing engine                                 | `{'num_partitions': 10}`          |
| `online`                    | Boolean flag indicating if the feature should be available for online retrieval | `True`                            |
| `offline`                   | Boolean flag indicating if the feature should be available for offline analysis | `True`                            |
| `transform`                 | Transformation logic applied to the raw feature values                          | `add(raw_feature1, raw_feature2)` |
| `start_time`                | Time from which feature computation should begin                                | `2024-01-01 00:00:00`             |
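To see the attributes side by side, the example values from the table can be collected into a plain Python dictionary. This is a sketch only and does not use the gru API. The `missing_attributes` helper is hypothetical, and treating every attribute in the table as expected is an assumption.

```python
# Illustrative only: a plain dict mirroring the attribute table, not a gru object.
derived_feature_spec = {
    "name": "user_click_rate",
    "description": "Click rate of users over time",
    "staging_sink": "recommendation-data-sink-S3",
    "online_sink": "redis://192.168.1.100:6379",
    "data_type": "FLOAT",
    "owners": ["data_team@company.com"],
    "schedule": "daily",
    "entity": "user_id",
    "processing_engine": "Spark",
    "processing_engine_configs": {"num_partitions": 10},
    "online": True,
    "offline": True,
    "transform": "add(raw_feature1, raw_feature2)",
    "start_time": "2024-01-01 00:00:00",
}

# Assumption: every attribute listed in the table is expected to be present.
EXPECTED_ATTRIBUTES = {
    "name", "description", "staging_sink", "online_sink", "data_type",
    "owners", "schedule", "entity", "processing_engine",
    "processing_engine_configs", "online", "offline", "transform", "start_time",
}


def missing_attributes(spec: dict) -> list:
    """Hypothetical helper: list the expected attributes absent from a spec."""
    return sorted(EXPECTED_ATTRIBUTES - set(spec))


missing = missing_attributes(derived_feature_spec)
```

A check like this can catch an incomplete definition before registration. The actual object construction is shown in the linked example file.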

### Special notes on attributes

* `feature_logic`: Derived Features support the operations `add`, `subtract`, `multiply`, and `safe_divide`.
* `processing_engine` and its configs: The Derived Feature runs with this [default set of PySpark configurations](https://github.com/Yugen-ai/gru/blob/main/gru/config/features/default_processing_engine_configs_batch.yaml).
* `online` flag: If the online flag is enabled, the data is ingested into the online sink (i.e., Redis cache).
* `offline` flag: If the offline flag is enabled, the data is ingested into the offline sink (i.e., S3 sink).
* Read and write option configs: These are configurations that users provide at feature registration time. This option is not enabled yet, but support will be added soon.
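The effect of the `online` and `offline` flags can be sketched as a small routing function. The function name `ingestion_targets` and the labels it returns are hypothetical and are not part of gru. The sketch only restates the rule from the notes above.

```python
from typing import List


def ingestion_targets(online: bool, offline: bool) -> List[str]:
    """Hypothetical helper: list the sinks that receive the derived feature data."""
    targets = []
    if online:
        # Online flag enabled: data goes to the online sink (Redis cache).
        targets.append("online_sink")
    if offline:
        # Offline flag enabled: data goes to the offline sink (S3).
        targets.append("offline_sink")
    return targets


both = ingestion_targets(online=True, offline=True)
online_only = ingestion_targets(online=True, offline=False)
```

With both flags enabled the data is written to both sinks. With only one flag enabled, only the matching sink receives it.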

### Example Derived Feature Object

* [Feature with built-in logic](https://github.com/Yugen-ai/gru/blob/main/gru/examples/create_derived_feature.py#L11-L38)

### Working with Derived Features

Once the derived feature is defined:

* It can be used as a standalone feature that combines and transforms raw features or other data sources.
* The output of a Derived Feature is used directly for machine learning model training or inference.
* It cannot be reused or referenced again in other features.

