# Raw ML Batch Feature

A Raw Feature in the Canso platform is a fundamental component that executes ML pipelines. It processes data from registered data sources, applies feature logic, and stores the results in data sinks.

## Introduction

Raw Features are the essential building blocks for machine learning models. A Raw Feature:

* Prepares data consistently for both the training and inference stages.
* Applies standardized predefined logic or user-defined functions (UDFs).

## Raw Feature Types

### Feature with Predefined Logic

Raw Features with predefined logic use built-in transformations and aggregations for ease of use and consistency. These features are created with common aggregations, such as window or sliding window aggregations, and transformations such as SUM, MIN, and MAX.
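
To make the idea of a sliding window aggregation concrete, here is a minimal sketch in plain Python of what a 7-day SUM per entity computes. It is only an illustration of the concept: the function `sliding_window_sum`, the event tuples, and the half-open window boundary are assumptions made for this example, not the platform's implementation, which runs on the configured processing engine.

```python
from datetime import datetime, timedelta


def sliding_window_sum(events, window, as_of):
    """Sum values per entity over the interval (as_of - window, as_of].

    Hypothetical helper for illustration; not part of the Canso or gru API.
    """
    window_start = as_of - window
    totals = {}
    for entity_id, event_time, value in events:
        # Keep only events that fall inside the window ending at `as_of`.
        if window_start < event_time <= as_of:
            totals[entity_id] = totals.get(entity_id, 0.0) + value
    return totals


# Hypothetical click events: (user_id, event_time, clicks)
events = [
    ("u1", datetime(2024, 1, 1), 3.0),
    ("u1", datetime(2024, 1, 6), 2.0),
    ("u2", datetime(2024, 1, 7), 5.0),
    ("u1", datetime(2023, 12, 20), 9.0),  # older than 7 days, so excluded
]

user_clicks_7d = sliding_window_sum(
    events, window=timedelta(days=7), as_of=datetime(2024, 1, 7)
)
print(user_clicks_7d)  # {'u1': 5.0, 'u2': 5.0}
```

Re-running the same function with a later `as_of` value moves the window forward, which is what a scheduled computation of such a feature would do on each run.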

### Feature with Custom UDF

Raw Features with a custom UDF are created from user-defined functions. They offer more flexibility and can handle complex, use-case-specific transformations that the predefined logic does not cover.
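
As a rough illustration of the kind of logic that might go into a custom UDF, the sketch below computes a click-through rate with a guard against division by zero. The function name, the input columns (`clicks`, `impressions`), and the row format are hypothetical, and the sketch uses plain Python rather than the signature the platform expects; the linked custom UDF example in the repository is the reference for the actual format.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Hypothetical custom logic: clicks divided by impressions.

    Returns 0.0 when there are no impressions, a case that a simple
    predefined SUM/MIN/MAX aggregation would not handle on its own.
    """
    if impressions <= 0:
        return 0.0
    return clicks / impressions


# Hypothetical input rows keyed by the entity `user_id`.
rows = [
    {"user_id": "u1", "clicks": 5, "impressions": 20},
    {"user_id": "u2", "clicks": 0, "impressions": 0},
]

# Apply the custom logic to every row to get one feature value per entity.
features = {
    row["user_id"]: click_through_rate(row["clicks"], row["impressions"])
    for row in rows
}
print(features)  # {'u1': 0.25, 'u2': 0.0}
```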

### Raw Feature Attributes

| Attribute                                | Description                                                 | Example                                                                                           |
| ---------------------------------------- | ----------------------------------------------------------- | ------------------------------------------------------------------------------------------------- |
| **name**                                 | Unique name of the raw feature                              | `user_clicks_7d`                                                                                  |
| **description**                          | Description of what the raw feature contains                | `Sum of user clicks in the last 7 days`                                                           |
| **owners**                               | Team that owns the raw feature                              | `['data_team@company.com']`                                                                       |
| **entity**                               | The entity the feature is based on                          | `user_id`                                                                                         |
| **data\_type**                           | Data type of the raw feature                                | `FLOAT`                                                                                           |
| **data\_sources**                        | List of data sources used to create the feature             | `[survey_telemetry_data]`                                                                         |
| **staging\_sink**                        | S3 sink where the intermediate data is stored               | `[operational_telemetry_data]`                                                                    |
| **online\_sink**                         | Redis sink for storing the feature online                   | `["online_telemetry_data"]`                                                                       |
| **online\_sink\_write\_option\_configs** | Configurations for writing to the online sink               | `{"online_telemetry_data":{ "file_type_properties": { "type": "PARQUET", "mergeSchema": False}}}` |
| **feature\_logic**                       | Transformation logic to compute the feature                 | `SlidingWindowAggregation`                                                                        |
| **processing\_engine**                   | Processing engine used for feature computation              | `spark`                                                                                           |
| **processing\_engine\_configs**          | Configurations for the processing engine                    | `{"memory": "4g", "cores": 2}`                                                                    |
| **online**                               | Flag to indicate if the feature should be available online  | `True`                                                                                            |
| **offline**                              | Flag to indicate if the feature should be available offline | `True`                                                                                            |
| **schedule**                             | Schedule for feature computation                            | `1D`                                                                                              |
| **active**                               | Flag to indicate if the feature is active                   | `True`                                                                                            |
| **start\_time**                          | Start time for feature computation                          | `datetime.now()`                                                                                  |
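
To show how these attributes fit together, the sketch below collects the example values from the table into a single Python object. `RawFeatureSpec` is a hypothetical stand-in written for this page, not the class provided by the gru library, and `feature_logic` is represented here by a plain string label where the real object would carry the aggregation settings. The write option configs are left out because they are not yet enabled.

```python
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass
class RawFeatureSpec:
    """Hypothetical container mirroring the attribute table; not the gru API."""

    name: str
    description: str
    owners: list
    entity: str
    data_type: str
    data_sources: list
    staging_sink: list
    online_sink: list
    feature_logic: str  # label only; a placeholder for the real logic object
    processing_engine: str
    processing_engine_configs: dict
    online: bool
    offline: bool
    schedule: str
    active: bool
    start_time: datetime


# Values are taken from the "Example" column of the table above.
spec = RawFeatureSpec(
    name="user_clicks_7d",
    description="Sum of user clicks in the last 7 days",
    owners=["data_team@company.com"],
    entity="user_id",
    data_type="FLOAT",
    data_sources=["survey_telemetry_data"],
    staging_sink=["operational_telemetry_data"],
    online_sink=["online_telemetry_data"],
    feature_logic="SlidingWindowAggregation",
    processing_engine="spark",
    processing_engine_configs={"memory": "4g", "cores": 2},
    online=True,
    offline=True,
    schedule="1D",
    active=True,
    start_time=datetime.now(),
)

# Print each attribute so the full definition can be reviewed at a glance.
for key, value in asdict(spec).items():
    print(f"{key}: {value}")
```

For the objects the platform actually accepts, see the example files linked in this page.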

### Special notes on attributes

* `feature_logic`: Raw Features support sliding window and window aggregations.
* `processing_engine` and `processing_engine_configs`: By default, the Raw Feature runs with this [default set of PySpark configurations](https://github.com/Yugen-ai/gru/blob/main/gru/config/features/default_processing_engine_configs_batch.yaml).
* `online`: If the `online` flag is enabled, the data is ingested into the online sink (i.e., the Redis cache).
* `offline`: If the `offline` flag is enabled, the Raw Feature ingests the data into the offline sink (i.e., the S3 sink).
* Read and write option configs: These are configurations that users provide at the time of feature registration. Currently, this option is not enabled, but we will be adding support for it soon.
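
The effect of the `online` and `offline` flags can be sketched as a simple routing decision. The helper `resolve_sinks` and the sink name `offline_s3_sink` are hypothetical and exist only to illustrate the behaviour described in the notes above; they are not part of the platform's API.

```python
def resolve_sinks(online, offline, online_sink, offline_sink):
    """Return the sinks a feature run would write to, based on its flags.

    Hypothetical helper for illustration only.
    """
    sinks = []
    if online:
        # Online flag enabled: write to the online sink (Redis cache).
        sinks.extend(online_sink)
    if offline:
        # Offline flag enabled: write to the offline sink (S3).
        sinks.extend(offline_sink)
    return sinks


both = resolve_sinks(True, True, ["online_telemetry_data"], ["offline_s3_sink"])
offline_only = resolve_sinks(False, True, ["online_telemetry_data"], ["offline_s3_sink"])
print(both)          # ['online_telemetry_data', 'offline_s3_sink']
print(offline_only)  # ['offline_s3_sink']
```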

### Example Raw Feature Objects

* [Feature with Predefined logic](https://github.com/Yugen-ai/gru/blob/main/gru/examples/create_raw_feature.py#L14-L72)
* [Feature with Custom UDF](https://github.com/Yugen-ai/gru/blob/main/gru/examples/create_custom_raw_feature.py#L12-L34)

### Working with Raw Features

Once the raw feature is defined:

* It can be registered and reused by Derived Features. The output of a Raw Feature becomes the input for a Derived Feature, enabling more complex feature engineering and reducing redundancy.
* It can be deployed to execute the defined operation.



