# Data Sinks

A Data Sink in Canso is a reference to where processed data is stored. Data Sinks can be used by multiple Batch and Streaming features to save their outputs. Currently, Canso supports two types of data sinks: S3 for offline storage and Redis for online storage.

## Introduction

Data Sinks provide a standardized way to store processed data, enabling:

* Efficient storage and retrieval of processed data.
* Reusability across different batch and streaming features.
* Simplified handling of data storage configurations for Data Scientists.
* Consistent methods for saving feature and pre-processing table outputs.

## Data Sink Types

### Offline Data Sink Attributes (S3)

These attributes define where, and in what format, an offline (S3) Data Sink stores processed data.

| Attribute     | Description                                | Example                                                                                |
| ------------- | ------------------------------------------ | -------------------------------------------------------------------------------------- |
| `name`        | Unique name of the data sink               | `processed_sales_orders`                                                               |
| `description` | Description of what the data sink contains | Processed sales orders stored for analysis                                             |
| `owner`       | Team that owns the data sink               | `['data_engg@yugen.ai', 'sales@yugen.ai']`                                             |
| `bucket`      | Name of the S3 bucket                      | `mycompany-processed-data`                                                             |
| `leading_key` | Fixed prefix of the S3 object key          | `processed_txns/users`                                                                 |
| `file_type`   | Output file format                         | `CSV`, `PARQUET`                                                                       |
| `metadata`    | Additional metadata about the offline sink | `{"output_mode": "append", "processing_time": "120 seconds", "output_partitions": 20}` |
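
To make these attributes concrete, here is a minimal plain-Python sketch of an offline sink definition. `S3SinkConfig` and `output_prefix` are hypothetical names, not Canso's SDK. The sketch assumes that `leading_key` is a prefix joined to the bucket to form the output location, and it accepts only the two file types listed in the table.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical stand-in for an offline (S3) Data Sink definition.
# This is NOT the Canso/gru SDK class; field names mirror the table above.
ALLOWED_FILE_TYPES = {"CSV", "PARQUET"}


@dataclass
class S3SinkConfig:
    name: str
    description: str
    owner: List[str]
    bucket: str
    leading_key: str
    file_type: str
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Normalize the file type and reject anything outside the documented values.
        self.file_type = self.file_type.upper()
        if self.file_type not in ALLOWED_FILE_TYPES:
            raise ValueError(f"unsupported file_type: {self.file_type}")
        if not self.owner:
            raise ValueError("at least one owner is required")

    def output_prefix(self) -> str:
        # Assumption: outputs are written under s3://<bucket>/<leading_key>/
        return f"s3://{self.bucket}/{self.leading_key.strip('/')}/"


sink = S3SinkConfig(
    name="processed_sales_orders",
    description="Processed sales orders stored for analysis",
    owner=["data_engg@yugen.ai", "sales@yugen.ai"],
    bucket="mycompany-processed-data",
    leading_key="processed_txns/users",
    file_type="parquet",
    metadata={"output_mode": "append", "processing_time": "120 seconds", "output_partitions": 20},
)
print(sink.output_prefix())  # s3://mycompany-processed-data/processed_txns/users/
```

Validating in the definition means a misspelled file type fails when the sink is declared, rather than when a feature first tries to write to it.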

### Online Data Sink Attributes (Redis)

These attributes configure an online (Redis) Data Sink, which stores processed data for low-latency retrieval.

| Attribute     | Description                                  | Example                                                                                |
| ------------- | -------------------------------------------- | -------------------------------------------------------------------------------------- |
| `name`        | Unique name of the data sink                 | `user_session_data`                                                                    |
| `description` | Description of what the data sink contains   | Real-time user session data for quick access                                           |
| `owner`       | Team that owns the data sink                 | `['data_engg@yugen.ai', 'session_mgmt@yugen.ai']`                                      |
| `host`        | Redis host URL                               | `redis://192.168.1.123:6379`                                                           |
| `metadata`    | Additional metadata about the online sink    | `{"output_mode": "append", "processing_time": "120 seconds", "output_partitions": 20}` |
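
A similar hypothetical sketch for the online sink shows how the `host` URL breaks down into a hostname and port. `RedisSinkConfig` is not Canso's SDK class. The accepted schemes (`redis`, `rediss`) and the fallback to Redis's default port 6379 are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple
from urllib.parse import urlsplit

# Hypothetical stand-in for an online (Redis) Data Sink definition.
# This is NOT the Canso/gru SDK class; field names mirror the table above.


@dataclass
class RedisSinkConfig:
    name: str
    description: str
    owner: List[str]
    host: str
    metadata: Dict[str, Any] = field(default_factory=dict)

    def address(self) -> Tuple[str, int]:
        # Split a URL such as "redis://192.168.1.123:6379" into (hostname, port).
        parts = urlsplit(self.host)
        if parts.scheme not in ("redis", "rediss"):
            raise ValueError(f"expected a redis:// or rediss:// URL, got {self.host!r}")
        if parts.hostname is None:
            raise ValueError(f"no hostname in {self.host!r}")
        # Fall back to 6379, Redis's default port, when the URL omits one.
        return parts.hostname, parts.port or 6379


sink = RedisSinkConfig(
    name="user_session_data",
    description="Real-time user session data for quick access",
    owner=["data_engg@yugen.ai", "session_mgmt@yugen.ai"],
    host="redis://192.168.1.123:6379",
)
print(sink.address())  # ('192.168.1.123', 6379)
```

Clients such as redis-py's `Redis.from_url` accept the same URL form. Storing the full URL rather than a bare hostname therefore keeps the scheme and port together in one attribute.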

### Example Sink Objects

* [S3 Sink](https://github.com/Yugen-ai/gru/blob/c01d1f124605d927bc45312cf86fc3c232fc680a/gru/examples/s3_data_sink.py#L5-L19)
* [Redis Sink](https://github.com/Yugen-ai/gru/blob/c01d1f124605d927bc45312cf86fc3c232fc680a/gru/examples/redis_data_sink.py#L3-L11)

## Working with Data Sinks

Once a Data Sink is defined, it can be:

* Registered for reusability across different teams.
* Referenced by Batch and Streaming features to save their outputs.
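
The register-once, reference-by-name pattern described above can be sketched in plain Python. The `SinkRegistry`, `SinkRef`, and `FeatureDef` classes and the feature names are hypothetical and illustrate only the idea. They are not Canso's registration API.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical in-memory illustration of registering sinks once and
# referencing them by name from features. Not Canso's real API.


@dataclass(frozen=True)
class SinkRef:
    name: str
    kind: str  # "offline" (S3) or "online" (Redis)


class SinkRegistry:
    def __init__(self) -> None:
        self._sinks: Dict[str, SinkRef] = {}

    def register(self, sink: SinkRef) -> None:
        # Sink names are unique, so a second registration under the same name is rejected.
        if sink.name in self._sinks:
            raise ValueError(f"sink {sink.name!r} is already registered")
        self._sinks[sink.name] = sink

    def get(self, name: str) -> SinkRef:
        return self._sinks[name]


@dataclass
class FeatureDef:
    name: str
    sink_names: List[str]  # features point at sinks by name instead of copying their config


registry = SinkRegistry()
registry.register(SinkRef("processed_sales_orders", "offline"))
registry.register(SinkRef("user_session_data", "online"))

# Two hypothetical features reuse the same registered offline sink.
daily = FeatureDef("daily_order_count", ["processed_sales_orders"])
weekly = FeatureDef("weekly_order_value", ["processed_sales_orders"])
for feature in (daily, weekly):
    sinks = [registry.get(n) for n in feature.sink_names]
    print(feature.name, "->", [s.name for s in sinks])
```

Because features hold only the sink name, changing a sink's storage details in one place affects every feature that writes to it.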

### Tool Tips

* Go to [Top](#top) ⬆️
* Go back to [README.md](https://docs.canso.ai) ⬅️
* Go back to [data-sources.md](https://docs.canso.ai/feature-store/data-sources) ⬅️
* Move forward to see [raw-feature.md](https://docs.canso.ai/feature-store/features/raw-feature) ➡️
* Move forward to see [derived-feature.md](https://docs.canso.ai/feature-store/features/derived-feature) ➡️
* Move forward to see [streaming-feature.md](https://docs.canso.ai/feature-store/features/streaming-feature) ➡️
