# Data Sinks

A Data Sink in Canso is a reference to where processed data is stored. Data Sinks can be used by multiple Batch and Streaming features to save their outputs. Currently, Canso supports two types of data sinks: S3 for offline storage and Redis for online storage.

## Introduction

Data Sinks provide a standardized way to store processed data, enabling:

* Efficient storage and retrieval of processed data.
* Reusability across different batch and streaming features.
* Simplified handling of data storage configurations for Data Scientists.
* Consistent methods for saving feature and pre-processing table outputs.

## Data Sink Types

### Offline Data Sink Attributes (S3)

These attributes define how processed data is stored in, and accessed from, an offline (S3) Data Sink.

| Attribute     | Description                                | Example                                                                                |
| ------------- | ------------------------------------------ | -------------------------------------------------------------------------------------- |
| `name`        | Unique name of the data sink               | `processed_sales_orders`                                                               |
| `description` | Description of what the data sink contains | Processed sales orders stored for analysis                                             |
| `owner`       | Team(s) that own the data sink             | `['data_engg@yugen.ai', 'sales@yugen.ai']`                                             |
| `bucket`      | Name of the S3 bucket                      | `mycompany_processed_data`                                                             |
| `leading_key` | Fixed key prefix within the bucket         | `processed_txns/users`                                                                 |
| `file_type`   | Output file format                         | `CSV`, `PARQUET`                                                                       |
| `metadata`    | Additional metadata about the offline sink | `{"output_mode": "append", "processing_time": "120 seconds", "output_partitions": 20}` |
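The attribute table above maps naturally onto a small typed record. The sketch below is a hypothetical Python model, not the Canso SDK: the `S3SinkConfig` class, its `uri()` helper, and the assumption that output is written under `s3://<bucket>/<leading_key>/` are illustrative only. It also assumes `file_type` is matched case-insensitively against the two formats listed in the table.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical model of the offline (S3) sink attributes; not the Canso SDK.
SUPPORTED_FILE_TYPES = {"CSV", "PARQUET"}


@dataclass
class S3SinkConfig:
    name: str
    description: str
    owner: List[str]
    bucket: str
    leading_key: str
    file_type: str
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject file types outside the two listed in the attribute table.
        if self.file_type.upper() not in SUPPORTED_FILE_TYPES:
            raise ValueError(f"unsupported file_type: {self.file_type!r}")
        self.file_type = self.file_type.upper()

    def uri(self) -> str:
        # Assumption: output is written under s3://<bucket>/<leading_key>/
        return f"s3://{self.bucket}/{self.leading_key.strip('/')}/"


sink = S3SinkConfig(
    name="processed_sales_orders",
    description="Processed sales orders stored for analysis",
    owner=["data_engg@yugen.ai", "sales@yugen.ai"],
    bucket="mycompany_processed_data",
    leading_key="processed_txns/users",
    file_type="PARQUET",
    metadata={"output_mode": "append", "processing_time": "120 seconds", "output_partitions": 20},
)
print(sink.uri())  # s3://mycompany_processed_data/processed_txns/users/
```

Validating `file_type` when the object is built surfaces a typo before any job tries to write output, rather than at write time.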

### Online Data Sink Attributes (Redis)

These attributes define how an online (Redis) Data Sink is configured for low-latency retrieval.

| Attribute     | Description                                  | Example                                                                                |
| ------------- | -------------------------------------------- | -------------------------------------------------------------------------------------- |
| `name`        | Unique name of the data sink                 | `user_session_data`                                                                    |
| `description` | Description of what the data sink contains   | Real-time user session data for quick access                                           |
| `owner`       | Team(s) that own the data sink               | `['data_engg@yugen.ai', 'session_mgmt@yugen.ai']`                                      |
| `host`        | Redis connection URL (host and port)         | `redis://192.168.1.123:6379`                                                           |
| `metadata`    | Additional metadata about the online sink    | `{"output_mode": "append", "processing_time": "120 seconds", "output_partitions": 20}` |
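Similarly, a hypothetical `RedisSinkConfig` record can check that `host` is a Redis URL and split it into the hostname and port a client would connect to. The class, its `address()` helper, and the fallback to port 6379 (Redis's default port) when the URL omits one are assumptions for illustration, not Canso's API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple
from urllib.parse import urlparse

# Hypothetical model of the online (Redis) sink attributes; not the Canso SDK.


@dataclass
class RedisSinkConfig:
    name: str
    description: str
    owner: List[str]
    host: str
    metadata: Dict[str, Any] = field(default_factory=dict)

    def address(self) -> Tuple[str, int]:
        # Split a redis:// (or TLS rediss://) URL into (hostname, port).
        parsed = urlparse(self.host)
        if parsed.scheme not in ("redis", "rediss"):
            raise ValueError(f"expected a redis:// URL, got {self.host!r}")
        if parsed.hostname is None:
            raise ValueError(f"no hostname in {self.host!r}")
        # Fall back to 6379, Redis's default port, when none is given.
        return parsed.hostname, parsed.port or 6379


session_sink = RedisSinkConfig(
    name="user_session_data",
    description="Real-time user session data for quick access",
    owner=["data_engg@yugen.ai", "session_mgmt@yugen.ai"],
    host="redis://192.168.1.123:6379",
    metadata={"output_mode": "append", "processing_time": "120 seconds", "output_partitions": 20},
)
print(session_sink.address())  # ('192.168.1.123', 6379)
```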

### Example Sink Objects

* [S3 Sink](https://github.com/Yugen-ai/gru/blob/c01d1f124605d927bc45312cf86fc3c232fc680a/gru/examples/s3_data_sink.py#L5-L19)
* [Redis Sink](https://github.com/Yugen-ai/gru/blob/c01d1f124605d927bc45312cf86fc3c232fc680a/gru/examples/redis_data_sink.py#L3-L11)

## Working with Data Sinks

Once a Data Sink is defined, it can be:

* Registered for reusability across different teams.
* Referenced by Batch and Streaming features to save their outputs.
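The register-once, reference-by-name workflow described above can be sketched as a small in-memory registry. Everything here (`SinkRef`, `SinkRegistry`, `FeatureSpec`, and the feature names `daily_order_totals` and `session_click_count`) is hypothetical and only illustrates the idea; Canso's actual registration mechanism may differ.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical in-memory registry; Canso's actual registration mechanism may differ.


@dataclass(frozen=True)
class SinkRef:
    name: str
    kind: str  # "s3" (offline) or "redis" (online)


class SinkRegistry:
    def __init__(self) -> None:
        self._sinks: Dict[str, SinkRef] = {}

    def register(self, sink: SinkRef) -> None:
        # Sink names must be unique, matching the `name` attribute's contract.
        if sink.name in self._sinks:
            raise ValueError(f"sink {sink.name!r} is already registered")
        self._sinks[sink.name] = sink

    def get(self, name: str) -> SinkRef:
        return self._sinks[name]


@dataclass
class FeatureSpec:
    # Hypothetical feature definition that points at sinks by name only.
    name: str
    sink_names: List[str]

    def resolve_sinks(self, registry: SinkRegistry) -> List[SinkRef]:
        return [registry.get(n) for n in self.sink_names]


registry = SinkRegistry()
registry.register(SinkRef("processed_sales_orders", "s3"))
registry.register(SinkRef("user_session_data", "redis"))

# A batch and a streaming feature reuse the same registered sinks.
batch = FeatureSpec("daily_order_totals", ["processed_sales_orders"])
stream = FeatureSpec("session_click_count", ["user_session_data", "processed_sales_orders"])
print([s.name for s in stream.resolve_sinks(registry)])
# ['user_session_data', 'processed_sales_orders']
```

Keeping features to a list of sink names means each sink is defined once and shared; the duplicate-name check mirrors the requirement that each sink `name` be unique.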

### Tool Tips

* Go to [Top](#top) ⬆️
* Go back to [README.md](/...md) ⬅️
* Go back to [data-sources.md](/feature-store/data-sources.md) ⬅️
* Move forward to see [raw-feature.md](/feature-store/features/raw-feature.md) ➡️
* Move forward to see [derived-feature.md](/feature-store/features/derived-feature.md) ➡️
* Move forward to see [streaming-feature.md](/feature-store/features/streaming-feature.md) ➡️


