Feature Stores: A Deep Dive
How feature stores work: storage, synchronization, and point-in-time correctness.
Feature stores maintain a single source of truth for ML features across training and serving, abstracting away the complex data management behind a simple API. Under the hood, they handle storage, synchronization, and point-in-time correctness automatically.
What Is a Feature Store?
Many machine learning models fail in production not because of bad algorithms, but because of bad data management [1].
Features used during model training sometimes differ from those used in production. This mismatch, known as training/serving skew, degrades model performance because the model sees different feature values at inference time than it saw during training [2][3].
Training datasets can also accidentally include feature values that were not available when the label occurred. This leaks future information into the training data, making offline evaluation metrics appear better than they are [4].
As teams build more models, another issue emerges: the same feature is often recomputed across multiple pipelines. These duplicated implementations drift over time, leading to inconsistent feature definitions, redundant work, and difficult debugging [5].
These problems all stem from the same issue: feature data is difficult to manage consistently across the machine learning lifecycle.
Feature stores were created to address this challenge. Uber introduced this concept in 2017 as part of its Michelangelo platform [6]. The core idea is simple: maintain a single source of truth for features so they can be defined once and reused consistently for both model training and production inference [7].
For engineers, the feature store interface is straightforward.
Call get_historical_features() to build training datasets and get_online_features() to retrieve features for real-time predictions.
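As a rough illustration of this two-method interface, here is a minimal in-memory sketch. The class and method names mirror common feature store SDKs such as Feast, but this is a toy model, not any real API:

```python
from datetime import datetime

class ToyFeatureStore:
    """Toy feature store illustrating the two retrieval paths.

    Real systems back these methods with an offline store
    (columnar files) and an online store (key-value database).
    """

    def __init__(self):
        # Offline side: full history of (entity_id, timestamp, features).
        self.offline_rows = []
        # Online side: latest feature values per entity.
        self.online_rows = {}

    def write(self, entity_id, timestamp, features):
        self.offline_rows.append((entity_id, timestamp, features))
        self.online_rows[entity_id] = features

    def get_historical_features(self, entity_id):
        """Return all historical rows for training-set construction."""
        return [r for r in self.offline_rows if r[0] == entity_id]

    def get_online_features(self, entity_id):
        """Return the latest features for real-time inference."""
        return self.online_rows.get(entity_id)

store = ToyFeatureStore()
store.write(42, datetime(2024, 3, 1), {"login_count": 2})
store.write(42, datetime(2024, 3, 4), {"login_count": 5})

print(store.get_online_features(42))           # {'login_count': 5}
print(len(store.get_historical_features(42)))  # 2
```

The caller never sees the split between the two storage layers; that is exactly the abstraction the rest of this post unpacks.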
But behind this simple interface lies significant infrastructure. A feature store must compute features across batch and streaming pipelines, store them in systems optimized for training and serving, keep those systems synchronized, and enforce point-in-time correctness.
This post explains how feature stores accomplish this behind the scenes.
Training and Serving Are Different Workloads
Training and serving use the same features but in completely different ways.
- Training: operates on massive historical datasets. It may scan billions of rows. Throughput and efficiency are prioritized over latency.
- Serving: requires making a prediction for a single feature vector in milliseconds. The system cannot scan the entire dataset.
Serving both workloads from a single storage system does not work. Feature stores solve this by splitting storage into two layers: an offline store for batch workloads and an online store for low-latency access. Both stores contain the same features but are optimized for their workloads.
This is essentially SCD Type 4 (Slowly Changing Dimensions), with the offline store as your historical ledger and the online store as your current-value table for real-time reads.
Training can also be performed online using streaming updates, and inference can sometimes be done in batch offline. For simplicity, this post focuses on the common case of offline training and online inference.
Storage Design
Offline Store
The offline store handles batch workloads such as training and backfills. Data is stored in columnar formats such as Parquet with Delta Lake or Hudi to enable ACID transactions [8].
user_id:        [ 42          17          99         ]
event_timestamp:[ 2024-03-01  2024-02-15  2024-03-10 ]
login_count:    [ 2           8           3          ]
avg_purchase:   [ 40          25          67         ]

Training jobs can read only the columns they need. Compression is also very efficient, especially for low-cardinality columns, where techniques like run-length encoding work well [9].
The offline feature store acts as a historical repository of feature data used for model training and batch workloads, often storing months or years of historical feature values. Efficient compression makes it practical and inexpensive to retain this data for training, debugging, and backfilling newly created features.
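To illustrate why low-cardinality columns compress so well, here is a sketch of run-length encoding in plain Python. This is the idea only, not how Parquet implements it internally:

```python
def rle_encode(values):
    """Collapse runs of repeated values into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def rle_decode(runs):
    """Expand [value, count] pairs back into the original column."""
    return [v for v, count in runs for _ in range(count)]

# A low-cardinality column, e.g. a country-code feature:
column = ["US"] * 5 + ["DE"] * 3 + ["US"] * 2
encoded = rle_encode(column)
print(encoded)  # [['US', 5], ['DE', 3], ['US', 2]]
assert rle_decode(encoded) == column
```

Ten values shrink to three pairs; on columns with millions of rows and a handful of distinct values, the savings are dramatic.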
Online Store
The online store is optimized for low-latency access. Data is stored row by row or as key-value pairs so that all features for a single entity can be retrieved with a single fast lookup.
| user_id | event_timestamp | login_count | avg_purchase |
| ------- | --------------- | ----------- | ------------ |
| 42 | 2024-03-01 | 2 | 40 |
| 17 | 2024-02-15 | 8 | 25 |
| 99      | 2024-03-10      | 3           | 67           |

Row-oriented databases such as Postgres or key-value stores like Redis are common choices. They provide millisecond read latency but are expensive for storing large historical datasets. For this reason, the online store usually keeps only the most recent feature values.
Offline vs Online Store Summary
| Feature | Offline Store | Online Store |
|---|---|---|
| Purpose | Model training and batch inference | Real-time inference |
| Access pattern | Large batch workloads | Single item lookups |
| Storage format | Columnar (Delta, Hudi) | Row-oriented or key-value (Postgres, Redis) |
| Data retained | Full historical feature values | Latest value per entity |
| Latency | Seconds to hours | Milliseconds |
| Cost | Low cost with strong compression | Higher cost for low-latency reads |
Keeping these two stores synchronized is one of the main engineering challenges.
Keeping Offline and Online Stores in Sync
Feature values must land in both stores. Writing to both systems independently is risky. If one write job succeeds and the other fails, the stores diverge. This is known as the dual write problem.
There are two common approaches to solving it.
Streaming Updates with Kafka
One method is to route all feature updates through a streaming platform such as Kafka [10].
The feature pipeline writes new feature values to a Kafka topic. Events are usually serialized using a schema format such as Avro and registered in a schema registry. This ensures that all downstream consumers interpret the feature payload the same way, and allows safe schema evolution as features change over time.
Both the online and offline stores subscribe to this topic. One consumer writes updates to the online store for low-latency inference. The other consumer writes the same events to the offline store for historical storage.
Kafka acts as the central log for all feature updates. Once an event is written to the topic, downstream services replicate it into both storage systems. This pattern avoids the dual write problem.
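The fan-out pattern can be sketched with a plain Python list standing in for the Kafka topic. This is a toy model; real consumer groups track their offsets in Kafka itself:

```python
# Toy model: the topic is an append-only list of events, and each
# consumer tracks its own read offset, as Kafka consumer groups do.
topic = []

def publish(event):
    topic.append(event)

class Consumer:
    def __init__(self, sink):
        self.offset = 0   # position in the log
        self.sink = sink  # the store this consumer feeds

    def poll(self):
        # Apply every event written since the last poll.
        while self.offset < len(topic):
            self.sink.append(topic[self.offset])
            self.offset += 1

online_sink, offline_sink = [], []
online_consumer = Consumer(online_sink)
offline_consumer = Consumer(offline_sink)

publish({"user_id": 42, "login_count": 2})
online_consumer.poll()   # online store updates immediately
publish({"user_id": 17, "login_count": 8})
online_consumer.poll()
offline_consumer.poll()  # offline store catches up later, in one batch

assert online_sink == offline_sink  # both stores saw the same log
```

Because there is a single write path (the log), the two stores can never receive different events, only the same events at different times.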
By default, Kafka is configured to provide at-least-once delivery guarantees [11]. If a consumer fails, it can replay events from the log. This provides redundancy, but also introduces the possibility of duplicates.
To maintain correctness, feature stores typically use idempotent writes in the online store and deduplication or ACID upserts in the offline store. Both systems can also rely on the ordered event stream to apply updates deterministically [5].
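Idempotent writes are what make at-least-once delivery safe: applying the same event twice leaves the store unchanged. A minimal sketch (keying on entity and comparing event timestamps is one common choice, not the only one):

```python
online_store = {}

def apply_event(event):
    """Idempotent upsert keyed by entity.

    Only apply the event if it is at least as new as what we already
    hold for this entity, so duplicate deliveries and out-of-order
    replays cannot overwrite fresh values with stale ones.
    """
    key = event["user_id"]
    current = online_store.get(key)
    if current is None or event["event_timestamp"] >= current["event_timestamp"]:
        online_store[key] = event

fresh = {"user_id": 42, "event_timestamp": "2024-03-04", "login_count": 5}
apply_event(fresh)
apply_event(fresh)  # duplicate delivery: no effect

stale = {"user_id": 42, "event_timestamp": "2024-03-01", "login_count": 2}
apply_event(stale)  # replayed older event: ignored

print(online_store[42]["login_count"])  # 5
```

With this property, a consumer can crash and replay from any earlier offset without corrupting the online store.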
The online store updates almost immediately, so new features are available for inference. The offline store may lag slightly because updates are often batched and compacted before being written to columnar storage.
The result is two systems that remain eventually consistent.
The Hopsworks feature store implements this architecture to maintain consistency between online and offline feature stores. See the Hopsworks documentation for more detail.
Batch Materialization
The Kafka approach streams updates into both stores in real time. Another approach is simpler. Treat the offline store as the source of truth and periodically copy feature values into the online store.
A scheduled batch job reads the latest feature values from the offline store and writes them into the online store.
This process is called materialization. The job queries recent feature values from the offline store and loads them into the online store so they are available for low-latency inference.
This design keeps the system simple. The offline store remains the system of record, and the online store provides quick lookups for the latest feature values.
The tradeoff for simplicity is freshness. The online store only reflects new data after each materialization run. This is often acceptable for models that tolerate slightly stale features, such as periodic recommendation systems.
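A materialization job boils down to "latest row per entity, then upsert." Here is a toy in-memory version; a real system (e.g. Feast's materialization) runs the same logic against actual offline and online stores:

```python
from datetime import date

# Offline store: full history of feature rows.
offline_store = [
    {"user_id": 42, "event_timestamp": date(2024, 3, 1), "login_count": 2},
    {"user_id": 42, "event_timestamp": date(2024, 3, 4), "login_count": 5},
    {"user_id": 17, "event_timestamp": date(2024, 2, 15), "login_count": 8},
]

def materialize(offline_rows):
    """Copy the latest feature values per entity into an online store."""
    online = {}
    for row in sorted(offline_rows, key=lambda r: r["event_timestamp"]):
        online[row["user_id"]] = row  # later rows overwrite earlier ones
    return online

online_store = materialize(offline_store)
print(online_store[42]["login_count"])  # 5, the latest value for user 42
```

Between runs, the online store simply serves whatever the last materialization left behind, which is where the freshness tradeoff comes from.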
The Feast feature store follows this architecture and periodically materializes features from the offline store into the online store. See the Feast documentation on the Online Store and Offline Store to understand how this works.
Point-in-Time Correctness
Joining labels to features introduces a risk of data leakage [4].
A naive join can include feature values that were not available when the label occurred. This makes offline metrics look artificially good, but the model can perform poorly in production.
Example: predicting whether a user will buy a premium subscription
Suppose a label occurs on 2024-03-03.
| user_id | event_timestamp | purchased_premium |
| ------- | --------------- | ----------------- |
| 42      | 2024-03-03      | true              |

And the feature table holds these rows for the same user:

| user_id | event_timestamp | login_count | avg_purchase |
| ------- | --------------- | ----------- | ------------ |
| 42      | 2024-03-01      | 2           | 40           |
| 42      | 2024-03-04      | 5           | 41           |
| 42      | 2024-03-07      | 9           | 42           |

If your join logic simply grabs the latest record, it will use a login_count of 9. But on the day of the purchase, that number was 2. By using 9, you've leaked information from four days in the future into your training set.
The Fix: Point-in-Time Join
To maintain temporal integrity, you must perform a point-in-time join. This ensures you only retrieve feature values where features.event_timestamp <= labels.event_timestamp.
1. The Modern Approach: ASOF Join
If your data processing framework supports it, the ASOF JOIN is the cleanest way to grab the single most recent record relative to the label timestamp.
SELECT
labels.user_id,
labels.event_timestamp,
labels.purchased_premium,
features.login_count,
features.avg_purchase
FROM labels
ASOF LEFT JOIN features
ON labels.user_id = features.user_id
AND labels.event_timestamp >= features.event_timestamp
2. The Standard SQL Approach: Window Functions
If ASOF JOIN isn't available, you can use a QUALIFY clause with a window function.
SELECT
labels.user_id,
labels.event_timestamp,
labels.purchased_premium,
features.login_count,
features.avg_purchase
FROM labels
LEFT JOIN features
ON labels.user_id = features.user_id
AND labels.event_timestamp >= features.event_timestamp
QUALIFY ROW_NUMBER() OVER (
PARTITION BY labels.user_id, labels.event_timestamp
ORDER BY features.event_timestamp DESC
) = 1
The QUALIFY clause is gaining widespread adoption and is supported by platforms such as Snowflake, DuckDB, and Databricks.
If QUALIFY is not available in your SQL dialect, the same result can typically be achieved using a window function combined with a CTE.
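The same logic can also be expressed outside SQL. Here is a sketch in plain Python using the example data from above: for each label, take the most recent feature row at or before the label's timestamp.

```python
from datetime import date

labels = [
    {"user_id": 42, "event_timestamp": date(2024, 3, 3), "purchased_premium": True},
]
features = [
    {"user_id": 42, "event_timestamp": date(2024, 3, 1), "login_count": 2},
    {"user_id": 42, "event_timestamp": date(2024, 3, 4), "login_count": 5},
    {"user_id": 42, "event_timestamp": date(2024, 3, 7), "login_count": 9},
]

def point_in_time_join(labels, features):
    """For each label, attach the latest feature row that was already
    known at the label's timestamp, so no future values can leak in."""
    joined = []
    for label in labels:
        eligible = [
            f for f in features
            if f["user_id"] == label["user_id"]
            and f["event_timestamp"] <= label["event_timestamp"]
        ]
        best = max(eligible, key=lambda f: f["event_timestamp"], default=None)
        joined.append({**label,
                       "login_count": best["login_count"] if best else None})
    return joined

rows = point_in_time_join(labels, features)
print(rows[0]["login_count"])  # 2, not the leaked future value 9
```

This brute-force scan is O(labels x features); the SQL formulations above let the engine do the same thing with sorted merge strategies at scale.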
Automation via Feature Stores
Writing these queries manually for dozens of features is tedious and error-prone.
When you call a single method like get_historical_features(), the feature store handles point-in-time correctness automatically, ensuring your model only learns from feature values that were available at the time of prediction.
Summary
Feature stores solve several core challenges in production machine learning systems.
Storage: Training and serving have fundamentally different access patterns. Training requires scanning large historical datasets, while online inference requires retrieving a single feature vector within milliseconds. Feature stores separate these workloads into two systems: an offline store optimized for large analytical queries and an online store optimized for low-latency lookups.
Synchronization: Both stores must contain consistent feature values. Two common architectures address this. Streaming pipelines publish updates through systems such as Kafka, allowing both stores to ingest events in near real time. Alternatively, batch materialization treats the offline store as the source of truth and periodically loads the latest feature values into the online store. Streaming prioritizes freshness, while materialization prioritizes simplicity.
Point-in-time correctness: Training datasets must only include feature values that were available when the label occurred. Without this constraint, joins can leak future information and inflate offline evaluation metrics. Feature stores enforce point-in-time joins automatically, ensuring models learn from data that reflects the true prediction context.
By handling these concerns, feature stores allow machine learning teams to define features once and reuse them consistently across training and production. Behind a simple interface, they provide the infrastructure needed to keep feature data reliable, reproducible, and aligned across the entire ML lifecycle.
For Further Reading
- Feast: Feature Store Concepts
- Hopsworks: The Feature Store for Machine Learning
- Databricks: Databricks Feature Store
- Tecton: What is Tecton?
- Chalk.ai: What is a Feature Store? A Complete Guide for ML Teams