Etsy's Feature Systems team encountered a potential problem when using timestamp features in machine learning models due to precision misinterpretation across frameworks.
The issue stemmed from the timestamp data type, which was being interpreted differently by different frameworks, leading to a potential training/serving skew.
To address this, ML practitioners recommended avoiding the timestamp type and using a more basic numeric type, such as Longs.
The team investigated the root cause, discovering that the problem extended beyond specific bugs and highlighted a larger issue for ML practitioners in dealing with timestamp features.
The team realized that the complexity of datetime objects and timestamp types was unnecessary for their use case, as they only needed integer representations at specific precision.
During an architecture working group meeting, there was consensus to represent datetime features as primitive numeric types to ensure consistency between model training and inference.
The team decided to standardize on primitive types more generally to promote consistency across all training contexts.
The team also recognized the need for improved documentation to simplify feature transformation for customers.
The incident highlighted the potential challenges in applying software engineering practices to ML-specific needs.
As ML continues to integrate into software systems, these types of nuances will likely become more common and require further refinement of best practices.
etsy.com
etsy.com
Create attached notes ...
