Dropbox Tech Blog

Is this a date? Using ML to identify date formats in file names

This article describes how Dropbox built a machine learning model that identifies date formats in file names, improving file organization and retrieval. Consistent file naming matters for teamwork, and Dropbox's automated naming conventions feature lets users set rules for file names so that they stay consistent. Dropbox first tried a rule-based approach to date identification, but it struggled with the variety of date formats that different people use. The team therefore developed a machine learning model to recognize dates within file names. Building it involved several stages, including data annotation, tokenization, and classification, with Inside-Outside-Beginning (IOB) tagging used to label date components. The model, based on the transformer architecture (specifically DistilRoBERTa), increased the number of renamed files by 40% compared with the previous rule-based system. To reduce inference latency, Dropbox applied techniques such as model pruning and quantization. After the feature rolled out in August 2022, more than a million files were renamed shortly after launch. Future enhancements may include extracting entities other than dates and using more advanced models to further improve accuracy in file naming conventions.
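To make the IOB tagging step concrete, here is a minimal sketch of how an annotated date span in a file name could be turned into per-token labels. It is an illustration under stated assumptions, not Dropbox's implementation: the regex tokenizer, the helper names `tokenize` and `iob_tags`, the single `DATE` label, and the sample file name are all hypothetical, and a DistilRoBERTa model would use its own subword tokenizer instead.

```python
import re

# Hypothetical tokenizer: runs of letters, runs of digits, or one separator character.
TOKEN_RE = re.compile(r"[A-Za-z]+|\d+|[^A-Za-z\d]")


def tokenize(name):
    """Split a file name into (token, start, end) triples; end is exclusive."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(name)]


def iob_tags(name, spans):
    """Assign an IOB tag to each token.

    spans is a list of (start, end, label) character spans (end exclusive),
    standing in for the output of a manual annotation step.
    """
    tagged = []
    for token, start, end in tokenize(name):
        tag = "O"  # Outside any annotated span by default
        for span_start, span_end, label in spans:
            if start >= span_start and end <= span_end:
                # The first token of a span gets B-, later tokens get I-
                prefix = "B-" if start == span_start else "I-"
                tag = prefix + label
                break
        tagged.append((token, tag))
    return tagged


# Assumed example: the annotator marked "2022-08-15" as a date.
name = "report_2022-08-15_final.pdf"
date_text = "2022-08-15"
date_start = name.index(date_text)
spans = [(date_start, date_start + len(date_text), "DATE")]

tagged = iob_tags(name, spans)
for token, tag in tagged:
    print(f"{token!r:10} {tag}")
```

A token classifier would then be trained to predict these tags directly from the tokens, so that a previously unseen file name can be split into date and non-date parts without hand-written format rules.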
dropbox.tech