
Time Series Are Not That Different for LLMs

Large Language Models (LLMs) have revolutionized computational linguistics and computer vision. The key to their success lies in massive training data, transferability through prompting, and self-supervised pretraining. Researchers are now exploring whether the same recipe carries over to time series data, in the form of Large Time Series Foundation Models (LTSMs). Like LLMs, LTSMs aim to learn from large, diverse collections of time series and adapt to a variety of downstream tasks through fine-tuning.

The connection between language models and time series models lies in the sequential nature of their data: time series tokens are analogous to language tokens. There is, however, a semantic gap between the two domains. To bridge it, researchers are investigating how to align them along four design axes: tokenization, base model selection, prompt engineering, and the training paradigm.

Concretely, tokenization can rely on symbolic representations or on learnable linear layers, base models can be chosen by analogy to the target task, statistical prompts can summarize each input window for the model, and the training paradigm itself (how much of the model is trained, and how) can be varied. Getting these choices right is what enables strong in-domain performance as well as zero- and few-shot forecasting.

A recent benchmark study, LTSM-bundle, systematically evaluates these design choices and provides an open-source framework for re-programming and benchmarking LLMs on time series data. The study finds that statistical prompts, linear tokenization, full fine-tuning, smaller backbones for long-term forecasting, medium-sized backbones for short-term forecasting, and greater dataset diversity all contribute to better LTSM performance. The resulting LTSM-bundle outperforms both existing methods for re-programming LLMs for time series and transformer-based forecasting models.
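To make the tokenization step concrete, the learnable-linear-layer approach can be pictured as a single projection from fixed-length patches of the series to token embeddings. Below is a minimal sketch of that idea; the class name, patch length, and embedding size are illustrative assumptions, not the LTSM-bundle API.

```python
import torch
import torch.nn as nn

class LinearPatchTokenizer(nn.Module):
    """Project fixed-length patches of a series to token embeddings with
    one learnable linear layer (illustrative names and sizes)."""

    def __init__(self, patch_len: int = 16, d_model: int = 768):
        super().__init__()
        self.patch_len = patch_len
        self.proj = nn.Linear(patch_len, d_model)  # the learnable tokenization

    def forward(self, series: torch.Tensor) -> torch.Tensor:
        # series: (batch, seq_len), seq_len assumed to be a multiple of patch_len
        batch, seq_len = series.shape
        patches = series.view(batch, seq_len // self.patch_len, self.patch_len)
        return self.proj(patches)  # (batch, num_patches, d_model)


tokens = LinearPatchTokenizer()(torch.randn(4, 128))
print(tokens.shape)  # torch.Size([4, 8, 768])
```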
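Statistical prompts can likewise be pictured as summary statistics of the input window that are handed to the model alongside the time series tokens. The helper below is a rough sketch under that assumption; the exact statistics and formatting used by LTSM-bundle may differ.

```python
import numpy as np

def statistical_prompt(window: np.ndarray) -> str:
    """Summarize an input window as a short prompt of statistics
    (a hypothetical format, not the exact LTSM-bundle prompt)."""
    stats = {
        "min": window.min(),
        "max": window.max(),
        "mean": window.mean(),
        "std": window.std(),
        "median": np.median(window),
        "trend": window[-1] - window[0],  # crude proxy for overall direction
    }
    return " ".join(f"{name}={value:.3f}" for name, value in stats.items())


# Prints the six statistics of one window as a single prompt string.
print(statistical_prompt(np.sin(np.linspace(0.0, 6.28, 96))))
```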
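Putting the pieces together, re-programming an LLM for forecasting roughly amounts to placing a learnable tokenizer in front of a pretrained backbone, attaching a forecasting head, and training all parameters, which is the full fine-tuning setting the study favors. The sketch below assumes a GPT-2 backbone from Hugging Face transformers and illustrative shapes; it is not the exact LTSM-bundle implementation.

```python
import torch
import torch.nn as nn
from transformers import GPT2Model

class GPT2Forecaster(nn.Module):
    """Sketch: linear patch tokenizer -> pretrained GPT-2 backbone -> linear forecast head."""

    def __init__(self, patch_len: int = 16, horizon: int = 96):
        super().__init__()
        self.patch_len = patch_len
        self.backbone = GPT2Model.from_pretrained("gpt2")      # hidden size 768
        d_model = self.backbone.config.n_embd
        self.tokenize = nn.Linear(patch_len, d_model)          # learnable linear tokenization
        self.head = nn.Linear(d_model, horizon)                # maps last hidden state to the forecast

    def forward(self, series: torch.Tensor) -> torch.Tensor:
        # series: (batch, seq_len), seq_len assumed to be a multiple of patch_len
        b, t = series.shape
        patches = series.view(b, t // self.patch_len, self.patch_len)
        hidden = self.backbone(inputs_embeds=self.tokenize(patches)).last_hidden_state
        return self.head(hidden[:, -1])                        # (batch, horizon)


model = GPT2Forecaster()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)     # full fine-tuning: every parameter trains
print(model(torch.randn(4, 128)).shape)                        # torch.Size([4, 96])
```

A lighter-weight alternative would be to freeze the backbone and train only the tokenizer and head; the benchmark's finding is that fine-tuning everything pays off.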