Model deployment is a critical component of MLOps, and there are many ways to deploy an ML model; cloud and edge deployment are the two main categories.

Cloud deployment is the most popular choice and can be divided into subcategories such as API deployment, serverless deployment, and batch processing. API deployment exposes a model behind an API that can be queried with a simple command; it is popular for its ease of implementation, scalability, and centralized management, but it can be costly and may suffer from latency. Serverless deployment runs a model without owning or provisioning servers, which makes it cost-effective for low-traffic applications. Batch processing suits tasks that do not require real-time results and can be more cost-effective still.

Edge deployment, which runs models directly on devices such as smartphones, is often overlooked but is a viable option when an application needs real-time processing, privacy, or low infrastructure costs. It can be further divided into native phone applications, web applications, and edge servers, each with its own trade-offs. Native app deployment offers zero infrastructure cost, better privacy, and direct integration with the app, but is limited by phone resource constraints and device fragmentation.

Ultimately, the choice between cloud and edge deployment depends on the specific requirements of the project.
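To make the API-deployment pattern concrete, here is a minimal sketch of a model served over HTTP and queried with a simple request. It uses only the Python standard library; the `predict` function, the `/predict` route, and the JSON payload shape are all hypothetical stand-ins for a real trained model and serving framework.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    # Hypothetical stand-in for a trained model: a simple linear scorer.
    weights = [0.4, 0.6]
    return sum(w * x for w, x in zip(weights, features))

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body and run the model on its features.
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

# Bind to an ephemeral port and serve in a background thread.
server = HTTPServer(("127.0.0.1", 0), PredictHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Query the model API -- the programmatic equivalent of a one-line curl command.
req = urllib.request.Request(
    f"http://127.0.0.1:{port}/predict",
    data=json.dumps({"features": [1.0, 2.0]}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)
print(result)

server.shutdown()
```

In production this role is usually filled by a serving framework behind a managed endpoint, but the request/response contract is the same: the client sends features, the server returns a prediction.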
towardsdatascience.com
