Etsy Engineering | Code as Craft

Scaling Etsy Payments with Vitess: Part 3 – Reducing Cutover Risk

Etsy Payments moved 40 billion rows across 23 tables into a Vitess-managed sharded environment, using vindexes to shard the data. This article focuses on errors that can arise during the cutover.

Understanding Vitess's transaction modes is crucial. Single mode maintains atomicity by restricting each transaction to a single shard, whereas multi mode allows transactions to span shards and can lead to partial commits if the commit fails on one of them. Two-phase commit mode is experimental and not recommended.

Reverse VReplication keeps the unsharded and sharded keyspaces synchronized after cutover. It can break due to unique key enforcement, requiring fixes such as deleting rows or manually updating the Pos column.

Scatter queries, which omit the sharding key from the WHERE clause, are sent to every shard and can result in excessive query volume and potential outages. Vitess now offers a --no_scatter flag to prevent them. Incompatible queries can also fail after cutover; exhaustive testing in a development environment is essential to identify and resolve them. Other potential errors involve unsupported SQL constructs, which can be addressed by upgrading to a newer Vitess version.

Despite these risks, cutovers are generally reversible, provided that reverse VReplication is functioning properly. However, the impact of any disruption should be carefully considered.
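The difference between single and multi transaction modes can be illustrated with a toy model. This is a minimal sketch, not Vitess code: the `Shard` class, the `commit_transaction` function, and the shard names are all hypothetical, and the commit logic is deliberately simplified to show why a best-effort multi-shard commit can leave data partially written.

```python
class Shard:
    """Toy in-memory shard (hypothetical; not a Vitess API)."""

    def __init__(self, name, fail_on_commit=False):
        self.name = name
        self.fail_on_commit = fail_on_commit
        self.committed = {}

    def commit(self, writes):
        # Simulate a shard that may be unavailable at commit time.
        if self.fail_on_commit:
            raise RuntimeError(f"commit failed on shard {self.name}")
        self.committed.update(writes)


def commit_transaction(writes_by_shard, mode):
    """Commit per-shard writes under a simplified 'single' or 'multi' mode.

    writes_by_shard is a list of (Shard, dict) pairs.
    Returns (names_of_committed_shards, all_committed).
    """
    if mode == "single" and len(writes_by_shard) > 1:
        # Single mode: reject a cross-shard transaction before writing anything.
        raise ValueError("single mode: transaction spans multiple shards")

    committed = []
    for shard, writes in writes_by_shard:
        try:
            shard.commit(writes)
        except RuntimeError:
            # Shards committed earlier in the loop are not rolled back,
            # so the transaction is now partially applied.
            return committed, False
        committed.append(shard.name)
    return committed, True


if __name__ == "__main__":
    healthy = Shard("-80")
    failing = Shard("80-", fail_on_commit=True)
    result = commit_transaction(
        [(healthy, {"ledger_id": 1}), (failing, {"ledger_id": 2})],
        mode="multi",
    )
    print(result)
```

In this model the multi mode call returns with the first shard's write persisted and the second shard's write missing, which is the partial-commit outcome the summary warns about. The single mode branch fails up front instead, trading flexibility for atomicity.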
etsy.com