Mistral AI has released Pixtral 12B, a 12-billion-parameter multimodal model that combines language and vision processing. Given a text prompt, Pixtral 12B can analyze images and answer questions about their content. The model is available for download on Hugging Face, GitHub, and via torrent. Mistral AI has not disclosed details about the training data. The model natively supports an arbitrary number of images of any size. Architecturally, it has 40 layers, a hidden dimension size of 14,336, and 32 attention heads, along with a dedicated vision encoder for image processing. The release puts Mistral AI in competition with AI leaders such as OpenAI and Anthropic, whose models already have image-processing capabilities. Pixtral 12B's support for arbitrary image sizes and quantities may, however, differentiate it from competitors.
slashdot.org
