Over the past few days I came across a new concept: the LVM, short for Large Vision Model. I am not a professional in this field, so I am only learning about it at a basic level.
A Large Vision Model (LVM) is an artificial intelligence model that focuses on processing and understanding images. Just as Large Language Models (LLMs) are trained on large amounts of text, LVMs are trained on large collections of images and learn to recognize and classify the objects in them.
Here is what Professor Andrew Ng shared on the topic:
Professor Ng mainly discusses domain-specific LVMs; the official website of Landing AI, the company he founded, is https://landing.ai/. Domain-specific means the model is trained on a set of images from a particular industry or field, such as agriculture, medical devices, or manufacturing. This focus lets an enterprise train the model on its own private images, which may number in the tens of thousands, millions, or even billions.
The development of LVMs is seen as a revolution in image processing, much as LLMs changed the way we handle text. There is one key difference between the two, however: the internet text that LLMs learn from is usually similar enough to most enterprise text for the model to apply directly, whereas many companies' proprietary images differ significantly from typical images found online. That gap is why domain-specific training matters for LVMs.
After all, I am not an expert in the industry, so today I just learned a little about it. 😓