Information on Advancing Computer Vision by Berkeley AI Lab: Updates on AI & Technology by Outreinfo
Vision is so natural for humans. Vision is a very, very complicated process, but we don't register most of it consciously, so it is very hard for us to realize why it is such a hard problem for a computer. Many labs, including the Berkeley AI Lab, are trying to understand how we can make computers see and understand our visual world. For example, if you are looking at a redwood tree, you are not just looking at the tree; you are looking at it through the prism of what you have seen before. So how do we train computers to see?
A lot of the current computer vision systems are very much about translating the visual world into a few keywords, which of course is quite limiting because a picture is worth a thousand words. The field of computer vision today is undergoing a big shift. We were working really hard for many decades, and we had basically nothing to show for it. The problem turned out to be much deeper and much more exciting than anyone imagined.
The vision part of this grand goal of an AI robot was considered to be a summer project. The idea was that if you write a smart enough algorithm, you will teach the computer how to see. Only much later did we realize that what the computer needs is to connect what it’s sensing with what it has seen in the past.
Alexei's scientific superpower - Berkeley AI Lab Advancing Computer Vision: One of the superpowers that helped Alexei early on was that his own vision is not great. In that case the brain fills in a lot of the missing details, and this made him realize early on the huge importance of large-scale visual data in this enterprise of seeing and understanding. Humans do not see just with their eyes; they see with their eyes and their memory.
The role of large-scale data: Data is absolutely fundamental to machine learning in general, and to computer vision in particular. Everybody is always excited about algorithms, but actually it's the data that's doing a lot of the heavy lifting. So one way to think about what the Berkeley Artificial Intelligence Lab is doing is giving data the appreciation it deserves.
At the Berkeley Artificial Intelligence Lab, researchers have done many different research projects connected with visual data: scene understanding, image generation, and image editing; basically, modelling the visual world by creating or modifying it. Their work is actually starting to appear in the real world: self-driving cars, of course, software like Adobe Photoshop, and what we call computational photography.
There is a lot of computer vision happening in the latest smartphones from Google and Apple. Broadly, there are two basic paradigms in contemporary computer vision.
The drawbacks of supervised learning: The older and still most prominent paradigm is supervised learning. You gather a huge amount of images, or video, and for every image some human annotates what is on it. Now you have millions of these labelled images, and the neural network tries to learn the association between a particular image and the corresponding label. That supervision is often something that can introduce bias: it comes from the labels, and some of those label categories are not very meaningful.
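The supervised recipe described above (labelled examples in, learned image-to-label association out) can be illustrated with a toy sketch. This is not the lab's actual system; it is a minimal logistic-regression stand-in where each "image" is a small feature vector and the "human annotations" are simulated labels.

```python
import numpy as np

# Hypothetical toy setup: each "image" is a 4-dim feature vector,
# and a human annotator has assigned one of two labels (0 or 1).
rng = np.random.default_rng(0)
n, d = 200, 4
images = rng.normal(size=(n, d))
true_w = np.array([1.0, -2.0, 0.5, 0.0])
labels = (images @ true_w > 0).astype(float)  # stand-in for human annotations

# Supervised learning: fit weights so predictions match the labels.
w = np.zeros(d)
lr = 0.1
for _ in range(500):
    p = 1 / (1 + np.exp(-(images @ w)))      # predicted probability of label 1
    w -= lr * images.T @ (p - labels) / n    # gradient step on the logistic loss

preds = (images @ w > 0).astype(float)
accuracy = (preds == labels).mean()
```

The model ends up only as good as its labels: any bias or meaningless categories in the annotations are learned just as faithfully as the useful ones.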
Test-time training: For computers, this problem is more extreme because computers are less able to generalize. In most machine learning setups, researchers have a fixed dataset; it might be a very big dataset, billions of images. They let the computer train on all the data in this dataset, and then the model is frozen. It goes out into the real world, and we just hope it somehow performs well on the data it encounters there.
One of the projects at the Berkeley Artificial Intelligence Lab is what we call test-time training: every time the model is faced with a new piece of data, a new image for example, it uses that image to adapt itself. The model is always updating; when the environment changes, the model hopefully changes with it. For example, many self-driving cars are based in California, where it rarely snows, almost never. And then they go to Minnesota. There needs to be a mechanism for the car to somehow adapt.
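A minimal sketch of this adapt-on-every-new-input idea, under simplified assumptions (this is an illustration, not the lab's published test-time-training method): a model trained on one distribution keeps nudging its input-normalization statistics toward each new test sample, so a distribution shift, such as California sun giving way to Minnesota snow, is gradually absorbed.

```python
import numpy as np

class AdaptiveNormalizer:
    """Normalizes inputs with running statistics that update at test time."""

    def __init__(self, mean, var, momentum=0.05):
        self.mean, self.var, self.momentum = mean, var, momentum

    def __call__(self, x, update=True):
        if update:  # test-time update: nudge stats toward the new sample
            self.mean = (1 - self.momentum) * self.mean + self.momentum * x.mean()
            self.var = (1 - self.momentum) * self.var + self.momentum * x.var()
        return (x - self.mean) / np.sqrt(self.var + 1e-5)

# "Training" distribution had mean 0; deployment inputs are shifted to mean 5.
rng = np.random.default_rng(1)
norm = AdaptiveNormalizer(mean=0.0, var=1.0)
shifted_stream = rng.normal(loc=5.0, size=(300, 16))

for x in shifted_stream:  # each new input also updates the model
    out = norm(x)

# After the shifted stream, the running mean has drifted toward 5,
# so later inputs are normalized correctly despite the shift.
```

The design choice is the key point: instead of freezing everything after training, some state is allowed to keep learning from the unlabeled test stream itself.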
The future of computer vision: Right now, in the field of computer vision, researchers are finding things out all the time. The latest text-generative models like ChatGPT have been very exciting: huge, huge amounts of data give you these almost magical abilities to generalize and to do analogies. A lot of small-scale companies have started to get interested in robotics and the connection between robotics and computer vision. Researchers are talking about this interaction between the data and the algorithm and trying to figure out what is happening on the inside. It's kind of like doing neuroscience on the computer, and it might even give us some insights into how biological agents see the world.