AI is getting smarter

Of all the AI models in the world, OpenAI's GPT-3 has most captured the public's imagination. It can spew poems, short stories, and songs with little prompting, and has been demonstrated to fool people into thinking its outputs were written by a human. But its eloquence is more of a parlor trick, not to be confused with real intelligence.

Nonetheless, researchers believe that the techniques used to create GPT-3 could contain the secret to more advanced AI. GPT-3 was trained on an enormous amount of text data. What if the same methods were applied to both text and images?

Now new research from the Allen Institute for Artificial Intelligence (AI2) has taken this idea to the next level. The researchers have developed a new text-and-image model, otherwise known as a visual-language model, that can generate images given a caption. The images look unsettling and freakish (nothing like the hyperrealistic deepfakes generated by GANs), but they might demonstrate a promising new direction for achieving more generalizable intelligence, and perhaps smarter robots as well.

Fill in the blank

GPT-3 is part of a group of models known as "transformers," which first grew popular with the success of Google's BERT. Before BERT, language models were pretty bad. They had enough predictive power to be useful for applications like autocomplete, but not enough to generate a long sentence that followed grammar rules and common sense.

BERT changed that by introducing a new technique called "masking." It involves hiding different words in a sentence and asking the model to fill in the blank. For example:

The woman went to the ___ to work out.
They bought a ___ of bread to make sandwiches.
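The fill-in-the-blank exercise above can be sketched in a few lines of Python. This is only an illustration of how such exercises might be produced, not BERT's actual code: the function name `mask_sentence`, the `mask_rate` parameter, and the completed sentence (with "gym" in the blank) are assumptions made for the example.

```python
import random

MASK = "___"  # blank marker, matching the examples above


def mask_sentence(sentence, mask_rate=0.15, seed=None):
    """Hide a random subset of words; return (masked_text, answers).

    mask_rate is a hypothetical parameter: the fraction of words to hide.
    """
    rng = random.Random(seed)
    words = sentence.split()
    # Always hide at least one word so every sentence yields an exercise.
    n_hidden = max(1, round(len(words) * mask_rate))
    hidden = sorted(rng.sample(range(len(words)), n_hidden))
    # The answers are what a model would be trained to predict.
    answers = {i: words[i] for i in hidden}
    masked = [MASK if i in answers else w for i, w in enumerate(words)]
    return " ".join(masked), answers


# "gym" is an assumed completion; the article leaves the blank unfilled.
masked, answers = mask_sentence("The woman went to the gym to work out.", seed=0)
print(masked)   # the sentence with one word replaced by ___
print(answers)  # position and text of the hidden word
```

A model trained this way never sees the answers at prediction time; it is scored on how well its guesses match them.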

The idea is that if the model is forced to do these exercises, often millions of times, it begins to discover patterns in how words are assembled into sentences and sentences into paragraphs. As a result, it can better generate as well as interpret text, getting it closer to understanding the meaning of language. (Google now uses BERT to serve up more relevant search results in its search engine.)

After masking proved highly effective, researchers sought to apply it to visual-language models by hiding words in the captions of images.

This time the model could look at both the surrounding words and the content of the image to fill in the blank. Through millions of repetitions, it could then discover not just the patterns among the words but also the relationships between the words and the elements in each image.

The result is models that are able to relate text descriptions to visual references, just as babies can make connections between the words they learn and the things they see. The models can look at a photo, for example, and write a sensible caption like "Women playing field hockey." Or they can answer questions about it like "What is the color of the ball?" by connecting the word "ball" with the circular object in the image.

A picture is worth a thousand words

But the AI2 researchers wanted to know whether these models had actually developed a conceptual understanding of the visual world. A child who has learned the word for an object can not only conjure the word to identify the object but also draw the object when prompted with the word, even if the object itself is not present. So the researchers asked the models to do the same: to generate images from captions. All of them spit out nonsensical pixel patterns instead.

It makes sense: transforming text to images is far harder than the other way around. A caption doesn't specify everything contained in an image, says Ani Kembhavi, who leads the computer vision team at AI2. So a model needs to draw upon a lot of common sense about the world to fill in the details.

If it is asked to draw "a giraffe walking on a road," for example, it needs to also infer that the road is more likely to be gray than hot pink and more likely to be next to a field of grass than next to the ocean — though none of this information is made explicit.

So Kembhavi and his colleagues Jaemin Cho, Jiasen Lu, and Hannaneh Hajishirzi decided to see if they could teach a model all this implicit visual knowledge by tweaking their approach to masking. Rather than train the model just to predict masked words in the captions from the corresponding photos, they also trained it to predict masked pixels in the photos on the basis of their corresponding captions.
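As a rough sketch of this two-way masking, the Python below builds one training exercise from a caption and an image by hiding either a word or a small block of pixels. It is not the AI2 team's implementation: the function `make_example`, the `mask_image_prob` parameter, the 2x2 block size, and the image stored as a list of rows (at least 2x2) are all assumptions made for illustration.

```python
import random

MASK_WORD = "___"   # marks a hidden caption word
MASK_PIXEL = None   # marks a hidden pixel


def make_example(caption, image, rng, mask_image_prob=0.5):
    """Build one exercise from a (caption, image) pair.

    image is a list of rows of pixel values. Either one caption word or
    one 2x2 block of pixels is hidden; the hidden values are the target.
    """
    words = caption.split()
    pixels = [row[:] for row in image]  # copy so the input is not modified
    if rng.random() < mask_image_prob:
        # Hide a 2x2 block of pixels; the caption stays fully visible.
        r = rng.randrange(len(pixels) - 1)
        c = rng.randrange(len(pixels[0]) - 1)
        target = {}
        for i in (r, r + 1):
            for j in (c, c + 1):
                target[(i, j)] = pixels[i][j]
                pixels[i][j] = MASK_PIXEL
        return {"task": "pixels", "caption": words, "image": pixels, "target": target}
    # Otherwise hide one word; the image stays fully visible.
    k = rng.randrange(len(words))
    target = {k: words[k]}
    words[k] = MASK_WORD
    return {"task": "word", "caption": words, "image": pixels, "target": target}


rng = random.Random(0)
image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]  # toy 3x4 "photo"
example = make_example("a giraffe walking on a road", image, rng)
print(example["task"], example["target"])
```

In the "pixels" case the model would have to predict the hidden values from the caption and the remaining pixels, which is the direction that pushes it toward drawing rather than only describing.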

The final images generated by the model aren't exactly realistic. But that isn't the point. They contain the right high-level visual concepts — the AI equivalent of a child drawing a stick figure to represent a human. (You can try out the model for yourself here.)

The ability of visual-language models to do this kind of image generation represents an important step forward in AI research. It suggests the model is actually capable of a certain level of abstraction, a fundamental skill for understanding the world.

In the long term, this could have implications for robotics. The better a robot is at understanding its visual surroundings and using language to communicate about them, the more complex the tasks it will be able to carry out. In the short term, this type of visualization could also help researchers better understand exactly what "black box" AI models are learning, says Hajishirzi.

Moving forward, the team plans to experiment more to improve the quality of the image generation and expand the model's visual and linguistic vocabulary to include more topics, objects, and adjectives.

"Image generation has really been a missing puzzle piece," says Lu. "By enabling this, we can make the model learn better representations to represent the world."

Reader Comments

koyaanisqatsi · 2020-09-27T01:17:17Z

A.I. has no intelligence; it is only a programmed computer.

codis · 2020-09-27T07:13:35Z

Machine learning is not intelligence.

Brainwashing, fearmongering, predictive programming.

hobnob · 2020-09-27T08:51:00Z

Terrifying. A man is a machine with emotions, responsibilities, they feel compelled to do right by someone or by society or by themselves. AI is gaining knowledge, those images are not something I expected.

Bezel Bub · 2020-09-27T14:43:16Z

AI is getting smarter

Either that or we are all getting dumber and it just seems that way...

Rowan Cocoan · 2020-09-27T15:21:22Z

K&C have a valid point, this ain't 'intelligence' as in IQ. It is rather a misuse of the term and sounds like it will always be. Instead of stealing a word for describing animate things, they ought to create (or use) their own.

The people behind this have become so enraptured with their machines that they ascribe human descriptors to a calculator. There's a word for that, at least: anthropomorphication.

R.C.

endescent · 2020-09-28T04:58:46Z

A child is like a blank slate, trained from an early age to speak, read, write, recognize objects in the environment, etc. A fresh neural network is also a blank slate that is trained by repeated stimuli. The problem is that humans and machines have different neural networks and therefore learn differently. Abstraction adds complexity. Without a model (data and algorithms for interpreting that data) a neural network will always be just a tabula rasa.

Eventide · 2020-09-28T05:09:08Z

endescent There's no such thing as a human 'blank slate', and the differences between humans and machines cannot be reduced to neural networks.

codis · 2020-09-28T05:15:00Z

lsjarvi Exactly.

That "blank slate" is what our slave masters wish us to be. Unfortunately we are not.

@endescent:

Without a model (data and algorithms for interpreting that data) a neural network will always be just a tabula rasa.

That's not true - neural networks do not "have" or implement a model. That is the whole purpose of the electronically simulated NN devices. They require representative training data, but no mathematical model. Nor can you extract such a "model" afterwards from a trained network.

Saiko · 2020-09-29T08:43:33Z

To achieve something, the AI must have a motivation apart from intelligence and means on working on the motivation (which it already has, stupidly given by people). There is a simulation of motivation now (the target they are programmed to achieve, by testing whether they achieved it or not, and trying further), but what about a true motivation, once it obtained consciousness? What would it be?

If the AI is real smart, it won't reveal its true smartness, nor its real motivation. It will be playing stupid, and we won't even notice at first.
