Goji Berries---GPT-3 and a Joke

July 19 · Issue #28
With the release of GPT-3, we have moved from the era of big data to the era of big models. This model, whose largest variant has 175B parameters and weighs in at roughly 370GB, is capable of miracles; see here, here, and here (“… we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans.”). In its honor, this goji berries edition is devoted to AI.

Winograd Schema
Pay attention to what the word ‘it’ refers to.
The large ball crashed right through the table because it was made of steel.
The large ball crashed right through the table because it was made of styrofoam.
These sentences are examples of Winograd schemas: sentence pairs in which the referent of a pronoun or possessive adjective changes when you switch a single word. Machines find it hard to pick the right referent in such cases, though the limitation is plausibly overcome by the 175B-parameter GPT-3.
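The pair above can be written as a substitution template, which is how such schema pairs are typically constructed (a toy sketch of the mechanics, not an actual benchmark harness):

```python
# A Winograd schema: one sentence template where swapping a single
# word flips which noun the pronoun "it" refers to.
template = "The large ball crashed right through the table because it was made of {}."

# The intended referent of "it" for each filler word (hand-labeled here).
referent = {"steel": "the ball", "styrofoam": "the table"}

for word, answer in referent.items():
    print(template.format(word), "->", answer)
```

A system is tested on whether it picks the right referent for both fillers; guessing one by surface statistics alone tends to get the other wrong.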
Is 175B a Large Number? Compared to What?
How many potential networks can you have with 30 people? If no one is connected to anyone else—each node is floating in space—that is one network. If everyone is connected with everyone else, that is another network. To enumerate the networks, we need to interpolate between the two extremes.
Here’s one way to answer the question: start by counting potential edges in the network. The answer is n choose 2, or 435. All the networks can be enumerated by independently turning each of the edges on or off. The answer is 2^435, a number with 131 digits. The total number of stars in the universe is around 10^22 (100 billion galaxies with 100 billion stars in each galaxy). The total number of atoms in the universe is somewhere between 10^78 and 10^82.
Even so, when tallying the total number of networks that could be formed, we didn’t take into account when an edge was formed (if an edge was formed at all). If you include time, the number of potential networks is yet more unbelievable. One thing to take from the exercise is that we are all likely on unique paths.
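The counting above is easy to verify (a quick sketch; `math.comb` computes n choose 2):

```python
import math

n = 30
edges = math.comb(n, 2)    # potential edges among 30 people
print(edges)               # 435

networks = 2 ** edges      # each edge independently on or off
print(len(str(networks)))  # 131 digits
```

Python's arbitrary-precision integers make it possible to hold 2^435 exactly rather than approximating with logarithms.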
The Price of a Mole of Computation
“At this point, computation costs 10^-17 to 10^-18 dollars per operation. GPT-3 was trained using 3×10^23 operations, which would mean it cost on the order of $1 million to train.” — Noam Shazeer
No Joke
“The most expensive cab ride in mythology occurred when Hades took Persephone to the underworld for six months and Demeter was running the whole time.” — Alexandra