Pay attention to what the word ‘it’ refers to.
The large ball crashed right through the table because it was made of steel.
The large ball crashed right through the table because it was made of styrofoam.
These sentences are examples of Winograd Schema, sentences in which the referent of the pronoun or possessive adjective changes when you switch a word. Machines find it hard to pick the right referent in such cases, though the limitation is plausibly overcome by the 175 B parameter GPT-3
Is 175B a Large Number? Compared to What?
How many potential networks can you have with 30 people? If no one is connected to anyone else—each node is floating in space—that is one network. If everyone is connected with everyone else, that is another network. To enumerate the networks, we need to interpolate between the two extremes.
Here’s one way to answer the question: start by counting potential edges in the network. The answer is n choose 2 or 435. All the networks can be enumerated by independently turning off and on each of the edges. The answer is 2435. This is a number with 131 zeroes. The total number of stars in the universe is around 1022 (100 billion galaxies with 100 billion stars in each galaxy). The total number of atoms in the universe is somewhere between 1078 and 1082.
Even still, when tallying the total number of networks that could be formed, we didn’t take into account when an edge was formed (if an edge was formed). If you include time, the number of potential networks is a yet less believable number. One thing to take from the exercise is that we are all likely on unique paths.
The Price of a Mole of Computation
“At this point, computation costs 10-17 to
per operation. GPT-3 was trained using 3x1023 operations,
which would mean it cost on the order of $1 million to train.” — Noam Shazeer
The most expensive cab ride in mythology occurred when Hades took Persephone to the underworld for six months and Demeter was running the whole time.” — Alexandra