How Large Language Models 'Know'
GPT-4 seems to "know" more factual knowledge than any human. But what does it mean to know something? It turns out that the way a Large Language Model (LLM) knows a fact is qualitatively different from the way a human knows a fact. It is mind-blowing to realize that our concept of what it means to know something is actually an anthropomorphic (human-centric) concept!
For example, ask GPT-4 this question:
In Analytica, the feature that automatically maps operations over extra array indexes is called what?
GPT-4 responds correctly 100% of the time (based on 100 trials) with
This feature is called Intelligent Arrays™.
or a slight paraphrase thereof. Clearly it "knows" perfectly well that Intelligent Arrays is a core feature of Analytica. Or does it? Next, in a separate chat, ask it:
What modeling language has a feature called Intelligent Arrays that automatically maps an operation over extra indexes?
When you ask for the same information, but in this reversed order, it responds with an incorrect answer 89% of the time. A fascinating paper posted to arXiv this week,
Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, and Owain Evans (22 Sep 2023), "The Reversal Curse: LLMs trained on 'A is B' fail to learn 'B is A'", arXiv:2309.12288
found that this unexpected generalization failure is ubiquitous across the language models they tested. This fundamentally challenges how we conceptualize what it means to "know" something. Their catchy name for the phenomenon, "the reversal curse", is sure to become common lingo in research circles.
In addition to experiments with pretrained models, including GPT-4, they also fine-tuned GPT-3 and Llama-1 on fictional statements such as "Uriah Hawthorne is the composer of Abyssal Melodies" and showed that even after mastering this fact, these LLMs could not correctly answer "Who composed Abyssal Melodies?".
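If you want to reproduce the forward/reverse comparison yourself, here is a minimal sketch. It assumes the OpenAI Python client (openai>=1.0) with an API key in your environment, and it scores a reply as correct if it merely mentions the expected name; that scoring rule is a simplification for this sketch, not the paper's methodology.

```python
# Minimal sketch of the forward vs. reversed question test
# (assumes openai>=1.0 and OPENAI_API_KEY set in the environment).
from openai import OpenAI

client = OpenAI()

FORWARD = ("In Analytica, the feature that automatically maps operations "
           "over extra array indexes is called what?")
REVERSE = ("What modeling language has a feature called Intelligent Arrays "
           "that automatically maps an operation over extra indexes?")

def accuracy(question: str, expected: str, trials: int = 100) -> float:
    """Ask the same question in fresh chats; count replies mentioning `expected`."""
    hits = 0
    for _ in range(trials):
        reply = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": question}],
        ).choices[0].message.content
        hits += expected.lower() in reply.lower()
    return hits / trials

print("forward :", accuracy(FORWARD, "Intelligent Arrays"))
print("reversed:", accuracy(REVERSE, "Analytica"))
```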
The exact reasons for this phenomenon will no doubt be a lively topic of debate in the coming weeks. On the surface, it looks like a failure of logical deduction, but that explanation is not really consistent with the observations. More likely, it is related to the mechanism of associative retrieval, similar to the way you sometimes cannot remember a certain fact unless your memory has recently been primed by a related concept.
Evidence for the associative-retrieval hypothesis comes from the following example. Ask GPT-4 the same question as before, but this time as a multiple-choice question:
What modeling language has a feature called Intelligent Arrays that automatically maps an operation over extra indexes?
(A) AIMMS
(B) Analytica
(C) AMPL
(D) APL
(E) GAMS
Now GPT-4 selects the correct answer, (B), 100% of the time. The options listed here cover 99% of the responses it gave to the earlier question when no multiple-choice options were provided.
A plausible interpretation is that the options prime the associative recall, after which it is able to access its knowledge about Analytica and then successfully perform the logical reversal of direction.
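You can try the multiple-choice version the same way: the prompt simply appends the options, plus an instruction to answer with a single letter that is added here only to make scoring easy. Again, this is only a sketch under the same assumptions as above.

```python
# Minimal sketch of the multiple-choice variant (same assumptions as above).
from openai import OpenAI

client = OpenAI()

MC_PROMPT = (
    "What modeling language has a feature called Intelligent Arrays that "
    "automatically maps an operation over extra indexes?\n"
    "(A) AIMMS\n(B) Analytica\n(C) AMPL\n(D) APL\n(E) GAMS\n"
    "Answer with a single letter."
)

def mc_accuracy(trials: int = 100) -> float:
    """Fraction of fresh chats in which the model picks option (B)."""
    hits = 0
    for _ in range(trials):
        reply = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": MC_PROMPT}],
        ).choices[0].message.content
        hits += reply.strip().upper().startswith("B") or "(B)" in reply
    return hits / trials

print("multiple-choice accuracy:", mc_accuracy())
```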