Department of Philosophy - University of Genova - Italy
Holism in Artificial Intelligence?
(Longer version of a paper published in Philosophy of Science in Florence, edited by M.L. Dalla Chiara, R. Giuntini, L. Laudisa, forthcoming - Acts of the Tenth International Congress of Logic, Methodology and Philosophy of Science, Florence, August 1995).
In the discussion on semantic holism it has been claimed that A.I. is almost entirely holistic. In this paper I show that some of the main lines of research in symbolic artificial intelligence are not holistic; I will consider three classical cases: toy worlds, frames and contextual reasoning. I claim that these examples from A.I. can be interpreted as implementing molecularist intuitions about language. Finally, I suggest that some assumptions behind the discussion on holism should be re-interpreted, especially the usual references to Frege and Wittgenstein.
0. Meaning Holism and Artificial Intelligence
Meaning holism is an inference from two classical premises, yielding an undesirable consequence. The two premises are:
(i) the meaning of an expression is its role in the system of language (from Frege: the meaning of a word is its role in the context of a sentence; and from Wittgenstein: understanding a sentence is understanding a whole language)
(ii) there is no principled distinction between definitional (analytic) and empirical (synthetic) features of language (from Quine's Two dogmas of Empiricism).
The suggested inference is:
(iii) the meaning of an expression is determined by its whole linguistic role: the linguistic role of an expression is the totality of inferences connected with it; but, given (ii), there is no way of restricting the set of inferences and beliefs connected with an expression; this amounts to saying that meaning depends on the whole language.
The consequence is
(iv) if the meaning of a word depends on the totality of our beliefs, communication and learning become impossible: two people who do not share all their beliefs (or all the information in the knowledge base) cannot give the same meaning to the same linguistic expression. Therefore, they cannot disagree, or understand each other, because they do not share the same meanings. They cannot communicate because their belief systems are essentially incommensurable.
The problem: Fodor and Lepore claim that A.I. is "almost everywhere holistic"; such a sweeping contention needs an explanation. Before Fodor and Lepore's criticism, holism was a very widely accepted position, though often very vaguely explained, both in connectionist and in symbolic A.I. systems. We need to carefully distinguish between vague acceptance of holism and the implementations given in the work of A.I. researchers. I shall not consider connectionist systems; I shall confine my remarks to symbolic A.I. Contrary to what Fodor and Lepore claim, I will suggest that mainstream symbolic A.I. is an attempt to implement molecularist theories of meaning (theories where the meaning of an expression is determined not by the overall system of language to which the expression belongs, but by subparts of it). I will briefly consider three classical cases of A.I. research: (1) procedural semantics as implemented in toy worlds, (2) semantic networks and frames, (3) contextual reasoning. I will try to show that in all these cases we find evidence against a holistic view of meaning, and hints towards a molecularist view that does not degenerate into holism.
As far as the two premises of so-called "holism" are concerned, I will not discuss, except sporadically, the problem of the analytic-synthetic distinction; I will discuss the interpretation of the first premise, and finally try to cast new, anti-holistic light upon the quotation from Wittgenstein that is so often used in favour of holism.
1. Meaning as Procedure - The Case of Toy-Worlds
The first theory of meaning devised by artificial intelligence was the idea of meaning as procedure. The idea itself has been variously developed, both in psychological terms (e.g. Johnson-Laird) and as a general theory which should integrate model theoretical semantics (Woods) [see Penco 1992]. Nevertheless, the core of the original formulation remains untouched. This core is the idea that a representation of the meaning of an expression is given by the procedure attached to that expression. In Winograd 1972, this idea appears very similar to Frege's idea of sense [see Marconi 1992], supplemented with Austin's theory of speech acts. As in Frege the sense of an expression is the way to give its reference, so in Winograd the procedure attached to an individual term is the way to give the object, the procedure attached to a predicate is the way to give the class, and the procedure attached to a sentence is a command for the program to do something (storing information, answering a question, performing a certain action). If you need to pick up a red cube, the procedure for "the red cube" selects from the set of cubes the one which is red. The imperative mode of "pick up a cube" activates the procedure attached to the verb "pick up" and actually makes the robot pick up the cube. Most important, procedures are compositional, as Fregean senses are intended to be.
These early results are not easy to interpret in the light of our present concern, particularly because there is an apparent tension within the views on meaning given in Winograd's early papers: on the one hand, he explicitly assumes a holistic vision of meaning; on the other hand, the technical apparatus he devised to represent meaning seems to be a molecularistic one (if not strictly atomistic). It is possible to accept the technical devices, while rejecting his metaphysical assumption.
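The compositional picture just described can be illustrated with a minimal sketch. It is not Winograd's LISP code: the block names, the contents of the toy world and the function names are assumptions introduced only for illustration. Each predicate has a procedure that gives a class, the definite article has a procedure that checks uniqueness and gives an object, and the verb has a procedure that performs an action.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    shape: str
    color: str
    held: bool = False


# A tiny toy world (hypothetical contents).
WORLD = [
    Block("b1", "cube", "red"),
    Block("b2", "cube", "green"),
    Block("b3", "pyramid", "red"),
]


# Procedures attached to predicates: each one gives a class of objects.
def cube(objs):
    return [o for o in objs if o.shape == "cube"]


def red(objs):
    return [o for o in objs if o.color == "red"]


# Procedure attached to "the": checks uniqueness, then gives the object.
def the(candidates):
    if len(candidates) != 1:
        raise LookupError("no unique object fits the description")
    return candidates[0]


# Procedure attached to the verb "pick up": performs an action in the world.
def pick_up(obj):
    obj.held = True
    return obj


# Composition: "pick up the red cube".
target = the(red(cube(WORLD)))
pick_up(target)
print(target.name, target.held)
```

Each procedure performs the same steps whenever it is activated; only the arguments it receives change with the situation.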
Winograd's overall picture of meaning appears to be explicitly holistic: "the meaning of any concept depends on its interconnection with all the other concepts in the model" (1973, p.167). Given that the task of defining "meaning" as such is impossible, Winograd suggests that "rather than trying to prove things about meaning we can design procedures which can operate with the model and simulate the processes involved in the human use of meaning". Therefore, procedures are meant as an alternative to meaning understood in vaguely holistic terms; they should represent the "use" of concepts. All language use can be thought of as a "way of activating procedures". An apparent result of this strategy should be a shift from a theory of meaning to a "model of understanding".
On the other hand, Winograd continues to speak of "meanings"; he suggests that an expression such as "pick up" "has different meanings depending on whether it refers to a single object or several" (p.174) (in the latter case it means "put away"); at other times he speaks directly of "meaning" as a set of procedures (e.g. "the different possibilities for the meaning of "the" are procedures which check various facts about the context, then prescribe actions such as "Look for a unique object in the data base which fits the description"..." (p.175)).
We could conclude that in Winograd's work meanings are represented by procedures plus their context of use. We have for instance the same procedure which is used to express the meaning of two different words, "pick up" and "put away"; the only difference is the context in which the procedure is used, in one case the context of the action of picking up a single block, in the second the context of picking up all the blocks. What shall we conclude? We might say that the procedure represents a basic core meaning and, depending on the context and way of application, it represents two different meanings or two different aspects of meaning. Winograd is not completely clear on this matter. But it seems clear, anyhow, that procedures are constant and do not depend on the whole system. They are written as definite LISP programs and always perform the same steps when activated. Procedures represent a sort of "core" meaning, which gives the basic strategy for using a word. This core may be articulated, depending on the context, to express what may be considered different aspects of meaning.
There is something definitely anti-holistic in this representation of meaning: meaning-as-use is represented as a specific procedure attached to a word. Procedures run in the same way in all different contexts, and we cannot say that they depend on the overall system. We might say that, notwithstanding Winograd's adherence to holism from a philosophical point of view, his meanings-as-procedures are more easily identifiable with an atomistic or molecularistic stance: we cannot make the meaning dependent on the entire system. The meaning is always relative to a procedure + the specific context in which it runs. The result is some kind of molecularistic interpretation: meaning depends on a procedural core, which remains fixed, and on different applications of this core.
Winograd 1981 has analysed the value and the limitations of his early theory. Among the limitations we find the inadequacy of the definition of meanings of words and the difficulty of dealing with commonsense knowledge. These may be considered as two main problems that lie behind this project, casting doubts on such a representation of meaning:
(i) The relative flexibility of the system is based on rigidly defined procedures. But most of our concepts are vague, and they often have many different applications. A single rigid procedure attached to a predicate cannot represent the vagueness of most of our concepts (even if some steps in this direction have been taken with the idea of meaning = procedure plus context).
(ii) The analysis of language is restricted to simple artificial situations (toy-worlds). But we need to analyse language in real situations. A knowledge base that would be adequate to real situations would be too big and complex to be used by a program without running the risk of combinatorial explosion.
The A.I. milieu reacted to these limitations with ingenuity, and its response has thrown, or may throw, some light on the problem of defining meaning. The following sections deal with limitations (i) and (ii) respectively.
2. Meaning as Stereotype - The Case of Frame Systems
Minsky 1975 deserves credit for introducing the notion of "frame". Although Minsky's early formulation of frames as conceptual structures with default values was rather vague, it was accepted and made both more precise and logically expressible in different formalisms: Winograd himself used similar ideas in KRL (Bobrow-Winograd 1977), the KL-ONE family of semantic networks developed a theory of default values, and Hayes 1979 translated the idea of frame into logical formalism. After these results the idea of frame and frame-nets has become a common tool in A.I. and has been described in most LISP textbooks and introductions to A.I. I will give a highly abstract characterization of it, in order to make some theoretical points relevant to the theme of holism.
In early semantic networks, such as Quillian 1967, in order to understand the meaning of a word it is necessary to go through the entire net (e.g. to check all the activated nodes to compare the meaning of two different lexical items); after the discussion of the idea of frame and of default value, the situation seems changed.
Let us consider the concept of "tiger". A frame for "tiger" may be seen as a set of slots with default values, such as
The first slot is an ISA arc, which links the concept to its immediate superconcept in the net. Each other slot is given a default value, but it may have other alternative values (the ones after the _), which are activated depending on incoming information. In this way we can express the idea that there is no set of necessary and sufficient conditions for something to be a tiger (Wittgenstein's ideas on family resemblance predicates are quoted by Minsky as one of his sources in devising the idea of a frame). E.g. a tiger may become vegetarian, may have white and black stripes (if it is an albino tiger), may have three legs (if it has been wounded), but it still is a tiger.
Frames are part of a frame system, where each value of a slot may be connected with another frame. However, the default representation provided by the frame can be understood as a fixed meaning given to the lexical item representing the concept; this meaning is supposed to be the one that best represents the average use of the expression in the linguistic community. But other uses and other meanings may be attached to the same expression: other properties, instead of the default properties, may be the values of the slots of the frame. We may have them already enclosed in a set of possible values (e.g. the stereotypical table has four legs, but it may also have three legs: in the "legs" slot there will be a default value "4", but different values are allowed). The information of the context will help to decide which values to choose.
This does not amount to saying that default values and the other possible values embedded in a frame representation are what is common to all speakers: probably there is no set of information which is the intersection of the information available to all speakers belonging to a linguistic community. But certainly meaning - average use - is determined by the majority, as influenced by experts and their definitions. In case of conflict people defer to experts. Therefore, a certain stability in the definition of meaning reflects the practice and the needs of a linguistic community. Frames with default values are stable structures; the availability of alternative values for each slot of a frame is what gives new life and flexibility to the idea of meaning.
We might say that the idea of frame brings out the difference between a general objective representation of a concept and different ways of approximating it; as Frank 1995 claims, different combinations of the values of a frame may be considered somehow "private combinations", which are built upon a more stable "lexical content" (the "stable information represented in the mental lexicon"). We do not necessarily need to speak of a "mental" lexicon here; we just need to distinguish between different, idiosyncratic ways of understanding a concept, on the one hand, and the stable function of concepts in interpersonal intercourse, on the other. We need stable concepts specific enough to be used as a starting point for different uses of the expression which is used to express them; a set of default values which is intended to give the stereotype is a good approximation to this kind of stability.
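A frame of the kind described above can be sketched as follows. This is only an illustrative reading of the idea of slots with default and alternative values, not Minsky's formalism or the notation of any particular frame language; the slot names, the superconcept "feline" and the particular values are assumptions chosen to match the examples in the text.

```python
class Frame:
    """A concept as a set of slots, each with a default and allowed alternatives."""

    def __init__(self, name, isa, slots):
        self.name = name
        self.isa = isa        # ISA link to the immediate superconcept
        self.slots = slots    # slot name -> (default value, allowed alternatives)

    def instantiate(self, **observed):
        """Fill every slot: use incoming information if given, else the default."""
        values = {}
        for slot, (default, alternatives) in self.slots.items():
            value = observed.get(slot, default)
            if value != default and value not in alternatives:
                raise ValueError(f"{value!r} is not an allowed value for {slot!r}")
            values[slot] = value
        return values


# Hypothetical frame for "tiger"; slot names and values are illustrative.
TIGER = Frame(
    name="tiger",
    isa="feline",
    slots={
        "diet": ("carnivore", {"vegetarian"}),
        "stripes": ("black on orange", {"black on white"}),
        "legs": (4, {3}),
    },
)

print(TIGER.instantiate())                             # the stereotype
print(TIGER.instantiate(legs=3, diet="vegetarian"))    # an atypical tiger
```

The stereotype is what `instantiate()` returns when no contextual information arrives; a wounded or vegetarian tiger overrides individual slots and is still an instance of the same frame.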
The stereotypical definition of a concept may be intended as representing the meaning of the predicate used to express the concept. But, even if we accept that a stable, stereotypical definition of meaning is given by the interplay of average use and deference to experts, it is still unclear to what extent individual frames depend on the overall system. As we have already remarked, we find the idea of global connections at the origin of semantic networks: in Quillian's perspective, differences in meaning between two lexical items were identified with differences among all the connections activated with other nodes of the network, starting from the set of links immediately defining the items. From this point of view, in order to understand the meaning of an item, and compare it with another, we need to run through the entire data base. But this sounds very uncomfortable. Although this idea ("spreading activation") seems fascinating, there is something strange: intuitively we do not need to understand the entire system of the lexicon in order to understand a sentence like "Peter killed the tiger"; we need to know just a few things about tigers and killing. This intuitive attitude is reflected in the restrictions that are imposed on artificial systems: if we are building a frame system, we must take care that individual steps in its computations do not involve the whole system, on pain of combinatorial explosion. A technical requirement deriving from the finite nature of our machines leads to restrictions on our representation of meaning. In A.I., we have at least two strategies for dealing with this problem:
(a) One strategy consists in making a difference between conceptual definitions and factual information. This kind of research is somehow an attempt at recovering a viable distinction between sentences (or inferences) that are necessary to define meaning and sentences that are not; it is an attempt at performing the role of the distinction between analytic and synthetic sentences, which had been so effectively criticized by Quine. Obviously, this attempt must face formidable difficulties. KL-ONE systems, for instance, have proposed a distinction between the assertional and the definitional; however, they do not explain the distinction but rather presuppose it (see Marconi 1994). Moreover, the KL-ONE distinction between definitional and assertional remains rather vague and does not represent a principled distinction.
A similar strategy had already been envisaged by McCarthy in his early work. In his "Advice Taker" project (1968) he defined the concept of "immediate inference": in order to understand a situation we do not need to make explicit all the inferences from the relevant premises, but only their immediate consequences, beginning with the inferences which require just one step in the deductive process. The idea is barely sketched; however, it does point to the necessity of controlling the risk of combinatorial explosion of inferences derivable from a single premise (see also Cozzo 1994).
What do we count as "principled"? Principles which define the distinction between two kinds of inferences might derive from the needs of implementation. If machines are limited artefacts (as we are limited agents), we need, at least for computational economy or computational necessity, to distinguish basic semantic information from factual idiosyncratic information, or basic information about the literal meanings of words from information about different applications. If we abandon the principles behind the analytic-synthetic distinction, we may still find some pragmatic principles which justify the distinction between two sorts of inferences, the one defining basic uses of words, the other defining occasional applications of them. A philosophical attitude of this kind is well followed in the practice of A.I.: any "viable" (= compatible with an A.I. system) representation of meaning as inferential role is bound not to include all (or most) possible inferences. Many A.I. programs may be considered as attempts to respect this restriction in defining meaning and understanding. A clear example is the work of Norvig 1989, who designs an algorithm computing a limited set of proper inferences quickly, without computing all types of inferences. Proper inferences are defined as plausible, relevant, and easy. Quick computation of a small set of proper inferences yields a partial interpretation, which can be used as input for further processing.
(b) Another strategy consists in defining, with some degree of arbitrariness, a set of relevant contexts: if we are speaking of tigers and refer to somebody shooting a tiger, we need some idea of what happens in big-game hunting, but we do not need to know anything about skiing or going to a restaurant. This strategy is reflected in the practice of partitioning a semantic network into an organised, hierarchical structure, where superconcepts control sets of concepts under their nodes. Hereditary aspects, widely discussed in the literature (see Frixione 1994), do not entail that in order to understand the meaning of a lexical item you need to go through the entire net: the path of hereditary relations is followed inside a specific semantic field, and what happens in other semantic fields is not at all relevant to understanding or defining the meaning of the items in the field under consideration. Information about birds (e.g., that penguins don't fly) is not relevant to understanding what a tiger is.
Schank's scripts are another classic example of this kind of strategy. The idea of a script (think of the famous script "restaurant") is just the idea of encoding a certain amount of information in a single unit, somehow autonomous and not depending on other information. High-level representations such as scripts are treated in formalisms which use, to a certain extent, a top-down representation of knowledge, where procedures (demons) are used to activate the script or scenario relevant to a given situation (the first to think of a connection between frames and procedures was Winograd 1975, with the use of procedural attachments to declarative frame symbolisms; among many examples see BORIS, by Lehnert and others 1983).
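The restriction to immediate consequences can be made concrete with a small sketch of step-bounded forward chaining. It illustrates only the general idea of limiting inference depth; it is not McCarthy's Advice Taker or Norvig's algorithm, and the rule base, the atom names and the function name are assumptions made up for the example.

```python
def consequences(facts, rules, max_steps=1):
    """Forward-chain for at most max_steps rounds and return the new conclusions.

    Each rule is a pair (premises, conclusion) over atomic sentences.
    """
    known = set(facts)
    for _ in range(max_steps):
        # Only rules whose premises are already known fire in this round.
        new = {
            conclusion
            for premises, conclusion in rules
            if set(premises) <= known and conclusion not in known
        }
        if not new:
            break
        known |= new
    return known - set(facts)


# Hypothetical rule base about tigers and killing.
RULES = [
    (("tiger",), "animal"),
    (("tiger",), "striped"),
    (("animal",), "mortal"),
    (("mortal", "killed"), "dead"),
]
FACTS = {"tiger", "killed"}

immediate = consequences(FACTS, RULES, max_steps=1)
print(sorted(immediate))   # ['animal', 'striped']

everything = consequences(FACTS, RULES, max_steps=10)
print(sorted(everything))  # ['animal', 'dead', 'mortal', 'striped']
```

With `max_steps=1` only the one-step consequences are made explicit; deeper conclusions such as "dead" are left for further processing if they turn out to be needed.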
In both cases (a) and (b), frame systems can be seen as representing the average use of words in a language. They might include links to experts' knowledge: part of commonsense knowledge about the world is the acknowledgement that experts exist for most of our fields of interest. Admitting a link to scientific definitions of each term is a representation of such an ability to defer. This representation must presuppose individual sites for scientific definitions, and the availability of such sites to people looking for more precise information and definitions. In both cases, what counts as semantic competence is relatively fixed. We must assume some "idealisation" of the speaker, which does not amount to claiming that speakers share exactly the stereotypical meaning, the set of properties stereotypically defined. If the representation of meaning is the representation of the practice of a language in a community, the stereotype is the set of features which are the most constant in this practice.
Indeed, some problems seem untouched by the strategy of the frame system: understanding a language does not consist only in the mastery of different concepts; on the contrary, sometimes we have to understand a speech which requires an amount of information that seems to be difficult to extract from a set of frames or from a script. Such limitations of frame analysis have been pointed out by Haugeland (1979), who says that "common sense can be organised only partially according to subject matter. Much of what we recognise as "making sense" is not "about" some topic for which we have a word or idiom, but rather about some (possibly unique) circumstance or episode." The strategy of contextual reasoning (the last topic in our brief survey of non-holistic views in symbolic A.I.) is intended to face these limitations.
3. Meaning as Use in a Context - The Case of Contextual Reasoning
The idea behind the strategy of toy-worlds was that language is strictly intertwined with action, so that we need to know a lot about the world in order to use language. Therefore, programs like SHRDLU had vast knowledge of a limited world (a toy-world) so that we could analyse the workings of language as in a thought experiment, without all the complications of interaction in a real situation. But we can also think of the strategy of toy worlds as a paradigm of how language works. We may imagine the general workings of language as split among different toy worlds, each with its own language and basic knowledge.
The main limitation of the idea of toy worlds was the impossibility of passing from a limited representation of a single small set of information to a general representation of the complexity of the world. But with such a re-interpretation, we may see that these old-fashioned A.I. programs still give us suggestions on the development of A.I. The main idea could be expressed in the following way: instead of relying only on a network of frames, in order to understand the meaning of a sentence we must consider sentences within something like toy worlds. The representation of our knowledge and our language cannot be given once and for all in a single system, but must be given in different systems connected to each other. We find today many general frameworks which can be considered an implicit development of this suggestion (see in general Akman and Surav 1996 and Bouquet 1997):
(i) projects like CYC, developed initially by Lenat, which propose the construction of a very big knowledge base organized in micro-theories, which represent defined knowledge on an aspect of the world. A logical foundation of this kind of approach is found in Guha 1991.
(ii) Mental spaces, a theory originally proposed by Fauconnier and developed by Dinsmore 1991 in a formalism based on the idea that knowledge is partitioned in logical spaces.
(iii) Multi-context theories by Giunchiglia 1993, which rely on intuitions given earlier by Weyhrauch and by McCarthy 1967, 1993. In multi-context theories (M-C theories from now on) a context is a complete description of a particular set of objects and actions, formally given as a triple <L,A,R>, where L is a language (with a vocabulary of words used in the context), A is a set of axioms (a body of specific information), and R is a set of inference rules. Each context is therefore represented as a single complete formal system.
A first reaction to this strategy could be that the problem of holism remains untouched: there is just a shift from "big" languages to "small" languages, and no evidence of any anti-holistic attitude is given in this kind of work. I think that this conclusion is not sound and I will try to show some anti-holistic suggestions given in these kinds of approach. I will refer hereafter to the third line of thought given above. First of all, we have to consider the relation of M-C theories to the earlier frame theories. In principle, there is no conflict between frame analysis of the lexicon and M-C theories devised for dealing with commonsense reasoning. Because of their original interest in reasoning and action, M-C theories lack a general concern for lexical semantics and the representation of lexical meaning. However, M-C theories could be enriched with the idea of concepts as frames; actually, contexts might also play a role similar to Schank's scripts. The main difference in this case could consist in the formalization: first-order logic allows greater simplification and generality of the system.
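The triple <L,A,R> can be given a toy rendering as a data structure. The sketch below is only a propositional caricature of a context, under the assumption that sentences are plain strings of words and that inference rules are (premises, conclusion) pairs; the class, its method names and the hunting example are hypothetical and are not taken from Giunchiglia's formal system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """A context as a triple <L, A, R> (toy, propositional rendering)."""
    name: str
    vocabulary: frozenset   # L: the words available in this context
    axioms: frozenset       # A: a body of specific information
    rules: tuple            # R: inference rules as (premises, conclusion) pairs

    def in_language(self, sentence):
        """A sentence belongs to L if it uses only words of the vocabulary."""
        return all(word in self.vocabulary for word in sentence.split())

    def theorems(self):
        """Close the axioms under the rules of this context only."""
        known = set(self.axioms)
        changed = True
        while changed:
            changed = False
            for premises, conclusion in self.rules:
                if set(premises) <= known and conclusion not in known:
                    known.add(conclusion)
                    changed = True
        return known


# Hypothetical context for big-game hunting.
hunting = Context(
    name="hunting",
    vocabulary=frozenset({"tiger", "hunter", "is", "dangerous", "careful"}),
    axioms=frozenset({"tiger is dangerous"}),
    rules=((("tiger is dangerous",), "hunter is careful"),),
)

print(hunting.in_language("tiger is dangerous"))   # True
print(hunting.in_language("waiter brings menu"))   # False
print(sorted(hunting.theorems()))
```

Derivation never leaves the context: `theorems()` consults only the axioms and rules of the one triple, which is the sense in which each context is a single formal system.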
All this looks highly programmatic. Even if the discussion on lexical meaning is not yet fully developed, however, some peculiar features of multi-context theories are worth noting for the purpose of our discussion, and give some conceptual tools which could help the building of a molecularistic stance in A.I.
(i) locality - With M-C theories, we abandon the idea of a single language to which all sentences belong; there are languages, and the same sentence (the same string of characters) may belong to different languages: languages are embedded in contexts, each of which deals with a special part of knowledge as it is normally organized in a linguistic community. The deeply anti-holistic intuition behind this move was clearly expressed by Giunchiglia (1993): "Reasoning is usually performed on a subset of the global knowledge base; we never consider all we know but only a small subset". Any system which does not capture this point suffers from what Giunchiglia calls the "problem of (lack of) locality" (Giunchiglia 1996 is mirroring the original point given by McCarthy 1987 of the "problem of (lack of) generality".)
(ii) travelling through contexts - All contexts are on the same level (there is no super-context), so that relations among contexts can be dealt with easily. Simple algorithms implement rules for (a) entering and exiting a context, (b) taking some elements from a context (or a whole context) into another; (c) sharing inference rules among contexts.
(a) entering and exiting a context is the first operation given in McCarthy 1987 to show that in any assertion a context is always referred to; while making a derivation inside a context, we may enter the context, make the derivation and assert the conclusion as valid relative to the context. Exiting the context, the conclusion itself must be referred to only with an index for the context in which it has been derived.
(b) can be treated with forms of lifting, expressible as <<Lift(C1, C2) → ∀x (true(x, C1) → true(x, C2))>>. This means that you may take what is true in a context C1 as true inside another context C2. This step (which could be done also with subparts of contexts) is fundamental for (iii) below.
(c) was developed by Giunchiglia 1993 with the technique of bridge rules: they are rules which allow one to pass from a premise in a context, through some steps, to a conclusion which is valid in another context.
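The three operations (a), (b) and (c) can be sketched over a very simple store in which each context is just a set of atomic sentences. This is a loose illustration, not the formal machinery of McCarthy or Giunchiglia; the function names, the context names and the sentences are all assumptions.

```python
def derive_in(store, ctx, premise, conclusion):
    """(a) Enter ctx, derive conclusion from premise, and exit.

    On exit the conclusion is reported together with an index for the
    context in which it was derived.
    """
    if premise not in store[ctx]:
        return None
    store[ctx].add(conclusion)
    return (ctx, conclusion)


def lift(store, src, dst):
    """(b) Lifting: whatever is true in src is taken as true in dst."""
    store[dst] |= store[src]


def bridge(store, src, premise, dst, conclusion):
    """(c) Bridge rule: from premise in src, conclude conclusion in dst."""
    if premise in store[src]:
        store[dst].add(conclusion)
        return True
    return False


# Hypothetical contexts, all on the same level.
store = {"flying": {"has_ticket"}, "trip": set(), "airport": set()}

indexed = derive_in(store, "flying", "has_ticket", "can_board")
print(indexed)                 # ('flying', 'can_board')

lift(store, "flying", "trip")
print(sorted(store["trip"]))   # ['can_board', 'has_ticket']

fired = bridge(store, "flying", "can_board", "airport", "gate_open")
print(fired, sorted(store["airport"]))
```

Note that lifting copies sentences unchanged, whereas a bridge rule relates a premise in one context to a possibly different conclusion in another.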
(iii) dealing with individual situations - How can this conceptual machinery face the problem of commonsense reasoning in individual, not necessarily typical situations? A possible answer is the idea of "working context". The rationale for this idea is that, when we face a situation, we pick up information from different contexts; formally, we construct a working context by "lifting" axioms and rules from the relevant contexts. Thus we can deal with individual situations without having to take the whole data base into account. Facing an individual situation, we pick up from different contexts exactly what is necessary to understand and solve the problem in question. If unexpected obstacles were to arise, other contexts could be lifted into the working context. This idea is a development of the idea of "default" worked out in semantic networks: facing a problem we find a default solution; e.g., in order to fly from London to Moscow we only need a small amount of knowledge concerning flying; if we discover that we have lost our ticket, or our luggage, or our clothes, more knowledge must be taken into account, knowledge which we would normally disregard in drawing our normal inferences).
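The construction of a working context can be illustrated with the London to Moscow example. The sketch below assumes a knowledge base partitioned into named contexts, each a set of informal rule strings; the context names, the rule strings and the function name are made up for illustration and are not part of any M-C system.

```python
# Hypothetical knowledge base, partitioned into contexts.
KNOWLEDGE = {
    "flying": {
        "have ticket -> can board",
        "board in London -> arrive in Moscow",
    },
    "lost_property": {
        "lost ticket -> ask airline for a duplicate",
    },
    "skiing": {
        "fresh snow -> wax the skis",
    },
}


def working_context(relevant):
    """Build a working context by lifting the axioms of the relevant contexts."""
    lifted = set()
    for name in relevant:
        lifted |= KNOWLEDGE[name]   # copies into the working context only
    return lifted


# Default solution: only knowledge about flying is taken into account.
wc = working_context(["flying"])
print(len(wc))   # 2

# Unexpected obstacle (the ticket is lost): lift one more context.
wc |= working_context(["lost_property"])
print(len(wc))   # 3
```

At no point is the whole knowledge base consulted: the "skiing" context is never lifted, and the partitioned store itself is left unchanged.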
(iv) comparing meanings - What happens when we want to compare two expressions belonging to two different contexts? May we have a conception of identity of meaning through contexts? We may only invent suggestions of such a problem not already solved in this approach. We have an apparent alternative (but intermediate solutions are possible): a) we decide to have some stable contexts, or definitional contexts, where an expression takes its most basic meaning, which might be organized on the general pattern emerged in the discussion given in frame nets. These definitional contexts should be able to be lifted in different relevant contexts. We might therefore speak of sameness of meaning when a word used in two different contexts belongs to a definitional context which has been lifted in the two contexts (which might represent sets of beliefs). b) we decide to allow only compatibility relations between expressions. We might say that two expressions are compatible if they may be used in the same contexts. But two expressions cannot be considered identical in meaning if they are used in different contexts.
The first strategy, more traditional, seems more apt to ensure a safe way towards a theory of meaning which uses the concept of basic meaning of an expression to be a stereotypical representation which can be shared among contexts. This strategy could require a global vocabulary organized before the definition of the different languages for each context. The lexicon of a single language might partly depend on the definitions in the global vocabulary. On the other hand, if we follow the second strategy, we have to give up speaking of meaning or we have to run the risk of a local-holistic approach, where the meaning is defined relatively to the entire context in which the expression is used. In this case it is not possible to define identity relations between meanings, unless we have identity of contexts. We might have no global vocabulary, but just a list of expressions to be interpreted in each context.
The discussion of lexical semantics through contexts has therefore still to clarify some of its basic choices; but, although some questions about holism remain open, the general strategy seems open towards some kind of molecularistic stance.
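As a rough illustration of the two strategies for comparing meanings across contexts, the following Python sketch may help. It is not taken from any of the systems discussed: the class and function names (DefinitionalContext, Context, lift, same_meaning, compatible) and the example words are hypothetical. It also assumes a weak reading of compatibility, on which two expressions are compatible when at least one context admits both.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DefinitionalContext:
    """A stable context fixing the basic (stereotypical) meaning of one word."""
    word: str
    defaults: frozenset  # default properties of the stereotype


@dataclass
class Context:
    """A local context: some beliefs, a local vocabulary, and lifted definitions."""
    name: str
    beliefs: set = field(default_factory=set)
    vocabulary: set = field(default_factory=set)
    lifted: dict = field(default_factory=dict)  # word -> DefinitionalContext

    def lift(self, definition: DefinitionalContext) -> None:
        """Lift a definitional context into this context."""
        self.lifted[definition.word] = definition
        self.vocabulary.add(definition.word)


def same_meaning(word: str, c1: Context, c2: Context) -> bool:
    """Strategy (a): the same definitional context has been lifted into both."""
    d1, d2 = c1.lifted.get(word), c2.lifted.get(word)
    return d1 is not None and d1 == d2


def compatible(e1: str, e2: str, contexts: list) -> bool:
    """Strategy (b), read weakly (an assumption): some context admits both."""
    return any(e1 in c.vocabulary and e2 in c.vocabulary for c in contexts)


# Hypothetical example: two belief contexts that differ in their beliefs.
bird = DefinitionalContext("bird", frozenset({"has wings", "flies"}))
alice = Context("alice", beliefs={"penguins are birds"})
bob = Context("bob", beliefs={"ostriches do not fly"})
alice.lift(bird)
bob.lift(bird)
alice.vocabulary.add("bank")  # used locally, with no definitional context
bob.vocabulary.add("bank")

print(same_meaning("bird", alice, bob))          # True
print(same_meaning("bank", alice, bob))          # False
print(compatible("bird", "bank", [alice, bob]))  # True
```

On this sketch, sameness of meaning under strategy (a) does not require the two contexts to contain the same beliefs, only the same lifted definition; under strategy (b) nothing stronger than co-occurrence in a context can be stated.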
A remark on the interpretation of Wittgenstein as a holistic thinker.
In the three cases we examined, we found strong evidence against the claim that, in A.I., meaning and understanding are conceived holistically, i.e. as depending on the whole system. Each attempt has flaws and limitations. However, it should not be impossible to seek an integration, picking up from the three different settings their original solutions, and making them interact with one another: from the first, the idea that procedures may be a good way of representing certain aspects of the use of language (such as recognition procedures or activation of reactions to different speech acts); from the second, the idea that we may have a stereotypical set of information which represents a basic or core meaning of an expression; from the third, the idea that meaning is defined not on the entire system of language as such, but relative to single sublanguages (or contexts).
These ideas show strong similarities with well known aspects of Wittgenstein's philosophy: the first with the idea of meaning as use; the second with the idea of family resemblance predicates; the third with the general idea of language games. I have already hinted at the first two similarities and I will dwell on the third in these last remarks. However my point is the following: if these similarities are sound, and my analysis of these trends in A.I. is correct, Wittgenstein should be regarded as very near to a molecularistic stance. Nevertheless Wittgenstein is often considered an ancestor of holism. How might we account for such a drastic difference in interpreting one of the leading philosophical figures of our century?
The general interpretation of Wittgenstein as one of the main ancestors of holism seems to come from some misunderstanding. Scholars like Davidson and Rorty quote Wittgenstein as a thinker who influenced them in the direction of total holism. But their interpretation seems too easy: the quotation which is often appealed to does not give the whole truth: "To understand a sentence means to understand a language" is just a slogan (Wittgenstein 1953, §199; Quine 1960 quotes the Blue and Brown Books, p.5). To give this slogan a more defined meaning we have to put together two central ideas of Wittgenstein's vision of language:
1) language as a globally structured system does not exist; there are language games, intended as complete languages; in order to understand some of them I have to learn the most basic ones.
2) "to understand a language means to be master of a technique" (this is the sentence which follows the above quotation and is normally neglected); mastering a technique is being able to follow rules, and rules are all on the same level.
The first idea alone is sufficient to take away all weight from the suggestion of a Wittgensteinian holism. Wittgenstein's language games are each a kind of complete language, even if restricted to a very limited situation of action. Wittgenstein himself treated language as a network of different languages: each language game in its autonomy is regarded as a complete language in itself. Therefore the quotation "understanding a sentence is understanding a language" does not necessarily lead toward holism and it permits a different interpretation; the one given by Dummett is based on the following idea, which seems to adhere better to the idea of language games as developed by Wittgenstein: "a sentence cannot be understood in isolation; it can be understood only as a part of a language. This need not, however, be the whole language (...) It may be only a fragment of the language; but that fragment must be one that could be the whole of a language" (Dummett 1987, p.233, p.251). However Dummett's interpretation is a strange compromise: he declares that a "minimum holism" is indispensable; we should accept holism with respect to different sublanguages, which are strong enough to be represented as entire languages. But holism relative to a sublanguage or to a context is still holism, with its consequences: if we represent sets of beliefs as contexts, it would be almost impossible to represent two people with exactly the same set of beliefs; and if meaning is holistically defined relative to contexts, and two people cannot share exactly the same contexts, they do not share the same meanings. But do they have to?
We have to distinguish carefully here between a representation of meaning and a representation of understanding. A representation of meaning might follow the lines of a stereotypical representation with default values as we have seen in the discussion of frames: here we have a representation of something socially shared in a language as a social product. But understanding is related to the individual ability to use the language; and we cannot make the assumption that every speaker grasps exactly the same stereotype defined in the language as a social product. Even if we accept that there is "a common store of thoughts" and concepts which are transmitted from one generation to the next (Frege 1892), we must recognise individual failures to take note of such thoughts and concepts. We need a weaker definition of understanding: a person understands a concept if he/she understands some of its (plausible, relevant, and easy) inferential relations.
The alternatives to holism are atomism and molecularism. According to the former, meaning is defined atomistically for each single word. According to the latter, meaning is defined by subparts of language, not by the whole language. Molecularism therefore gives the background for a non-holistic definition of understanding. Molecularism has to face the challenge posed by Fodor and Lepore (the challenge being: either molecularism collapses into holism or it is obliged to adhere to a rigid distinction between analytic and synthetic propositions). An answer to the challenge was formulated in a very precise way by Perry 1994 and Marconi 1997 in their discussion of Fodor and Lepore's definition of anatomism; roughly speaking, the answer is formulated as follows: in order for two people to understand a sentence P, it is not necessary that there exist some particular set of sentences which the two people have to share, but it is necessary that the two people share some set of sentences. This solution gives a molecularistic theory of understanding (semantic competence) which seems to escape the criticism given by Fodor and Lepore, and nicely fits our requirements of a weak definition of understanding.
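The difference between the two requirements is a difference of quantifier scope, and a small Python sketch can make it concrete. The sketch rests on simplifying assumptions that are not in the original discussion: each speaker is reduced to a finite set of sentences connected with P, the modal element is left out, and the names (speakers, pairwise_overlap, common_core) and the sentences q1, q2, q3 are hypothetical.

```python
from itertools import combinations

# Hypothetical speakers: each is the set of sentences (inferences) that the
# speaker connects with a sentence P.
speakers = {
    "ann": {"q1", "q2"},
    "ben": {"q2", "q3"},
    "cat": {"q1", "q3"},
}


def pairwise_overlap(speakers: dict) -> bool:
    """Molecularist reading: every pair shares some sentence or other."""
    return all(a & b for a, b in combinations(speakers.values(), 2))


def common_core(speakers: dict) -> set:
    """Strong reading: the sentences that every speaker shares (may be empty)."""
    sets = list(speakers.values())
    return set.intersection(*sets) if sets else set()


print(pairwise_overlap(speakers))  # True
print(common_core(speakers))       # set()
```

In this toy case every pair of speakers shares some sentence, so the molecularist requirement is met, yet no single sentence is shared by all of them, so there is no fixed set that everybody must share.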
This interpretation of molecularism (and of a molecularist view of semantic competence) seems to be coherent with Wittgenstein's ideas. Furthermore it is coherent with the key passage where Wittgenstein explains his idea that understanding a sentence is understanding a language, the passage quoted in (1) above. The passage is highly programmatic; nevertheless it suggests a picture where understanding is not equated to having a grasp of a huge totality of information. It is possible to make a comparison with Wittgenstein's stress in philosophy of mathematics against the idea of actual infinity: real numbers are not infinite extensions, but laws for producing extensions; ℵ₀ does not represent a huge infinity of numbers, but it is the symbol of a technique: understanding natural numbers does not amount to representing a huge amount of numbers, but it means understanding the technique of counting. In the same way we may represent our understanding of a language not as a general running over an entire data base, but as the ability to follow basic rules which permit us to get what we want (or where we want).
Perhaps we do not strictly share the same meanings, as given in idealised stereotypical definitions; in communication we just converge towards some set of common inferences. And this kind of practice might be considered, at the same time, what is constitutive of stereotypes. But, even if we do not properly share meanings or stereotypes, there is, however, something we do share in communication: some basic strategies by which we converge towards common inferences. Such strategies could be exemplified by the ability to switch from one context to another, to import information from a context, to build up new contexts from a given one, to defer to experts' contexts in cases of uncertainty (and normally by default). While looking for a representation of meaning as something shared by all speakers of a language, we reach the tentative conclusion that what is shared are not exactly meanings, but strategies to find ourselves at home in any context.
I am grateful to Paolo Bouquet, Ernest Lepore, and Diego Marconi for comments on earlier versions of this paper. Part of the work was done at the University of Rochester, with the kind help given by the Staff of the Department of Philosophy in letting me work there in 1994. An early version of the paper was given at a Conference on Holism organized by Rosaria Egidi in Rome, 1994.
2 I give here the standard view given by Fodor and Lepore 1992 and subsequent essays; they distinguish different kinds of holism. I will refer in the paper to meaning holism and competence holism as two distinct problems (is definition of meanings dependent on the entire language? is our understanding dependent on the understanding of the entire language?). In the argument given by Fodor and Lepore there is a tendency to conflate the two problems. A more general definition of holism has been given by Dummett 1991 as the following thesis: the meaning of a sentence depends on its composition (the words of which it is composed and the order in which they are put together) and the knowledge of the entire language to which the sentence belongs. This definition also does not distinguish sharply between meaning holism and competence holism.
3 Fodor and Lepore 1992 and Lepore 1997. Obviously Fodor and Lepore speak of a general feeling in the environment of A.I. in the eighties, which might be well represented by Haugeland 1979: here four forms of holism are described and it is assumed that A.I. somehow demonstrated at least "commonsense holism". Commonsense holism is the claim that the whole of commonsense knowledge is relevant at each step of the interpretation of a sentence. Because of that, structures like frames and scripts are like an encyclopedia entry, with links to larger structures and cross references to other concepts, suggesting a holistic definition of the concepts which are supposed to be represented by these techniques. I will try to give a different interpretation of these kinds of structures.
4 The negative consequences of holism had already been clearly described by Dummett 1973, to whom Fodor and Lepore refer. Dummett's criticism has, unfortunately, not been widely known, and Fodor and Lepore's book has the merit of making it widespread.
5 We could maintain that connectionism is the study of subsymbolic mechanisms and processes of the mind, while symbolic A.I. is the study of symbolic mechanisms of our culture or the study of cognitive systems that supervene on the processing of individual minds (Smolensky 1988; Clark 1991). An argument for holism in dealing with psychological descriptions of mental processing is given in Block 1995. Block's result, if correct, is strong evidence in favour of a holistic approach to the working of the mind; this approach seems to be devoted to studying the working of individual mental contents (narrow contents). I will not deal with such an approach, which could perhaps be accounted for in a connectionistic framework; I will discuss instead research in A.I. whose aim is an idealized representation of cognitive contents as knowledge shared by limited agents.
6 There are strong suggestions for an argument against holism from the principle of compositionality. A case in that direction is made by Fodor & Lepore in their criticism of Block's "Advertisement for a semantics for psychology". See F&L pp.181-182. They apply their argument against the possibility of a viable Conceptual Role semantics which aims to be holistic, but inevitably falls into some kind of analytic-synthetic distinction, which blocks holism. Dummett 1992 also argues for a conflict between holism and compositionality (ch.10, §5).
7 It seems to me that this ambiguity depends on the distinction we spoke of above (footnote 3) between two aspects of meaning that seem to lie at the bottom of the contrast between connectionist and symbolic systems: the problem of the individual mental content and its processing, and the problem of the objective use of words in a social setting. Winograd often insists upon linking artificial intelligence to the analysis of the mental, while I accept the suggestion given by Smolensky to understand symbolic A.I. as an analysis of the socially accepted symbolic systems. We may give up Winograd's metaphysics while retaining something from his technical realizations. The appeal to holism in Winograd is probably linked to the contrast he makes between declarative and procedural systems: declarative systems are based on modularity (compositionality) and are atomistic; procedural systems are based on interaction, therefore they seem to be more holistic (Winograd 1975, §3).
8 Intended in such a way, meaning could be easily represented in first order logic as a set of meaning postulates giving the default properties (following Hayes 1979); but it is apparent that problems arise from the differences between default values and other possible values. It seems therefore reasonable to look for new formalisms, from circumscription to probabilistic logic, as languages in which to express the values of the slots of a frame. Here lies the link between frame systems and different kinds of default logics (see Frixione 1994 p.51).
9 As Fodor and Lepore claim (F&L p.27), finding a "principled way to distinguish" the propositions you have to believe in order to believe a proposition P from the ones you do not would break the argument from anatomism (molecularism) to holism. The problem is the meaning of "principled": which principles are admissible? A finite machine has among its principles the need to perform procedures in finite time, therefore it has to distinguish between a small and relevant set of inferences and all the infinitely many inferences that it is possible to derive from a sentence. Is that enough for a "principled" distinction?
10 As Marconi notes, part of the problem derives from Brachman and Levesque's refusal to treat the content of the definitional box as generalizations expressed in first order language, in order not to be confused with the contents of the assertional box, which are given in first order language. Both Marconi's suggestion (treat the formulas in the definitional box as necessitated) and the general development of first order languages in symbolic artificial intelligence would easily take care of this technical aspect of the problem. The main problem is just the choice of what to put in the assertional vs. the definitional box.
11 We may refer here to the pragmatic stance taken by Brandom 1994, where he distinguishes "the properties governing correct use in which the concepts grasped by individual consist, on the one hand, from the dispositions to apply concepts, make inferences, and perform speech acts, in which an individual's grasping of a concept consists, on the other" (p.636). We might also think of the discussion by Dummett 1994 (ch.10, §1) on the degree of complexity: only sentences with a lower degree of complexity than the ones to be defined may be considered as pertaining to the basic definition of meaning.
12 I use the term here in a rather non-technical sense (I do not refer for instance to the work done by Fillmore). To find the beginning of the idea of an isolated semantic field we need to go back to structuralism. Structuralism has also been considered by Fodor and Lepore as an example of holism. But this attribution is implausible: far from being holistic, structuralists were well aware that some terms are dependent on other terms only in reference to a specific "semantic field". Early examples given by Hjelmslev and others were always devoted to showing how different languages divide reality in different ways, pointing always to particular subparts of the lexicon (e.g. terminology dealing with wood, with domestic animals, and so on). Linguistic values of lexical items were to be defined only relative to local oppositions with respect to other lexical items, not relative to the entire lexicon. This has been clear since the terminology of "semantic value", "oppositive value", "semantic field" was introduced in the twenties and thirties by the works of scholars such as Ferdinand de Saussure, Hjelmslev and J. Trier.
13 A general point on the problem had already been made by Georges Rey (1983, p. 259). For Perry 1994, strong anatomism is expressed by: ∀x (∃Q) (P is shared → Nec Q is shared); while molecularism requires: ∀x (P is shared → Nec (∃Q) Q is shared). Marconi 1997 gives a different formulation. A discussion of the point where Fodor and Lepore suggest a misunderstanding of molecularism is made by Penco 1997.