Linguistics Anonymous

24 January 2006

Of mathematical constructs and functional heads

With regard to the previous discussion of LISP, I think it would also be interesting to point out the similarities between modern functional head syntax and common infix mathematics. If you look at the structures (a + b) and [Spec Head Comp], one can draw a large number of parallels between them. This could serve as an interesting case study of why (a + b) is the most common structure in math, as well as a possible reanalysis of how linguistics treats functional heads.

Math uses operators as a way to link terms and, in a sense, to tell the 'user' how to combine the first term with the second (note that the first term may be on the right). Indeed, this is exactly how operators are used in some programming languages like Java, C, and Python. The compiler will first come across a single term, then the operator, which tells the compiler what to do with the former and the latter term.

Suppose we were to apply this same logic to syntax and functional heads. Then the purpose of the functional head would be to link parts of a clause and in a sense give meaning to the elements. Thus the AgtP [He Agt drive] could be interpreted as follows:

1) an element VP (drive) is found
2) we are going to add an Agt to it
3) 'He' is found and is added as an Agt
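
To make the parallel concrete, here is a toy sketch in Scheme (in the spirit of the LISP post below); the representation of heads and phrases as lists is purely illustrative, not a worked-out proposal:

; a functional head modeled as an operator: it takes its
; complement first, then its specifier, and builds the phrase
(DEFINE AGT
  (lambda (comp)
    (lambda (spec)
      (LIST 'AgtP spec 'Agt comp))))

; ((AGT 'drive) 'he) evaluates to (AgtP he Agt drive)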

This reanalysis, or restating, of the idea of functional heads will probably have some severe consequences, not all of which I have worked out (obviously), but here are a few interesting ones:

a) everything must be added to a clause (tree) structure by way of a functional head, i.e., there are no lexical heads.
b) the lexicon may not include part-of-speech features, since any lexeme could be merged into the Spec, and the cases which are not parsable could be worked out by semantics. Think of how easy it is to create new verbs from nouns: "I will phone you later." This could be stated simply as a 'noun' merging with a V functional head. In most instances, semantics doesn't 'allow' it to parse, but the process seems very productive: "I usually ipod on the way to class" (just made this up), meaning "I usually listen to my ipod on my way to class."
c) prepositions are phonetically realized functional heads - is this the same as lexical heads? not really sure yet...

I really don't know how much of what I just said is actually logical, since I have only begun to think about this idea and to see whether research has already been done on it.

-R
/woo! first post

17 January 2006

linguists should learn LISP, pt. 1

this will be a multi-part post, which will grow/be revised/deemed less important the more i learn about the topic.

LISP stands for LISt Processing and is a functional programming language invented by John McCarthy (not the linguist, the computer scientist - coincidence in names noted) in the 1950s. it is the second-oldest programming language still in use today (only FORTRAN is older). the language is well known for its bare-bones structure and for an emphasis on programming in which efficiency is not the main concern (as opposed to accuracy, minimality, etc.). LISP makes complex notions such as recursion and abstraction conceptually easy to implement by way of its structure...

consider the following expression in LISP computing the sum of two integers:
(1) (+ 4 5)

fed into an interpreter, this will, of course, produce the response 9. the interesting part about this is the structure of the expression: there is a prefixed primitive operator and the operands fed to it as arguments. more on why this is interesting after some more examples from the language.

primitive data in LISP comes in two forms, atoms and lists. an atom is any discrete element which evaluates: with no abstraction, this will be numbers and character strings of reasonable length: 3, 56, c3, a, etc. a list is an element recursively constructed by concatenation of atoms: (3 4 6 8), (f matt 98 john), etc. there are no other kinds of data important to us now (there is another, dated, type, the dotted pair - possibly more on this in another post).
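
at the interpreter, the distinction looks like this (a trivial sketch):

3                         ; an atom - evaluates to itself
(LIST 3 4 6 8)            ; a list constructed from four atoms
(LIST 'f 'matt 98 'john)  ; lists may freely mix numbers and symbols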

expressions such as (1) can be themselves made up of further expressions, for instance, (2):
(2) (* 9 (+ 4 5 (- 6 4)) (+ 3 5))

which we could represent more schematically as a tree structure, with branches for each of the elements in the expression/list:

[fig. 1: tree representation of (2)]

each node on the tree contains the operator for the expression at its left, and the subsequent nodes (which may branch) contain the operands. optionally (because implemented LISP has no notion of trees, this is a loose 'optionally'), the values of the non-terminal nodes may be said to percolate upward - shown in fig. 1 by the labeled nodes. finally, terminal nodes are atoms in LISP (or, at the current level of abstraction, elements that have been defined from primitives).
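
to see the percolation concretely, here is (2) evaluated from the inside out (the annotation is mine):

(* 9 (+ 4 5 (- 6 4)) (+ 3 5))
; (- 6 4)     => 2    ; terminal nodes evaluate first
; (+ 4 5 2)   => 11   ; the value percolates to the mother node
; (+ 3 5)     => 8
; (* 9 11 8)  => 792  ; the root receives the percolated values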

i hope at this point you can see where i am going, but allow me to continue with some preliminary LISP notions.

abstraction in LISP is accomplished in a variety of different ways based on implementation, but let us consider here that of SCHEME. an example will begin the discussion:

(3) (DEFINE SQUARE (lambda (x) (* x x)))

this expression makes reference to DEFINE as a primitive which abstracts. the new element is named SQUARE and is said to be the function that maps from x to the square of x (or x times itself - i apologize to those who might not know any lambda calculus - see Heim and Kratzer's Semantics in Generative Grammar for a good introduction for linguists). note again that we can draw this expression as a tree:

[fig. 2: tree representation of (3)]
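
once defined, SQUARE behaves just like a primitive:

(SQUARE 4)        ; evaluates to 16
(SQUARE (+ 2 3))  ; evaluates to 25 - the argument is itself an expression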

so now we know enough about LISP to begin to see some almost startling parallels.

first, take the notion of recursivity. it is a well-known fact that language is recursive, and any grammar which accounts for language must account for this fact - do not fear: i will not be proposing that LISP is that grammar. LISP does, however, capture the notion of recursivity in the same way in which it is captured in the nebulous region between syntax and semantics: atoms are combined to form syntactic constituents which can then be manipulated as atoms again, ad infinitum - just as in LISP, the computational power far exceeds the ability of the human mind to parse, as the computational power is theoretically infinite. if we used LISP to model syntactic computation, for example, we could take syntactic features as atoms and the syntactic primitives (words, functional heads) they produce by concatenation (or combination) as lists. elements of these lists are then available as individual entities if necessary (functions such as FIRST and LAST, usually packaged with implementations or easily written, operate on lists to produce exactly these results), which would allow us to model feature checking or valuing easily.
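
a very preliminary sketch of the idea (the feature bundle for 'drive' and its feature names are invented purely for illustration):

; FIRST and LAST, easily written if not packaged with the implementation:
(DEFINE FIRST (lambda (l) (car l)))
(DEFINE LAST
  (lambda (l)
    (if (null? (cdr l)) (car l) (LAST (cdr l)))))

; a syntactic primitive as a list of feature atoms:
(DEFINE DRIVE '(V uD past))

(FIRST DRIVE)  ; evaluates to V - the category feature, available for checking
(LAST DRIVE)   ; evaluates to past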

with expressions, again, we can see an interesting parallel: expressions have their operator prefixed to the left which carries the semantic or syntactic weight of the expression and operates on the operands. theoretically, the prefix notation is optional, but if we were to keep it, we could easily see LISP as an implementation of Kayne's antisymmetry and/or Bowers' Grammatical Relations theory (the latter is forthcoming). this is a digression, however, as the major point is this: one element (a head) carries the burden of most of the value of the node and is responsible for percolation - (* 5 5) evaluates to the product of 5 and 5, and (V NP) evaluates to VP - after percolation, the element can be operated upon as a whole (XP movement) or as individual items (Head Movement).

moving on, let's take a moment to look at data abstraction. in LISP, we can name variables to hold set data:

(4) (DEFINE a (* 5 5))

here the symbol a is defined as a certain value. this operation directly parallels Chomsky's recent formulation of Merge as an operation which forms a list of two elements and produces a label:

(5) MERGE(a, b) = LB(a){a b}

all that is left to be explained in the analogy (and actually in Chomsky's formulation as well) is how LISP can decide which element produces the label. we could even take a preliminary stab at a LISP representation of Merge:

(6) (DEFINE MERGE (lambda (a b) (LIST (LB a) (LIST a b))))

all that would be left, then, is a proper definition of LB.
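
as a first, toy attempt, suppose (purely for illustration) that the label is simply the first feature of the head a:

(DEFINE LB (lambda (a) (if (pair? a) (FIRST a) a)))

; with LB in place, (MERGE '(V drive) '(D he))
; evaluates to (V ((V drive) (D he)))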

one final point in this disconnected series of parallels. when we look at figure 2, we can immediately see that the node immediately dominating (* x x) would not actually be a valid expression in LISP without some relevant abstraction - we need a binder such as (lambda(x)) or another previous instance of DEFINE to make the expression evaluate. this is analogous to principle a of the binding theory in syntax - anaphors (syntactic representations of unbound variables) must be bound by their antecedent or by a wh-phrase (the syntactic representation of a (lambda(x))). this binding then forces us to make reference to the notion of c-command, which we can see in the LISP expression as well! the lambda operator's mother node (the full lambda expression) must c-command the well-formed formula (* x x) in order for the complex to be a valid expression.
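
the parallel can be seen directly at the interpreter:

(* x x)                   ; error: x is unbound - an unbound 'anaphor'
((lambda (x) (* x x)) 5)  ; evaluates to 25 - the lambda binds x from above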

okay, so that's a lot of garbage, but what does it all amount to? is LISP the way to express the universal component of syntax (and possibly semantics?) - probably not, though it does allow us to express directly the conceptual basis for the recent minimalist developments in syntax. however, all the usual questions will remain about how much idealization of human language computation is appropriate before descriptive adequacy is lost.

it may simply be that LISP views data and process in a way similar to the human language faculty (again, a very loose notion of 'similar' is needed here), however even this point should not be understated. we should take note of the power and central notions of recursivity and simplicity in LISP and ask ourselves how this might be instantiated in universal grammar and what it says more generally about natural computational processes.

well, this article has either blithered too long or raised more questions than it answered, but one thing is definitely for certain - in viewing analysis of syntactic phenomena (and probably semantic as well) in natural language, it certainly cannot hurt to have the mental flexibility provided by a background in LISP, since the language directly captures notions central to human language: recursivity, simplicity (minimality, in whatever sense is most relevant), and abstraction. therefore, one might say that linguists should learn LISP.

03 January 2006

reflexives in arabic verbal forms.

the arabic verbal system is famous for its multiple forms based upon a common triliteral root linking each of the discrete verbs to a common semantic base (mccarthy calls these binyanim in his prosodic theory of nonconcatenative morphology. i will call them "forms"). the varying forms are created by affixing segments to the root - typically by infixation. there are many forms, but three are under consideration here: forms V, VI, and VIII, the reflexives of forms II, III, and I, respectively.

(actually, there is quite a bit of debate about the semantic relationships between the various verbal forms. classical arabic scholars identified V as the reflexive of II, VI as the reflexive of III, and VIII as the reflexive of I. this relationship seems to hold in classical arabic (CA), but it is not always the case even in modern standard arabic, the current literary descendant of CA. the picture is further complicated when looking at the spoken dialects - most of the standard semantic relationships have broken down and are not as clear-cut as the classical analysis suggests. all data concerning this matter presented here are from Younes, Munther. 2000. "Redundancy and Productivity in Palestinian Arabic Verb Derivation," in Proceedings of the Third Conference of AIDA, ed. Manwel Misfud.)

each of the three forms under consideration here contains a common affix /ta/, identified by mccarthy and most of the standard literature as the reflexive morpheme in these forms. the interesting part about these data is that in forms V and VI the morpheme is prefixed to the verb:

root   I       V          VI
ktb    katab   takattab   takaatab
Drb    Darab   taDarrab   taDaarab

in form VIII, however, the morpheme is infixed between the first and second root consonants (along with a prefixed /i-/ and a deletion of the first interconsonantal vowel):

root   VIII
ktb    iktatab
Drb    iDtarab
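
to keep with the LISP theme of this blog, the placement facts can be stated as a toy Scheme sketch over the root consonants (the vocalism is hard-coded and the form VIII assimilation is ignored - purely illustrative):

(DEFINE FORM-V    (lambda (c1 c2 c3) (string-append "ta" c1 "a" c2 c2 "a" c3)))
(DEFINE FORM-VI   (lambda (c1 c2 c3) (string-append "ta" c1 "aa" c2 "a" c3)))
(DEFINE FORM-VIII (lambda (c1 c2 c3) (string-append "i" c1 "ta" c2 "a" c3)))

; (FORM-V "k" "t" "b")    => "takattab"  - /ta/ prefixed
; (FORM-VI "k" "t" "b")   => "takaatab"  - /ta/ prefixed
; (FORM-VIII "k" "t" "b") => "iktatab"   - /ta/ infixed after the first root consonant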

i feel, contrary to the standard literature, that the form VIII morpheme in arabic is, or is in the process of becoming in the modern dialects, a different morpheme from the reflexive which appears in forms V and VI. the first major reason for believing this comes from the semantic relationships outlined above. in modern arabic, form VIII has been lexicalized and rarely has a meaning classifiable as a reflexive of I. instead, form VII is beginning to take on the role of passive of I.

morphophonological data supports this analysis. we would expect identical morphemes to behave similarly across different environments (or at least to vary in a predictable way based on the phonology of the language). with the form VIII /ta/, however, this is not the case. form VIII /ta/ participates in three kinds of phonological processes not exhibited by /ta/ in the V/VI environments - emphasis, voicing, and semivowel assimilation: in form VIII, the initial consonant of the morpheme assimilates to the emphasis ([RTR]), voicing ([voice]), and semivowel ([continuant]) features of the first root consonant. an examination of the /ta/ morpheme specifically, and the /t/ phoneme more generally, in other environments shows that this is not standard arabic phonology - it occurs only in form VIII.

if the VIII /ta/ is different, then what exactly is happening? younes (op. cit.) points out that this may actually be an example of a larger-level phonological process: the phonological structure of the root seems to determine which form serves as the reflexive of form I - VIII for roots with {n, l, r} as the first consonant, VII elsewhere. this seemingly bizarre fact can be explained quite easily if, contrary to the standard analysis, VIII is not seen as the reflexive of I. this leaves us with the following standard/reflexive alternations:

standard    reflexive
I         ~ VII/VIII
II        ~ V
III       ~ VI

as far as further research is concerned, there is one avenue which i think needs exploring: when presented with nonce words, do speakers of arabic generate the reflexive of a form I verb according to the phonological conditioning specified in younes? if not, that fact would have to be explained and would possibly call this analysis into question, but if so, it would lend even more support to a reanalysis of form VIII reflexives and their morphemes in arabic.

02 January 2006

partial agreement in arabic clauses.

this is my thesis topic, probably, so it is obviously incomplete as a blog post.

in modern standard arabic and the modern spoken dialects, the main verb in arabic clauses exhibits different agreement patterns based upon its position relative to the subject. the standard clausal structure in arabic is VSO, with the postverbal subject triggering only partial agreement on the verb: person and gender, but not number. in modern standard arabic, number agreement on a presubject verb is ungrammatical. in SVO clauses, however, the preverbal subject triggers full agreement on the verb, and no other agreement pattern is grammatical.

the situation gets more complicated than this, however. an auxiliary in an arabic clause forces AuxSVO word order. as we would predict, the auxiliary agrees partially with the subject, whereas the main verb agrees fully. the problem for an easy account of this data comes when one looks beyond standard written arabic to the spoken dialects: in many of these dialects, VSO clauses are allowed to agree fully, unlike in modern standard arabic.

thus far, a few different accounts have been posited for this data. one of the most commonly referenced involves an incorporation account of the agreement: full or partial agreement on the verb involves incorporation of a phonetically null pronoun into the verb as it raises to the T position (or when the V+T complex raises to I). the major problem with this account, for me, is that it doesn't seem any more explanatory than stipulative accounts of the data: what exactly governs the insertion of this pronoun? if the pronoun account is correct, what is the feature value of that pronoun in standard dialects of arabic? is it different for each of the different dialects of arabic?

another major account has been to see agreement as different under government (which would hold in VSO clauses) than under the spec-head relationship which dictates standard agreement. i would like to resist this answer, however, as the notion of government is one i would like to avoid while working in the minimalist framework.

finally, minimalist accounts of this data do exist, making use of the strength of the features on the Agr head which controls the position of the verb. again, however, modern research has moved away from the Agr head as a result of the work that came after the explosion of Infl. the question is, then: is there a way to account for this data under the current state of the minimalist framework?

it is obvious that the solution will have one characteristic: the separation of Move from Agree advanced in chomsky's probe-goal theory. movement to the clause-initial position should not be for agreement purposes but for EPP feature checking. given this, we should look immediately to the Tense functional head - it is the only local head that might capture this data.

and this is the current state of my research: i think it is possible to capture this data by postulating a correspondence between the presence of the number feature on the T head and the presence/absence of EPP on that same head. if the number feature on the T head is present, then EPP should be present on that head as well, leading to the predicted SVO word order and agreement. VSO agreement, then, would be the product of no number feature or EPP feature on the T head. the agreement patterns are captured by a selection of one of two different lexical T heads. differences in the dialects could be accounted for by postulating different lexical T heads for those dialects.
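
borrowing the LISP notation from the earlier posts purely as shorthand, the two lexical T heads would look something like this (feature notation invented for illustration):

(DEFINE T-FULL '(T (phi person gender number) (epp +)))  ; selected in SVO clauses
(DEFINE T-PART '(T (phi person gender) (epp -)))         ; selected in VSO clauses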

and that's all i have, for now. more research needs to be done, however, as there is one major piece of data this solution does not account for: pronouns. in arabic, postverbal pronouns trigger full agreement...

introduction.

welcome to linguistics anonymous, a collection of semi- to fully-minimalist ramblings about various topics in current linguistics research. both of the authors are undergraduates at cornell university, and their intellectual interests include, but are not limited to: japanese linguistics, arabic linguistics, minimalist syntax, morphosyntax, optimality theory, and the syntax-semantics interface.

check back soon for posts with actual content.