
Link Grammar Parser

by Davy Temperley, John Lafferty and Daniel Sleator
(this variant maintained by Dom Lachowicz <domlachowicz@gmail.com> and Linas Vepstas <linasvepstas@gmail.com>)

News

December, 2008: link-grammar 4.4.1 released! This includes an important security fix; anyone using version 4.2.4 or earlier is advised to upgrade.

What is the Link Grammar?

The Link Grammar Parser is a syntactic parser of English, based on link grammar, an original theory of English syntax. Given a sentence, the system assigns to it a syntactic structure, which consists of a set of labeled links connecting pairs of words. The parser also produces a "constituent" (Penn tree-bank style phrase tree) representation of a sentence (showing noun phrases, verb phrases, etc.).
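As a toy illustration only (this is not the library's API or its internal data structures), a linkage can be modeled as a list of labeled word pairs. The sketch below checks the two structural conditions from Sleator and Temperley's papers: planarity (links drawn above the sentence never cross) and connectivity (the links join all the words). The link labels D, S and O are simplified, illustrative names; the real dictionary uses its own, more detailed link types and also links a LEFT-WALL pseudo-word, which is omitted here. The names `Link`, `is_planar` and `is_connected` are hypothetical.

```python
from typing import NamedTuple


class Link(NamedTuple):
    left: int    # index of the left word
    right: int   # index of the right word (left < right)
    label: str   # link type; simplified, illustrative names only


def links_cross(a: Link, b: Link) -> bool:
    """True if the two links would cross when drawn above the sentence."""
    return (a.left < b.left < a.right < b.right
            or b.left < a.left < b.right < a.right)


def is_planar(links: list[Link]) -> bool:
    """Planarity: no two links cross."""
    return not any(links_cross(a, b)
                   for i, a in enumerate(links)
                   for b in links[i + 1:])


def is_connected(n_words: int, links: list[Link]) -> bool:
    """Connectivity: the links join every word into one component (union-find)."""
    parent = list(range(n_words))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for link in links:
        parent[find(link.left)] = find(link.right)
    return len({find(i) for i in range(n_words)}) == 1


# Toy linkage for "the cat chased a snake" (no LEFT-WALL, simplified labels).
words = ["the", "cat", "chased", "a", "snake"]
linkage = [Link(0, 1, "D"), Link(1, 2, "S"), Link(2, 4, "O"), Link(3, 4, "D")]

print(is_planar(linkage), is_connected(len(words), linkage))  # True True
```

Links that merely share a word (such as cat-chased and chased-snake) do not count as crossing; only links whose spans partially overlap do.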

Did the AbiWord team write Link Grammar?

In large part, no. The project is the brainchild of Davy Temperley, John Lafferty and Daniel Sleator, all university professors. It is the product of a decade of academic research into grammar, and is founded on a theory backed by numerous publications. Its canonical homepage is hosted by Carnegie Mellon University.

So then, what is it doing at AbiSource.com?

The AbiWord team had a concrete need: to integrate a grammar-checking feature into AbiWord. The best choice, they felt, was to build upon Temperley et al.'s successful Link Grammar project.

However, in order for the link-grammar project to be useful to them and to the greater Free Software world, the AbiWord community felt that a variety of changes to the project would be necessary. While they did have success (a few years ago) convincing the authors to release Link Grammar under a GPL-compatible license, there was no practical way to continue project development and maintenance at the CMU website. So the AbiWord community took it under its wing and has nurtured the project since.

Notable changes from the upstream Link Grammar package include:

  • Actively maintained.
  • Portability fixes to non-Linux platforms (e.g. Windows).
  • Java bindings.
  • Support for UTF-8 Unicode, and for languages other than English.
  • A variety of other bug fixes to both the source code and the dictionaries.
  • A more standard, portable build system, making packagers' lives easier.
  • Convenience features for integrators, such as a simplified API, pkg-config integration, and dynamic/shared library support.

Downloading Link Grammar

The system can be downloaded either as a tarball, or via SVN. The current stable version is Link Grammar 4.4.1 (December, 2008). Older versions are available here.

Unstable, working versions are available through AbiWord's SVN repository. Anonymous read-only access is available by issuing the command:

svn co http://svn.abisource.com/link-grammar/trunk link-grammar

General instructions for AbiWord's anonymous SVN can be found here.

The Link Grammar source can be browsed online here.

Documentation

There is an extensive set of pages documenting the dictionary: specifically, the names of links and their meanings, as well as how to write new rules. The documentation for the programming API is here.

Mailing Lists

The current list for Link Grammar discussion is the link-grammar Google group.


Bug Tracker

Bug reports, patches, RFEs, etc. are gladly welcomed.

Disclaimer

Link grammar is a natural language parser, not an artificial intelligence. This means that there are many sentences that it cannot parse correctly, and many others for which it generates multiple parses. There are also entire classes of speech that it cannot parse, such as Valley-girl speak. Link grammar does best on "newspaper English": medium-length sentences written with good grammar, proper punctuation, and proper capitalization. It don't do 733t speek, etc. In particular, it has problems with the following "registers" and types of writing:
  • Phrases (that are not a part of a complete sentence)
  • Bulleted lists, such as this.
  • Quotations within sentences (and parenthetical remarks). These can be handled by an appropriate front-end that separates the quotations from the rest of the text.
  • Slang speech and words, like 733t warez d00dz, although it can certainly guess from context if the slang is sufficiently grammatical.
  • Long run-on sentences. These can generate thousands of alternative parses in a combinatorial explosion.
  • Certain "registers", such as newspaper headlines; for example, "Thieves rob bank."
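To give a feel for how quickly ambiguity can grow with sentence length, the sketch below counts the distinct binary bracketings of an n-word sentence, which is the Catalan number C(n-1). This is a generic, textbook illustration: Link Grammar does not literally enumerate binary trees, and its actual parse counts depend on the dictionary and the sentence. The function names are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def catalan(n: int) -> int:
    """n-th Catalan number: C(0) = 1, C(n) = sum of C(i) * C(n-1-i) for i < n."""
    if n == 0:
        return 1
    return sum(catalan(i) * catalan(n - 1 - i) for i in range(n))


def binary_bracketings(n_words: int) -> int:
    """Number of distinct binary trees whose leaves are n_words words, in order."""
    return catalan(n_words - 1)


for n in (5, 10, 20):
    print(n, binary_bracketings(n))
# 5 14
# 10 4862
# 20 1767263190
```

Doubling the sentence from 10 to 20 words multiplies the count by more than 300,000, which is why long run-on sentences are expensive.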
In addition, it has a variety of "bugs": it currently has trouble with "if...then..." constructs, compound queries ("who did it, and why?"), lists, "...not only...but also..." constructs, certain types of idiomatic phrases, certain types of "institutional utterances", and so on. The goal of the project is to eventually fix all of these cases; progress is ongoing.
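A minimal sketch of the kind of front-end mentioned above for quotations, assuming quotations are delimited by straight or curly double quotes. Each quotation is cut out and replaced by a placeholder word so that the surrounding sentence stays grammatical; the quotations can then be parsed separately. The function name `split_quotations` and the placeholder word "something" are hypothetical choices, not part of Link Grammar.

```python
import re

# Straight "..." or curly “...” double-quoted spans.
QUOTE_RE = re.compile(r'"([^"]*)"|\u201c([^\u201d]*)\u201d')


def split_quotations(text: str, placeholder: str = "something") -> tuple[str, list[str]]:
    """Replace each quotation with a placeholder word; return (main text, quotations)."""
    quotations: list[str] = []

    def take(match: re.Match) -> str:
        inner = match.group(1) if match.group(1) is not None else match.group(2)
        quotations.append(inner)
        return placeholder

    return QUOTE_RE.sub(take, text), quotations


main, quotes = split_quotations('She said "we will win" and left.')
print(main)    # She said something and left.
print(quotes)  # ['we will win']
```

Parenthetical remarks could be handled the same way with a pattern for parentheses, although nested parentheses would need a small stack-based scanner rather than a single regular expression.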

Recent Changes

Version 4.4.1 (15 December 2008) includes the following changes:

  • Balance the dictionary tree; this speeds word-lookup slightly.
  • New MSVC6 build files from Evgenii Philippov.
  • Fix Java server classes to pass along the link-grammar version number.
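The first entry above does not say how the dictionary tree is balanced. As a generic illustration, assuming a binary search tree keyed by word (this is not the library's actual dictionary code), the sketch shows why balancing helps: inserting already-sorted words into a plain binary search tree produces a chain whose depth equals the word count, while building the tree from the median outward keeps the depth logarithmic. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    word: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], word: str) -> Node:
    """Plain (unbalanced) BST insertion, done iteratively."""
    new = Node(word)
    if root is None:
        return new
    node = root
    while True:
        if word < node.word:
            if node.left is None:
                node.left = new
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = new
                return root
            node = node.right


def build_balanced(sorted_words: list[str]) -> Optional[Node]:
    """Balanced BST: the median word becomes the root, then recurse on each half."""
    if not sorted_words:
        return None
    mid = len(sorted_words) // 2
    return Node(sorted_words[mid],
                build_balanced(sorted_words[:mid]),
                build_balanced(sorted_words[mid + 1:]))


def depth(root: Optional[Node]) -> int:
    """Number of nodes visited by the worst-case lookup."""
    if root is None:
        return 0
    return 1 + max(depth(root.left), depth(root.right))


words = [f"w{i:03d}" for i in range(255)]  # already sorted
chain = None
for w in words:
    chain = insert(chain, w)
print(depth(chain), depth(build_balanced(words)))  # 255 8
```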

Version 4.4.0 (7 December 2008) includes the following changes:

  • fix: recognize curly-single-quote ’ where straight quote can be used.
  • recognize and explicitly ignore emoticon types.
  • Include MSVC6 build files.
  • Apply patch needed for Ruby bindings.
  • fix: "Where did they come from?", per Viswanath IIIT
  • fix: "Where did they go to?"
  • fix: "It gives me peace of mind."
  • fix: many, many incorrectly identified mass nouns.
  • fix: ladle.v "molten hot" "piping hot"
  • fix: "It's a shame that...", "The crux of the plan is that..."
  • Performance improvements (about 11%) to prune.c from Bruce Wilcox
  • fix: "He eats with me nightly."
  • Add new public API function: linkgrammar_get_version()
  • MSVC9 build files from Borislav Iordanov
  • Java network-efficient client-server classes from Borislav Iordanov

Version 4.3.9 (8 October 2008) includes the following changes:

  • Issue 13: "John is altogether amazingly quick."
  • Nonstandard spelling "unequivocably"
  • Dictionary fixes for 'marginally', etc. "That one is marginally better"
  • Issue 7: Dictionary fixes for 'done': "I am done working"
  • dictionary entries for walk-up drivethru car-wash
  • dictionary: "I am through being mad", "It was a through flight", etc.
  • Issue 11: "You are doing well"
  • Issue 3: "I asked Jim a question", "I told Jim a story"
  • Passive subjects with objects: "I was told that crap, too" "...was asked..."
  • Fixes for Apple Mac OSX (crash on non-executable stack)
  • Early version of Filip Maric's boolean SAT solver
  • fix: "He talked quietly of revolt."
  • fix: "It consists mostly of sand.", "He talks, mostly of revolution."
  • fix: "He talked mostly to Ann.", "He talks a lot."
  • fix: than_usual: "He is taking longer than usual."
  • fix: a batch of new verbs from Roman Khlupin
  • fix: crash on Apple Macintosh, by correctly identifying the platform.
  • fix: "San Gabriel" "Block Island" "Great Southern Bank" "de la Rente"
  • fix: "I biked Johnson Creek."

A summary of older changes can be found here.

Adjunct Projects

RelEx Semantic Relation Extractor
RelEx is an English-language semantic relationship extractor, built on the Carnegie-Mellon link parser. It can identify subject, object, indirect object and many other relationships between words in a sentence. It will also provide part-of-speech tagging, noun-number tagging, verb tense tagging, gender tagging, and so on. RelEx includes a basic implementation of the Hobbs anaphora (pronoun) resolution algorithm. Optionally, it can use GATE for entity detection.
Perl bindings
A Perl module was written by Dan Brian. [Download] [Documentation (mirror)]. See also a tutorial. Note that the Perl bindings were developed against an older version of the link parser.
Ocaml bindings
OCaml interface to Link Grammar
Ruby bindings
There are two different packages providing Ruby bindings: Ruby Link Grammar, which is up-to-date and currently maintained, and Link Grammar 4 Ruby, which is wildly out-of-date (it's for version 4.2.2) and is unmaintained. You only need one!
Persian dictionaries
Persian dictionaries, by Jon Dehdari. These require the Persian stemming engine, as significant morphology analysis needs to be performed to parse Persian.
Arabic dictionaries
Arabic dictionaries, by Jon Dehdari. [download] These require the Aramorph stemming package, which is included.
Russian parser
Located at http://slashzone.ru/parser/. By Sergey Protasov. Russian morpheme dictionaries can be had at http://aot.ru.
English dictionary extensions
LinkGrammar-WN is a lexicon expansion for the English language Link Grammar Parser. This project adds 14K new words to the dictionaries. The extended lexicon is provided under the GPL license, and thus cannot be merged back into the current project.
Medical terms
Extending the Link Grammar Parser's lexicon from UMLS' Specialist lexicon -- adds many medical terms. All but the six largest of these dictionaries have now been merged into version 4.3.1. The large dictionaries EXTRA.2, EXTRA.3, EXTRA.8, EXTRA.9, EXTRA.12, and EXTRA.17 have not been merged. These dictionaries contain 180K assorted medical, biological and biochemical terms and phrases.

Of related interest

Genia tagger
The Genia tagger is useful for named entity extraction.

Recent Applications and Publications

Some recent uses and applications of the Link Grammar Parser are shown below. There is also an older bibliography on the CMU website (mirror) referencing several dozen papers pertaining to the Link Grammar Parser.

Some miscellaneous facts:

  • Any categorical grammar can be easily converted to a link grammar; see section 6 of Daniel Sleator and Davy Temperley. 1993. "Parsing English with a Link Grammar." Third International Workshop on Parsing Technologies.
  • Link grammars can be learned by performing a statistical analysis on a large corpus: see John Lafferty, Daniel Sleator, and Davy Temperley. 1992. "Grammatical Trigrams: A Probabilistic Model of Link Grammar." Proceedings of the AAAI Conference on Probabilistic Approaches to Natural Language, October, 1992. See also the P. Szolovits paper above.

License

The Link Grammar license is essentially the BSD license. A copy of this license can be found below, and at the original authors' CMU site.

Copyright (c) 2003-2004 Daniel Sleator, David Temperley, and John Lafferty. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  3. The names "Link Grammar" and "Link Parser" must not be used to endorse or promote products derived from this software without prior written permission. To obtain permission, contact sleator@cs.cmu.edu

THIS SOFTWARE IS PROVIDED BY DANIEL SLEATOR, DAVID TEMPERLEY, JOHN LAFFERTY AND OTHER CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.