Question Answering - staffwww.dcs.shef.ac.uk


Question Answering
Mark A. Greenwood, MEng
Available from: http://www.dcs.shef.ac.uk/~mark/phd/work/index.html

Overview
• What is Question Answering?
• Approaching Question Answering
• A brief history of Question Answering
• Question Answering at the Text REtrieval Conferences (TREC)
• Top performing systems
• Progress to date
• The direction of future work

What is Question Answering?
The main aim of QA is to present the user with a short answer to a question, rather than a list of possibly relevant documents. As it becomes more and more difficult to find answers on the WWW using standard search engines, question answering technology will become increasingly important. Answering questions using the web is already enough of a problem for it to appear in fiction (Marshall, 2002):

"I like the Internet. Really, I do. Any time I need a piece of shareware or I want to find out the weather in Bogota I'm the first guy to get the modem humming. But as a source of information, it sucks. You got a billion pieces of data, struggling to be heard and seen and downloaded, and anything I want to know seems to get trampled underfoot in the crowd."

Approaching Question Answering
Question answering can be approached from one of two existing NLP research areas:
• Information Retrieval: QA can be viewed as short passage retrieval.

• Information Extraction: QA can be viewed as open-domain information extraction.
Question answering can also be approached from the perspective of machine learning (see Soubbotin, 2001).

A Brief History of Question Answering
Question answering is not a new research area: Simmons (1965) reviews no fewer than fifteen English language QA systems. Question answering systems can be found in many areas of NLP research, including:

• Natural language database systems
• Dialog systems
• Reading comprehension systems
• Open domain question answering

Natural Language Database Systems
These systems work by analysing the question to produce a database query. For example:
  List the authors who have written books about business
would generate the following database query (using Microsoft English Query):
  SELECT firstname, lastname
  FROM authors, titleauthor, titles
  WHERE authors.id = titleauthor.authors_id
    AND titleauthor.title_id = titles.id
These are some of the oldest examples of question answering systems. Early systems such as BASEBALL and LUNAR were sophisticated, even by modern standards (see Green et al., 1961 and Woods, 1973).

Dialog Systems

By definition, dialog systems have to include the ability to answer questions, if for no other reason than to confirm user input. Systems such as SHRDLU were limited to working in a small domain (Winograd, 1972), and they still had no real understanding of what they were discussing. This is still an active research area (including work in our own research group).

Reading Comprehension Systems
Reading comprehension tests are frequently used to test the reading level of school children. Researchers realised that these tests could also be used to test the language understanding of computer systems. One of the earliest systems designed to answer reading comprehension tests was QUALM (see Lehnert, 1977).

Reading Comprehension Systems
How Maple Syrup is Made
Maple syrup comes from sugar maple trees. At one time, maple syrup was used to make sugar. This is why the tree is called a "sugar" maple tree. Sugar maple trees make sap. Farmers collect the sap. The best time to collect sap is in February and March. The nights must be cold and the days warm. The farmer drills a few small holes in each tree. He puts a spout in each hole. Then he hangs a bucket on the end of each spout. The bucket has a cover to keep rain and snow out. The sap drips into the bucket. About 10 gallons of sap come from each hole.

• Who collects maple sap? (Farmers)
• What does the farmer hang from a spout? (A bucket)
• When is sap collected? (February and March)
• Where does the maple sap come from? (Sugar maple trees)
• Why is the bucket covered? (To keep rain and snow out)

Reading Comprehension Systems
Modern systems such as Quarc and Deep Read (see Riloff et al., 2000 and Hirschman et al., 1999) claim results of between 30% and 40% on these tests. These systems, however, only select the sentence which best answers the question, rather than extracting the answer itself. These results are very respectable when you consider that each question is answered from a small piece of text, in which the answer is likely to occur only once. Both of these systems use a set of pattern matching rules augmented with one or more natural language techniques.
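To give a flavour of the bag-of-words matching such systems build on, the sentence-selection step can be sketched as follows. This is a deliberately minimal sketch, not the actual rules used by Quarc or Deep Read: the stopword list and suffix-stripping "stemmer" are crude stand-ins for the real components.

```python
# Sketch of word-overlap sentence selection for reading comprehension.
# The stopword list and stem() below are illustrative simplifications.
STOPWORDS = {"the", "a", "an", "is", "in", "of", "to", "and",
             "who", "what", "when", "where", "why", "does", "from"}

def stem(word):
    # Crude suffix stripping, standing in for a real stemmer.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def content_stems(text):
    tokens = (w.strip('.,?!"').lower() for w in text.split())
    return {stem(w) for w in tokens if w and w not in STOPWORDS}

def best_sentence(question, sentences):
    # Return the sentence sharing the most content-word stems with the question.
    q = content_stems(question)
    return max(sentences, key=lambda s: len(content_stems(s) & q))

passage = [
    "Maple syrup comes from sugar maple trees.",
    "Farmers collect the sap.",
    "The best time to collect sap is in February and March.",
]
print(best_sentence("Who collects maple sap?", passage))
# -> Farmers collect the sap.
```

Even this naive overlap count picks the right sentence for some questions, which helps explain why simple rule-based systems reach 30-40% on these tests.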

Open Domain Question Answering In open domain question answering there are no restrictions on the scope of the questions which a user can ask. For this reason most open domain systems use large text collections from which they attempt to extract a relevant answer. In recent years the World Wide Web has become a popular choice of text collection for these systems, although using such a large collection can have its own problems. Question Answering at the Text REtrieval Conferences (TREC)

Question Answering at TREC
Question answering at TREC consists of answering a set of 500 fact-based questions, such as "When was Mozart born?". For the first three years systems were allowed to return five ranked answers to each question. From 2002 systems are only allowed to return a single exact answer, and the notion of confidence has been introduced.

The TREC Document Collection
The current collection uses news articles from the following sources:
• AP newswire, 1998-2000
• New York Times newswire, 1998-2000
• Xinhua News Agency newswire, 1996-2000
In total there are 1,033,461 documents in the collection. Clearly this is too much text to process using advanced NLP techniques, so systems usually consist of an initial information retrieval phase followed by more advanced processing.

The Performance of TREC Systems
The main task has been made more difficult each year:
• The questions used have been selected to better reflect the real world.
• The questions are no longer guaranteed to have a correct answer within the collection.
• Only one exact answer is allowed, instead of five ranked answers.
Even though the task has become harder year-on-year, the systems have also been improved by the competing research groups. Hence the best and average systems perform roughly the same each year.

The Performance of TREC Systems
[Chart: MRR (mean reciprocal rank) scores, from 0.00 to 0.80, for the best, worst and mean systems, together with Sheffield's best and worst entries, across TREC 8, 9 and 10.]

Top Performing Systems
For the first few years of the TREC evaluations the best performing systems were those using a vast array of NLP techniques (see Harabagiu et al., 2000). Currently the best performing systems at TREC can answer approximately 70% of the questions (see Soubbotin, 2001). These systems are relatively simple:
• They make use of a large collection of surface matching patterns.
• They do not make use of NLP techniques such as syntactic and semantic parsing.

Top Performing Systems
These systems use a large collection of questions and corresponding answers, along with a text collection (usually the web), to generate a large number of surface matching patterns.

For example, questions such as "When was Mozart born?" generate a list of patterns similar to:
• <NAME> ( <DATE> - )
• <NAME> was born on <DATE>
• <NAME>, born in <DATE>
• <NAME> was born <DATE>
These patterns are then used to answer unseen questions of the same type with a high degree of accuracy.
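The surface-pattern idea can be sketched with ordinary regular expressions. The patterns and the placeholder convention below are illustrative assumptions, not the actual patterns used by the TREC systems:

```python
import re

# Hypothetical surface patterns for "When was X born?" questions.
# {name} is filled in at run time; the group captures the candidate year.
PATTERNS = [
    r"{name}\s*\(\s*(\d{{4}})\s*-",          # "Mozart (1756 - 1791)"
    r"{name} was born on[^.]*?(\d{{4}})",    # "Mozart was born on 27 January 1756"
    r"{name},? born in (\d{{4}})",           # "Mozart, born in 1756"
]

def find_birth_year(name, text):
    # Try each pattern in turn; return the first captured year.
    for pattern in PATTERNS:
        m = re.search(pattern.format(name=re.escape(name)), text)
        if m:
            return m.group(1)
    return None

print(find_birth_year("Mozart", "Wolfgang Amadeus Mozart (1756 - 1791) was a composer."))
# -> 1756
```

In the real systems the patterns are learned automatically from question/answer pairs and scored by precision, rather than written by hand as here.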

Progress to Date
Most of the work undertaken this year has been to improve the existing QA system for entry to TREC 2002. This has included developing a number of ideas, of which the following were beneficial:
• Combining semantically similar answers
• Increasing the ontology by incorporating WordNet
• Boosting performance through answer redundancy

Combining Semantically Similar Answers

Often the system will propose two semantically similar answers. These fall into two categories:
1. The answer strings are identical.
2. The answers are similar but the strings are not identical, e.g. Australia and Western Australia.
The first group is easy to combine, as a simple string comparison will show the answers are the same. The second group is harder to deal with, and the approach taken is similar to that used in Brill et al. (2001).

Combining Semantically Similar Answers
The test to see if two answers, A and B, are similar is: if the stem of every non-stopword in A matches the stem of a non-stopword in B, or vice versa, then they are similar. As well as allowing multiple similar answers to be combined, a useful side-effect is the expansion and clarification of some answer strings, for example:
• Armstrong becomes Neil A. Armstrong
• Davis becomes Eric Davis
This method is not 100% accurate, as two answers which appear similar based on the above test may in fact be different when viewed against the question.

Incorporating WordNet
The semantic similarity between a possible answer and the question variable (i.e. what we are looking for) was computed as the reciprocal of the distance between the two corresponding entities in the ontology. The ontology is relatively small, so often there is no path between two entities and they are not deemed similar (e.g. neither house nor abode is in the ontology, but they are clearly similar in meaning). The solution was to use WordNet (Miller, 1995), and specifically the Leacock and Chodorow (1998) semantic similarity measure.
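The stem-overlap test for combining similar answers can be sketched as below. This is a minimal sketch: stem() is a naive stand-in for a real stemmer (such as Porter's), and the stopword list is illustrative.

```python
# Sketch of the stem-overlap test for combining similar answers.
# stem() and STOPWORDS are simplified stand-ins for the real components.
STOPWORDS = {"the", "a", "an", "of"}

def stem(word):
    # Naive plural stripping, for illustration only.
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def stems(answer):
    tokens = (w.strip(".,").lower() for w in answer.split())
    return {stem(w) for w in tokens if w and w not in STOPWORDS}

def similar(a, b):
    # Every stem of one answer must appear among the stems of the other.
    sa, sb = stems(a), stems(b)
    return sa <= sb or sb <= sa

print(similar("Armstrong", "Neil A. Armstrong"))   # -> True
print(similar("Australia", "Western Australia"))   # -> True
print(similar("Eric Davis", "Miles Davis"))        # -> False
```

Note that the test deliberately accepts Australia / Western Australia as similar, which is exactly the kind of pair that can turn out to be wrong when checked against the question.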

Incorporating WordNet
[Figure: WordNet hypernym trees linking the senses of fish and food, with nodes including entity, object, substance, life form, animal, person, victim, card game, chordate, vertebrate, aquatic vertebrate, act, activity, food and foodstuff.]
This measure uses the hypernym ("is a kind of") relationships in WordNet to construct a path between two entities. For example, the hypernym relationships shown above for fish and food are present in WordNet.

Incorporating WordNet
We then work out all the paths between fish and food using the generated hypernym trees. It turns out there are three distinct paths. The shortest path is between the first definition of fish (fish1) and the definition of food, via the hypernym chain:
  entity > object > substance > food > foodstuff > fish1
The Leacock-Chodorow similarity is calculated as:
  Semantic Similarity = -ln(d / 32)
which, for the shortest path of length d = 3, gives a score of 2.37. However, to match our existing measure we use just the reciprocal of the distance, i.e. 1/3.
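The Leacock-Chodorow calculation reduces to a couple of lines. A minimal sketch, assuming the usual taxonomy depth of 16 for WordNet's noun hierarchy (hence the 2D = 32 in the denominator):

```python
import math

def leacock_chodorow(path_length, depth=16):
    # Leacock-Chodorow similarity: -ln(d / 2D), where d is the shortest
    # path length and D the depth of the taxonomy (16 for WordNet nouns).
    return -math.log(path_length / (2.0 * depth))

def reciprocal_distance(path_length):
    # The simpler measure the system actually uses.
    return 1.0 / path_length

# The shortest fish/food path found above has length 3.
print(round(leacock_chodorow(3), 2))  # -> 2.37
```

The log scaling rewards short paths much more sharply than the reciprocal does, which is why the two measures had to be reconciled before being combined.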

Using Answer Redundancy
Numerous groups have reported that the more instances of an answer there are, the more likely it is that a system will find the answer (see Light et al., 2001).

There are two ways of boosting the number of answer instances:
1. Use more documents (from the same source).
2. Use documents from more than one source (usually the web).

Using Answer Redundancy
Our approach was to use documents from both the TREC collection and the WWW, using Google as the IR engine; an approach also taken by Brill et al. (2001), although their system differs from ours in the way it makes use of the extra document collection. We used only the snippets returned by Google for the top ten documents, not the full documents themselves.

The system produces two lists of possible answers, one for each collection.

Using Answer Redundancy
The two answer lists are combined, making sure that each answer still references a document in the TREC collection. So if an answer appears only in the list from Google, it is discarded, as there is no TREC document to link it to. The results of this approach are small but worth the extra effort:

  Collection   MRR     Not Found
  TREC         0.256   68 (68%)
  Google       0.227   68 (68%)
  Combined     0.285   65 (65%)

Question Answering over the Web
Lots of groups have tried to do this (see Bucholz, 2001 and Kwok et al., 2001). Some use full web pages; others use just the snippets of text returned by a web search engine (usually Google). Our implementation, known as AskGoogle, uses just the top ten snippets of text returned by Google. The method of question answering is then the same as in the normal QA system.

Future Work
As previously mentioned, quite a few groups have had great success with very simple pattern matching systems. These systems currently use no advanced NLP techniques.

The intention is to implement a simple pattern matching system and then augment it with NLP techniques such as:
• Named entity tagging
• Anaphora resolution
• Syntactic and semantic parsing

Any Questions?
Thank you for listening.

Bibliography
E. Brill, J. Lin, M. Banko, S. Dumais and A. Ng. Data-Intensive Question Answering. In Proceedings of the Tenth Text REtrieval Conference (TREC 2001).
S. Bucholz and W. Daelemans. Complex Answers: A Case Study using a WWW Question Answering System. Journal of Natural Language Engineering, Vol. 7, No. 4 (2001).
B. F. Green, A. K. Wolf, C. Chomsky and K. Laughery. BASEBALL: An Automatic Question Answerer. In Proceedings of the Western Joint Computer Conference 19, pages 219-224 (1961).
S. Harabagiu, D. Moldovan, M. Paşca, R. Mihalcea, M. Surdeanu, R. Bunescu, R. Gîrju, V. Rus and P. Morărescu. FALCON: Boosting Knowledge for Answer Engines. In The Ninth Text REtrieval Conference (TREC 9), 2000.
L. Hirschman, M. Light, E. Breck and J. Burger. Deep Read: A Reading Comprehension System. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, 1999.
C. Leacock and M. Chodorow. Combining Local Context and WordNet Similarity for Word Sense Identification. In C. Fellbaum, editor, WordNet: An Electronic Lexical Database, chapter 11, pages 265-284. MIT Press, 1998.
W. Lehnert. A Conceptual Theory of Question Answering. In Proceedings of the Fifth International Joint Conference on Artificial Intelligence, pages 158-164, 1977.
D. Lin and P. Pantel. Discovery of Inference Rules for Question Answering. Journal of Natural Language Engineering, Vol. 7, No. 4 (2001).
C. Kwok, O. Etzioni and D. Weld. Scaling Question Answering to the Web. ACM Transactions on Information Systems, Vol. 19, No. 3, pages 242-262, July 2001.
M. Light, G. Mann, E. Riloff and E. Breck. Analyses for Elucidating Current Question Answering Technology. Journal of Natural Language Engineering, Vol. 7, No. 4 (2001).
M. Marshall. The Straw Men. HarperCollins Publishers, 2002.
G. A. Miller. WordNet: A Lexical Database. Communications of the ACM, Vol. 38, No. 11, pages 39-41, November 1995.
E. Riloff and M. Thelen. A Rule-based Question Answering System for Reading Comprehension Tests. In ANLP/NAACL-2000 Workshop on Reading Comprehension Tests as Evaluation for Computer-Based Language Understanding Systems, 2000.
R. F. Simmons. Answering English Questions by Computer: A Survey. Communications of the ACM, 8(1):53-70 (1965).
M. M. Soubbotin. Patterns of Potential Answer Expressions as Clues to the Right Answers. In Proceedings of the Tenth Text REtrieval Conference (TREC 2001).
J. Weizenbaum. ELIZA: A Computer Program for the Study of Natural Language Communication Between Man and Machine. Communications of the ACM, 9, pages 36-45, 1966.
T. Winograd. Understanding Natural Language. Academic Press, New York, 1972.
W. Woods. Progress in Natural Language Understanding: An Application to Lunar Geology. In AFIPS Conference Proceedings, volume 42, pages 441-450 (1973).
