This is the work my friend (Behera) and I did as our undergraduate project. Being from the Electrical department and doing the final-year project in the Computer Science department was no easy task. Thankfully, our supervisor Dr Sudeshna Sarkar was very helpful, and the professors from the Electrical department were too ignorant to understand what I was talking about in the presentation. We did a good amount of work, and it was conferred the best undergraduate project in systems award!
Here is the abstract from the thesis:
A question answering (QA) system provides direct answers to user questions by consulting its knowledge base. Since the early days of artificial intelligence in the 1960s, researchers have been fascinated with answering natural language questions. However, the difficulty of natural language processing (NLP) has limited the scope of QA to domain-specific expert systems. In recent years, the combination of web growth, improvements in information technology, and the explosive demand for better information access has reignited interest in QA systems. The wealth of information on the web makes it an attractive resource for seeking quick answers to simple, factual questions such as “Who was the first American in space?” or “What is the second tallest mountain in the world?” Yet today’s most advanced web search services (e.g., Google, Yahoo, MSN Live Search and AskJeeves) make it surprisingly tedious to locate answers to such questions. Question answering aims to develop techniques that go beyond the retrieval of relevant documents and return exact answers to natural language factoid questions, such as “Who is the first woman to be in space?”, “Which is the largest city in India?”, and “When was the First World War fought?”. Answering natural language questions requires more complex processing of text than current information retrieval systems employ.
This thesis investigates a number of techniques for performing open-domain factoid question answering. We have developed an architecture that augments existing search engines so that they support natural language question answering, and that can also use a local corpus as a knowledge base. Our system currently supports document retrieval from Google and Yahoo via their public search engine application programming interfaces (APIs). We assumed that all the information required to produce an answer exists in a single sentence and followed a pipelined approach to the problem. The stages in the pipeline include automatically constructed question type analysers based on various classifier models, document retrieval, passage extraction, phrase extraction, and sentence and answer ranking. We developed and analyzed different sentence and answer ranking algorithms, ranging from simple ones that employ surface-matching text patterns to more complicated ones using root words, part-of-speech (POS) tags and sense similarity metrics. The thesis also presents a feasibility analysis of using our system in real-time QA applications.
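To give a feel for the simplest kind of sentence ranking mentioned in the abstract, here is a minimal sketch of surface matching: each candidate sentence is scored by how many question keywords it shares. This is not the thesis code. The class name `SurfaceRanker`, its methods, and the tiny stop-word list are all assumptions made up for illustration, and the real system presumably used richer patterns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: names and the stop-word list are illustrative only.
class SurfaceRanker {
    // A deliberately tiny stop-word list (an assumption, not from the thesis).
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "the", "is", "was", "in", "of", "to", "be",
            "who", "what", "which", "when"));

    // Lowercase the text, split on non-alphanumerics, and drop stop words.
    static Set<String> keywords(String text) {
        Set<String> result = new HashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                result.add(token);
            }
        }
        return result;
    }

    // Score = number of distinct question keywords that also occur in the sentence.
    static int score(String question, String sentence) {
        Set<String> shared = keywords(question);
        shared.retainAll(keywords(sentence));
        return shared.size();
    }

    // Return a copy of the sentences, highest score first (ties keep input order).
    static List<String> rank(String question, List<String> sentences) {
        List<String> ranked = new ArrayList<>(sentences);
        ranked.sort((a, b) -> Integer.compare(score(question, b), score(question, a)));
        return ranked;
    }

    public static void main(String[] args) {
        String question = "Which is the largest city in India?";
        List<String> candidates = Arrays.asList(
                "The Nile is a river in Africa.",
                "India is a country in South Asia.",
                "Mumbai is the largest city in India by population.");
        for (String sentence : rank(question, candidates)) {
            System.out.println(score(question, sentence) + "  " + sentence);
        }
    }
}
```

A scorer like this only sees exact word overlap, which is why the more complicated rankers in the thesis bring in root words, POS tags and sense similarity.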
You can download the thesis here
Everything is written in Java. The source code is huge (~100 MB), so I'm not posting it.
Comparison of our system with existing ones