CSE 635 Multimedia Information Retrieval
Chapter 1: Introduction to IR
Srihari-CSE635-Fall 2002

Motivation

- IR: representation, storage, organization of, and access to information items
- Focus is on the user information need
- User information need:
    - When did the Buffalo Bills last win the Super Bowl?
    - Find all docs containing information on college tennis teams which: (1) are maintained by a USA university and (2) participate in the NCAA tournament.
- Emphasis is on the retrieval of information (not data)

Motivation

Data retrieval

- which docs contain a set of keywords?
- Well defined semantics
- a single erroneous object implies failure!

Information retrieval
- information about a subject or topic
- deals with unstructured text
- semantics is frequently loose
- small errors are tolerated

IR system:
- interpret contents of information items
- generate a ranking which reflects relevance
- notion of relevance is most important

Comparison of different information systems

Discipline   Data Object          Primary Operation           DB Size
IR           Document             Retrieval (probabilistic)   Small v. large
DBMS         Table                Retrieval (deterministic)   Small v. large
AI           Logical statements   Inference                   small

Evaluation: Precision versus Recall

Motivation

IR at the center of the stage
- IR in the last 20 years:
    - classification and categorization
    - systems and languages
    - user interfaces and visualization
- Still, area was seen as of narrow interest
- Advent of the Web changed this perception once and for all
    - universal repository of knowledge
    - free (low cost) universal access
    - no central editorial board
    - many problems though: IR seen as key to finding the solutions!

Basic Concepts

The User Task
[Figure: the user task, showing Retrieval and Browsing over a Database]

Retrieval
- information or data
- purposeful
- needle in a haystack problem

Browsing
- glancing around
- Formula 1 racing; cars, Le Mans, France, tourism

Filtering (push rather than pull)

Basic Concepts

Logical view of the documents
- documents represented by a set of index terms or keywords
[Figure: text operations applied to docs (structure recognition; accents, spacing; stopwords; noun groups; stemming; automatic or manual indexing), yielding logical views that range from structure and full text to index terms]
- Document representation viewed as a continuum: logical view of docs might shift
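
The text operations named on the logical-view slide (accent removal, stopword elimination, stemming) can be sketched in a few lines. This is a minimal illustration, not the course's implementation: the stopword list, the `naive_stem` suffix stripper, and the function names are all assumptions made for this example, and a real system would use a proper stemmer such as Porter's.

```python
from __future__ import annotations

import re
import unicodedata

# Hypothetical, deliberately tiny stopword list (assumption for illustration).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "on", "did", "when"}


def strip_accents(text: str) -> str:
    """Decompose characters (NFKD) and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_stem(word: str) -> str:
    """Toy suffix stripper standing in for a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text: str) -> list[str]:
    """Full text -> index terms: accents, case, tokens, stopwords, stems."""
    normalized = strip_accents(text).lower()
    tokens = re.findall(r"[a-z0-9]+", normalized)
    return [naive_stem(tok) for tok in tokens if tok not in STOPWORDS]


if __name__ == "__main__":
    print(index_terms("When did the Buffalo Bills last win the Super Bowl?"))
```

Each step moves the document further along the continuum from full text toward a compact set of index terms; skipping steps keeps the logical view closer to the full text.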

The Retrieval Process
[Figure: the retrieval process. Components: User Interface, Text Operations, Query Operations, Indexing, Searching, Ranking, DB Manager Module, Index (inverted file), Text Database. Flow labels: user need, logical view, query, user feedback, retrieved docs, ranked docs.]

Applications of IR

- Specialized Domains
    - biomedical, legal, patents, intelligence
- Information filtering
- Summarization
- Cross-lingual Retrieval
- Question-Answering Systems
    - Ask Jeeves

- Web applications
    - shopbots
    - personal assistant agents
- Text Mining
    - data mining on unstructured text
- Multimedia IR
    - images, document images, speech, music

IR Techniques

- Machine learning
    - clustering, SVM, latent semantic indexing, etc.
    - improving relevance feedback, query processing etc.
- Natural Language Processing, Computational Linguistics
    - better indexing, query processing
    - incorporating domain knowledge: e.g., synonym dictionaries
    - use of NLP in IR: benefits yet to be shown for large-scale IR
- Information Extraction
    - Highly focused Natural language processing (NLP)
    - named entity tagging, relationship/event detection
- Text indexing and compression
- User interfaces and visualization
- AI
    - advanced QA systems, inference, etc.

Issues to be Addressed by IR

- How to improve quality of retrieval
- Faster indexes and smaller query response times
- better understanding of user behaviour
    - interactive retrieval
    - visualization techniques
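
To make the link between indexes and query response time concrete, here is a small sketch of an inverted file with a ranked search over it. The function names, the sample documents, and the scoring rule (summed term frequency) are assumptions chosen for brevity; the slides do not prescribe a particular ranking function.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def build_inverted_index(docs: dict[str, str]) -> dict[str, Counter]:
    """Map each term to a Counter of {doc_id: occurrences in that doc}."""
    index: dict[str, Counter] = defaultdict(Counter)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] += 1
    return index


def search(index: dict[str, Counter], query: str) -> list[tuple[str, int]]:
    """Rank docs by summed query-term frequency (an assumed, simplistic score)."""
    scores: Counter = Counter()
    for term in query.lower().split():
        # Only the posting lists of the query terms are touched,
        # not the full text of every document.
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    docs = {
        "d1": "college tennis teams in the ncaa tournament",
        "d2": "formula 1 racing cars",
        "d3": "tennis rackets",
    }
    index = build_inverted_index(docs)
    print(search(index, "college tennis"))
```

The index is built once, at indexing time; each query then reads only the postings for its own terms, which is why index design dominates response time.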

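
Retrieval quality is evaluated in this chapter in terms of precision versus recall. As a reminder of the standard definitions, precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The sketch below computes both for one query; the function name and the document ids are made up for illustration.

```python
from __future__ import annotations


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query's result set."""
    hits = len(retrieved & relevant)  # relevant docs that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    retrieved = {"d1", "d2", "d3", "d4"}  # what the system returned
    relevant = {"d1", "d3", "d5"}         # what the user actually needed
    print(precision_recall(retrieved, relevant))
```

Returning more documents tends to raise recall and lower precision, which is why the two are reported together.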
