Design, query, and evaluate information retrieval systems
Statement of Competency
Possessing this competency indicates proficiency in creating an IR system with which an information search can be performed with excellent recall and precision. It requires the information professional to understand how queries are executed, how to improve the collection's findability, and how to assess the usefulness of the IR system.
Information retrieval systems are the backbone of information services. Without an efficient information retrieval system, access to information is adversely affected. Designing an information retrieval system involves the creation of the database by a database designer, who is responsible for setting up its structure. Database design is one of two functions in the creation of the database, and it involves two primary decisions: 1) the purpose of the database and 2) the nature of its users (Jackson, 2007, p. 3). During this phase, the database designer designates the field names, the type of data for each field, how each field is to be indexed, and any limits on the values that can be entered in a field (for example, a field that may only contain numbers).
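The design-phase decisions described above can be illustrated with a minimal sketch. The field names, types, and value limits here are hypothetical examples, not the actual DB/TextWorks structure:

```python
# Hypothetical sketch of design-phase decisions: field names, data
# types, indexing choices, and limits on allowed values.
schema = {
    "title":  {"type": str, "indexed": True},
    "author": {"type": str, "indexed": True},
    "year":   {"type": int, "indexed": False, "min": 1450, "max": 2100},
}

def validate(record: dict) -> bool:
    """Check a record against the schema's type and value limits."""
    for name, rules in schema.items():
        value = record.get(name)
        if not isinstance(value, rules["type"]):
            return False
        if rules["type"] is int and not (rules["min"] <= value <= rules["max"]):
            return False
    return True

print(validate({"title": "Dialog", "author": "Summit", "year": 1966}))  # True
print(validate({"title": "Dialog", "author": "Summit", "year": 99}))    # False
```

A real system enforces these rules at data entry, which is what keeps the indexed fields consistent and searchable.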
The database as it is used in libraries was first conceptualized in answer to a need: to find the right information faster. In 1966, NASA awarded parallel funding to Lockheed, for an in-house installation, and to Bunker-Ramo, for a dial-up service, to automate the NASA database. Although this database stored information successfully, retrieval was not optimal. It was not until Roger Summit developed Dialog that the way we retrieved information dramatically changed (Information Today, 2003). In my Information Retrieval class, we learned how to use DB/TextWorks and, in groups, created a small collection of records. For the individual records to become searchable, it is essential that they be properly indexed. The groups in our class then exchanged databases to evaluate.
The second function in the design of the IR system is database maintenance. The database maintainer is also called the indexer: the person who populates the database with records for the items in the collection. The indexer is called the database maintainer because records in a database are subject to change; records are constantly added, updated, or deleted. The job of the indexer is very important because accuracy in describing records and the use of correct metadata standards guarantee access to information. In the past decade, there has been unprecedented growth in metadata schemas because we now have to create surrogate records for digitized and born-digital items as well.
A query is an expression of an information need. In indexing records in a database, the database maintainer has to describe what is known about each item. The metadata used by the indexer makes the records discoverable. LIS professionals, as opposed to casual library users, are likely to have more successful searches because of their knowledge of how records are indexed. IR systems are also generally designed to support analytical searching, a method of searching that proceeds in logical steps and examines the immediate outcome of each query. Information professionals can employ search strategies such as building blocks, pearl growing, and lawn mowing to locate the desired information.
Most of the time, the search evolves into what is called “berrypicking.” Marcia Bates (1989), in her seminal work The Design of Browsing and Berrypicking Techniques for the Online Search Interface, identified this new model for searching in online environments and other information systems. She argues that this model is much closer to the real behavior of information searchers. She emphasizes the need to understand this because it can guide us in designing effective IR systems.
Hernon and McClure (1990) defined evaluation as "the process of identifying and collecting data about specific services or activities, establishing criteria by which their success can be assessed and the degree to which the service or activity accomplishes stated goals and objectives." The evaluation of IR systems is called explicit evaluation, in which an assessment is done by researchers and system operators with the explicit purpose of seeing whether the performance of the system, in this case the IR system, can be improved (Rowley & Hartley, 2008, p. 291). The goal of an IR system is to locate relevant documents in response to a user's query (Kumar et al., 2005, p. 1). However, effectiveness is not the only criterion when evaluating an IR system. Other key criteria are usability, satisfaction, and cost (Rowley & Hartley, 2008, p. 292).
Effectiveness is measured by recall, the system's ability to retrieve relevant information, and precision, the system's ability to filter out irrelevant or unwanted results. Maximizing recall usually yields a greater number of related results, while maximizing precision yields fewer but more accurate results. A successful IR system also has to take into consideration user experience and satisfaction. These criteria are what make Google popular: its relative ease of use trumps the more systematic Boolean searching that librarians employ. Finally, a key criterion, especially for smaller libraries, is the cost of an IR system. For example, the cost of operating an OPAC is becoming more difficult to determine with the increasing remote access required by users (p. 294).
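The two effectiveness measures have simple definitions that a short sketch can make concrete: recall is the share of all relevant documents that were retrieved, and precision is the share of retrieved documents that are relevant. The document IDs below are invented for illustration:

```python
def recall_precision(retrieved: set, relevant: set) -> tuple:
    """Recall = relevant items retrieved / all relevant items.
    Precision = relevant items retrieved / all retrieved items."""
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

# Suppose the collection holds 10 relevant documents and the system
# returns 8 results, 6 of which are relevant.
retrieved = {"d0", "d1", "d2", "d3", "d4", "d5", "d6", "d7"}
relevant = {"d0", "d1", "d2", "d3", "d4", "d5", "r6", "r7", "r8", "r9"}
r, p = recall_precision(retrieved, relevant)
print(r, p)  # 0.6 0.75
```

The example shows the familiar trade-off: broadening a search raises recall at the expense of precision, and narrowing it does the reverse.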
Justification of Evidence
- PICO Search Strategy (ALA Health Information 101 – RUSA e-course)
PICO is a mnemonic that stands for Patient – Intervention – Comparison – Outcome. It is a tool that an information professional can use to formulate a clinical question. Sayers (2007) wrote that successful search strategies are usually highly structured and built around a PICO framework. I learned in my Health Information class that using this search strategy is helpful in finding relevant medical literature because medical research usually identifies the specific population, the intervention or treatment, the kind of study and the results. This is especially useful when searching evidence-based literature.
The artifact I included here is an assignment from the Health Information class I took from ALA. Using the specifics given by the instructor, I had to find information that compares the drug ibuprofen with a placebo in providing near-complete or complete relief from migraines for adult patients. Using the databases of the National Library of Medicine, I first identified the PICO elements. Not all information professionals are well-versed in medical terms, so I thought it was prudent to first find the definitions of some of the elements from MedlinePlus. MedlinePlus is a website for both consumers and health professionals, and it has directories, a medical encyclopedia, a medical dictionary, health information in Spanish, extensive information on prescription and OTC drugs, health information from the media, and links to clinical trials (U.S. National Library of Medicine, 2015). I wanted to get systematic reviews, which provide the highest quality of information from multiple years of studies.
In the PubMed database, I opted to do an Advanced Search using MeSH (Medical Subject Headings). The search terms for each element of PICO can be linked together using Boolean operators. Sometimes it is necessary to modify search terms when one does not get the required information. This is done by changing the search terms or adding more of them; in this case, I had to remove some of the search terms, and I also applied filters to arrive at the results I needed.
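The way PICO elements are linked with Boolean operators can be sketched as follows. The MeSH terms and field tags here are illustrative for the ibuprofen-versus-placebo question, not an exact replica of the query I ran:

```python
# Hedged sketch: assembling a PICO-style Boolean query string.
# Synonyms within one PICO element are ORed together; the elements
# themselves are then ANDed. Terms below are hypothetical examples.
pico = {
    "population":   ['"Migraine Disorders"[MeSH]'],
    "intervention": ["ibuprofen[MeSH]", "ibuprofen[tiab]"],
    "comparison":   ["placebo[tiab]"],
}

def build_query(elements: dict) -> str:
    """OR the synonyms inside each element, then AND the elements."""
    groups = ["(" + " OR ".join(terms) + ")" for terms in elements.values()]
    return " AND ".join(groups)

print(build_query(pico))
```

This is essentially the building-blocks strategy: each PICO element is one block, refined on its own before the blocks are combined.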
- Searching Online Resources (Disaster Information Management Resource Center Project, National Library of Medicine)
During my fellowship at the National Library of Medicine (NLM), one of my projects was conducting a reference interview and research for the division's computer scientist. He was very happy with the results and asked me to conduct another research project. He expressed regret that I would be leaving soon because he did not remember how to use the databases efficiently, so aside from giving him the results of my research, I created this document as a guide for searching NLM databases. In this artifact, I demonstrated my skill in retrieving information from different information systems. The NLM is part of the National Institutes of Health, and all of its databases can be searched from the National Center for Biotechnology Information (NCBI) website.
I recommended that if he wanted to read only full-text journal articles, he could choose PubMed Central from the list of databases, which significantly narrows down the number of results. The NIH Online Catalog also has an Advanced Keyword Search; instead of leaving the default search set to "Any Field," it should be changed to "Subject," and the same should be done for each field to be used.
The NIH Library also has links to other subscription databases such as Scopus and Web of Science. Searching these databases is very similar to searching the NCBI databases, so it was not necessary to provide detailed instructions. Instead, I complemented each search with screenshots of the search results. In addition, I showed him examples of pertinent results using free online resources such as Google and Google Scholar. I got very positive feedback from him: he said that I did not just give him fish but showed him how to fish, and that he planned to keep the document as a guide for his future research.
- ArcGIS Metadata Scheme Evaluation (Writing Data Management Plans – ACRL e-Course)
In this artifact, I evaluated the ArcGIS Metadata Schema for my class in data management. ArcGIS is a geographic information system (Wikipedia, 2015). It can be used to create maps, compile geographic data, analyze map information, and manage geographic information in a database through its online and desktop platforms. The ArcGIS Metadata Schema identifies geospatial datasets, lists and defines their attributes, establishes rules for the behavior of geospatial and attribute data, and determines the relationship between geospatial datasets and attribute tables (U.S. Census Bureau, n.d., p. 8).
In the class, we learned how to use Beall's criteria for evaluating a metadata schema's strengths and weaknesses. Jeffrey Beall is a catalog librarian and assistant professor at the University of Colorado at Denver. Beall recommends that libraries and organizations apply the criteria in terms of their particular needs and the needs of the users of the data the metadata describes (Beall, 2007, p. 28). Some of the criteria are granularity, level of connection to content standards, availability of searching systems, interoperability, and scalability.
Metadata is often defined as "data about data": a set of data that describes another piece of data or document. "Ordinarily, information storage and retrieval systems have been concerned with text and text-like records," says Michael Buckland, School of Information professor at the University of California, Berkeley. "The present interest in 'multimedia' reminds us that not all phenomena of interest in information science are textual or text-like" (Buckland, 1997, p. 804).
Today's information professional deals not only with textual materials like books but also with events, processes, images, and objects (p. 804), so the traditional means of describing and managing items in a collection are no longer sufficient. It was good practice to evaluate this metadata schema because libraries are now using many different schemas to catalog information. It is impossible to know all existing schemas, but Beall (2007) said that knowing the points of comparison is beneficial in two ways: it can simplify the selection of which schema to use, and it is also useful in evaluating the effectiveness of a schema already in use. He also advises that libraries and organizations regularly examine the schemas they use to see whether they are still meeting their needs (p. 28).
References

Amadeus Global Website. (n.d.). Retrieved from http://www.amadeus.com/web/amadeus/en_1A-corporate/Amadeus-Home/1319560218660-Page-AMAD_HomePpal
Bates, M. (1989). The design of browsing and berrypicking techniques for the online search interface. Online Review, 13, 407-424.
Beall, J. (2007). Discrete criteria for selecting and comparing metadata schemas. Against the Grain, 19(1).
Buckland, M.K. (1997). What is a “document”? Journal of the American Society of Information Science, 48(9), 804-809.
Clark, L. (2015). Using NoSQL databases to gain competitive advantage. Retrieved from http://www.computerweekly.com/feature/NoSQL-databases-ride-horses-for-courses-to-edge-competitive-advantage.
Hernon, P., & McClure, C.R. (1990). Evaluation and library decision making. Norwood, NJ: Ablex.
Information Today. (2003). Early pioneers tell their story, parts 1 and 4. Retrieved from http://www.infotoday.com/searcher/jun03/ardito_bjorner.shtml
Jackson, R. (2007). DBTextWorks tutorial [PowerPoint slides]. Retrieved from http://ischool.sjsu.edu/courses/202/dbtext/DBTextWorksTutorial.ppt
Kumar, R., Suri, P.K., & Chauhan, R.K. (2005). Search engines evaluation. DESIDOC Bulletin of Information Technology, 25(2), 3-10.
NCBI. (2015). Welcome page. Retrieved from http://www.ncbi.nlm.nih.gov/
NIH. (2015). NIH Library Quick Links. Retrieved from http://nihlibrary.nih.gov/pages/quicklinks.aspx
Rowley, J. & Hartley, R. (2008). Organizing knowledge: An introduction to managing access to information. Surrey, UK: Ashgate Publishing, Ltd.
Sayers, A. (2007). Tips and tricks in performing a systematic review: Reference management and identifying search terms and keywords. The British Journal of General Practice: the journal of the Royal College of General Practitioners, 57(545), 999.
U.S. Census Bureau. (n.d.) Geodatabase development considerations: Schemas, attributes, and metadata. Retrieved from http://ggim.un.org/docs/meetings/Regional%20Workshop_Jordan/Session%203-Attributes,%20Metadata,%20and%20Schemas.pdf
U.S. National Library of Medicine. (2015). About MedlinePlus. Retrieved from https://www.nlm.nih.gov/medlineplus/aboutmedlineplus.html
Wikipedia. (2015). ArcGIS. Retrieved from https://en.wikipedia.org/wiki/ArcGIS