Understanding Search Engine Terminology
The following terms describe search and indexing as implemented in Windows 7 and Windows Vista:
- Catalog: The index together with the property cache.
- Crawl scopes (inclusions and exclusions): Included and excluded paths within a search root. For example, to index the D drive but exclude D:\Temp, a user would add a crawl scope inclusion for "D:\*" and a crawl scope exclusion for "D:\Temp\*". The Crawl Scope Manager would also add a start address for "D:\".
- Gathering: The process of discovering and accessing items in a data store by using protocol handlers and IFilters.
- IFilter: A feature of the Windows Search engine that extracts text from documents so that it can be added to the index. (IFilters can also extract format-specific properties, such as Subject or Author; however, in Windows Vista and Windows 7, property handlers are the preferred mechanism for extracting these properties.) Microsoft provides IFilters for many common document formats by default, and third-party vendors such as Adobe provide their own IFilters for indexing other types of content.
- Property handler: A feature of Windows that extracts format-dependent properties. It is used both by the Windows Search engine to read and index property values and by Windows Explorer to read and write property values directly in the file. Microsoft provides property handlers for many common formats by default.
- Indexing: The process of building the system index and property cache, which together form the catalog.
- Master index: A single index formed by combining shadow indexes in a process called the master merge. It is a content index that conceptually maps words to documents or other items.
- Master merge: The process of combining index fragments (shadow indexes) into a single content index called the master index.
- Property cache: The persistent cache of properties (metadata) for indexed items. Basic file properties (such as file size or date last modified) are added to the property cache for every indexed item; additional properties are added for items whose format-specific properties are collected by a property handler or IFilter. Indexing item properties lets users search this information quickly and create rich pivoted views based on the available metadata.
- Property store: Another name for the property cache.
- Protocol handler: A feature of the Windows Search engine that communicates with and enumerates the contents of stores such as the file system, the Messaging Application Programming Interface (MAPI) e-mail database, and the Client-Side Caching (CSC), or Offline Files, database. Like IFilters, protocol handlers are extensible.
- Start address: A Uniform Resource Locator (URL) that points to the starting location for indexed content. During indexing, a protocol handler enumerates each configured start address to find the content to be indexed.
- Search root: The base namespace of a given protocol handler.
- Search defaults: The default crawl scope(s) for a given search root.
- Shadow indexes: Temporary indexes that are created during the indexing process and later combined into a single index called the master index.
- Shadow merge: The process of combining index fragments (shadow indexes) into the next level of index. The resulting index file is still a shadow index, but merging indexes into larger units improves query performance.
- System index: The entire index on the system, including the master index, shadow indexes, and various configuration, log, and temporary files.
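The interaction between inclusions, exclusions, and start addresses described above can be illustrated with a small Python model. This is a sketch under stated assumptions, not the Crawl Scope Manager's actual API: the `ScopeRule` type and the `is_indexed` and `start_addresses` functions are hypothetical, and the model assumes that when several rules match a path, the most specific (longest) pattern decides whether the path is indexed.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class ScopeRule:
    """Hypothetical crawl scope rule: a wildcard path pattern plus include/exclude."""
    pattern: str   # e.g. "D:\\*" or "D:\\Temp\\*"
    include: bool  # True = inclusion, False = exclusion


def is_indexed(path: str, rules: list[ScopeRule]) -> bool:
    """Return True if the path is covered by the crawl scopes.

    Assumption: the longest matching pattern wins, so a more specific
    exclusion overrides a broader inclusion. Matching is case-insensitive,
    as Windows paths are.
    """
    candidate = path.lower()
    matches = [r for r in rules if fnmatchcase(candidate, r.pattern.lower())]
    if not matches:
        return False  # outside every crawl scope
    most_specific = max(matches, key=lambda r: len(r.pattern))
    return most_specific.include


def start_addresses(rules: list[ScopeRule]) -> list[str]:
    """Derive start addresses from inclusion patterns by dropping the trailing '*'."""
    return sorted({r.pattern[:-1] for r in rules if r.include and r.pattern.endswith("*")})


if __name__ == "__main__":
    # Index the D drive but exclude D:\Temp, as in the glossary example.
    rules = [ScopeRule("D:\\*", include=True), ScopeRule("D:\\Temp\\*", include=False)]
    print(is_indexed("D:\\Docs\\report.docx", rules))  # True
    print(is_indexed("D:\\Temp\\scratch.txt", rules))  # False
    print(start_addresses(rules))                      # ['D:\\']
```

Modeling precedence as "longest pattern wins" keeps the example short; it is one plausible way to let a narrow exclusion carve a hole out of a broad inclusion, not a statement of the engine's exact rule-evaluation order.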
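The relationship between shadow indexes, shadow and master merges, and the property cache that together make up the catalog can likewise be sketched as a toy in-memory model. Everything in it is hypothetical (`build_shadow_index`, `merge_indexes`, `query`, and the `Author` property key): it assumes a content index is simply a mapping from words to item IDs and that a merge is the union of those mappings. It does not reflect the catalog's on-disk format or how the engine schedules merges.

```python
from collections import defaultdict

# Toy content index: word -> IDs of the items that contain it.
Index = dict[str, set[int]]


def build_shadow_index(docs: dict[int, str]) -> Index:
    """Index one batch of items into a small, temporary (shadow) index."""
    index: defaultdict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)


def merge_indexes(*indexes: Index) -> Index:
    """Union the posting sets of several indexes into one new index.

    In this toy model the same operation stands in for both a shadow merge
    and the master merge. Input indexes are not modified.
    """
    merged: defaultdict[str, set[int]] = defaultdict(set)
    for index in indexes:
        for word, ids in index.items():
            merged[word] |= ids
    return dict(merged)


def query(master: Index, props: dict[int, dict[str, str]], word: str, **filters: str) -> list[int]:
    """Look up a word in the content index, then filter hits on cached properties."""
    hits = master.get(word.lower(), set())
    return sorted(
        item for item in hits
        if all(props.get(item, {}).get(name) == value for name, value in filters.items())
    )


if __name__ == "__main__":
    shadow_a = build_shadow_index({1: "Quarterly budget report", 2: "holiday photos"})
    shadow_b = build_shadow_index({3: "budget draft"})
    shadow = merge_indexes(shadow_a, shadow_b)             # shadow merge
    master = merge_indexes({}, shadow)                     # master merge into an empty master index
    props = {1: {"Author": "Ana"}, 3: {"Author": "Ben"}}  # toy property cache
    print(query(master, props, "budget"))                 # [1, 3]
    print(query(master, props, "budget", Author="Ben"))   # [3]
```

A real index also has to account for items that are updated or deleted between merges, which this model ignores; it only shows why combining many small fragments into fewer, larger indexes means a query has fewer places to look.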
Note: Existing IFilters, such as the Plain Text filter, can also be used to index unregistered file types or file types whose content is not indexed by default. For example, you can register the Plain Text filter for use with .cpp files.