Understanding Search Engine Terminology

The following terms describe search and indexing as implemented in Windows 7 and Windows Vista:

  • Catalog: The index together with the property cache.
  • Crawl scopes (inclusions and exclusions): Included and excluded paths within a search root. For example, to index the D: drive but exclude D:\Temp, a user would add a crawl scope (inclusion) for "D:\*" and a crawl scope (exclusion) for "D:\Temp\*". The Crawl Scope Manager would also add a start address for "D:\".
  • Gathering: The process of discovering and accessing items from a data store using protocol handlers and IFilters.
  • IFilter: A feature of the Windows Search engine that is used to extract text from documents so that it can be added to the index. (IFilters can also be used to extract format-specific properties, such as Subject or Author; however, in Windows Vista and Windows 7, property handlers are the preferred mechanism for extracting these properties.) Microsoft provides IFilters for many common document formats by default, while third-party vendors such as Adobe provide their own IFilters for indexing other types of content.
  • Property handler: A feature of Windows that is used to extract format-dependent properties. This feature is used both by the Windows Search engine to read and index property values and by Windows Explorer to read and write property values directly in the file. Microsoft provides property handlers for many common formats by default.
  • Indexing: The process of building the system index and property cache, which together form the catalog.
  • Master index: A single index formed by combining shadow indexes using a process called the master merge. This is a content index that conceptually maps words to documents or other items.
  • Master merge: The process of combining index fragments (shadow indexes) into a single content index called the master index.
  • Property cache: The persistent cache of properties (metadata) for indexed items. Basic file properties (such as the file size or date last modified) are added to the property cache for each indexed item; additional properties are added for items with format-specific properties collected by a property handler or IFilter. Indexing item properties allows users to search quickly through this information and create rich pivoted views based on available metadata.
  • Property store: Another name for the property cache.
  • Protocol handler: A feature of the Windows Search engine that is used to communicate with and enumerate the contents of stores such as the file system, the Messaging Application Programming Interface (MAPI) e-mail database, and the client-side caching (CSC), or Offline Files, database. Like IFilters, protocol handlers are extensible.
  • Start address: A Uniform Resource Locator (URL) that points to the starting location for indexed content. When indexing is performed, each configured start address is enumerated by a protocol handler to find the content to be indexed.
  • Search root: The base namespace of a given protocol handler.
  • Search defaults: The default crawl scope(s) for a given search root.
  • Shadow indexes: Temporary indexes that are created during the indexing process and then combined into a single index called the master index.
  • Shadow merge: The process of combining index fragments (shadow indexes) into the next level of index. The resulting index file is still a shadow index, but merging indexes into larger units improves query performance.
  • System index: The entire index on the system, including the master index, shadow indexes, and various configuration files, log files, and temporary files.
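
Several of these terms fit together into a small pipeline, which the following Python sketch models conceptually. It is not the Windows Search implementation: the function names, the sample paths, and the rule that the longest matching crawl scope wins are all assumptions made for illustration. It filters items through the D:\ crawl scopes from the example above, builds one word-to-item index per batch (standing in for shadow indexes), and then combines them into a single index (standing in for the master merge).

```python
from collections import defaultdict

# Hypothetical rule list mirroring the D:\ example: (pattern, is_inclusion).
CRAWL_SCOPES = [
    ("D:\\*", True),         # inclusion
    ("D:\\Temp\\*", False),  # exclusion
]


def in_crawl_scope(path, scopes=CRAWL_SCOPES):
    """Return True if path is included.

    Assumption for this sketch: the longest (most specific) matching
    rule decides, and matching is case-insensitive.
    """
    best = None
    for pattern, included in scopes:
        prefix = pattern.rstrip("*").lower()
        if path.lower().startswith(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, included)
    return best is not None and best[1]


def build_shadow_index(items):
    """Map each word to the set of in-scope items that contain it."""
    index = defaultdict(set)
    for path, text in items.items():
        if not in_crawl_scope(path):
            continue  # excluded paths never reach the index
        for word in text.lower().split():
            index[word].add(path)
    return dict(index)


def merge_indexes(indexes):
    """Combine several word-to-item indexes into one."""
    merged = defaultdict(set)
    for index in indexes:
        for word, paths in index.items():
            merged[word] |= paths
    return dict(merged)


# Two batches of gathered items (made-up sample data).
batch_1 = {
    "D:\\Docs\\a.txt": "quarterly budget report",
    "D:\\Temp\\scratch.txt": "budget draft",
}
batch_2 = {"D:\\Docs\\b.txt": "budget forecast"}

shadow_1 = build_shadow_index(batch_1)
shadow_2 = build_shadow_index(batch_2)
master_index = merge_indexes([shadow_1, shadow_2])

print(sorted(master_index["budget"]))
```

In this model a query for "budget" consults one merged index instead of two fragments, which is the intuition behind merging improving query performance. The file under D:\Temp contributes nothing because the exclusion scope filters it out before indexing.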

Note Existing IFilters, such as the Plain Text filter, can also be used to index unregistered file types or file types that are not content indexed by default. For example, you can register the Plain Text filter for use with .cpp files.
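
The effect of registering an existing filter for another file type can be pictured with a small Python model. This is a conceptual sketch only: the `FILTERS` table, `register_filter`, and `extract_text` are hypothetical names, and the real association between file types and IFilters is managed by Windows rather than by a dictionary like this one.

```python
import os


def plain_text_filter(raw):
    """Toy stand-in for the Plain Text filter: decode bytes as text."""
    return raw.decode("utf-8", errors="replace")


# Hypothetical table of file extension -> filter function.
FILTERS = {".txt": plain_text_filter}


def register_filter(extension, filter_func, registry=FILTERS):
    """Associate a filter with a file extension."""
    registry[extension.lower()] = filter_func


def extract_text(filename, raw, registry=FILTERS):
    """Return the text to index, or None if no filter is registered."""
    extension = os.path.splitext(filename)[1].lower()
    filter_func = registry.get(extension)
    if filter_func is None:
        return None  # contents of this file type are not indexed
    return filter_func(raw)


source = b"int main() { return 0; }"

before = extract_text("main.cpp", source)   # no filter for .cpp yet
register_filter(".cpp", plain_text_filter)  # reuse the existing filter
after = extract_text("main.cpp", source)    # contents are now extracted

print(before, after)
```

The point of the model is that no new filter has to be written: the same text extractor is simply mapped to an additional extension.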
