How Indexing Works
To illustrate the indexing process, consider what happens when a new document is added to an indexed location (a location configured for indexing) on an NTFS volume. The following high-level description outlines the steps involved in indexing new file system content:
- When a file changes, NTFS records the change in its change journal (the USN journal), and the main indexer process (SearchIndexer.exe) listens for these change notifications. Whether the file's contents, in addition to its properties, are indexed depends on the file's FANCI (File Attribute Not Content Indexed) attribute. To view the state of this flag for a file, open the file's properties in Windows Explorer and click Advanced.
- The indexer process starts the Search Filter Host process (SearchFilterHost.exe) if it isn't already running, and the system protocol handler loads the file protocol handler in the Search Protocol Host process (SearchProtocolHost.exe).
- The file's URL is sent to the gatherer's queue. When the indexer retrieves the URL from the queue, it picks the file protocol handler to access the item (based on the file: scheme in the URL). The file protocol handler accesses the system properties (for example, name and size), calls the property handler if one is available, and then reads the content stream from the file system and sends it to the Search Filter Host.
- In the Search Filter Host, the appropriate IFilter is loaded and the filter returns text and property chunks to the indexer.
- Back in the indexer process, each chunk is tokenized by the word breaker for its language (every chunk carries a locale ID), and the resulting text is sent into the indexing pipeline.
- In the pipeline, the indexing plug-in receives the data and builds in-memory word lists, an index that maps each word to the IDs of the items containing it and the number of occurrences. Periodically, these word lists are written to shadow indexes, which are later combined into the master index by a master merge.
- Another plug-in reads the property values and stores them in the property cache.
- If you have a Tablet PC, you may have activated another plug-in that looks for text you write and uses it to help augment handwriting recognition.
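The URL-driven dispatch in the steps above (the gatherer queue hands out a URL, and the URL's scheme selects the protocol handler) can be sketched in Python. This is a toy model, not the Windows Search API: `FAKE_FS`, `file_protocol_handler`, `PROTOCOL_HANDLERS`, and `drain_gatherer_queue` are hypothetical names, and real protocol handlers are COM components running in a separate host process.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory stand-in for the file system (path -> file content).
FAKE_FS = {
    "/docs/report.txt": "Quarterly report draft",
}


def file_protocol_handler(url):
    """Hypothetical handler for file: URLs.

    Returns system properties (name, size in bytes) plus the content stream,
    which the real indexer would forward to the Search Filter Host.
    """
    path = urlparse(url).path
    content = FAKE_FS[path]
    props = {"name": path.rsplit("/", 1)[-1], "size": len(content.encode("utf-8"))}
    return props, content


# Hypothetical registry: the URL scheme decides which handler accesses the item.
PROTOCOL_HANDLERS = {"file": file_protocol_handler}


def drain_gatherer_queue(queue):
    """Pop URLs from the queue and dispatch each one on its scheme."""
    results = []
    while queue:
        url = queue.popleft()
        handler = PROTOCOL_HANDLERS.get(urlparse(url).scheme)
        if handler is None:
            continue  # no handler registered for this scheme; skip the item
        props, content = handler(url)
        results.append((url, props, content))
    return results
```

With a queue holding `file:///docs/report.txt` and `mapi://inbox/1`, only the first URL produces a result, because this sketch registers no handler for the `mapi` scheme.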
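The word-list and merge stages described in the steps above can be modeled as an inverted index that maps each word to item IDs and occurrence counts. In this sketch, `ToyIndex` and `tokenize` are hypothetical; `tokenize` only lowercases and splits on whitespace, whereas real word breakers are language-specific and selected by locale ID (1033 is the en-US locale ID). When Windows Search actually flushes word lists or runs a master merge is not modeled here.

```python
from collections import Counter, defaultdict


def tokenize(text, locale_id):
    # Hypothetical word breaker: ignores locale_id and splits on whitespace.
    return text.lower().split()


class ToyIndex:
    def __init__(self):
        self.word_list = defaultdict(Counter)  # in memory: word -> {item_id: count}
        self.shadow_indexes = []               # flushed, read-only snapshots
        self.master = defaultdict(Counter)     # built up by master merges

    def add_chunk(self, item_id, text, locale_id):
        """Tokenize one text chunk and record word occurrences for the item."""
        for word in tokenize(text, locale_id):
            self.word_list[word][item_id] += 1

    def flush_to_shadow(self):
        """Write the current word lists to a new shadow index and start fresh."""
        if self.word_list:
            self.shadow_indexes.append({w: Counter(c) for w, c in self.word_list.items()})
            self.word_list = defaultdict(Counter)

    def master_merge(self):
        """Fold every shadow index into the master index, then drop the shadows."""
        for shadow in self.shadow_indexes:
            for word, postings in shadow.items():
                self.master[word].update(postings)  # Counter.update adds counts
        self.shadow_indexes.clear()

    def lookup(self, word):
        """Combine master, unmerged shadows, and in-memory lists for one word."""
        word = word.lower()
        total = Counter(self.master.get(word, {}))
        for shadow in self.shadow_indexes:
            total.update(shadow.get(word, {}))
        total.update(self.word_list.get(word, {}))
        return dict(total)
```

Because items can sit in any of the three tiers at a given moment, `lookup` has to consult the master index, any unmerged shadow indexes, and the in-memory word lists together.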
Note: In Windows 7, both NTFS and FAT32 volumes support notification-based indexing, as opposed to crawling (pull-type indexing). For NTFS volumes, the NTFS change journal enables notification-based indexing. For FAT volumes, an initial crawl is performed when the location is added, and a recrawl is performed whenever the location is disconnected (for example, when using an external universal serial bus (USB) drive formatted with FAT) or when the system is rebooted. Once the crawl is complete, however, the ReadDirectoryChangesW application programming interface (API) can be used to listen for any updates.
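On a FAT volume there is no change journal, so a crawl has to discover for itself what changed since the last pass. One way to model that is to snapshot each file's size and modification time and compare snapshots between crawls. This is an illustrative sketch, not how the indexer is implemented; `crawl` and `diff_snapshots` are hypothetical names, and the sketch does not call ReadDirectoryChangesW, which is the Windows API the indexer can switch to once the crawl completes.

```python
import os


def crawl(root):
    """Walk a directory tree and record (size, mtime in ns) for every file."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            snapshot[path] = (st.st_size, st.st_mtime_ns)
    return snapshot


def diff_snapshots(old, new):
    """Compare two crawls and return (added, removed, modified) paths."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return added, removed, modified
```

Comparing only metadata keeps a recrawl cheap: only files whose size or timestamp differ would need to be read and filtered again.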