Understanding the Indexing Process

Understanding how the indexing process works is helpful for troubleshooting issues regarding searching and indexing. The sections that follow outline different aspects of this process.

Types of Files Indexed

IFilters, property handlers, and the Windows property system are used to extract text from documents so that they can be indexed. Microsoft provides IFilters and property handlers for many common document formats by default, while installing other Microsoft applications may also install additional IFilters and property handlers to allow indexing of additional properties and content for documents created by these applications. In addition, third-party vendors may provide their own IFilters and property handlers for indexing proprietary document formats.

IFilters and property handlers are selected on the basis of the file's extension. IFilters understand file formats, whereas property handlers typically just understand file properties. For example, files having the extension .txt are scanned using the Plain Text filter, while files having the .doc extension are scanned using the Office filter and files having the .mp3 extension are scanned using the Audio property handler. All of these extensions are additionally scanned with the Windows property system to extract basic properties, such as file name and size. The Plain Text filter emits full-text content only because text files do not have extended properties (metadata). The Office filter, however, emits both full-text content and metadata because .doc files and other Office files can have extended properties such as Title, Subject, Authors, Date Last Saved, and so on.

Table-1 below lists common document formats, their associated file extensions, and the IFilter dynamic-link library (DLL) included in Windows 7 that is used to scan each type of document. (Table-2 then provides similar information for property handlers.) Note that the indexer scans files based on their file extension, not the type of content within the file. For example, a text file named Test.txt will have its contents scanned and indexed by the Plain Text filter, but a text file named Test.doc will not. Instead, the Office filter will be used to scan Test.doc, and that filter will expect the file to contain .doc content rather than plain text.
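The extension-based selection described above can be sketched in a few lines of Python. This is only an illustration, not how the indexer is implemented: the mapping holds just the three examples named in the text, and the names FILTER_BY_EXTENSION and select_handler are hypothetical.

```python
import os

# Illustrative subset taken from the examples in the text.
# The real registrations live in Windows and cover many more extensions.
FILTER_BY_EXTENSION = {
    ".txt": "Plain Text filter",
    ".doc": "Office filter",
    ".mp3": "Audio property handler",
}


def select_handler(file_name):
    """Pick a handler from the file extension alone; the content is never inspected."""
    extension = os.path.splitext(file_name)[1].lower()
    return FILTER_BY_EXTENSION.get(extension)


# A plain-text file saved as Test.doc is still routed to the Office filter.
print(select_handler("Test.txt"))
print(select_handler("Test.doc"))
```

Because the lookup key is the extension, renaming a file changes which filter scans it even though its content stays the same.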

Note: In Windows Vista, just over one hundred different file extensions were excluded by default from being indexed, including .bin, .chk, .log, .manifest, .tmp, and so on. Beginning with Windows 7, however, the indexer no longer excludes any file extensions by default. This change was made because many of these exclusions were no longer needed, while others had a good probability of reducing the relevance of search results. Some of these exclusions had also been in place to deal with performance issues that could arise if files were indexed. For instance, .log files can be updated very frequently, which in Windows Vista would have caused the indexer to index them repeatedly. Support for smart retry indexing, which was added in Windows 7, mitigates the impact of this type of issue. For more information concerning smart retry indexing, see the sidebar titled "Direct from the Source: Indexing and Libraries: Hard Disk Drives vs. Removable Storage" later in this tutorial.

Table-1 IFilters Included in Windows 7 by Document Format and File Extension

Document Format | File Extension | IFilter DLL
Plain Text | .a, .ans, .asc, .asm, .asx, .bas, .bat, .bcp, .c, .cc, .cls, .cmd, .cpp, .cs, .csa, .csv, .cxx, .dbs, .def, .dic, .dos, .dsp, .dsw, .ext, .faq, .fky, .h, .hpp, .hxx, .i, .ibq, .ics, .idl, .idq, .inc, .inf, .ini, .inl, .inx, .jav, .java, .js, .kci, .lgn, .lst, .m3u, .mak, .mk, .odh, .odl, .pl, .prc, .rc, .rc2, .rct, .reg, .rgs, .rul, .s, .scc, .sol, .sql, .tab, .tdl, .tlh, .tli, .trg, .txt, .udf, .udt, .usr, .vbs, .viw, .vspscc, .vsscc, .vssscc, .wri, .wtx | Query.dll
Rich Text Format (RTF) | .rtf | RTFfilt.dll
Microsoft Office Document | .doc, .dot, .pot, .pps, .ppt, .xlb, .xlc, .xls, .xlt | Offfilt.dll
WordPad | .docx, .odt | WordpadFilter.dll
Multipurpose Internet Mail Extensions (MIME) | .dll | Mimefilt.dll
Hypertext Markup Language (HTML) | .ascx, .asp, .aspx, .css, .hhc, .hta, .htm, .html, .htt, .htw, .htx, .odc, .shtm, .shtml, .sor, .srf, .stm, .wdp, .vcproj | Nlhtml.dll
MIME HTML | .mht, .mhtml | Mimefilt.dll
Extensible Markup Language (XML) | .csproj, .user, .vbproj, .vcproj, .xml, .xsd, .xsl, .xslt | Xmlfilt.dll
Favorites | .url | Ieframe.dll
Journal | .jnt | Jntfiltr.dll
XML Paper Specification (XPS) | .dwfx, .easmx, .edrwx, .eprtx, .jtx, .xps | Mscoree.dll

Table-2 Property Handlers Included in Windows 7 by Document Format and File Extensions

Document Format | File Extension | Property Handler DLL
Contacts | .contact | Wab32.dll
System | .cpl, .dll, .exe, .ocx, .rll, .sys | Shell32.dll
Fonts | .fon, .otf, .ttc, .ttf | Shell32.dll
.Group Shell Extension | .group | Wab32.dll
Application Reference | .appref-ms | Dfshim.dll
Audio/Video Media | .3gp, .3gp2, .3gpp, .aac, .adts, .asf, .avi, .dvr-ms, .m1v, .m2t, .m2ts, .m2v, .m4a, .m4b, .m4p, .m4v, .mod, .mov, .mp2, .mp2v, .mp3, .mp4, .mp4v, .mpe, .mpeg, .mpg, .mpv2, .mts, .ts, .tts, .vob, .wav, .wma, .wmv | Mf.dll
Internet Shortcut | .url | Ieframe.dll
Images | .bmp, .dib, .gif, .ico, .jfif, .jpe, .jpeg, .jpg, .png, .rle, .tif, .tiff, .wdp | PhotoMetadataHandler.dll
Installer | .msi, .msm, .msp, .mst, .pcp | Propsys.dll
Library Folder | .library-ms | Shell32.dll
Microsoft XPS | .xps, .dwfx, .easmx, .edrwx, .eprtx, .jtx | Xpsshhdr.dll
Microsoft Office Document | .doc, .dot, .pot, .ppt, .xls, .xlt, .msg | Propsys.dll
Property Labels | .label | Shdocvw.dll
Search Connector | .searchConnector-ms | Shell32.dll
Search Folder | .search-ms | Shdocvw.dll
Shell Messages | .eml, .nws | Inetcomm.dll
Shortcut | .lnk | Shell32.dll
Media Center Recorded TV | .wtv | Sbe.dll

In Windows 7, all of the file types (extensions) listed in Table-2 are enabled for indexing by default. Note, however, that the Plain Text filter will scan files having the extension .txt but not files having the extension .log, even though the filter supports scanning of .log files. To configure the indexer to scan such files using the default filter, see the section "Modifying IFilter Behavior" later in this tutorial.

Two additional (implicit) IFilters and their extensions are not shown in Table-2:

  • File Properties filter: This filter is used to index only the file system properties of files for which there is no registered IFilter, or for which there is a registered IFilter but the user has explicitly gone into Control Panel and selected the Index Properties Only option for the extension. File extensions that use this filter include .cat, .evt, .mig, .msi, .pif, and about 300 other types of files. Note that the File Properties filter isn't really a filter per se, but instead represents the absence of a registered filter for these extensions. In other words, it relies on the File System Protocol Handler to provide the file properties.
  • Null filter: This filter extracts the same properties as the File Properties filter and is used to deal with backward compatibility issues with older methods for registering IFilters. Again, this is not really a filter per se and relies upon the File System Protocol Handler to provide the file properties. The file extensions that use the Null filter are .386, .aif, .aifc, .aiff, .aps, .art, .asf, .au, .avi, .bin, .bkf, .bmp, .bsc, .cab, .cda, .cgm, .cod, .com, .cpl, .cur, .dbg, .dct, .desklink, .dib, .dl_, .dll, .drv, .emf, .eps, .etp, .ex_, .exe, .exp, .eyb, .fnd, .fnt, .fon, .ghi, .gif, .gz, .hqx, .icm, .ico, .ilk, .imc, .in_, .inv, .jbf, .jfif, .jpe, .jpeg, .jpg, .latex, .lib, .m14, .m1v, .mapimail, .mid, .midi, .mmf, .mov, .movie, .mp2, .mp2v, .mp3, .mpa, .mpe, .mpeg, .mpg, .mpv2, .mv, .mydocs, .ncb, .obj, .oc_, .ocx, .pch, .pdb, .pds, .pic, .pma, .pmc, .pml, .pmr, .png, .psd, .res, .rle, .rmi, .rpc, .rsp, .sbr, .sc2, .scd, .sch, .sit, .snd, .sr_, .sy_, .sym, .sys, .tar, .tgz, .tlb, .tsp, .ttc, .ttf, .url, .vbx, .vxd, .wav, .wax, .wll, .wlt, .wm, .wma, .wmf, .wmp, .wmv, .wmx, .wmz, .wsz, .wvx, .xix, .z, .z96, .zfsendtotarget, and .zip.
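The fallback to the File Properties filter can be expressed as a simple decision rule. The following Python sketch is an assumption-laden illustration of that rule only; the function effective_filter and the sample data are hypothetical and do not reflect the indexer's actual data structures.

```python
def effective_filter(extension, registered_filters, properties_only):
    """Return the filter name that would apply to an extension.

    registered_filters: dict mapping extension to IFilter name (hypothetical sample data).
    properties_only: set of extensions the user marked Index Properties Only.
    """
    ext = extension.lower()
    # No registered IFilter, or the user chose properties only:
    # only file system properties are indexed.
    if ext not in registered_filters or ext in properties_only:
        return "File Properties filter"
    return registered_filters[ext]


# Hypothetical sample configuration for illustration.
registered = {".txt": "Plain Text filter", ".doc": "Office filter"}
user_properties_only = {".doc"}

for ext in (".txt", ".doc", ".cat"):
    print(ext, "->", effective_filter(ext, registered, user_properties_only))
```

In this sketch .cat falls back because it has no registered filter, while .doc falls back only because of the user's Index Properties Only choice.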

Note: Beginning with Windows 7, you will no longer see the name Null Filter in the Indexing Options Control Panel. Instead, extensions that use this IFilter are simply associated with the File Properties Filter. You can tell that the Null IFilter is being used for a file extension only by looking up the appropriate entry in the registry. This change was made in Windows 7 because the name "Null Filter" was confusing to some users.

The Windows Search service can be enhanced by installing the Microsoft Filter Pack, which provides additional IFilters to support critical search scenarios across multiple Microsoft Search products. The Filter Pack includes the following IFilters:

  • Metro (.docx, .docm, .pptx, .pptm, .xlsx, .xlsm, .xlsb)
  • Visio (.vdx, .vsd, .vss, .vst, .vsx, .vtx)
  • OneNote (.one)
  • Zip (.zip)

These IFilters are designed to provide enhanced search functionality for the following products: SPS2003, MOSS2007, Search Server 2008, Search Server 2008 Express, WSSv3, Exchange Server 2007, SQL Server 2005, SQL Server 2008, and WDS 3.01.

When you install the Filter Pack, the IFilters in the preceding list are installed and registered with the Windows Search service. Note that the Filter Pack does not need to be installed if Office 2007 is installed. The Filter Pack is available from http://www.microsoft.com/downloads/details.aspx?FamilyId=60C92A37-719C-4077-B5C6-CAC34F4227CC&displaylang=en for both x86 and x64 versions of Windows 7, Windows Vista, Windows Server 2008 R2, Windows Server 2008, Windows XP, and Windows Server 2003.
