To exclude all the items in a document library or list from appearing in search results:
People keep writing custom code and doing a whole lot of configuration with scopes, permissions, and so on.
SharePoint provides an OOTB (out-of-the-box) way to keep the items in a list/library from appearing in search results:
1. Navigate to the list/library.
2. Click Settings –> List Settings –> Advanced Settings.
3. In the “Search” section at the bottom of the page, choose No for allowing items from this list/library to appear in search results.
This issue occurs when "mssph.dll" (the Search Protocol Handlers library) is not registered for searching the content sources. Try Start –> Run and enter:
regsvr32 "C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\BIN\mssph.dll"
Then run a full crawl and it should work.
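If the registration has to be repeated on several servers, the step above can be scripted. The following is only a minimal sketch: the helper names `regsvr32_command` and `register_protocol_handler` are made up for illustration, and it assumes the default MOSS 2007 install path quoted above and an elevated (administrator) prompt.

```python
import ntpath
import os
import subprocess

# Default MOSS 2007 location of the protocol handler DLL (assumed; adjust if installed elsewhere).
MSSPH_DLL = ntpath.join(
    "C:\\", "Program Files", "Common Files", "Microsoft Shared",
    "web server extensions", "12", "BIN", "mssph.dll",
)


def regsvr32_command(dll_path):
    """Build the same command line you would type into Start -> Run."""
    return ["regsvr32", dll_path]


def register_protocol_handler(dll_path=MSSPH_DLL):
    """Register the DLL; raises if the file is missing or regsvr32 fails."""
    if not os.path.isfile(dll_path):
        raise FileNotFoundError(dll_path)
    subprocess.run(regsvr32_command(dll_path), check=True)
```

Calling `register_protocol_handler()` on the index server should be equivalent to running the command by hand; the full crawl still has to be started afterwards.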
Recently, while using the search service in MOSS 2007, I got this error:
System.Web.Services.Protocols.SoapException: Server was unable to process request. ---> Attempted to perform an unauthorized operation.
Solution:
- Check that the current user has appropriate permissions on the SharePoint site.
- The Web Application should be configured with Integrated Windows Authentication, and anonymous access should be disabled/unchecked. Go to IIS –> Website –> Properties –> Directory Security –> uncheck Enable Anonymous Access.
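To double-check the IIS setting from a client machine, one rough approach is to send an unauthenticated request and look at how the server answers. This is a sketch only: `classify_anonymous_probe` and `probe` are hypothetical helper names, and the result is a hint, not proof, because SharePoint permissions also influence the response.

```python
import urllib.error
import urllib.request


def classify_anonymous_probe(status_code, www_authenticate):
    """Interpret the reply to a request sent WITHOUT credentials.

    www_authenticate is the list of WWW-Authenticate header values.
    """
    schemes = {v.split(None, 1)[0].lower() for v in www_authenticate if v.strip()}
    if status_code == 401 and schemes & {"ntlm", "negotiate"}:
        return "integrated"   # server challenges for Windows authentication
    if 200 <= status_code < 300:
        return "anonymous"    # content was served without any credentials
    return "unknown"


def probe(url):
    """Fetch url without credentials and classify the answer."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            headers = resp.headers.get_all("WWW-Authenticate") or []
            return classify_anonymous_probe(resp.status, headers)
    except urllib.error.HTTPError as err:
        headers = err.headers.get_all("WWW-Authenticate") or []
        return classify_anonymous_probe(err.code, headers)
```

A result of "anonymous" for the site URL suggests that Enable Anonymous Access is still checked; "integrated" suggests the configuration described above is in place.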
Suppose you need to search a scanned PDF file in SharePoint. The Adobe IFilter (free) does not have the capability to search through a scanned PDF file.

For those who are scratching their heads over what exactly a scanned PDF is, here you go: a scanned PDF is one that is created by scanning physical paper, such as pages of a book, legal documents, etc.

While doing a proof-of-concept exercise for a prospect, we encountered this behavior (the inability to search through scanned PDF files). If you are interested in how we overcame the issue, please read on.

The crux is to turn the scanned PDF into a searchable dual-layer PDF, which has not only the scanned image but also a layer of the text from the image. The technology that reads text from an image is known as OCR (Optical Character Recognition). After analyzing what could best be done to facilitate "scanned PDF" searching in SharePoint, we zeroed in on the following three options:
- Use an OCR tool that converts the "scanned PDF" directly to a "dual-layer PDF" (image + OCR text), upload the resulting PDF to SharePoint, and the Adobe IFilter will take care of indexing the document. A few such products: Nuance PDF Converter Enterprise, Solid PDF Tools, X-Key. These products come with an API as well, which means you can automate the complete process.
- Use an open source OCR tool to retrieve the text from the PDF file and store it as metadata for the PDF document inside a SharePoint document library. Do a full crawl of the site and you are up and running with the solution. Such tools: OCRopus, Tesseract.
- Use an IFilter specifically targeted at such PDF documents, for example from Captaris.
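As a rough illustration of the open source route, here is a minimal sketch that drives the Tesseract command-line tool from Python. It assumes `tesseract` is installed and on the PATH, and that each PDF page has already been exported to an image, since Tesseract does not read PDF files directly. The function names and file names are hypothetical, and writing the extracted text into a SharePoint metadata column is not shown.

```python
import subprocess
from pathlib import Path


def tesseract_command(image_path, output_base, searchable_pdf=False):
    """Build a Tesseract CLI call: tesseract <image> <outputbase> [pdf]."""
    cmd = ["tesseract", str(image_path), str(output_base)]
    if searchable_pdf:
        # The "pdf" config asks Tesseract for an image + text PDF
        # instead of a plain .txt file (needs a reasonably recent Tesseract).
        cmd.append("pdf")
    return cmd


def ocr_page(image_path, output_base):
    """Run OCR on one page image and return the recognized text."""
    subprocess.run(tesseract_command(image_path, output_base), check=True)
    # Tesseract appends ".txt" to the output base name.
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

For example, `ocr_page("page1.png", "page1")` would leave the text in `page1.txt` and return it, ready to be stored alongside the document before the full crawl.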
List of OCR software (free):
Till next time…