When the ProQuest Historical Newspapers program was launched over 15 years ago, all newspapers were digitized at the article level. This means that every page of newspaper content was “zoned” into its distinct articles and other component parts (editorials, advertisements, cartoons, etc., shown by the numbers below), and each of those component parts was then run through OCR and treated as an individual entity in the database.
Beginning in 2016, some new historical newspaper titles in the ProQuest Historical Newspapers program are being digitized at the page level. This means that the full-page image is run through OCR, and the full page of content is stored in its entirety in the database.
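The two digitization models can be pictured as two record shapes. The Python sketch below is illustrative only: the class names, fields, helper functions, and sample text are all hypothetical and are not ProQuest's actual schema.

```python
from dataclasses import dataclass


@dataclass
class ArticleRecord:
    """Article-level: one record per zoned component of a page (hypothetical shape)."""
    title: str       # e.g. the article headline
    doc_type: str    # "article", "advertisement", "cartoon", ...
    issue_date: str
    page: int
    ocr_text: str    # OCR output for this zone only


@dataclass
class PageRecord:
    """Page-level: one record per full page (hypothetical shape)."""
    issue_date: str
    page: int
    ocr_text: str    # OCR output for the whole page


def article_level(issue_date, page, zones):
    """Each (title, doc_type, text) zone becomes its own database record."""
    return [ArticleRecord(t, kind, issue_date, page, text) for t, kind, text in zones]


def page_level(issue_date, page, zones):
    """The page is stored as a single record containing all of its text."""
    full_text = "\n".join(text for _, _, text in zones)
    return PageRecord(issue_date, page, full_text)


# Made-up sample content for one page.
zones = [
    ("Senate Votes Today", "article", "The Senate is expected to vote ..."),
    ("Spring Sale", "advertisement", "All coats half price ..."),
]
article_records = article_level("1960-03-01", 1, zones)
page_record = page_level("1960-03-01", 1, zones)
```

In this sketch the same page yields two records under the article-level model and one record under the page-level model. The text is the same in both cases; only the per-component metadata differs.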
The basic search function is identical for article-level and page-level titles. Every part of every page in ProQuest Historical Newspapers is full-text searchable, whether the title was digitized at the article level or the page level. If you search for a term such as the name “John Kennedy” and it appears anywhere in the OCR text of a page, it will generate a hit for that page, whether the term is in an article title, in an article, in an advertisement, or elsewhere.
Further, both article-level and page-level titles are fully cross-searchable with all other historical newspapers, contemporary newspapers, and any other content on the ProQuest Platform.
The primary difference in searching is in Advanced Search: because they include article-level metadata, newspapers digitized at the article level let users restrict search results to particular portions of the newspaper (articles, advertisements, cartoons, etc.).
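The search behavior described above can be sketched in a few lines of Python. This is a simplified illustration, not the platform's implementation: the `Record` class, the `search` function, and the sample records are hypothetical. The sketch also assumes that page-level records, which carry no document-type metadata, drop out when a restriction is applied. The source does not say how the platform handles that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical search record; doc_type is None for page-level records."""
    label: str
    ocr_text: str
    doc_type: Optional[str] = None


def search(records, term, doc_type=None):
    """Return records whose OCR text contains term (case-insensitive).

    If doc_type is given, only records carrying that article-level
    metadata can match (an assumption made for this sketch).
    """
    needle = term.lower()
    hits = []
    for rec in records:
        if needle not in rec.ocr_text.lower():
            continue
        if doc_type is not None and rec.doc_type != doc_type:
            continue
        hits.append(rec)
    return hits


# Made-up sample records: two article-level, one page-level.
records = [
    Record("Kennedy Speaks in Boston", "Senator John Kennedy spoke ...", "article"),
    Record("Campaign notice", "Vote for John Kennedy ...", "advertisement"),
    Record("1960-03-01, page 4", "... John Kennedy ... classifieds ..."),
]

basic = search(records, "John Kennedy")
ads_only = search(records, "John Kennedy", doc_type="advertisement")
```

Here the basic search matches all three records, while the restricted search keeps only the record tagged as an advertisement.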
The search results interface has been designed so both article-level and page-level results are presented side-by-side in an integrated and intuitive way. Most users are unaware that there is any difference between the following results:
In the example above, the first result is from an article-level title, and the second and third results are from page-level titles. In each case the user is presented with context and clues that help them quickly judge whether the result is of interest. The search text is highlighted to show the keywords in context. The difference is that the article-level result includes the article name, while the page-level result includes the issue date and page.
When a user selects an article-level search result, that article image is displayed with the search term highlighted. When a user selects a page-level result, the entire page of content is displayed with the search term highlighted.
(NOTE: Hit-term highlighting currently requires that users have the Adobe PDF plug-in installed as the default PDF viewer for their browser. Firefox and Internet Explorer 8 have the Adobe PDF plug-in installed by default. This link describes the use of the Adobe Plug-in with various browsers.)
The article-level result is a bitonal (black and white) image displayed at 300 dpi (dots per inch). The page-level result is a high-resolution greyscale image displayed at 400 dpi, which gives a near-photographic reproduction of the microfilm images. In both cases the images may be scrolled and zoomed as needed, and saved to the user’s local storage.
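To give a rough sense of what the two resolutions mean in practice, the following Python sketch compares pixel dimensions and uncompressed size for the same printed area. The 15 x 22 inch area and the 8 bits per pixel for greyscale are assumptions made for illustration. The source states only the dpi values and the bitonal/greyscale distinction, and real files are compressed.

```python
# Hypothetical printed area in inches; the source does not give one.
AREA_WIDTH_IN = 15
AREA_HEIGHT_IN = 22


def pixel_dimensions(width_in, height_in, dpi):
    """Pixels = inches x dots per inch, on each axis."""
    return width_in * dpi, height_in * dpi


def uncompressed_bytes(width_px, height_px, bits_per_pixel):
    """Raw image size before any compression is applied."""
    return width_px * height_px * bits_per_pixel // 8


# Article-level: bitonal (1 bit per pixel) at 300 dpi.
bitonal_px = pixel_dimensions(AREA_WIDTH_IN, AREA_HEIGHT_IN, 300)
bitonal_size = uncompressed_bytes(*bitonal_px, bits_per_pixel=1)

# Page-level: greyscale at 400 dpi; 8 bits per pixel is an assumption.
grey_px = pixel_dimensions(AREA_WIDTH_IN, AREA_HEIGHT_IN, 400)
grey_size = uncompressed_bytes(*grey_px, bits_per_pixel=8)

print(bitonal_px, bitonal_size)
print(grey_px, grey_size)
```

Under these assumptions the greyscale image has about 1.8 times as many pixels and stores eight times as much information per pixel, which is consistent with the smoother tonal detail described above.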
The new page-level interface provides an enhanced browsing experience. When a user selects “Browse this Issue”, they are presented with a new newspaper browsing interface optimized for page-level newspaper content. This interface features a scrolling thumbnail section at the bottom of the screen that lets users quickly skim through a newspaper issue, along with controls for manipulating the page image:
The slide viewer at the bottom of the screen allows for a seamless browsing experience. Zoom, aspect ratio, rotation, and download controls are available from the menu. NOTE: Pages downloaded from this view will be .jpg images, not .pdf files.
The last difference between article-level and page-level digitization is in the image itself. The new page-level images are all searchable PDF files, which means that a downloaded PDF file can be worked with in the user's local PDF reader. The downloaded PDF can be searched again and again using the PDF reader's search function. The OCR text can also be copied and pasted into other documents: