Digital collections of the National Library of Finland (NLF) contain over 10 million pages of historical newspapers, journals and some technical ephemera. The material ranges from the earliest Finnish newspapers of 1771 to the present day. The material up to 1910 can be viewed in the public web service, whereas anything later is available at the six legal deposit libraries in Finland. A recent user study found that various types of research use are among the key uses of the collection. The National Library of Finland has received several requests to provide the content of the digital collections as one offline bundle in which all the needed content is included. For this purpose we introduced a new format, which contains three different information sets: the full metadata of a publication page, the actual page content as ALTO XML, and the raw text content. We consider these formats the most useful ones to provide to researchers as raw data. In this paper we describe how the export format was created, how other parties have packaged the same data, and what the benefits of the current approach are. We also briefly discuss the word-level quality of the content and show a real research scenario for the data.