If you don't see on the left side a menu click than on the text: geodesie.woelmuis.nl

Digitizing for the presentation of data in vector-form in a scanned map on the base of automatic vectorization and pixels.

(Paper presented on the EGIS-conference in Brussels, Belgium, 2-5 April 1991 (Second European Conference on Geographical Information Systems)) by Ir. A.C. Visser

ABSTRACT

  During centuries information about the terrain is presented on maps. During the last two decades presentation is made possible on computer screens. A lot of maps are in analog form available now. For presentation of these maps on screens it is necessary to digitize them. This also means that databases are prepared with information about the terrain.
  Although scanning of maps is possible during the last twenty years the method becomes only nowadays profitable. After scanning, the map is represented in pixels. Presentation on the screen is possible, but for a lot of possibilities on using information, it is necessary to select interesting information and to convert the information to vector-based information. In that case information is reduced in amount but not in usability.
   A method is described to create a database that will be based on data derived by means of automatic vectorization of a scanned map. The original pixel-base will also be brought with that method to increase the value of the database.
  Automatic vectorization is used to increase accuracy and to reduce costs.

1. INTRODUCTION

  Scanning is used for various purposes. One of the applications is to record a map image into the computer's memory such that extensive use can be made of the created digital file.
  In this application scanning consists of translating a visible image on a map to a symmetrical grid of dots (dot-matrix) spaced 0.1 mm or less. Scanners which operate in a black and white mode only record whether the dots are black or white. Other scanners are also able to record the gray-scale or color of each dot. We limit ourselves to the application of black and white scanners because such scanners produce an adequate amount of information. Furthermore, care is taken to limit the amount of information to be recorded of which its relativity is shown in the following example:
  When a map, which measures 100 x 70 cm, is scanned with a resolution of 300 dots per inch (300 dpi), it is necessary to store the information of (100/2.54*300)*(70/2.54*300)= 11811*8268= 97,653,348 dots. When we take into account 256 colour gradations per dot, the size of the file reaches 94 Mbyte. When we use 16 colour gradations or gray-scales, which is sufficient for most applications, the size of the file still reaches 24 Mbyte. In the above-mentioned applications, only the information whether a dot must be regarded as black or white is recorded for each dot in which case the size of the file reaches 12,206,669 bytes (12 Mbyte). This size is adequate for further use. The size of a file can be reduced substantially by using compression techniques. The possible reduction factor depends on the algorithmic approach and density of information on the map.  With the COMPRESS routine from UNIX, black/white pixel files are reduced from 12 Mbyte to approximately 0.3 Mbyte and 0.9 Mbyte (UNIX is a trademark of AT&T). When such a reduced file is used for further processing it must first be decompressed again.
  A map consists of areas, lines, symbols and text. We depart from the idea that it is not only important to use the information in pixel form for further use, but that we are also able to eventually gain access to a line file which contains vector information, ASCII text and that it is able to reproduce symbols coupled to a set of coordinates plus its code. Also we like to be able to couple identifiers to the information of lines and areas.

Fig 1: Part of a topographic map
Figure 1. Segment of topographic map (scale 1:10000) which shows information unsuited for placement in a digital line file but nevertheless also covers valid information such as the toponym BENNEKOM, details on traffic signs, level differences and symbols for woods, trees, gardens and embankments. Also grid marks play a role in this!

  A distinction should be made between scanning of text and scanning for the purpose of obtaining information about the lines which are visible on maps. Particularly during the process of  extracting the dot-information of characters and numbers, pattern recognition plays an important role. Substantial progress has already been made to correctly recognize the characters of typed text from a scanned image. The results of trying to process the  handwritten text on maps are less favourable. Often this is caused by not being able to isolate the text from the rest of the image. An example of this is shown in figure 1. Furthermore, information in the form of values can be spread over the map such that a certain amount of know-how is required before it is possible to unravel the various numbers. Also the orientation of the text on maps is more irregular than with text which has been neatly typed. Another problem with recognizing handwritten text is that characters and numbers in different locations on the map are not always shaped the same.
  In these processes, scanning is used as a tool to extract the line information from the map image. In order to minimize the cost it would be great when a computer could produce a digital file from the scanned image which would answer to the preconceived requirements without the intervention of people. Expectations are that this will remain wishful thinking. People are able to determine instantly if for example a group of dots from the scanned image is caused by smudges on the map. People decide if a broken line on the map should really be a continuous line. People decide which lines are identifiers and not just lines, but nevertheless have to be placed in memory as lines associated with information. People are able to decide instantly that only the coordinates (start & finish) of a non-continuous line with its associated information should be placed in memory. It remains necessary therefore to employ a method whereby a meaningful combination of automation and manual operation is used.
  What would be required when an almost fully automated method is used? Will it be necessary to delete all identifiers, symbols and text on the map in order to obtain a proper representation of the line image? Proper results are likely when only lines are present on the map. In other instances all identifiers, symbols and text would have to be covered first such that lines are not affected. When this is accomplished by applying a brush to the map or scanned pixel image it proves to be a very extensive and therefore costly process.
  It is better to choose a procedure whereby the available maps are scanned 'as is'. Vague lines will not be 'corrected' and identifiers for culverts and dams will not be changed. In order to arrive at a product which is suitable for further use it is necessary to do a fair amount of tidying up depending on the type of application. This means that a certain amount of information must be extracted from the roughly scanned maps for later use.
  Next, three methods of how to deal with this problem are described. The fist method departs from the idea that digital editing takes place directly in the scanned image. The other two methods depart from the idea that after scanning a 'vector file' is generated with the aid of a vector programme. Techniques such as 'skeletonizing' of the pixel file is not addressed any further. Information about these techniques can be found in additional literature on the subject. In the method at subject, the 'vector file' as well as the 'original' pixel file is used for further use.

2. PROS AND CONS OF DIGITAL EDITING IN A SCANNED IMAGE

  The pros and cons of digital editing whereby the map is scanned first versus a method whereby digital editing takes place directly in an image are listed below.
a. Although the accuracy of the scanner can be less than that of a high quality digitizer it is possible that a file obtained with the aid of a scanner is of better quality than the file which is generated with a digitizer. During digital editing, the operator determines visually the midpoint of a line whereas the computer determines the midpoint automatically. Accidental differences can be avoided when an adequate number of points on a line are scanned which in turn also improves the resolution of the scanned image. Particularly, the combination of a pixel image and vector image can substantially improve the quality because in an instant the operator is able to make a better choice with respect to the location of a bend in areas of the map which are less suitable for using a vector programme.
b. Digital editing in a scanned image can be carried out much faster because it is no longer necessary to determine where the locations of line bends should be defined (at least not in situations whereby vectors were made earlier with the aid of a programme). This method is surely faster because of the fact that one is able to see instantly which parts were edited before (when a digitizer is used it is important to clearly indicate which parts were edited earlier). The operator therefore has an accurate and clear view of his work.
c. Depending on the purchase price and depreciation of the equipment, digital editing in a scanned image can be more costly because expensive equipment is needed. This is often offset by the speed of operation.
   There are several methods possible for digital editing in a scanned image. Preference is given to a procedure whereby digital editing is carried out in a scanned vector image. This method, depending on the original map and desired accuracy, can also be used in combination with the pixel file. However, it is also possible to digitally edit directly in the scanned pixel image.
  The disadvantages to digitally edit directly in the pixel image versus methods whereby digital editing takes place in a vector image, in possible combination with a pixel image, are:
- the amount of information to be placed in memory is usually far more (results in higher storage cost)
- reproduction of unprocessed information is not possible with a pen-plotter
- a substantial amount of computer memory must be available for digital editing in the scanned image (usually far more than by digital editing in a vector image)
- to define the correct point in the file usually requires more attention which means more time which in turn results in higher costs. When a vector image is used, the location of the line bend is already determined by the computer when a vector programme is used. The operator defines the approximate locations of the important line bends on the screen. The computer subsequently carries out a search in the already existing file for the associated values. Not the operator, but the computer determines the accuracy of the locations of the line bends in the permanent file. It needs to be investigated if the computer is able to determine the exact midpoint of a line or the precise junction of lines when digital editing is carried out in a pixel image.
Advantages are:
- no need to allocate money for a vector programme and no need to allocate money for the use of such a programme
- no loss of information takes place. When a vector programme is used the programme itself selects the locations of the line bends. When people define the locations of the line bends they are able to make a better selection (however, it is not necessary to record all the locations of the line bends in the permanent file!)
- an electrostatic plotter is able to instantly process the image. In the event of a vector image this type of plotter has to convert the vector image back again to a grid image (this, however, is less important when a different scale is used to plot the image)
- people are able to act early on in the procedure. By doing so they are able to avoid mistakes caused by the vector programme.
  For many applications, preference is given to the combination of a pixel image and vector image. The advantage of this combi-method is the high accuracy of the vector file combined with the possibility to - based on the visible pixels - instantly make another selection from the information in the permanent file. A disadvantage of this combi-method is that it is less easy to use a completely arbitrary scale. Additional disadvantages are that it is necessary to read-in large files which also means that more computer memory must be available.

3. DIGITAL EDITING IN A SCANNED IMAGE

3.1 Transformation to coordinates of a national coordinate system

  In the event of scanning, as is the case with digital editing, a map is randomly located in the scanner. The X-axis of both the scanner and the map may not necessarily be perfectly lined up. The X-axis and Y-axis on the map may also have different scales. A scanned file, as is the case with digital editing, must be transformed such that coordinates available from the file are close to those used in the national coordinate system.
  In order to make this transformation possible it is permitted to use the available grid marks on the map. When these grid marks are absent, barely visible or insufficiently available, other points which are simple to identify should be used to generate the file. The coordinates defined by the scanner and those recognized by the international system should be known for each reference point.
  It is less desirable to record the coordinates of most likely several thousands of points on the basis of only 2 or 3 reference points! Possible inaccuracies linked with these few points add to those of other points! It is better to use a procedure whereby more reference points are used. The transformation of the measured coordinates to those of the national coordinate system is subsequently carried out by the computer on the basis of more reference points whereby a better result is obtained to approach the correct values. When a procedure is used whereby digital editing is carried out in a pixel image and vector image combined, it is undesirable to transform the coordinates of the pixel image because it introduces an additional inaccuracy. For the time being it is better to continue with the coordinates defined by the scanner and to carry out the transformation at a time when only the processed vector file is used for further operations.
  A commendable procedure is whereby, in first instance, a reference point on the screen is roughly defined. Next, the image area in the vicinity of the point is enlarged and displayed on the screen. Further, the reference point in the image is defined more accurately after which the computer looks for the coordinates of the point defined by the scanner. When a measurement is carried out in a vector file, also the intersect point of the line pieces can be displayed on the screen with the object to check if the correct reference point had been found. Finally, the coordinates of this point should be entered in the national system.
  When only the vector file is used it is appropriate to carry out the transformation of the coordinates from the vector image as fast as possible and to continue with the transformed values. When the pixel image is used as well it will be necessary to work with the scanned coordinates until the pixel image is no longer needed. This also means that the reference points must be recorded in the 'vector file' such that they are easily recognizable.
  After the coordinates (obtained from the scanning) of the selected reference points are known and the coordinates in the national system are entered in the computer, the computer calculates the transformation formula and transforms all the coordinates from the vector file to those in the national system.

3.2 Additional preparations for digital editing in a scanned image

  After a map is scanned and vector file information is obtained, data is generated which enables us to display an image on a screen or a plotter to draw a map. This data, however, is insufficient to fill a digital file. Usually too many lines are recognized and at this point no structure is present in the file as well. A concerted action between people and machine would have to create a file which is capable of meeting the requirements. The method at subject consists of the following activities:
- scan a map image
- convert pixel image to vector image with the aid of a vector programme
- indicate on the screen which lines are important for further use
- assign code to lines which have been defined.
  The first two activities have already been covered. We now direct our attention to the other activities.
  These activities are necessary to provide the vector file from the scanning with a structure. The available vector file must be displayed on a screen before these activities can be carried out.
  Usually the first thing to do is to enlarge a part of the screen which allows for better viewing of what is important. It is also good practice to clearly show in the enlargement in which locations the vector programme defines the bends in lines.
  At this point the operator defines a point or line piece on the screen. It is very important that the computer is able to find the previously defined coordinates of a point or line piece in the available file in very little time. When the computer is incapable of doing this, the operator must wait too often for the computer!
  A fast procedure has been developed for this application whereby in first instance it is only important to find the correct coordinates of bends on the basis of the approximate values which were defined on the screen. The activity is carried out in two phases. During the first phase, which comes into play only once when we deal with a map, the coordinates of the points are sorted to their X-value and a new file is generated with these coordinates. (The second phase is described in 3.3.)
  Even though the presence of the operator is no longer important (situation 1991!) after this part of the programme has been started, it still pays to also use a fast procedure to do the sorting. This is also supported by the fact that often a large number of coordinates have to be sorted. In the event of simple sort routines the computer time increases with the square of the increase of the number of values to be sorted. When for example the increase of the number of points is 100, the computer time will be 10000 times longer.
  For many years now people everywhere are trying to find a fast sort routine. At present there are only two methods which deserve attention and were already developed years ago. The first method, developed by Shell (1959), does not require additional memory space at all to store interim results and is favoured by the writer because of the fact that the length of the actual sort routine of a programme, in which this procedure is incorporated, is minimum. The second method, which is known as Quicksort, was developed by Hoare (1962) and is slightly faster than the first. The computer time of the Shell method as well as the Hoare method increases with approx. the number of values to be sorted multiplied by the logarithm of this number. When the number of values in the above-mentioned example is increased 100 times, the computer time is only approx. 200 times longer.
  When the X-value of the coordinates are all the same it is appropriate to sort to the Y-value. After completion of the sort routine it is simple to store the multiple recorded values of a point only once. After all, in order to retrieve the correct coordinates of a point it is not necessary to know that a point appears in several lines.

3.3 Digital editing in a scanned map image

  Depending on the selected procedure, two or three files become available for further operations. The first file contains the vector file obtained through scanning while the second file contains the sorted points on the X-coordinate. This may also be a file which contains no coordinates but which holds information where a point in the above-mentioned vector file can be retrieved! When the image is displayed on the screen, both files are read-in. The first file is used for making the connections between these points visible while the second file is used for marking points and search routines. The third file contains the pixel file on the basis of scanning.
  After a point - of which the 'correct' coordinates may be found in the second file - is roughly defined on the screen, the computer will search this file for the correct coordinates of this point.
  When a sorted array of numbers is searched, use is made of a technique known as 'binary search'. This procedure is based on logic. First, the measured coordinate on the screen is corrected for a shift and a preset scale. A value, linked to the search tolerance, is deducted from the obtained X-value. Suppose the file contains data of 25000 points. The first step is to look for the X-value of the coordinate of the point with serial number 12500 (half of 25000). When this value is lower than that of the point with serial number 12500, we look at the value of the point with serial number 6250 (half of 12500). When this value is too low, we look at the value of the point with serial number 9375 (between 6250 and 12500). This procedure is repeated until we approach the correct value. It is obvious that only a small number of steps are necessary to arrive at this point.
  From this point on the X-value as well as the Y-value of a point is taken into account. The distance of the measured point and that of the point which has been found is calculated with the Pythagorean theorem. Next, the sorted array is searched in numeric sequence till the X-value of a known point is higher than the X-value of the indicated point on the screen plus a preset tolerance. The distance between the measured point and a point present in the file is repeatedly calculated. Ultimately, the point we are looking for is the point with the shortest distance to the indicated point.
  By making use of the mentioned preparations and described procedures, the required time to search for the correct coordinates is shortened such that the operator has to wait very little or not at all. The manner in which further digital editing on the screen will take place depends on the specific wishes with regard to the structure of the digital file.

4. ACTIVITIES

  In the process of specifying the activities, the assumption is made that a different workstation will be used for scanning, separate from the ones used for digital editing on the screen. This workstation could possibly be located else where when third parties are contracted to do the scanning part. Such a procedure makes a lot of sense because it takes little time to scan a map while further processing proves to be very time consuming. For each workstation coupled to a scanner there are many more workstations required for the other important activities.

The following activities are carried out for each map:
1.    scan routine (converts map image to pixels)
2.*   if so desired, store pixel image in compressed format on diskette (1991) for exchange purposes
3.**  generate vectors (by calculating lines from the obtained pixels)
4.**  generate additional file for exchanging DXB-file or DXF-file
5.**  read-in DXB-file or DXF-file on the workstation which is used for further processing and convert this file to a more compact format for further use in the procedure
6.*   if required, read-in a pixel image on workstation which is used for further processing and decompress the file
7.**  remove short detached, seemingly less important, line pieces with the computer from the vector file on the basis of a preset factor
8.**  remove 'overshoots' and 'undershoots' from line pieces which cross over in places where this is of importance depending on the obtained 'vector file'
9.    transform coordinates obtained from scanning to a large number of grid marks with a multi-point affine transformation (it is better to generate this step later on in the procedure when use is also made of the pixel image [after # 14])
10.** use the programme to generate a line file for longer lines on the basis of a preset factor
11.   indicate on screen which line pieces should not be stored and which line pieces should be stored in the new line file at this point
12.   add points in lines with the computer in places where they touch other lines
13.   separate lines with the computer in places where junctions are formed
14.   couple lines together with the computer in places where junctions are no actual junctions
15.   assign a code to line segments (vector) with an interactive procedure (for example: roads, water runs, forestry, constructions)
16.   carry out a survey of the obtained product
17.   check the result and solve problems
18.   combine the digital files of several maps and unify the edges with an interactive process
19.   convert files to a format such that the client is able to read-in data fast.
(For information:
*  important when use is made of the pixel file.
** important when use is made of a vector file generated with the aid of a vector programme.
A line piece (vector) is understood to be the straight portion of a line segment showing no bends.
A line segment is understood to be confined by junctions and usually consists of several line pieces.
Lines may also consist of several line segments.)

5. ACCURACY

  Already at an earlier date some results of an investigation were published in Dutch about the accuracy with which coordinates can be extracted from maps (see Visser, 1974). This investigation concentrated on the aspect of digitizing topographical maps (scale 1:10000) with the aid of a digitizer. The results showed that deviations of 2.9 m are possible when a scale of 1:10.000 is used.
  Investigations into the accuracy of digitizers can also be found in other literature. For practical use of these findings it is not only important to know the accuracy of a digitizer when maps with precisely placed points are digitally measured but it is absolutely necessary to know what the accuracy is of digital editing in practice. The same holds basically true for scanners. Maps are available for this purpose which over time have been put together with great precision. Some initiatives for such an investigation is given in the following.
  An approximation to the accuracy of scanned map images is given in the following. During the transformation of the scanned coordinates to those of the national coordinate system, use is made of a multi-point affine transformation. After calculation of the components of the transformation formula, the correct coordinates of the grid marks are recalculated on the basis of the scanned coordinates. Next, the differences are calculated. The results of some processed maps are shown in Table 1.

Table 1. Differences which remain after a multi-point transformation on a number of reference points. The values of the calculated deviations are expressed in cm.

                             number of          deviation
                             reference          in cm in
Type of map      map scale    points     X-direction  Y-direction
 
GBKN map          1: 500        20            10           12
Municipal map     1: 500        23            13           11
GBKN map          1:1000        20            26           11
Cadastral map     1:1000        21            22           16
Cadastral map     1:2000        18            44           18

  Additional information can be derived from the scanning of a practical situation whereby the coordinates in the national coordinate system of the points are known as well. The Land Registry Office supplied most of the reference values. The City of Wageningen supplied these values for the municipal maps and are based on actual field measurements. The comparison of the supplied values with the values which have been derived from the scanning of the affected maps is given in table 2. The values in column 'Difference' are ascertained from the deviations in the X-direction and Y-direction by taking the square root of the sum of the X-differences squared, plus the sum of the Y-differences squared, divided by the number of values, minus 4.

Table 2. Comparison of the coordinates from a base file with coordinates resulting from scanning. The values of the calculated deviations are expressed in cm.

                             number           deviation
                               of             in cm in
Type of map     map scale    points    X-direction  Y-direction  Difference
 
Municipal map     1: 500        22          10           13          12
GBKN map          1:1000      1061          20           12          17
Cadastral map     1:1000      1518          26           23          25
Cadastral map     1:2000       445          42           24          34

6. PROGRAMME DESCRIPTION

  A programme has been developed for digital editing in scanned images whereby, in just a few steps, files can be generated which are suitable for further operations. This programme includes three different methods for digital editing in scanned images and the operator is also guided through the complete operating procedure in well explained steps.

List of literature

Hoare, C.A.R., 1962. Quicksort. Computer Journal 5
Shell, D.L., 1959. A High Speed Sorting Procedure. Communications of the ACM 2.7
Visser, A.C., 1974. Nauwkeurigheid van oppervlakteberekening uit gedigitaliseerde kaarten.
Geodesia 16.5


Beste Webhosting