GIS II: Geog 337: Data Normalization, Geocoding and Error Assessment

Goals and Objectives

The goal of this lab exercise was to geocode the locations of all the sand mines in western Wisconsin and compare our individual results with those of others in the class as well as with the actual mine location point data. The raw data we received from the Wisconsin DNR was not properly normalized and therefore could not be geocoded directly in ArcGIS. To use this data properly we needed to work through the Excel spreadsheet of vague data, normalize it, geocode it and compare the resulting data. A more detailed view of this procedure can be seen below (Fig. 1):

(Fig. 1) This diagram gives a detailed description of the objectives for this lab exercise.

Methods

The first step in processing the mine data we received directly from the DNR was to normalize it. The original Excel spreadsheet was disorganized and its location information was inconsistent from row to row. To normalize the data we separated the elements that made up the location field into individual fields: address (or PLSS), city, state, zip code and the unique mine ID number. While some of the mines had complete addresses, others had none and listed only a PLSS (Public Land Survey System) description, the survey-based land description system used in Wisconsin.
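The normalization step can be sketched in Python. This is only an illustration of the idea: the layout of the DNR spreadsheet is not shown in this post, so the comma-separated address format, the PLSS pattern, the sample strings and the function name `normalize_location` are all assumptions. In the lab the work was done by hand in Excel.

```python
import re

# Assumed patterns: a 5-digit zip code and a township/range pair such as "T28N, R9W".
ZIP_RE = re.compile(r"\b(\d{5})(?:-\d{4})?\b")
PLSS_RE = re.compile(r"\bT\.?\s*\d+\s*N\b.*\bR\.?\s*\d+\s*[EW]\b", re.IGNORECASE)


def normalize_location(mine_id, raw):
    """Split one free-text location into separate fields for geocoding."""
    record = {
        "mine_id": mine_id,
        "address": "",
        "plss": "",
        "city": "",
        "state": "WI",  # assumption: every mine in the list is in Wisconsin
        "zip": "",
    }
    text = " ".join(raw.split())  # collapse stray whitespace

    # Rows that only carry a PLSS description cannot be address-matched,
    # so they are kept in their own field for manual placement.
    if PLSS_RE.search(text):
        record["plss"] = text
        return record

    # Assumed address layout: "street, city, state zip".
    parts = [part.strip() for part in text.split(",")]
    if len(parts) >= 3:
        record["address"], record["city"] = parts[0], parts[1]
        state_zip = parts[2].split()
        if state_zip:
            record["state"] = state_zip[0].upper()
        zip_match = ZIP_RE.search(parts[2])
        if zip_match:
            record["zip"] = zip_match.group(1)
    else:
        record["address"] = text  # unrecognized layout: leave for manual review
    return record


# Hypothetical sample rows, not taken from the DNR data.
address_row = normalize_location(1, "123 Main St,  Tomah, WI 54660")
plss_row = normalize_location(2, "NW 1/4 Sec. 12, T28N, R9W")
print(address_row)
print(plss_row)
```

Keeping PLSS-only rows in a separate field makes it easy to see in advance which mines will have to be placed manually.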

After the table was normalized it was brought into ArcMap, and once we logged into ArcGIS Online we enabled the geocoding tool. Because we were connected to the ArcGIS server we were able to use the ESRI address database, which located most of the normalized addresses in our spreadsheet. Initially 18 of my 21 addresses were matched, leaving only 3 unmatched. If the program was not able to find a mine based on its address, we were able to create a new point to match the attributes. We were also able to adjust the locations of the matched mines in this interface. Some of the unmatched points were due to the use of PLSS information in place of the normal address fields. Determining where the mines were from the very vague locations given by the PLSS information was difficult, so we used other resources such as Google Earth to locate the mines and narrow down the search in ArcMap.

Once all of my geocoding was completed I compared my results to those of two fellow students who were tasked with geocoding the same mines. To do this I first added their shapefiles with the mine locations they geocoded and opened the attribute tables. There I selected the mines whose unique mine IDs were the same as mine and created a new feature layer out of them. I also reprojected the two datasets from my peers to a projected coordinate system. This was done to convert the units of the data from degrees to meters so that when we calculated the distance between our points it would be reported in meters. In addition to my peers' data I compared my geocoding results to the actual mine locations as recorded by the Wisconsin DNR as points. All of these datasets can be seen below in Fig. 2.
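To illustrate why the reprojection matters, here is a rough sketch of how far apart two points are in meters when they differ by the same number of degrees north-south and east-west. It uses a simple spherical approximation of the Earth, which is not what ArcMap does when it projects data, and the latitude of 44.8 degrees is just an illustrative value for western Wisconsin. The function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation only


def degree_offsets_to_meters(dlat_deg, dlon_deg, at_lat_deg):
    """Approximate north-south and east-west meters for small offsets in degrees."""
    meters_per_degree = math.radians(1.0) * EARTH_RADIUS_M
    north_m = dlat_deg * meters_per_degree
    # Meridians converge toward the poles, so a degree of longitude
    # shrinks with the cosine of the latitude.
    east_m = dlon_deg * meters_per_degree * math.cos(math.radians(at_lat_deg))
    return north_m, east_m


north_m, east_m = degree_offsets_to_meters(0.01, 0.01, 44.8)
print(f"0.01 degrees north-south: {north_m:.0f} m")
print(f"0.01 degrees east-west:   {east_m:.0f} m")
```

Because the same offset in degrees covers a different ground distance depending on direction and latitude, a distance computed directly on degree coordinates is not meaningful. Projecting to a coordinate system measured in meters avoids this.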

(Fig. 2) The yellow data points show the mine locations based on my geocoding and the red points represent the actual mine locations. The data created by my peers are shown as blue and purple points.

The next step was to compare the distance between my data points and the actual mine locations in order to determine how accurate my geocoding was. To do this I used the Point Distance tool. I did the same with the data from both of my peers. (Originally we were meant to compare our data with a total of 4 other students; however, only 2 of the students who shared the same mine points as me submitted their results.) The output of this tool can be seen in the results section of this report.
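The comparison can be sketched as follows. This is a minimal illustration and not how the Point Distance tool is implemented: it only pairs points that share a mine ID and measures the straight-line distance between them, assuming the coordinates are already in a projected coordinate system with units of meters. The function names and all coordinates are hypothetical.

```python
import math


def point_distances(my_points, other_points):
    """Distance in meters between points that share a mine ID.

    Both inputs map mine ID -> (x, y) in the same projected coordinate system.
    """
    distances = {}
    for mine_id in sorted(set(my_points) & set(other_points)):
        x1, y1 = my_points[mine_id]
        x2, y2 = other_points[mine_id]
        distances[mine_id] = math.hypot(x2 - x1, y2 - y1)
    return distances


def mean_distance(distances):
    """Average of the per-mine distances."""
    if not distances:
        raise ValueError("no shared mine IDs to compare")
    return sum(distances.values()) / len(distances)


# Hypothetical coordinates in meters; mine 103 has no counterpart and is skipped.
geocoded = {101: (500000.0, 400000.0), 102: (510000.0, 405000.0), 103: (520000.0, 410000.0)}
actual = {101: (500300.0, 400400.0), 102: (510000.0, 406200.0)}

distances = point_distances(geocoded, actual)
for mine_id, meters in distances.items():
    print(f"mine {mine_id}: {meters:.0f} m")
print(f"mean: {mean_distance(distances):.0f} m")
```

The same function can be reused for the peer comparison by passing a peer's points as the second argument.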

Results

My geocoded mine locations were offset from the actual sites. The output table showing the distances between my mine data points and the real-world locations can be seen in Fig. 3, and the offsets are represented in map form in Fig. 4.

(Fig. 3) This table shows the distance between my geocoded mine sites and the actual location of the mines in meters.

(Fig. 4) This map shows my geocoded mine locations in orange while the actual mine sites are shown in green. Overall this image shows the accuracy of my geocoding in graphical form.

Each individual in the class was given a different list of mines to geocode for this assignment, but a number of individuals had overlapping sites. This was done intentionally so that the mine locations could be compared. Fig. 5 and Fig. 6 show the distances between my geocoded mine sites and those of my two peers.

(Fig. 5) This table shows the distance in meters between my geocoded mine sites and those of one of my peers. On average our points were 1,900 meters apart.

(Fig. 6) This table shows the distance in meters between my geocoded mine sites and those of another one of my peers. On average our points were about 3,500 meters apart.

Discussion

Overall, my geocoded points were on average 2,600 meters from the actual mine locations, while my points and those of my peers were on average about 2,700 meters apart (the midpoint of the 1,900 and 3,500 meter averages in Fig. 5 and Fig. 6). These errors could have been caused by a number of factors. One could have been a feature coding error, because the geocoding tool uses the inputted address to estimate an actual location. In some cases there were multiple mine sites in an area and it was difficult to pinpoint the exact location of the main mine.

Another possible source of error is field survey measurement error. There was a distinct difference between where the survey crew who created the actual mine location dataset recorded the point representing each mine and where I placed the mine during the geocoding process. For example, some of the points in the actual location dataset were taken in the middle of the mine site, whereas all of my geocoded locations were based on the main entrance/exit from the mine to the nearest roadway.

Attribute data input could have been another factor which influenced the accuracy of my geocoded data. There might have been some error in the process of normalizing the data which was used to geocode the mine locations. Since we were verifying the mine locations primarily from aerial imagery, it is very possible that the wrong mine was located because some incorrect attribute data was entered during normalization.

Conclusion

Despite the complicated procedure, this lab taught us just how important geocoding is to spatial analysis. It also made clear the role that data normalization plays in creating a consistent dataset that can be turned into an accurate geographic representation of the data in the form of a feature class. One factor which could have influenced the overall accuracy of my mine locations compared to the real-world sites is where the DNR took the data points in the field. In summary, geocoding is very useful but is not perfect, and therefore its results should be treated as approximate rather than exact locations.

Wednesday, October 28, 2015
