Wednesday, October 21, 2015

Data Gathering and Accuracy Assessment

Goals and Objectives

The goal of this lab exercise was to become more familiar with the process of collecting and downloading data from a variety of different websites and organizations. We then needed to import the data into ArcGIS, join it, project it from its different source coordinate systems into a single, common coordinate system, and design a geodatabase to store it. The main challenge of this lab was keeping all of the downloaded data properly organized and using Python scripting to accelerate the processing step.

The output data of this lab will be used later in the overall project for this course, which is to build a suitability and risk model for sand mining in western Wisconsin. Since our study focuses on Trempealeau County, we subsetted the downloaded data to the county boundary. This way we can work with data specific to our area of interest and keep our storage footprint limited to only what we require.

Methods

The basic data flow model for this exercise can be seen below:

(Data flow model diagram)
The first step in this process was to obtain our data. The sources varied, as can be seen below:

(List of data sources)
Once all the datasets were downloaded and unzipped into a working folder, we determined which files contained the DEM rasters, soils information, railroad vector feature class, and land use/land cover data, and separated these four datasets into another folder. This data was then processed together in the next step using a Python script. The coding process itself can be found here.
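Since the script itself is linked above rather than reproduced here, the sketch below illustrates the general pattern of the loop, assuming arcpy (ArcGIS 10.x) and hypothetical folder, geodatabase, and boundary names rather than the actual files from the lab:

```python
# Hypothetical sketch of a batch reproject-and-clip loop with arcpy.
# Paths and dataset names are placeholders, not the lab's actual files.
import arcpy
import os

arcpy.env.workspace = r"C:\labs\lab3\working"        # hypothetical working folder
arcpy.env.overwriteOutput = True

out_gdb = r"C:\labs\lab3\Trempealeau.gdb"            # hypothetical geodatabase
boundary = os.path.join(out_gdb, "county_boundary")  # county boundary feature class

# Take the target coordinate system (NAD83 HARN WISCRS Trempealeau County
# Feet) from the county boundary rather than hard-coding a WKID.
target_sr = arcpy.Describe(boundary).spatialReference

for raster in arcpy.ListRasters():
    name = os.path.splitext(raster)[0]
    projected = os.path.join(out_gdb, name + "_proj")
    clipped = os.path.join(out_gdb, name + "_clip")

    # Reproject each raster, then clip it to the county boundary polygon.
    arcpy.ProjectRaster_management(raster, projected, target_sr)
    arcpy.Clip_management(projected, "#", clipped, boundary, "",
                          "ClippingGeometry")
```

Vector layers such as the railroads would follow the same pattern, using arcpy.Project_management and arcpy.Clip_analysis in place of the raster tools.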

After all the data had been reprojected to the same coordinate system as the rest of the data in the Trempealeau County geodatabase (NAD83 HARN WISCRS Trempealeau County Feet), clipped to the county boundary, and loaded into the geodatabase, we were able to access it. It was also crucial to limit the amount of storage space devoted to unused data, so we deleted the redundant files left over from unzipping the downloads from the sources above. Once all the data was contained in a single geodatabase, we were able to create maps showing the USGS DEM, Land Use/Land Cover, Soil Drainage Index, Railroads, and Cropland data for our area of interest, Trempealeau County (Fig. 1).

(Fig. 1) These maps were produced by the Python script we created to process the datasets in a loop rather than one at a time.
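
As a quick sanity check that the loading step worked, something like the following can list each dataset in the geodatabase along with its coordinate system (a minimal sketch; the geodatabase path is a hypothetical placeholder):

```python
# Every dataset in the geodatabase should report the target coordinate
# system: NAD83 HARN WISCRS Trempealeau County Feet.
import arcpy

arcpy.env.workspace = r"C:\labs\lab3\Trempealeau.gdb"  # hypothetical path

for dataset in arcpy.ListFeatureClasses() + arcpy.ListRasters():
    sr = arcpy.Describe(dataset).spatialReference
    print("{0}: {1}".format(dataset, sr.name))
```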

Data Accuracy

It is important to determine the accuracy of the data used in a project, especially when the data in a single product comes from different sources. Searching through the metadata helps us understand where the data comes from, who has had access to it in the past, how frequently it is updated, its scale, etc. The table below (Fig. 2) organizes this information for each dataset collected in this lab.


(Fig. 2) The data quality table above was created from the individual metadata for each dataset. Some information was not available in the metadata and is therefore marked "NA".
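
A few of the mechanical entries in a table like this (coordinate system, cell size) can be pulled programmatically with arcpy's Describe, as in the hypothetical sketch below; lineage, attribute accuracy, and temporal accuracy still have to be read from the metadata documents themselves.

```python
# Report coordinate system and cell size for each raster, to fill in the
# machine-readable columns of the data quality table. The workspace path
# is a hypothetical placeholder.
import arcpy

arcpy.env.workspace = r"C:\labs\lab3\Trempealeau.gdb"

for raster in arcpy.ListRasters():
    desc = arcpy.Describe(raster)
    print("{0}: {1}, cell size {2} x {3}".format(
        raster, desc.spatialReference.name,
        desc.meanCellWidth, desc.meanCellHeight))
```
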
Conclusion

Being able to properly download and organize data from a variety of sources is critical in the field of geography, no matter the specialty. It is also very important to understand the quality of the data by examining things like attribute accuracy, lineage, and temporal accuracy. The data we processed using the Python script in this lab will prove very useful in our future work on sand mining in western Wisconsin over the course of the semester.
