What is a GIS?
A GIS (geographic information system) is a computer-based information system that enables the input and update, management and modeling, analysis and simulation, and presentation and output of geodata.
What is geodata?
Geodata is data with a reference to the earth. It simplifies aspects of the real world by creating a model which is associated with semantic data (e.g. attributes).
What are the three levels of modeling spatial data?
Metric/Geometry: description of structures as models of the real world; either vector data or tessellation (raster data or TINs = triangulated irregular networks, which are used for elevation modeling); shows the position of the object -> where is the object?
Topology: describes the neighborhood (e.g. where is the nearest point?) -> relation to other objects?
Semantics: description and attributes of the object; transforms data into information -> What IS the object? Whose house is it? Which colour is its roof?
What are the four-component models of geographic information systems?
functional components (IMAP):
Input and Update: Data acquisition is either primary (directly from the earth or from images, e.g. geodesy or remote sensing, working with laser scanning etc.) or secondary (digitizing existing maps manually, semi-automatically or automatically); the data is then read from files (shp, xml, text files, georeferenced images)
Management and Modeling: Defining a data model that fits the data either in GIS (with layers) or in a DBMS (with tables) and managing it (store, change, retrieve, delete)
Analysis and Simulation: transforming or combining data
Presentation and Output: creating maps and publishing them or the data itself
structural components (HSDU):
Hardware: PC, storage,… has the shortest life span (3-5y)
Software: programs and tools, either free (QGIS) or proprietary (ArcGIS, Bentley Map); life span of 7-15y
Data: extract of the real world, longest life span (25-75y!), therefore very valuable
Users
How is GIS data organized? Why do we have vector data and continuous data?
GIS data consists of geometry (description of structures as models of the real world) and semantics (description and attributes of the object; transforms data into information). The geometry can be either a tessellation (continuous data) or vector data (discrete data). For tessellation, there is raster data and TINs (triangulated irregular networks, which interpolate between known data points).
Continuous data can be useful for describing something like air pressure or precipitation; has just one attribute, but many values.
Vector data is good for things that have a defined structure, for example buildings, streets, etc. -> You can put all those on top of another, which then creates a pretty good image of the real world.
Vector data can and has to be analyzed differently than raster data or TINs. For example, we had a practice where we determined which houses have to be evacuated (on the basis of how many residents live there, whether it is a residential building, how far it is from the danger source etc.) -> we could not do that with raster data, because raster data has just one attribute (e.g. elevation). On the other hand, with raster data we can perform flow direction and flow accumulation analyses.
What is a layer?
A layer is a container for displaying data, which is represented as points, lines, polygons or surfaces (raster data). A layer can contain only one type of geometry (e.g. raster data, polygons, points…), so usually content-related objects or features are grouped in layers (e.g. rivers as lines, forests as polygons). Layers can be overlaid with each other to extract needed data and to create new layers. Usually, the basemap is at the bottom. However, to work with several layers, they need to share a common spatial reference system.
Which layer types exist?
Feature layer (vector layer): represents objects as vectors. Is usually stored in geodatabases.
Raster layers: Visualized as raster dataset or mosaic layer (for large collections of data). Data sources are raster data and images. The display depends e.g. on raster band count
Scene layers: Contains 3D aspects and is cached to optimize the display of those
Service layers: reference features (WFS!), tiles, vector tiles and OGC (Open Geospatial Consortium) services; e.g. map service layers (WMS). Service layers are web-released.
Other layer types are Query layers (saved SQL commands), Selection layers, subtype layers, Voxel layers and Graphics layers.
Which types of geodatabases exist and which ones are we going to use?
Geodatabases are collections of geographical datasets which store data in relations (tables) which consist of rows (datasets) and columns (attributes).
Types of Geodatabases:
personal: single user editing, a size limit of 2 GB -> NOT in ArcGIS Pro! We don’t use this one.
file GDB: works cross platform, size limit of 1 TB per table
enterprise GDB: stored in an enterprise DBMS (e.g. PostgreSQL, Oracle, MS SQL server, IBM DB2), works for extremely large datasets and supports multi-user-editing
We will work with file GDB and enterprise GDBs.
What does the term “Thematic Modelling” mean?
Thematic modelling identifies entities with the same properties and structures the world accordingly, e.g. into vegetation, power networks, road networks etc.
What are projections and why would you rather use them instead of conventional coordinates?
Projections yield projected coordinates. A projection is a mathematical method to transfer a curved surface onto a two-dimensional map.
The problem with conventional (geographical) coordinates is that it’s difficult to calculate distances and areas with them, which is why we use UTM or Gauss-Krüger systems for maps. (Projected coordinates are nevertheless derived from geographical coordinates: the projection converts latitude and longitude into planar x/y values.)
One can project in different ways, either by using a cylinder (Mercator systems -> UTM and G-K), a cone or a planar projection (e.g. for displaying the poles).
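To illustrate why distances are awkward in geographical coordinates, here is a minimal sketch. It assumes a spherical earth with a mean radius of 6371 km (a simplification, WGS 84 is an ellipsoid) and uses the haversine formula; the function name is made up for this example. The same step of 1° in longitude covers roughly half the ground distance at 60° N that it covers at the equator, so differences in degrees cannot be read as distances directly.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical earth with mean radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# The same 1 degree step in longitude, measured at two latitudes.
at_equator = haversine_km(0.0, 10.0, 0.0, 11.0)
at_60_north = haversine_km(60.0, 10.0, 60.0, 11.0)
print(round(at_equator, 1), round(at_60_north, 1))
```

In a projected system such as UTM, the same question reduces to the Pythagorean theorem on metre coordinates.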
Which steps are needed for a projection?
choose a projection surface onto which the three-dimensional earth surface is mapped (e.g. cylinder, cone, plane); cone: conical projection with the pole in the center; cylinder: Mercator projection (-> rectangle); plane: orthographic projection, also with a pole at the center (but just one of them)
transform the geographical coordinates into a Cartesian coordinate system:
when a cylinder is used for the projection, the line where the cylinder touches the modelled earth surface (WGS84) is the central meridian, which is mapped true to scale. The coordinates are most accurate there; the scale distortion increases towards the outer zone boundaries -> Gauss-Krüger (3° zones). UTM works a little differently, but it also uses a cylinder for projection. For UTM, the scale error is 40 cm/km (6° zones).
-> information on the different reference systems: EPSG codes
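The 6° zones, the EPSG codes and the 40 cm/km figure can be tied together in a small sketch. The function names are made up for this example. It relies on two facts that are not stated in these notes: the UTM scale factor on the central meridian is 0.9996, and the EPSG codes for WGS 84 / UTM are 326xx (northern hemisphere) and 327xx (southern hemisphere). The special zones around Norway and Svalbard are ignored.

```python
import math

UTM_K0 = 0.9996  # UTM scale factor on the central meridian


def utm_zone(lon_deg):
    """UTM zone number (1-60) for a longitude in degrees, 6 degrees per zone."""
    return int(math.floor((lon_deg + 180.0) / 6.0)) % 60 + 1


def utm_epsg_wgs84(lon_deg, northern=True):
    """EPSG code of the WGS 84 / UTM zone: 326xx (north) or 327xx (south)."""
    return (32600 if northern else 32700) + utm_zone(lon_deg)


# Scale error on the central meridian, converted from m/m to cm/km.
error_cm_per_km = (1.0 - UTM_K0) * 1000.0 * 100.0

print(utm_zone(9.0), utm_epsg_wgs84(9.0), round(error_cm_per_km))
```

A longitude of 9° E falls into zone 32, so data there would typically be stored in EPSG:32632.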
What are examples of both projected and conventional coordinate systems?
conventional coordinate system:
Global geodetic coordinates (WGS 84) -> GPS (uses mathematically defined surface of the earth)
projected coordinate system:
Gauss-Krüger (Transverse-Mercator-Projection) -> often still used in Germany
UTM (Universal Transverse-Mercator-Projection) -> used internationally
What is a basemap?
A basemap is the map that’s already there when one opens a new ArcGIS project. It’s useful for checking if one is in the right location. With the button “Basemaps”, several different basemaps (e.g. topography, satellite etc.) can be selected as the basemap for the project.
How can you symbolize a layer in ArcGIS?
A layer in ArcGIS can be symbolized by right-clicking the layer and then clicking “Symbology”.
For example, if we have a layer with different polygons that represent different types of landscape in an attribute called “Landuse”, we can choose “Unique Values” and thus display the data based on that field so it’s easier to see what we are looking at.
How can you merge a layer and why would you do that?
Merging a layer works by clicking the “Geoprocessing” Toolbox and choosing the “Merge” Tool. There, one can choose all the input data and where to save the output file.
Merging is useful if you have a lot of feature layers that display similar kinds of data, e.g. rainforest and boreal forest, which could be displayed as one layer named “forest”.
Does merging a layer change the input data?
No.
What is the difference between a map and a picture?
A map has spatial info like a grid, a scale etc.
What is a map?
A map is a two-dimensional, simplified, scaled, geometrically accurate representation of 3D space. Unlike a picture, a map has spatial info like a grid, a scale etc.
What does a map consist of?
spatial information
map field/area
base elements (points, lines, polygons)
signatures
legend
author/additional info
What do we have to do to make a map available in a GIS?
scan map
create a new project and georeference the map
create a feature class and digitize features
How many control points are needed for georeferencing?
as many as possible, but at least 3.
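Why at least 3? A hedged sketch, assuming a first-order (affine) transformation: it has six unknown parameters, and every control point contributes two equations (one for x, one for y), so three non-collinear points are the minimum. More points allow a least-squares fit and a residual check, which is not shown here. All names and coordinates below are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(matrix, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(matrix)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return solution


def fit_affine(pixel_pts, map_pts):
    """Fit X = a*x + b*y + c and Y = d*x + e*y + f from exactly 3 point pairs."""
    matrix = [[x, y, 1.0] for x, y in pixel_pts]
    x_params = solve3(matrix, [X for X, _ in map_pts])
    y_params = solve3(matrix, [Y for _, Y in map_pts])
    return x_params, y_params


def apply_affine(params, x, y):
    """Transform a pixel position into map coordinates."""
    (a, b, c), (d, e, f) = params
    return a * x + b * y + c, d * x + e * y + f


# Hypothetical control points: scanned-map pixels -> map coordinates in metres.
pixels = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
coords = [(500000.0, 5400000.0), (500200.0, 5400000.0), (500000.0, 5399800.0)]
params = fit_affine(pixels, coords)
print(apply_affine(params, 50.0, 50.0))
```

With only three points the fit is exact, so a wrongly placed control point goes unnoticed; that is the practical reason for "as many as possible".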
What is interpolation and why do we need it?
If we want to measure e.g. the precipitation of an area, we can measure the precipitation of several points in the area. But how do we know how the precipitation is in between those points? That’s where interpolation comes in handy.
Interpolation estimates values of unknown locations via known data, so it’s used to fill in the gaps of a dataset.
We need it to get from point information to area information.
What are some spatial interpolation methods?
Spatial interpolation methods are:
geometric-algorithmic: Voronoi diagrams (Thiessen polygons / Dirichlet decomposition), irregular triangulation
deterministic: Natural Neighbor (uses Voronoi diagrams), Inverse Distance Weighting (IDW)
statistical: Kriging
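The simplest of these, Thiessen polygons, can be sketched without constructing any polygons: every location takes the value of its nearest sample point, and all locations sharing the same nearest sample form that sample's Thiessen (Voronoi) polygon. The function name and the gauge data are hypothetical.

```python
import math


def thiessen_value(samples, x, y):
    """Return the value of the sample nearest to (x, y).

    samples is a list of (x, y, value) tuples. The result is a
    piecewise-constant surface: it jumps at the polygon boundaries.
    """
    nearest = min(samples, key=lambda s: math.hypot(s[0] - x, s[1] - y))
    return nearest[2]


# Hypothetical rain gauges: (x, y, precipitation in mm)
gauges = [(0.0, 0.0, 12.0), (10.0, 0.0, 20.0), (0.0, 10.0, 7.0)]
print(thiessen_value(gauges, 2.0, 1.0))
print(thiessen_value(gauges, 8.0, 3.0))
```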
How does the Inverse Distance Weighting Method work?
Inverse Distance Weighting (IDW) is a local interpolation method.
It calculates the value at the desired location as a weighted average of the surrounding data points. The method assumes that the closer two points are, the more similar their values must be. Thus, the closer a data point is, the more weight it receives.
The power parameter determines how much more weight the closer points get: a power parameter of 10 means that closer points have a much stronger influence on the area to be interpolated, while a lower power parameter means the weight is distributed more equally.
To estimate how good our interpolation is and how high the power parameter should be, one can use cross validation, which compares true and estimated values.
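A minimal IDW sketch, assuming the common weighting of 1 / distance^power and using all sample points (real tools usually restrict the search to a neighbourhood). The function name and the station data are hypothetical.

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse Distance Weighting estimate at (x, y).

    samples is a list of (x, y, value) tuples; each sample gets the
    weight 1 / d**power, where d is its distance to (x, y).
    """
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in samples:
        d = math.hypot(sx - x, sy - y)
        if d == 0.0:
            return value  # exactly on a sample: return the measured value
        w = 1.0 / d ** power
        numerator += w * value
        denominator += w
    return numerator / denominator


# Two hypothetical stations; the query point is 1 unit from the first
# and 3 units from the second.
stations = [(0.0, 0.0, 10.0), (4.0, 0.0, 30.0)]
print(idw(stations, 1.0, 0.0, power=1.0))
print(idw(stations, 1.0, 0.0, power=10.0))
```

With a power of 1 the estimate lands at about 15, between the two stations; with a power of 10 it is almost exactly the value of the nearer station.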
How does the Kriging method work?
Kriging is a statistical method invented in 1951. It models the spatial dependency of the data to make assumptions about the unknown locations. It not only takes the distance into account (like IDW does) but also assumes that the values are spatially correlated to some extent.
In the first step, the method uses pairs of values to check how the point values change depending on the distance between them. Then, that information is used to create a semivariogram that describes how dependent the values are on each other. Based on that semivariogram, values at unknown locations can be estimated.
[A semivariogram is a function describing half the average squared difference between pairs of values at different distances]
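The first step can be sketched as an empirical semivariogram. This assumes the usual definition gamma(h) = sum of squared differences / (2 * number of pairs), i.e. half the mean squared difference, with pairs grouped into distance bins. Fitting a model curve to these values and solving the Kriging system are not shown. The function name, the bin width and the data are hypothetical.

```python
import math
from collections import defaultdict


def empirical_semivariogram(samples, lag_width):
    """Semivariance per distance bin for (x, y, value) samples."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            xi, yi, zi = samples[i]
            xj, yj, zj = samples[j]
            # Bin index: which multiple of lag_width the pair distance falls into
            lag = int(math.hypot(xi - xj, yi - yj) // lag_width)
            sums[lag] += (zi - zj) ** 2
            counts[lag] += 1
    return {lag: sums[lag] / (2 * counts[lag]) for lag in sorted(counts)}


# Hypothetical transect: values drift apart as the distance grows.
points = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 4.0), (3.0, 0.0, 7.0)]
gamma = empirical_semivariogram(points, lag_width=1.0)
print(gamma)
```

The semivariance grows with the distance bin here, which is the pattern Kriging exploits: nearby points are more alike than distant ones.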
What is the power factor and how does it work?
The power parameter is used in interpolation methods, e.g. in the IDW method.
The power parameter determines how much more weight the closer points get: a power parameter of 10 means that closer points have a much stronger influence on the interpolated area, while a lower power parameter means the weight is distributed more equally.
[Figures: interpolation results for P = 1, P = 10 and P = 20]
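The effect of the power parameter can also be seen in numbers. This sketch assumes weights of 1 / distance^P, normalised so that they sum to 1; the function name and the three distances are made up for the example.

```python
def idw_weights(distances, power):
    """Normalised IDW weights (they sum to 1) for the given distances."""
    raw = [1.0 / d ** power for d in distances]
    total = sum(raw)
    return [w / total for w in raw]


distances = [1.0, 2.0, 4.0]  # hypothetical distances to three sample points
for p in (1, 10, 20):
    print(p, [round(w, 3) for w in idw_weights(distances, p)])
```

For P = 1 the nearest point holds a bit more than half of the total weight; for P = 10 and P = 20 it holds practically all of it, so the result approaches a nearest-neighbour surface.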
What is the difference between Kriging and IDW?
IDW is a deterministic method, Kriging a statistical method.
IDW calculates the value at the desired location as a weighted average of the surrounding data points. The method assumes that the closer two points are, the more similar their values must be. Thus, the closer a data point is, the more weight it receives.
Kriging models the spatial dependency of the data to make assumptions about the unknown locations. It not only takes the distance into account (like IDW does) but also assumes that the values are spatially correlated to some extent. So Kriging goes further than IDW.
What is the main principle of cross validation?
Cross validation from the geostatistical wizard (in the ribbon: analysis) can be used to determine how good our interpolation is. It compares the interpolated value with the measured value at this location. That difference forms a local error.
What we get from cross validation is the calibration line (which has a slope of 1) and the line of our interpolation (which ideally also has a slope of 1, but obviously that’s not always the case).
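The principle can be sketched as follows, assuming the leave-one-out variant: each measured point is hidden in turn, predicted from the remaining points, and the differences are collected as local errors. IDW serves as the interpolator here, and the errors are summarised as a root mean square error; the names and the data are hypothetical and this is not the wizard's actual implementation.

```python
import math


def idw(samples, x, y, power):
    """Plain IDW estimate; samples are (x, y, value) tuples."""
    num = den = 0.0
    for sx, sy, value in samples:
        d = math.hypot(sx - x, sy - y)
        if d == 0.0:
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def loo_rmse(samples, power):
    """Leave-one-out cross validation: hide each point, predict it, collect errors."""
    squared_errors = []
    for i, (x, y, measured) in enumerate(samples):
        others = samples[:i] + samples[i + 1:]
        predicted = idw(others, x, y, power)
        squared_errors.append((predicted - measured) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical measurements (x, y, value)
data = [(0, 0, 10.0), (1, 0, 12.0), (2, 0, 15.0), (0, 1, 11.0), (2, 2, 20.0)]
for p in (1, 2, 10):
    print(p, round(loo_rmse(data, p), 3))
```

Comparing the error for several power values is one way to choose the power parameter: the value with the smallest error fits the measured data best.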
What is buffer analysis?
A buffer creates a polygon around the input object (points, lines or polygons) in a separate layer. The size of that polygon is determined by the distance the ArcGIS user enters in the tool’s distance field.
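For a point feature, the buffer is simply a circle. A minimal sketch, assuming planar (projected) coordinates: the circle is approximated as a closed polygon ring, and a second helper tests whether another point falls inside the buffer distance. The function names and the number of segments are assumptions for this example.

```python
import math


def buffer_point(x, y, distance, segments=32):
    """Approximate a circular buffer around a point as a closed polygon ring."""
    ring = []
    for i in range(segments):
        angle = 2.0 * math.pi * i / segments
        ring.append((x + distance * math.cos(angle),
                     y + distance * math.sin(angle)))
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


def within_buffer(px, py, x, y, distance):
    """True if (px, py) lies inside the exact circular buffer around (x, y)."""
    return math.hypot(px - x, py - y) <= distance


ring = buffer_point(100.0, 200.0, 50.0)
print(len(ring), within_buffer(130.0, 200.0, 100.0, 200.0, 50.0))
```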
Which operations belong to the overlay tool? Sketch examples.
Identity: Includes the area of the input polygons and the areas of the overlay polygons that lie within them. Attribute values are taken from input and overlay polygons
Erase: Includes the part of the input polygons that is not in the area of the overlay polygons. Attribute values are taken over from the input polygon
Union: Includes the union of input and overlay polygons. Attribute values are taken from both the input and overlay polygons
Intersect: Includes parts of the input and overlay polygons that exist in both layers.
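As a stand-in for sketches, the four operations can be tried on a toy model in which each polygon layer is reduced to a dict mapping grid cells to an attribute value. This is a deliberate simplification, not how ArcGIS stores polygons. It assumes that Union keeps the attributes of both layers and leaves a missing attribute empty; all names and data are hypothetical.

```python
# Each "polygon layer" is simplified to a dict: grid cell -> attribute value.
input_layer = {(0, 0): "forest", (1, 0): "forest", (2, 0): "meadow"}
overlay_layer = {(1, 0): "zone A", (2, 0): "zone A", (3, 0): "zone B"}


def intersect(a, b):
    """Cells present in both layers, with attributes from both."""
    return {c: (a[c], b[c]) for c in a if c in b}


def erase(a, b):
    """Cells of the input that are not covered by the overlay."""
    return {c: (a[c], None) for c in a if c not in b}


def identity(a, b):
    """All input cells; overlay attributes are added where the overlay covers them."""
    return {c: (a[c], b.get(c)) for c in a}


def union(a, b):
    """All cells of either layer; a missing attribute stays None."""
    return {c: (a.get(c), b.get(c)) for c in sorted(set(a) | set(b))}


print(sorted(intersect(input_layer, overlay_layer)))
print(sorted(erase(input_layer, overlay_layer)))
print(sorted(identity(input_layer, overlay_layer)))
print(sorted(union(input_layer, overlay_layer)))
```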
What happens if one joins a table to a shapefile and then afterwards removes the shapefile and puts it back in the project?
The join is gone.
There is one layer named “trees”. In the attributes, you can see that there is a field named “kind”, which stores whether the tree is an olive tree, willow, pine tree etc.
You know that the tree with the ID 4 has the bark beetle. You have to save the trees around it; but only pine trees are inhabited by bark beetles. They travel distances of around 600 m.
Explain which analysis has to be performed to determine how many trees could be affected.
First, the tree with the ID 4 has to be found. So, in the attribute table, with “Selection by attributes”, the object “ID” = 4 can be found. After selecting and exporting it (right click on the layer -> data -> export features) the buffer analysis can be performed.
Because the bark beetle travels about 600 m, the buffer distance needs to be 600 m.
After creating the buffer layer, it can be intersected with our tree layer from the beginning. In the intersect tool, the buffer layer needs to be selected first.
The intersected layer contains all tree objects within that 600m distance.
Now, within the attribute table, right click on the “kind” field and select “summarize”. In that menu, select “kind” as case and statistic field and choose “count” as statistic method.
The outcome should be a table which shows how many trees of each kind exist. In our intersect layer, again with “Selection by attributes”, all pine trees can be selected and shown on the map. If wanted, we can export those into another layer.
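The same chain of steps (select, buffer, intersect, summarize, select again) can be mimicked in plain Python to check the logic. The tree records below are invented, the coordinates are assumed to be projected coordinates in metres, and the buffer plus intersect step is replaced by a simple distance test, which is equivalent for point features.

```python
import math
from collections import Counter

# Hypothetical "trees" layer: id, kind, projected coordinates in metres.
trees = [
    {"id": 1, "kind": "pine", "x": 100.0, "y": 100.0},
    {"id": 2, "kind": "olive", "x": 300.0, "y": 150.0},
    {"id": 3, "kind": "pine", "x": 900.0, "y": 900.0},
    {"id": 4, "kind": "pine", "x": 200.0, "y": 200.0},
    {"id": 5, "kind": "willow", "x": 650.0, "y": 250.0},
    {"id": 6, "kind": "pine", "x": 700.0, "y": 300.0},
]
BUFFER_M = 600.0

# 1. Select by attributes: ID = 4
source = next(t for t in trees if t["id"] == 4)

# 2. + 3. Buffer and intersect: keep the trees within 600 m of the source tree
in_buffer = [
    t for t in trees
    if math.hypot(t["x"] - source["x"], t["y"] - source["y"]) <= BUFFER_M
]

# 4. Summarize: count per kind
counts = Counter(t["kind"] for t in in_buffer)

# 5. Select the pines; the infested tree itself is excluded here
pines_at_risk = [t["id"] for t in in_buffer if t["kind"] == "pine" and t["id"] != 4]

print(dict(counts), pines_at_risk)
```

Note that the count for pines in the summary table includes the infested tree with the ID 4 itself, because it lies inside its own buffer.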
What is tessellation?
Tessellation is the covering of a surface with tiles (cells) that store the data, leaving no gaps and no overlaps.
It can be achieved by interpolating data, e.g. precipitation or air pressure.
How are rasters structured?
Rasters consist of cells with a specific value, e.g. air pressure. The data is represented by a matrix with rows and columns.
Rasters are georeferenced and they have a coordinate system. Other properties include the cell size (resolution) and the count of rows and columns.
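These properties are enough to look up a value at a map position. A minimal sketch, assuming a north-up raster whose georeference is given by the upper-left corner and a square cell size; the class, its field names and the values are hypothetical, and there is no bounds checking.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Raster:
    """Minimal north-up raster: a value matrix plus its georeference."""
    values: List[List[float]]  # rows x columns
    x_min: float               # x of the upper-left corner
    y_max: float               # y of the upper-left corner
    cell_size: float           # resolution in map units

    def cell_at(self, x, y):
        """Row and column index of the cell containing map point (x, y)."""
        col = int((x - self.x_min) // self.cell_size)
        row = int((self.y_max - y) // self.cell_size)
        return row, col

    def value_at(self, x, y):
        """Cell value at map point (x, y)."""
        row, col = self.cell_at(x, y)
        return self.values[row][col]


# Hypothetical 2 x 3 air-pressure raster (hPa) with 10 m cells
pressure = Raster(
    values=[[1013.0, 1012.5, 1012.0], [1011.5, 1011.0, 1010.5]],
    x_min=500000.0, y_max=5400000.0, cell_size=10.0,
)
print(pressure.cell_at(500025.0, 5399985.0), pressure.value_at(500025.0, 5399985.0))
```

Rows are counted downwards from the top edge, which is why the row index uses y_max minus y.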
What is the difference between discrete and continuous data?
Discrete data are single entities or objects. Typically, they can be displayed as vector data. Examples are: streets, cars, trees, buildings…
Continuous data, on the other hand, is data that has no boundaries. It’s displayed as a field and data exists at every point, e.g. air pressure: at every point on earth there is a specific air pressure.
Other examples are: The earth surface/relief, properties of soil
What’s the difference between Flow Direction and Flow Accumulation and which data can be analyzed with those methods?
Flow Direction and Flow Accumulation are methods for analyzing raster data. They use different algorithms and can be found under Spatial Analyst Tools -> Hydrology.
Flow Direction estimates into which neighbouring cell a cell drains in case of an overflow, e.g. heavy rainfall. There are two different algorithms for this: D8 and D∞.
Flow Accumulation, on the other hand, estimates how many upstream cells drain into the cell in question. It takes a flow direction raster as input. This method is useful for detecting river systems.
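Both steps can be sketched on a tiny elevation grid. This is a simplified D8 variant under stated assumptions: square cells of size 1, every cell drains to the one neighbour with the steepest downhill slope, and flat cells or sinks do not drain at all. ArcGIS encodes the directions as numbers and handles sinks and flats separately, which is not reproduced here. The function names and the elevation values are hypothetical.

```python
# D8 neighbours as (row offset, column offset)
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_flow_direction(dem):
    """For each cell, the neighbour with the steepest downhill slope (or None)."""
    rows, cols = len(dem), len(dem[0])
    target = {}
    for r in range(rows):
        for c in range(cols):
            best, best_slope = None, 0.0
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    distance = (dr * dr + dc * dc) ** 0.5  # 1 or sqrt(2)
                    slope = (dem[r][c] - dem[nr][nc]) / distance
                    if slope > best_slope:
                        best, best_slope = (nr, nc), slope
            target[(r, c)] = best  # None marks a sink or a flat cell
    return target


def flow_accumulation(dem, target):
    """Number of upstream cells draining into each cell."""
    rows, cols = len(dem), len(dem[0])
    acc = [[0] * cols for _ in range(rows)]
    # Visit cells from highest to lowest, so every upstream count is
    # complete before it is passed on downstream.
    order = sorted(target, key=lambda rc: dem[rc[0]][rc[1]], reverse=True)
    for r, c in order:
        downstream = target[(r, c)]
        if downstream is not None:
            acc[downstream[0]][downstream[1]] += acc[r][c] + 1
    return acc


# Hypothetical elevation model: a valley running down the middle column
dem = [
    [9.0, 8.0, 9.0],
    [8.0, 6.0, 8.0],
    [7.0, 4.0, 7.0],
]
directions = d8_flow_direction(dem)
accumulation = flow_accumulation(dem, directions)
for row in accumulation:
    print(row)
```

The cells with high accumulation values trace the valley floor, which is why thresholding a flow accumulation raster is a common way to extract a river network.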
What is a snapping tool?
The snapping tool enables us to draw/digitize vector data directly at an already existing vector feature. Example: We are digitizing streets and want to end a street (line) exactly where the other street begins. Snapping enables us to do so, as the line/our cursor snaps there.
How does the (general) layout of ArcGIS look?
In the middle, there is the map. On the right side, I usually have the geoprocessing toolbox enabled (you can also find that toolbar in the ribbons at the top) - also useful: The catalog pane (we usually had that in practice lessons. There, you can see connected folders and our database for that project). At the top, there are ribbons where you can find basically everything you want to do, so we have the geoprocessing toolbox, data can be inserted there, we can georeference, set a specific basemap, create a new map layout etc.
On the left side, there is the Contents pane where we can see our layers. If we right-click on a layer, we can change its symbology or open the attribute table.
What is the model builder?
ModelBuilder is used to automate and document repetitive spatial analysis and data management processes. A model in ModelBuilder is represented as a diagram that chains together sequences of processes and geoprocessing tools. The elements of those sequences are linked by connectors, which connect data and values to tools. There are four types of connectors:
data
environment
precondition
feedback
[Figure: the different ModelBuilder elements visualized]