undefined

by Julia S.

How do data, database systems, database management systems and databases connect? What are the definitions for each of these terms?

Database System = Database Management System (DBMS) + Databases

Databases: either relational, object-oriented or object-relational; collection of data that’s logically related to each other; can store, manipulate & retrieve data

DBMS: software (PostgreSQL, MySQL, Oracle Database, SQLite) that allows to create, define & manipulate DBs, allows users for data to be stored persistently (is not unintentionally deleted!), updated, analyzed and queried (by SQL); controlls data to check consistency (definitions of data types, update rules… -> integrity constraints), can organize DBs, access to DBs is only possible through DBMS; has a well-defined stable user interface which is long-lasting and only extended, not changed; allows data independence through abstraction

Data: collection of discrete values

The client accesses the databases through a DBMS. Examples for clients: SQL shell, pgAdmin, python

How can a transaction end?

List the Properties of a Relation.

Give an explanation for following terms:

relation
tuple
attribute
Cardinality
Degree
Primary Key
Foreign Key
Domain

Why are geodatabases useful?

Every 2 years, amount of digital data is doubled -> large datasets that need to be organized somehow
storing data in files is not future-proof (for example due to discontinued file formats)
data processing can be performed directly with the database
is efficient for storing large amounts of data -> less redundancy, which leads to less errors and inconsistencies

Explain the concept of data models. Which data models exist?

Data models are a collection of concepts for the desription of an abstract version of the real world. Concepts can be objects (vector data) or continuous (raster data). Which one to use is application specific. They are modelled using UML (unified modelling language) which can create models and convert them to database schemas.

There are:

relational models (tables, what we learn about): simple structure, redundancy is avoided by splitting data across several tables; columns and rows, each column is assigned to one data type; fundamental operations: projection onto certain columns, selection of certain rows, joining data from different tables
object-oriented models: more complex than relational model, uses concepts of object-oriented programming (classes i.e.)
objet-relational models (what we use): an extension of the relational model in which user-defined types are possible; is necessary for spatial information

What is necessary to store geographical data in a database? Give examples of some geographical data forms.

A software extension of an object-relational DBMS which supports spatial data models. Examples are Oracle Spatial and PostGIS.

Geographical data forms can be

point, line and polygon objects
raster data
3D point clouds
networks

Shortly explain SQL and its sublanguages.

SQL is a popular query language found in all relational DBMS. It’s a declarative language which is divided into several sublanguages:

Data Definition Language DDL for creating and modifying tables (CREATE, ALTER, DROP)
Data Manipulation Language DML for inserting, updating and querying data (SELECT, INSERT; DELETE, UPDATE)
Data Control Language DCL for defining access privileges of users (GRANT, REVOKE)
Transdaction Control Language TCL for managing changes by DML (COMMIT, ROLLBACK)
…

What is the ACID paradigm?

The ACID paradigm enforces rules for transactions (groups several data base operations to one unit)

Atomicity: transaction is a non-seperable unit; either all changes are made or none

Consistency: transaction leaves a consistent state after it ends or is undone completely

Isolation: concurrent transactions do not affect each other, every transaction is executed as if it’s the only one running

Durability: effect of transaction stays permanently in the database, even after system failure

Explain the differences between an object and a class.

An object holds certain properties/attributes and reacts with a defined behavior (either a change of an objects state or an interaction with the environment -> other objects) to its surrounding; a class is like a template for objects and typically is an aggregation of similar but distinguishable objects.

A class holds class attributes which the objects can inherit.

Going from objects to superclasses generalizes concepts: A superclass represents a concept, a subclass adopts and/or overrides methods of the superclass and defines new, additional attributes and methods. The objects are then instances of the subclasses.

Explain the differences between Aggregation and Generalization.

Aggregation is the concept that a class is an aggregation of several objects, i.e. a car and its parts; Generalization relates not objects but classes to each other. A superclass would be the Generalization of a more specialized (sub-)class.

Example:

Aggregation: A car consists of several parts.

Generalization: A rectangle is a geometrical figure.

Why is Encapsulation useful?

Encapsulation reduces the visibility of attributes outside of the class and therefore protects the data so that less mistakes happen when data is changed. The attributes can be read and/or accessed by methods, i.e. set bachelor degree ().

What is the difference between Aggregation and Composition?

Aggregation and Composition are both special types of association between objects.

In an Aggregation, an objects is an aggregation of several other objects, i.e. a car and its parts.

In a Composition, which is a strict form of aggregation, the existence of the parts depend on the compositional object. If the compositional object is deleted, the parts are deleted as well. This helps with data integrity.

Which steps need to be done to translate a UML schema to a relational schema?

Each class which doesn’t take part in the inheritance hierarchy is projected onto a table (class name = table name, class attributes = table attributes; sometimes, data types have to be adapted to SQL data types!)
Each association, besides inheritance, is projected onto a table (name of association or new name = table name, key attributes of the involved classes = attributes)

Refinement: Some tables can be combined (for example some tables which originate from associations -> for 1:n relationships (key of one class can be added to the other class as foreign key)! For n_m relationships the association table should stay)
1. other example: 1:1 relationships could be combined
For inheritance relationships: only the classes are translated into tables but not the inheritance relationships (for object-realtional schemas, tables can inherit attributes from parent tables); using keyword INHERITS
1. either one table per class in the hierarchy: key attributes are the root in the hierarchy -> foreign key of subclasses

2. a single table for the entire hierarchy: all class attributes that appear in hierarchy are attributes, not corresponding attributes are set to 0

What keys can exist in relational databases? sHortly explain the concepts.

super key: attribute(s) that can uniquely identify all rows in a table
composite key: a key that contains more than one attribute, the largest composite key would be a key involving all the comulns in the table
candidate key: a key that contains the least possible attributes to uniquely idetnify any table row (for example a unique id!); a table may have more than 1 candidate key
primary key: a key that uniquely identifies each row, is chosen from sets of all canditate keys, can be a composite key or only one; has to be unique and cannot have null values; used to enforce the entity integrity of the table (make each row unique & accessible), helps to set up relationships between tables
- examples: Student ID, Social security number, Vehicle Identification number, automatically generated numbner (auto increment, i.e. BIGSERIAL)
foreign key: links to the primary key of another table -> established relationship between two tables, helps with cascading updates and deletes (important to mapping), can improve query performance by using joins; if unique: 1:1-relation, if 1:n or n:m the foreign key is NOT unique (but for n:m relationships, a junction table is necessary)

Which types of joins exist? Shortly explain them.

In Postgres, INNER JOIN is the default join, therefore, if we type JOIN, an inner join will be performed

LEFT JOIN = LEFT OUTER JOIN

RIGHT JOIN = RIGHT OUTER JOIN

What are recursive associations?

A recursive association connects a single class type to itself.

Example: books who reference each other

Explain point set models with the following words: interior, boundary, closure and exterior.

interior: All points in the neighborhood of a point are completely in the point set
boundary: The neighboring points are either inside or outside of the point set
closure: The set of points which are in the interior and the boundary.
exterior: All points that are not part of the closure

List topological relations and explain them.

disjoint: overlay of two geometries is empty

touches: Boundaries intersect, interiors don’t.

equals: identical geometries, including dimension, geometr type, number of support points, x- and y-coordinates

inside/contains: one geometry is completely inside another geometry

covers: X covers Y if Y is inside X, but touches the boundary of X

overlaps

Explain the concept of the 9-intersection model

The 9-intersection-model specifies many topological relationships between two geometries using intersections between the boundary, interior and exterior of two point sets.

For this, the two geometries X and Y need to be spatial objects in a 2D-plane.

The result is a 3x3 matrix which has the values empty/not empty.

What topological relations are these 9-Intersection models about?

What is the DE 9-IM?

The DE 9-IM is the 9-Intersection model with a dimension extension (dimension extended 9-Intersection-model).

The dimension is defined as follows:

0D = Point(s)

1D = Line(s)

2D = Polygon(s)

Another version is to visualize

not empty = T
empty = F
not relevant = *

Example:

Model the topological relations from the examples below with help of the DE 9-IM.

The following DE-9IM matrices represent which topological relation?

Explain in own words what these topological relations mean:

Why is Database Normalization useful? Why would you not do it?

Database Normalization avoids redundant data which would lead to anomalies caused by insertion, updating and deletion.

It also leads to a consistent database design which safeguards the tables against inconsistencies.

For example, if information is stored more than once in the same table, updating the information may lead to data inconsistencies as only one tuple is updated.

Inserting incomplete tuples leads to insertion anomalies.

And deleting information by dropping a row may also delete related, but important information that should be kept.

However, sometimes there are reasons for not normalizing tables in a database.

Explain the Fist Normal Form (1NF).

A relation is in the First Normal Form if all attributes have atomic values (one word/character/date etc., not several ones).

Example: A table with books and its author. One book was written by several authors, so in the corresponding tuple is a list in the author attribute instead of just one name.

So what instead?

several columns Author1, Author3, etc. -> multiple columns with the same information should be avoided, plus what if a book has more than 3 authors?
put the same tuple in twice, but with different authors -Y high redundancy through repitition
split the tables! For example one table with title, place, year and the other with title and author.

What does Functional Dependency mean? What does Transitive Dependency mean?

X functionally determines Y if each X value in the table is associated with exactly one Y value.

the table then satisfies the functional dependency X -> Y.

There is trivial functional dependency and non-trivial functional dependency.

For trivial dependency, Y is a subset of X. For non-trivial functional dependency, this is not the case.

The Functional Depedndency is important for the 2NF.

Transitive dependency: Imagine a relation R with attributes A, B and C. C is transitively dependent on A if:

A -> B but NOT B -> A
B -> C

Transitive Dependency is important for the 3NF.

In this table, list the functional dependencies.

What kinds of keys do exist? Explain the concepts.

Superkey -> set of attributes which uniquely identify a tuple
Candidate key -> minimum set of superkeys (prime attributes), a table can have more than one
Artificial key -> if no candidate key exists & duplicate tuples could occur, an atrifivial key is introduced which is used as primary key (like a BIGSERIAL number)
Primary key -> uniquely identifies a tuple in a table
Foreign key -> uniquely relates a tuple to another tuple in another table
Alternate key -> must be a candidate key, is to restrict a column to store only unique entries

Explain the Second Normal Form (2NF).

A relation is in the second normal form if it’s in the first normal form and additionally, every non-prime attribute is functionally dependent on the whole canditate key of the relation (and not just on a part of the candidate key).

If not in normal form, what to do?

split relations

Explain the Third Normal Form (3NF).

A relation is in third normal form if

the relation is in 2nd normal form and
no non-prime-attribute is transitively dependent on the primary key: each non-key attribute in a table should depend on the key, the whole key, and nothing but the key

To fix this, place attributes with transitive dependency in a new relation to obtain 3NF.

What is the Boyce-Codd Normal Form? (BCNF)

The BCNF is slightly stronger than the 3NF, and only in rare cases does a relation in 3NF not meet the requirements of the BCNF.

A relation is in BCNF if for every one of its functional dependencies X -> Y, at least one of the following holds:

X -> Y is a trivial functional dependency (Y is a subset of X)
X is a superkey for the relation

Is this relation in any normal form?

If not, tranform it to the highest form.

The relation is in 3NF, but not in BCNF because of the dependency Year-Month → Release Year and the attribute Year-Month is not a candidate key by itself.

Here’s a solution that eliminates the functional dependency:

What is the normal form of this relation?

What is the Simple Feature Geometry Model? Draw it.

The simple feature (geometry) model is a model for representing discrete objects in vector format as a hierarchy of geometry classes for 1-,2- and 3-dimensional objects.

The feature in question is an abstraction of a phenomenon in the real world (cities, buildings, …).

Object types are: Points, Curves (LineString, LinearRing, Line (LineString with 2 Points)), Surfaces (Polygons), Geometry Collections (collection of different or the same type of geometries).

It was developed by the Open Geospatial Consortium (OGC).

List 2 standards of spatial and geographical data.

ISO SQL/MM Spatial, which also includes curved segments
Simple Feature Geometry Model

What are examples of a Geographic/Spatial DBMS?

Oracle SPatial
PostGIS

What are Features of Geodatabases and why are they different from normal Databases?

supports 2D and 3D data (lines, surface, collection, points)
topological operations (overlap, touches etc.)
Spatial indexing
has a Coordinate Reference System within the DB
algorithms like shortest paths, maximum flows
advanced geometric processing (split geometry, simplification, triangulation…)
Geocoding of spatial data
representation of grids
Digital Elevation models, slopes…

What are the criteria for valid polygons?

List some methods of processing/analyzing geometries.

Write short SQL queries that output

Lakes that lie completely within the states Berlin or Bavaria
Citis that are neighboured to “Paunzhausen”
Cities with a river flowing through it

In which column is the geometry of a spatial object stored?

Why is Data Integrity important and what are the two methods for the definition of integrity constraints?

Data Integrity is important because “invalid” states of data lead to inconsistencies and confusion.

For example, if grades range from 1-10, no one should be assigned a grade 11.

The methods for the definition of integrity constraints are:

specification of the value domain of each column: Data type (numeric, integer, string, …) and i.e. the number of characters allowed, for example numeric(5,0) would mean the number can have a max of 5 digits
prohibition of NULL values, for example name VARCHAR(30) NOT NULL

Also, to prevent inconsistencies primary keys / foreign keys / keys in general can be used to prevent two same tuples in the same relation.

Additional constraints on the level of attributes can be made with the CHECK clause, e.g. if the value for an attribute is one of the three possible strings.

What is Referential Integrity and what are its three strategies?

Referential Integrity describes the concept of two or more tables linked together to still be intregrated.

Example: For every class, there is a lecturer. For every lecturer, there is a name; every lecturer id should be in the lecturer table as well.

This can be archieved by setting a foreign key constraint, which rejects invalid changes.

The other strategy is to implement cascading changes.

Example: If a lecturer leaves the university, he should be deleted from the lecturer database AND from the classes database.

The third strategy would be to insert a default value, for example NULL (Example: ON DELETE SET NULL).

What is Interoperability? What are its two types?

Interoperability is a set of agreements on technological standards, policies and institutional agreements which enables the use of geospatial information by users.

There is

syntactic interoperability
- Data exchange formats or interfaces, i.e. always encode metadata in xml
- applications and service are able to work with the same data formats
semantic interoperability
- common vocabularies or semantic reference systems
- users have the same understanding of the terminology used

Why is standardization important for geodata?

Geodata is expensive and take up much space on computers, therefore the storage needs to be efficient & sustainable, repeated measurements should be avoided and the data should be high-quality.

Explain the concept of a Geo Data Infrastructure and its fundamental principles.

A Geo Data Infrastructure (GDI) implements a framework of geographical data, metadata, users and services that are interactively connected in order to use the data efficiently and flexibly. It can be regional (GDI-RE), national (GDI-DE) or international (EU INSPIRE).

The fundamental principles are:

subsidiarity (responsibility for providing, maintenance, quality assurance, consistency)
interoperability, like common standards and reference systems
scalability (data/services should be used for different purposes -> should be extendable in a technical and organizational sense)
transparency
reliability
effiency & effectiveness

What is the difference between a de jure standard and a de facto standard?

De jure standards, or standards according to law, are endorsed by a formal standards organization. The organization ratifies each standard through its official procedures and gives the standard its stamp of approval. De facto standards, or standards in actuality, are adopted widely by an industry and its customers.

de facto standards: dxf, shp, xml, …

de jure standards by e.g. American National Standards Institute, DIN, …

What are Web services for geographical data and why are they important? Give an example.

Web services provide a standardized method of communication between web-accessible applications. They follow common communication protocols but are different from general services, but don’t require a lot of technical knowledge.

It’s important for GIS applications that use the Internet to share data; they share geoinformation through maps, data and processing.

They provide remote data source access for customers and a place to share data for providers.

An example would be the Open Geospatial Consortium (OGC), who develops the specifications like WMS and WFS to standadize the ways of sharing data.

What are OGC Specifications and how do you get them?

OGC = Open Geospatial Consortium.

OGC develops Specifications to standardize the ways of sharing data. Examples are: WMS, WFS, WCS, GML.

Requests are made per URL after a question mark.

Examples for Requests are: GetCapabilities, GetFeature (provides XML) or GetMap (provides an image).

What is the difference between WMS, WCS and WFS?

WMS = Web Map Service

WMS provides a way to send map images over the web.

WFS = Web Feature Service

In a WFS, the feature data is returned to the client in GML (Geography markup language), therefore it returns geographic features like geometries and thematic data in vector representation.

WCS = Web Coverage Service

The WCS web service is used to transfer "coverages", ie. objects covering a geographical area.

It allows clients to request specific coverage data from a server and perform operations on the data.

What are Fundamental Spatial Queries? Explain the types.

Fundamental spatial queries such as point, window, or region queries are used to relate spatial objects, possibly followed by a sequence of additional queries or conditions. To get information, one can use the Spatial Join.

A point query computes the set of geo-referenced objects which contain a specific point p. Example: In which town (polygon) is point P located?

A window query computes the set of objects that overlap a rectangular window r. Example: What towns (polygons) are within this rectangle r?

A region query identifies for a given query polygon pol all objects which intersect pol. Example: What towns (polygons) are inside this country (polygon) pol?

What is a R-Tree and why is it useful?

An R-Tree is a balanced search tree to, for example, fin the minimal overlapping bounding boxes.

This can be useful for spatial queries to improve computation time. For a spatial query, if the predicate on the minimum bounding boxes BB is checked first (Filter) and THEN the exact computation (Refine), this saves time.

Explain the structure of a R-Tree.

A R-Tree is a balanced search tree where each tree level groups and aggregates the elements of the deeper levels. For all nodes, there is a minimum number of entries m and a maximum number M with m<=(M+1)/2. For the root the condition m>=2 applies.

Example:

The nodes are organized through their spatial neighborhood, all leaf nodes have the same distance to the root. The subtrees overlap as seen in the picture, it is therefore non-disjunctive.

How would one determine how a point is located in a polygon in a search tree (R-Tree)?

start at the root and work the way down
determine at every polygon within the tree if the point is inside or not
if necessary, multiple subtrees have to be searched

How would a new object be inserted to an R-Tree?

Starting from the root, the box that is chosen for containing the new object is the one that has to be extended minimally. Sometimes, the bounding boy already contains the object, then one would choose this subtree.

In case of overflow: split leaf

How would an object be deleted in an R-Tree?

Search for the entry to be deleted
remove entry
apply changes to the existing bounding boxes
1. underflow: remove remaining entries from node for later re-insertion
2. in case of everflow, split again

Join Course

Preview

Author

Julia S.

Information

Last changed
5 months ago

Report course