How do data, database systems, database management systems and databases connect? What are the definitions for each of these terms?
Database System = Database Management System (DBMS) + Databases
Databases: either relational, object-oriented or object-relational; collection of data that’s logically related to each other; can store, manipulate & retrieve data
DBMS: software (PostgreSQL, MySQL, Oracle Database, SQLite) that allows to create, define & manipulate DBs, allows users for data to be stored persistently (is not unintentionally deleted!), updated, analyzed and queried (by SQL); controlls data to check consistency (definitions of data types, update rules… -> integrity constraints), can organize DBs, access to DBs is only possible through DBMS; has a well-defined stable user interface which is long-lasting and only extended, not changed; allows data independence through abstraction
Data: collection of discrete values
The client accesses the databases through a DBMS. Examples for clients: SQL shell, pgAdmin, python
List the Properties of a Relation.
Give an explanation for following terms:
relation
tuple
attribute
Cardinality
Degree
Primary Key
Foreign Key
Domain
Why are geodatabases useful?
Every 2 years, amount of digital data is doubled -> large datasets that need to be organized somehow
storing data in files is not future-proof (for example due to discontinued file formats)
data processing can be performed directly with the database
is efficient for storing large amounts of data -> less redundancy, which leads to less errors and inconsistencies
Explain the concept of data models. Which data models exist?
Data models are a collection of concepts for the desription of an abstract version of the real world. Concepts can be objects (vector data) or continuous (raster data). Which one to use is application specific. They are modelled using UML (unified modelling language) which can create models and convert them to database schemas.
There are:
relational models (tables, what we learn about): simple structure, redundancy is avoided by splitting data across several tables; columns and rows, each column is assigned to one data type; fundamental operations: projection onto certain columns, selection of certain rows, joining data from different tables
object-oriented models: more complex than relational model, uses concepts of object-oriented programming (classes i.e.)
objet-relational models (what we use): an extension of the relational model in which user-defined types are possible; is necessary for spatial information
What is necessary to store geographical data in a database? Give examples of some geographical data forms.
A software extension of an object-relational DBMS which supports spatial data models. Examples are Oracle Spatial and PostGIS.
Geographical data forms can be
point, line and polygon objects
raster data
3D point clouds
networks
Shortly explain SQL and its sublanguages.
SQL is a popular query language found in all relational DBMS. It’s a declarative language which is divided into several sublanguages:
Data Definition Language DDL for creating and modifying tables (CREATE, ALTER, DROP)
Data Manipulation Language DML for inserting, updating and querying data (SELECT, INSERT; DELETE, UPDATE)
Data Control Language DCL for defining access privileges of users (GRANT, REVOKE)
Transdaction Control Language TCL for managing changes by DML (COMMIT, ROLLBACK)
…
What is the ACID paradigm?
The ACID paradigm enforces rules for transactions (groups several data base operations to one unit)
Atomicity: transaction is a non-seperable unit; either all changes are made or none
Consistency: transaction leaves a consistent state after it ends or is undone completely
Isolation: concurrent transactions do not affect each other, every transaction is executed as if it’s the only one running
Durability: effect of transaction stays permanently in the database, even after system failure
Explain the differences between an object and a class.
An object holds certain properties/attributes and reacts with a defined behavior (either a change of an objects state or an interaction with the environment -> other objects) to its surrounding; a class is like a template for objects and typically is an aggregation of similar but distinguishable objects.
A class holds class attributes which the objects can inherit.
Going from objects to superclasses generalizes concepts: A superclass represents a concept, a subclass adopts and/or overrides methods of the superclass and defines new, additional attributes and methods. The objects are then instances of the subclasses.
Explain the differences between Aggregation and Generalization.
Aggregation is the concept that a class is an aggregation of several objects, i.e. a car and its parts; Generalization relates not objects but classes to each other. A superclass would be the Generalization of a more specialized (sub-)class.
Example:
Aggregation: A car consists of several parts.
Generalization: A rectangle is a geometrical figure.
Why is Encapsulation useful?
Encapsulation reduces the visibility of attributes outside of the class and therefore protects the data so that less mistakes happen when data is changed. The attributes can be read and/or accessed by methods, i.e. set bachelor degree ().
What is the difference between Aggregation and Composition?
Aggregation and Composition are both special types of association between objects.
In an Aggregation, an objects is an aggregation of several other objects, i.e. a car and its parts.
In a Composition, which is a strict form of aggregation, the existence of the parts depend on the compositional object. If the compositional object is deleted, the parts are deleted as well. This helps with data integrity.
Which steps need to be done to translate a UML schema to a relational schema?
Each class which doesn’t take part in the inheritance hierarchy is projected onto a table (class name = table name, class attributes = table attributes; sometimes, data types have to be adapted to SQL data types!)
Each association, besides inheritance, is projected onto a table (name of association or new name = table name, key attributes of the involved classes = attributes)
Refinement: Some tables can be combined (for example some tables which originate from associations -> for 1:n relationships (key of one class can be added to the other class as foreign key)! For n_m relationships the association table should stay)
other example: 1:1 relationships could be combined
For inheritance relationships: only the classes are translated into tables but not the inheritance relationships (for object-realtional schemas, tables can inherit attributes from parent tables); using keyword INHERITS
either one table per class in the hierarchy: key attributes are the root in the hierarchy -> foreign key of subclasses
2. a single table for the entire hierarchy: all class attributes that appear in hierarchy are attributes, not corresponding attributes are set to 0
What keys can exist in relational databases? sHortly explain the concepts.
super key: attribute(s) that can uniquely identify all rows in a table
composite key: a key that contains more than one attribute, the largest composite key would be a key involving all the comulns in the table
candidate key: a key that contains the least possible attributes to uniquely idetnify any table row (for example a unique id!); a table may have more than 1 candidate key
primary key: a key that uniquely identifies each row, is chosen from sets of all canditate keys, can be a composite key or only one; has to be unique and cannot have null values; used to enforce the entity integrity of the table (make each row unique & accessible), helps to set up relationships between tables
examples: Student ID, Social security number, Vehicle Identification number, automatically generated numbner (auto increment, i.e. BIGSERIAL)
foreign key: links to the primary key of another table -> established relationship between two tables, helps with cascading updates and deletes (important to mapping), can improve query performance by using joins; if unique: 1:1-relation, if 1:n or n:m the foreign key is NOT unique (but for n:m relationships, a junction table is necessary)
Which types of joins exist? Shortly explain them.
In Postgres, INNER JOIN is the default join, therefore, if we type JOIN, an inner join will be performed
LEFT JOIN = LEFT OUTER JOIN
RIGHT JOIN = RIGHT OUTER JOIN
What are recursive associations?
A recursive association connects a single class type to itself.
Example: books who reference each other
Explain point set models with the following words: interior, boundary, closure and exterior.
interior: All points in the neighborhood of a point are completely in the point set
boundary: The neighboring points are either inside or outside of the point set
closure: The set of points which are in the interior and the boundary.
exterior: All points that are not part of the closure
List topological relations and explain them.
disjoint: overlay of two geometries is empty
touches: Boundaries intersect, interiors don’t.
equals: identical geometries, including dimension, geometr type, number of support points, x- and y-coordinates
inside/contains: one geometry is completely inside another geometry
covers: X covers Y if Y is inside X, but touches the boundary of X
overlaps
Explain the concept of the 9-intersection model
The 9-intersection-model specifies many topological relationships between two geometries using intersections between the boundary, interior and exterior of two point sets.
For this, the two geometries X and Y need to be spatial objects in a 2D-plane.
The result is a 3x3 matrix which has the values empty/not empty.
What topological relations are these 9-Intersection models about?
What is the DE 9-IM?
The DE 9-IM is the 9-Intersection model with a dimension extension (dimension extended 9-Intersection-model).
The dimension is defined as follows:
0D = Point(s)
1D = Line(s)
2D = Polygon(s)
Another version is to visualize
not empty = T
empty = F
not relevant = *
Model the topological relations from the examples below with help of the DE 9-IM.
The following DE-9IM matrices represent which topological relation?
Explain in own words what these topological relations mean:
Why is Database Normalization useful? Why would you not do it?
Database Normalization avoids redundant data which would lead to anomalies caused by insertion, updating and deletion.
It also leads to a consistent database design which safeguards the tables against inconsistencies.
For example, if information is stored more than once in the same table, updating the information may lead to data inconsistencies as only one tuple is updated.
Inserting incomplete tuples leads to insertion anomalies.
And deleting information by dropping a row may also delete related, but important information that should be kept.
However, sometimes there are reasons for not normalizing tables in a database.
Explain the Fist Normal Form (1NF).
A relation is in the First Normal Form if all attributes have atomic values (one word/character/date etc., not several ones).
Example: A table with books and its author. One book was written by several authors, so in the corresponding tuple is a list in the author attribute instead of just one name.
So what instead?
several columns Author1, Author3, etc. -> multiple columns with the same information should be avoided, plus what if a book has more than 3 authors?
put the same tuple in twice, but with different authors -Y high redundancy through repitition
split the tables! For example one table with title, place, year and the other with title and author.
What does Functional Dependency mean? What does Transitive Dependency mean?
X functionally determines Y if each X value in the table is associated with exactly one Y value.
the table then satisfies the functional dependency X -> Y.
There is trivial functional dependency and non-trivial functional dependency.
For trivial dependency, Y is a subset of X. For non-trivial functional dependency, this is not the case.
The Functional Depedndency is important for the 2NF.
Transitive dependency: Imagine a relation R with attributes A, B and C. C is transitively dependent on A if:
A -> B but NOT B -> A
B -> C
Transitive Dependency is important for the 3NF.
In this table, list the functional dependencies.
What kinds of keys do exist? Explain the concepts.
Superkey -> set of attributes which uniquely identify a tuple
Candidate key -> minimum set of superkeys (prime attributes), a table can have more than one
Artificial key -> if no candidate key exists & duplicate tuples could occur, an atrifivial key is introduced which is used as primary key (like a BIGSERIAL number)
Primary key -> uniquely identifies a tuple in a table
Foreign key -> uniquely relates a tuple to another tuple in another table
Alternate key -> must be a candidate key, is to restrict a column to store only unique entries
Explain the Second Normal Form (2NF).
A relation is in the second normal form if it’s in the first normal form and additionally, every non-prime attribute is functionally dependent on the whole canditate key of the relation (and not just on a part of the candidate key).
If not in normal form, what to do?
split relations
Explain the Third Normal Form (3NF).
A relation is in third normal form if
the relation is in 2nd normal form and
no non-prime-attribute is transitively dependent on the primary key: each non-key attribute in a table should depend on the key, the whole key, and nothing but the key
To fix this, place attributes with transitive dependency in a new relation to obtain 3NF.
What is the Boyce-Codd Normal Form? (BCNF)
The BCNF is slightly stronger than the 3NF, and only in rare cases does a relation in 3NF not meet the requirements of the BCNF.
A relation is in BCNF if for every one of its functional dependencies X -> Y, at least one of the following holds:
X -> Y is a trivial functional dependency (Y is a subset of X)
X is a superkey for the relation
Is this relation in any normal form?
If not, tranform it to the highest form.
The relation is in 3NF, but not in BCNF because of the dependency Year-Month → Release Year and the attribute Year-Month is not a candidate key by itself.
Here’s a solution that eliminates the functional dependency:
What is the normal form of this relation?
What is the Simple Feature Geometry Model? Draw it.
The simple feature (geometry) model is a model for representing discrete objects in vector format as a hierarchy of geometry classes for 1-,2- and 3-dimensional objects.
The feature in question is an abstraction of a phenomenon in the real world (cities, buildings, …).
Object types are: Points, Curves (LineString, LinearRing, Line (LineString with 2 Points)), Surfaces (Polygons), Geometry Collections (collection of different or the same type of geometries).
It was developed by the Open Geospatial Consortium (OGC).
List 2 standards of spatial and geographical data.
ISO SQL/MM Spatial, which also includes curved segments
Simple Feature Geometry Model
What are examples of a Geographic/Spatial DBMS?
Oracle SPatial
PostGIS
What are Features of Geodatabases and why are they different from normal Databases?
supports 2D and 3D data (lines, surface, collection, points)
topological operations (overlap, touches etc.)
Spatial indexing
has a Coordinate Reference System within the DB
algorithms like shortest paths, maximum flows
advanced geometric processing (split geometry, simplification, triangulation…)
Geocoding of spatial data
representation of grids
Digital Elevation models, slopes…
What are the criteria for valid polygons?
List some methods of processing/analyzing geometries.
Write short SQL queries that output
Lakes that lie completely within the states Berlin or Bavaria
Citis that are neighboured to “Paunzhausen”
Cities with a river flowing through it
In which column is the geometry of a spatial object stored?
Why is Data Integrity important and what are the two methods for the definition of integrity constraints?
Data Integrity is important because “invalid” states of data lead to inconsistencies and confusion.
For example, if grades range from 1-10, no one should be assigned a grade 11.
The methods for the definition of integrity constraints are:
specification of the value domain of each column: Data type (numeric, integer, string, …) and i.e. the number of characters allowed, for example numeric(5,0) would mean the number can have a max of 5 digits
prohibition of NULL values, for example name VARCHAR(30) NOT NULL
Also, to prevent inconsistencies primary keys / foreign keys / keys in general can be used to prevent two same tuples in the same relation.
Additional constraints on the level of attributes can be made with the CHECK clause, e.g. if the value for an attribute is one of the three possible strings.
What is Referential Integrity and what are its three strategies?
Referential Integrity describes the concept of two or more tables linked together to still be intregrated.
Example: For every class, there is a lecturer. For every lecturer, there is a name; every lecturer id should be in the lecturer table as well.
This can be archieved by setting a foreign key constraint, which rejects invalid changes.
The other strategy is to implement cascading changes.
Example: If a lecturer leaves the university, he should be deleted from the lecturer database AND from the classes database.
The third strategy would be to insert a default value, for example NULL (Example: ON DELETE SET NULL).
What is Interoperability? What are its two types?
Interoperability is a set of agreements on technological standards, policies and institutional agreements which enables the use of geospatial information by users.
There is
syntactic interoperability
Data exchange formats or interfaces, i.e. always encode metadata in xml
applications and service are able to work with the same data formats
semantic interoperability
common vocabularies or semantic reference systems
users have the same understanding of the terminology used
Why is standardization important for geodata?
Geodata is expensive and take up much space on computers, therefore the storage needs to be efficient & sustainable, repeated measurements should be avoided and the data should be high-quality.
Explain the concept of a Geo Data Infrastructure and its fundamental principles.
A Geo Data Infrastructure (GDI) implements a framework of geographical data, metadata, users and services that are interactively connected in order to use the data efficiently and flexibly. It can be regional (GDI-RE), national (GDI-DE) or international (EU INSPIRE).
The fundamental principles are:
subsidiarity (responsibility for providing, maintenance, quality assurance, consistency)
interoperability, like common standards and reference systems
scalability (data/services should be used for different purposes -> should be extendable in a technical and organizational sense)
transparency
reliability
effiency & effectiveness
Last changed9 days ago