Core Idea of Machine Learning
Systems learn from data instead of explicit rules
Example:
email routing (spam, refunds, support)
Advantage:
handles ambiguity better than rule-based systems
Key idea:
infer patterns from examples → not manual programming
Definition of Machine Learning
Systems improve performance through experience (data)
Identify patterns and structures in data
Generalize to unseen inputs
Learning = improvement in task performance through experience
Core Components of ML
Task (T) → what to solve (classification, prediction, recommendation)
Experience (E) → training data
Performance (P) → evaluation metric (accuracy, error, etc.)
ML Workflow
Define problem
Collect & prepare data
Train model
Evaluate performance
Improve iteratively (loop)
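The workflow above can be sketched in miniature. The following is a hypothetical toy example (the data, the threshold rule, and the function names are invented for illustration): a one-dimensional "model" is trained by searching for the decision threshold with the best training accuracy, then evaluated.

```python
# Minimal sketch of the ML workflow on toy 1-D data:
# define problem -> prepare data -> train -> evaluate -> iterate.

def evaluate(xs, ys, t):
    """Accuracy of the rule: predict class 1 if x >= t, else 0."""
    correct = sum((x >= t) == bool(y) for x, y in zip(xs, ys))
    return correct / len(xs)

def train_threshold(xs, ys):
    """'Training': try each data point as a threshold and keep the
    one with the highest training accuracy."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted(xs):
        acc = evaluate(xs, ys, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# toy data: class 1 tends to have larger x values
xs = [0.1, 0.4, 0.35, 0.8, 0.9, 0.75]
ys = [0, 0, 0, 1, 1, 1]

t = train_threshold(xs, ys)
print("threshold:", t, "accuracy:", evaluate(xs, ys, t))
```

Real systems replace the threshold search with proper model training, but the loop structure (train, evaluate, improve) is the same.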
Dataset
Structured collection of:
labeled or unlabeled data
Used for training & evaluation
Features vs Labels
Features → input variables (data description)
Labels → target outputs (what to predict)
Supervised Learning
Trained on input–output pairs
Learns mapping: X → Y
Classification:
discrete labels (spam / not spam)
Regression:
continuous values (house price, temperature)
Unsupervised Learning
No labels provided
Goal: find hidden structure
clustering (group similar data)
dimensionality reduction (simplify data)
customer segmentation
visualization
anomaly detection / fraud detection
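Clustering can be illustrated with a minimal k-means sketch (pure Python; the points and the fixed initial centroids are invented so the run is reproducible):

```python
# Minimal k-means clustering sketch on 2-D points.

def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[d.index(min(d))].append(p)
        # update step: move each centroid to its cluster mean
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# two obvious groups of similar points
points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (5, 5)])
print(centroids)
```

No labels are provided anywhere; the grouping emerges purely from distances between the data points.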
Reinforcement Learning
Agent learns via interaction with environment
Feedback = rewards / penalties
State → current situation
Actions → possible moves
Transition → environment dynamics
Reward → feedback signal
Policy → strategy for decisions
maximize long-term reward
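The state/action/reward/policy loop can be sketched with tabular Q-learning on an invented toy environment (a 4-state corridor; all names and parameters here are illustrative, not from the source):

```python
# Tabular Q-learning sketch: states 0..3 form a corridor, actions are
# 0 = left, 1 = right; reaching state 3 gives reward +1.
import random

def step(state, action):
    """Environment dynamics (transition + reward signal)."""
    nxt = max(0, state - 1) if action == 0 else min(3, state + 1)
    reward = 1.0 if nxt == 3 else 0.0
    return nxt, reward, nxt == 3

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    random.seed(seed)
    Q = [[0.0, 0.0] for _ in range(4)]          # Q[state][action]
    for _ in range(episodes):
        s = 0
        for _ in range(200):                    # step cap per episode
            # epsilon-greedy: mostly exploit, sometimes explore
            a = random.randrange(2) if random.random() < eps else Q[s].index(max(Q[s]))
            s2, r, done = step(s, a)
            # update toward reward + discounted best future value
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
            if done:
                break
    return Q

Q = q_learning()
policy = [Q[s].index(max(Q[s])) for s in range(4)]
print(policy)  # greedy action per state (1 = move right)
```

The learned policy is exactly the "strategy for decisions" from the notes: after enough reward feedback, the agent moves right in every non-goal state to maximize long-term reward.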
ML System Pipeline
Problem Definition
Data Collection & Preparation
Model Selection
Training
Validation
Testing
Problem Definition
Define task clearly
Define success metric
Data Collection & Preparation
Data sources:
databases, sensors, logs, web
Steps:
cleaning
feature extraction
normalization
handling missing values
Quality of data = critical factor
Model selection
Model = mapping from input → output
Trade-off:
simple models → interpretable, less powerful
complex models → powerful, risk overfitting
Linear regression / logistic regression
Decision trees
Random forests / gradient boosting
SVM
Neural networks
Training
Model learns from data
Adjusts parameters iteratively
Goal: generalize, not memorize
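"Adjusts parameters iteratively" can be made concrete with a plain gradient-descent sketch (toy noise-free data and learning rate chosen for illustration):

```python
# Iterative parameter adjustment: fit y ≈ w*x + b on toy data by
# gradient descent on the mean squared error.

def fit_line(xs, ys, lr=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # gradients of MSE = (1/n) * sum((w*x + b - y)^2)
        grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= lr * grad_w          # small step against the gradient
        b -= lr * grad_b
    return w, b

# data generated from y = 2x + 1 (no noise)
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))  # converges toward w ≈ 2, b ≈ 1
```

Each iteration nudges the parameters to reduce the training error; the same loop structure underlies training of far larger models.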
Validation
Evaluate on validation set
Detect overfitting
Tune model
Testing
Final evaluation on unseen test data
Measures real-world performance
Regression Metrics
MAE → average error
MSE → penalizes large errors more
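Both metrics computed on the same invented predictions, showing how MSE amplifies the single large error:

```python
# MAE vs MSE on the same errors: MSE penalizes the large error more.

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [10, 20, 30, 40]
y_pred = [11, 19, 30, 50]      # errors: 1, 1, 0, 10

print(mae(y_true, y_pred))  # (1 + 1 + 0 + 10) / 4 = 3.0
print(mse(y_true, y_pred))  # (1 + 1 + 0 + 100) / 4 = 25.5
```

The one error of 10 contributes 10/13 of the MAE but 100/102 of the MSE.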
Classification Outcome
True positives
True negatives
False positives
False negatives
Key metrics
Precision → correctness of positive predictions
Recall → completeness of detection
F1-score → balance of precision & recall
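These metrics follow directly from the four outcome counts. A short worked example (the spam-filter counts are invented for illustration):

```python
# Precision, recall and F1 from raw counts of classification outcomes.

def precision(tp, fp):
    return tp / (tp + fp)          # correctness of positive predictions

def recall(tp, fn):
    return tp / (tp + fn)          # completeness of detection

def f1(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)     # harmonic mean balances p and r

# hypothetical spam filter: 80 spam caught, 20 ham flagged, 10 spam missed
tp, fp, fn = 80, 20, 10
print(precision(tp, fp))           # 80/100 = 0.8
print(recall(tp, fn))              # 80/90 ≈ 0.889
print(round(f1(tp, fp, fn), 3))    # 0.842
```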
Deployment & Monitoring
Model deployed in real system
Continuous monitoring required
Retraining if performance degrades
Underfitting
Model too simple
High bias
Poor learning
Overfitting
Model too complex
Learns noise + details
High variance
Bias–Variance Tradeoff
Bias → error from simplicity
Variance → sensitivity to data changes
Goal:
low bias + low variance
Regularization
prevents overfitting
penalizes complexity
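"Penalizes complexity" can be shown with an L2 (ridge) penalty added to the loss; this is one common form of regularization, sketched on invented data:

```python
# Regularization sketch: L2 (ridge) penalty added to the MSE loss,
# so that large weights cost extra even when they fit the data.

def ridge_loss(w, b, xs, ys, lam):
    n = len(xs)
    mse = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n
    penalty = lam * w ** 2         # complexity penalty on the weight
    return mse + penalty

xs, ys = [0, 1, 2], [0, 2, 4]      # perfectly fit by w = 2, b = 0
# with lam = 0 the perfect fit is free; with lam > 0 it costs 0.1 * 2^2
print(ridge_loss(2.0, 0.0, xs, ys, lam=0.0))   # 0.0
print(ridge_loss(2.0, 0.0, xs, ys, lam=0.1))   # 0.0 + 0.1 * 4 = 0.4
```

Minimizing the penalized loss trades a little training fit for smaller, simpler parameters, which tends to generalize better.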
Irreducible Error
noise in data
cannot be eliminated
Responsible Machine Learning
biased data → biased models
black-box problem in complex models
vulnerable to noisy/adversarial inputs
high energy / computational cost
Natural Language Processing (NLP) – Overview
Field of AI focused on:
understanding human language
generating human-like language
enabling communication between humans and machines
Applications:
chatbots
summarization
translation
search engines
information extraction
Why NLP is Difficult
Human language is:
ambiguous
context-dependent
highly variable
Requires understanding at multiple levels:
structure
meaning
context
Cannot be processed directly → must be encoded into machine-readable form
Levels of Language Understanding
Morphology
Syntax
Semantics
Pragmatics
Discourse Analysis
Morphology
Structure of words
Breaks words into morphemes (prefixes, suffixes)
Links word forms:
train / trainer / training
Syntax
Sentence structure and grammar
Identifies:
subject
verb
object
Uses parsing rules
Semantics
Meaning of words and sentences
Resolves ambiguity based on context
“agent” = person or AI system
Pragmatics
Meaning depends on context & intention
Uses world knowledge
“I have an early flight” → suggests need for alarm/ride
Discourse Analysis
Meaning across sentences / conversations
Tracks:
references (“it”, “they”)
coherence over time
Important for chatbots, summaries
Text Preprocessing
Tokenization
Normalization
Tokenization
Splits text into tokens:
words, subwords, punctuation
Handles languages with no clear word boundaries (e.g. Chinese, Japanese)
Normalization
Standardizes text:
lowercasing
removing punctuation
expanding contractions
Stemming:
rule-based shortening
fast but imprecise
Lemmatization:
dictionary + grammar-based
more accurate
returns correct base form
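The trade-off can be demonstrated with a deliberately crude suffix-stripping "stemmer" against a tiny dictionary "lemmatizer" (both invented toys, far simpler than real tools like Porter stemming or WordNet lemmatization):

```python
# Stemming vs lemmatization sketch: rule-based shortening is fast but
# imprecise; a dictionary lookup returns the correct base form.

def naive_stem(word):
    for suffix in ("ing", "er", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

LEMMAS = {"trainer": "train", "training": "train", "better": "good"}

def lemmatize(word):
    return LEMMAS.get(word, word)

print(naive_stem("training"))   # "train" – the rule happens to work
print(naive_stem("better"))     # "bett"  – rule-based mistake
print(lemmatize("better"))      # "good"  – dictionary gets it right
```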
Part-of-Speech (POS) Tagging
Assigns grammatical role:
noun, verb, adjective
Uses context to resolve ambiguity
Modern approaches:
machine learning + statistical models
Parsing
Analyzes sentence structure
Determines relationships between words
Constituency parsing
phrase structure trees (NP, VP)
Dependency parsing
word-to-word relationships
identifies “who does what to whom”
Text Representation
Machines cannot process raw text
Convert text → numerical vectors
Bag of Words (BoW)
Counts word occurrences
Ignores:
order
grammar
Pros:
simple, fast
Cons:
sparse, no semantics
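A minimal bag-of-words sketch (toy documents invented for illustration), showing how order and grammar disappear into plain counts:

```python
# Bag-of-words: each document becomes a vector of word counts over a
# shared vocabulary; word order and grammar are discarded.
from collections import Counter

def bow_vector(doc, vocab):
    counts = Counter(doc.lower().split())
    return [counts[w] for w in vocab]

docs = ["the cat sat", "the cat saw the dog"]
vocab = sorted(set(" ".join(docs).lower().split()))
print(vocab)                               # shared vocabulary
print([bow_vector(d, vocab) for d in docs])
```

Note the sparsity problem in miniature: each vector has a slot for every vocabulary word, most of which are zero in any one document.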
TF-IDF
Measures word importance
Combines:
term frequency (local importance)
inverse document frequency (global rarity)
Highlights meaningful words
Reduces impact of common words (“the”, “and”)
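The two factors combine as tf × idf. A worked sketch on invented documents (one common variant of the formula; real libraries apply smoothing):

```python
# TF-IDF sketch: term frequency (local importance) times inverse
# document frequency (global rarity). Words in every document score 0.
import math

def tf_idf(term, doc, docs):
    words = doc.split()
    tf = words.count(term) / len(words)
    df = sum(term in d.split() for d in docs)
    idf = math.log(len(docs) / df)
    return tf * idf

docs = ["the cat sat", "the dog ran", "the cat ran"]
print(tf_idf("the", docs[0], docs))   # idf = log(3/3) = 0 → score 0
print(tf_idf("sat", docs[0], docs))   # rare word → positive score
```

"the" appears in all three documents, so its idf (and score) is exactly zero; "sat" appears in only one, so it is highlighted as meaningful.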
Vector Semantics
Distributional Hypothesis
“Words are defined by their context”
Similar contexts → similar meanings
Word Embeddings
Words → dense vectors
Capture semantic similarity
Improve over BoW/TF-IDF
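Semantic similarity between dense vectors is usually measured with cosine similarity. A sketch with hand-made 3-d vectors (invented for illustration; real embeddings have hundreds of dimensions and are learned, not written by hand):

```python
# Cosine similarity between toy "embedding" vectors: related words
# point in similar directions, unrelated words do not.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

vec = {                      # invented 3-d vectors for illustration
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}
print(cosine(vec["cat"], vec["dog"]))  # close to 1 (similar)
print(cosine(vec["cat"], vec["car"]))  # much lower (dissimilar)
```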
Word Embedding Models
Word2Vec
GloVe
fastText
Word2Vec
Predict-based model
Learns meaning via context prediction
Skip-gram:
predicts context from word
CBOW:
predicts word from context
Captures semantic relationships
Enables analogies (king - man + woman ≈ queen)
GloVe
Uses global co-occurrence statistics
Builds word relationships from entire corpus
Captures broader semantic structure
Complementary to Word2Vec
fastText
Uses subword information (character n-grams)
Example: “robotics” → rob, bot, tic
Advantages:
handles rare words
works with misspellings
supports new words (mitigates the OOV problem)
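The subword idea from the "robotics" example can be sketched directly (character trigrams only; fastText itself also uses other n-gram lengths and boundary markers):

```python
# fastText-style subword sketch: break a word into character n-grams,
# so unseen or misspelled words still share subwords with known ones.

def char_ngrams(word, n=3):
    return [word[i : i + n] for i in range(len(word) - n + 1)]

grams = char_ngrams("robotics")
print(grams)  # includes 'rob', 'bot', 'tic' from the notes' example
```

A rare or misspelled word like "robotcs" still shares most of these trigrams with "robotics", which is why subword models handle it gracefully.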
Problem with static embeddings
Same word → same vector
No context sensitivity
BERT
Transformer-based model
Bidirectional context understanding
masked language modeling
next sentence prediction
word meaning depends on context
Large Language Models (LLMs)
Extremely large Transformer models
Trained on massive datasets
Use autoregressive prediction
Capabilities:
text generation
reasoning
zero-shot learning
few-shot learning
Limitations:
bias
hallucinations
high computational cost
NLP Pipeline
Preprocessing
Feature Extraction
Classification
Output Generation
Evaluation
Define task (e.g., spam detection)
Preprocessing:
tokenization
cleaning text
Feature Extraction:
word embeddings
hybrid features
Classification:
models:
logistic regression
random forest
neural networks
Output Generation:
prediction:
spam / not spam
confidence scores
Evaluation:
precision
recall
accuracy
F1-score
Continuous Improvement
user feedback loop
retraining with new data
adapts to changing language patterns
Computer Vision – Core Idea
Field of AI that enables machines to:
interpret visual data (images, videos)
extract semantic meaning from pixels
transform raw visual input → structured understanding
Key applications:
autonomous driving
face recognition (smartphones)
warehouse robotics
medical imaging
What Computer Vision Does
Input: pixel data (images/videos)
Output:
objects
positions
relationships
actions/events
Key transformation:
low-level pixels → high-level meaning
CV vs Image Processing
Image processing:
improves image quality (no interpretation)
Computer vision:
interprets content and meaning
Human vs Machine Vision
Humans:
robust perception
context-aware
good with occlusion & ambiguity
Machines:
require large labeled datasets
sensitive to noise and variation
limited generalization
Core Computer Vision Tasks
image classification
object localization
object detection
image segmentation
Image Classification
Assigns one label to whole image
No object localization
Examples:
disease detection in medical scans
crop monitoring
product tagging
Object Localization
Detects:
object + bounding box
class + (x, y, w, h)
Use cases:
robotics
warehouse automation
Object Detection
Detects multiple objects per image
Outputs:
multiple labels + bounding boxes
self-driving cars
surveillance
industrial inspection
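Detections are typically compared against ground truth with intersection-over-union (IoU) on their bounding boxes. A sketch using the (x, y, w, h) box format from the localization notes (the boxes are invented):

```python
# IoU between two boxes given as (x, y, w, h):
# 1.0 = perfect overlap, 0.0 = no overlap.

def iou(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # overlap extent along each axis (clamped at zero)
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union

print(iou((0, 0, 4, 4), (0, 0, 4, 4)))  # 1.0
print(iou((0, 0, 4, 4), (2, 2, 4, 4)))  # 4 / 28 ≈ 0.143
print(iou((0, 0, 2, 2), (5, 5, 2, 2)))  # 0.0
```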
Image Segmentation
Pixel-level classification
Produces:
detailed scene map
crowd tracking
robotics manipulation
Key challenges of Computer Vision
lighting, viewpoint, distance variation
occlusion (objects hidden)
deformation (changing shapes)
noise (blur, low quality, compression)
dataset bias and domain shift
expensive labeled data (especially pixel-level)
Vision pipeline
Image Acquisition & Preprocessing
Learning Models (CNNs)
Problem Definition (Computer Vision)
define:
task (classification, detection, etc.)
input/output format
sources:
cameras, satellites, medical devices
steps:
resizing
normalization (0–255 → scaled values)
noise reduction
batch processing
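The normalization step (0–255 → scaled values) in one line per pixel (scaling to [0, 1]; other schemes, such as mean/std standardization, are also common):

```python
# Preprocessing sketch: scale raw 0–255 pixel values into [0, 1].

def normalize(pixels):
    return [[p / 255 for p in row] for row in pixels]

image = [[0, 128, 255],     # a tiny invented 2x3 grayscale "image"
         [64, 192, 32]]
print(normalize(image))     # 255 -> 1.0, 0 -> 0.0
```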
Convolution
goal:
convert pixels → meaningful patterns
detects:
edges, textures, shapes
produces:
feature maps
Pooling
reduces spatial size
types:
max pooling (strongest signal)
average pooling (smoothed representation)
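Both pooling types on the same invented 4×4 feature map (pure Python, stride equal to window size):

```python
# Pooling sketch: 2x2 max pooling vs average pooling on a 4x4 map.

def pool(fmap, size=2, op=max):
    out = []
    for i in range(0, len(fmap), size):
        row = []
        for j in range(0, len(fmap[0]), size):
            window = [fmap[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(op(window))          # strongest signal or mean
        out.append(row)
    return out

def mean(window):
    return sum(window) / len(window)

fmap = [[1, 3, 2, 0],
        [5, 6, 1, 2],
        [0, 2, 4, 4],
        [1, 1, 8, 0]]
print(pool(fmap, op=max))   # [[6, 2], [2, 8]]
print(pool(fmap, op=mean))  # [[3.75, 1.25], [1.0, 4.0]]
```

Either way the 4×4 map shrinks to 2×2: max pooling keeps the strongest activation per window, average pooling a smoothed value.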
CNNs (Convolutional Neural Networks)
exploit spatial structure of images
layers:
convolution → pooling → fully connected
build:
hierarchical features (edges → objects)
Key CNN Architectures
LeNet
early CNN
digit recognition
alternating conv + pooling layers
GoogLeNet (Inception)
inception modules (multi-scale features)
efficient deep architecture
global average pooling
ResNet
uses skip connections
solves vanishing gradient problem
enables very deep networks
Training Vision Models
training loop:
forward pass → loss → backpropagation → update
loss function:
cross-entropy (classification)
data augmentation:
rotation, flipping, brightness changes
regularization:
dropout
early stopping
weight decay
batch normalization
Deployment Challenges
high computational cost (GPU/TPU needed)
real-time inference requirements
adversarial attacks (manipulated inputs)
need for robustness in real environments
Transfer Learning
use pretrained models (e.g., ImageNet)
fine-tune for new tasks
reduces:
training time
data requirements
Generative Vision Models
move beyond recognition → generation
can create:
images
videos
synthetic data
applications:
entertainment (animation, VFX)
gaming & VR
medicine (simulation)
data augmentation
Multimodal AI
combines:
language + vision
example:
text → image generation (e.g., DALL·E-style systems)
Ethical Challenges
deepfakes & misinformation
bias in generated outputs
lack of control over content quality
need for regulation and transparency