Core Idea of Machine Learning
Systems learn from data instead of explicit rules
Example:
email routing (spam, refunds, support)
Advantage:
handles ambiguity better than rule-based systems
Key idea:
infer patterns from examples → not manual programming
Definition of Machine Learning
Systems improve performance through experience (data)
Identify patterns and structures in data
Generalize to unseen inputs
Learning = improvement in task performance through experience
Core Components of ML
Task (T) → what to solve (classification, prediction, recommendation)
Experience (E) → training data
Performance (P) → evaluation metric (accuracy, error, etc.)
ML Workflow
Define problem
Collect & prepare data
Train model
Evaluate performance
Improve iteratively (loop)
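The workflow above can be sketched in miniature. The following is a hypothetical toy example (the data, the threshold rule, and the function names are invented for illustration): a one-dimensional "model" is trained by searching for the decision threshold with the best training accuracy, then evaluated.

```python
# Minimal sketch of the ML workflow on toy 1-D data:
# define problem -> prepare data -> train -> evaluate -> iterate.

def evaluate(xs, ys, t):
    """Accuracy of the rule: predict class 1 if x >= t, else 0."""
    correct = sum((x >= t) == bool(y) for x, y in zip(xs, ys))
    return correct / len(xs)

def train_threshold(xs, ys):
    """'Training': try each data point as a threshold and keep the
    one with the highest training accuracy."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted(xs):
        acc = evaluate(xs, ys, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# toy data: class 1 tends to have larger x values
xs = [0.1, 0.4, 0.35, 0.8, 0.9, 0.75]
ys = [0, 0, 0, 1, 1, 1]

t = train_threshold(xs, ys)
print("threshold:", t, "accuracy:", evaluate(xs, ys, t))
```

Real systems replace the threshold search with proper model training, but the loop structure (train, evaluate, improve) is the same.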
Dataset
Structured collection of:
labeled or unlabeled data
Used for training & evaluation
Features vs Labels
Features → input variables (data description)
Labels → target outputs (what to predict)
Supervised Learning
Trained on input–output pairs
Learns mapping: X → Y
Classification:
discrete labels (spam / not spam)
Regression:
continuous values (house price, temperature)
Unsupervised Learning
No labels provided
Goal: find hidden structure
clustering (group similar data)
dimensionality reduction (simplify data)
customer segmentation
visualization
anomaly detection / fraud detection
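Clustering can be illustrated with a minimal k-means sketch (pure Python; the points and the fixed initial centroids are invented so the run is reproducible):

```python
# Minimal k-means clustering sketch on 2-D points.

def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[d.index(min(d))].append(p)
        # update step: move each centroid to its cluster mean
        centroids = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# two obvious groups of similar points
points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (5, 5)])
print(centroids)
```

No labels are provided anywhere; the grouping emerges purely from distances between the data points.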
Reinforcement Learning
Agent learns via interaction with environment
Feedback = rewards / penalties
State → current situation
Actions → possible moves
Transition → environment dynamics
Reward → feedback signal
Policy → strategy for decisions
maximize long-term reward
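The state/action/reward/policy loop can be sketched with tabular Q-learning on an invented toy environment (a 4-state corridor; all names and parameters here are illustrative, not from the source):

```python
# Tabular Q-learning sketch: states 0..3 form a corridor, actions are
# 0 = left, 1 = right; reaching state 3 gives reward +1.
import random

def step(state, action):
    """Environment dynamics (transition + reward signal)."""
    nxt = max(0, state - 1) if action == 0 else min(3, state + 1)
    reward = 1.0 if nxt == 3 else 0.0
    return nxt, reward, nxt == 3

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    random.seed(seed)
    Q = [[0.0, 0.0] for _ in range(4)]          # Q[state][action]
    for _ in range(episodes):
        s = 0
        for _ in range(200):                    # step cap per episode
            # epsilon-greedy: mostly exploit, sometimes explore
            a = random.randrange(2) if random.random() < eps else Q[s].index(max(Q[s]))
            s2, r, done = step(s, a)
            # update toward reward + discounted best future value
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
            if done:
                break
    return Q

Q = q_learning()
policy = [Q[s].index(max(Q[s])) for s in range(4)]
print(policy)  # greedy action per state (1 = move right)
```

The learned policy is exactly the "strategy for decisions" from the notes: after enough reward feedback, the agent moves right in every non-goal state to maximize long-term reward.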
ML System Pipeline
Problem Definition
Data Collection & Preparation
Model Selection
Training
Validation
Testing
Problem Definition
Define task clearly
Define success metric
Data Collection & Preparation
Data sources:
databases, sensors, logs, web
Steps:
cleaning
feature extraction
normalization
handling missing values
Quality of data = critical factor
Model selection
Model = mapping from input → output
Trade-off:
simple models → interpretable, less powerful
complex models → powerful, risk overfitting
Linear regression / logistic regression
Decision trees
Random forests / gradient boosting
SVM
Neural networks
Training
Model learns from data
Adjusts parameters iteratively
Goal: generalize, not memorize
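"Adjusts parameters iteratively" can be made concrete with a plain gradient-descent sketch (toy noise-free data and learning rate chosen for illustration):

```python
# Iterative parameter adjustment: fit y ≈ w*x + b on toy data by
# gradient descent on the mean squared error.

def fit_line(xs, ys, lr=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # gradients of MSE = (1/n) * sum((w*x + b - y)^2)
        grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= lr * grad_w          # small step against the gradient
        b -= lr * grad_b
    return w, b

# data generated from y = 2x + 1 (no noise)
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))  # converges toward w ≈ 2, b ≈ 1
```

Each iteration nudges the parameters to reduce the training error; the same loop structure underlies training of far larger models.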
Validation
Evaluate on validation set
Detect overfitting
Tune model
Testing
Final evaluation on unseen test data
Measures real-world performance
Regression Metrics
MAE → average error
MSE → penalizes large errors more
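Both metrics computed on the same invented predictions, showing how MSE amplifies the single large error:

```python
# MAE vs MSE on the same errors: MSE penalizes the large error more.

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [10, 20, 30, 40]
y_pred = [11, 19, 30, 50]      # errors: 1, 1, 0, 10

print(mae(y_true, y_pred))  # (1 + 1 + 0 + 10) / 4 = 3.0
print(mse(y_true, y_pred))  # (1 + 1 + 0 + 100) / 4 = 25.5
```

The one error of 10 contributes 10/13 of the MAE but 100/102 of the MSE.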
Classification Outcome
True positives
True negatives
False positives
False negatives
Key metrics
Precision → correctness of positive predictions
Recall → completeness of detection
F1-score → balance of precision & recall
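These metrics follow directly from the four outcome counts. A short worked example (the spam-filter counts are invented for illustration):

```python
# Precision, recall and F1 from raw counts of classification outcomes.

def precision(tp, fp):
    return tp / (tp + fp)          # correctness of positive predictions

def recall(tp, fn):
    return tp / (tp + fn)          # completeness of detection

def f1(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)     # harmonic mean balances p and r

# hypothetical spam filter: 80 spam caught, 20 ham flagged, 10 spam missed
tp, fp, fn = 80, 20, 10
print(precision(tp, fp))           # 80/100 = 0.8
print(recall(tp, fn))              # 80/90 ≈ 0.889
print(round(f1(tp, fp, fn), 3))    # 0.842
```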
Deployment & Monitoring
Model deployed in real system
Continuous monitoring required
Retraining if performance degrades
Underfitting
Model too simple
High bias
Poor learning
Overfitting
Model too complex
Learns noise + details
High variance
Bias–Variance Tradeoff
Bias → error from simplicity
Variance → sensitivity to data changes
Goal:
low bias + low variance
Regularization
prevents overfitting
penalizes complexity
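"Penalizes complexity" can be shown with an L2 (ridge) penalty added to the loss; this is one common form of regularization, sketched on invented data:

```python
# Regularization sketch: L2 (ridge) penalty added to the MSE loss,
# so that large weights cost extra even when they fit the data.

def ridge_loss(w, b, xs, ys, lam):
    n = len(xs)
    mse = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n
    penalty = lam * w ** 2         # complexity penalty on the weight
    return mse + penalty

xs, ys = [0, 1, 2], [0, 2, 4]      # perfectly fit by w = 2, b = 0
# with lam = 0 the perfect fit is free; with lam > 0 it costs 0.1 * 2^2
print(ridge_loss(2.0, 0.0, xs, ys, lam=0.0))   # 0.0
print(ridge_loss(2.0, 0.0, xs, ys, lam=0.1))   # 0.0 + 0.1 * 4 = 0.4
```

Minimizing the penalized loss trades a little training fit for smaller, simpler parameters, which tends to generalize better.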
Irreducible Error
noise in data
cannot be eliminated
Responsible Machine Learning
biased data → biased models
black-box problem in complex models
vulnerable to noisy/adversarial inputs
high energy / computational cost
Natural Language Processing (NLP) – Overview
Field of AI focused on:
understanding human language
generating human-like language
enabling communication between humans and machines
Applications:
chatbots
summarization
translation
search engines
information extraction
Why NLP is Difficult
Human language is:
ambiguous
context-dependent
highly variable
Requires understanding at multiple levels:
structure
meaning
context
Cannot be processed directly → must be encoded into machine-readable form
Levels of Language Understanding
Morphology
Syntax
Semantics
Pragmatics
Discourse Analysis
Morphology
Structure of words
Breaks words into morphemes (prefixes, suffixes)
Links word forms:
train / trainer / training
Syntax
Sentence structure and grammar
Identifies:
subject
verb
object
Uses parsing rules
Semantics
Meaning of words and sentences
Resolves ambiguity based on context
“agent” = person or AI system
Pragmatics
Meaning depends on context & intention
Uses world knowledge
“I have an early flight” → suggests need for alarm/ride
Discourse Analysis
Meaning across sentences / conversations
Tracks:
references (“it”, “they”)
coherence over time
Important for chatbots, summaries
Text Preprocessing
Tokenization
Normalization
Tokenization
Splits text into tokens:
words, subwords, punctuation
Handles languages with no clear word boundaries (e.g. Chinese, Japanese)
Normalization
Standardizes text:
lowercasing
removing punctuation
expanding contractions
Stemming:
rule-based shortening
fast but imprecise
Lemmatization:
dictionary + grammar-based
more accurate
returns correct base form
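The trade-off can be demonstrated with a deliberately crude suffix-stripping "stemmer" against a tiny dictionary "lemmatizer" (both invented toys, far simpler than real tools like Porter stemming or WordNet lemmatization):

```python
# Stemming vs lemmatization sketch: rule-based shortening is fast but
# imprecise; a dictionary lookup returns the correct base form.

def naive_stem(word):
    for suffix in ("ing", "er", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

LEMMAS = {"trainer": "train", "training": "train", "better": "good"}

def lemmatize(word):
    return LEMMAS.get(word, word)

print(naive_stem("training"))   # "train" – the rule happens to work
print(naive_stem("better"))     # "bett"  – rule-based mistake
print(lemmatize("better"))      # "good"  – dictionary gets it right
```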
Part-of-Speech (POS) Tagging
Assigns grammatical role:
noun, verb, adjective
Uses context to resolve ambiguity
Modern approaches:
machine learning + statistical models
Parsing
Analyzes sentence structure
Determines relationships between words
Constituency parsing
phrase structure trees (NP, VP)
Dependency parsing
word-to-word relationships
identifies “who does what to whom”
Text Representation
Machines cannot process raw text
Convert text → numerical vectors
Bag of Words (BoW)
Counts word occurrences
Ignores:
order
grammar
Pros:
simple, fast
Cons:
sparse, no semantics
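A minimal bag-of-words sketch (toy documents invented for illustration), showing how order and grammar disappear into plain counts:

```python
# Bag-of-words: each document becomes a vector of word counts over a
# shared vocabulary; word order and grammar are discarded.
from collections import Counter

def bow_vector(doc, vocab):
    counts = Counter(doc.lower().split())
    return [counts[w] for w in vocab]

docs = ["the cat sat", "the cat saw the dog"]
vocab = sorted(set(" ".join(docs).lower().split()))
print(vocab)                               # shared vocabulary
print([bow_vector(d, vocab) for d in docs])
```

Note the sparsity problem in miniature: each vector has a slot for every vocabulary word, most of which are zero in any one document.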
TF-IDF
Measures word importance
Combines:
term frequency (local importance)
inverse document frequency (global rarity)
Highlights meaningful words
Reduces impact of common words (“the”, “and”)
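The two factors combine as tf × idf. A worked sketch on invented documents (one common variant of the formula; real libraries apply smoothing):

```python
# TF-IDF sketch: term frequency (local importance) times inverse
# document frequency (global rarity). Words in every document score 0.
import math

def tf_idf(term, doc, docs):
    words = doc.split()
    tf = words.count(term) / len(words)
    df = sum(term in d.split() for d in docs)
    idf = math.log(len(docs) / df)
    return tf * idf

docs = ["the cat sat", "the dog ran", "the cat ran"]
print(tf_idf("the", docs[0], docs))   # idf = log(3/3) = 0 → score 0
print(tf_idf("sat", docs[0], docs))   # rare word → positive score
```

"the" appears in all three documents, so its idf (and score) is exactly zero; "sat" appears in only one, so it is highlighted as meaningful.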
Vector Semantics
Distributional Hypothesis
“Words are defined by their context”
Similar contexts → similar meanings
Word Embeddings
Words → dense vectors
Capture semantic similarity
Improve over BoW/TF-IDF
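Semantic similarity between dense vectors is usually measured with cosine similarity. A sketch with hand-made 3-d vectors (invented for illustration; real embeddings have hundreds of dimensions and are learned, not written by hand):

```python
# Cosine similarity between toy "embedding" vectors: related words
# point in similar directions, unrelated words do not.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

vec = {                      # invented 3-d vectors for illustration
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}
print(cosine(vec["cat"], vec["dog"]))  # close to 1 (similar)
print(cosine(vec["cat"], vec["car"]))  # much lower (dissimilar)
```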
Word Embedding Models
Word2Vec
GloVe
fastText
Word2Vec
Predict-based model
Learns meaning via context prediction
Skip-gram:
predicts context from word
CBOW:
predicts word from context
Captures semantic relationships
Enables analogies (king - man + woman ≈ queen)
GloVe
Uses global co-occurrence statistics
Builds word relationships from entire corpus
Captures broader semantic structure
Complementary to Word2Vec
fastText
Uses subword information (character n-grams)
Example: “robotics” → rob, bot, tic
Advantages:
handles rare words
works with misspellings
supports new words (mitigates the OOV problem)
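The subword idea from the "robotics" example can be sketched directly (character trigrams only; fastText itself also uses other n-gram lengths and boundary markers):

```python
# fastText-style subword sketch: break a word into character n-grams,
# so unseen or misspelled words still share subwords with known ones.

def char_ngrams(word, n=3):
    return [word[i : i + n] for i in range(len(word) - n + 1)]

grams = char_ngrams("robotics")
print(grams)  # includes 'rob', 'bot', 'tic' from the notes' example
```

A rare or misspelled word like "robotcs" still shares most of these trigrams with "robotics", which is why subword models handle it gracefully.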
Problem with static embeddings
Same word → same vector
No context sensitivity
BERT
Transformer-based model
Bidirectional context understanding
masked language modeling
next sentence prediction
word meaning depends on context
Large Language Models (LLMs)
Extremely large Transformer models
Trained on massive datasets
Use autoregressive prediction
Capabilities:
text generation
reasoning
zero-shot learning
few-shot learning
Limitations:
bias
hallucinations
high computational cost
NLP Pipeline
Preprocessing
Feature Extraction
Classification
Output Generation
Evaluation
Define task (e.g., spam detection)
Preprocessing:
tokenization
cleaning text
Feature Extraction:
word embeddings
hybrid features
Classification:
models:
logistic regression
random forest
neural networks
Output Generation:
prediction:
spam / not spam
confidence scores
Evaluation:
precision
recall
accuracy
F1-score
Continuous Improvement
user feedback loop
retraining with new data
adapts to changing language patterns
Computer Vision – Core Idea
Field of AI that enables machines to:
interpret visual data (images, videos)
extract semantic meaning from pixels
transform raw visual input → structured understanding
Key applications:
autonomous driving
face recognition (smartphones)
warehouse robotics
medical imaging
What Computer Vision Does
Input: pixel data (images/videos)
Output:
objects
positions
relationships
actions/events
Key transformation:
low-level pixels → high-level meaning
CV vs Image Processing
Image processing:
improves image quality (no interpretation)
Computer vision:
interprets content and meaning
Human vs Machine Vision
Humans:
robust perception
context-aware
good with occlusion & ambiguity
Machines:
require large labeled datasets
sensitive to noise and variation
limited generalization
Core Computer Vision Tasks
image classification
object localization
object detection
image segmentation
Image Classification
Assigns one label to whole image
No object localization
Examples:
disease detection in medical scans
crop monitoring
product tagging
Object Localization
Detects:
object + bounding box
class + (x, y, w, h)
Use cases:
robotics
warehouse automation
Object Detection
Detects multiple objects per image
Outputs:
multiple labels + bounding boxes
self-driving cars
surveillance
industrial inspection
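Detections are typically compared against ground truth with intersection-over-union (IoU) on their bounding boxes. A sketch using the (x, y, w, h) box format from the localization notes (the boxes are invented):

```python
# IoU between two boxes given as (x, y, w, h):
# 1.0 = perfect overlap, 0.0 = no overlap.

def iou(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # overlap extent along each axis (clamped at zero)
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union

print(iou((0, 0, 4, 4), (0, 0, 4, 4)))  # 1.0
print(iou((0, 0, 4, 4), (2, 2, 4, 4)))  # 4 / 28 ≈ 0.143
print(iou((0, 0, 2, 2), (5, 5, 2, 2)))  # 0.0
```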
Image Segmentation
Pixel-level classification
Produces:
detailed scene map
crowd tracking
robotics manipulation
Key challenges of Computer Vision
lighting, viewpoint, distance variation
occlusion (objects hidden)
deformation (changing shapes)
noise (blur, low quality, compression)
dataset bias and domain shift
expensive labeled data (especially pixel-level)
Vision pipeline
Image Acquisition & Preprocessing
Learning Models (CNNs)
Problem Definition (Computer Vision)
define:
task (classification, detection, etc.)
input/output format
sources:
cameras, satellites, medical devices
steps:
resizing
normalization (0–255 → scaled values)
noise reduction
batch processing
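The normalization step (0–255 → scaled values) in one line per pixel (scaling to [0, 1]; other schemes, such as mean/std standardization, are also common):

```python
# Preprocessing sketch: scale raw 0–255 pixel values into [0, 1].

def normalize(pixels):
    return [[p / 255 for p in row] for row in pixels]

image = [[0, 128, 255],     # a tiny invented 2x3 grayscale "image"
         [64, 192, 32]]
print(normalize(image))     # 255 -> 1.0, 0 -> 0.0
```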
Convolution
goal:
convert pixels → meaningful patterns
detects:
edges, textures, shapes
produces:
feature maps
Pooling
reduces spatial size
types:
max pooling (strongest signal)
average pooling (smoothed representation)
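Both pooling types on the same invented 4×4 feature map (pure Python, stride equal to window size):

```python
# Pooling sketch: 2x2 max pooling vs average pooling on a 4x4 map.

def pool(fmap, size=2, op=max):
    out = []
    for i in range(0, len(fmap), size):
        row = []
        for j in range(0, len(fmap[0]), size):
            window = [fmap[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(op(window))          # strongest signal or mean
        out.append(row)
    return out

def mean(window):
    return sum(window) / len(window)

fmap = [[1, 3, 2, 0],
        [5, 6, 1, 2],
        [0, 2, 4, 4],
        [1, 1, 8, 0]]
print(pool(fmap, op=max))   # [[6, 2], [2, 8]]
print(pool(fmap, op=mean))  # [[3.75, 1.25], [1.0, 4.0]]
```

Either way the 4×4 map shrinks to 2×2: max pooling keeps the strongest activation per window, average pooling a smoothed value.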
CNNs (Convolutional Neural Networks)
exploit spatial structure of images
layers:
convolution → pooling → fully connected
build:
hierarchical features (edges → objects)
Key CNN Architectures
LeNet
early CNN
digit recognition
alternating conv + pooling layers
GoogLeNet (Inception)
inception modules (multi-scale features)
efficient deep architecture
global average pooling
ResNet
uses skip connections
solves vanishing gradient problem
enables very deep networks
Training Vision Models
training loop:
forward pass → loss → backpropagation → update
loss function:
cross-entropy (classification)
data augmentation:
rotation, flipping, brightness changes
regularization:
dropout
early stopping
weight decay
batch normalization
Deployment Challenges
high computational cost (GPU/TPU needed)
real-time inference requirements
adversarial attacks (manipulated inputs)
need for robustness in real environments
Transfer Learning
use pretrained models (e.g., ImageNet)
fine-tune for new tasks
reduces:
training time
data requirements
Generative Vision Models
move beyond recognition → generation
can create:
images
videos
synthetic data
applications:
entertainment (animation, VFX)
gaming & VR
medicine (simulation)
data augmentation
Multimodal AI
combines:
language + vision
example:
text → image generation (e.g., DALL·E-style systems)
Ethical Challenges
deepfakes & misinformation
bias in generated outputs
lack of control over content quality
need for regulation and transparency