What is a recommender system?
• recommender systems help match users with items
Long form:
Recommender systems are software agents that elicit the interests and preferences of individual consumers […] and make recommendations accordingly. They have the potential to support
and improve the quality of the decisions consumers make while searching for and selecting products online.
Why use Recommender Systems (User perspective)?
explosion of choice - information overload
Tools to guide through decision process
increased number of options
exploratory search
User: finds new things
narrow down set of choices
help explore space of options
entertainment
Why use Recommender Systems (system owner perspective)?
Personalized experience
promotion
persuasion
knowledge
Increase trust
increase loyalty
Increase sales
increase click-through rates
increase conversion etc.
Why use Recommender Systems (content providers)?
• Targeted exposure
• Create brand awareness
• Increase trust and loyalty
• Increase sales, click-through rates, conversion etc.
What is the long tail in popularity distributions?
Popularity distributions have a long tail: a few items are very popular, while most items are rarely consumed.
• Pareto principle (80/20 rule): 80% of plays come from 20% of items
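One way to check the 80/20 rule on a dataset is to sort the items by play count and measure the share of plays captured by the top 20%. This is a minimal sketch: the function name `top_share` and the play counts are made up for illustration.

```python
def top_share(play_counts, fraction=0.2):
    """Share of all plays that comes from the most popular `fraction` of items."""
    ranked = sorted(play_counts, reverse=True)
    # Use at least one item so that tiny catalogs still give an answer.
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical play counts for 10 items: two hits and a long tail.
plays = [500, 300, 40, 35, 30, 25, 25, 20, 15, 10]
print(top_share(plays))  # 0.8
```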
Types of recommender systems
demographic-based
collaborative filtering
show me what people with similar taste like
content-based
knowledge-based
hybrid recommenders
composition of various recommenders
Is content information about the items necessary for CF?
No. Unlike content-based recommenders, CF is domain agnostic: the ideas apply to other types of items.
What is the input data of CF?
ratings
Users give ratings to items
What are assumptions made for the ratings given by the user?
users’ behavior (i.e., ratings) is guided by their preferences
users’ preferences remain stable and consistent over time
thus, we can expect similar users to prefer similar items
Classification of CF
What is the difference between memory-based methods and model-based methods?
Give one example each.
Memory-based methods remember the entire history of user-item interactions
e.g. neighborhood methods
Model-based methods build a model that describes the history and make recommendations from that model.
e.g. matrix factorization
Memory-based -> explain item-item CF
Item-item CF recommends similar items based on the similarity between the ratings that different users have given to items. The method assumes that users who like a particular product may also like similar products.
Example:
If a user has rated a history book, for example, item-item CF will recommend similar history books. The model computes similarities between items and then makes a recommendation based on the user's ratings and those item similarities.
ChatGPT
Memory-based -> explain user-user CF
User-user memory-based CF: user-user CF recommends items based on the similarity between users' ratings. The model assumes that users who liked similar things in the past will probably like similar things in the future.
Example:
If a user has rated several sports articles, for example, user-user CF will recommend items liked by other users who rated similar sports articles. The model computes the similarity between users' ratings and then makes recommendations based on the user's ratings and the similarities between users.
If you predict a rating in user-user CF, what are you trying to predict for the target user?
The rating of the target user for a target item
What formula is used for non-personalized user-user CF?
Take the average rating -> score = sum of all ratings that exist for that item / number of users who rated it
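The average above can be sketched in a few lines. The data layout (a dict mapping each user to their item ratings) and the name `mean_item_rating` are assumptions made for this example, not part of the notes.

```python
def mean_item_rating(ratings, item):
    """Non-personalized score: average of all ratings given to `item`.

    `ratings` maps user -> {item: rating}; this layout is an assumption.
    """
    values = [r[item] for r in ratings.values() if item in r]
    if not values:
        return None  # nobody rated the item, so there is no score
    return sum(values) / len(values)

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"matrix": 5, "alien": 3},
    "bob": {"matrix": 4},
    "cat": {"matrix": 3, "alien": 4},
}
print(mean_item_rating(ratings, "matrix"))  # 4.0
```

Every user gets the same score for an item, which is why this variant is called non-personalized.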
Personalized user-user CF without Pearson correlation
We calculate the similarity-weighted average rating instead of the plain average:
score = sum over neighbors v of (similarity weight between users u and v * rating of v for that item) / sum over neighbors v of the similarity weights between users u and v
The goal is to predict the rating that a user u would give to an item i, based on the ratings of similar users. (ChatGPT)
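A minimal sketch of the weighted average, assuming the similarity weights are already known. The function name, the dict layout, and the numbers are hypothetical; the absolute value in the denominator is a common convention that only matters if weights can be negative.

```python
def weighted_score(similarities, item_ratings):
    """Similarity-weighted average of the neighbors' ratings for one item.

    similarities: neighbor -> w(u, v) with respect to the target user u
    item_ratings: neighbor -> rating r(v, i) of the target item i
    """
    num = sum(similarities[v] * r for v, r in item_ratings.items())
    den = sum(abs(similarities[v]) for v in item_ratings)
    return num / den if den else None

sims = {"bob": 0.5, "cat": 1.0}      # hypothetical similarity weights
item_ratings = {"bob": 2, "cat": 5}  # hypothetical ratings of the item
print(weighted_score(sims, item_ratings))  # 4.0
```

The plain average of these two ratings would be 3.5; the result moves to 4.0 because the more similar user ("cat") counts twice as much.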
What are rating biases?
Users use the rating scale differently: for some a good item is 3/5, for others 5/5.
Central tendency bias: This occurs when users tend to rate items in the middle of the rating scale, rather than at the extremes. This can make it difficult to distinguish between items that are truly mediocre and those that are exceptional or terrible.
Leniency bias: This occurs when users tend to rate items higher than they actually deserve. This can be due to a desire to be positive or a lack of critical evaluation of the item.
Severity bias: This occurs when users tend to rate items lower than they actually deserve. This can be due to a desire to be critical or a negative impression of the item.
Item popularity bias: This occurs when users rate popular items more favorably than less popular items, regardless of their actual quality. This can lead to a "rich get richer" effect, where popular items become even more popular due to their high ratings.
User preference bias: This occurs when users rate items based on their personal preferences, rather than the quality of the item itself. This can lead to recommendations that reflect the user's existing preferences, rather than exposing them to new or diverse items.
(ChatGPT)
What can be done against rating biases?
Instead of using ratings -> use deviations of how far off ratings are from mean ratings
-> center around the mean
What is the changed formula if centering around the mean rating is used?
r(v) = source user (the neighbor whose ratings are used)
r(u) = target user
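The formula itself is missing from this card. The standard mean-centered form is given below as a reconstruction, with assumed notation: s(u, i) is the predicted score of target user u for item i, r̄ is a user's mean rating, w(u, v) is the similarity weight, and N is the set of neighbors.

```latex
s(u, i) = \bar{r}_u + \frac{\sum_{v \in N} w(u, v)\,\bigl(r_{v,i} - \bar{r}_v\bigr)}{\sum_{v \in N} \lvert w(u, v) \rvert}
```

Each neighbor contributes how far their rating lies above or below their own mean, and the weighted deviation is added to the target user's mean.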
Can there be a predicted rating of 6 if the scale goes from 1 to 5?
Yes. If my mean rating is 4/5 and all neighbors rate the item 2 above their own average, my predicted rating is 6.
-> in practice, clip the predicted rating to the scale as a final step
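The case above can be reproduced with a small sketch of a mean-centered prediction that is clipped to the scale at the end. The function name, the tuple layout, and the numbers are assumptions chosen to match the example.

```python
def predict_mean_centered(target_mean, neighbors, lo=1.0, hi=5.0):
    """Mean-centered prediction, clipped to the rating scale as a final step.

    neighbors: list of (weight, rating, neighbor_mean) tuples.
    Returns (raw prediction, clipped prediction).
    """
    num = sum(w * (r - m) for w, r, m in neighbors)
    den = sum(abs(w) for w, _, _ in neighbors)
    raw = target_mean + (num / den if den else 0.0)
    return raw, min(hi, max(lo, raw))

# Target user averages 4; both neighbors rate the item 2 above their own mean.
raw, clipped = predict_mean_centered(4.0, [(1.0, 5, 3.0), (0.5, 4, 2.0)])
print(raw, clipped)  # 6.0 5.0
```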
What is the gold standard for computing user similarities?
Pearson correlation (PCC)
with some small adaptation
similar to the mean-centered cosine similarity
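Formula for Pearson correlation
We want the similarity of the users u and v. The formula is:
u = target user
w(u,v) = sum, over the items rated by both users, of the products of the two users' deviations from their mean ratings / product of the square roots of each user's summed squared deviations; the result is normalized to [-1, 1]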
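Written out, the description above corresponds to the usual Pearson form. The notation is assumed: I_uv is the set of items rated by both users and r̄ is a user's mean rating. Which ratings enter the mean (all of a user's ratings or only the co-rated ones) depends on the variant used.

```latex
w(u, v) = \frac{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}
               {\sqrt{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)^2}\;\sqrt{\sum_{i \in I_{uv}} (r_{v,i} - \bar{r}_v)^2}}
```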
What is the difference between the normal Pearson correlation coefficient and the adjusted Pearson correlation?
Normal Pearson correlation: the means (x̄ and ȳ) are calculated over all items each user has rated, regardless of whether both users have rated them. This means that the means can be affected by differences in the rating scales or rating behavior between users.
Adjusted Pearson correlation: the means (x̄ and ȳ) are calculated only over the items that have been rated by both users. This helps to eliminate the influence of differences in rating scales or rating behavior between users, as only the common items are used in the calculation.
By using the adjusted Pearson correlation coefficient, the similarity between users is based only on their rating patterns for the common items, which can improve the accuracy and reliability of the similarity metric in Collaborative Filtering.
(ChatGPT)
What formula is used for the cosine similarity? Explain the similarity in simple words.
it measures how closely aligned the two vectors representing the rating patterns are.
u and v are the vectors representing the rating patterns of two users (or items)
u ⋅ v is the dot product of the two vectors
||u|| and ||v|| are the magnitudes (or lengths) of the two vectors
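Put together, these pieces give cos(u, v) = (u ⋅ v) / (||u|| ||v||). A minimal sketch using plain Python lists follows; returning 0.0 for an all-zero vector is a convention chosen here, not something stated in the notes.

```python
import math

def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (||u|| * ||v||) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)

print(cosine_similarity([1, 0], [0, 1]))  # 0.0 (orthogonal rating patterns)
print(cosine_similarity([3, 4], [6, 8]))  # 1.0 (same direction)
```

The second pair shows that only the direction matters: one vector is twice the other, yet the similarity is still 1.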
Final user-user CF formula with user neighborhood and Pearson correlation
and how is the weight / similarity computed?
What does the neighborhood N contain?
neighborhood contains users with the highest similarity to the target user
in practice neighborhood contains only users that have rated the target item
Which users should go into the neighborhood of the target?
application/domain dependent
best practice: neighborhood of the 25-100 most similar users
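The pieces can be combined into a compact sketch of user-user CF: Pearson weights, a neighborhood of the k most similar users who rated the target item, and a mean-centered prediction. All names and the toy data are hypothetical. Using each user's overall mean inside the Pearson weight is one possible choice and is marked as such in the code.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson(ru, rv):
    """Pearson weight over co-rated items.

    Assumption: deviations are taken from each user's overall mean rating.
    """
    common = set(ru) & set(rv)
    mu, mv = mean(list(ru.values())), mean(list(rv.values()))
    num = sum((ru[i] - mu) * (rv[i] - mv) for i in common)
    den = (math.sqrt(sum((ru[i] - mu) ** 2 for i in common))
           * math.sqrt(sum((rv[i] - mv) ** 2 for i in common)))
    return num / den if den else 0.0

def predict(ratings, user, item, k=25):
    """Mean-centered prediction from the k most similar users who rated `item`."""
    # Neighborhood: only users who rated the target item, ranked by similarity.
    candidates = [(pearson(ratings[user], r), v) for v, r in ratings.items()
                  if v != user and item in r]
    neighbors = sorted(candidates, reverse=True)[:k]
    mu = mean(list(ratings[user].values()))
    num = sum(w * (ratings[v][item] - mean(list(ratings[v].values())))
              for w, v in neighbors)
    den = sum(abs(w) for w, _ in neighbors)
    return mu + num / den if den else mu

# Hypothetical data: "dan" never rated item "c", so he cannot be a neighbor.
ratings = {
    "ann": {"a": 4, "b": 2},
    "bob": {"a": 3, "b": 1, "c": 5},
    "dan": {"a": 4, "b": 2},
}
print(predict(ratings, "ann", "c"))  # 5.0
```

Here "bob" is the only neighbor; he rates "c" two points above his mean of 3, so the prediction for "ann" is her mean of 3 plus 2.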
How high is the complexity of U-U CF?
Very high, that is why there are many optimisation techniques.
Name optimizations for U-U CF
Neighborhood creation
look only among users that have rated at least one of the items the target user has rated
(note: incomplete, something is still missing here)
Item recommendation
only items that have been rated by at least one neighbor -> non-rated items cannot be recommended
What is the sparsity of ratings?
Very high: there are many more users than items,
and few ratings compared to the product of items and users (one possible rating per user per product)
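Sparsity can be expressed as the fraction of user-item pairs that have no rating. This is a small sketch with made-up numbers; the function name is hypothetical.

```python
def sparsity(num_ratings, num_users, num_items):
    """Fraction of the user-item matrix that has no rating."""
    return 1 - num_ratings / (num_users * num_items)

# Hypothetical numbers: 1,000 users, 200 items, 10,000 ratings.
print(sparsity(10_000, 1_000, 200))  # 0.95
```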
What is the most expensive part of the computation in U-U CF?
pairwise user similarities -> cost = (number of users)^2
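To see the quadratic growth, count the distinct user pairs that need a similarity value. This sketch assumes each pair is computed once (the weight is symmetric), which gives n * (n - 1) / 2 pairs.

```python
def pairwise_comparisons(num_users):
    """Number of distinct user pairs: n * (n - 1) / 2, which grows as n^2."""
    return num_users * (num_users - 1) // 2

for n in (1_000, 10_000):
    print(n, pairwise_comparisons(n))
# 1000 499500
# 10000 49995000
```

Ten times as many users means roughly a hundred times as many similarity computations.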