Introduction & CF

Recommender Systems - TU Wien Business Informatics Flash Cards

by nils K.

What is a recommender system?

• recommender systems help match users with items

Long form:

Recommender systems are software agents that elicit the interests and preferences of individual consumers […] and make recommendations accordingly. They have the potential to support and improve the quality of the decisions consumers make while searching for and selecting products online.

Why use Recommender Systems (User perspective)?

Explosion of choice -> information overload
Tools to guide users through the decision process
Increased number of options
Exploratory search
User: finds new things
- narrow down the set of choices
- help explore the space of options
- entertainment

Why use Recommender Systems (system owner perspective)?

Personalized experience

promotion

persuasion

knowledge

Increase trust

increase loyalty

Increase sales

increase click-through rates

increase conversion etc.

Why use Recommender Systems (content providers)?

• Targeted exposure

• Create brand awareness

• Increase trust and loyalty

• Increase sales, click-through rates, conversion etc.

What is the long tail in popularity distributions?

Popularity distributions have a long tail: a few items are very popular, while most items get little attention.

• Pareto principle (80/20 rule): 80% of plays come from 20% of items
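
A minimal sketch of how the 80/20 claim could be checked on play counts. The function name `top_share` and the play counts are made up for illustration; they are not from the lecture.

```python
def top_share(play_counts, top_fraction=0.2):
    """Return the fraction of all plays that comes from the most popular items."""
    ranked = sorted(play_counts, reverse=True)
    # Number of items that make up the "head" of the distribution.
    top_n = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:top_n]) / sum(ranked)

# Hypothetical play counts for 10 items: two hits and a long tail.
plays = [500, 300, 40, 35, 30, 25, 25, 20, 15, 10]
print(top_share(plays))  # 0.8
```

Here the top 2 of 10 items (20%) collect 800 of 1000 plays (80%); the remaining 8 items form the long tail.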

Types of recommender systems

demographic-based

collaborative filtering
- show me what people with similar taste like
content-based
knowledge-based
hybrid recommenders
- composition of various recommenders

Is content information about the items necessary for CF?

No. Unlike content-based recommenders, CF is domain-agnostic: the same ideas apply to other types of items.

What is the input data of CF?

Ratings

Users give ratings to items

What are assumptions made for the ratings given by the user?

users’ behavior (i.e., ratings) is guided by their preferences
users’ preferences remain stable and consistent over time
thus, we can expect similar users to prefer similar items

Classification of CF

What is the difference between memory-based methods and model-based methods? Give one example of each.

Memory-based methods remember the entire history of user-item interactions.

e.g. neighborhood methods

Model-based methods build a model that describes the history and make recommendations from that model.

e.g. matrix factorization

Memory-based -> explain item-item CF

Item-item CF recommends similar items based on the similarity between the ratings that different users gave to those items. The method assumes that users who like a particular product may also like similar products.

Example:

If a user has rated a history book, for instance, item-item CF will recommend similar history books. The model computes similarities between the individual items and then makes a recommendation based on the user's ratings and the similarities between the items.

ChatGPT
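
A minimal sketch of item-item CF on toy data. The card does not name a similarity measure, so this sketch assumes adjusted cosine (ratings centered on each user's mean) and only uses positively similar items; the users, items, and function names are hypothetical.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}
ratings = {
    "anna": {"history_a": 5, "history_b": 4, "cooking": 1},
    "ben":  {"history_a": 4, "history_b": 5, "cooking": 2},
    "cara": {"history_a": 5, "cooking": 1},
}

def user_mean(user):
    vals = ratings[user].values()
    return sum(vals) / len(vals)

def item_similarity(i, j):
    """Adjusted cosine (an assumption): compare two items over the users who
    rated both, after subtracting each user's mean rating."""
    common = [u for u, r in ratings.items() if i in r and j in r]
    if not common:
        return 0.0
    di = [ratings[u][i] - user_mean(u) for u in common]
    dj = [ratings[u][j] - user_mean(u) for u in common]
    dot = sum(a * b for a, b in zip(di, dj))
    norm = sqrt(sum(a * a for a in di)) * sqrt(sum(b * b for b in dj))
    return dot / norm if norm else 0.0

def predict(user, target):
    """Similarity-weighted average of the user's own ratings, using only
    items that are positively similar to the target item."""
    num = den = 0.0
    for item, rating in ratings[user].items():
        w = item_similarity(target, item)
        if w > 0:
            num += w * rating
            den += w
    return num / den if den else None

print(predict("cara", "history_b"))
```

In this toy data `history_b` is positively similar to `history_a` and negatively similar to `cooking`, so the prediction for cara is driven by her rating of `history_a`. Note that the similarity comes from rating patterns only, not from the genre labels in the item names.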

Memory-based -> explain user-user CF

User-user memory-based CF: user-user CF recommends items based on the similarity between users' ratings. The model assumes that users who liked similar things in the past will probably also like similar things in the future.

Example

If a user has rated several sports articles, for instance, user-user CF looks for other users who rated similar sports articles and recommends items that those users liked. The model computes the similarity between users' ratings and then makes recommendations based on the user's ratings and the similarities between the users.

ChatGPT

If you predict a rating in user-user CF, what are you trying to predict for the target user?

The rating the target user would give to a target item

What formula is used for non-personalized user-user CF?

Take the average rating -> score = sum of all ratings that exist for that item / number of users who rated it

Personalized user-user CF without Pearson correlation

We calculate the weighted average rating instead of the plain average.

score = sum over users v of (similarity weight w(u,v) * rating of v for that item) / sum over users v of the similarity weights w(u,v)

It is used in collaborative filtering to predict the rating that a user u would give to an item i, based on the ratings of similar users. (ChatGPT)
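
A small sketch contrasting the plain average with the similarity-weighted average. The ratings, the weights, and the function names are hypothetical. The denominator uses absolute weights, an assumption that only matters if weights can be negative.

```python
# Hypothetical ratings of one item by three other users, plus assumed
# similarity weights between the target user u and each of them.
item_ratings = {"v1": 5, "v2": 3, "v3": 1}
weights = {"v1": 0.9, "v2": 0.5, "v3": 0.1}

def average_score(item_ratings):
    """Non-personalized: every rating counts equally."""
    return sum(item_ratings.values()) / len(item_ratings)

def weighted_score(item_ratings, weights):
    """Personalized: ratings of more similar users count more."""
    num = sum(weights[v] * r for v, r in item_ratings.items())
    den = sum(abs(weights[v]) for v in item_ratings)
    return num / den

print(average_score(item_ratings))                      # 3.0
print(round(weighted_score(item_ratings, weights), 2))  # 4.07
```

The weighted score moves toward 5 because the most similar user (`v1`) gave the highest rating.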

What are rating biases?

For some users a good item is a 3/5, for others a 5/5

Central tendency bias: This occurs when users tend to rate items in the middle of the rating scale, rather than at the extremes. This can make it difficult to distinguish between items that are truly mediocre and those that are exceptional or terrible.
Leniency bias: This occurs when users tend to rate items higher than they actually deserve. This can be due to a desire to be positive or a lack of critical evaluation of the item.
Severity bias: This occurs when users tend to rate items lower than they actually deserve. This can be due to a desire to be critical or a negative impression of the item.
Item popularity bias: This occurs when users rate popular items more favorably than less popular items, regardless of their actual quality. This can lead to a "rich get richer" effect, where popular items become even more popular due to their high ratings.
User preference bias: This occurs when users rate items based on their personal preferences, rather than the quality of the item itself. This can lead to recommendations that reflect the user's existing preferences, rather than exposing them to new or diverse items.
(ChatGPT)

What can be done against rating biases?

Instead of using raw ratings -> use deviations, i.e. how far each rating is from that user's mean rating

-> center around the mean

What is the changed formula when ratings are centered around the mean?

r(v) = ratings of the source user v

r(u) = ratings of the target user u
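
The formula itself is missing from this card. The following is the standard mean-centered form, reconstructed from the surrounding cards rather than copied from the slide. The notation is an assumption: s(u,i) is the predicted score, r_{v,i} is the rating of source user v for item i, \bar{r}_u and \bar{r}_v are the users' mean ratings, w(u,v) is the similarity weight, and U_i is the set of other users who rated item i.

```latex
s(u, i) = \bar{r}_u
        + \frac{\sum_{v \in U_i} w(u,v)\,\bigl(r_{v,i} - \bar{r}_v\bigr)}
               {\sum_{v \in U_i} \lvert w(u,v) \rvert}
```

The target user's own mean is added back at the end, so the weighted average of the neighbors' deviations is expressed on the target user's personal scale.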

Can there be a predicted rating of 6 if the scale goes from 1 to 5?

Yes. If my mean rating is 4/5 and all similar users rate the item +2 above their averages, my predicted rating is 6.

-> in practice, restrict (clip) the predicted rating to the scale as a final step
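
A minimal sketch of that final clipping step. The helper name `clip_rating` and the default 1 to 5 scale are assumptions for illustration.

```python
def clip_rating(predicted, low=1.0, high=5.0):
    """Restrict a predicted rating to the valid rating scale."""
    return max(low, min(high, predicted))

# The card's example: mean rating 4, neighbors rate the item +2 above their means.
raw = 4 + 2
print(raw)               # 6
print(clip_rating(raw))  # 5.0
```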

What is the gold standard for computing user similarities?

Pearson correlation (PCC)

with some small adaptation
similar to the mean-centered cosine similarity

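Formula for Pearson correlation

We want the similarity w(u,v) of the users u and v, where

u = target user

v = source user

w(u,v) = (sum, over the items rated by both users, of the product of the two users' deviations from their mean ratings) / (product of the two users' root sums of squared deviations over the same items), which normalizes the result to [-1,1]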
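
Written out, a reconstruction of the Pearson weight from the description on this card. The notation is an assumption: I_u and I_v are the sets of items rated by u and v, r_{u,i} is the rating of user u for item i, and \bar{r}_u is the mean rating of user u.

```latex
w(u,v) =
  \frac{\sum_{i \in I_u \cap I_v} (r_{u,i} - \bar{r}_u)\,(r_{v,i} - \bar{r}_v)}
       {\sqrt{\sum_{i \in I_u \cap I_v} (r_{u,i} - \bar{r}_u)^2}\;
        \sqrt{\sum_{i \in I_u \cap I_v} (r_{v,i} - \bar{r}_v)^2}}
```

The numerator measures how much the two users agree in their deviations; the denominator rescales by each user's own spread, which keeps w(u,v) in [-1,1].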

What is the difference between the normal Pearson correlation coefficient and the adjusted Pearson correlation?

Normal Pearson correlation: the means (x̄ and ȳ) are calculated only over the items that both users have rated, i.e. over the paired observations.

Adjusted Pearson correlation, as commonly used in CF: each user's mean is calculated once over all items that user has rated, whether or not the other user rated them. The sums in the numerator and denominator still run only over the items both users rated.

Using each user's overall mean is cheaper to compute, because it does not depend on the pair of users being compared. With only a few common items, the two variants can give noticeably different similarities.

(ChatGPT)
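
A sketch that computes Pearson similarity with both ways of taking the means, so the effect can be seen on data. The function name, the `means_over` parameter, and the two users are hypothetical; returning 0.0 for very small overlap is an assumption.

```python
from math import sqrt

def pearson(ru, rv, means_over="common"):
    """Pearson similarity between two users' rating dicts (item -> rating).

    means_over="common": means over the co-rated items only.
    means_over="all":    each user's mean over all items that user rated.
    The sums always run over the co-rated items.
    """
    common = sorted(ru.keys() & rv.keys())
    if len(common) < 2:
        return 0.0  # assumption: too little overlap counts as "no similarity"
    if means_over == "common":
        mu = sum(ru[i] for i in common) / len(common)
        mv = sum(rv[i] for i in common) / len(common)
    else:
        mu = sum(ru.values()) / len(ru)
        mv = sum(rv.values()) / len(rv)
    du = [ru[i] - mu for i in common]
    dv = [rv[i] - mv for i in common]
    num = sum(a * b for a, b in zip(du, dv))
    den = sqrt(sum(a * a for a in du)) * sqrt(sum(b * b for b in dv))
    return num / den if den else 0.0

# Hypothetical users who share items a, b, c.
u = {"a": 5, "b": 3, "c": 4, "d": 1}
v = {"a": 4, "b": 2, "c": 3, "e": 5}
print(round(pearson(u, v, "common"), 3))  # 1.0
print(round(pearson(u, v, "all"), 3))     # 0.275
```

On the three shared items the two users move in lockstep, so the means over common items give a perfect correlation. With overall means, the extra ratings for `d` and `e` shift each user's mean and the similarity drops.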

What formula is used for the cosine similarity? Explain the similarity in simple words.

It measures how closely aligned the two vectors representing the rating patterns are.

u and v are the vectors representing the rating patterns of two users (or items)
u ⋅ v is the dot product of the two vectors
||u|| and ||v|| are the magnitudes (or lengths) of the two vectors
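
Put together, the three parts listed above give cos(u, v) = (u ⋅ v) / (||u|| ||v||). A minimal sketch in plain Python follows; the function name and the example vectors are made up, and returning 0.0 for an all-zero vector is an assumption.

```python
from math import sqrt

def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (||u|| * ||v||) for two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an all-zero vector has no direction
    return dot / (norm_u * norm_v)

print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # close to 1: same direction
print(cosine_similarity([1, 0], [0, 1]))        # 0.0: orthogonal
print(cosine_similarity([1, -1], [-1, 1]))      # close to -1: opposite direction
```

Only the direction of the vectors matters, not their length: `[2, 4, 6]` is just `[1, 2, 3]` scaled by 2, so the two count as maximally similar. Negative values only occur when the vectors contain negative entries, for example mean-centered ratings.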

Final user-user CF formula with user neighborhood and Pearson correlation

and how is the weight / similarity computed?
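
The answer to this card is missing. The sketch below shows one way the pieces from the previous cards fit together: a Pearson weight, a neighborhood of the k most similar users, mean-centering, and a final clip to the scale. The toy data, the function names, k=2, the use of each user's overall mean, and the restriction to positively correlated neighbors are all assumptions, not the lecture's definition.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}
ratings = {
    "u":  {"a": 5, "b": 3, "c": 4},
    "v1": {"a": 4, "b": 2, "c": 3, "x": 4},
    "v2": {"a": 5, "b": 3, "c": 5, "x": 5},
    "v3": {"a": 1, "b": 5, "c": 2, "x": 1},
}

def mean(user):
    vals = ratings[user].values()
    return sum(vals) / len(vals)

def pearson(u, v):
    """Weight w(u, v): Pearson correlation over co-rated items,
    using each user's overall mean rating (an assumption)."""
    common = sorted(ratings[u].keys() & ratings[v].keys())
    du = [ratings[u][i] - mean(u) for i in common]
    dv = [ratings[v][i] - mean(v) for i in common]
    num = sum(a * b for a, b in zip(du, dv))
    den = sqrt(sum(a * a for a in du)) * sqrt(sum(b * b for b in dv))
    return num / den if den else 0.0

def predict(u, item, k=2, low=1.0, high=5.0):
    """Mean-centered, similarity-weighted prediction over the k most similar
    users who rated the item, clipped to the rating scale."""
    candidates = [(pearson(u, v), v) for v in ratings
                  if v != u and item in ratings[v]]
    # Assumption: keep only positively correlated users as neighbors.
    neighbors = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    den = sum(abs(w) for w, _ in neighbors)
    if den == 0:
        return mean(u)  # no usable neighbors: fall back to the user's own mean
    num = sum(w * (ratings[v][item] - mean(v)) for w, v in neighbors)
    return max(low, min(high, mean(u) + num / den))

print(round(predict("u", "x"), 2))
```

Here `v1` and `v2` correlate positively with `u` and both rated `x` above their own means, so the prediction lands above the mean rating of `u`; `v3` correlates negatively and is left out of the neighborhood.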