class: center, middle, inverse, title-slide
# ABM-MAS Project
## Algorithmic Bias in Echo Chamber Formation
### 2020-12-21 | University of Turin
**Goal**: to assess the impact of the recommendation algorithm on Twitter echo chambers.

#### 1. [*Definitions*](#definitions)
#### 2. [*Data*](#data)
#### 3. [*Model*](#model)
#### 4. [*Exploratory Data Analysis*](#exploratory-data-analysis)
#### 5. [*Future Developments*](#future)

--- name: definitions class: center ## Definitions

### Recommendation algorithm
A parametric feed-ranking system which is said to be
* *free* if it is completely determined by the follower graph topology and the users' activity chronology;
* *biased* otherwise.
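
The distinction above can be illustrated with a minimal Python sketch. It is not the project's implementation: the `Tweet` fields, the function names `free_feed` and `biased_feed`, and the opinion-similarity term used to introduce bias are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    user_id: int
    step: int        # time step at which the tweet was authored
    content: float   # opinion-like score carried by the tweet (assumed)


def free_feed(followees, tweets):
    """Free ranking: uses only the follower graph and chronology."""
    visible = [t for t in tweets if t.user_id in followees]
    return sorted(visible, key=lambda t: t.step, reverse=True)


def biased_feed(followees, tweets, reader_opinion, weight=1.0):
    """Biased ranking: recency plus a content-based term (hypothetical)."""
    visible = [t for t in tweets if t.user_id in followees]

    def score(t):
        # Content close to the reader's opinion is promoted.
        return t.step - weight * abs(t.content - reader_opinion)

    return sorted(visible, key=score, reverse=True)


tweets = [
    Tweet(1, user_id=10, step=1, content=0.9),
    Tweet(2, user_id=11, step=2, content=-0.8),
    Tweet(3, user_id=12, step=3, content=0.1),  # author is not followed
]
followees = {10, 11}

print([t.tweet_id for t in free_feed(followees, tweets)])
print([t.tweet_id for t in biased_feed(followees, tweets, reader_opinion=1.0, weight=5.0)])
```

In this toy case the free feed is purely reverse-chronological, while the biased feed moves the older but like-minded tweet to the top.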

### Echo Chamber
Consider a metagraph `\(G\)`, a measure of node-attribute homogeneity `\(\xi:2^G \to \mathbb{R}\)`, and a clustering algorithm `\(\mathcal{C}\)`.

Given a threshold `\(p \in \mathbb{R}\)`, we define a `\((\mathcal{C}, p)\)`-*echo chamber* as a subgraph `\(H \subseteq G\)` such that
1. `\(H\)` is a `\(\mathcal{C}\)`-cluster.
2. `\(\xi(H) \geq p\)`.

--- name: data class: center
# Data

### 2012 collection
1. **Tweet mining**: all relevant tweet objects were extracted from the 2012 data.
2. **Hashtag-based user mining**: U.S. election-related hashtags were searched, and the users who adopted them were selected.
3. **Subscription-based user mining**: the set `\(X\)` of users who survived from 2012 to 2020 was retained (`\(\sim 100,000\)` users).

### 2020 collection
1. **Interaction network scraping**: a timeline-aggregated interaction network was extracted for every user `\(x \in X\)` in order to select the subset `\(Y \subset X\)` of users who are still active and sufficiently connected (`\(\sim 10,000\)` users).
2. **User activity monitoring**: a detailed temporal activity monitoring pipeline was launched for every user `\(y \in Y\)`.

--- name: exploratory-data-analysis class: center
# Exploratory Data Analysis

--- name: tweet-dynamics class: center
## Tweet Dynamics: *X vs. Y*
.pull-left[
### Year     
]
.pull-right[
### Election Period     
]

--- name: retweet-dynamics class: center
## Retweet Dynamics: *X vs. Y*
.pull-left[
### Year     
]
.pull-right[
### Election Period     
]

--- name: hashtag-dynamics class: center
## Hashtag Dynamics: *X vs. Y*
.pull-left[
### Year     
]
.pull-right[
### Election Period     
]

--- name: activity-ensemblesx class: center
## Activity Ensembles: *X*
.pull-left[
### Year  
]
.pull-right[
### Election Period  
]

--- name: activity-ensemblesy class: center
## Activity Ensembles: *Y*
.pull-left[
### Year  
]
.pull-right[
### Election Period  
]

--- name: favorite-activity class: center
## Favorite Activity: *X vs. Y*
.pull-left[
### Year   
]
.pull-right[
### Election Period   
]

--- name: hashtag-activity class: center
## Hashtag Activity: *X vs. Y*
.pull-left[
### Year   
]
.pull-right[
### Election Period   
]

--- name: friend-activity class: center
## Friend Activity: *X vs. Y*
.pull-left[
### Year   
]
.pull-right[
### Election Period   
]

--- name: mention-activity class: center
## Mention Activity: *X vs. Y*
.pull-left[
### Year   
]
.pull-right[
### Election Period   
]

--- name: hashtag-frequency class: center
## Hashtag Frequency: *X vs. Y*
.pull-left[
### Year   
]
.pull-right[
### Election Period   
]

--- name: hashtag-heterogeneity class: center
## Hashtag Heterogeneity: *X vs. Y*
.pull-left[
### Year   
]
.pull-right[
### Election Period   
]

--- name: interacting-user-heterogeneity class: center
## Interaction Heterogeneity: *X vs. Y*
.pull-left[
### Year   
]
.pull-right[
### Election Period   
]

--- name: activity-correlogram-X class: center
## Activity Correlations: *X*
.pull-left[
### Year  
]
.pull-right[
### Election Period  
]

--- name: activity-correlation-matrix-X class: center
## Activity Correlations: *X*
.pull-left[
### Year  
]
.pull-right[
### Election Period  
]

--- name: activity-correlogram-Y class: center
## Activity Correlations: *Y*
.pull-left[
### Year  
]
.pull-right[
### Election Period  
]

--- name: activity-correlation-matrix-Y class: center
## Activity Correlations: *Y*
.pull-left[
### Year  
]
.pull-right[
### Election Period  
]

--- name: CCDF-Y-year class: center
## Year CCDF: *Y*

--- name: CCDF-Y-election class: center
## Election CCDF: *Y*  

--- name: timespan-users class: center
### Activity Timespan Distributions: *X vs. Y*
.pull-left[
### Year  
]
.pull-right[
### Election Period   
]

--- name: timespan-users-Y class: center
### Hashtag Timespan Distributions: *X vs. Y*
.pull-left[
### Year  
]
.pull-right[
### Election Period   
]

--- name: model-architecture class: center
## Model Architecture
.small[
#### Ambient Space
A **multiplex graph** `\((V,(E_{\text{follow}}, E_{\text{retweet}}, E_{\text{favorite}}, E_{\text{mention}}))\)`

.pull-left[
.left[
#### Tweet
* `id` `\(\in \mathbb{N}\)`
* `user_id` `\(\in \mathbb{N}\)`
* `retweet_status` `\(\in \{T,F\}\)`
* `content` `\(\in \mathbb{R}\)`
* ...

#### User
* `id` `\(\in \mathbb{N}\)`
* `opinion` `\(\in \mathbb{R}\)`
* `activity_rate` `\(\in \mathbb{R}\)`
* ...
]]

.pull-right[
.left[
#### Micro-dynamics
* `Tweet!`
* `Update!`: opinion dynamics
* `Follow!`: social dynamics
* `Unfollow!`: social dynamics
* ...

#### Macro-dynamics
* Centrality metrics
* Clustering metrics
* ...
]]
]

--- name: model-simple1 class: center
## Minimal Model
.small[
.pull-left[
.left[
#### Tweet Type
* `id` `\(\in V \subset \mathbb{N}\)`
* `user_id` `\(\in \mathbb{N}\)`
* `retweet_status` `\(\in \mathbb{B}\)`
* `retweeted_id` `\(\in \mathbb{N}\)`
* `retweeted_user_id` `\(\in \mathbb{N}\)`
* `content` `\(\in \mathbb{R}\)`
* `favorite_count` `\(\in \mathbb{N}\)`
* `retweet_count` `\(\in \mathbb{N}\)`
* `step` `\(\in \mathbb{N}\)`

#### User Type
* `id` `\(\in \mathbb{N}\)`
* `pos` `\(\in \mathbb{N}\)`
* `feed` `\(\subset \mathcal{T}\)`
* `tweets` `\(\subset \mathcal{T}\)`
* `favorites` `\(\subset \mathcal{T}\)`
* `opinion` `\(\in \mathbb{R}\)`
* `tweet_rate` `\(\in \mathbb{R}\)`
* `retweet_rate` `\(\in \mathbb{R}\)`
* `favorite_rate` `\(\in \mathbb{R}\)`
* `follow_rate` `\(\in \mathbb{R}\)`
* `unfollow_rate` `\(\in \mathbb{R}\)`
]]

.pull-right[
.left[
#### Data
* Degree distributions
* Activity distributions
  * Tweet
  * Favourite
  * Follow

#### Parameters
* Population size `\(N\)`
* Number of replicates `\(R\)`
* Number of time steps `\(T\)`
* Controversialness `\(\alpha\)`
* Follow threshold `\(f\)`
* Unfollow threshold `\(u\)`

#### Properties
* Population size `N`
* Time step `step`
* Tweet ID counter `tweet_id`
* Tweet objects `tweets`
* Controversialness `α`
* Follow threshold `follow_threshold`
* Unfollow threshold `unfollow_threshold`
]]
]

--- name: model-simple2 class: center
## Minimal Model
.small[
.pull-left[
.left[
#### Initialization
* `id`: sequentially initialized
* `pos`: sequentially initialized
* `feed`: initialized as an empty array
* `tweets`: initialized as an empty array
* `favorites`: initialized as an empty array
* `opinion`: drawn from `\(\mathcal{U}(-1,1)\)`
* `tweet_rate` and `retweet_rate`: sampled from 2012 activity data
* `favorite_rate`: sampled from 2012 activity data
* `follow_rate` and `unfollow_rate`: sampled from 2012 activity data
* `follower_graph`: friends (outneighbors) of each user (node) have been drawn from a `\(\Gamma(2,1)\)`

#### Calibration
Not discussed yet.

#### Validation
Not discussed yet.
]]

.pull-right[
.left[
#### Micro-Dynamics
* Opinion dynamics (`Update!`): governed by the equation in [Baumann et al. (2020)](https://www.nature.com/articles/s41562-020-0884-z)
* Follow dynamics (`Follow!`): uniform sampling from the complement of the user's set of friends according to the user's `follow_rate` and `follow_threshold`
* Unfollow dynamics (`Unfollow!`): uniform sampling from the set of friends according to the user's `unfollow_rate` and `unfollow_threshold`

#### Macro-Dynamics
* Time step `step` is incremented by one at each iteration
* Tweet ID counter `tweet_id` is incremented by one every time a tweet is authored (or retweeted)
* The tweet object collection `tweets` is extended with every tweet object authored (or retweeted)
]]
]
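
The micro-dynamics listed on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions, not the project's Julia code. The opinion update assumes an equation of the form `do/dt = -o + K * sum_j tanh(alpha * o_j)`, the form in which the cited model is usually written; the coupling constant `coupling` (K) and the step size `dt` do not appear on the slide. The slide does not say how the thresholds enter the follow and unfollow rules, so comparing them with the opinion distance is also an assumption.

```python
import math
import random


def update_opinion(opinion, feed_opinions, alpha, coupling=1.0, dt=0.01):
    """One explicit Euler step of do/dt = -o + K * sum_j tanh(alpha * o_j)."""
    influence = sum(math.tanh(alpha * o) for o in feed_opinions)
    return opinion + dt * (-opinion + coupling * influence)


def follow_step(user, friends, opinions, follow_rate, follow_threshold, rng):
    """Maybe follow one uniformly sampled non-friend with a close opinion."""
    candidates = [v for v in opinions if v != user and v not in friends[user]]
    if not candidates or rng.random() >= follow_rate:
        return None
    target = rng.choice(candidates)
    # Assumed role of the threshold: follow only if opinions are close enough.
    if abs(opinions[user] - opinions[target]) <= follow_threshold:
        friends[user].add(target)
        return target
    return None


def unfollow_step(user, friends, opinions, unfollow_rate, unfollow_threshold, rng):
    """Maybe unfollow one uniformly sampled friend with a distant opinion."""
    if not friends[user] or rng.random() >= unfollow_rate:
        return None
    target = rng.choice(sorted(friends[user]))
    # Assumed role of the threshold: unfollow only if opinions are far apart.
    if abs(opinions[user] - opinions[target]) >= unfollow_threshold:
        friends[user].discard(target)
        return target
    return None


rng = random.Random(0)
opinions = {0: 0.9, 1: 0.8, 2: -0.9}
friends = {0: {2}, 1: set(), 2: set()}

# With a rate of 1.0 the user always attempts the action.
follow_step(0, friends, opinions, 1.0, 0.5, rng)
unfollow_step(0, friends, opinions, 1.0, 1.0, rng)
print(friends[0])
print(update_opinion(0.9, [opinions[v] for v in friends[0]], alpha=3.0))
```

Keeping each rule in its own function mirrors the `Update!`, `Follow!` and `Unfollow!` split, so each assumption can be swapped out independently.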

--- name: model-simple3 class: center
## Minimal Model
.small[
.pull-left[
.left[
### Thoughts & Doubts
* How to define distinct time scales? `$$T_{F} \gg T_{RT} > T_{L}$$`
* Choose `nsteps` taking into account the constraints induced by the Monitor resolution.
* Highlight the latent nature of the opinion variable: users have no direct access to the *cognitive opinion* of others, only indirect access to their *behavioral opinion*, which leads to a non-zero probability of misinterpretation (this may be modeled with probability distributions centered on the true value).
* Think about how to implement possible polarization/segregation mitigation strategies (e.g. *homophily-induced heterophily*).
* How to reliably mine temporal behavioral opinion and reaction mechanisms? (E.g. net positive interaction rate: net count of tweets, retweets and likes of known polarity / total number of interactions.)
]]

.pull-right[
.left[
### To-Do List
* Overwrite all graph-related functions in `Agents.jl` to ensure compatibility with `MultiplexGraphs.jl` (e.g. `get_node_agents()`,...)
* Think about all elementary variables, parameters and timing distributions
* Explore the portfolio of possible recommendation systems (e.g. GNN, bipartite, centrality-based, simple weighted sum,...)
* Implement `Read!`, `Tweet!`, `Like!`, `RT!` behaviors
* Implement the simplest recommendation system (in `model_step!`)
* Implement non-trivial interpretation distribution `\(\rho_i\)`
* Think of the best parametrization that allows effective calibration
]]
]
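
Two of the ideas on this slide can be sketched in Python under explicit assumptions. The net positive interaction rate is read here as positive minus negative interactions of known polarity, divided by all interactions. The misinterpretation of opinions is modeled as a Gaussian centered on the true value. The function names, the Gaussian choice and the `sigma` parameter are hypothetical.

```python
import random


def net_positive_interaction_rate(n_positive, n_negative, n_total):
    """(positive - negative) interactions of known polarity over all interactions."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if n_positive + n_negative > n_total:
        raise ValueError("polarized interactions cannot exceed the total")
    return (n_positive - n_negative) / n_total


def perceived_opinion(true_opinion, sigma, rng):
    """Draw a behavioral opinion from a Gaussian centered on the true one."""
    return rng.gauss(true_opinion, sigma)


# 30 positive and 10 negative interactions out of 50 in total.
print(net_positive_interaction_rate(n_positive=30, n_negative=10, n_total=50))

rng = random.Random(42)
print(perceived_opinion(0.5, sigma=0.1, rng=rng))
```

Setting `sigma` to zero recovers the case in which users read each other's opinions without error.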

--- name: future class: center
# Future Developments

### Initialization, calibration & validation
* The time step is chosen so that it encompasses a statistically significant portion of the 2012 data. The distributions extracted from the first temporal slice are used to initialize the model.
* The remaining slices, except for the last one, are used to calibrate the model (e.g. parameters such as changes in activity rates).
* The last slice will be used for validation.
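
The slicing scheme above can be sketched in Python. This is a minimal illustration that assumes records are `(timestamp, payload)` pairs and that slices have a fixed width; the function names and the toy data are hypothetical.

```python
def split_slices(records, slice_width):
    """Group (timestamp, payload) records into consecutive slices of equal width."""
    if not records:
        return []
    ordered = sorted(records, key=lambda r: r[0])
    start = ordered[0][0]
    n_slices = int((ordered[-1][0] - start) // slice_width) + 1
    slices = [[] for _ in range(n_slices)]
    for timestamp, payload in ordered:
        index = int((timestamp - start) // slice_width)
        slices[index].append((timestamp, payload))
    return slices


def assign_roles(slices):
    """First slice initializes, last slice validates, the rest calibrate."""
    if len(slices) < 3:
        raise ValueError("need at least three slices")
    return {
        "initialization": slices[0],
        "calibration": slices[1:-1],
        "validation": slices[-1],
    }


# Toy records: (day, tweet count); real input would be the 2012 collection.
records = [(day, day % 7) for day in range(28)]
roles = assign_roles(split_slices(records, slice_width=7))
print(len(roles["initialization"]), len(roles["calibration"]), len(roles["validation"]))
```

With 28 days and a width of 7, this yields one initialization slice, two calibration slices and one validation slice.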

--- name: future class: center
# Future Developments

### Recommendation algorithm
* Train and validate a model that performs link prediction on the multiplex graph of the ABM from the monitor data.
* Fit the parametric algorithm to the dynamics that the link prediction model predicts on the multiplex graph.
* The predicted feeds of all users will let us draw a temporal directed **tweet network**, whose nodes are users and whose edges encode reading relationships. This network encodes the actual information flow on Twitter, which is where the concept of echo chambers makes sense.

.footnote[.left[.small[For further details, please read the report.]]]