Sentiment Metrics Specifications

Reference for the metrics and visualizations planned for the Pythia Social Metrics pilot. Metric descriptions include the expected behaviour and the formulas to be used. This document will be updated and further detailed as development progresses.

This document contains the specifications for the media statistics visualization. After a short library overview, the contents are divided into three sections: data filters, which describes the criteria available for selecting the documents to be analyzed; plotting, which describes the data visualization methods; and additional processing, which describes modifiers that can be applied to the graphs.

Library Overview

Import library:

import pythiastats as pst

Configure the plotting backend ("webapp" for plotly, "notebook" for matplotlib/seaborn):

pst.set_view(
        context="notebook",
        style="default",
        height=3.5,
        aspect=.8
        )

Filter object selecting entries that contain at least one of the argument strings (here, "Banca Nationala" OR "BNR"):

f1 = pst.filter_target(
        "Banca Nationala",
        "BNR"
        )

Filter object for selection of entries published within the specified interval:

from datetime import date

start = date(2020, 1, 1)
end = date(2021, 1, 1)
f2 = pst.filter_date(start, end)

Dataset object containing all entries that fulfil every criterion (i.e. f1 AND f2):

articles = pst.select_entries(f1, f2)

Weekly cumulative sentiment scatter plot:

p1 = pst.scatterplot(
        data=articles,
        kind="weekly cumulative sentiment"
        )

Add a running average with a 14-day interval over the scatter plot:

p2 = p1.running_average(lag=14)

Display the figure containing both plots:

g = pst.show(p1, p2)

Export the plot:

g.savefig("paper_demo.pdf")

Data Filters

Data filters are the criteria by which documents are selected for inclusion in the analysis. Multiple filters can be applied together to enable more granular selections. By default, filters of the same kind are not mutually exclusive.

filter_source Selection of documents published by a specific news outlet.

filter_target Selection of documents including a specified target. This is the standard approach used so far in the MVP.

filter_date Selection of documents published in the interval between two given dates.

filter_topic Selection of documents containing a specific topic. Applies only to sessions in which a topic model has already been trained on the relevant set of documents.
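The combination rules used in the library overview (alternatives inside one filter are joined with OR, separate filters are joined with AND) can be sketched in plain Python. This is an illustration only, not the pythiastats implementation: the `Entry` record and its fields, the helpers `target_filter`, `date_filter` and `select`, and the inclusive date bounds are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Entry:
    # Hypothetical record layout, assumed for this sketch only.
    source: str
    published: date
    text: str


Predicate = Callable[[Entry], bool]


def target_filter(*targets: str) -> Predicate:
    """Match entries whose text mentions ANY of the targets (OR)."""
    return lambda e: any(t in e.text for t in targets)


def date_filter(start: date, end: date) -> Predicate:
    """Match entries published in [start, end]; inclusive bounds are an assumption."""
    return lambda e: start <= e.published <= end


def select(entries: Iterable[Entry], *filters: Predicate) -> List[Entry]:
    """Keep only the entries that satisfy ALL filters (AND)."""
    return [e for e in entries if all(f(e) for f in filters)]


entries = [
    Entry("outlet-a", date(2020, 3, 1), "BNR keeps the policy rate unchanged"),
    Entry("outlet-b", date(2019, 6, 1), "Banca Nationala publishes its report"),
    Entry("outlet-a", date(2020, 5, 9), "Weather forecast for the weekend"),
]

selected = select(
    entries,
    target_filter("Banca Nationala", "BNR"),
    date_filter(date(2020, 1, 1), date(2021, 1, 1)),
)
```

Only the first entry is kept: the second mentions a target but falls outside the date interval, and the third mentions no target at all.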

Plotting

The statistics for a set of documents can be visualized using the following functions:

Entry sentiment Scatter plot where each point is the sentiment of an entry from the data set.

Weekly average sentiment Scatter plot where each point is the arithmetic mean of the sentiments of all data set entries in a week.

Weekly cumulative sentiment Scatter plot where each point is the sum of the sentiments of all data set entries in a week.

Weekly coverage volume Scatter plot where each point is the number of data set entries in a week.
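The three weekly metrics above differ only in the aggregation applied to the entries of each week. A minimal standard-library sketch follows; it assumes that a "week" means an ISO calendar week (Monday to Sunday), which this specification does not state, and the function name `weekly_metrics` and the `(date, sentiment)` pair layout are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def weekly_metrics(
    entries: Iterable[Tuple[date, float]],
) -> Dict[Tuple[int, int], Dict[str, float]]:
    """Group (publication date, sentiment) pairs by ISO week and aggregate."""
    buckets = defaultdict(list)
    for published, sentiment in entries:
        iso = published.isocalendar()
        year, week = iso[0], iso[1]  # ISO year and ISO week number
        buckets[(year, week)].append(sentiment)

    return {
        key: {
            "volume": len(values),                 # weekly coverage volume
            "cumulative": sum(values),             # weekly cumulative sentiment
            "average": sum(values) / len(values),  # weekly average sentiment
        }
        for key, values in sorted(buckets.items())
    }


sample = [
    (date(2020, 1, 6), 0.5),    # Monday, ISO week 2 of 2020
    (date(2020, 1, 8), -0.25),  # Wednesday, same week
    (date(2020, 1, 13), 1.0),   # Monday, ISO week 3 of 2020
]
metrics = weekly_metrics(sample)
```

Weeks without entries do not appear in the result; whether such weeks should be plotted as zero or omitted is left open here.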

Running average Method applicable to any scatter plot. Takes an averaging interval expressed in days.
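One possible reading of the running average is a trailing window: each smoothed point is the mean of all points whose date falls within the preceding interval, the point itself included. The specification does not say whether the window is trailing or centred, so the trailing choice below is an assumption, and `running_average` is a hypothetical stand-alone function rather than the library method.

```python
from datetime import date, timedelta
from typing import List, Sequence, Tuple

Point = Tuple[date, float]


def running_average(points: Sequence[Point], lag_days: int) -> List[Point]:
    """Trailing mean over the half-open window (day - lag_days, day]."""
    window = timedelta(days=lag_days)
    ordered = sorted(points)
    smoothed = []
    for day, _ in ordered:
        # The current point is always inside its own window, so the list is never empty.
        in_window = [v for d, v in ordered if day - window < d <= day]
        smoothed.append((day, sum(in_window) / len(in_window)))
    return smoothed


weekly_points = [
    (date(2020, 1, 5), 1.0),
    (date(2020, 1, 12), 3.0),
    (date(2020, 1, 19), 5.0),
    (date(2020, 1, 26), 7.0),
]
smoothed = running_average(weekly_points, lag_days=14)
```

With weekly points and a 14-day interval, each smoothed value averages the current and the previous week; the point exactly 14 days back is excluded by the half-open window.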

Public perception Line plot that takes a characteristic time T and numerically evaluates the following formula, in which earlier values s(t_x) are weighted by an exponential decay:

\[ P(t)=\int_{0}^{t}s(t_{x})\exp\left(-\frac{\left(t-t_{x}\right)}{T}\right)dt_{x} \]
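A minimal numerical sketch of this formula, under a stated assumption: if the sentiment signal is treated as a train of impulses, one per entry, with weight s at publication time t_x, the integral reduces to a sum of exponentially decayed sentiments over all entries published up to t. The function name `public_perception`, the representation of time as a number of days since the start of the data set, and the impulse assumption itself are illustrative and not taken from this specification.

```python
import math
from typing import List, Sequence, Tuple


def public_perception(
    events: Sequence[Tuple[float, float]],
    times: Sequence[float],
    T: float,
) -> List[float]:
    """Evaluate P(t) at each requested time.

    events: (t_x, s) pairs, with t_x in days since the start of the data set.
    times:  the values of t at which to evaluate P.
    T:      characteristic decay time, in the same unit as the times.
    """
    if T <= 0:
        raise ValueError("T must be positive")
    return [
        # Sum over entries inside the integration range [0, t].
        sum(s * math.exp(-(t - t_x) / T) for t_x, s in events if 0 <= t_x <= t)
        for t in times
    ]


events = [(0.0, 1.0), (7.0, -0.5)]
curve = public_perception(events, times=[0.0, 7.0, 14.0], T=7.0)
```

A larger T makes older entries count for longer; as T grows without bound the curve approaches the plain cumulative sum of sentiments.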

Additional Processing

Co-mention weighting

Derivative