tscfat.Analysis package

Submodules

tscfat.Analysis.calculate_novelty module

Created on Thu Jul 2 13:54:12 2020

Function for time series novelty score calculation. The function requires a similarity matrix and calculates the novelty score with a sliding-window Gaussian kernel. For further reference, see: https://www.audiolabs-erlangen.de/resources/MIR/FMP/C4/C4S4_NoveltySegmentation.html

tscfat.Analysis.calculate_novelty.compute_novelty(simmat, edge=7, sigma=1.0, mu=0.0)

Compute the novelty score from a self-similarity matrix by convolving a Gaussian checkerboard kernel along the main diagonal of the matrix.

Parameters

simmat (numpy ndarray) – N x N self-similarity matrix.
edge (float, optional) – Half of the Gaussian kernel window length. The default is 7.
sigma (float, optional) – Variance for the Gaussian kernel construction. The default is 1.0.
mu (float, optional) – Mean for the Gaussian kernel construction. The default is 0.0.

Returns

nov (numpy ndarray) – 1D novelty score vector.
kernel (numpy ndarray) – 2D Gaussian convolution kernel.
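The checkerboard-kernel idea described above can be illustrated with a short NumPy sketch. This is not the tscfat implementation: the helper names checkerboard_kernel and novelty_sketch are hypothetical, edge is assumed to be an integer, the Gaussian is assumed to be evaluated on a grid scaled to [-1, 1], and sigma is treated as a variance as the parameter description states.

```python
import numpy as np


def checkerboard_kernel(edge=7, sigma=1.0, mu=0.0):
    """Build a Gaussian-tapered checkerboard kernel of shape (2*edge+1, 2*edge+1)."""
    offsets = np.arange(-edge, edge + 1)            # integer offsets from the kernel centre
    grid = offsets / edge                           # assumption: Gaussian evaluated on [-1, 1]
    gauss_1d = np.exp(-((grid - mu) ** 2) / (2.0 * sigma))  # sigma treated as a variance
    taper = np.outer(gauss_1d, gauss_1d)            # 2D Gaussian taper
    # +1 in the two "same side" quadrants, -1 in the two "cross" quadrants,
    # 0 on the centre row and column.
    signs = np.outer(np.sign(offsets), np.sign(offsets))
    kernel = signs * taper
    return kernel / np.abs(kernel).sum()            # absolute weights sum to 1


def novelty_sketch(simmat, edge=7, sigma=1.0, mu=0.0):
    """Slide the kernel along the main diagonal of a square similarity matrix."""
    kernel = checkerboard_kernel(edge, sigma, mu)
    size = 2 * edge + 1
    padded = np.pad(simmat, edge, mode="constant")  # zero-pad so every position fits
    nov = np.array([
        np.sum(padded[i:i + size, i:i + size] * kernel)
        for i in range(simmat.shape[0])
    ])
    return nov, kernel


# Toy self-similarity matrix with two homogeneous blocks of 20 samples each.
labels = np.repeat([0, 1], 20)
simmat = (labels[:, None] == labels[None, :]).astype(float)
nov, kernel = novelty_sketch(simmat)
print(int(np.argmax(nov)), kernel.shape)
```

In this toy example the novelty score is highest around the boundary between the two blocks and close to zero inside a block, which is the behaviour the checkerboard kernel is designed to produce. Zero padding also produces smaller spurious values at the two ends of the series.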

tscfat.Analysis.calculate_similarity module

Created on Thu Jul 2 12:28:09 2020

@author: arsi

Functions for distance matrix and similarity matrix calculation. The SciPy pdist function is used for the calculation. Full reference: https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html

tscfat.Analysis.calculate_similarity.calculate_distance(X, metric='Euclidean')

Calculate a distance matrix.

Parameters: X (Numpy ndarray) – An m by n array of m original observations in an n-dimensional space.

metric (str or function, optional) – Distance metric. The default is “Euclidean”.

Returns: Y_square – Returns a condensed distance matrix Y.
Return type: Numpy ndarray

tscfat.Analysis.calculate_similarity.calculate_similarity(X, metric='Euclidean')

Calculate a similarity matrix.

Parameters: X (Numpy ndarray) – An m by n array of m original observations in an n-dimensional space.

metric (str or function, optional) – Distance metric. The default is “Euclidean”.

Returns: Y_sim – Returns a similarity matrix Y.
Return type: Numpy ndarray
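As a rough illustration of how a distance matrix and a similarity matrix can be derived from pdist, consider the sketch below. The function names are hypothetical, and the distance-to-similarity conversion 1 / (1 + d) is an assumption chosen for illustration; the conversion used by tscfat is not stated in this documentation.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def distance_matrix_sketch(X, metric="euclidean"):
    """Return the square (m x m) pairwise distance matrix of the rows of X."""
    condensed = pdist(X, metric=metric)   # condensed vector of m*(m-1)/2 distances
    return squareform(condensed)          # expand to a symmetric square matrix


def similarity_matrix_sketch(X, metric="euclidean"):
    """Map distances to similarities in (0, 1]; identical rows get similarity 1."""
    D = distance_matrix_sketch(X, metric)
    return 1.0 / (1.0 + D)                # assumed conversion, not taken from tscfat


# Example: five observations in a three-dimensional space.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 3))
print(distance_matrix_sketch(X_demo).shape, similarity_matrix_sketch(X_demo).shape)
```

A square matrix of this kind is the form that a diagonal-based calculation such as the novelty score needs, which is why the condensed pdist output is expanded with squareform here.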

tscfat.Analysis.calculate_stability module

Created on Fri Feb 26 09:57:53 2021

@author: arsi

Calculate a rolling window stability index value for a given time series. The function requires a similarity matrix for the calculation. Reference for the stability index: https://www.nature.com/articles/s41537-020-00123-2

tscfat.Analysis.calculate_stability.compute_stability(simmat, edge=7)

Calculate the stability index for a given similarity matrix that represents a time series.

Parameters

simmat (np.array) – Similarity / self-similarity matrix
edge (int, optional) – Window size used for stability calculation. The default is 7.

Returns

stability – An array containing the rolling stability index values.

Return type

np.array

tscfat.Analysis.cluster_timeseries module

Created on Thu Dec 17 12:29:57 2020

@author: arsii

Functions for time series clustering and for cluster visualization. A plot decorator is used to handle image saving.

tscfat.Analysis.cluster_timeseries.cluster_timeseries(ts, FIGNAME, FIGPATH, title='Clustered timeseries', n=3, mi=5, mib=5, rs=0, metric='dtw', highlight=None, ylim_=None)

Cluster time series given as a numpy array. The function uses tslearn TimeSeriesKMeans. For the full reference, see: https://tslearn.readthedocs.io/en/stable/gen_modules/clustering/tslearn.clustering.TimeSeriesKMeans.html

Parameters

ts (numpy array) – An m x n matrix containing the data points.
FIGNAME (str) – Figure save name.
FIGPATH (path object) – Figure save path.
title (str, optional) – Figure title. The default is “Clustered timeseries”.
n (int, optional) – Number of clusters. The default is 3.
mi (int, optional) – Maximum number of iterations for the algorithm. The default is 5.
mib (int, optional) – Number of iterations used for the barycenter calculation. The default is 5.
rs (int, optional) – A random state used to initialize the centers. The default is 0.
metric (str, optional) – Metric used for the cluster assignment. The default is “dtw”.
highlight (TYPE, optional) – DESCRIPTION. The default is None.
ylim_ (tuple, optional) – Tuple containing the y-limit values. The default is None.

Returns

labels – An array containing the assigned cluster labels.

Return type

numpy array

tscfat.Analysis.decompose_timeseries module

Created on Wed Jul 1 14:40:46 2020

@author: arsi

Calculate STL decomposition for given time series and plot the components. The decomposition is based on statsmodels STL decomposition. Full reference: https://www.statsmodels.org/devel/generated/statsmodels.tsa.seasonal.STL.html

tscfat.Analysis.decompose_timeseries.STL_decomposition(series, title, test=False, savepath=False, savename=False, ylabel='Battery Level (%)', xlabel='Date', dates=False)

Decompose timeseries into Model, Trend, Seasonal and Residual parts. Plot the components and their distributions. Optionally save the figure.

Parameters

series (Numpy ndarray) – Time series to be decomposed.
title (str) – Figure title.
savepath (Path object, optional) – Figure save path. The default is False.
savename (str, optional) – Figure save name. The default is False.
ylabel (str, optional) – Figure ylabel. The default is “Battery Level (%)”.
xlabel (str, optional) – Figure xlabel. The default is “Date”.
dates (array, optional) – List of dates to be highlighted in the figure. The default is False.

Raises

Exception – If the given series is not a numpy array.

Returns

Result – Object containing the decomposition results.

Return type

statsmodels.tsa.seasonal.DecomposeResult object

tscfat.Analysis.degree_of_distribution module

Created on Fri Oct 9 13:57:14 2020

@author: arsii

Calculate a distribution degree D for given timeseries. D measures the scattering of the time series values within the range of possible values. For the reference: Schiepek, Günter, and Guido Strunk. “The identification of critical fluctuations and phase transitions in short term and coarse-grained time series—a method for the real-time monitoring of human change processes.” Biological cybernetics 102.3 (2010): 197-207.

tscfat.Analysis.degree_of_distribution.distribution_degree(y, scale, window)

Calculate distribution degree for given time series.

Parameters

y (numpy array) – A time series.
scale (int) – Fluctuation scale: abs(max value - min value).
window (int) – A window for the calculation.

Returns

D – Calculated distribution degree.

Return type

float

tscfat.Analysis.fluctuation_intensity module

Created on Thu Oct 8 13:02:27 2020

@author: arsii

Calculate a fluctuation intensity F for given timeseries. F is sensitive to amplitude and frequency changes in time signal. For the reference: Schiepek, Günter, and Guido Strunk. “The identification of critical fluctuations and phase transitions in short term and coarse-grained time series—a method for the real-time monitoring of human change processes.” Biological cybernetics 102.3 (2010): 197-207.

tscfat.Analysis.fluctuation_intensity.fluctuation_intensity(y, scale, window)

Calculate fluctuation intensity for the given time series.

Parameters

y (numpy array) – A time series.
scale (int) – Fluctuation scale: abs(max value - min value).
window (int) – A window for the calculation.

Returns

F – Calculated fluctuation intensity.

Return type

float
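To make the sensitivity to amplitude and frequency concrete, the sketch below computes a fluctuation intensity for a single window. It follows our reading of the formula in the cited Schiepek and Strunk paper: the absolute change between consecutive turning points is divided by the number of steps between them, the terms are summed, and the sum is normalised by scale * (m - 1) for a window of m points. The function name, the treatment of plateaus, and the single-window interface are assumptions, not the tscfat implementation.

```python
import numpy as np


def fluctuation_intensity_sketch(values, scale):
    """Fluctuation intensity of one window of values, in the range [0, 1]."""
    x = np.asarray(values, dtype=float)
    m = len(x)

    # Collect turning points: the window start, every index where the direction
    # of change flips, and the window end. Flat steps are skipped (assumption).
    turning = [0]
    prev_sign = 0.0
    for i, step in enumerate(np.diff(x)):
        sign = np.sign(step)
        if sign == 0:
            continue
        if prev_sign != 0 and sign != prev_sign:
            turning.append(i)
        prev_sign = sign
    turning.append(m - 1)
    turning = np.array(turning)

    heights = np.abs(np.diff(x[turning]))   # size of each change between turning points
    durations = np.diff(turning)            # number of steps each change takes
    return float(np.sum(heights / durations) / (scale * (m - 1)))


# A series that jumps across the full scale at every step gives the maximum value.
print(fluctuation_intensity_sketch([0, 1, 0, 1, 0], scale=1))
```

Under these assumptions a constant window yields 0, a window that swings across the whole scale at every step yields 1, and a slow monotonic drift yields a small value because one change is spread over many steps.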

tscfat.Analysis.plot_similarity module

Created on Mon Jul 6 14:07:02 2020

@author: arsi

Plot and save self similarity matrix, convolution kernel and novelty score.

tscfat.Analysis.plot_similarity.plot_similarity(sim, nov, stab, title='Similarity and novelty', doi=None, savepath=False, savename=False, ylim=(0, 0.05), threshold=0, axis=None, kernel=False, test=False)

Plot the similarity matrix. Optionally save the figure, plot the kernel, and plot the similarity score.

Parameters

sim (Numpy ndarray) – m x m array containing similarity values
nov (Numpy ndarray) – m x 1 array containing novelty scores
stab (Numpy ndarray) – m x 1 array containing stability scores
doi (tuple) – (float, float) values used to highlight certain region of interest.
title (str, optional) – Similarity plot title. The default is “Similarity and novelty”.
savepath (Path object, optional) – Path for figure saving. The default is False.
savename (str object, optional) – Savename for the figure. The default is False.
ylim (tuple, optional) – (float,float) ylimits for the plot. The default is (0,0.05).
threshold (float, optional) – Similarity score threshold for showing in the plot. The default is 0.
axis (pandas.core.indexes.base.Index, optional) – Date range used in the novelty score plot. The default is None.
kernel (Numpy ndarray, optional) – m x m convolution kernel used for novelty score calculation. The default is False.
test (boolean, optional) – Indicates whether the function is tested by pytest. The default is False.

Raises

Exception – Raised in any of the following cases:

Requested save folder does not exist
Savepath and/or savename are not given
Novelty score is not a numpy array
Stability score is not a numpy array

Returns

Return type

None.

tscfat.Analysis.plot_timeseries module

Created on Thu Mar 18 15:44:59 2021

@author: arsi

Function for plotting dataframe columns containing the timeseries.

tscfat.Analysis.plot_timeseries.plot_timeseries(data, columns, title, roll=False, xlab='Time', ylab='Value', ylim=False, savename=False, savepath=False, highlight=False, test=False)

Plot the selected columns of the given dataframe. The dataframe index should contain datetime objects.

Parameters

data (pandas dataframe) – Pandas dataframe containing the timeseries.
columns (list) – A list of strings, containing the column names.
title (str) – Figure name.
roll (int, optional) – Rolling window length. The default is False.
xlab (str, optional) – Figure x-label. The default is “Time”.
ylab (str, optional) – Figure y-label. The default is “Value”.
ylim (tuple, optional) – (float, float) ylimit for the figure. The default is False.
savename (str, optional) – Figure savename. The default is False.
savepath (path, optional) – Figure savepath. The default is False.
highlight (tuple, optional) – Tuple containing the start and end point for the region highlighting. The default is False.
test (bool, optional) – Indicates whether the function is tested by pytest. The default is False.

Returns

fig – A figure containing the plotted timeseries.

Return type

matplotlib figure

tscfat.Analysis.rolling_statistics module

Created on Fri Dec 18 13:45:10 2020

@author: arsii

Calculate rolling windows statistics for the given time series and plot them.

The following are calculated using a rolling window of length n:

Average
Variance
Autocorrelation
Mean square of successive differences (MSSD)
Probability of acute change (PAC)

tscfat.Analysis.rolling_statistics.rolling_statistics(ts, w, doi=None, savename=False, savepath=False, test=False)

Calculate and plot several rolling statistics.

Parameters

ts (pandas dataframe) – A dataframe containing time as index and one column of data
w (int) – Rolling statistics window size
doi (tuple) – A tuple containing tuples of dates. The default is None.
savename (str, optional) – Name used as the plot save name. Has to be of type str. The default is False.
savepath (Path object, optional) – Path where the plot is to be saved. The path has to exist before calling this function. The default is False.
test (Boolean, optional) – Flag for test function. The default is False.

Raises

Exception – Raised in any of the following cases:

Given time series is not a pandas dataframe
Given window size is not an integer
Given window length is larger than the time series length

Returns

Return type

None, or a matplotlib.pyplot figure if test is True.
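The five rolling statistics listed for this module can be approximated with plain pandas, which may help when checking the plotted output. The sketch below is not the tscfat code: the function name is hypothetical, lag 1 is assumed for the autocorrelation, and the probability of acute change is assumed to be the share of absolute successive differences above a chosen quantile (pac_quantile, a hypothetical parameter).

```python
import numpy as np
import pandas as pd


def rolling_statistics_sketch(ts, w, pac_quantile=0.9):
    """Return a dataframe of rolling statistics for a one-column dataframe."""
    s = ts.iloc[:, 0]
    diffs = s.diff()                                 # successive differences
    threshold = diffs.abs().quantile(pac_quantile)   # assumed "acute change" cut-off
    return pd.DataFrame({
        "average": s.rolling(w).mean(),
        "variance": s.rolling(w).var(),
        # Lag-1 autocorrelation inside each window (assumed lag).
        "autocorrelation": s.rolling(w).apply(lambda x: x.autocorr(lag=1), raw=False),
        # Mean of squared successive differences inside each window.
        "mssd": diffs.pow(2).rolling(w).mean(),
        # Share of changes in the window that exceed the threshold.
        "pac": (diffs.abs() > threshold).astype(float).rolling(w).mean(),
    })


# Example: a steadily increasing daily series.
index = pd.date_range("2020-12-01", periods=20, freq="D")
ts = pd.DataFrame({"value": np.arange(20.0)}, index=index)
stats = rolling_statistics_sketch(ts, w=5)
print(stats.tail(3))
```

The first rows of each column are NaN until a full window of observations is available, which is the default behaviour of pandas rolling windows.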

tscfat.Analysis.summary_statistics module

Created on Mon Dec 21 13:56:55 2020

@author: ikaheia1

Calculate the following summary statistics for the given timeseries and plot the results:

Histogram

Lag plot with lag 1

Autocorrelation

Partial autocorrelation function

Autocorrelation function

tscfat.Analysis.summary_statistics.summary_statistics(series, title='Time series summary', window=14, savepath=False, savename=False, test=False)

Calculate summary statistics for the given time series.

Parameters

series (Pandas Series) – A time series for which the summary is calculated
title (str, optional) – Summary plot title. The default is “Time series summary”.
window (int) – Rolling window size. The default is 14.
savepath (Path object, optional) – Figure save path. The default is False.
savename (Path object, optional) – Figure save name. The default is False.
test (Boolean, optional) – Flag for test function. The default is False.

Returns

Return type

None.
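Some of the quantities behind the summary plots can be computed directly with NumPy. The sketch below covers the histogram counts, the point pairs of a lag-1 plot, and the sample autocorrelation function; the partial autocorrelation is left out. The function name and the nlags and bins parameters are hypothetical, and the code makes no claim about how tscfat computes or draws these.

```python
import numpy as np


def summary_sketch(series, nlags=14, bins=10):
    """Return histogram counts, bin edges, lag-1 pairs and the sample ACF."""
    x = np.asarray(series, dtype=float)
    counts, bin_edges = np.histogram(x, bins=bins)

    # Each row is (x_t, x_{t+1}), i.e. one point of a lag plot with lag 1.
    lag_pairs = np.column_stack([x[:-1], x[1:]])

    # Sample autocorrelation: lagged products of the centred series,
    # normalised by the lag-0 sum of squares.
    centred = x - x.mean()
    denom = np.sum(centred ** 2)
    acf = np.array([
        np.sum(centred[k:] * centred[:len(x) - k]) / denom
        for k in range(nlags + 1)
    ])
    return counts, bin_edges, lag_pairs, acf


# Example: a noisy weekly cycle (assumed data).
rng = np.random.default_rng(1)
t = np.arange(84)
demo = np.sin(2.0 * np.pi * t / 7.0) + 0.1 * rng.normal(size=t.size)
counts, bin_edges, lag_pairs, acf = summary_sketch(demo)
print(counts.sum(), lag_pairs.shape, acf.shape)
```

The autocorrelation at lag 0 is always 1 with this normalisation, so the informative part of the result starts at lag 1.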
