tscfat.Analysis package

Submodules

tscfat.Analysis.calculate_novelty module

Created on Thu Jul 2 13:54:12 2020

Function for time series novelty score calculation. The function requires a similarity matrix and calculates the novelty score with a sliding-window Gaussian kernel. For further reference, see: https://www.audiolabs-erlangen.de/resources/MIR/FMP/C4/C4S4_NoveltySegmentation.html

tscfat.Analysis.calculate_novelty.compute_novelty(simmat, edge=7, sigma=1.0, mu=0.0)[source]

Compute novelty score using the self similarity matrix and gaussian checkerboard convolution kernel, calculating the convolution along the self similarity matrix diagonal.

Parameters
  • simmat (numpy ndarray) – N x N self similarity matrix.

  • edge (float, optional) – Gaussian kernel window length / 2. The default is 7.

  • sigma (float, optional) – Variance for the gaussian kernel construction. The default is 1.0.

  • mu (float, optional) – Mean for the gaussian kernel construction. The default is 0.0.

Returns

  • nov (numpy ndarray) – 1D novelty score vector.

  • kernel (numpy ndarray) – 2D gaussian convolution kernel.
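The checkerboard convolution described above can be illustrated with a short NumPy sketch. This is not the tscfat implementation: the function names, the way the axis is scaled by edge, the kernel normalisation, and the zero padding are all assumptions made for illustration, loosely following the referenced tutorial. The sketch also assumes edge is a positive integer.

```python
import numpy as np


def gaussian_checkerboard_kernel(edge=7, sigma=1.0, mu=0.0):
    """Build a (2*edge + 1) x (2*edge + 1) Gaussian-tapered checkerboard kernel."""
    axis = np.arange(-edge, edge + 1)
    # 1D Gaussian taper; scaling the axis by `edge` is an assumption of this sketch.
    taper = np.exp(-0.5 * ((axis / edge - mu) / sigma) ** 2)
    gaussian_2d = np.outer(taper, taper)
    # Checkerboard signs: +1 in the two diagonal quadrants, -1 in the
    # off-diagonal quadrants, 0 on the centre row and column.
    checkerboard = np.outer(np.sign(axis), np.sign(axis))
    kernel = checkerboard * gaussian_2d
    # Normalise so that the absolute kernel weights sum to one.
    return kernel / np.abs(kernel).sum()


def novelty_sketch(simmat, edge=7, sigma=1.0, mu=0.0):
    """Slide the kernel along the main diagonal of an N x N similarity matrix."""
    simmat = np.asarray(simmat, dtype=float)
    kernel = gaussian_checkerboard_kernel(edge, sigma, mu)
    n = simmat.shape[0]
    size = 2 * edge + 1
    # Zero padding lets the kernel be centred on every diagonal element.
    padded = np.pad(simmat, edge, mode="constant")
    nov = np.zeros(n)
    for i in range(n):
        nov[i] = np.sum(padded[i:i + size, i:i + size] * kernel)
    return nov, kernel
```

For a similarity matrix made of two homogeneous blocks, the score from this sketch is largest at the boundary between the blocks, which is the behaviour a novelty score is meant to capture.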

tscfat.Analysis.calculate_similarity module

Created on Thu Jul 2 12:28:09 2020

@author: arsi

Functions for distance matrix and similarity matrix calculation. The SciPy pdist function is used for the calculation. Full reference: https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html

tscfat.Analysis.calculate_similarity.calculate_distance(X, metric='Euclidean')[source]

Calculate a distance matrix.

Parameters

X (Numpy ndarray) – An m by n array of m original observations in an n-dimensional space.

metric (str or function, optional) – The default is “Euclidean”.

Returns

Y_square – Returns a condensed distance matrix Y.

Return type

Numpy ndarray

tscfat.Analysis.calculate_similarity.calculate_similarity(X, metric='Euclidean')[source]

Calculate a similarity matrix.

Parameters

X (Numpy ndarray) – An m by n array of m original observations in an n-dimensional space.

metric (str or function, optional) – The default is “Euclidean”.

Returns

Y_sim – Returns a similarity matrix Y.

Return type

Numpy ndarray
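As a rough illustration of the pdist-based workflow, the sketch below builds a square distance matrix and derives a similarity matrix from it. The helper names and the distance-to-similarity mapping 1 / (1 + d) are assumptions chosen for this example; the source does not state which mapping tscfat uses.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def distance_matrix_sketch(X, metric="euclidean"):
    """Return the square m x m matrix of pairwise distances between rows of X."""
    # pdist returns the condensed form; squareform expands it to m x m.
    return squareform(pdist(X, metric=metric))


def similarity_matrix_sketch(X, metric="euclidean"):
    """Map distances to similarities in (0, 1]; this mapping is an assumption."""
    distances = distance_matrix_sketch(X, metric=metric)
    return 1.0 / (1.0 + distances)


# Two observations in a 2-dimensional space.
X = np.array([[0.0, 0.0], [3.0, 4.0]])
D = distance_matrix_sketch(X)
S = similarity_matrix_sketch(X)
```

With this mapping, identical observations get a similarity of 1 and the similarity decreases towards 0 as the distance grows.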

tscfat.Analysis.calculate_stability module

Created on Fri Feb 26 09:57:53 2021

@author: arsi

Calculate a rolling window stability index value for a given time series. The function requires a similarity matrix for the calculation. Reference for the stability index: https://www.nature.com/articles/s41537-020-00123-2

tscfat.Analysis.calculate_stability.compute_stability(simmat, edge=7)[source]

Calculate the stability index for a given similarity matrix that represents a time series.

Parameters
  • simmat (np.array) – Similarity / self-similarity matrix

  • edge (int, optional) – Window size used for stability calculation. The default is 7.

Returns

stability – An array containing the rolling stability index values.

Return type

np.array

tscfat.Analysis.cluster_timeseries module

Created on Thu Dec 17 12:29:57 2020

@author: arsii

Functions for time series clustering and for cluster visualization. A plot decorator is used to handle image saving.

tscfat.Analysis.cluster_timeseries.cluster_timeseries(ts, FIGNAME, FIGPATH, title='Clustered timeseries', n=3, mi=5, mib=5, rs=0, metric='dtw', highlight=None, ylim_=None)[source]

Cluster a time series given as a numpy array. The function uses tslearn TimeSeriesKMeans. For the full reference, see: https://tslearn.readthedocs.io/en/stable/gen_modules/clustering/tslearn.clustering.TimeSeriesKMeans.html

Parameters
  • ts (numpy array) – A m x n matrix containing the data points

  • FIGNAME (str) – Figure savename

  • FIGPATH (path object) – Figure savepath

  • title (str) – Figure title

  • n (int, optional) – Number of clusters. The default is 3.

  • mi (int, optional) – Maximum number of iterations for the algorithm. The default is 5.

  • mib (int, optional) – N iter used for the barycenter calculation. The default is 5.

  • rs (int, optional) – A random state used to initialize the centers. The default is 0.

  • metric (str, optional) – Metric used for the cluster assignment. The default is “dtw”.

  • highlight (TYPE, optional) – DESCRIPTION. The default is None

  • ylim_ (tuple, optional) – Tuple containing the y-limit values. The default is None.

Returns

labels – An array containing the assigned cluster labels.

Return type

numpy array

tscfat.Analysis.decompose_timeseries module

Created on Wed Jul 1 14:40:46 2020

@author: arsi

Calculate STL decomposition for given time series and plot the components. The decomposition is based on statsmodels STL decomposition. Full reference: https://www.statsmodels.org/devel/generated/statsmodels.tsa.seasonal.STL.html

tscfat.Analysis.decompose_timeseries.STL_decomposition(series, title, test=False, savepath=False, savename=False, ylabel='Battery Level (%)', xlabel='Date', dates=False)[source]

Decompose timeseries into Model, Trend, Seasonal and Residual parts. Plot the components and their distributions. Optionally save the figure.

Parameters
  • series (Numpy ndarray) – Time series to be decomposed

  • title (str) – Figure title.

  • savepath (Path object, optional) – Figure save path. The default is False.

  • savename (str, optional) – Figure save name. The default is False.

  • ylabel (str, optional) – Figure ylabel. The default is “Battery Level (%)”.

  • xlabel (str, optional) – Figure xlabel. The default is “Date”.

  • dates (array, optional) – List of dates to be highlighted in the figure. The default is False.

Raises

Exception

  • given series is not a numpy array.

Returns

Result – Object containing the decomposition results.

Return type

statsmodels.tsa.seasonal.DecomposeResult object

tscfat.Analysis.degree_of_distribution module

Created on Fri Oct 9 13:57:14 2020

@author: arsii

Calculate a distribution degree D for given timeseries. D measures the scattering of the time series values within the range of possible values. For the reference: Schiepek, Günter, and Guido Strunk. “The identification of critical fluctuations and phase transitions in short term and coarse-grained time series—a method for the real-time monitoring of human change processes.” Biological cybernetics 102.3 (2010): 197-207.

tscfat.Analysis.degree_of_distribution.distribution_degree(y, scale, window)[source]

Calculate distribution degree for given time series.

Parameters
  • y (numpy array) – A Time series

  • scale (int) – Fluctuation scale: abs(max value - min value)

  • window (int) – A window for calculation

Returns

D – Calculated distribution degree.

Return type

float

tscfat.Analysis.fluctuation_intensity module

Created on Thu Oct 8 13:02:27 2020

@author: arsii

Calculate a fluctuation intensity F for given timeseries. F is sensitive to amplitude and frequency changes in time signal. For the reference: Schiepek, Günter, and Guido Strunk. “The identification of critical fluctuations and phase transitions in short term and coarse-grained time series—a method for the real-time monitoring of human change processes.” Biological cybernetics 102.3 (2010): 197-207.

tscfat.Analysis.fluctuation_intensity.fluctuation_intensity(y, scale, window)[source]

Calculate fluctuation intensity for the given time series.

Parameters
  • y (numpy array) – A Time series

  • scale (int) – Fluctuation scale: abs(max value - min value)

  • window (int) – A window for calculation

Returns

F – Calculated fluctuation intensity.

Return type

float
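The following is a minimal sketch of a fluctuation intensity calculation for a single window. It reflects one reading of the cited reference: the window is split into periods between points of return (changes of direction), each period contributes its absolute change divided by its duration, and the sum is normalised by scale * (m - 1), where m is the number of points in the window. The function name and the treatment of plateaus as direction changes are assumptions of this sketch, not the tscfat implementation.

```python
import numpy as np


def fluctuation_intensity_sketch(window_values, scale):
    """Fluctuation intensity of one window, normalised to the range [0, 1]."""
    y = np.asarray(window_values, dtype=float)
    m = y.size
    if m < 2 or scale <= 0:
        raise ValueError("need at least two points and a positive scale")
    # Sign of each successive difference: +1 rising, -1 falling, 0 flat.
    direction = np.sign(np.diff(y))
    # Period boundaries: window start, every change of direction, window end.
    breaks = [0]
    for i in range(1, m - 1):
        if direction[i] != direction[i - 1]:
            breaks.append(i)
    breaks.append(m - 1)
    # Each period contributes |change in value| / duration.
    total = 0.0
    for start, stop in zip(breaks[:-1], breaks[1:]):
        total += abs(y[stop] - y[start]) / (stop - start)
    return total / (scale * (m - 1))
```

Under this definition, a window that jumps between the scale minimum and maximum at every step reaches the largest value, 1, while a constant window gives 0.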

tscfat.Analysis.plot_similarity module

Created on Mon Jul 6 14:07:02 2020

@author: arsi

Plot and save self similarity matrix, convolution kernel and novelty score.

tscfat.Analysis.plot_similarity.plot_similarity(sim, nov, stab, title='Similarity and novelty', doi=None, savepath=False, savename=False, ylim=(0, 0.05), threshold=0, axis=None, kernel=False, test=False)[source]

Plot the similarity matrix. Optionally save the figure, plot the kernel, and plot the similarity score.

Parameters
  • sim (Numpy ndarray) – m x m array containing similarity values

  • nov (Numpy ndarray) – m x 1 array containing novelty scores

  • stab (Numpy ndarray) – m x 1 array containing stability scores

  • doi (tuple) – (float, float) values used to highlight certain region of interest.

  • title (str, optional) – Similarity plot title. The default is “Similarity and novelty”.

  • savepath (Path object, optional) – Path for figure saving. The default is False.

  • savename (str object, optional) – Savename for the figure. The default is False.

  • ylim (tuple, optional) – (float,float) ylimits for the plot. The default is (0,0.05).

  • threshold (float, optional) – Similarity score threshold for showing in the plot. The default is 0.

  • axis (pandas.core.indexes.base.Index, optional) – Date range used in the novelty score plot. The default is None.

  • kernel (Numpy ndarray, optional) – m x m convolution kernel used for novelty score calculation. The default is False.

  • test (boolean) – Indicates whether the function is tested by pytest. The default is False.

Raises

Exception

  • Requested save folder does not exist

  • Savepath and/or savename are not given

  • Novelty score is not a numpy array

  • Stability score is not a numpy array

Returns

Return type

None.

tscfat.Analysis.plot_timeseries module

Created on Thu Mar 18 15:44:59 2021

@author: arsi

Function for plotting dataframe columns containing the timeseries.

tscfat.Analysis.plot_timeseries.plot_timeseries(data, columns, title, roll=False, xlab='Time', ylab='Value', ylim=False, savename=False, savepath=False, highlight=False, test=False)[source]

Plot the selected columns of the given dataframe. The dataframe index should be datetime object.

Parameters
  • data (pandas dataframe) – Pandas dataframe containing the timeseries.

  • columns (list) – A list of strings, containing the column names.

  • title (str) – Figure name.

  • roll (int, optional) – Rolling window length. The default is False.

  • xlab (str, optional) – Figure x-label. The default is “Time”.

  • ylab (str, optional) – Figure y-label. The default is “Value”.

  • ylim (tuple, optional) – (float, float) ylimit for the figure. The default is False.

  • savename (str, optional) – Figure savename. The default is False.

  • savepath (path, optional) – Figure savepath. The default is False.

  • highlight (tuple, optional) – Tuple containing the start and end point for the region highlighting. The default is False.

  • test (bool, optional) – Indicates whether the function is tested by pytest. The default is False.

Returns

fig – A figure containing the plotted timeseries.

Return type

matplotlib figure

tscfat.Analysis.rolling_statistics module

Created on Fri Dec 18 13:45:10 2020

@author: arsii

Calculate rolling windows statistics for the given time series and plot them.

The following are calculated using a rolling window of length n:
  1. Average

  2. Variance

  3. Autocorrelation

  4. Mean square of successive differences (MSSD)

  5. Probability of acute change (PAC)
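The statistics listed above can be sketched with pandas rolling windows. This is an illustrative sketch rather than the tscfat code: the function name is hypothetical, and pac_threshold (the size of a successive difference that counts as an acute change) is an assumed parameter that does not appear in the rolling_statistics signature.

```python
import numpy as np
import pandas as pd


def rolling_statistics_sketch(series, w, pac_threshold):
    """Rolling average, variance, lag-1 autocorrelation, MSSD and PAC."""
    if w < 3:
        raise ValueError("window must contain at least three points")
    diffs = series.diff()
    # Flag acute changes; keep the leading NaN so it is not counted as 'no change'.
    acute = (diffs.abs() > pac_threshold).astype(float).where(diffs.notna())
    return pd.DataFrame({
        "average": series.rolling(w).mean(),
        "variance": series.rolling(w).var(),
        "autocorrelation": series.rolling(w).apply(
            lambda x: x.autocorr(lag=1), raw=False
        ),
        # A window of w points contains w - 1 successive differences.
        "mssd": (diffs ** 2).rolling(w - 1).mean(),
        "pac": acute.rolling(w - 1).mean(),
    })


values = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])
stats = rolling_statistics_sketch(values, w=3, pac_threshold=1.5)
```

Each row of the result describes the window of w points ending at that index, so the first w - 1 rows are NaN for every statistic.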

tscfat.Analysis.rolling_statistics.rolling_statistics(ts, w, doi=None, savename=False, savepath=False, test=False)[source]

Calculate and plot several rolling statistics.

Parameters
  • ts (pandas dataframe) – A dataframe containing time as index and one column of data

  • w (int) – Rolling statistics window size

  • doi (tuple) – A tuple containing tuples of dates. The default is None.

  • savename (str (default = False)) – Name used as plot save name. Has to be a type of str

  • savepath (Path -object (default = False)) – path where plot is to be saved. Path has to exist before calling this function.

  • test (Boolean, optional) – Flag for test function. The default is False.

Raises

Exception

  • given time series is not a pandas dataframe

  • given window size is not an integer

  • given window length is larger than the time series length

Returns

Return type

None, or a matplotlib.pyplot figure if test is True.

tscfat.Analysis.summary_statistics module

Created on Mon Dec 21 13:56:55 2020

@author: ikaheia1

Calculate the following summary statistics for the given timeseries and plot the results:

  • Histogram

  • Lag plot with lag 1

  • Autocorrelation

  • Partial autocorrelation function

  • Autocorrelation function

tscfat.Analysis.summary_statistics.summary_statistics(series, title='Time series summary', window=14, savepath=False, savename=False, test=False)[source]

Calculate summary statistics for the given time series.

Parameters
  • series (Pandas Series) – A time series for which the summary is calculated

  • title (str, optional) – Summary plot title. The default is “Time series summary”.

  • window (int) – Rolling window size. The default is 14.

  • savepath (Path object, optional) – Figure save path. The default is False.

  • savename (Path object, optional) – Figure save name. The default is False.

  • test (Boolean, optional) – Flag for test function. The default is False.

Returns

Return type

None.

Module contents