In this vignette (tutorial), I want to demonstrate how easily the TSrepr package can be extended. Its methods (functions) can be combined with an arbitrary feature extraction method for time series, or with a new time series representation method. Several functions implemented in the TSrepr package support this feature. They can be split into two groups (scenarios) according to the number of features extracted from a time series (or a part of it): either a single value, or more than one value.
The first of these scenarios is supported by the methods (functions) PAA (repr_paa), Mean Seasonal Profile (repr_seas_profile) and FeaTrend (repr_featrend). The second scenario is supported by the functions repr_windowing and repr_matrix.
The PAA representation method aggregates each subsequence of a time series into one value; in the original method this is the average. However, it can also be used to extract other useful features, for example the median, sum, minimum or maximum. For instance, suppose we want to aggregate (sum) pairs of values in a time series. Let’s show it on real data:
library(TSrepr)
library(ggplot2)
library(data.table)
data_ts <- as.numeric(elec_load[1,])
length(data_ts)
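## [1] 672
# sum every pair of consecutive values (q = 2)
data_ts_sums <- repr_paa(data_ts, q = 2, func = sum)
length(data_ts_sums)
## [1] 336
ggplot(data.table(Time = 1:length(data_ts_sums),
                  Value = data_ts_sums),
       aes(Time, Value)) +
  geom_line() +
  theme_bw()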
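To make the aggregation step concrete, here is a minimal base R sketch of the idea: split the series into consecutive windows of length q and reduce each window to one value with a user-supplied function. The helper paa_sketch is hypothetical and is not part of TSrepr; in particular, how it treats a final window shorter than q is an assumption and may differ from repr_paa.

```r
# Hypothetical helper (not the TSrepr implementation): aggregate consecutive
# windows of length q with an arbitrary function returning a single value.
paa_sketch <- function(x, q, func = mean) {
  # assign every observation to a consecutive window of length q
  window_id <- ceiling(seq_along(x) / q)
  # reduce each window to one value
  unname(vapply(split(x, window_id), func, numeric(1)))
}

x <- c(1, 2, 3, 4, 5, 6)
pair_sums <- paa_sketch(x, q = 2, func = sum)       # sums of pairs
triple_meds <- paa_sketch(x, q = 3, func = median)  # medians of triples
pair_sums
triple_meds
```

Any function that maps a numeric vector to a single number can be plugged in as func in the same way.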
We can also extract more advanced features from a time series, such as skewness or kurtosis (implemented in the package moments). Let’s extract the skewness of every day of the time series (the daily frequency is 48).
library(moments)
data_ts_skew <- repr_paa(data_ts, q = 48, func = skewness)
ggplot(data.table(Time = 1:length(data_ts_skew),
                  Value = data_ts_skew),
       aes(Time, Value)) +
  geom_line() +
  theme_bw()
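If the moments package is not available, a self-written function can be used in the same place. The sketch below computes a moment-based sample skewness in base R; skew_sketch is a hypothetical name, and the formula (third central moment divided by the second central moment raised to the power 3/2, both divided by n) is an assumption that may differ in detail from the skewness function in moments.

```r
# Hypothetical helper: moment-based sample skewness of a numeric vector.
skew_sketch <- function(x) {
  d <- x - mean(x)   # deviations from the mean
  m2 <- mean(d^2)    # second central moment
  m3 <- mean(d^3)    # third central moment
  m3 / m2^(3 / 2)
}

skew_symmetric <- skew_sketch(c(1, 2, 3, 4, 5))  # symmetric values
skew_right <- skew_sketch(c(1, 1, 1, 5))         # long right tail
skew_symmetric
skew_right
```

Because it returns a single number, such a function could be passed as func to repr_paa just like skewness above.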
The second scenario is extracting multiple values (features) from a subsequence of a time series. Here, we can use the windowing method, which is implemented by the repr_windowing function. There is just one simple restriction on a custom representation function: it must return a vector. Let’s create a function (repr_fea_extract) that extracts some basic features from a time series.
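The definition of repr_fea_extract is not reproduced in this text. A minimal sketch, assuming that the basic features are the mean, median, maximum, minimum and standard deviation of the subsequence (this particular choice of features is an assumption):

```r
# Assumed definition: five basic summary features of a subsequence.
# The only requirement stated in the text is that a vector is returned.
repr_fea_extract <- function(x) {
  c(mean(x), median(x), max(x), min(x), sd(x))
}

fea_example <- repr_fea_extract(c(1, 2, 3, 4, 10))
fea_example
```

With this definition every window contributes five values to the final representation.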
Now let’s use it with the windowing function on our data.
data_fea <- repr_windowing(data_ts, win_size = 48, func = repr_fea_extract)
ggplot(data.table(Time = 1:length(data_fea),
                  Value = data_fea),
       aes(Time, Value)) +
  geom_line() +
  theme_bw()
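The result of windowing is one long vector, so it can help to reshape it into one row per window and one column per feature. The sketch below illustrates this in base R on toy data; windowing_sketch and fea_range are hypothetical helpers, and the assumption that features are concatenated window by window should be checked against repr_windowing before relying on it.

```r
# Hypothetical helper: apply func to consecutive windows and concatenate
# the returned feature vectors window by window.
windowing_sketch <- function(x, win_size, func) {
  window_id <- ceiling(seq_along(x) / win_size)
  unlist(lapply(split(x, window_id), func), use.names = FALSE)
}

# Assumed feature extractor returning two values per window
fea_range <- function(x) c(min(x), max(x))

x <- c(3, 1, 2, 8, 6, 7)
fea_vec <- windowing_sketch(x, win_size = 3, func = fea_range)

# one row per window, one column per feature
fea_mat <- matrix(fea_vec, ncol = 2, byrow = TRUE,
                  dimnames = list(NULL, c("min", "max")))
fea_vec
fea_mat
```

Under the same assumption, the vector obtained with win_size = 48 and a five-feature extractor could be reshaped with ncol = 5.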
I will now show how to apply it to the whole dataset (with the function repr_matrix), cluster the resulting representations, and interpret the results. Before clustering electricity consumption data, normalisation is needed. We can use the classical z-score (norm_z) or min-max (norm_min_max) normalisation for every consumer’s time series. However, an arbitrary user-defined normalisation function can also be passed directly to repr_matrix. For instance, let’s use a simple self-defined max normalisation.
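The definition of norm_max is not reproduced in this text. A minimal sketch, assuming that max normalisation simply divides every value of a series by the maximum of that series (reasonable for positive load data, where the result lies in (0, 1]):

```r
# Assumed definition: scale a series by its maximum value.
# Intended for series with a positive maximum, such as electricity load.
norm_max <- function(x) {
  x / max(x)
}

norm_example <- norm_max(c(2, 4, 8))
norm_example
```

Any function that takes a numeric vector and returns a vector of the same length could be supplied in the same way.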
data_mat <- repr_matrix(elec_load,
                        func = repr_fea_extract,
                        windowing = TRUE,
                        win_size = 48,
                        normalise = TRUE,
                        func_norm = norm_max)
set.seed(123)
clus_res <- kmeans(data_mat, centers = 5, nstart = 10)
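The interpretation relies on two components of the object returned by kmeans: cluster (one label per row of the input matrix) and centers (one centroid per cluster). A small sketch on toy data, where toy_mat and toy_res are illustrative names and the data are simulated, not taken from elec_load:

```r
set.seed(123)
# toy data: 6 "series" with 4 features each, forming two obvious groups
toy_mat <- rbind(matrix(rnorm(12, mean = 0, sd = 0.1), nrow = 3),
                 matrix(rnorm(12, mean = 5, sd = 0.1), nrow = 3))
toy_res <- kmeans(toy_mat, centers = 2, nstart = 10)

toy_res$cluster         # cluster label for every row of toy_mat
dim(toy_res$centers)    # one row per cluster, one column per feature
table(toy_res$cluster)  # number of series in each cluster
```

The cluster numbers themselves are arbitrary labels and can change with a different seed.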
Let’s plot the final clusters with corresponding centroids (red line).
# prepare data for plotting
data_plot <- melt(data.table(ID = 1:nrow(data_mat),
                             class = clus_res$cluster,
                             data_mat),
                  id.vars = c("ID", "class"),
                  variable.name = "Time",
                  variable.factor = FALSE
                  )
data_plot[, Time := as.integer(gsub("V", "", Time))]
# prepare centroids
centers <- melt(data.table(ID = 1:nrow(clus_res$centers),
                           class = 1:nrow(clus_res$centers),
                           clus_res$centers),
                id.vars = c("ID", "class"),
                variable.name = "Time",
                variable.factor = FALSE
                )
centers[, Time := as.integer(gsub("V", "", Time))]
# plot the results
ggplot(data_plot,
       aes(Time, value, group = ID)) +
  facet_wrap(~class, ncol = 2, scales = "free_y") +
  geom_line(color = "grey10", alpha = 0.65) +
  geom_line(data = centers,
            aes(Time, value),
            color = "firebrick1", alpha = 0.80, size = 1.2) +
  labs(x = "Time", y = "Load (normalised)") +
  theme_bw()
Let’s also look at the frequency table of cluster membership.
table(clus_res$cluster)
##
##  1  2  3  4  5
##  2 12 12 17  7
There are three dominant clusters (no. 2, 3 and 4 in the table above). The time series in the two smaller clusters (no. 1 and 5) are irregular compared with the rest, so they were assigned to their own clusters.
In this vignette, I showed how simple it is to use arbitrary functions for feature extraction from time series in order to create your own time series representations alongside the methods already implemented in the TSrepr package.