
Feature Extraction for Time Series, from Theory to Practice, with Python

Here's everything you need to know when extracting features for Time Series analysis

Photo by Harman Sandhu on Unsplash

Time series are a special animal.

When I started my Machine Learning career I did it because I loved Physics (weird reason to start Machine Learning) and from Physics I understood that I also loved coding and Data Science a lot. I didn’t really care about the type of data. All I wanted was to be in front of a computer writing 10k lines of code per day.

The truth is that, even when you don’t care (I still really don’t), your career will drift you toward some kinds of data rather than others.

If you work at SpaceX, you probably won’t do a lot of NLP but you will do a lot of signal processing. If you work at Netflix, you might end up working with a lot of NLP and recommendation systems. If you work at Tesla you will most definitely be a Computer Vision expert and work with images.

When I started as a Physicist, and then kept going with my PhD in Engineering, I was immediately thrown into the world of signals. This is just the natural world of engineering: every time you have a setup and extract information from it, at the end of the day, you are treating a signal. Don’t get me wrong, engineering is not the only world where signals are the celebrities of the movie. Another very famous example is finance, where stock price time series (price versus time) are signals too. But if for whatever reason you are dealing with signals, you should remember the first sentence of this blog post:

Time series are a special animal.

This means that a lot of transformation/operation/processing techniques that you would do with tabular data or images have another meaning (if they even have a meaning) for time series. Let’s take feature extraction, for example.

The idea of "feature extraction" is to "work" on the data that we have and make sure that we extract all the meaningful features that we can so that the next step (typically the Machine Learning application) can benefit from them. In other words, it is a way of "helping" the machine learning step by feeding important features and filtering out all the less important ones.

This is the full feature extraction process:

Image made by author

Now, when we consider feature extractors for, let’s say, tabular data and signals we are playing two completely different sports.

For example, the concept of peak and valley, the idea of Fourier Transform or Wavelet Transform, and the concept of Independent Component Analysis (ICA) only really make sense when dealing with signals. I’m doing all this talking and showing just to convince you that there is a set of feature extraction techniques that only belong to signals.

Now, there are two macro classes of methods for feature extraction:

  • Data-driven methods: These methods aim to extract features by just looking at the signals. We ignore the Machine Learning step and its goal (e.g. classification, forecasting, or regression) and we only look at the signal, work on it, and extract information from it.
  • Model-based methods: These methods look at the whole pipeline and aim to find the features for that specific problem to solve.

The pros of data-driven methods are that they are usually computationally simple to use and don’t require the corresponding target output. The con is that the features are not specific to your problem: for example, using the Fourier Transform of a signal as a feature might be suboptimal compared to using features learned in an end-to-end model.

For the sake of this blog post, we’ll talk about data-driven methods only. In particular, we’ll talk about domain-specific methods, frequency-based methods, time-based methods, and statistics-based methods. Let’s get started!

1. Domain Specific Feature Extraction

The first one I’m going to describe is a little bit intentionally vague. The reality is that the best way to extract features is to consider the specific problem that you are facing. For example, let’s say you are dealing with a signal from an engineering experiment and you really care about the amplitude after t = 6s. Those are cases where the feature extraction doesn’t really make sense in general (for a random case t=6s might not be more special than t =10s) but it’s actually extremely relevant for your case. That is what we mean by domain-specific feature extraction. I know this is not a lot of math and coding, but this is what is meant to be as it is extremely dependent on your specific situation.

2. Frequency based Feature Extraction

2.1 Explanation

This method is related to the spectral analysis of our time series/signal. What do we mean by that? Every signal has a natural domain, which is the simplest way to look at it: the time domain, meaning that we consider the signal as a value (or vector) at a given time.

For example, let’s consider this signal, in its natural domain:

If we plot it we get this:

Image made by author

This is the natural (time) domain, and it is the simplest domain of our dataset. We can convert it into the frequency domain. As we saw in the symbolic expression, our signal has three periodic components. The idea of the frequency domain is to decompose the signal into its periodic components’ frequencies, amplitudes, and phases.

The Fourier Transform Y(k) of the signal y(t) is the following:

Image made by author
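
For a discrete signal with N samples, the transform above can be written explicitly (the standard discrete Fourier transform, stated here for reference):

```latex
Y(k) = \sum_{n=0}^{N-1} y(n)\, e^{-2\pi i \, k n / N}, \qquad k = 0, 1, \dots, N-1
```

The amplitude of the k-th component is then |Y(k)| (scaled by 2/N for a one-sided spectrum), and its phase is arg Y(k).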

This describes the amplitude and phase of the component with frequency k. In terms of extracting meaningful features, we can extract the amplitudes, phases, and frequency values for the 10 main components (the ones with the highest amplitudes). These will be 10×3 features (amplitude, frequency, and phase × 10) that describe your time series based on its spectral information.

Now, this method can be expanded. For example, we can decompose our signal not based on sine/cosine functions but based on wavelets, which are another form of periodic wave. That kind of decomposition is called the **Wavelet Decomposition**.

I understand this is a lot to digest, so let’s start with the coding part to show you what I mean…

2.2 Code

Now, let’s build it in real life. Let’s start with the very simple Fourier Transform.

First we need to invite some friends to the party:
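The original import cell isn’t shown in this copy; a minimal set for what follows would be (matplotlib only for the optional plots):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.fft import rfft, rfftfreq
```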

Now let’s take this signal as an example:
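The original code cell isn’t reproduced here; a sketch consistent with the component values described just below (the sampling rate and duration are my own assumptions):

```python
import numpy as np

fs = 100                       # sampling frequency in Hz (an assumption)
t = np.arange(0, 10, 1 / fs)   # 10 seconds of signal
y = (1.0 * np.sin(2 * np.pi * 1.0 * t)     # amplitude 1.0 at 1.0 Hz
     + 0.4 * np.sin(2 * np.pi * 2.0 * t)   # amplitude 0.4 at 2.0 Hz
     + 2.0 * np.sin(2 * np.pi * 3.2 * t))  # amplitude 2.0 at 3.2 Hz
```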

This signal has three major components. One with amplitude = 1 and frequency = 1, one with amplitude = 0.4 and frequency = 2 and one with amplitude = 2 and frequency = 3.2. We can recover them by running the Fourier Transform:
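A minimal version of that recovery, reusing the signal above (the fancy plotting code is omitted; with 10 s of signal, every component falls exactly on a 0.1 Hz frequency bin):

```python
import numpy as np
from scipy.fft import rfft, rfftfreq

fs = 100
t = np.arange(0, 10, 1 / fs)
y = (1.0 * np.sin(2 * np.pi * 1.0 * t)
     + 0.4 * np.sin(2 * np.pi * 2.0 * t)
     + 2.0 * np.sin(2 * np.pi * 3.2 * t))

Y = rfft(y)                                # one-sided spectrum
freqs = rfftfreq(len(y), d=1 / fs)
amps = 2 * np.abs(Y) / len(y)              # scale back to component amplitudes

for k in np.argsort(amps)[-3:][::-1]:      # three strongest bins
    print(f"freq = {freqs[k]:.1f} Hz, amplitude = {amps[k]:.2f}")
# → freq = 3.2 Hz, amplitude = 2.00
#   freq = 1.0 Hz, amplitude = 1.00
#   freq = 2.0 Hz, amplitude = 0.40
```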

We can clearly see three peaks with the corresponding amplitudes and frequencies.

Now, we don’t really need any fancy plotting (that was just to show that this method works), but we can just do everything with a very simple function, which would be this one:

So you give me the signal y and (optionally):

  • the x or time array
  • the number of features (or peaks) to consider
  • the largest frequency that you are willing to explore
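
The original helper isn’t reproduced in this copy; a sketch matching that description (the function and parameter names are my own) could look like this:

```python
import numpy as np
from scipy.fft import rfft, rfftfreq

def fourier_features(y, x=None, n_features=10, max_freq=None):
    """Frequencies, amplitudes and phases of the n_features
    strongest spectral peaks of the signal y."""
    if x is None:
        x = np.arange(len(y))              # assume a unit time step
    Y = rfft(y)
    freqs = rfftfreq(len(y), d=x[1] - x[0])
    amps = 2 * np.abs(Y) / len(y)          # one-sided amplitude spectrum
    phases = np.angle(Y)
    if max_freq is not None:               # drop frequencies we don't care about
        keep = freqs <= max_freq
        freqs, amps, phases = freqs[keep], amps[keep], phases[keep]
    top = np.argsort(amps)[-n_features:][::-1]   # strongest components first
    return freqs[top], amps[top], phases[top]

# Example on the three-component signal from above
fs = 100
t = np.arange(0, 10, 1 / fs)
y = (1.0 * np.sin(2 * np.pi * 1.0 * t)
     + 0.4 * np.sin(2 * np.pi * 2.0 * t)
     + 2.0 * np.sin(2 * np.pi * 3.2 * t))
f, a, p = fourier_features(y, x=t, n_features=3)
```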

Running it on the example signal returns the dominant frequencies, together with their amplitudes and phases, sorted by amplitude.

If we want to extract features using wavelets* (not sines/cosines), we can do the wavelet transform. We would need to install this guy:

pip install PyWavelets

And then run this:
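The original snippet isn’t shown here; a minimal sketch using PyWavelets’ `wavedec` (the wavelet family `"db4"` and the decomposition level are my own, arbitrary but common, choices) might look like this:

```python
import numpy as np
import pywt

fs = 100
t = np.arange(0, 10, 1 / fs)
y = (1.0 * np.sin(2 * np.pi * 1.0 * t)
     + 0.4 * np.sin(2 * np.pi * 2.0 * t)
     + 2.0 * np.sin(2 * np.pi * 3.2 * t))

# Multi-level discrete wavelet decomposition:
# one approximation plus `level` detail coefficient arrays
coeffs = pywt.wavedec(y, "db4", level=4)

# A simple set of wavelet features: the energy at each decomposition level
features = [float(np.sum(c ** 2)) for c in coeffs]
```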

*I talk about wavelets in detail in this article here. Give a look to learn more about those majestic beasts 🙂

3. Statistical based Feature Extraction

3.1 Explanation

Another approach to feature extraction is to rely on good old statistics. Given a signal, there are multiple things one can do to extract some statistical information out of it. In order from simple to complex, this is a list of information we can extract:

  • The mean is nothing but the sum of the signal’s values divided by the number of time steps. Super simple.
  • The variance, that is, how much the signal varies from its mean value.
  • Skewness and Kurtosis. These are metrics that test how "not Gaussian" the distribution of your time series is. Skewness describes how asymmetric it is, and kurtosis describes how "tailed" it is.
  • Quantiles: these are the values that divide the time series into intervals with given probability ranges. For example, a 0.25 quantile with value = 10 means that 25% of the values in your time series are below 10 and the remaining 75% are larger than 10.
  • Autocorrelation. This basically tells you how "patterned" your time series is, meaning the intensity of the patterns within it. More formally, this metric indicates how much the time series values are correlated with their own past values.
  • Entropy: represents the complexity or unpredictability of the time series. I did a whole blog post about it here.

In 2024 each one of these properties can be implemented with one line of code (what a time to be alive). This is how:
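As a sketch (the stand-in signal and the histogram bin count are my own choices), each statistic maps to roughly one line of NumPy/SciPy:

```python
import numpy as np
from scipy.stats import skew, kurtosis, entropy

rng = np.random.default_rng(0)
y = rng.normal(size=1000)                      # stand-in time series

mean = np.mean(y)                              # mean
variance = np.var(y)                           # variance
skewness = skew(y)                             # asymmetry of the distribution
kurt = kurtosis(y)                             # "tailedness" (0 for a Gaussian)
q25, q50, q75 = np.quantile(y, [0.25, 0.5, 0.75])   # quantiles
autocorr = np.corrcoef(y[:-1], y[1:])[0, 1]    # lag-1 autocorrelation
hist, _ = np.histogram(y, bins=30)             # value distribution...
ent = entropy(hist)                            # ...and its Shannon entropy
```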

4. Time based Feature Extraction

4.1 Explanation

In this specific section, we will focus on how to extract information from a Time Series by working directly in the time domain. In particular, we will extract the information about the peaks and valleys. To do so, we will use the find_peaks function from SciPy.

There are multiple parameters that are useful to know when extracting the peaks of a signal, such as the expected width, the expected threshold, or the plateau size. If you have any of this information (for example, you only want to consider peaks with amplitude > 2 for some reason) you can tweak the parameters; otherwise, you can just leave everything at its default.

We also get to decide how many peaks/features we want. For example, we might want N = 10: only the 10 largest peaks and valleys. Just keep in mind that if we do so, we actually have N×2 = 20 features (10 locations and 10 amplitudes).

4.2 Code

As always, the code to do so is fairly easy:
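A sketch of such a helper (the zero-padding behavior follows the description in the text; the function and parameter names are my own):

```python
import numpy as np
from scipy.signal import find_peaks

def peak_features(y, n_peaks=10):
    """Sample indices and amplitudes of the n_peaks largest peaks and
    valleys, zero-padded when the signal has fewer extrema than n_peaks."""
    peaks, _ = find_peaks(y)                 # local maxima
    valleys, _ = find_peaks(-y)              # local minima
    idx = np.concatenate([peaks, valleys]).astype(int)
    order = np.argsort(np.abs(y[idx]))[::-1][:n_peaks]  # largest |amplitude| first
    idx = idx[order]
    locs = np.zeros(n_peaks)
    amps = np.zeros(n_peaks)
    locs[:len(idx)] = idx
    amps[:len(idx)] = y[idx]
    return locs, amps

# Two full sine periods -> 2 peaks and 2 valleys; the rest is zero-padded
locs, amps = peak_features(np.sin(np.linspace(0, 4 * np.pi, 200)), n_peaks=10)
```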

Keep in mind that, if we select N = 10 peaks but the signal only really has M = 4 peaks, the remaining 6 locations and peak amplitudes will be 0.

5. Which method to use?

Ok, so we have seen 4 different classes of methods.

Which one should we use?

I’m not going to hit you with the diplomatic "It depends on the problem," because, of course, it always depends on the problem.

The truth of the matter is that if you have a domain-based feature extraction, that is always your best bet: if the physics of the experiment or the prior knowledge of the problem is clear, you should rely on that and consider those features as the most important ones, maybe even consider them as the only ones. Sometimes (a lot of times) you don’t have the domain-based features, and that’s ok.

As for the frequency-based, statistical, and time-based features, you should, in my opinion, use them all together. Add those features to your dataset and then see whether they help, don’t help, or actually confuse your machine learning model.

6. Conclusions

Thank you very much for spending time with me. I would like to take this space to summarize everything that we have done.

  1. We introduced the idea of feature extraction. We explained why it’s important and why it’s important to know the specific techniques for time series
  2. We explained the difference between model-based and data-driven feature extraction techniques. Model-based techniques are feature extraction techniques that are trained end to end. In this blog post we focused on data-driven techniques, which are performed independently of the given task
  3. We discussed the domain based feature extraction techniques, which are specific techniques that stem from the specific problem of interest
  4. We discussed the spectral techniques, which are techniques that involve the Fourier/frequency spectrum of the signal
  5. We discussed the statistical techniques, which extract values like mean, std, entropy, and autocorrelation from the signals
  6. We discussed the time based techniques, which extract the peak information from the signal
  7. We briefly gave an idea of which technique to adopt for your specific case

7. About me!

Thank you again for your time. It means a lot ❤

My name is Piero Paialunga and I’m this guy here:

Image made by author

I am a Ph.D. candidate at the University of Cincinnati Aerospace Engineering Department and a Machine Learning Engineer for Gen Nine. I talk about AI, and Machine Learning in my blog posts and on Linkedin. If you liked the article and want to know more about machine learning and follow my studies you can:

A. Follow me on Linkedin, where I publish all my stories
B. Subscribe to my newsletter. It will keep you updated about new stories and give you the chance to text me to receive all the corrections or doubts you may have.
C. Become a referred member, so you won’t have any "maximum number of stories for the month" and you can read whatever I (and thousands of other Machine Learning and Data Science top writers) write about the newest technology available.
D. Want to work with me? Check my rates and projects on Upwork!

If you want to ask me questions or start a collaboration, leave a message here or on Linkedin:

[email protected]

