How to Optimize your Python Program for Slowness

Write a short program that finishes after the universe dies


Also available: A Rust version of this article.

Everyone talks about making Python programs faster [1, 2, 3], but what if we pursue the opposite goal? Let’s explore how to make them slower — absurdly slower. Along the way, we’ll examine the nature of computation, the role of memory, and the scale of unimaginably large numbers.

Our guiding challenge: write short Python programs that run for an extraordinarily long time.

To do this, we’ll explore a sequence of rule sets — each one defining what kind of programs we’re allowed to write, by placing constraints on halting, memory, and program state. This sequence isn’t a progression, but a series of shifts in perspective. Each rule set helps reveal something different about how simple code can stretch time.

Here are the rule sets we’ll investigate:

  1. Anything Goes — Infinite Loop
  2. Must Halt, Finite Memory — Nested, Fixed-Range Loops
  3. Infinite, Zero-Initialized Memory — 5-State Turing Machine
  4. Infinite, Zero-Initialized Memory — 6-State Turing Machine (>10↑↑15 steps)
  5. Infinite, Zero-Initialized Memory — Plain Python (compute 10↑↑15 without Turing machine emulation)

Aside: 10↑↑15 is not a typo or a double exponent. It’s a number so large that “exponential” and “astronomical” don’t describe it. We’ll define it in Rule Set 4.

We start with the most permissive rule set. From there, we’ll change the rules step by step to see how different constraints shape what long-running programs look like — and what they can teach us.

Rule Set 1: Anything Goes — Infinite Loop

We begin with the most permissive rules: the program doesn’t need to halt, can use unlimited memory, and can contain arbitrary code.

If our only goal is to run forever, the solution is immediate:

while True:
  pass

This program is short, uses negligible memory, and never finishes. It satisfies the challenge in the most literal way — by doing nothing forever.

Of course, it’s not interesting — it does nothing. But it gives us a baseline: if we remove all constraints, infinite runtime is trivial. In the next rule set, we’ll introduce our first constraint: the program must eventually halt. Let’s see how far we can stretch the running time under that new requirement — using only finite memory.

Rule Set 2: Must Halt, Finite Memory — Nested, Fixed-Range Loops

If we want a program that runs longer than the universe will survive and then halts, it’s easy. Just write two nested loops, each counting over a fixed range from 0 to 10¹⁰⁰−1:

for a in range(10**100):
  for b in range(10**100):
      if b % 10_000_000 == 0:
          print(f"{a:,}, {b:,}")

You can see that this program halts after 10¹⁰⁰ × 10¹⁰⁰ steps. That’s 10²⁰⁰. And — ignoring the print—this program uses only a small amount of memory to hold its two integer loop variables—just 144 bytes.
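That 144-byte figure matches what CPython itself reports for two such integers. A quick check (a sketch; exact sizes can vary slightly by Python version and platform):

import sys

a = 10**100 - 1  # the largest value a loop variable reaches
print(sys.getsizeof(a))      # 72 bytes per variable in CPython on 64-bit
print(2 * sys.getsizeof(a))  # 144 bytes for the two loop variables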

My desktop computer runs this program at about 14 million steps per second. But suppose it could run at Planck speed — one step per unit of Planck time, the smallest meaningful unit of time in physics. That would be about 10⁵⁰ steps per year — so 10¹⁵⁰ years to complete.

Current cosmological models estimate the heat death of the universe in 10¹⁰⁰ years, so our program will run about 100,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 times longer than the projected lifetime of the universe.

Aside: Practical concerns about running a program beyond the end of the universe are outside the scope of this article.

For an added margin, we can use more memory. Instead of 144 bytes for variables, let’s use 64 gigabytes — about what you’d find in a well-equipped personal computer. That’s about 500 million times more memory, which gives us about one billion variables instead of 2. If each variable iterates over the full 10¹⁰⁰ range, the total number of steps becomes roughly 10¹⁰⁰^(10⁹), or about 10^(100 billion) steps. At Planck speed — roughly 10⁵⁰ steps per year — that corresponds to 10^(100 billion − 50) years of computation.
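To see how this generalizes beyond two variables, here is a sketch — my own illustration, not from the article — that emulates d nested loops with recursion. With depth=2 it mirrors the program above; pushing depth toward one billion gives the 10^(100 billion) step count just described (though CPython's call stack can't actually hold a billion frames):

def nested_count(depth, bound=10**100):
    # Emulates `depth` nested for-loops, each running `bound` times:
    # roughly bound**depth steps in total.
    if depth == 0:
        return
    for _ in range(bound):
        nested_count(depth - 1, bound)

nested_count(2)  # the two-loop program above, as a special case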


Can we do better? Well, if we allow an unrealistic but interesting rule change, we can do much, much better.

Rule Set 3: Infinite, Zero-Initialized Memory — 5-State Turing Machine

What if we allow infinite memory — so long as it starts out entirely zero?

Aside: Why don’t we allow infinite, arbitrarily initialized memory? Because it trivializes the challenge. For example, you could mark a single byte far out in memory with a 0x01—say, at position 10¹²⁰—and write a tiny program that just scans until it finds it. That program would take an absurdly long time to run — but it wouldn’t be interesting. The slowness is baked into the data, not the code. We’re after something deeper: small programs that generate their own long runtimes from simple, uniform starting conditions.

My first idea was to use the memory to count upward in binary:

0
1
10
11
100
101
110
111
...

We can do that — but how do we know when to stop? If we don’t stop, we’re violating the “must halt” rule. So, what else can we try?

Let’s take inspiration from the father of Computer Science, Alan Turing. We’ll program a simple abstract machine — now known as a Turing machine — under the following constraints:

  • The machine has infinite memory, laid out as a tape that extends endlessly in both directions. Each cell on the tape holds a single bit: 0 or 1.
  • A read/write head moves across the tape. On each step, it reads the current bit, writes a new bit (0 or 1), and moves one cell left or right.
A read/write head positioned on an infinite tape.
  • The machine also has an internal variable called state, which can hold one of n values. For example, with 5 states, we might name the possible values A, B, C, D, and E—plus a special halting state H, which we don’t count among the five. The machine always starts in the first state, A.

We can express a full Turing machine program as a transition table. Here’s an example we’ll walk through step by step.

A 5-state Turing machine transition table.
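For reference, here is that table in text form — it is the well-known 5-state busy beaver champion, written in the same row/column format used for the 6-state machine later in this article:

    A    B    C    D    E
0   1RB  1RC  1RD  1LA  1RH
1   1LC  1RB  0LE  1LD  0LA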
  • Each row corresponds to the current tape value (0 or 1).
  • Each column corresponds to the current state (A through E).
  • Each entry in the table tells the machine what to do next:
    • The first character is the bit to write (0 or 1)
    • The second is the direction to move (L for left, R for right)
    • The third is the next state to enter (A, B, C, D, E, or H, where H is the special halting state).

Now that we’ve defined the machine, let’s see how it behaves over time.

We’ll refer to each moment in time — the full configuration of the machine and tape — as a step. This includes the current tape contents, the head position, and the machine’s internal state (like A, B, or H).

Below is Step 0. The head is pointing to a 0 on the tape, and the machine is in state A.

Looking at row 0, column A in the program table, we find the instruction 1RB. That means:

  • Write 1 to the current tape cell.
  • Move the head Right.
  • Enter state B.

This puts us in Step 1: the machine is now in state B, with the head over the next tape cell (again 0).

What will happen if we let this Turing machine keep running? It will run for exactly 47,176,870 steps — and then halt. 

Aside: With a Google sign-in, you can run this yourself via a Python notebook on Google Colab. Alternatively, you can copy and run the notebook locally on your own computer by downloading it from GitHub.
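If you'd rather see the idea than open the notebook, here is a minimal interpreter sketch — my own illustration, not the article's notebook code — that runs the 5-state champion above to completion:

from collections import defaultdict

# Transition table of the 5-state champion: (state, bit) -> (write, move, next).
# R = +1, L = -1; "H" is the halting state.
PROGRAM = {
    ("A", 0): (1, +1, "B"), ("A", 1): (1, -1, "C"),
    ("B", 0): (1, +1, "C"), ("B", 1): (1, +1, "B"),
    ("C", 0): (1, +1, "D"), ("C", 1): (0, -1, "E"),
    ("D", 0): (1, -1, "A"), ("D", 1): (1, -1, "D"),
    ("E", 0): (1, +1, "H"), ("E", 1): (0, -1, "A"),
}

tape = defaultdict(int)  # infinite, zero-initialized tape
state, head, steps = "A", 0, 0
while state != "H":
    write, move, state = PROGRAM[(state, tape[head])]
    tape[head] = write
    head += move
    steps += 1
print(f"{steps:,}")  # 47,176,870

It takes a minute or two in pure Python, but it does halt.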

That number 47,176,870 is astonishing on its own, but seeing the full run makes it more tangible. We can visualize the execution using a space-time diagram, where each row shows the tape at a single step, from top (earliest) to bottom (latest). In the image:

  • The first row is blank — it shows the all-zero tape before the machine takes its first step.
  • 1s are shown in orange.
  • 0s are shown in white.
  • Light orange appears where 0s and 1s are so close together they blend.
Space-time diagram for the champion 5-state Turing machine. It runs for 47,176,870 steps before halting. Each row shows the tape at a single step, starting from the top. Orange represents 1, white represents 0.

In 2024, an online group of amateur researchers organized through bbchallenge.org proved that this is the longest-running 5-state Turing machine that eventually halts.


Want to see this Turing machine in motion? You can watch the full 47-million-step execution unfold in this pixel-perfect video:

Or interact with it directly using the Busy Beaver Blaze web app.

The video generator and web app are part of busy-beaver-blaze, the open-source Python & Rust project that accompanies this article.


It’s hard to believe that such a small machine can run 47 million steps and still halt. But it gets even more astonishing: the team at bbchallenge.org found a 6-state machine with a runtime so long it can’t even be written with ordinary exponents.

Rule Set 4: Infinite, Zero-Initialized Memory — 6-State Turing Machine (>10↑↑15 steps)

As of this writing, the longest running (but still halting) 6-state Turing machine known to humankind is:

    A    B    C    D    E    F
0   1RB  1RC  1LC  0LE  1LF  0RC
1   0LD  0RF  1LA  1RH  0RB  0RE

Here is a video showing its first 10 trillion steps:

And here you can run it interactively via a web app.

So, if we are patient — comically patient — how long will this Turing machine run? More than 10↑↑15 steps, where "10↑↑15" means a tower of fifteen 10s stacked by exponentiation:

10↑↑15 = 10^(10^(10^(⋯^10))), with 15 tens in the tower.

This is not the same as 10¹⁵ (which is just a regular exponent). Instead:

  • 10¹ = 10
  • 10¹⁰ = 10,000,000,000
  • 10^10^10 = 10^(10,000,000,000), already unimaginably large.
  • 10↑↑4 is so large that it vastly exceeds the number of atoms in the observable universe.
  • 10↑↑15 is so large that writing it in exponent notation becomes annoying.

Pavel Kropitz announced this 6-state machine on May 30, 2022. Shawn Ligocki has a great write-up explaining both his and Pavel’s discoveries. To prove that these machines run so long and then halt, researchers used a mix of analysis and automated tools. Rather than simulating every step, they identified repeating structures and patterns that could be proven — using formal, machine-verified proofs — to eventually lead to halting.


Up to this point, we’ve been talking about Turing machines — specifically, the longest-known 5- and 6-state machines that eventually halt. We ran the 5-state champion to completion and watched visualizations to explore its behavior. But the discovery that it’s the longest halting machine with 5 states — and the identification of the 6-state contender — came from extensive research and formal proofs, not from running them step-by-step.

That said, the Turing machine interpreter I built in Python can run for millions of steps, and the visualizer written in Rust can handle trillions (see GitHub). But even 10 trillion steps isn’t an atom in a drop of water in the ocean compared to the full runtime of the 6-state machine. And running it that far doesn’t get us any closer to understanding why it runs so long.

Aside: Python and Rust “interpreted” the Turing machines up to some point — reading their transition tables and applying the rules step by step. You could also say they “emulated” them, in that they reproduced their behavior exactly. I avoid the word “simulated”: a simulated elephant isn’t an elephant, but a simulated computer is a computer.

Returning to our central challenge: we want to understand what makes a short program run for a long time. Instead of analyzing these Turing machines, let’s construct a Python program whose 10↑↑15 runtime is clear by design.

Rule Set 5: Infinite, Zero-Initialized Memory — Plain Python (compute 10↑↑15 without Turing machine emulation)

Our challenge is to write a small Python program that runs for at least 10↑↑15 steps, using any amount of zero-initialized memory.

To achieve this, we’ll compute the value of 10↑↑15 in a way that guarantees the program takes at least that many steps. The ↑↑ operator is called tetration—recall from Rule Set 4 that ↑↑ stacks exponents: for example, 10↑↑3 means 10^(10^10). It’s an extremely fast-growing function. We will program it from the ground up.

Rather than rely on built-in operators, we’ll define tetration from first principles:

  • Tetration, implemented by the function tetrate, as repeated exponentiation
  • Exponentiation, via exponentiate, as repeated multiplication
  • Multiplication, via multiply, as repeated addition
  • Addition, via add, as repeated increment

Each layer builds on the one below it, using only zero-initialized memory and in-place updates.

We’ll begin at the foundation — with the simplest operation of all: increment.

Increment

Here’s our definition of increment and an example of its use:

from gmpy2 import xmpz

def increment(acc_increment):
  assert is_valid_accumulator(acc_increment), "not a valid accumulator"
  acc_increment += 1

def is_valid_accumulator(acc):
  return isinstance(acc, xmpz) and acc >= 0  

b = xmpz(4)
print(f"++{b} = ", end="")
increment(b)
print(b)
assert b == 5

Output:

++4 = 5

We’re using xmpz, a mutable arbitrary-precision integer type provided by the gmpy2 library. It behaves like Python’s built-in int in terms of numeric range—limited only by memory—but unlike int, it supports in-place updates.

To stay true to the spirit of a Turing machine and to keep the logic minimal and observable, we restrict ourselves to just a few operations:

  • Creating an integer with value 0 (xmpz(0))
  • In-place increment (+= 1) and decrement (-= 1)
  • Comparing with zero

All arithmetic is done in-place, with no copies and no temporary values. Each function in our computation chain modifies an accumulator directly. Most functions also take an input value a, but increment—being the most basic—does not. We use descriptive names like acc_increment, add_acc, and so on to make the operation clear and to support later functions where multiple accumulators will appear together.

Aside: Why not use Python’s built-in int type? It supports arbitrary precision and can grow as large as your memory allows. But it’s also immutable, meaning any update like += 1 creates a new integer object. Even if you think you’re modifying a large number in place, Python is actually copying all of its internal memory—no matter how big it is.
For example:

x = 10**100
y = x
x += 1
assert x == 10**100 + 1 and y == 10**100

Even though x and y start out identical, x += 1 creates a new object—leaving y unchanged. This behavior is fine for small numbers, but it violates our rules about memory use and in-place updates. That’s why we use gmpy2.xmpz, a mutable arbitrary-precision integer that truly supports efficient, in-place changes.

Addition

With increment defined, we next define addition as repeated incrementing.

def add(a, add_acc):
  assert is_valid_other(a), "not a valid other"
  assert is_valid_accumulator(add_acc), "not a valid accumulator"
  for _ in range(a):
      add_acc += 1

def is_valid_other(a):
  return isinstance(a, int) and a >= 0      

a = 2
b = xmpz(4)
print(f"Before: id(b) = {id(b)}")
print(f"{a} + {b} = ", end="")
add(a, b)
print(b)
print(f"After:  id(b) = {id(b)}")  # ← compare object IDs
assert b == 6

Output:

Before: id(b) = 2082778466064
2 + 4 = 6
After:  id(b) = 2082778466064

The function adds a to add_acc by incrementing add_acc one step at a time, a times. The before and after ids are the same, showing that no new object was created—add_acc was truly updated in place.

Aside: You might wonder why add doesn’t just call our increment function. We could write it that way—but we’re deliberately inlining each level by hand. This keeps all loops visible, makes control flow explicit, and helps us reason precisely about how much work each function performs.

Even though gmpy2.xmpz supports direct addition, we don’t use it. We’re working at the most primitive level possible—incrementing by 1—to keep the logic simple, intentionally slow, and to make the amount of work explicit.

As with acc_increment, we update add_acc in place, with no copying or temporary values. The only operation we use is += 1, repeated a times.

Next, we define multiplication.

Multiplication

With addition in place, we can now define multiplication as repeated addition. Here’s the function and example usage. Unlike add and increment, this one builds up a new xmpz value from zero and returns it.

def multiply(a, multiply_acc):
  assert is_valid_other(a), "not a valid other"
  assert is_valid_accumulator(multiply_acc), "not a valid accumulator"

  add_acc = xmpz(0)
  for _ in count_down(multiply_acc):
      for _ in range(a):
          add_acc += 1
  return add_acc

def count_down(acc):
  assert is_valid_accumulator(acc), "not a valid accumulator"
  while acc > 0:
      acc -= 1
      yield

a = 2
b = xmpz(4)
print(f"{a} * {b} = ", end="")
c = multiply(a, b)
print(c)
assert c == 8
assert b == 0

Output:

2 * 4 = 8

This multiplies a by the value of multiply_acc by adding a to add_acc once for every time multiply_acc can be decremented. The result is returned and then assigned to c. The original multiply_acc is decremented to zero and consumed in the process.

You might wonder what this line does:

for _ in count_down(multiply_acc):

While xmpz technically works with range(), doing so converts it to a standard Python int, which is immutable. That triggers a full copy of its internal memory—an expensive operation for large values. Worse, each decrement step would involve allocating a new integer and copying all previous bits, so what should be a linear loop ends up doing quadratic total work. Our custom count_down() avoids all that by decrementing in place, yielding control without copying, and maintaining predictable memory use.

We’ve built multiplication from repeated addition. Now it’s time to go a layer further: exponentiation.

Exponentiation

We define exponentiation as repeated multiplication. As before, we perform all work using only incrementing, decrementing, and in-place memory. As with multiply, the final result is returned while the input accumulator is consumed.

Here’s the function and example usage:

def exponentiate(a, exponentiate_acc):
  assert is_valid_other(a), "not a valid other"
  assert is_valid_accumulator(exponentiate_acc), "not a valid accumulator"
  assert a > 0 or exponentiate_acc != 0, "0^0 is undefined"

  multiply_acc = xmpz(0)
  multiply_acc += 1
  for _ in count_down(exponentiate_acc):
      add_acc = xmpz(0)
      for _ in count_down(multiply_acc):
          for _ in range(a):
              add_acc += 1
      multiply_acc = add_acc
  return multiply_acc


a = 2
b = xmpz(4)
print(f"{a}^{b} = ", end="")
c = exponentiate(a, b)
print(c)
assert c == 16
assert b == 0

Output:

2^4 = 16

This raises a to the power of exponentiate_acc, using only incrementing, decrementing, and loop control. We initialize multiply_acc to 1 with a single increment—because repeatedly multiplying from zero would get us nowhere. Then, for each time exponentiate_acc can be decremented, we multiply the current result (multiply_acc) by a. As with the earlier layers, we inline the multiply logic directly instead of calling the multiply function—so the control flow and step count stay fully visible.

Aside: And how many times is += 1 called? Obviously at least 2⁴ times—because our result is 2⁴, and we reach it by incrementing from zero. More precisely, the number of increments is:

• 1 increment — initializing multiply_acc to one

Then we loop four times, and in each loop, we multiply the current value of multiply_acc by a = 2, using repeated addition:

• 2 increments — for multiply_acc = 1, add 2 once
• 4 increments — for multiply_acc = 2, add 2 twice
• 8 increments — for multiply_acc = 4, add 2 four times
• 16 increments — for multiply_acc = 8, add 2 eight times

That’s a total of 1 + 2 + 4 + 8 + 16 = 31 increments, which is 2⁵−1. In general, computing 2^b this way takes 2^(b+1)−1 increments, and for a general base a it takes 1 + a + a² + ⋯ + a^b = (a^(b+1)−1)/(a−1). The number of calls to increment is exponential, but it is not the same exponential that we are computing.

With exponentiation defined, we’re ready for the top of our tower: tetration.

Tetration

Here’s the function and example usage:

def tetrate(a, tetrate_acc):
  assert is_valid_other(a), "not a valid other"
  assert is_valid_accumulator(tetrate_acc), "not a valid accumulator"
  assert a > 0, "we don't define 0↑↑b"

  exponentiate_acc = xmpz(0)
  exponentiate_acc += 1
  for _ in count_down(tetrate_acc):
      multiply_acc = xmpz(0)
      multiply_acc += 1
      for _ in count_down(exponentiate_acc):
          add_acc = xmpz(0)
          for _ in count_down(multiply_acc):
              for _ in range(a):
                  add_acc += 1
          multiply_acc = add_acc
      exponentiate_acc = multiply_acc
  return exponentiate_acc


a = 2
b = xmpz(3)
print(f"{a}↑↑{b} = ", end="")
c = tetrate(a, b)
print(c)
assert c == 16  # 2^(2^2)
assert b == 0   # Confirm tetrate_acc is consumed

Output:

2↑↑3 = 16

This computes a ↑↑ tetrate_acc, meaning it exponentiates a by itself repeatedly, tetrate_acc times.

For each decrement of tetrate_acc, we exponentiate the current value. We in-line the entire exponentiate and multiply logic again, all the way down to repeated increments.

As expected, this computes 2^(2^2) = 16. With a Google sign-in, you can run this yourself via a Python notebook on Google Colab. Alternatively, you can copy the notebook from GitHub and then run it on your own computer.

We can also run tetrate on 10↑↑15. It will start running, but it won’t stop during our lifetimes — or even the lifetime of the universe:

a = 10
b = xmpz(15)
print(f"{a}↑↑{b} = ", end="")
c = tetrate(a, b)
print(c)

Let’s compare this tetrate function to what we found in the previous Rule Sets.

Rule Set 1: Anything Goes — Infinite Loop

Recall our first function:

while True:
  pass

Unlike this infinite loop, our tetrate function eventually halts — though not anytime soon.

Rule Set 2: Must Halt, Finite Memory — Nested, Fixed-Range Loops

Recall our second function:

for a in range(10**100):
  for b in range(10**100):
      if b % 10_000_000 == 0:
          print(f"{a:,}, {b:,}")

Both this function and our tetrate function contain a fixed number of nested loops. But tetrate differs in an important way: in this function, each loop runs from 0 to 10¹⁰⁰−1 — a hardcoded bound. Tetrate’s loop bounds, in contrast, are dynamic — they grow explosively with each layer of computation.

Rule Sets 3 & 4: Infinite, Zero-Initialized Memory — 5- and 6-State Turing Machines

Compared to the Turing machines, our tetrate function has a clear advantage: we can directly see that it will call += 1 more than 10↑↑15 times. Even better, we can also see — by construction — that it halts.

What the Turing machines offer instead is a simpler, more universal model of computation — and perhaps a more principled definition of what counts as a “small program.”

Conclusion

So, there you have it — a journey through writing absurdly slow programs. Along the way, we explored the outer edges of computation, memory, and performance, using everything from deeply nested loops to Turing machines to a hand-inlined tetration function.

Here’s what surprised me:

  • Nested loops are enough.
    If you just want a short program that halts after outliving the universe, two nested loops with 144 bytes of memory will do the job. I hadn’t realized it was that simple.
  • Turing machines escalate fast.
    The jump from 5 to 6 states unleashes a dramatic leap in complexity and runtime. Also, the importance of starting with zero-initialized memory is obvious in retrospect — but it wasn’t something I’d considered before.
  • Python’s int type can kill performance.
    Yes, Python integers are arbitrary precision, which is great. But they’re also immutable. That means every time you do something like x += 1, Python silently allocates a brand-new integer object—copying all the memory of x, no matter how big it is. It feels in-place, but it’s not. This behavior turns efficient-looking code into a performance trap when working with large values. To get around this, we use the gmpy2.xmpz type—a mutable, arbitrary-precision integer that allows true in-place updates.
  • There’s something beyond exponentiation — and it’s called tetration.
    I didn’t know this. I wasn’t familiar with the ↑↑ notation or the idea that exponentiation could itself be iterated to form something even faster-growing. It was surprising to learn how compactly it can express numbers that are otherwise unthinkably large.
    And because I know you’re asking — yes, there’s something beyond tetration too. It’s called pentation, then hexation, and so on. These are part of a whole hierarchy known as hyperoperations. There’s even a metageneralization: systems like the Ackermann function and fast-growing hierarchies capture entire families of these functions and more.
  • Writing Tetration with Explicit Loops Was Eye-Opening
    I already knew that exponentiation is repeated multiplication, and so on. I also knew this could be written recursively. What I hadn’t seen was how cleanly it could be written as nested loops, without copying values and with strict in-place updates.

Thank you for joining me on this journey. I hope you now have a clearer understanding of how small Python programs can run for an astonishingly long time — and what that reveals about computation, memory, and minimal systems. We’ve seen programs that halt only after the universe dies, and others that run even longer.

Please follow Carl on Towards Data Science and on @carlkadie.bsky.social. I write on scientific programming in Python and Rust, machine learning, and statistics. I tend to write about one article per month.

How I Would Learn To Code (If I Could Start Over)

How to learn to code in 2025

According to various sources, the average salary for coding jobs is ~£47.5k in the UK — about 35% higher than the median salary of roughly £35k.

So, coding is a very valuable skill that will earn you more money, not to mention it’s really fun.

I have been coding professionally now for 4 years, working as a data scientist and machine learning engineer, and in this post, I will explain how I would learn to code if I had to do it all over again.

My journey

I still remember the time I wrote my first bit of code.
It was 9am on the first day of my physics undergrad, and we were in the computer lab.

The professor explained that computation is an integral part of modern physics as it allows us to run large-scale simulations of everything from subatomic particle collisions to the movement of galaxies.

It sounded amazing.

And the way we started this process was by going through a textbook to learn Fortran.

Yes, you heard that right.

My first programming language was Fortran, specifically Fortran 90.
I learned DO loops before FOR loops. I am definitely a rarity in this case.

In that first lab session, I remember writing “Hello World” as is the usual rite of passage and thinking, “Big woop.”

This is how you write “Hello World” in Fortran in case you are interested. 

program hello
print *, 'Hello World!'
end program hello

I actually really struggled to code in Fortran and didn’t do that well on tests we had, which put me off coding.

I still have some old coding projects in Fortran on my GitHub that you can check out.

Looking back, the learning curve to coding is quite steep, but it really does compound, and eventually, it will just click.

I didn’t realise this at the time and actively avoided programming modules in my physics degree, which I regret in hindsight as my progress would have been much quicker.

During my third year, I had to do a research placement as part of my master’s. The company I chose to work for/with used a graphical programming language called LabVIEW to run and manage their experiments.

LabVIEW is based on a graphical language called “G” and taught me to think about programming differently from the usual script-based approach.

However, I haven’t used it since and probably never will, but it was cool to learn then.

I did enjoy the research year somewhat, but the pace at which research moves, at least in physics, is painfully slow. Nothing like the “heyday” from the early 20th century I envisioned.

One day after work, a video was recommended to me on my YouTube home page — a documentary about DeepMind’s AlphaGo, the AI that beat the best Go player in the world. Most people thought that an AI could never be good at Go.

From the video, I started to understand how AI worked and learned about neural networks, reinforcement learning, and deep learning. I found it all so interesting — similar to physics research in the early 20th century.

Ultimately, this is when I started studying for a career in data science and machine learning, where I needed to teach myself Python and SQL.

This is where I, so to speak, “fell in love” with coding.
I saw its real potential for actually solving problems, but the main thing was that I had a motivating reason to learn: I was studying to break into a career I wanted to be in, which really drove me.

I then became a data scientist for three years and am now a Machine Learning engineer. During this time, I worked extensively with Python and SQL.

Until a few months ago, those were the only programming languages I knew. I did learn other tools, such as Bash/Z shell, AWS, Docker, Databricks, Snowflake, etc., but not any other “proper” programming languages.

In my spare time, I dabbled a bit with C a couple of years ago, but I have forgotten virtually all of it now. I have some basic scripts on my GitHub if you are interested.

However, in my new role that I started a couple of months ago, I will be using Rust and Go, which I am very much looking forward to learning.

If you are interested in my entire journey to becoming a data scientist and machine learning engineer, you can read about it below:

Choose a language

I always recommend starting with a single language.

According to TestGorilla, there are over 8,000 programming languages, so how do you pick one?

Well, I would argue that many of these are useless for most jobs and have probably been developed as pet projects or for really niche cases.

You could choose your first language based on popularity. The Stack Overflow 2024 survey has great information on this. The most popular languages are JavaScript, Python, SQL, and Java.

However, the way I recommend you choose your first language should be based on what you want to do or work as.

  • Front-end web — JavaScript, HTML, CSS
  • Back-end web — Java, C#, Python, PHP or Go
  • iOS/macOS apps — Swift
  • Android apps — Kotlin or Java
  • Games — C++ or C
  • Embedded Systems — C or C++
  • Data science/machine learning / AI — Python and SQL

As I wanted to work in the AI/ML space, I focused my energy mainly on Python and some on SQL. It was probably a 90% / 10% split as SQL is smaller and easier to learn.

To this day, I still only know Python and SQL to a “professional” standard, but that’s fine, as pretty much the whole machine-learning community requires these languages.

This shows that you don’t need to know many languages; I have progressed quite far in my career, only knowing two to a significant depth. Of course, it would vary by sector, but the main point still stands.

So, pick a field you want to enter and choose the most in-demand and relevant language in that field.

Learn the bare minimum

The biggest mistake I see beginners make is getting stuck in “tutorial hell.”

This is where you take course after course but never branch out on your own.

I recommend taking a maximum of two courses on a language — literally any intro course would do — and then starting to build immediately.

And I literally mean, build your own projects and experience the struggle because that’s where learning is done.

You won’t know how to write functions until you do it yourself, you won’t know how to create classes until you do it yourself, and you literally won’t understand loops until you implement them yourself.

So, learn the bare minimum and immediately start experimenting; I promise it will at least 2x your learning curve.

You have probably heard this advice a lot, but in reality it really is that simple.

I always say that most things in life are simple but hard to do, especially in programming.

Avoid trends

When I say avoid trends, I don’t mean not to focus on areas that are doing well or in demand in the market.

What I am saying is that when you pick a certain language or specialism, stick with it.

Programming languages all share similar concepts and patterns, so when you learn one, you indirectly improve your ability to pick up another later.

But you still should focus on one language for at least a few months.

Don’t develop “shiny object syndrome” and chase the latest technologies; it’s a game that you will unfortunately lose.

There have been so many “distracting” technologies, such as blockchain, Web3, AI, the list goes on.

Instead, focus on the fundamentals:

  • Data types
  • Design patterns
  • Object-oriented programming
  • Data structures and algorithms
  • Problem-solving skills

These topics transcend individual programming languages and are much better to master than the latest JavaScript framework!

It’s much better to have a strong understanding of one area than try to learn everything. Not only is this more manageable, but it is also better for your long-term career.

As I said earlier, I have progressed quite well in my career by only knowing Python and SQL, as I learned the required technologies for the field and didn’t get distracted.

I can’t stress how much leverage you will have in your career if you document your learning publicly.

Document your learning

I don’t know why more people don’t do this. Sharing what I have learned online has been the biggest game changer for my career.

Literally committing your code on GitHub is enough, but I really recommend posting on LinkedIn or X, and ideally, you should create blog posts to help you cement your understanding and show off your knowledge to employers.

When I interview candidates, if they have some sort of online presence showing their learnings, that’s immediately a tick in my box and an extra edge over other applicants.

It shows enthusiasm and passion, not to mention increasing your surface area of serendipity. 

I know many people are scared to do this, but you are suffering from the spotlight effect. Wikipedia defines this as:

The spotlight effect is the psychological phenomenon by which people tend to believe they are being noticed more than they really are.

In reality, almost no one cares whether you post online, and people think about you far less than 1% as much as you imagine.

So, start posting.

What about AI?

I could spend hours discussing why AI is not an immediate risk for anyone who wants to work in the coding profession.

You should embrace AI as part of your toolkit, but that’s as far as it will go, and it will definitely not replace programmers in 5 years.

Unless an AGI breakthrough suddenly occurs in the next decade, which is highly unlikely.

I personally doubt the answer to AGI is the cross-entropy loss function, which is what is used in most LLMs nowadays.

It has been shown time and time again that these AI models lack strong mathematical reasoning abilities, which is one of the most fundamental skills to being a good coder.

Even the so-called “software engineer killer” Devin is not as good as the creators initially marketed it. 

Most companies are simply trying to boost their investment by hyping AI, and their results are often over-exaggerated with controversial benchmark testing.

When I was building a website, ChatGPT even struggled with simple HTML and CSS, which you can argue is its bread and butter!

Overall, don’t worry about AI if you want to work as a coder; there are much, much bigger fish to fry before we cross that bridge!

NeetCode has done a great video explaining how current AI is incapable of replacing programmers.

Another thing!

Join my free newsletter, Dishing the Data, where I share weekly tips, insights, and advice from my experience as a practicing data scientist. Plus, as a subscriber, you’ll get my FREE Data Science Resume Template!

Connect with me

PyScript vs. JavaScript: A Battle of Web Titans

Can Python really replace JavaScript for web development?

We’re delving into frontend web development today, and you might be thinking: what does this have to do with Data Science? Why is Towards Data Science publishing a post related to web dev?

Well, because data science isn’t only about building powerful models, engaging in advanced analytics, or cleaning and transforming data—presenting the results is also a key part of our job. And there are several ways to do it: PowerPoint presentations, interactive dashboards (like Tableau), or, as you’ve guessed, through a website.

Speaking from personal experience, I work daily on developing the website we use to present our data-driven results. Using a website instead of PowerPoints or Tableau has many advantages, with freedom and customization being the biggest ones.

Even though I’ve come to (kind of) enjoy JavaScript, it will never match the fun of coding in Python. Luckily, at FOSDEM, I learned about PyScript, and to my surprise, it’s not as alpha as I initially thought.

But is that enough to call it a potential JavaScript replacement? That’s exactly what we’re going to explore today.

JavaScript has been the king of web development for decades. It’s everywhere: from simple button clicks to complex web apps like Gmail and Netflix. But now, there’s a challenger stepping into the ring — PyScript — a framework that lets you run Python in the browser without needing a backend. Sounds like a dream, right? Let’s break it down in an entertaining head-to-head battle between these two web technologies to see if PyScript is a true competitor!

Round 1: What Are They?

This is like the Jake Paul vs Mike Tyson battle: the new challenger (PyScript) vs the veteran champion (JS). Don’t worry, I’m not saying today’s battle will be a disappointment as well.

Let’s start with the veteran: JavaScript.

  • Created in 1995, JavaScript is the backbone of web development.
  • Runs natively in browsers, controlling everything from user interactions to animations.
  • Supported by React, Vue, Angular, and a massive ecosystem of frameworks.
  • Can directly manipulate the DOM, making web pages dynamic.

Now onto the novice: PyScript.

  • Built on Pyodide (a Python-to-WebAssembly project), PyScript lets you write Python inside an HTML file.
  • No need for backend servers—your Python code runs directly in the browser.
  • Can import Python libraries like NumPy, Pandas, and Matplotlib.
  • But… it’s still evolving and has limitations.

This last but is a big one, so JavaScript wins the first round!

Round 2: Performance Battle

When it comes to speed, JavaScript is like Usain Bolt—optimized and blazing fast. It runs natively in the browser and is fine-tuned for performance. On the other hand, PyScript runs Python via WebAssembly, which means extra overhead.

Let’s use a real mini-project: a simple counter app. We’ll build it using both alternatives and see which one performs better.

JavaScript

<button onclick="increment()">Click Me</button>
<p id="count">0</p>
<script>
  let count = 0;
  function increment() {
    count++;
    document.getElementById("count").innerText = count;
  }
</script>

PyScript

<py-script>
from pyscript import display
count = 0

def increment():
    global count
    count += 1
    # append=False replaces the previous value instead of appending a new line
    display(count, target="count", append=False)
</py-script>
<button py-click="increment()">Click Me</button>
<p id="count">0</p>

Putting them to the test:

  • JavaScript runs instantly.
  • PyScript has a noticeable delay.

End of round: JS increases its advantage making it 2-0!

Round 3: Ease of Use & Readability

Neither language is perfect (for example, neither has static typing), but their syntax is very different. JavaScript can be quite messy:

const numbers = [1, 2, 3];
const doubled = numbers.map(num => num * 2);

While Python is far easier to understand:

numbers = [1, 2, 3]
doubled = [num * 2 for num in numbers]

PyScript letting us use Python syntax makes it the round winner without a doubt. Even though I’m clearly biased towards Python, it is beginner-friendly and usually more concise and simpler than JS, which makes it better in terms of usability.

The problem for PyScript is that JavaScript is already deeply integrated into browsers, making it more practical. Despite this, PyScript wins the round, making it 2-1.

One more round to go…

Round 4: Ecosystem & Libraries

JavaScript has countless frameworks like React, Vue, and Angular, making it a powerhouse for building dynamic web applications. Its libraries are specifically optimized for the web, providing tools for everything from UI components to complex animations.

On the other hand, PyScript benefits from Python’s vast ecosystem of scientific computing and data science libraries, such as NumPy, Pandas, and Matplotlib. While these tools are excellent for Data Visualization and analysis, they aren’t optimized for frontend web development. Additionally, PyScript requires workarounds to interact with the DOM, which JavaScript handles natively and efficiently.
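As a concrete example of such a workaround, DOM manipulation from PyScript typically goes through Pyodide’s js bridge — a minimal sketch, assuming a recent PyScript/Pyodide build:

from js import document  # Pyodide's bridge to the browser's JavaScript globals

# The same DOM update that JavaScript does natively in one line:
counter = document.getElementById("count")
counter.innerText = "42"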

While PyScript is an exciting tool for embedding Python into web applications, it’s still in its early stages. JavaScript remains the more practical choice for general web development, whereas PyScript shines in scenarios where Python’s computational power is needed within the browser.

Here’s a table summarizing some of the key differences:

Feature        JavaScript                    PyScript
DOM Control    Direct & instant              Requires JavaScript workarounds
Performance    Optimized for browsers        WebAssembly overhead
Ecosystem      Huge (React, Vue, Angular)    Limited, still growing
Libraries      Web-focused (Lodash, D3.js)   Python-focused (NumPy, Pandas)
Use Cases      Full web apps                 Data-heavy apps, interactive widgets

Round’s verdict: JavaScript dominates in general web dev, but PyScript shines for Python-centric projects.

Final Verdict

This was a quick fight! We still don’t know who won though…

Time to reveal it:

  • If you’re building a full web app, JavaScript is the clear winner.
  • If you’re adding Python-powered interactivity (e.g., data visualization), PyScript could be useful.

With that said, it’s fair to say that JavaScript (and its derivatives) remains the best option for the web frontend. However, the future of PyScript is one to watch: if performance improves and it gets better browser integration, PyScript could become a strong hybrid tool for Python developers who want to bring more data-related tasks to the frontend.

Winner: JavaScript.

Nine Pico PIO Wats with Rust (Part 2)

Raspberry Pi programmable IO pitfalls illustrated with a musical example

This is Part 2 of an exploration into the unexpected quirks of programming the Raspberry Pi Pico PIO with Rust. If you missed Part 1, we uncovered four Wats that challenge assumptions about register count, instruction slots, the behavior of pull noblock, and smart yet cheap hardware.

Now, we continue our journey toward crafting a theremin-like musical instrument — a project that reveals some of the quirks and perplexities of PIO programming. Prepare to challenge your understanding of constants in a way that brings to mind a Shakespearean tragedy.

Wat 5: Inconstant constants

In the world of PIO programming, constants should be reliable, steadfast, and, well, constant. But what if they’re not? This brings us to a puzzling Wat about how the set instruction in PIO works—or doesn’t—when handling larger constants.

Much like Juliet doubting Romeo’s constancy, you might find yourself wondering if PIO constants will, as she says, “prove likewise variable.”

The problem: Constants are not as big as they seem

Imagine you’re programming an ultrasonic range finder and need to count down from 500 while waiting for the Echo signal to drop from high to low. To set up this wait time in PIO, you might naïvely try to load the constant value directly using set:

set y, 500                   ; Naïvely load the max echo wait into Y
measure_echo_loop:
   jmp pin echo_active       ; If echo voltage is high, continue countdown
   jmp measurement_complete  ; If echo voltage is low, measurement is complete
echo_active:
   jmp y-- measure_echo_loop ; Continue counting down unless timeout

Aside: Don’t try to understand the crazy jmp operations here. We’ll discuss those next in Wat 6.

But here’s the tragic twist: the set instruction in PIO is limited to constants between 0 and 31. Moreover, the star-crossed set instruction doesn’t report an error. Instead, it silently corrupts the entire PIO instruction. This produces a nonsense result.

Workarounds for inconstant constants

To address this limitation, consider the following approaches:

  • Read Values and Store Them in a Register: We saw this approach in Wat 1. You can load your constant in the osr register, then transfer it to y. For example:
# Read the max echo wait into OSR.
pull                    ; same as pull block
mov y, osr              ; Load max echo wait into Y
  • Shift and Combine Smaller Values: Using the isr register and the in instruction, you can build up a constant of any size. This, however, consumes time and operations from your 32-operation budget (see Part 1, Wat 2).
; In Rust, be sure 'config.shift_in.direction = ShiftDirection::Left;'

set y, 15       ; Load upper 5 bits (0b01111)
mov isr, y      ; Transfer to ISR (clears ISR)
set y, 20       ; Load lower 5 bits (0b10100)
in y, 5         ; Shift in lower bits to form 500 in ISR
mov y, isr      ; Transfer back to y
  • Slow Down the Timing: Reduce the frequency of the state machine to stretch delays over more system clock cycles. For example, lowering the state machine speed from 125 MHz to 343 kHz reduces the timeout constant from 182,216 to 500.
  • Use Extra Delays and (Nested) Loops: All instructions support an optional delay, allowing you to add up to 31 extra cycles. (To generate even longer delays, use loops — or even nested loops; see the sketch after this list.)
; Generate 10μs trigger pulse (4 cycles at 343_000Hz)
set pins, 1 [3]       ; Set trigger pin to high, add delay of 3
set pins, 0           ; Set trigger pin to low voltage
  • Use the “Subtraction Trick” to Generate the Maximum 32-bit Integer: In Wat 7, we’ll explore a way to generate 4,294,967,295 (the maximum unsigned 32-bit integer) via subtraction.
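Here is the nested-loop sketch promised above — my own illustration, not part of the range-finder program — showing how two counters plus per-instruction delays stretch a wait far beyond the 31 extra cycles a single instruction allows:

; Sketch: nested delay loops. Each jmp also carries the maximum extra delay.
set x, 31                     ; Outer counter
outer_delay:
    set y, 31                 ; Inner counter
inner_delay:
    jmp y-- inner_delay [31]  ; ~32 cycles per inner pass
    jmp x-- outer_delay [31]  ; Rerun the inner loop 32 times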

Much like Juliet cautioning against swearing by the inconstant moon, we’ve discovered that PIO constants are not always as steadfast as they seem. Yet, just as their story takes unexpected turns, so too does ours, moving from the inconstancy of constants to the uneven nature of conditionals. In the next Wat, we’ll explore how PIO’s handling of conditional jumps can leave you questioning its loyalty to logic.

Wat 6: Conditionals through the looking-glass

In most programming environments, logical conditionals feel balanced: you can test if a pin is high or low, or check registers for equality or inequality. In PIO, this symmetry breaks down. You can jump on pin high, but not pin low, and on x!=y, but not x==y. The rules are whimsical — like Humpty Dumpty in Through the Looking-Glass: “When I define a conditional, it means just what I choose it to mean — neither more nor less.”

These quirks force us to rewrite our code to fit the lopsided logic, creating a gulf between how we wish the code could be written and how we must write it.

The problem: Lopsided conditionals in action

Consider a simple scenario: using a range finder, you want to count down from a maximum wait time (y) until the ultrasonic echo pin goes low. Intuitively, you might write the logic like this:

measure_echo_loop:
 jmp !pin measurement_complete   ; If echo voltage is low, measurement is complete
 jmp y-- measure_echo_loop       ; Continue counting down unless timeout

And when processing the measurement, if we only wish to output values that differ from the previous value, we would write:

measurement_complete:
 jmp x==y cooldown             ; If measurement is the same, skip to cool down
 mov isr, y                    ; Store measurement in ISR
 push                          ; Output ISR
 mov x, y                      ; Save the measurement in X

Unfortunately, PIO doesn’t let you test !pin or x==y directly. You must restructure your logic to accommodate the available conditionals, such as pin and x!=y.

The solution: The way it must be

Given PIO’s limitations, we adapt our logic with a two-step approach that ensures the desired behavior despite the missing conditionals:

  • Jump on the opposite conditional to skip two instructions forward.
  • Next, use an unconditional jump to reach the desired target.

This workaround adds one extra jump (affecting the instruction limit), but the additional label is cost-free.

Here is the rewritten code for counting down until the pin goes low:

measure_echo_loop:
   jmp pin echo_active     ; if echo voltage is high continue count down
   jmp measurement_complete ; if echo voltage is low, measurement is complete
echo_active:
   jmp y-- measure_echo_loop ; Continue counting down unless timeout

And here is the code for processing the measurement such that it will only output differing values:

measurement_complete:
   jmp x!=y send_result    ; if measurement is different, then send it.
   jmp cooldown            ; If measurement is the same, don't send.

send_result:
   mov isr, y              ; Store measurement in ISR
   push                    ; Output ISR
   mov x, y               ; Save the measurement in X

Lessons from Humpty Dumpty’s conditionals

In Through the Looking-Glass, Alice learns to navigate Humpty Dumpty’s peculiar world — just as you’ll learn to navigate PIO’s Wonderland of lopsided conditions.

But as soon as you master one quirk, another reveals itself. In the next Wat, we’ll uncover a surprising behavior of jmp that, if it were an athlete, would shatter world records.

Wat 7: Overshooting jumps

In Part 1’s Wat 1 and Wat 3, we saw how jmp x-- or jmp y-- is often used to loop a fixed number of times by decrementing a register until it reaches 0. Straightforward enough, right? But what happens when y is 0 and we run the following instruction?

jmp y-- measure_echo_loop

If you guessed that it does not jump to measure_echo_loop and instead falls through to the next instruction, you’re absolutely correct. But for full credit, answer this: What value does y have after the instruction?

The answer: 4,294,967,295. Why? Because y is decremented after it is tested for zero. Wat!?

Aside: If this doesn’t surprise you, you likely have experience with C or C++ which distinguish between pre-increment (e.g., ++x) and post-increment (e.g., x++) operations. The behavior of jmp y-- is equivalent to a post-decrement, where the value is tested before being decremented.
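To make the ordering concrete, here is a tiny Python model of jmp y-- (the function name is my own invention; the test-before-decrement behavior is what matters):

def jmp_y_decrement(y):
    # Model of PIO's `jmp y--`: test first, then decrement with 32-bit wraparound.
    jump_taken = y != 0
    y = (y - 1) & 0xFFFFFFFF  # 0 wraps to 4,294,967,295
    return jump_taken, y

print(jmp_y_decrement(0))  # (False, 4294967295) -- falls through, Y overshoots
print(jmp_y_decrement(5))  # (True, 4) -- jumps, Y decremented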

This value, 4,294,967,295, is the maximum for a 32-bit unsigned integer. It’s as if a track-and-field long jumper launches off the takeoff board but, instead of landing in the sandpit, overshoots and ends up on another continent.

Aside: As foreshadowed in Wat 5, we can use this behavior intentionally to set a register to the value 4,294,967,295.

Now that we’ve learned how to stick the landing with jmp, let’s see if we can avoid getting stuck by the pins that PIO reads and sets.

Wat 8: Too many “pins”

In Dr. Seuss’s Too Many Daves, Mrs. McCave had 23 sons, all named Dave, leading to endless confusion whenever she called out their name. In PIO programming, pin and pins can refer to completely different ranges of pins depending on the context. It’s hard to know which Dave or Daves you’re talking to.

The problem: Pin ranges and subranges

In PIO, both pin and pins instructions depend on pin ranges defined in Rust, outside of PIO. However, individual instructions often operate on a subrange of those pin ranges. The behavior varies depending on the command: the subrange could be the first n pins of the range, all the pins, or just a specific pin given by an index. To clarify PIO’s behavior, I created the following table, which shows how PIO interprets the terms pin and pins in different instructions, along with their associated contexts and configurations:

Instruction         What “pin(s)” means                Configured with
set pins, value     The consecutive “set” pins         set_set_pins
out pins, value     The consecutive “out” pins         set_out_pins
in pins             The consecutive “in” pins          set_in_pins
wait 1 pin n        The n-th pin of the “in” range     set_in_pins
jmp pin label       The single “jmp” pin               set_jmp_pin
side-set            The consecutive side-set pins      use_program

Example: Distance program for the range finder

Here’s a PIO program for measuring the distance to an object using Trigger and Echo pins. The key features of this program are:

  • Continuous Operation: The range finder runs in a loop as fast as possible.
  • Maximum Range Limit: Measurements are capped at a given distance, with a return value of 4,294,967,295 if no object is detected.
  • Filtered Outputs: Only measurements that differ from their immediate predecessor are sent, reducing the output rate.

Glance over the program and notice that although it is working with two pins — Trigger and Echo — throughout the program we only see pin and pins.

.program distance

; X is the last value sent. Initialize it to
; u32::MAX which means 'echo timeout'
; (Set X to u32::MAX by subtracting 1 from 0)
   set x, 0
subtraction_trick:
   jmp x-- subtraction_trick

; Read the max echo wait into OSR
   pull                         ; same as pull block

; Main loop
.wrap_target
   ; Generate 10μs trigger pulse (4 cycles at 343_000Hz)
   set pins, 0b1 [3]       ; Set trigger pin to high, add delay of 3
   set pins, 0b0           ; Set trigger pin to low voltage

   ; When the trigger goes high, start counting down until it goes low
   wait 1 pin 0            ; Wait for echo pin to be high voltage
   mov y, osr              ; Load max echo wait into Y

measure_echo_loop:
   jmp pin echo_active     ; if echo voltage is high continue count down
   jmp measurement_complete ; if echo voltage is low, measurement is complete
echo_active:
   jmp y-- measure_echo_loop ; Continue counting down unless timeout

; Y tells where the echo countdown stopped. It
; will be u32::MAX if the echo timed out.
measurement_complete:
   jmp x!=y send_result    ; if measurement is different, then send it.
   jmp cooldown            ; If measurement is the same, don't send.

send_result:
   mov isr, y              ; Store measurement in ISR
   push                    ; Output ISR
   mov x, y               ; Save the measurement in X

; Cool down period before next measurement
cooldown:
   wait 0 pin 0           ; Wait for echo pin to be low
.wrap                      ; Restart the measurement loop

Configuring Pins

To ensure the PIO program behaves as intended:

  • set pins, 0b1 should control the Trigger pin.
  • wait 1 pin 0 should monitor the Echo pin.
  • jmp pin echo_active should also monitor the Echo pin.

Here’s how you can configure this in Rust (followed by an explanation):

let mut distance_state_machine = pio1.sm0;
let trigger_pio = pio1.common.make_pio_pin(hardware.trigger);
let echo_pio = pio1.common.make_pio_pin(hardware.echo);
distance_state_machine.set_pin_dirs(Direction::Out, &[&trigger_pio]);
distance_state_machine.set_pin_dirs(Direction::In, &[&echo_pio]);
distance_state_machine.set_config(&{
   let mut config = Config::default();
   config.set_set_pins(&[&trigger_pio]); // For set instruction
   config.set_in_pins(&[&echo_pio]); // For wait instruction
   config.set_jmp_pin(&echo_pio); // For jmp instruction
   let program_with_defines = pio_file!("examples/distance.pio");
   let program = pio1.common.load_program(&program_with_defines.program);
   config.use_program(&program, &[]); // No side-set pins
   config
});

The keys here are the set_set_pins, set_in_pins, and set_jmp_pin methods on the Config struct.

  • set_in_pins: Specifies the pins for input operations, such as wait(1, pin, …). The “in” pins must be consecutive.
  • set_set_pins: Configures the pin for set operations, like set(pins, 1). The “set” pins must also be consecutive.
  • set_jmp_pin: Defines the single pin used in conditional jumps, such as jmp(pin, ...).

As described in the table, other optional inputs include:

  • set_out_pins: Sets the consecutive pins for output operations, such as out(pins, …).
  • use_program: Sets a) the loaded program and b) consecutive pins for sideset operations. Sideset operations allow simultaneous pin toggling during other instructions.

Configuring Multiple Pins

Although not required for this program, you can configure a range of pins in PIO by providing a slice of consecutive pins. For example, suppose we had two ultrasonic range finders:

let trigger_a_pio = pio1.common.make_pio_pin(hardware.trigger_a);
let trigger_b_pio = pio1.common.make_pio_pin(hardware.trigger_b);
config.set_set_pins(&[&trigger_a_pio, &trigger_b_pio]);

A single instruction can then control both pins:

set pins, 0b11 [3]  ; Sets both trigger pins (17, 18) high, adds delay
set pins, 0b00      ; Sets both trigger pins low

This approach lets you efficiently apply bit patterns to multiple pins simultaneously, streamlining control for applications involving multiple outputs.

Aside: The Word “Set” in Programming

In programming, the word “set” is notoriously overloaded with multiple meanings. In the context of PIO, “set” refers to something to which you can assign a value — such as a pin’s state. It does not mean a collection of things, as it often does in other programming contexts. When PIO refers to a collection, it usually uses the term “range” instead. This distinction is crucial for avoiding confusion as you work with PIO.

Lessons from Mrs. McCave

In Too Many Daves, Mrs. McCave lamented not giving her 23 Daves more distinct names. You can avoid her mistake by clearly documenting your pins with meaningful names — like Trigger and Echo — in your comments.

But if you think handling these pin ranges is tricky, debugging a PIO program adds an entirely new layer of challenge. In the next Wat, we’ll dive into the kludgy debugging methods available. Let’s see just how far we can push them.

Wat 9: Kludgy debugging

I like to debug with interactive breakpoints in VS Code. I also do print debugging, where you insert temporary info statements to see what the code is doing and the values of variables. Using the Raspberry Pi Debug Probe and probe-rs, I can do both of these with regular Rust code on the Pico.

With PIO programming, however, I can do neither.

The fallback is push-to-print debugging. In PIO, you temporarily output integer values of interest. Then, in Rust, you use info! to print those values for inspection.

For example, in the following PIO program, we temporarily add instructions to push the value of x for debugging. We also use set and push to send a constant value, such as 7, which must be between 0 and 31 inclusive.

.program distance

; X is the last value sent. Initialize it to
; u32::MAX which means 'echo timeout'
; (Set X to u32::MAX by subtracting 1 from 0)
   set x, 0
subtraction_trick:
   jmp x-- subtraction_trick

; DEBUG: See the value of x
   mov isr, x
   push

; Read the max echo wait into OSR
   pull                         ; same as pull block

; DEBUG: Send constant value
   set y, 7           ; Push '7' so that we know we've reached this point
   mov isr, y
   push
; ...

Back in Rust, you can read and print these values to help understand what’s happening in the PIO code (full code and project):

  // ...
   distance_state_machine.set_enable(true);
   distance_state_machine.tx().wait_push(MAX_LOOPS).await;
   loop {
       let end_loops = distance_state_machine.rx().wait_pull().await;
       info!("end_loops: {}", end_loops);
   }
  // ...

Outputs:

INFO  Hello, debug!
└─ distance_debug::inner_main::{async_fn#0} @ examples/distance_debug.rs:27
INFO  end_loops: 4294967295
└─ distance_debug::inner_main::{async_fn#0} @ examples/distance_debug.rs:57
INFO  end_loops: 7
└─ distance_debug::inner_main::{async_fn#0} @ examples/distance_debug.rs:57

When push-to-print debugging isn’t enough, you can turn to hardware tools. I bought my first oscilloscope (a FNIRSI DSO152, for $37). With it, I was able to confirm the Echo signal was working. The Trigger signal, however, was too fast for this inexpensive oscilloscope to capture clearly.

Using these methods — especially push-to-print debugging — you can trace the flow of your PIO program, even without a traditional debugger.

Aside: In C/C++ (and potentially Rust), you can get closer to a full debugging experience for PIO, for example, by using the piodebug project.

That concludes the nine Wats, but let’s bring everything together in a bonus Wat.

Bonus Wat 10: Putting it all together

Now that all the components are ready, it’s time to combine them into a working theremin-like musical instrument. We need a Rust monitor program. This program starts both PIO state machines — one for measuring distance and the other for generating tones. It then waits for a new distance measurement, maps that distance to a tone, and sends the corresponding tone frequency to the tone-playing state machine. If the distance is out of range, it stops the tone.

Rust’s Place: At the heart of this system is a function that maps distances (from 0 to 50 cm) to tones (approximately B2 to F5). This function is simple to write in Rust, leveraging Rust’s floating-point math and exponential operations. Implementing this in PIO would be virtually impossible due to its limited instruction set and lack of floating-point support.

Here’s the core monitor program to run the theremin (full file and project):

sound_state_machine.set_enable(true);
distance_state_machine.set_enable(true);
distance_state_machine.tx().wait_push(MAX_LOOPS).await;
loop {
   let end_loops = distance_state_machine.rx().wait_pull().await;
   match loop_difference_to_distance_cm(end_loops) {
       None => {
           info!("Distance: out of range");
           sound_state_machine.tx().wait_push(0).await;
       }
       Some(distance_cm) => {
           let tone_frequency = distance_to_tone_frequency(distance_cm);
           let half_period = sound_state_machine_frequency / tone_frequency as u32 / 2;
           info!("Distance: {} cm, tone: {} Hz", distance_cm, tone_frequency);
           sound_state_machine.tx().push(half_period); // non-blocking push
           Timer::after(Duration::from_millis(50)).await;
       }
   }
}

Using two PIO state machines alongside a Rust monitor program lets you literally run three programs at once. This setup is convenient on its own and is essential when strict timing or very high-frequency I/O operations are required.

Aside: Alternatively, Rust Embassy’s async tasks let you implement cooperative multitasking directly on a single main processor. You code in Rust rather than a mixture of Rust and PIO. Although Embassy tasks don’t literally run in parallel, they switch quickly enough to handle applications like a theremin. Here’s a snippet from theremin_no_pio.rs showing a similar core loop:

loop {
       match distance.measure().await {
           None => {
               info!("Distance: out of range");
               sound.rest().await;
           }
           Some(distance_cm) => {
               let tone_frequency = distance_to_tone_frequency(distance_cm);
               info!("Distance: {} cm, tone: {} Hz", distance_cm, tone_frequency);
               sound.play(tone_frequency).await;
               Timer::after(Duration::from_millis(50)).await;
           }
       }
   }

See our recent article on Rust Embassy programming for more details.

Now that we’ve assembled all the components, let’s watch the video again of me “playing” the musical instrument. On the monitor screen, you can see the debugging prints displaying the distance measurements and the corresponding tones. This visual connection highlights how the system responds in real time.

Conclusion

PIO programming on the Raspberry Pi Pico is a captivating blend of simplicity and complexity, offering unparalleled hardware control while demanding a shift in mindset for developers accustomed to higher-level programming. Through the nine Wats we’ve explored, PIO has both surprised us with its limitations and impressed us with its raw efficiency.

While we’ve covered significant ground — managing state machines, pin assignments, timing intricacies, and debugging — there’s still much more you can learn as needed: DMA, IRQ, side-set pins, differences between PIO on the Pico 1 and Pico 2, autopush and autopull, FIFO join, and more.

At its core, PIO’s quirks reflect a design philosophy that prioritizes low-level hardware control with minimal overhead. By embracing these characteristics, PIO will not only meet your project’s demands but also open doors to new possibilities in embedded systems programming.

Please follow Carl on Towards Data Science and on @carlkadie.bsky.social. I write on scientific programming in Rust and Python, machine learning, and statistics. I tend to write about one article per month.

The post Nine Pico PIO Wats with Rust (Part 2) appeared first on Towards Data Science.

]]>
Comprehensive Guide to Dependency Management in Python https://towardsdatascience.com/comprehensive-guide-to-dependency-management-in-python/ Fri, 07 Mar 2025 19:34:56 +0000 https://towardsdatascience.com/?p=599201 Master the management of virtual environments

The post Comprehensive Guide to Dependency Management in Python appeared first on Towards Data Science.

]]>
Introduction

When learning Python, many beginners focus solely on the language and its libraries while completely ignoring virtual environments. As a result, managing Python projects can become a mess: dependencies installed for different projects may have conflicting versions, leading to compatibility issues.

Even when I studied Python, nobody emphasized the importance of virtual environments, which I now find very strange. They are an extremely useful tool for isolating different projects from each other.

In this article, I will explain how virtual environments work, provide several examples, and share useful commands for managing them.

Problem

Imagine you have two Python projects on your laptop, each located in a different directory. You realize that you need to install the latest version of library A for the first project. Later, you switch to the second project and attempt to install library B.

Here’s the problem: library B depends on library A, but it requires a different version than the one you installed earlier.

Since you haven’t used any tool for Dependency Management, all dependencies are installed globally on your computer. Due to the incompatible versions of library A, you encounter an error when trying to install library B.

Solution

To prevent such issues, virtual environments are used. The idea is to allocate a separate storage space for each Python project. Each storage will contain all the externally downloaded dependencies for a specific project in an isolated manner.

More specifically, if we download the same library A for two projects within their own virtual environments, library A will be downloaded twice — once for each environment. Moreover, the versions of the library can differ between the environments because each environment is completely isolated and does not interact with the others.

Now that the motivation behind using virtual environments is clear, let’s explore how to create them in Python.

Virtual environments in Python

It is recommended to create a virtual environment in the root directory of a project. An environment is created using the following command in the terminal:

python -m venv <environment_name>

By convention, <environment_name> is usually named venv, so the command becomes:

python -m venv venv

As a result, this command creates a directory called venv, which contains the virtual environment itself. It is even possible to go inside that directory, but in most cases, it is not very useful, as the venv directory primarily contains system scripts that are not intended to be used directly.

To activate the virtual environment, use the following command:

source venv/bin/activate

Once the environment is activated, we can install dependencies for the project. As long as the venv is activated, any installed dependency will only belong to that environment.

To deactivate the virtual environment, type:

deactivate

Once the environment is deactivated, the terminal returns to its normal state. For example, you can switch to another project and activate its environment there.
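
For example, here is a minimal sketch of two projects, each with its own environment holding a different version of the same library. The directory names and version numbers are hypothetical, and the pip install command used here is explained in the next section:

cd project_one
python -m venv venv
source venv/bin/activate
pip install pandas==2.1.4   # the version project one needs
deactivate

cd ../project_two
python -m venv venv
source venv/bin/activate
pip install pandas==1.5.3   # a different version, fully isolated from project one
deactivate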

Dependency management

Installing libraries

Before installing any dependencies, it is recommended to activate a virtual environment to ensure that installed libraries belong to a single project. This helps avoid global version conflicts.

The most frequently used command for dependency management is pip. Compared to other alternatives, pip is intuitive and simple to use.

To install a library, type:

pip install <library_name>

In the examples below, instead of <library_name>, I will use pandas (the most commonly used data analysis library).

So, for instance, to install the latest version of pandas, we would type:

pip install pandas

In some scenarios, we might need to install a specific version of a library. pip provides a simple syntax to do that:

pip install pandas==2.1.4          # install pandas version 2.1.4
pip install "pandas>=2.1.4"        # install pandas version 2.1.4 or higher
pip install "pandas<2.1.4"         # install a pandas version lower than 2.1.4
pip install "pandas>=2.1.2,<2.2.4" # install the latest version available between 2.1.2 and 2.2.4

(The quotes around specifiers containing > or < prevent the shell from interpreting those characters as redirections.)

Viewing dependency details

If you are interested in a particular dependency that you have installed, a simple way to get more information about it is to use the pip show command:

pip show pandas

For example, running this command for pandas will print information like the following:

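The exact fields vary with your pip version; the version numbers and paths below are illustrative:

Name: pandas
Version: 2.1.4
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author: The Pandas Development Team
License: BSD 3-Clause License
Location: /path/to/project/venv/lib/python3.11/site-packages
Requires: numpy, python-dateutil, pytz, tzdata
Required-by: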

Deleting dependency

To remove a dependency from a virtual environment, use the following command:

pip uninstall pandas

After executing this command, all files related to the specified library will be deleted, thus freeing up disk space. However, if you run a Python program that imports this library again, you will encounter an ImportError.

File with requirements

A common practice when managing dependencies is to create a requirements.txt file that contains a list of all downloaded dependencies in the project along with their versions. Here is an example of what it might look like:

fastapi==0.115.5
pydantic==2.10.1
PyYAML==6.0.2
requests==2.32.3
scikit-learn==1.5.2
scipy==1.14.1
seaborn==0.13.2
streamlit==1.40.2
torch==2.5.1
torchvision==0.20.1
tornado==6.4.2
tqdm==4.67.1
urllib3==2.2.3
uvicorn==0.32.1
yolo==0.3.2

Ideally, every time you use the pip install command, you should add a corresponding line to the requirements.txt file to keep track of all the libraries used in the project.

However, if you forget to do that, there is still an alternative: the pip freeze command outputs all of the installed dependencies in the project. Nevertheless, pip freeze can be quite verbose, often including many other library names that are dependencies of the libraries you are using in the project.

pip freeze > requirements.txt

Given this, it’s a good habit to add installed requirements with their versions to the requirements.txt file.

Whenever you clone a Python project, it is expected that a requirements.txt file is already present in the Git repository. To install all the dependencies listed in this file, you use the pip install command along with the -r flag followed by the requirements filename.

pip install -r requirements.txt

Conversely, whenever you work on a Python project, you should create a requirements.txt file so that other collaborators can easily install the necessary dependencies.

.gitignore

When working with version control systems, virtual environments should never be pushed to Git! Instead, they must be mentioned in a .gitignore file.

Virtual environments tend to be very large, and if there is an existing requirements.txt file, there should be no problem downloading all necessary dependencies.
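
If your environment directory uses the conventional name venv, the corresponding .gitignore entry is a single line, as in this minimal sketch:

# .gitignore
venv/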

Conclusion

In this article, we have looked at the very important concept of virtual environments. By isolating downloaded dependencies for different projects, they allow for easier management of multiple Python projects.

All images are by the author unless noted otherwise.

The post Comprehensive Guide to Dependency Management in Python appeared first on Towards Data Science.

]]>
Nine Rules for SIMD Acceleration of Your Rust Code (Part 1) https://towardsdatascience.com/nine-rules-for-simd-acceleration-of-your-rust-code-part-1-c16fe639ce21/ Thu, 27 Feb 2025 02:03:02 +0000 https://towardsdatascience.com/?p=598485 General Lessons from Boosting Data Ingestion in the range-set-blaze Crate by 7x

The post Nine Rules for SIMD Acceleration of Your Rust Code (Part 1) appeared first on Towards Data Science.

]]>

Thanks to Ben Lichtman (B3NNY) at the Seattle Rust Meetup for pointing me in the right direction on SIMD.

SIMD (Single Instruction, Multiple Data) operations have been a feature of Intel/AMD and ARM CPUs since the early 2000s. These operations enable you to, for example, add an array of eight i32 to another array of eight i32 with just one CPU operation on a single core. Using SIMD operations greatly speeds up certain tasks. If you’re not using SIMD, you may not be fully using your CPU’s capabilities.

Is this “Yet Another Rust and SIMD” article? Yes and no. Yes, I did apply SIMD to a programming problem and then feel compelled to write an article about it. No, I hope that this article also goes into enough depth that it can guide you through your project. It explains the newly available SIMD capabilities and settings in Rust nightly. It includes a Rust SIMD cheatsheet. It shows how to make your SIMD code generic without leaving safe Rust. It gets you started with tools such as Godbolt and Criterion. Finally, it introduces new cargo commands that make the process easier.


The range-set-blaze crate uses its RangeSetBlaze::from_iter method to ingest potentially long sequences of integers. When the integers are “clumpy”, it can do this 30 times faster than Rust’s standard HashSet::from_iter. Can we do even better if we use Simd operations? Yes!

See this documentation for the definition of “clumpy”. Also, what happens if the integers are not clumpy? RangeSetBlaze is 2 to 3 times slower than HashSet.

On clumpy integers, RangeSetBlaze::from_slice — a new method based on SIMD operations — is 7 times faster than RangeSetBlaze::from_iter. That makes it more than 200 times faster than HashSet::from_iter. (When the integers are not clumpy, it is still 2 to 3 times slower than HashSet.)

Over the course of implementing this speed up, I learned nine rules that can help you accelerate your projects with SIMD operations.

The rules are:

  1. Use nightly Rust and core::simd, Rust’s experimental standard SIMD module.
  2. CCC: Check, Control, and Choose your computer’s SIMD capabilities.
  3. Learn core::simd, but selectively.
  4. Brainstorm candidate algorithms.
  5. Use Godbolt and AI to understand your code’s assembly, even if you don’t know assembly language.
  6. Generalize to all types and LANES with in-lined generics, (and when that doesn’t work) macros, and (when that doesn’t work) traits.

See Part 2 for these rules:

7. Use Criterion benchmarking to pick an algorithm and to discover that LANES should (almost) always be 32 or 64.

8. Integrate your best SIMD algorithm into your project with as_simd, special code for i128/u128, and additional in-context benchmarking.

9. Extricate your best SIMD algorithm from your project (for now) with an optional cargo feature.

Aside: To avoid wishy-washiness, I call these “rules”, but they are, of course, just suggestions.

Rule 1: Use nightly Rust and core::simd, Rust’s experimental standard SIMD module.

Rust can access SIMD operations either via the stable core::arch module or via nightly’s core::simd module. Let’s compare them:

core::arch

core::simd

  • Nightly
  • Delightfully easy and portable.
  • Limits downstream users to nightly.

I decided to go with “easy”. If you decide to take the harder road, starting first with the easier path may still be worthwhile.


In either case, before we try to use SIMD operations in a larger project, let’s make sure we can get them working at all. Here are the steps:

First, create a project called simd_hello:

cargo new simd_hello
cd simd_hello

Edit src/main.rs to contain (Rust playground):

// Tell nightly Rust to enable 'portable_simd'
#![feature(portable_simd)]
use core::simd::prelude::*;

// constant Simd structs
const LANES: usize = 32;
const THIRTEENS: Simd<u8, LANES> = Simd::<u8, LANES>::from_array([13; LANES]);
const TWENTYSIXS: Simd<u8, LANES> = Simd::<u8, LANES>::from_array([26; LANES]);
const ZEES: Simd<u8, LANES> = Simd::<u8, LANES>::from_array([b'Z'; LANES]);

fn main() {
    // create a Simd struct from a slice of LANES bytes
    let mut data = Simd::<u8, LANES>::from_slice(b"URYYBJBEYQVQBUBCRVGFNYYTBVATJRYY");

    data += THIRTEENS; // add 13 to each byte

    // compare each byte to 'Z', where the byte is greater than 'Z', subtract 26
    let mask = data.simd_gt(ZEES); // compare each byte to 'Z'
    data = mask.select(data - TWENTYSIXS, data);

    let output = String::from_utf8_lossy(data.as_array());
    assert_eq!(output, "HELLOWORLDIDOHOPEITSALLGOINGWELL");
    println!("{}", output);
}

Next — full SIMD capabilities require the nightly version of Rust. Assuming you have Rust installed, install nightly (rustup install nightly). Make sure you have the latest nightly version (rustup update nightly). Finally, set this project to use nightly (rustup override set nightly).

You can now run the program with cargo run. The program applies ROT13 decryption to 32 bytes of upper-case letters. With SIMD, the program can decrypt all 32 bytes simultaneously.

Let’s look at each section of the program to see how it works. It starts with:

#![feature(portable_simd)]
use core::simd::prelude::*;

Rust nightly offers its extra capabilities (or “features”) only on request. The #![feature(portable_simd)] statement requests that Rust nightly make available the new experimental core::simd module. The use statement then imports the module’s most important types and traits.

In the code’s next section, we define useful constants:

const LANES: usize = 32;
const THIRTEENS: Simd<u8, LANES> = Simd::<u8, LANES>::from_array([13; LANES]);
const TWENTYSIXS: Simd<u8, LANES> = Simd::<u8, LANES>::from_array([26; LANES]);
const ZEES: Simd<u8, LANES> = Simd::<u8, LANES>::from_array([b'Z'; LANES]);

The Simd struct is a special kind of Rust array. (It is, for example, always memory aligned.) The constant LANES tells the length of the Simd array. The from_array constructor copies a regular Rust array to create a Simd. In this case, because we want const Simd’s, the arrays we construct from must also be const.

The next two lines copy our encrypted text into data and then adds 13 to each letter.

let mut data = Simd::<u8, LANES>::from_slice(b"URYYBJBEYQVQBUBCRVGFNYYTBVATJRYY");
data += THIRTEENS;

What if you make an error and your encrypted text isn’t exactly length LANES (32)? Sadly, the compiler won’t tell you. Instead, when you run the program, from_slice will panic. What if the encrypted text contains non-upper-case letters? In this example program, we’ll ignore that possibility.

The += operator does element-wise addition between the Simd data and Simd THIRTEENS. It puts the result in data. Recall that debug builds of regular Rust addition check for overflows. Not so with SIMD. Rust defines SIMD arithmetic operators to always wrap. Values of type u8 wrap after 255.

Coincidentally, Rot13 decryption also requires wrapping, but after ‘Z’ rather than after 255. Here is one approach to coding the needed Rot13 wrapping. It subtracts 26 from any values on beyond ‘Z’.

let mask = data.simd_gt(ZEES);
data = mask.select(data - TWENTYSIXS, data);

This says to find the element-wise places beyond ‘Z’. Then, subtract 26 from all values. At the places of interest, use the subtracted values. At the other places, use the original values. Does subtracting from all values and then using only some seem wasteful? With SIMD, this takes no extra computer time and avoids jumps. This strategy is, thus, efficient and common.

The program ends like so:

let output = String::from_utf8_lossy(data.as_array());
assert_eq!(output, "HELLOWORLDIDOHOPEITSALLGOINGWELL");
println!("{}", output);

Notice the .as_array() method. It safely transmutes a Simd struct into a regular Rust array without copying.

Surprisingly to me, this program runs fine on computers without SIMD extensions. Rust nightly compiles the code to regular (non-SIMD) instructions. But we don’t just want to run “fine”, we want to run faster. That requires us to turn on our computer’s SIMD power.

Rule 2: CCC: Check, Control, and Choose your computer’s SIMD capabilities.

To make SIMD programs run faster on your machine, you must first discover which SIMD extensions your machine supports. If you have an Intel/AMD machine, you can use my simd-detect cargo command.

Run with:

rustup override set nightly
cargo install cargo-simd-detect --force
cargo simd-detect

On my machine, it outputs:

extension       width                   available       enabled
sse2            128-bit/16-bytes        true            true
avx2            256-bit/32-bytes        true            false
avx512f         512-bit/64-bytes        true            false

This says that my machine supports the sse2, avx2, and avx512f SIMD extensions. Of those, by default, Rust enables the ubiquitous twenty-year-old sse2 extension.

The SIMD extensions form a hierarchy with avx512f above avx2 above sse2. Enabling a higher-level extension also enables the lower-level extensions.

Most Intel/AMD computers also support the ten-year-old avx2 extension. You enable it by setting an environment variable:

# For Windows Command Prompt
set RUSTFLAGS=-C target-feature=+avx2

# For Unix-like shells (like Bash)
export RUSTFLAGS="-C target-feature=+avx2"

“Force install” and run simd-detect again and you should see that avx2 is enabled.

# Force install every time to see changes to 'enabled'
cargo install cargo-simd-detect --force
cargo simd-detect
extension         width                   available       enabled
sse2            128-bit/16-bytes        true            true
avx2            256-bit/32-bytes        true            true
avx512f         512-bit/64-bytes        true            false

Alternatively, you can turn on every SIMD extension that your machine supports:

# For Windows Command Prompt
set RUSTFLAGS=-C target-cpu=native

# For Unix-like shells (like Bash)
export RUSTFLAGS="-C target-cpu=native"

On my machine this enables avx512f, a newer SIMD extension supported by some Intel computers and a few AMD computers.

You can set SIMD extensions back to their default (sse2 on Intel/AMD) with:

# For Windows Command Prompt
set RUSTFLAGS=

# For Unix-like shells (like Bash)
unset RUSTFLAGS

You may wonder why target-cpu=native isn’t Rust’s default. The problem is that binaries created using avx2 or avx512f won’t run on computers missing those SIMD extensions. So, if you are compiling only for your own use, use target-cpu=native. If, however, you are compiling for others, choose your SIMD extensions thoughtfully and let people know which SIMD extension level you are assuming.

Happily, whatever level of SIMD extension you pick, Rust’s SIMD support is so flexible you can easily change your decision later. Let’s next learn details of programming with SIMD in Rust.

Rule 3: Learn core::simd, but selectively.

To build with Rust’s new core::simd module you should learn selected building blocks. Here is a cheatsheet with the structs, methods, etc., that I’ve found most useful. Each item includes a link to its documentation.

Structs

  • Simd – a special, aligned, fixed-length array of SimdElement. We refer to a position in the array and the element stored at that position as a “lane”. By default, we copy Simd structs rather than reference them.
  • Mask – a special Boolean array showing inclusion/exclusion on a per-lane basis.

SimdElements

  • Floating-Point Types: f32, f64
  • Integer Types: i8, u8, i16, u16, i32, u32, i64, u64, isize, usize
  • — but not i128, u128

Simd constructors

  • Simd::from_array – creates a Simd struct by copying a fixed-length array.
  • Simd::from_slice – creates a Simd<T,LANE> struct by copying the first LANE elements of a slice.
  • Simd::splat – replicates a single value across all lanes of a Simd struct.
  • slice::as_simd – without copying, safely transmutes a regular slice into an aligned slice of Simd (plus unaligned leftovers).

Simd conversion

  • Simd::as_array – without copying, safely transmutes an Simd struct into a regular array reference.

Simd methods and operators

  • simd[i] – extract a value from a lane of a Simd.
  • simd + simd – performs element-wise addition of two Simd structs. Also supported: -, *, /, % (remainder), bitwise-and, -or, -xor, -not, and shifts.
  • simd += simd – adds another Simd struct to the current one, in place. Other operators supported, too.
  • Simd::simd_gt – compares two Simd structs, returning a Mask indicating which elements of the first are greater than those of the second. Also supported: simd_lt, simd_le, simd_ge, simd_eq, simd_ne.
  • Simd::rotate_elements_left – rotates the elements of a Simd struct to the left by a specified amount. Also, rotate_elements_right.
  • simd_swizzle!(simd, indexes) – rearranges the elements of a Simd struct based on the specified const indexes.
  • simd == simd – checks for equality between two Simd structs, returning a regular bool result.
  • Simd::reduce_and – performs a bitwise AND reduction across all lanes of a Simd struct. Also supported: reduce_or, reduce_xor, reduce_max, reduce_min, reduce_sum (but no reduce_eq).

Mask methods and operators

  • Mask::select – selects elements from two Simd struct based on a mask.
  • Mask::all – tells if the mask is all true.
  • Mask::any – tells if the mask contains any true.

All about lanes

  • Simd::LANES – a constant indicating the number of elements (lanes) in a Simd struct.
  • SupportedLaneCount – tells the allowed values of LANES. Use by generics.
  • simd.lanes – const method that tells a Simd struct’s number of lanes.

Low-level alignment, offsets, etc.

When possible, use as_simd instead.

More, perhaps of interest

With these building blocks at hand, it’s time to build something.

Rule 4: Brainstorm candidate algorithms.

What do you want to speed up? You won’t know ahead of time which SIMD approach (if any) will work best. You should, therefore, create many algorithms that you can then analyze (Rule 5) and benchmark (Rule 7).

I wanted to speed up range-set-blaze, a crate for manipulating sets of “clumpy” integers. I hoped that creating is_consecutive, a function to detect blocks of consecutive integers, would be useful.

Background: Crate range-set-blaze works on “clumpy” integers. “Clumpy”, here, means that the number of ranges needed to represent the data is small compared to the number of input integers. For example, these 1002 input integers

100, 101, …, 498, 499, 501, 502, …, 998, 999, 999, 100, 0

Ultimately become three Rust ranges:

0..=0, 100..=499, 501..=999.

(Internally, the RangeSetBlaze struct represents a set of integers as a sorted list of disjoint ranges stored in a cache efficient BTreeMap.)

Although the input integers are allowed to be unsorted and redundant, we expect them to often be “nice”. RangeSetBlaze’s from_iter constructor already exploits this expectation by grouping up adjacent integers. For example, from_iter first turns the 1002 input integers into four ranges

100..=499, 501..=999, 100..=100, 0..=0.

with minimal, constant memory usage, independent of input size. It then sorts and merges these reduced ranges.

I wondered if a new from_slice method could speed construction from array-like inputs by quickly finding (some) consecutive integers. For example, could it — with minimal, constant memory — turn the 1002 input integers into five Rust ranges:

100..=499, 501..=999, 999..=999, 100..=100, 0..=0.

If so, from_iter could then quickly finish the processing.

Let’s start by writing is_consecutive with regular Rust:

pub const LANES: usize = 16;
pub fn is_consecutive_regular(chunk: &[u32; LANES]) -> bool {
    for i in 1..LANES {
        if chunk[i - 1].checked_add(1) != Some(chunk[i]) {
            return false;
        }
    }
    true
}

The algorithm just loops through the array sequentially, checking that each value is one more than its predecessor. It also avoids overflow.

Looping over the items seemed so easy, I wasn’t sure if SIMD could do any better. Here was my first attempt:

Splat0

use std::simd::prelude::*;

const COMPARISON_VALUE_SPLAT0: Simd<u32, LANES> =
    Simd::from_array([15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]);

pub fn is_consecutive_splat0(chunk: Simd<u32, LANES>) -> bool {
    if chunk[0].overflowing_add(LANES as u32 - 1) != (chunk[LANES - 1], false) {
        return false;
    }
    let added = chunk + COMPARISON_VALUE_SPLAT0;
    Simd::splat(added[0]) == added
}

Here is an outline of its calculations:

Source: This and all following images by author.

It first (needlessly) checks that the first and last items are 15 apart. It then creates added by adding 15 to the 0th item, 14 to the next, etc. Finally, to see if all items in added are the same, it creates a new Simd based on added’s 0th item and then compares. Recall that splat creates a Simd struct from one value.

Splat1 & Splat2

When I mentioned the is_consecutive problem to Ben Lichtman, he independently came up with this, Splat1:

const COMPARISON_VALUE_SPLAT1: Simd<u32, LANES> =
    Simd::from_array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]);

pub fn is_consecutive_splat1(chunk: Simd<u32, LANES>) -> bool {
    let subtracted = chunk - COMPARISON_VALUE_SPLAT1;
    Simd::splat(chunk[0]) == subtracted
}

Splat1 subtracts the comparison value from chunk and checks if the result is the same as the first element of chunk, splatted.

He also came up with a variation called Splat2 that splats the first element of subtracted rather than chunk. That would seemingly avoid one memory access.

I’m sure you are wondering which of these is best, but before we discuss that let’s look at two more candidates.

Swizzle

Swizzle is like Splat2 but uses simd_swizzle! instead of splat. Macro simd_swizzle! creates a new Simd by rearranging the lanes of an old Simd according to an array of indexes.

pub fn is_consecutive_swizzle(chunk: Simd<u32, LANES>) -> bool {
    let subtracted = chunk - COMPARISON_VALUE_SPLAT1;
    simd_swizzle!(subtracted, [0; LANES]) == subtracted
}

Rotate

This one is different. I had high hopes for it.

const COMPARISON_VALUE_ROTATE: Simd<u32, LANES> =
    Simd::from_array([4294967281, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]);

pub fn is_consecutive_rotate(chunk: Simd<u32, LANES>) -> bool {
    let rotated = chunk.rotate_elements_right::<1>();
    chunk - rotated == COMPARISON_VALUE_ROTATE
}

The idea is to rotate all the elements one to the right. We then subtract the original chunk from rotated. If the input is consecutive, the result should be “-15” followed by all 1’s. (Using wrapped subtraction, -15 is 4294967281u32.)

Now that we have candidates, let’s start to evaluate them.

Rule 5: Use Godbolt and AI to understand your code’s assembly, even if you don’t know assembly language.

We’ll evaluate the candidates in two ways. First, in this rule, we’ll look at the assembly language generated from our code. Second, in Rule 7, we’ll benchmark the code’s speed.

Don’t worry if you don’t know assembly language, you can still get something out of looking at it.

The easiest way to see the generated assembly language is with the Compiler Explorer, AKA Godbolt. It works best on short bits of code that don’t use outside crates. It looks like this:

Referring to the numbers in the figure above, follow these steps to use Godbolt:

  1. Open godbolt.org with your web browser.
  2. Add a new source editor.
  3. Select Rust as your language.
  4. Paste in the code of interest. Make the functions of interest public (pub fn). Do not include a main or unneeded functions. The tool doesn’t support external crates.
  5. Add a new compiler.
  6. Set the compiler version to nightly.
  7. Set options (for now) to -C opt-level=3 -C target-feature=+avx512f.
  8. If there are errors, look at the output.
  9. If you want to share or save the state of the tool, click “Share”

From the image above, you can see that Splat2 and Swizzle are exactly the same, so we can remove Swizzle from consideration. If you open up a copy of my Godbolt session, you’ll also see that most of the functions compile to about the same number of assembly operations. The exceptions are Regular — which is much longer — and Splat0 — which includes the early check.

In the assembly, 512-bit registers start with ZMM. 256-bit registers start YMM. 128-bit registers start with XMM. If you want to better understand the generated assembly, use AI tools to generate annotations. For example, here I ask Bing Chat about Splat2:

Try different compiler settings, including -C target-feature=+avx2 and then leaving target-feature completely off.

Fewer assembly operations don’t necessarily mean faster speed. Looking at the assembly does, however, give us a sanity check that the compiler is at least trying to use SIMD operations, inlining const references, etc. Also, as with Splat2 and Swizzle, it can sometimes let us know when two candidates are the same.

You may need disassembly features beyond what Godbolt offers, for example, the ability to work with code that uses external crates. B3NNY recommended the cargo tool cargo-show-asm to me. I tried it and found it reasonably easy to use.

The range-set-blaze crate must handle integer types beyond u32. Moreover, we must pick a number of LANES, but we have no reason to think that 16 LANES is always best. To address these needs, in the next rule we’ll generalize the code.

Rule 6: Generalize to all types and LANES with in-lined generics, (and when that doesn’t work) macros, and (when that doesn’t work) traits.

Let’s first generalize Splat1 with generics.

#[inline]
pub fn is_consecutive_splat1_gen<T, const N: usize>(
    chunk: Simd<T, N>,
    comparison_value: Simd<T, N>,
) -> bool
where
    T: SimdElement + PartialEq,
    Simd<T, N>: Sub<Simd<T, N>, Output = Simd<T, N>>,
    LaneCount<N>: SupportedLaneCount,
{
    let subtracted = chunk - comparison_value;
    Simd::splat(chunk[0]) == subtracted
}

First, note the #[inline] attribute. It’s important for efficiency and we’ll use it on pretty much every one of these small functions.

The function defined above, is_consecutive_splat1_gen, looks great except that it needs a second input, called comparison_value, that we have yet to define.

If you don’t need a generic const comparison_value, I envy you. You can skip to the next rule if you like. Likewise, if you are reading this in the future and creating a generic const comparison_value is as effortless as having your personal robot do your household chores, then I doubly envy you.

We can try to create a comparison_value_splat_gen that is generic and const. Sadly, neither From<usize> nor alternative T::One are const, so this doesn’t work:

// DOESN'T WORK BECAUSE From<usize> is not const
pub const fn comparison_value_splat_gen<T, const N: usize>() -> Simd<T, N>
where
    T: SimdElement + Default + From<usize> + AddAssign,
    LaneCount<N>: SupportedLaneCount,
{
    let mut arr: [T; N] = [T::from(0usize); N];
    let mut i_usize = 0;
    while i_usize < N {
        arr[i_usize] = T::from(i_usize);
        i_usize += 1;
    }
    Simd::from_array(arr)
}

Macros are the last refuge of scoundrels. So, let’s use macros:

#[macro_export]
macro_rules! define_is_consecutive_splat1 {
    ($function:ident, $type:ty) => {
        #[inline]
        pub fn $function<const N: usize>(chunk: Simd<$type, N>) -> bool
        where
            LaneCount<N>: SupportedLaneCount,
        {
            define_comparison_value_splat!(comparison_value_splat, $type);

            let subtracted = chunk - comparison_value_splat();
            Simd::splat(chunk[0]) == subtracted
        }
    };
}
#[macro_export]
macro_rules! define_comparison_value_splat {
    ($function:ident, $type:ty) => {
        pub const fn $function<const N: usize>() -> Simd<$type, N>
        where
            LaneCount<N>: SupportedLaneCount,
        {
            let mut arr: [$type; N] = [0; N];
            let mut i = 0;
            while i < N {
                arr[i] = i as $type;
                i += 1;
            }
            Simd::from_array(arr)
        }
    };
}

This lets us run on any particular element type and all number of LANES (Rust Playground):

define_is_consecutive_splat1!(is_consecutive_splat1_i32, i32);

let a: Simd<i32, 16> = black_box(Simd::from_array(array::from_fn(|i| 100 + i as i32)));
let ninety_nines: Simd<i32, 16> = black_box(Simd::from_array([99; 16]));
assert!(is_consecutive_splat1_i32(a));
assert!(!is_consecutive_splat1_i32(ninety_nines));

Sadly, this still isn’t enough for range-set-blaze. It needs to run on all element types (not just one) and (ideally) all LANES (not just one).

Happily, there’s a workaround, that again depends on macros. It also exploits the fact that we only need to support a finite list of types, namely: i8i16i32i64isizeu8u16u32u64, and usize. If you need to also (or instead) support f32 and f64, that’s fine.

If, on the other hand, you need to support i128 and u128, you may be out of luck. The core::simd module doesn’t support them. We’ll see in Rule 8 how range-set-blaze gets around that at a performance cost.

The workaround defines a new trait, here called IsConsecutive. We then use a macro (that calls a macro, that calls a macro) to implement the trait on the 10 types of interest.

pub trait IsConsecutive {
    fn is_consecutive<const N: usize>(chunk: Simd<Self, N>) -> bool
    where
        Self: SimdElement,
        Simd<Self, N>: Sub<Simd<Self, N>, Output = Simd<Self, N>>,
        LaneCount<N>: SupportedLaneCount;
}

macro_rules! impl_is_consecutive {
    ($type:ty) => {
        impl IsConsecutive for $type {
            #[inline] // very important
            fn is_consecutive<const N: usize>(chunk: Simd<Self, N>) -> bool
            where
                Self: SimdElement,
                Simd<Self, N>: Sub<Simd<Self, N>, Output = Simd<Self, N>>,
                LaneCount<N>: SupportedLaneCount,
            {
                define_is_consecutive_splat1!(is_consecutive_splat1, $type);
                is_consecutive_splat1(chunk)
            }
        }
    };
}

impl_is_consecutive!(i8);
impl_is_consecutive!(i16);
impl_is_consecutive!(i32);
impl_is_consecutive!(i64);
impl_is_consecutive!(isize);
impl_is_consecutive!(u8);
impl_is_consecutive!(u16);
impl_is_consecutive!(u32);
impl_is_consecutive!(u64);
impl_is_consecutive!(usize);

We can now call fully generic code (Rust Playground):

// Works on i32 and 16 lanes
let a: Simd<i32, 16> = black_box(Simd::from_array(array::from_fn(|i| 100 + i as i32)));
let ninety_nines: Simd<i32, 16> = black_box(Simd::from_array([99; 16]));

assert!(IsConsecutive::is_consecutive(a));
assert!(!IsConsecutive::is_consecutive(ninety_nines));

// Works on i8 and 64 lanes
let a: Simd<i8, 64> = black_box(Simd::from_array(array::from_fn(|i| 10 + i as i8)));
let ninety_nines: Simd<i8, 64> = black_box(Simd::from_array([99; 64]));

assert!(IsConsecutive::is_consecutive(a));
assert!(!IsConsecutive::is_consecutive(ninety_nines));

With this technique, we can create multiple candidate algorithms that are fully generic over type and LANES. Next, it is time to benchmark and see which algorithms are fastest.


Those are the first six rules for adding SIMD code to Rust. In Part 2, we look at rules 7 to 9. These rules will cover how to pick an algorithm and set LANES. Also, how to integrate SIMD operations into your existing code and (importantly) how to make it optional. Part 2 concludes with a discussion of when/if you should use SIMD and ideas for improving Rust’s SIMD experience. I hope to see you there.

Please follow Carl on Medium. I write on scientific programming in Rust and Python, machine learning, and statistics. I tend to write about one article per month.


The post Nine Rules for SIMD Acceleration of Your Rust Code (Part 1) appeared first on Towards Data Science.

]]>
Is Python Set to Surpass Its Competitors? https://towardsdatascience.com/is-python-set-to-surpass-its-competitors/ Wed, 26 Feb 2025 00:46:31 +0000 https://towardsdatascience.com/?p=598431 The features that make Python the most suitable programming language for most people

The post Is Python Set to Surpass Its Competitors? appeared first on Towards Data Science.

]]>
A soufflé is a baked egg dish that originated in France in the 18th century. The process of making an elegant and delicious French soufflé is complex, and in the past, it was typically only prepared by professional French pastry chefs. However, with pre-made soufflé mixes now widely available in supermarkets, this classic French dish has found its way into the kitchens of countless households. 

Python is like the pre-made soufflé mixes in programming. Many studies have consistently shown that Python is the most popular programming language among developers, and this advantage will continue to expand in 2025. Python stands out compared to languages like C, C++, Java, and Julia because it’s highly readable and expressive, flexible and dynamic, beginner-friendly yet powerful. These characteristics make Python the most suitable programming language for people even without programming basics. The following features distinguish Python from other programming languages:

  • Dynamic Typing
  • List Comprehensions
  • Generators
  • Argument Passing and Mutability

These features reveal Python’s intrinsic nature as a programming language. Without this knowledge, you’ll never truly understand Python. In today’s article, I will elaborate on how Python excels over other programming languages through these features.

Dynamic Typing

For most programming languages like Java or C++, explicit data type declarations are required. But when it comes to Python, you don’t have to declare the type of a variable when you create one. This feature in Python is called dynamic typing, which makes Python flexible and easy to use.
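
As a quick illustration, here is a minimal sketch showing that the same variable name can be rebound to objects of different types, with Python resolving the type at runtime:

x = 42            # x is bound to an int
print(type(x))    # <class 'int'>

x = "hello"       # the same name is now bound to a str
print(type(x))    # <class 'str'>

x = [1, 2, 3]     # and now to a list, with no type declarations anywhere
print(type(x))    # <class 'list'>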

List Comprehensions

List comprehensions are used to generate lists from other lists by applying functions to each element in the list. They provide a concise way to apply loops and optional conditions in a list.

For example, if you’d like to create a list of squares for even numbers between 0 and 9, you can use JavaScript, a regular loop in Python, or Python’s list comprehension to achieve the same goal.

JavaScript

let squares = Array.from({ length: 10 }, (_, x) => x)  // Create array [0, 1, 2, ..., 9]
   .filter(x => x % 2 === 0)                          // Filter even numbers
   .map(x => x ** 2);                                 // Square each number
console.log(squares);  // Output: [0, 4, 16, 36, 64]

Regular Loop in Python

squares = []
for x in range(10):
   if x % 2 == 0:
       squares.append(x**2)
print(squares) 

Python’s List Comprehension

squares = [x**2 for x in range(10) if x % 2 == 0]
print(squares)

All three sections of code above generate the same list [0, 4, 16, 36, 64], but Python’s list comprehension is the most elegant: the syntax is concise and clearly expresses the intent, while the Python loop is more verbose and requires explicit initialization and appending. The JavaScript syntax is the least elegant and readable because it requires chaining the Array.from, filter, and map methods. Neither the Python loop nor the JavaScript version reads as naturally as the list comprehension does.

Generator

Generators in Python are a special kind of iterator that allow developers to iterate over a sequence of values without storing them all in memory at once. They are created with the yield keyword. Other programming languages like C++ and Java, though offering similar functionality, don’t have a built-in yield keyword in the same simple, integrated way. Here are several key advantages that make Python generators unique:

  • Memory Efficiency: Generators yield one value at a time so that they only compute and hold one item in memory at any given moment. This is in contrast to, say, a list in Python, which stores all items in memory.
  • Lazy Evaluation: Generators enable Python to compute values only as needed. This “lazy” computation results in significant performance improvements when dealing with large or potentially infinite sequences.
  • Simple Syntax: This might be the biggest reason why developers choose generators: they can easily convert a regular function into a generator without having to manage state explicitly.

def fibonacci():
   a, b = 0, 1
   while True:
       yield a
       a, b = b, a + b

fib = fibonacci()
for _ in range(100):
   print(next(fib))

The example above shows how to use the yield keyword to create a sequence. In terms of memory usage and speed, generating 100 Fibonacci numbers with or without generators shows hardly any difference. But when it comes to 100 million numbers in practice, you should use generators, because a list of 100 million numbers could easily strain system resources.
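
As a rough illustration (scaled down to ten million numbers so it finishes quickly), both lines below compute the same sum, but the list comprehension materializes every value in memory first, while the generator expression produces one value at a time:

n = 10_000_000

total_from_list = sum([x * x for x in range(n)])  # builds a ten-million-element list first
total_from_gen = sum(x * x for x in range(n))     # yields values one at a time

assert total_from_list == total_from_gen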

Argument Passing and Mutability

In Python, we don’t really assign values to variables; instead, we bind variables to objects. The result of such an action depends on whether the object is mutable or immutable. If an object is mutable, changes made to it inside the function will affect the original object. 

def modify_list(lst):
   lst.append(4)

my_list = [1, 2, 3]
modify_list(my_list)
print(my_list)  # Output: [1, 2, 3, 4]

In the example above, we append 4 to the list my_list, which is [1, 2, 3]. Because lists are mutable, the append operation changes the original list my_list without creating a copy.

However, immutable objects, such as integers, floats, strings, tuples and frozensets, cannot be changed after creation. Therefore, any modification results in a new object. In the example below, because integers are immutable, the function creates a new integer rather than modifying the original variable.

def modify_number(n):
   n += 10
   return n

a = 5
new_a = modify_number(a)
print(a)      # Output: 5
print(new_a)  # Output: 15

Python’s argument passing is sometimes described as “pass-by-object-reference” or “pass-by-assignment.” This makes Python unique because Python passes references uniformly (pass-by-object-reference), while other languages need to differentiate explicitly between pass-by-value and pass-by-reference. Python’s uniform approach is simple yet powerful. It avoids the need for explicit pointers or reference parameters but requires developers to be mindful of mutable objects.
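
One consequence worth seeing explicitly is that rebinding a parameter inside a function never affects the caller, while mutating the object it refers to does. Here is a minimal sketch:

def rebind(lst):
    lst = [0, 0, 0]   # rebinds the local name only; the caller's object is untouched

def mutate(lst):
    lst.append(4)     # mutates the shared object; the caller sees the change

my_list = [1, 2, 3]
rebind(my_list)
print(my_list)  # Output: [1, 2, 3]
mutate(my_list)
print(my_list)  # Output: [1, 2, 3, 4]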

With Python’s argument passing and mutability, we can enjoy the following benefits in coding:

  • Memory Efficiency: It saves memory by passing references instead of making full copies of objects. This especially benefits code development with large data structures.
  • Performance: It avoids unnecessary copies and thus improves the overall coding performance.
  • Flexibility: This feature provides convenience for updating data structure because developers don’t need to explicitly choose between pass-by-value and pass-by-reference.

However, this characteristic of Python forces developers to choose carefully between mutable and immutable data types, and it can also make debugging more complex.
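
A classic example of that debugging complexity is the mutable default argument pitfall: a default list is created once, at function definition time, and is then shared across all calls.

def append_to(item, lst=[]):  # the default list is created only once
    lst.append(item)
    return lst

print(append_to(1))  # Output: [1]
print(append_to(2))  # Output: [1, 2] (the same list again, which often surprises newcomers)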

So is Python Really Simple?

Python’s popularity results from its simplicity, memory efficiency, high performance, and beginner-friendliness. It’s also the programming language that reads most like natural human language, so even people who haven’t received systematic and holistic programming training are still able to understand it. These characteristics make Python a top choice among enterprises, academic institutes, and government organisations.

For example, suppose we’d like to filter out the “completed” orders with amounts greater than 200 and update a mutable summary report (a dictionary) with the total count and sum of amounts for an e-commerce company. We can use a list comprehension to create a list of orders meeting our criteria, skip the declaration of variable types, and modify the original dictionary through pass-by-assignment:

import random
import time

def order_stream(num_orders):
   """
   A generator that yields a stream of orders.
   Each order is a dictionary with dynamic types:
     - 'order_id': str
     - 'amount': float
     - 'status': str (randomly chosen among 'completed', 'pending', 'cancelled')
   """
   for i in range(num_orders):
       order = {
           "order_id": f"ORD{i+1}",
           "amount": round(random.uniform(10.0, 500.0), 2),
           "status": random.choice(["completed", "pending", "cancelled"])
       }
       yield order
       time.sleep(0.001)  # simulate delay

def update_summary(report, orders):
   """
   Updates the mutable summary report dictionary in-place.
   For each order in the list, it increments the count and adds the order's amount.
   """
   for order in orders:
       report["count"] += 1
       report["total_amount"] += order["amount"]

# Create a mutable summary report dictionary.
summary_report = {"count": 0, "total_amount": 0.0}

# Use a generator to stream 10,000 orders.
orders_gen = order_stream(10000)

# Use a list comprehension to filter orders that are 'completed' and have amount > 200.
high_value_completed_orders = [order for order in orders_gen
                              if order["status"] == "completed" and order["amount"] > 200]

# Update the summary report using our mutable dictionary.
update_summary(summary_report, high_value_completed_orders)

print("Summary Report for High-Value Completed Orders:")
print(summary_report)

If we’d like to achieve the same goal with Java, since Java lacks built-in generators and list comprehensions, we have to generate a list of orders, then filter and update a summary using explicit loops, which makes the code more complex, less readable, and harder to maintain.

import java.util.*;
import java.util.concurrent.ThreadLocalRandom;

class Order {
   public String orderId;
   public double amount;
   public String status;
  
   public Order(String orderId, double amount, String status) {
       this.orderId = orderId;
       this.amount = amount;
       this.status = status;
   }
  
   @Override
   public String toString() {
       return String.format("{orderId:%s, amount:%.2f, status:%s}", orderId, amount, status);
   }
}

public class OrderProcessor {
   // Generates a list of orders.
   public static List<Order> generateOrders(int numOrders) {
       List<Order> orders = new ArrayList<>();
       String[] statuses = {"completed", "pending", "cancelled"};
       Random rand = new Random();
       for (int i = 0; i < numOrders; i++) {
           String orderId = "ORD" + (i + 1);
           double amount = Math.round(ThreadLocalRandom.current().nextDouble(10.0, 500.0) * 100.0) / 100.0;
           String status = statuses[rand.nextInt(statuses.length)];
           orders.add(new Order(orderId, amount, status));
       }
       return orders;
   }
  
   // Filters orders based on criteria.
   public static List<Order> filterHighValueCompletedOrders(List<Order> orders) {
       List<Order> filtered = new ArrayList<>();
       for (Order order : orders) {
           if ("completed".equals(order.status) && order.amount > 200) {
               filtered.add(order);
           }
       }
       return filtered;
   }
  
   // Updates a mutable summary Map with the count and total amount.
   public static void updateSummary(Map<String, Object> summary, List<Order> orders) {
       int count = 0;
       double totalAmount = 0.0;
       for (Order order : orders) {
           count++;
           totalAmount += order.amount;
       }
       summary.put("count", count);
       summary.put("total_amount", totalAmount);
   }
  
   public static void main(String[] args) {
       // Generate orders.
       List<Order> orders = generateOrders(10000);
      
       // Filter orders.
       List<Order> highValueCompletedOrders = filterHighValueCompletedOrders(orders);
      
       // Create a mutable summary map.
       Map<String, Object> summaryReport = new HashMap<>();
       summaryReport.put("count", 0);
       summaryReport.put("total_amount", 0.0);
      
       // Update the summary report.
       updateSummary(summaryReport, highValueCompletedOrders);
      
       System.out.println("Summary Report for High-Value Completed Orders:");
       System.out.println(summaryReport);
   }
}

Conclusion

Equipped with dynamic typing, list comprehensions, generators, and its approach to argument passing and mutability, Python simplifies coding while enhancing memory efficiency and performance. As a result, Python has become an ideal programming language for self-learners.

Thank you for reading!

The post Is Python Set to Surpass Its Competitors? appeared first on Towards Data Science.

]]>
Manage Environment Variables with Pydantic https://towardsdatascience.com/manage-environment-variables-with-pydantic/ Wed, 12 Feb 2025 20:10:29 +0000 https://towardsdatascience.com/?p=597778 Introduction Developers work on applications that are supposed to be deployed on some server in order to allow anyone to use those. Typically in the machine where these apps live, developers set up environment variables that allow the app to run. These variables can be API keys of external services, URL of your database and […]

The post Manage Environment Variables with Pydantic appeared first on Towards Data Science.

]]>
Introduction

Developers work on applications that are supposed to be deployed on some server so that anyone can use them. Typically, on the machine where these apps live, developers set up environment variables that allow the app to run. These variables can be API keys for external services, the URL of your database, and much more.

For local development, though, it is really inconvenient to declare these variables directly on the machine, because it is a slow and messy process. So in this short tutorial I’d like to share how to use Pydantic to handle environment variables in a secure way.

.env file

What you commonly do in a Python project is store all your environment variables in a file named .env. This is a text file containing the variables in a key=value format. You can also use the value of one variable to declare another by leveraging the ${} syntax.

The following is an example:

#.env file

OPENAI_API_KEY="sk-your private key"
OPENAI_MODEL_ID="gpt-4o-mini"

# Development settings
DOMAIN=example.org
ADMIN_EMAIL=admin@${DOMAIN}

WANDB_API_KEY="your-private-key"
WANDB_PROJECT="myproject"
WANDB_ENTITY="my-entity"

SERPAPI_KEY="your-api-key"
PERPLEXITY_TOKEN="your-api-token"

Be aware that the .env file should remain private, so it is important that this file is listed in your .gitignore, to be sure you never push it to GitHub; otherwise, other developers could steal your keys and use the tools you’ve paid for.

env.example file

To ease the life of developers who will clone your repository, you could include an env.example file in your project. This is a file containing only the keys of what is supposed to go into the .env file. In this way, other people know what APIs, tokens, or secrets in general they need to set to make the scripts work.

#env.example

OPENAI_API_KEY=""
OPENAI_MODEL_ID=""

DOMAIN=""
ADMIN_EMAIL=""

WANDB_API_KEY=""
WANDB_PROJECT=""
WANDB_ENTITY=""

SERPAPI_KEY=""
PERPLEXITY_TOKEN=""

python-dotenv

python-dotenv is the library you use to load the variables declared into the .env file. To install this library:

pip install python-dotenv

Now you can use load_dotenv to load the variables, then get a reference to them with the os module.

import os
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
OPENAI_MODEL_ID = os.getenv('OPENAI_MODEL_ID')

This method will first look into your .env file and load the variables declared there. If the file doesn’t exist, the variables are taken from the host machine instead. This means you can use the .env file for local development, and when the code is deployed to a host environment such as a virtual machine or a Docker container, the environment variables defined there are used directly.
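As a quick sketch of this precedence (reusing the OPENAI_MODEL_ID entry from the example .env above): by default, load_dotenv() does not overwrite variables that are already set in the environment, and you can pass override=True to flip that behaviour.

import os
from dotenv import load_dotenv

# Pretend the host environment already defines this variable.
os.environ["OPENAI_MODEL_ID"] = "gpt-4o"

load_dotenv()                        # .env values do NOT overwrite existing ones
print(os.getenv("OPENAI_MODEL_ID"))  # still "gpt-4o"

load_dotenv(override=True)           # now the .env file takes precedence
print(os.getenv("OPENAI_MODEL_ID"))  # "gpt-4o-mini", from the .env file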

Pydantic

Pydantic is one of the most used libraries in Python for data validation. It is also used for serializing and deserializing classes into JSON and back. It automatically generates JSON schema, reducing the need for manual schema management. It also provides built-in data validation, ensuring that the serialized data adheres to the expected format. Lastly, it easily integrates with popular web frameworks like FastAPI.
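To make this concrete, here is a tiny illustration (the Order model is just an example) using the Pydantic v2 API:

from pydantic import BaseModel

class Order(BaseModel):
    order_id: str
    amount: float

# Validation with type coercion: the string "19.99" is converted to a float.
order = Order.model_validate({"order_id": "ORD1", "amount": "19.99"})

print(order.model_dump_json())    # serialize the instance to JSON
print(Order.model_json_schema())  # the auto-generated JSON schema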

pydantic-settings is a companion package to Pydantic used to load and validate settings or config classes from environment variables.

pip install pydantic-settings

We are going to create a class named Settings that inherits from BaseSettings. By default, the values of its fields are read from the .env file (or the environment); if no variable is found, the default value is used, when provided.

from pydantic_settings import BaseSettings, SettingsConfigDict

from pydantic import (
    AliasChoices,
    Field,
    RedisDsn,
)


class Settings(BaseSettings):
    auth_key: str = Field(validation_alias='my_auth_key')  # read from my_auth_key
    api_key: str = Field(alias='my_api_key')                # read from my_api_key

    redis_dsn: RedisDsn = Field(
        'redis://user:pass@localhost:6379/1',  # default value
        validation_alias=AliasChoices('service_redis_dsn', 'redis_url'),
    )

    model_config = SettingsConfigDict(env_prefix='my_prefix_')  # prefix for non-aliased fields

In the Settings class above we have defined several fields. The Field class is used to provide extra information about an attribute.

In our case, we set up a validation_alias, so the variable name to look for in the .env file is overridden: the environment variable my_auth_key will be read instead of auth_key.

You can also specify multiple aliases to look for in the .env file by leveraging AliasChoices(choice1, choice2).

The last attribute, model_config, configures how the settings themselves are loaded. Setting env_prefix tells Pydantic to read each field that has no explicit alias from an environment variable starting with the given prefix, my_prefix_ in this example.
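For example, here is a minimal sketch (the field and variable names are made up for illustration) of how env_prefix changes which variables are read:

from pydantic_settings import BaseSettings, SettingsConfigDict

class DBSettings(BaseSettings):
    # With the prefix below, this field is read from my_prefix_db_url
    # (matching is case-insensitive by default).
    db_url: str = "sqlite:///local.db"  # fallback if the variable is missing

    model_config = SettingsConfigDict(env_prefix='my_prefix_')

# If the environment (or .env) defines my_prefix_db_url, DBSettings().db_url
# picks it up; otherwise the default above is used.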

Instantiate and use settings

The next step is to actually instantiate and use these settings in your Python project. At the bottom of the file where the Settings class is defined, create the object right away:


# create immediately a settings object
settings = Settings()

Now we can use the settings in other parts of our codebase.

from Settings import settings  # assuming the class and object above live in Settings.py

print(settings.auth_key)

You finally have easy access to your settings, and Pydantic helps you validate that the secrets have the correct format. For more advanced validation tips, refer to the Pydantic documentation: https://docs.pydantic.dev/latest/
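For instance, a malformed secret makes instantiation fail immediately rather than deep inside your app. A small sketch, assuming it runs in the same module where the Settings class is defined:

import os
from pydantic import ValidationError

# Provide the required secrets, but give the Redis DSN an invalid value.
os.environ["MY_AUTH_KEY"] = "secret"
os.environ["MY_API_KEY"] = "secret"
os.environ["SERVICE_REDIS_DSN"] = "not-a-valid-dsn"

try:
    settings = Settings()  # the class defined above
except ValidationError as e:
    print(e)  # points at redis_dsn and explains why validation failed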

Final thoughts

Managing the configuration of a project is a boring but important part of software development. Secrets like API keys and database connections are what usually power your application. You could naively hardcode these variables in your code and it would still work, but for obvious reasons this is not good practice. In this article, I gave an introduction to using pydantic-settings for a structured and safe way to handle your configurations.

💼 Linkedin | 🐦 X (Twitter) | 💻 Website

The post Manage Environment Variables with Pydantic appeared first on Towards Data Science.

]]>
How to Find Seasonality Patterns in Time Series https://towardsdatascience.com/how-to-find-seasonality-patterns-in-time-series-c3b9f11e89c6/ Mon, 03 Feb 2025 13:02:03 +0000 https://towardsdatascience.com/how-to-find-seasonality-patterns-in-time-series-c3b9f11e89c6/ Using Fourier Transform to detect seasonal components

The post How to Find Seasonality Patterns in Time Series appeared first on Towards Data Science.

]]>
Using Fourier Transforms to detect seasonal components

In my professional life as a data scientist, I have encountered time series multiple times. Most of my knowledge comes from my academic experience, specifically my courses in Econometrics (I have a degree in Economics), where we studied statistical properties and models of time series.

Among the models I studied was SARIMA, which accounts for the seasonality of a time series; however, we never studied how to detect and recognize seasonality patterns.

Most of the time I had to find seasonal patterns, I simply relied on visual inspection of the data. That was until I stumbled upon this YouTube video on Fourier transforms and eventually learned what a periodogram is.

In this blog post, I will explain and apply simple concepts that will turn into useful tools that every DS who’s studying time series should know.

Table of Contents

  1. What is a Fourier Transform?
  2. Fourier Transform in Python
  3. Periodogram

Overview

Let’s assume I have the following dataset (AEP energy consumption, CC0 license):

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("data/AEP_hourly.csv", index_col=0) 
df.index = pd.to_datetime(df.index)
df.sort_index(inplace=True)

fig, ax = plt.subplots(figsize=(20,4))
df.plot(ax=ax)
plt.tight_layout()
plt.show()
AEP hourly energy consumption | Image by Author

It is very clear, just from a visual inspection, that seasonal patterns are playing a role; however, it might not be trivial to detect them all.

As explained before, the discovery process I used to perform was mainly manual, and it could have looked something as follows:

fig, ax = plt.subplots(3, 1, figsize=(20,9))

df_3y = df[(df.index >= '2006-01-01') & (df.index < '2010-01-01')]
df_3M = df[(df.index >= '2006-01-01') & (df.index < '2006-04-01')]
df_7d = df[(df.index >= '2006-01-01') & (df.index < '2006-01-08')]

ax[0].set_title('AEP energy consumption 3Y')
df_3y[['AEP_MW']].groupby(pd.Grouper(freq = 'D')).sum().plot(ax=ax[0])
for date in df_3y[[True if x % (24 * 365.25 / 2) == 0 else False for x in range(len(df_3y))]].index.tolist():
 ax[0].axvline(date, color = 'r', alpha = 0.5)

ax[1].set_title('AEP energy consumption 3M')
df_3M[['AEP_MW']].plot(ax=ax[1])
for date in df_3M[[True if x % (24 * 7) == 0 else False for x in range(len(df_3M))]].index.tolist():
 ax[1].axvline(date, color = 'r', alpha = 0.5)

ax[2].set_title('AEP energy consumption 7D')
df_7d[['AEP_MW']].plot(ax=ax[2])
for date in df_7d[[True if x % 24 == 0 else False for x in range(len(df_7d))]].index.tolist():
 ax[2].axvline(date, color = 'r', alpha = 0.5)

plt.tight_layout()
plt.show()
AEP hourly energy consumption, smaller timeframe | Image by Author

This is a more in-depth visualization of this time series. As we can see, the following patterns are influencing the data:

  • a 6-month cycle,
  • a weekly cycle,
  • and a daily cycle.

This dataset shows energy consumption, so these seasonal patterns are easily inferable just from domain knowledge. However, by relying only on manual inspection, we could miss important information. These are some of the main drawbacks:

  • Subjectivity: We might miss less obvious patterns.
  • Time-consuming: We need to test different timeframes one by one.
  • Scalability issues: Works well for a few datasets, but inefficient for large-scale analysis.

As a data scientist, it would be useful to have a tool that gives us immediate feedback on the most important frequencies that compose the time series. This is where the Fourier Transform comes to help.

1. What is a Fourier Transform?

The Fourier Transform is a mathematical tool that allows us to "switch domain".

Usually, we visualize our data in the time domain. However, using a Fourier Transform, we can switch to the frequency domain, which shows the frequencies that are present in the signal and their relative contribution to the original time series.

Intuition

Any well-behaved function f(x) can be written as a sum of sinusoids with different frequencies, amplitudes and phases. In simple terms, every signal (time series) is just a combination of simple waveforms.

F(f) = \int_{-\infty}^{+\infty} f(x)\, e^{-i 2\pi f x}\, dx

Where:

  • F(f) represents the function in the frequency domain.
  • f(x) is the original function in the time domain.
  • exp(−i2πfx) is a complex exponential that acts as a "frequency filter".

Thus, F(f) tells us how much frequency f is present in the original function.

Example

Let’s consider a signal composed of three sine waves with frequencies 2 Hz, 3 Hz, and 5 Hz:

A Simple Signal in time domain | Image by Author

Now, let’s apply a Fourier Transform to extract these frequencies from the signal:

A Simple Signal in the frequency domain | Image by Author

The graph above represents our signal expressed in the frequency domain instead of the classic time domain. From the resulting plot, we can see that our signal is decomposed into three components with frequencies of 2 Hz, 3 Hz, and 5 Hz, as expected from the starting signal.

As said before, any well-behaved function can be written as a sum of sinusoids. With the information we have so far it is possible to decompose our signal into three sinusoids:

A Simple Signal decomposition in its basic wavelengths | Image by Author

The original signal (in blue) can be obtained by summing the three waves (in red). This process can easily be applied to any time series to evaluate its main component frequencies.
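To make this concrete, here is a short, self-contained sketch (a toy signal, not the AEP data) that builds the same three-sine signal and recovers its frequencies with numpy's FFT:

import numpy as np
import matplotlib.pyplot as plt

# Build a 1-second signal sampled at 1 kHz from three sine waves.
t = np.linspace(0, 1, 1000, endpoint=False)
signal = np.sin(2*np.pi*2*t) + np.sin(2*np.pi*3*t) + np.sin(2*np.pi*5*t)

# Switch to the frequency domain.
X = np.fft.fft(signal)
freqs = np.fft.fftfreq(len(t), d=t[1] - t[0])
mask = freqs >= 0

fig, ax = plt.subplots(2, 1, figsize=(10, 5))
ax[0].plot(t, signal)
ax[0].set_title('Time domain')
ax[1].stem(freqs[mask][:10], np.abs(X[mask])[:10] / len(t))
ax[1].set_title('Frequency domain (peaks at 2, 3 and 5 Hz)')
plt.tight_layout()
plt.show()

The spectrum shows three clean spikes at 2, 3, and 5 Hz, matching how the signal was constructed.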

2. Fourier Transform in Python

Given that it is quite easy to switch between the time domain and the frequency domain, let’s have a look at the AEP energy consumption time series we started studying at the beginning of the article.

Python provides the numpy.fft module to compute the Fourier Transform for discrete signals. FFT stands for Fast Fourier Transform, an algorithm used to decompose a discrete signal into its frequency components:

from numpy import fft
import numpy as np

X = fft.fft(df['AEP_MW'])
N = len(X)
frequencies = fft.fftfreq(N, 1)  # sample spacing of 1 hour, so frequencies are in cycles per hour
periods = 1 / frequencies        # periods in hours (the zero frequency maps to inf)
fft_magnitude = np.abs(X) / N

mask = frequencies >= 0

# Plot the Fourier Transform
fig, ax = plt.subplots(figsize=(20, 3))
ax.step(periods[mask], fft_magnitude[mask]) # Only plot positive frequencies
ax.set_xscale('log')
ax.xaxis.set_major_formatter('{x:,.0f}')
ax.set_title('AEP energy consumption - Frequency-Domain')
ax.set_xlabel('Period (hours)')  # the x-axis plots periods, not frequencies
ax.set_ylabel('Magnitude')
plt.show()
AEP hourly energy consumption in frequency domain | Image by Author

This is the frequency domain visualization of the AEP_MW energy consumption. When we analyze the graph we can already see that at certain frequencies we have a higher magnitude, implying higher importance of such frequencies.

However, before drawing conclusions, let's add one more piece of theory: the periodogram, which will give us a better view of the most important frequencies.

3. Periodogram

The periodogram is a frequency-domain representation of the power spectral density (PSD) of a signal. While the Fourier Transform tells us which frequencies are present in a signal, the periodogram quantifies the power (or intensity) of those frequencies. This step is useful as it reduces the noise of less important frequencies.

Mathematically, the periodogram is given by:

P(f) = \frac{|X(f)|^2}{N}

Where:

  • P(f) is the power spectral density (PSD) at frequency f,
  • X(f) is the Fourier Transform of the signal,
  • N is the total number of samples.

This can be achieved in Python as follows:

power_spectrum = np.abs(X)**2 / N # Power at each frequency

fig, ax = plt.subplots(figsize=(20, 3))
ax.step(periods[mask], power_spectrum[mask])
ax.set_title('AEP energy consumption Periodogram')
ax.set_xscale('log')
ax.xaxis.set_major_formatter('{x:,.0f}')
plt.xlabel('Period (hours)')
plt.ylabel('Power')
plt.show()
AEP hourly energy consumption Periodogram | Image by Author

From this periodogram, it is now possible to draw conclusions. As we can see, the most powerful components sit at periods of:

  • 24 hours, corresponding to the daily cycle,
  • 4,380 hours, corresponding to 6 months,
  • and 168 hours, corresponding to the weekly cycle.

These are the same seasonality components we found in the manual exercise during the visual inspection. However, using this visualization, we can see three other cycles, weaker in power, but present:

  • a 12-hour cycle,
  • an 84-hour cycle, corresponding to half a week,
  • and an 8,760-hour cycle, corresponding to a full year.
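As a side note, these peaks can also be extracted programmatically instead of being read off the plot. A small sketch reusing the arrays computed above, where the zero frequency is skipped because its infinite period (the DC component) would otherwise dominate:

# Rank the most powerful cycles, skipping the zero (DC) frequency.
positive = frequencies > 0
top_idx = np.argsort(power_spectrum[positive])[::-1][:6]

for period, power in zip(periods[positive][top_idx],
                         power_spectrum[positive][top_idx]):
    print(f"period ~ {period:,.0f} hours, power = {power:,.0f}")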

It is also possible to use the periodogram function from scipy.signal to obtain the same result.

from scipy.signal import periodogram

frequencies, power_spectrum = periodogram(df['AEP_MW'], return_onesided=False)
periods = 1 / frequencies  # the zero frequency maps to an infinite period

fig, ax = plt.subplots(figsize=(20, 3))
ax.step(periods, power_spectrum)
ax.set_title('Periodogram')
ax.set_xscale('log')
ax.xaxis.set_major_formatter('{x:,.0f}')
plt.xlabel('Period (hours)')
plt.ylabel('Power')
plt.show()

Conclusions

When we are dealing with time series, one of the most important components to consider is seasonality.

In this blog post, we've seen how to easily discover seasonalities within a time series using a periodogram, a simple-to-implement tool that proves extremely useful in the exploratory process.

However, this is just a starting point: there are many more applications of the Fourier Transform we could benefit from, such as:

  • Spectrogram
  • Feature encoding
  • Time series decomposition

Please leave some claps if you enjoyed the article and feel free to comment; any suggestions and feedback are appreciated!

Here you can find a notebook with the code from this blog post.

The post How to Find Seasonality Patterns in Time Series appeared first on Towards Data Science.

]]>
How to Make a Data Science Portfolio That Stands Out https://towardsdatascience.com/how-to-make-a-data-science-portfolio-that-stands-out-94dd81be1448/ Sun, 02 Feb 2025 16:32:01 +0000 https://towardsdatascience.com/how-to-make-a-data-science-portfolio-that-stands-out-94dd81be1448/ Create a data science portfolio with Cloud-flare and HUGO

The post How to Make a Data Science Portfolio That Stands Out appeared first on Towards Data Science.

]]>
My website that we are going to create.

Many people have asked how I made my website. In this article, I want to describe that exact process and walk through all the tools and tech I used.

Note: This tutorial is mainly for Mac users as that’s what I use and can document.

Pre-Requisites

To deploy our website, we will use Git and GitHub quite a lot. The stuff you need is:

  • GitHub account -> Create one here.
  • Install git for command line -> Follow this guide.
  • Generate SSH keys for cloning repos -> Follow this guide.
  • Basic understanding of git commands -> Check out a guidebook here.
  • Understanding of command line -> Here is a good tutorial.

However, don’t worry too much. I will walk through all the git commands in this post anyway, but it’s always better to have some intuition of what’s happening.

You will also need Homebrew. Dubbed the 'missing package manager for macOS', it is very useful for anyone coding on a Mac.

You can install Homebrew using the command given on their website:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Verify Homebrew is installed by running brew help (you should get a list of brew commands and how to use them).

Finally, have some IDE, like PyCharm or VSCode. Anything you like will do.

Hugo

Hugo is a tool for creating static websites, written in Go. It is straightforward to use but requires some programming experience. I wouldn't recommend it if you don't have any coding experience whatsoever; in that scenario, no-code solutions are better.

Installation

The first step is to install GO and Hugo using Homebrew. Run the following command to do this:

brew install go hugo

Verify both are installed by running:

hugo version
go version

Creating Site

First, we will create a new website template by running the following:

hugo new site quicksite

Then we will go into that new directory:

cd quicksite

You can view what's in this folder by running ls.

Now, we will initialise this folder as a git repo in preparation for pushing it to GitHub.

git init

Adding Theme

Hugo has many themes and templates that you can use instead of building your website from scratch. A link to all the templates is here.

Congo is the theme I use for my website, and we will be working with it today. However, this process will work for any theme in the list, so if you don’t like the one I use, I am sure you can find something.

Credit to James Panther for creating this template, I recommend you check his stuff out!

There are different ways to add the theme to your repo; my personal preference is through git submodules.

See here for a complete list of ways to add themes. Many users prefer to use Hugo modules.

Git submodules allow you to put another git repo/project within your own git repo and reference it. The main project can then use the submodule's code, while the submodule maintains its own commit and branch history.

It may sound complicated, but it’s not that difficult in practice. In essence, it’s just copying and pasting a directory into a project.

To start, we need to add the Congo theme as a submodule (make sure you are in the project's directory when you do this):

git submodule add https://github.com/jpanther/congo.git themes/congo

Your file structure should look something like:

quicksite/
├─ archetypes
├─ assets
├─ config
├─ content
├─ data
├─ i18n
├─ layouts
├─ public
├─ static
└─ themes/
   └─ congo/

Now, we need to move some theme files into our main project. I used my IDE for the next few steps, but you can just do what feels comfortable to you.

In the root folder, delete the hugo.toml file that was generated by hugo new site.

Copy the *.toml config files from the congo theme into your config/_default/ folder. This will ensure you have all the correct theme settings.

The file structure in config should look like this:

config/_default/
├─ config.toml
├─ languages.en.toml
├─ markup.toml
├─ menus.toml
├─ module.toml
└─ params.toml

Go to the config/_default/config.toml file, uncomment the baseURL line, and add theme = "congo". To verify the theme is active, run hugo server and click on the localhost link. Your page should look something like this.

Congrats, the theme is now up and running!

The author of the Congo theme also provides his own guide on installing the theme if you want a second reference point.


Editing Theme

Let’s now customise the theme. I won’t go over every detail because it will take ages, and the author has a complete guide on their website that you can use for more information.

In general, the structure of the repo looks as follows:

.
├── assets
│   └── img
│       └── author.jpg
├── config
│   └── _default
├── content
│   ├── _index.md
│   ├── about.md
│   └── posts
└── themes
    └── congo

To customise mine, I viewed some other websites that use this theme, found ones I liked, and simply copied their GitHub code. This one was a big inspiration, and you can find the GitHub code here.

For this tutorial, I will show you how I made my homepage, which you can see at the start of the article.

Create a content/_index.md page and add the following to it:

---
title: "About"
showAuthor: false
showDate: false
showReadingTime: false
showTableOfContents: false
showComments: false
---

I'm a Data Scientist & Machine Learning Engineer with a Master's degree in Physics, currently working at [Deliveroo](https://deliveroo.co.uk/).

In my spare time, I create [YouTube videos](https://www.youtube.com/channel/UC9Tl0-lzeDPH4y7LcRwRSQA), 
and write [newsletters](https://newsletter.egorhowell.com/) and [blogs](https://medium.com/@egorhowell) about Data Science and machine learning.

<div style="max-width: 800px; margin: 20px auto; padding: 20px; border: 1px solid #EEE; background-color: #f9f9f9; box-shadow: 0px 0px 10px rgba(0, 0, 0, 0.1);">
  <p>For any questions or inquiries, feel free to reach out via <a href="mailto:egorhowellbusiness@gmail.com">email</a> 💌 </p>
</div>

My languages.en.toml file looks as follows:

languageCode = "en"
languageName = "English"
languageDirection = "ltr"
weight = 1

title = "Egor Howell"
# copyright = "Egor Howell"

[params]
  dateFormat = "2 January 2006"

  description = "Personal website of Egor Howell"

[params.author]
  name = "Egor Howell"
  image = "img/pic2.png"
  headline = "Data Scientist, Machine Learning Engineer & Content Creator"
  bio = "Data Scientist, Machine Learning Engineer & Content Creator."
  links = [
    { youtube = "https://youtube.com/@egorhowell" },
    { medium = "https://medium.com/@egorhowell" },
    { instagram = "https://instagram.com/egorhowell" },
    { github = "https://github.com/egorhowell" },
    { linkedin = "https://www.linkedin.com/in/egorhowell/" },
  ]

params.toml is:

# -- Theme Options --
# These options control how the theme functions and allow you to
# customise the display of your website.
#
# Refer to the theme docs for more details about each of these parameters.
# https://jpanther.github.io/congo/docs/configuration/#theme-parameters

colorScheme = "ocean"
defaultAppearance = "light" # valid options: light or dark
autoSwitchAppearance = false
defaultThemeColor = "#007fff"

enableSearch = false
enableCodeCopy = false
enableImageLazyLoading = true

# robots = ""
fingerprintAlgorithm = "sha256"

[header]
  layout = "basic" # valid options: basic, hamburger, hybrid, custom
  # logo = "img/logo.jpg"
  # logoDark = "img/dark-logo.jpg"
  showTitle = true

[footer]
  showCopyright = true
  showThemeAttribution = true
  showAppearanceSwitcher = true
  showScrollToTop = true

[homepage]
  layout = "profile" # valid options: page, profile, custom
  showRecent = false
  recentLimit = 5

[article]
  showDate = false
  showDateUpdated = false
  showAuthor = false
  showBreadcrumbs = false
  showDraftLabel = false
  showEdit = false
  # editURL = "https://github.com/egorhowell/testing_website/"
  editAppendPath = true
  showHeadingAnchors = false
  showPagination = false
  invertPagination = false
  showReadingTime = false
  showTableOfContents = false
  showTaxonomies = false
  showWordCount = false
  showComments = false
  #sharingLinks = ["facebook", "twitter", "mastodon", "pinterest", "reddit", "linkedin", "email"]

[list]
  showBreadcrumbs = false
  showSummary = false
  showTableOfContents = false
  showTaxonomies = false
  groupByYear = false
  paginationWidth = 1

[sitemap]
  excludedKinds = ["taxonomy", "term"]

[taxonomy]
  showTermCount = true

[fathomAnalytics]
  # site = "ABC12345"
  # domain = "llama.yoursite.com"

[plausibleAnalytics]
  # domain = "blog.yoursite.com"
  # event = ""
  # script = ""

[verification]
  # google = ""
  # bing = ""
  # pinterest = ""
  # yandex = ""

And that should be it; your site should look like this (if you copied the above):

There is a lot to customise, so I recommend just playing around for a few hours to get the feel of the theme and get something you like. You can add more pages, change the layout, colour, etc.

I recommend looking at what other people have done with this theme and my website for inspiration.

Full customisation guide by Congo author.


Pushing To GitHub

We need to push this code up to GitHub to access it from Cloudflare.

The first step is to create a git repo on GitHub. Follow this link to do that.

Then run the following commands.

git remote add origin https://github.com/<your-gh-username>/<repository-name>
git branch -M main 
git add .
git commit -m "first commit"
git push -f origin main

You should see your code on GitHub!

Cloudflare

Our website is on GitHub, and we need to deploy it!

I use Cloudflare to host my website, as it’s free to use and has many features. It is an "all-in-one" platform, so it gives you domains, hosting, and analytics all in the same place. You can also manage all your websites in one place, which I really like if you want to expand your business.

It is straightforward to deploy on Cloudflare.

  1. Make a Cloudflare account; see here.
  2. Go to your Cloudflare dashboard.
  3. In Account Home, select Compute (Workers) > Workers & Pages > Create > Pages > Connect to Git > Select Your Git Repo
  4. Add the below in the Build settings
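(The original screenshot of the build settings is missing here. For a Hugo site, Cloudflare's Hugo framework preset amounts to a build command of hugo and a build output directory of public; double-check these against Cloudflare's Hugo guide linked below.)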

Your website should now be live at the given domain, which will be something like https://quicksite-env.pages.dev/.

And that’s it!

If you want more information, check out Cloudflare's guide.

Hugo · Cloudflare Pages docs

Other Thoughts

Here are some other things to try if you want:

  • Get a custom domain to connect your webpage to; follow this guide if you're interested.
  • You don't have to deploy on Cloudflare; there are many solutions out there. Hugo has a great guide here.
  • Try out other Hugo themes; a list of them is here.

What I showed here is just one way, but many different ways of generating websites and portfolios exist. The main thing is to just create one and not get too bogged down in the tools or tech.

Done is better than perfect

Another Thing!

I have a free newsletter, Dishing the Data, where I share weekly tips and advice as a practising data scientist. Plus, when you subscribe, you will get my FREE data science resume and short PDF version of my AI roadmap!



The post How to Make a Data Science Portfolio That Stands Out appeared first on Towards Data Science.

]]>