AI Alignment and Safety | Towards Data Science

Figure 1. Phase plots for focused and multi-concept reasoning in a 2D Hamiltonian system. Image by author

From Newton to LLM’s

Physics

A new approach to AI reasoning optimization

Javier Marin

October 9, 2024

17 min read

AI Alignment and Totalitarianism

Artificial Intelligence

What would Hannah Arendt say about AI alignment?

Avi Chad-Friedman

October 15, 2022

4 min read

An illustration of the CID of the hypothetical model with an arrow from race to predicted grades. We see that race is not requisite as its paths to accuracy are d-separated by either predicted grades or one of its parents. Nodes that are requisite in this CID have positive VoI in the original CID. Source: author generated

Spotting Unfair or Unsafe AI using Graphical Criteria

Artificial Intelligence

How to use causal influence diagrams to recognize the hidden incentives that shape an AI…

Felix Hofstätter

June 24, 2022

16 min read

Counterfactuals for Reinforcement Learning II: Improving Reward Learning

Artificial Intelligence

In the previous part of this series, I introduced counterfactuals and showed how to encode…

Felix Hofstätter

January 15, 2022

14 min read

Photo by Possessed Photography on Unsplash

How Can We Make Artificial Intelligence Ethical?

Artificial Intelligence

For a more just world, a collective effort is necessary.

Murtaza Ali

December 6, 2021

5 min read

Common misconceptions about differential privacy

Data Science

There is a plethora of content on differential privacy (DP), ranging…

Lipika Ramaswamy

November 22, 2021

7 min read

An agent trained to do a backflip using human feedback and reward modelling. Source: [2]

How learning reward functions can go wrong

Artificial Intelligence

An AI-safety-minded perspective on the risks of Reinforcement Learning agents learning their reward functions

Felix Hofstätter

November 16, 2021

15 min read

Are We Ready for the Script Kiddies of AI?

Artificial Intelligence

Why aren’t we more worried that no one can explain what…

Marianne Bellotti

November 5, 2021

5 min read

Apple’s NeuralHash – How it works and ways to break it

Apple

A guide to the technology, its vulnerabilities and possible mitigations

Swee Kiat Lim

August 20, 2021

8 min read

Towards a Responsible and Ethical AI

Artificial Intelligence

It is not the technology at fault, but the intention

Vidhi Chugh

July 13, 2021

6 min read