Undoverfit

A small note on noise

Rome, Italy · #3

There is a famous Tufte line about charts: that the ones we trust most are usually the ones designed best, not the ones that are most true. I think about it often.

Most of the time, when we squint at a time series and feel like we are seeing a pattern, we are seeing noise that happens to point in a direction. The brain hates randomness — it always finds an arrow. The discipline is to ask, before reading anything off the chart, what kind of squiggle pure chance would have produced on the same axes. Usually, it would have produced one a lot like the one in front of you.
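A minimal sketch of that question, under loud assumptions: the "pure chance" series here is a fair coin-flip random walk, and "looks like a trend" is reduced to "ends at least 10 steps away from where it started". The function names, the 250-step length, and the threshold are all illustrative choices, not anything measured.

```python
import random

def random_walk(n_steps, rng):
    """Cumulative sum of fair +1/-1 coin flips: a series with no real trend."""
    level, path = 0, []
    for _ in range(n_steps):
        level += rng.choice((-1, 1))
        path.append(level)
    return path

def share_looking_trendy(n_series=2000, n_steps=250, threshold=10, seed=0):
    """Fraction of pure-noise series that end at least `threshold` steps from zero."""
    rng = random.Random(seed)
    hits = sum(
        abs(random_walk(n_steps, rng)[-1]) >= threshold
        for _ in range(n_series)
    )
    return hits / n_series

if __name__ == "__main__":
    share = share_looking_trendy()
    print(f"{share:.0%} of pure-noise series drifted 10+ steps from the start")
```

With these settings, something on the order of half of the series end up with a visible "direction", even though no series contains one by construction.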

This is the part of statistics that does not make for great Twitter content. It says: probably nothing happened. It says: wait. It says: yes, you found a five-sigma event in a dataset of fifty million possible comparisons, please sit down.
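The arithmetic behind that last line can be sketched in a few lines. The assumptions are mine: the comparisons are independent, the null distribution is Gaussian, and "five-sigma" means a two-sided tail. The helper names are hypothetical.

```python
import math

def two_sided_tail(sigmas):
    """P(|Z| >= sigmas) for a standard normal Z."""
    return math.erfc(sigmas / math.sqrt(2))

def expected_false_alarms(sigmas, n_comparisons):
    """Expected number of pure-chance hits at this threshold."""
    return n_comparisons * two_sided_tail(sigmas)

def chance_of_at_least_one(sigmas, n_comparisons):
    """P(at least one pure-chance hit), assuming independent comparisons."""
    p = two_sided_tail(sigmas)
    # 1 - (1 - p)^n, computed in a numerically stable way for tiny p
    return -math.expm1(n_comparisons * math.log1p(-p))

if __name__ == "__main__":
    n = 50_000_000
    print(f"single-test tail probability: {two_sided_tail(5):.2e}")
    print(f"expected false alarms in {n:,} tests: {expected_false_alarms(5, n):.1f}")
    print(f"chance of at least one: {chance_of_at_least_one(5, n):.6f}")
```

Under those assumptions, fifty million comparisons are expected to throw up about 29 five-sigma "events" from noise alone, so finding one is close to guaranteed.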

I am writing this down because I want to keep myself honest. Every post that I write about data, finance, or anything in between will have to pass a small private test: would I still believe this if the noise had been a little louder?

Most of the time, the answer will be no. Those posts will not be published.

Data is not the new oil

Rome, Italy · #2

People keep saying that data is the new oil. It is a useful image for a five-second pitch and a dangerous one for everything that comes after.

Oil is a finite, fungible commodity. A barrel of a given grade from Texas trades much like one from Saudi Arabia. You buy it, you burn it, it is gone. Data is none of that. My browsing history is not interchangeable with yours. It does not get consumed when used; it gets copied, indefinitely, often in ways neither of us can audit. And unlike oil, the value of any single data point is almost nothing; the value lives in the aggregation.

The better analogy, I think, is water. It needs to be cleaned. It can be poisoned. It pools in places we did not expect. And whoever controls the reservoirs has a kind of power that does not show up on any balance sheet.

If we keep using the oil metaphor, we will keep designing oil-shaped policies for it: drilling rights, royalties, extraction taxes. None of which protect the actual thing at stake, which is the integrity of the person on the other end of the pipeline.

Noroom — the data union I am building — starts from the opposite premise. Your data is not a resource to be extracted; it is a part of you that, with consent, can be pooled with other people’s to negotiate from strength. We will talk about the mechanics in another post.

Why undoverfit

Rome, Italy · #1

Every model that learns from data must choose how seriously to take what it sees.

Overfit too hard and you mistake every small wrinkle of the past for a law of nature. Underfit and you flatten everything into a line that explains nothing and predicts even less. Between those two failures sits the small, honest place I want this blog to live in — close enough to the data to learn something real, far enough to leave room for what comes next.
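A toy illustration of the two failures, as a sketch and not a recipe. Everything here is an assumption made for the example: the "real" signal is a sine wave, the noise is Gaussian, and model flexibility is stood in for by k-nearest-neighbour averaging, where k = 1 memorises every wrinkle and k equal to the whole training set flattens everything into one horizontal line.

```python
import math
import random

def make_data(n, rng, noise=0.5):
    """Noisy samples of sin(x) on [0, 2*pi]; the sine is the 'real' signal."""
    xs = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0, noise)) for x in xs]

def knn_predict(train, x, k):
    """Average the y-values of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, points, k):
    """Mean squared error of the k-NN prediction on `points`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in points) / len(points)

def compare(ks=(1, 15, 200), n_train=200, n_test=500, seed=0):
    """Map each k to (error on the training data, error on fresh data)."""
    rng = random.Random(seed)
    train, test = make_data(n_train, rng), make_data(n_test, rng)
    return {k: (mse(train, train, k), mse(train, test, k)) for k in ks}

if __name__ == "__main__":
    for k, (train_err, test_err) in compare().items():
        print(f"k={k:>3}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

The k = 1 model scores a perfect zero on the data it has already seen and does noticeably worse on fresh data; the k = 200 model is equally mediocre everywhere. The middle setting is the one that gives up some fit to the past in exchange for holding up on what comes next.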

I called it Undoverfit because I keep meeting that word in my own head. In statistics, in finance, in the way I read news, in the way I evaluate people and ideas, in the way I think about Noroom — the project I am building to give people their data back. The same trap, in different costumes.

So this is the deal. I will write about technology, data, statistics, finance, entrepreneurship, side projects, and whatever quiet thing catches me on a given week. No daily dispatch. No optimization for the algorithm. Just notes, written as honestly as I can, in a place I own.

If you find one useful, that already makes it worth it.