Stats, Money, and NYC


How Copulas Work

You can simulate a correlated bivariate gaussian distribution easily:

And the marginal distributions are gaussian:

But what if you want a correlated joint distribution where the marginals are whatever distribution you can think of?

You can take your correlated bivariate gaussian sample and feed each dimension through the gaussian CDF. The resulting distribution is uniform in each dimension, but the two uniform distributions are still correlated. To reiterate: you start with a sample that is gaussian in dimension 1 and gaussian in dimension 2, and you end up with a sample that is uniform in dimension 1 and uniform in dimension 2 (but the two uniform dimensions retain the correlation that the original correlated bivariate gaussian sample had).

Here's the uniform joint distribution for the original gaussian sample:

This is an intermediate step. The jointly correlated sample with uniform marginals is a sample from the copula.

And the marginals are uniform:

Then you feed each dimension of the joint uniform sample through the inverse CDF of whatever distribution you care about. Here I'm using the gamma and beta distributions, but honestly, whatever you want. The resulting joint distribution retains the dependence from your bivariate gaussian sample (the rank correlation is exactly the same; the linear correlation can shift a little):

And the marginals are beta and gamma:

Summary: Correlated joint gaussian -> Intermediate correlated joint uniform -> correlated whatever you want.
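Here is a minimal sketch of those three steps, assuming numpy and scipy. The correlation of 0.8, the sample size, and the gamma and beta shape parameters are all made up for illustration; they are not the values behind the plots above. It checks the dependence with Spearman rank correlation, since that is the quantity monotone transforms leave untouched.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Step 1: correlated bivariate gaussian (rho = 0.8 is an arbitrary choice).
rho = 0.8
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=20_000)

# Step 2: push each column through the standard gaussian CDF.
# Each column of u is now uniform on (0, 1); jointly this is the copula sample.
u = stats.norm.cdf(z)

# Step 3: push each uniform column through the inverse CDF (ppf) of the
# marginal you want. These shape parameters are illustrative assumptions.
x = stats.gamma.ppf(u[:, 0], a=2.0)
y = stats.beta.ppf(u[:, 1], a=2.0, b=5.0)

# Rank correlation at each stage of the pipeline.
rho_z, _ = stats.spearmanr(z[:, 0], z[:, 1])
rho_u, _ = stats.spearmanr(u[:, 0], u[:, 1])
rho_xy, _ = stats.spearmanr(x, y)
print(rho_z, rho_u, rho_xy)
```

The three printed numbers should agree, because the CDF and inverse CDF are both increasing functions and so never reorder the points.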

The only technical part here is why does the CDF trick work? There's a proof here, but the visual in figure 1 has a good intuitive explanation:


Sexually Oriented Businesses in NYC

Giuliani had no love of the 14th amendment. He made these hilarious laws regarding sexually oriented businesses that still exist today. If you ever see a sexually oriented business in NYC go in and look around.

There's this bizarre law about how only 40% of your floorspace can be devoted to adult content/activities. So if you go into a sexually oriented business, they have huge amounts of floorspace that has nothing to do with their business. Most strip clubs just have huge areas that you can't access.

I went into an adult video/toy store today and saw a hilarious take on the law. They had >60% of their floorspace devoted to family friendly VHS. Teenage Mutant Ninja Turtles on VHS, Shakespeare in Love on VHS, etc. I'm talking two giant rooms full of VHS that they had no intention of selling. Also, why turn the heat on? Customers have no interest in this section so it was about 30 degrees. Also, why turn the lights on? The rooms were more or less pitch black. I soaked it all in for as long as I could handle the owner staring at me.

Then they had their viewing booths (more on these in a minute). Then they had a tiny room of adult DVDs and toys. I am not exaggerating when I say they had 20 times the number of family friendly VHS compared to adult DVDs.

But back to the viewing booth. I didn't go in today, but I went in one once. It was remarkable. I went into an adult shop that I thought was empty so I asked the guy behind the counter what the deal with the viewing booth was.

"What do you mean?"

"What goes on in the booth?"

"You pay 5 dollars and then you get 8 minutes of a TV that has 25 channels."

"Ok, I'll try."

And then I got out my credit card to pay him (this is where I showed him that I wasn't pretending. I had no idea what I was doing). He told me you pay inside the booth. There is a machine attached to the TV and it only takes cash. I dug around in my pockets and found four dollars.

"I only have four dollars, dang"

"You know what? (pulls a dollar out of the register). Here (hands me a dollar)"


I start walking over to the booths. There are about 10 booths, five on each side of a hallway. Before I made it to the booths, he says "Here, try this one" and points to the booth behind him. This really freaked me out. Why did he want me to go in a specific booth? I saw someone was using one in the back (occupied light was on), pretty far from the one he suggested. This had me even more on edge because I thought the place was empty. I walked towards the back past the door he suggested, but he insisted. "Really, try this one." Alright. Just trying to stay on the bull for my first rodeo.

I assumed the booths would be private. I put my five dollars in the machine and flipped through some channels. Everything looked like it was filmed in the late 80s or early 90s. Who would want this? On top of that, the room wasn't even private. There was a hole in the wall facing the next booth over. Then it all came together. That's a glory hole. People don't use these booths to watch adult films. The guy behind the counter knew I was just curious and he steered me away from accidentally stumbling into a booth next to another dude.

Peak 2014

Transportation in NYC

Above is how much we spend each month on transportation.

Every day I get on the subway, sardined in with too many people. The guy next to me coughs in his hand and then rubs it all over the pole. The guy behind me coughs. The guy lying on the bench taking up 300% too much room has fresh urine all over his pants. Three stops later the first white people get on. The coughing intensifies and so does the number of $4.00 fancy drinks in cardboard cups.

Each conductor has their own method of dealing with passengers that try to cram into the train even though they don't fit and end up delaying the train because the conductor can't shut the door. My two favorite conductors do the following:

  1. If the delinquent passenger ends up getting on the train, the conductor mashes the "Please don't hold the door open" PSA button the whole way until the next stop and makes the passenger listen to it over and over while everyone glares at him.

  2. The conductor gets on the mic "WHAT UP WHAT UP NYC. IF YOU HOLD THE DOOR OPEN IMMA SIT HERE UNTIL YOU EXIT THE CAR." Then we sit there for another minute until the delinquent passenger realizes the conductor ain't messin around.

Bookstores in NYC

I had three hours to kill before meeting up with someone after work. In order, how I spent my time:

  1. Subway ride (20 min)
  2. Bought and ate Candy (5 min)
  3. Strip Club (20 min)
  4. Starbucks (90 min)
  5. Bookstore (45 min)

When I got to the bookstore it was 7:15PM in LIC. The bookstore was 2 floors, giant space. Rent must have been $8K/month. They had 3 employees ($22K/month). They closed a total of $0 in sales while I was there, and it wasn't looking like it was going to pick up any time in the next decade. But at least the place looked really hip.

I needed to know how this place could be such a disaster so I talked to the guy behind the counter.

"Is this place new?"

"{long winded way of saying we've been here for a year}"

"Oh. Is it not going well?"

"People seem really happy that we are here"

I know that is word-for-word what his response was because it's been stuck in my head. Oh, great. People are happy you are here.

NBA Markov Chains

I'm working on a way to simulate NBA games where I simplify all the conditional probabilities by making the Markov chain assumption that the probability of each event is conditional only on the previous event. This seems like a really nice compromise between disregarding conditional probabilities altogether and overfitting probability estimates that are conditional on the previous 100 events.

Also, I think very few basketball fans think the event probabilities depend on much more than the previous event.

Well, also conditioned on the players on the court. And the coach?

I'm going to use a Bayesian approach to estimate the conditional probabilities, but I don't think the prior is too important because there are so many events if you only condition on the previous event.
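To make the idea concrete, here is a rough sketch of one way it could look: count transitions between consecutive events, smooth each row with a symmetric Dirichlet prior, and simulate from the resulting matrix. The event names, the toy sequence, and the prior strength are all hypothetical; this is not the actual model or database, and it ignores players and coaches entirely.

```python
import numpy as np

# Hypothetical event vocabulary; real play-by-play data has many more event types.
EVENTS = ["made_shot", "missed_shot", "rebound", "turnover"]
INDEX = {name: i for i, name in enumerate(EVENTS)}


def transition_posterior_mean(sequence, prior_count=1.0):
    """Posterior mean of P(next event | previous event) under a symmetric Dirichlet prior."""
    n = len(EVENTS)
    counts = np.zeros((n, n))
    for prev, nxt in zip(sequence[:-1], sequence[1:]):
        counts[INDEX[prev], INDEX[nxt]] += 1
    # Dirichlet prior + multinomial counts gives a Dirichlet posterior whose
    # mean is (count + prior_count) / (row total + n * prior_count).
    smoothed = counts + prior_count
    return smoothed / smoothed.sum(axis=1, keepdims=True)


def simulate(P, start, n_events, rng):
    """Simulate events where each one depends only on the previous one."""
    state = INDEX[start]
    path = [start]
    for _ in range(n_events - 1):
        state = rng.choice(len(EVENTS), p=P[state])
        path.append(EVENTS[state])
    return path


# Made-up sequence, only to exercise the functions.
toy = ["missed_shot", "rebound", "made_shot", "missed_shot", "rebound",
       "turnover", "made_shot", "missed_shot", "rebound", "made_shot"]
P = transition_posterior_mean(toy)
game = simulate(P, "made_shot", 8, np.random.default_rng(0))
print(np.round(P, 2))
print(game)
```

With a lot of data per row, `prior_count` barely moves the estimates, which is the sense in which the prior stops mattering.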

So far, all I've made is a giant database of every possession since 2017 and every detail about the possession. I will email it to you if you ask.

Percent chance of rain

Google's weather service is ambiguously frustrating.

I'm going to be outside tomorrow from 1 pm until 6 pm. What is the chance it rains while I am outside? Google says 20% chance it rains at 1 pm, 30% chance it rains at 3 pm and 20% chance at 5 pm. What does that mean for my question?

Do you kind of just average all of those and say there is a 23% chance it rains while I'm outside? Do you say each of those is the probability it rains in the given window, and each window is independent? Then the probability it rains is very, very different. The probability it rains in at least one of those windows under this regime is 55%. The assumption that the window probabilities are independent seems definitely wrong, so what are the window-conditional probabilities? Every time I see these window percentages, this goes through my head.
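The arithmetic for the two readings, plus the range the answer has to fall in if you know nothing about how the windows depend on each other, can be sketched like this. Treating the three quoted numbers as per-window rain probabilities is itself an assumption about what the forecast means.

```python
# Window probabilities quoted in the forecast (1 pm, 3 pm, 5 pm).
window_probs = [0.20, 0.30, 0.20]

# Reading 1: just average the windows.
average = sum(window_probs) / len(window_probs)

# Reading 2: independent windows, so P(any rain) = 1 - P(dry in every window).
p_all_dry = 1.0
for p in window_probs:
    p_all_dry *= 1.0 - p
independent = 1.0 - p_all_dry

# With unknown dependence, P(any rain) is at least the largest single window
# (rain always lands in the same stretch) and at most the sum of the windows
# (rain never overlaps across windows), capped at 1.
lower = max(window_probs)
upper = min(1.0, sum(window_probs))

print(f"average:      {average:.1%}")      # 23.3%
print(f"independent:  {independent:.1%}")  # 55.2%
print(f"bounds:       {lower:.1%} to {upper:.1%}")  # 30.0% to 70.0%
```

So the honest answer from these three numbers alone is somewhere between 30% and 70%, and the independence figure sits inside that range rather than at either end.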

Brothels in NYC

I used to live in a residential neighborhood in Queens. I needed to go to a welding shop and Google said there was one just down the street from my place in a direction I'd never walked. I went there and the neighborhood was entirely different. It was an industrial area with semi trucks lining the street.

I was there at five in the morning and there was a building with an open sign that I was 20% sure was a brothel. I decided to check it out. When a 45 year old woman in a sparkly and revealing gymnastic outfit answered the door, I was 100% sure it was a brothel.

"how much does it cost?"

"80 dollars for an hour. Plus tip"

"Ok I'll go get cash and be right back"

I never came back but now I can detect brothels everywhere in NYC. I think everyone walks right past them because they aren't looking for them. I usually go in and ask how much. It's always 80 dollars for an hour.

Video Poker in NYC

Gambling is illegal in NYC basically. You can read my previous post on how much I love the 14th amendment if you want to know how I feel about it.

For some reason, all the bars in Queens have a video poker machine in the back. They always say something like "Does not dispense money". But who's playing video poker for fun? Does the bar owner give you tab credit if you win? Cash? How does it work?

One time I thought I would give it a go. I was in a towny bar, with a bunch of townies at the bar. I got up to the machine and started fiddling around in my wallet for money. I never got to figure out how it worked because someone came running from the bar and said "whoa whoa whoa I'm in the middle of playing on this machine."

I still don't understand what happened there.

Ranking the Bill of Rights


Do I only not care about 3 because it's such a non-issue? If the DNC or GOP started attacking 3 all of a sudden, would I really care about it?

I wish 14 was in the Bill of Rights. Of all the amendments, it's the one that makes me feel the best to think about.

My friend born outside this country likes 5 the best. His amateur opinion is that it is what makes our government so special. I say: what good is 5 if you don't have 14 to protect you from the state government? What good is any of this if you don't have any protection from your state government?

Parametric T-SNE

The original (non-parametric) TSNE paper has almost 6000 citations and everyone uses it. One year after the paper came out, the same author wrote a second paper describing parametric TSNE where he trains a neural network to minimize the TSNE loss. This paper only has <200 citations and no one knows about it even though parametric TSNE is much more useful.

You train the NN once and then you can embed an arbitrary number of data points. This is what everyone actually wants to do.

How come everyone uses non-parametric TSNE? I think it's because everyone uses scikit learn which only has non-parametric TSNE.

>>> import numpy as np
>>> from sklearn.manifold import TSNE
>>> X = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
>>> # recent scikit-learn versions require perplexity < number of samples (4 here)
>>> X_embedded = TSNE(n_components=2, perplexity=3).fit_transform(X)
>>> X_embedded.shape
(4, 2)

Nothing is easier than fit_transform. The only (good) parametric TSNE implementation I could find is in matlab. Yikes.
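A quick way to see what non-parametric means in practice, assuming scikit-learn is installed: estimators that learn a reusable mapping expose a transform method for new points, and TSNE does not. PCA is used here only as a familiar point of comparison, not as a substitute for TSNE.

```python
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# PCA learns a reusable mapping, so it exposes transform() for new points.
pca_has_transform = hasattr(PCA(n_components=2), "transform")

# scikit-learn's TSNE offers fit() and fit_transform() only; there is no
# transform() for points that were not part of the original fit.
tsne_has_transform = hasattr(TSNE(n_components=2), "transform")

print(pca_has_transform, tsne_has_transform)
```

Embedding a new batch with TSNE therefore means refitting on everything, which is exactly the step a trained network would let you skip.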

My main question with parametric TSNE is if data moves a tiny amount in the original space, is it guaranteed to only move a tiny amount in the embedded space? That seems important for a lot of applications (like tracking changes over time).

Drug Addict

My eyes are completely sunken in. Every time I wash my hands after using the bathroom at work, I look in the mirror and think to myself "I should go out there and do the best job I can. If I lose this job, it's going to be really hard finding a new job looking like a drug addict".

Then I go into the same thoughts about how looking like a drug addict runs in my family. Strangely, being a drug addict also runs in my family, but those two things aren't perfectly correlated I guess.

Money Spent September 2018

Two teal blocks on the left are taxes. Middle block is apartment. I don't know how my neighbors make it happen.

My apartment is a rent stabilized scam. Landlord convinced the government to take stolen money from the community to provide cheap housing to the community. Then the landlord doesn't trust the community to live in their property so our building is full of affluent people who do not resemble the community. Then the landlord convinced the government to add so many stipulations to the rent stabilized agreement that the rent is essentially market value two years after the building was constructed. At the end of the day, the government stole money from the poor people in the community so that rich people could move in and the landlord could make more money. Don't even get me started on my clogged drain.

Wasting Money at Costco

That's how I've spent $4,000 at Costco over the last year.

I've been tracking all my finances for the last 18 months, and the only thing I've gained from it is realizing how much I spend on seaweed.

Stars and Stripes Soda

$1 for 3L at the Dollar Tree. Comes in cola, diet cola, orange, fruit punch, ginger ale, and more.

They had this deal in Troutdale in 2006, and they have it in NYC now.

Chaining methods in pandas

Chaining methods in pandas makes your code easy to read.

import pandas as pd

df = (pd.read_csv('test.csv')
      .rename(columns={'column_a': 'ColumnA'})
      .assign(colA=lambda x: x['ColumnA'] * 2))

You don't need to know the pandas API to understand what's going on here. It's also way cleaner than the alternative.

df = pd.read_csv('test.csv')
df = df.set_index('myIndex')

And so on...

The best way to chain methods is to take advantage of pipe. pipe takes a function that accepts a dataframe and returns a dataframe, so you can do anything with method chaining. You don't need to wait for pandas to add chaining methods.

def head20(df):
    return df.head(20)

# Then you can throw head20 into your method chaining, e.g. .pipe(head20)
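For example, a small self-contained sketch of head20 inside a chain. The in-memory dataframe is made up so the example runs without test.csv.

```python
import pandas as pd


def head20(df):
    return df.head(20)


# Made-up data standing in for test.csv.
raw = pd.DataFrame({"column_a": range(100)})

df = (
    raw
    .rename(columns={"column_a": "ColumnA"})
    .assign(colA=lambda x: x["ColumnA"] * 2)
    .pipe(head20)  # any function that takes and returns a dataframe fits here
)
print(df.shape)
```

pipe also forwards extra positional and keyword arguments to the function, so a more general helper could take the row count as a parameter instead of hard-coding 20.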


The shittiest dataframe method.

There are (at least) two common ways to filter a dataframe by rows that have a certain value in a column.

a = df.query('name == "myname"')
b = df[df['name'] == 'myname']

pandas.DataFrame.query is really bizarre (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.query.html). It takes a string as a parameter and that string is evaluated with pandas.eval(). The string can use @ to refer to variables.

It's still better than the second method. You can nicely chain the query method, which can't really be said for the second method.

 .query('name == "myname"')
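A minimal sketch of both styles on made-up data, including the @ syntax for pulling in a local variable. The column names and values are invented for the example.

```python
import pandas as pd

# Made-up data for illustration.
people = pd.DataFrame({"name": ["myname", "other", "myname"], "score": [1, 2, 3]})

target = "myname"  # a local variable, referenced inside the query string with @

# query slots straight into a chain; no intermediate variable name is needed.
chained = (
    people
    .query("name == @target")
    .assign(double=lambda x: x["score"] * 2)
)

# The boolean-mask version has to name the dataframe again to build the mask.
masked = people[people["name"] == target].assign(double=lambda x: x["score"] * 2)

print(chained.equals(masked))
```

The mask version only works mid-chain if the intermediate dataframe already has a name, which is the awkward part. Passing a callable to .loc is another way around that, at the cost of a lambda.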