2938

@2938

Stats and Money

1,333 words


Percent chance of rain

Google's weather service is frustratingly ambiguous.

I'm going to be outside tomorrow from 1 pm until 6 pm. What is the chance it rains while I am outside? Google says 20% chance it rains at 1 pm, 30% chance it rains at 3 pm and 20% chance at 5 pm. What does that mean for my question?

Do you kind of just average all of those and say there is a 23% chance it rains while I'm outside? Or do you say each of those is the probability it rains in the given window, and each window is independent? Then the answer is very different: the probability it rains in at least one of those windows is 1 - (0.8 × 0.7 × 0.8) ≈ 55%. But the assumption that the window probabilities are independent seems definitely wrong, so what are the window-conditional probabilities? Every time I see these window percentages, this goes through my head.
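Here is the arithmetic as a quick sketch. The function names are mine, and the last one is an addition to the two readings above: even if you know nothing about how the windows depend on each other, the chance of rain in at least one window can't be lower than the largest single window or higher than the sum of all of them.

```python
from math import prod

# Forecast from the post: chance of rain at 1 pm, 3 pm and 5 pm.
window_probs = [0.20, 0.30, 0.20]


def average(probs):
    """Reading 1: just average the window percentages."""
    return sum(probs) / len(probs)


def at_least_one_independent(probs):
    """Reading 2: each window is independent of the others."""
    return 1 - prod(1 - p for p in probs)


def at_least_one_bounds(probs):
    """No assumption about dependence: lowest and highest possible values."""
    return max(probs), min(1.0, sum(probs))


low, high = at_least_one_bounds(window_probs)
print(f"average:     {average(window_probs):.1%}")
print(f"independent: {at_least_one_independent(window_probs):.1%}")
print(f"any case:    between {low:.1%} and {high:.1%}")
```

So depending on how the windows are related, the honest answer is anywhere from 30% to 70%, and the independent case sits in between.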

Brothels in NYC

I used to live in a residential neighborhood in Queens. I needed to go to a welding shop, and Google said there was one just down the street from my place, in a direction I'd never walked. I went there and the neighborhood was entirely different. It was an industrial area with semi trucks lining the street.

I was there at five in the morning and there was a building with an open sign that I was 20% sure was a brothel. I decided to check it out. When a 45-year-old woman in a sparkly and revealing gymnastics outfit answered the door, I was 100% sure it was a brothel.

"How much does it cost?"

"80 dollars for an hour. Plus tip"

"Ok I'll go get cash and be right back"

I never came back but now I can detect brothels everywhere in NYC. I think everyone walks right past them because they aren't looking for them. I usually go in and ask how much. It's always 80 dollars for an hour.

Video Poker in NYC

Gambling is basically illegal in NYC. You can read my previous post on how much I love the 14th Amendment if you want to know how I feel about it.

For some reason, all the bars in Queens have a video poker machine in the back. They always say something like "Does not dispense money". But who's playing video poker for fun? Does the bar owner give you tab credit if you win? Cash? How does it work?

One time I thought I would give it a go. I was in a townie bar, with a bunch of townies at the bar. I got up to the machine and started fiddling around in my wallet for money. I never got to figure out how it worked because someone came running from the bar and said "whoa whoa whoa I'm in the middle of playing on this machine."

I still don't understand what happened there.

Ranking the Bill of Rights

14
1
2
10
9
4
5
6
7

Do I only not care about 3 because it's such a non-issue? If the DNC or GOP started attacking 3 all of a sudden, would I really care about it?

I wish 14 was in the Bill of Rights. Of all the amendments, it's the one that makes me feel the best to think about.

My friend, who was born outside this country, likes 5 the best. His amateur opinion is that it is what makes our government so special. I say: what good is 5 if you don't have 14 to protect you from the state government? What good is any of this if you don't have any protection from your state government?

Parametric T-SNE

The original (non-parametric) TSNE paper has almost 6000 citations and everyone uses it. One year after the paper came out, the same author wrote a second paper describing parametric TSNE, where he trains a neural network to minimize the TSNE loss. This paper has fewer than 200 citations and no one knows about it, even though parametric TSNE is much more useful.

You train the NN once and then you can embed an arbitrary number of data points. This is what everyone actually wants to do.

How come everyone uses non-parametric TSNE? I think it's because everyone uses scikit-learn, which only has non-parametric TSNE.

>>> import numpy as np
>>> from sklearn.manifold import TSNE
>>> X = np.array([[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
>>> # perplexity must be smaller than the number of samples in recent scikit-learn
>>> X_embedded = TSNE(n_components=2, perplexity=3).fit_transform(X)
>>> X_embedded.shape
(4, 2)

Nothing is easier than fit_transform. The only (good) parametric TSNE implementation I could find is in Matlab. Yikes.

My main question with parametric TSNE is if data moves a tiny amount in the original space, is it guaranteed to only move a tiny amount in the embedded space? That seems important for a lot of applications (like tracking changes over time).
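One way to poke at that question, as a rough sketch and not an actual parametric TSNE: assume the embedding network is a plain feed-forward net with ReLU activations. ReLU never stretches distances, so the output can move at most (product of the layers' largest singular values) times as far as the input moved. The layer sizes and random weights below are hypothetical stand-ins for a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embedding network 10 -> 32 -> 32 -> 2.
# Random weights stand in for a trained parametric TSNE model.
sizes = [10, 32, 32, 2]
weights = [rng.normal(size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=b) for b in sizes[1:]]


def embed(x):
    """Forward pass: ReLU on hidden layers, linear output layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)
    return h @ weights[-1] + biases[-1]


# Upper bound on how much the net can stretch any distance:
# the product of each layer's largest singular value (spectral norm).
lipschitz_bound = np.prod([np.linalg.norm(W, 2) for W in weights])

x = rng.normal(size=10)
delta = 1e-3 * rng.normal(size=10)  # a tiny move in the original space

moved_in = np.linalg.norm(delta)
moved_out = np.linalg.norm(embed(x + delta) - embed(x))

print("moved in original space:", moved_in)
print("moved in embedded space:", moved_out)
print("within the bound:", moved_out <= lipschitz_bound * moved_in)
```

Under those assumptions a tiny move can't turn into an arbitrarily large jump, but the bound can be a big number, and it says nothing about what happens if you retrain the network between snapshots.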

Drug Addict

My eyes are completely sunken in. Every time I wash my hands after using the bathroom at work, I look in the mirror and think to myself "I should go out there and do the best job I can. If I lose this job, it's going to be really hard finding a new job looking like a drug addict".

Then I go into the same thoughts about how looking like a drug addict runs in my family. Strangely, being a drug addict also runs in my family, but those two things aren't perfectly correlated I guess.

Money Spent September 2018

Two teal blocks on the left are taxes. Middle block is apartment. I don't know how my neighbors make it happen.

My apartment is a rent stabilized scam. Landlord convinced the government to take stolen money from the community to provide cheap housing to the community. Then the landlord doesn't trust the community to live in their property so our building is full of affluent people who do not resemble the community. Then the landlord convinced the government to add so many stipulations to the rent stabilized agreement that the rent is essentially market value two years after the building was constructed. At the end of the day, the government stole money from the poor people in the community so that rich people could move in and the landlord could make more money. Don't even get me started on my clogged drain.

Wasting Money at Costco

That's how I've spent $4,000 at Costco over the last year.

I've been tracking all my finances for the last 18 months, and the only thing I've gained from it is realizing how much I spent on seaweed.

Stars and Stripes Soda

$1 for 3L at the Dollar Tree. Comes in cola, diet cola, orange, fruit punch, ginger ale, and more.

They had this deal in Troutdale in 2006, and they have it in NYC now.

Chaining methods in pandas

Chaining methods in pandas makes your code easy to read.

import pandas as pd

df = (pd.read_csv('test.csv')
      .set_index('myIndex')
      .rename(columns={'column_a': 'ColumnA'})
      .assign(colA=lambda x: x['ColumnA'] * 2)
      .sort_values('colA')
      .tail())

You don't need to know the pandas API to understand what's going on here. It's also way cleaner than the alternative.

df = pd.read_csv('test.csv')
df = df.set_index('myIndex')

And so on...

The best way to chain methods is to take advantage of pipe. pipe takes a function that accepts a dataframe and, to keep the chain going, returns a dataframe. So you can do anything inside a method chain. You don't need to wait for pandas to add a method for it.

def head20(df):
    return df.head(20)

# Then you can throw head20 into your method chaining
df.pipe(head20)
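pipe also forwards any extra arguments to your function, so one helper can cover more than the hard-coded 20. A small sketch, where head_n, add_doubled and the toy dataframe are made up for illustration:

```python
import pandas as pd


def head_n(df, n):
    """Like head20 above, but the row count is a parameter."""
    return df.head(n)


def add_doubled(df, source, target):
    """Add a column `target` holding twice the values of `source`."""
    return df.assign(**{target: df[source] * 2})


# Toy data standing in for test.csv
df = pd.DataFrame({"ColumnA": [3, 1, 2]})

result = (df
          .pipe(add_doubled, source="ColumnA", target="colA")
          .sort_values("colA")
          .pipe(head_n, 2))

print(result)
```

Anything after the function in the pipe call, positional or keyword, is handed to the function after the dataframe.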


pandas.DataFrame.query

The shittiest dataframe method.

There are (at least) two common ways to filter a dataframe by rows that have a certain value in a column.

a = df.query('name == "myname"')
b = df[df['name'] == 'myname']

pandas.DataFrame.query is really bizarre (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.query.html). It takes a string as a parameter and that string is evaluated with pandas.eval(). The string can use @ to refer to variables in the surrounding Python scope.

It's still better than the second method. You can nicely chain the query method, which can't really be said for the second method.

(df
 .query('name == "myname"')
 .groupby('city')
 .sum()
 .reset_index()
 .head())
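Here is a small sketch of the @ syntax from above, checking that both filtering styles give the same rows. The dataframe and the spent column are made up for the example.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "name": ["myname", "other", "myname"],
    "city": ["Queens", "Queens", "Troutdale"],
    "spent": [10, 20, 30],
})

target = "myname"

# @target pulls in the Python variable defined above
via_query = df.query("name == @target")
via_mask = df[df["name"] == target]
print(via_query.equals(via_mask))

# query drops straight into a chain
totals = (df
          .query("name == @target")
          .groupby("city")["spent"]
          .sum()
          .reset_index())

print(totals)
```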