Share: Harry Potter and the Methods of Rationality

“Is there some amazing rational thing you do when your mind’s running in all different directions?” she managed.
“My own approach is usually to identify the different desires, give them names, conceive of them as separate individuals, and let them argue it out inside my head. So far the main persistent ones are my Hufflepuff, Ravenclaw, Gryffindor, and Slytherin sides, my Inner Critic, and my simulated copies of you, Neville, Draco, Professor McGonagall, Professor Flitwick, Professor Quirrell, Dad, Mum, Richard Feynman, and Douglas Hofstadter.”
Hermione considered trying this before her Common Sense warned that it might be a dangerous sort of thing to pretend. “There’s a copy of me inside your head?”
“Of course there is!” Harry said. The boy suddenly looked a bit more vulnerable. “You mean there isn’t a copy of me living in your head?”
There was, she realized; and not only that, it talked in Harry’s exact voice.
“It’s rather unnerving now that I think about it,” said Hermione. “I do have a copy of you living in my head. It’s talking to me right now using your voice, arguing how this is perfectly normal.”
“Good,” Harry said seriously. “I mean, I don’t see how people could be friends without that.”
She continued reading her book, then, Harry seeming content to watch the pages over her shoulder.
She’d gotten all the way to number seventy, Katherine Scott, who’d apparently invented a way to turn small animals into lemon tarts, when she finally worked up the courage to speak.

Regularization

Based on a small post found here.

One of the standard problems in ML with meta-modelling algorithms (algorithms that run multiple statistical models over the given data and identify the best-fitting model, e.g. random forest or the rarely practical genetic algorithm) is that they might favour overly complex models that overfit the given training data but perform poorly on live/test data.

The way these meta-modelling algorithms work is that they have an objective function (usually the RMS error of the stats/sub-model on the data) that they pick the model based on, i.e. whichever model yields the lowest value of the objective function. So we can just add a complexity penalty (one obvious idea is the degree of the polynomial the model uses to fit, but how does that work when comparing against exponential functions?) and the objective function becomes RMS(Error) + Complexity_penalty(model).

 

Now, depending on the right choice of error function and complexity penalty, this can find models that perform worse than more complex models on the training data, but perform better in the live scenario.
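As a rough illustration, here's a minimal sketch in Python of that kind of penalized model selection. The toy dataset, the degree-proportional penalty, and the penalty weight are all assumptions made up purely for illustration:

```python
import numpy as np

# Toy data: a noisy line.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = 2 * x + 1 + rng.normal(scale=0.2, size=x.size)

def rms_error(degree):
    """Fit a polynomial of the given degree and return its RMS error on the training data."""
    coeffs = np.polyfit(x, y, degree)
    pred = np.polyval(coeffs, x)
    return np.sqrt(np.mean((y - pred) ** 2))

def penalized_objective(degree, penalty_weight=0.05):
    """RMS(Error) + Complexity_penalty(model), with the penalty proportional to the degree."""
    return rms_error(degree) + penalty_weight * degree

degrees = range(1, 10)
best_plain = min(degrees, key=rms_error)                 # tends to pick the highest degree (overfits)
best_penalized = min(degrees, key=penalized_objective)   # tends to pick a simpler model
print(best_plain, best_penalized)
```

The point of the sketch is only that the plain objective keeps rewarding extra degrees, while the penalized one stops once the extra fit no longer pays for the extra complexity.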

The idea of a complexity penalty itself is not new. I won't go so far as to say ML borrowed it from scientific experimentation methods or some such, but the idea that a more complex theory or model should be penalized relative to a simpler one is very old. Here's a better-written post on it.

Related Post: https://softwaremechanic.wordpress.com/2016/08/12/bayesians-vs-frequentistsaka-sampling-theorists/

 

Sleeper Theorems

This inspired me to compile a list. Since I'm not a mathematician (pure or applied), I just compiled things from the blog post, combined with the comments:
* Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B) (see the worked sketch after this list)
* Jensen's Inequality: ψ(E(X)) <= E(ψ(X)) if ψ is a convex function and X is a random variable. Extends convexity from sums to integrals (aka discrete to continuous)
* Itô's lemma (the basis of the Merton, Black and Scholes option pricing formula)
* Complex analysis.. should I disqualify this as not a theorem??
* Standard error of the mean. details link
* Jordan Curve Theorem: A closed curve has an inside and an outside. (sounds obvious in 2D and 3D; perhaps with time as 4D, keeping options open is staying outside closed curves??)
* Kullback–Leibler positivity (no clue, need to look up Wolfram Alpha or Wikipedia)
* Hahn–Banach Theorem (again, needs searching)
* Pigeonhole principle, link here
* Taylor's theorem (once again, a continuous function approximated by a sum of discrete components/expressions). Used in:
  * Approximating any function with nth-degree precision
  * Bounding the error term of an approximation
  * Decomposing functions into linear combinations of other functions
* Kolmogorov's Inequality for the maximum absolute value of the partial sums of a sequence of IID random variables (the basis of martingale theory)
* Karush–Kuhn–Tucker optimality conditions for nonlinear programming, link here
* Envelope Theorem (from economics)
* Zorn's lemma, also the Axiom of Choice
* Fourier Transform and Fast Fourier Transform
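Since Bayes' Theorem is the one on this list I actually use, here's a quick numeric sanity check of the formula above. It's a minimal sketch; the disease/test numbers are made up purely for illustration:

```python
# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Hypothetical example: A = "has disease", B = "test is positive".
p_a = 0.01              # prior P(A): 1% of people have the disease
p_b_given_a = 0.95      # sensitivity P(B|A)
p_b_given_not_a = 0.05  # false positive rate P(B|not A)

# Total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))  # ~0.161: a positive test is far from a sure thing
```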

Elbow Method

What is the elbow method?
So elbowing is this mechanism of social reinforcement/communication about something that is generally considered bad to say aloud or is too subtle to try to find words for.

Okay, just kidding; while that's kinda true, I was just pranking y'all. What I want to talk about is a stats/math/Machine Learning method used when trying to find clusters in a given dataset. The [Elbow Method](https://en.wikipedia.org/wiki/Elbow_method_(clustering)) is basically a measure/method for interpretation and validation of consistency of a cluster. Ugh.. the original sentence in Wikipedia is so long, with all its 10-letter words, that I couldn't even type it again. (The above attempt was simplified while typing on the fly.)

The basic issue is that, during a cluster analysis, we need to settle on a few things:
* A measure of distance within, across and between clusters and the points in the clusters
* A method/algorithm for updating and re-assigning the points to clusters
* Optional: a formula for guessing the number of clusters. In most cases this is optional, and parameterized.

The elbow method is a visual method for the third option. Basically, it's the ratio of the variance explained by the clustering (the between-cluster variance) to the overall variance. So it tells you how much (or what %) of the total variance is explained by choosing "n" clusters.

The name elbow method comes from plotting the number of clusters vs. that ratio (% of variance explained), finding the point where there's an acute bend (with the number of clusters on the X-axis), and picking the number of clusters at that point.
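Here's a minimal sketch of the idea, assuming scikit-learn and a made-up 2-D dataset (in practice people often plot the within-cluster sum of squares, KMeans' `inertia_`, and look for the bend):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: three blobs, so the "elbow" should appear around k = 3.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0, 5, 10)])

total_variance = ((X - X.mean(axis=0)) ** 2).sum()

for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    within = km.inertia_                       # within-cluster sum of squares
    explained = 1 - within / total_variance    # % of variance explained by k clusters
    print(k, round(explained, 3))
# Plot k vs. explained (or k vs. inertia) and pick k at the bend (the "elbow").
```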

Share: Harry Potter and the Methods of Rationality

The world around us redunds with opportunities, explodes with opportunities, which nearly all folk ignore because it would require them to violate a habit of thought; in every battle a thousand Hufflepuff bones waiting to be sharpened into spears. If you had thought to try a massed Finite Incantatem on general principles, you would have dispelled Mr. Potter’s suit of chainmail and everything else he was wearing except his underwear, which leads me to suspect that Mr. Potter did not quite realize his own vulnerability. Or you could have had your soldiers swarm Mr. Potter and Mr. Longbottom and physically wrest the wands from their hands. Mr. Malfoy’s own response was not what I would term well-reasoned, but at least he did not wholly ignore his thousand alternatives.” A sardonic smile. “But you, Miss Granger, had the misfortune to remember how to cast the Stunning Hex, and so you did not search your excellent memory for a dozen easier spells that might have proved efficacious. And you pinned all your army’s hopes on your own person, so they lost spirit when you fell. Afterward they continued to cast their futile Sleep Hexes, governed by the habits of fighting that had been trained into them, unable to break the pattern as Mr. Malfoy did. I cannot quite comprehend what goes through people’s minds when they repeat the same failed strategy over and over, but apparently it is an astonishingly rare realization that you can try something else. And so the Sunshine Regiment was wiped out by two soldiers.” The Defense Professor grinned mirthlessly. “One perceives certain similarities to how fifty Death Eaters dominated all of magical Britain, and how our much-loved Ministry continues in its rule.”

F-test

We’ve already seen what the F-score is. Now let’s see what the F-test is. Side note: I came across it when I was writing
Elbow Method, and my thought was: cool, another F-word for my readers, so

Here you go:

  • An F-test is any statistical test that uses the F-distribution

  • It is often used when comparing stats models that have been fitted to a data set.. Ahh.. that sounds no different from the F-score then.. Maybe different fields (Statistics and Machine Learning) just have different naming conventions?? Anyway, two different F-words.. so let's just say F-score/test?? Why two names for the same thing? And move on…

Examples:

  • Null hypothesis: the means of a given set of normally distributed populations, all having the same standard deviation, are equal (used in ANOVA)

  • The hypothesis that a proposed regression model fits the data well.

  • The hypothesis that a data set in a regression analysis follows the simpler of two proposed linear models that are nested within each other.

  • The (non-regression type) F-test is also a test of homoskedasticity
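For the ANOVA case, here's a minimal sketch using SciPy; the three sample groups are made up for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical samples from three groups; the null hypothesis is that all group means are equal.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=5.0, scale=1.0, size=30)
group_b = rng.normal(loc=5.2, scale=1.0, size=30)
group_c = rng.normal(loc=6.0, scale=1.0, size=30)

# One-way ANOVA: F = between-group variability / within-group variability
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f_stat, p_value)  # small p-value => reject "all means are equal"
```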

Drawbacks:

Formula

  • Formula: explained variance / unexplained variance, or between-group variability / within-group variability. OK, that doesn't sound like the F-score

  • Formula (for regression models): F = ((RSS_1 - RSS_2) / (p_2 - p_1)) / (RSS_2 / (n - p_2)), where RSS_i is the residual sum of squares of model i, p_i its number of parameters, n the number of data points, and model 2 is the bigger model that nests model 1
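And a minimal sketch of that regression version in code, using plain NumPy least squares; the toy data and the choice of nested models are assumptions for illustration:

```python
import numpy as np
from scipy import stats

# Toy data: y depends on x quadratically, plus noise.
rng = np.random.default_rng(1)
n = 50
x = np.linspace(0, 1, n)
y = 1 + 2 * x + 3 * x**2 + rng.normal(scale=0.1, size=n)

def rss(design):
    """Residual sum of squares of an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.sum((y - design @ beta) ** 2)

# Model 1 (smaller, p_1 = 2 params): intercept + x.  Model 2 (bigger, p_2 = 3 params): adds x^2.
X1 = np.column_stack([np.ones(n), x])
X2 = np.column_stack([np.ones(n), x, x**2])
rss1, rss2 = rss(X1), rss(X2)
p1, p2 = X1.shape[1], X2.shape[1]

f_stat = ((rss1 - rss2) / (p2 - p1)) / (rss2 / (n - p2))
p_value = stats.f.sf(f_stat, p2 - p1, n - p2)  # tail probability under the F-distribution
print(f_stat, p_value)  # tiny p-value => the extra x^2 term genuinely helps
```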

Visualization Grammar — VEGA

VEGA:

A visualization grammar, a language for:
* creating, saving and sharing interactive visualization designs
* describing the visual appearance and interactive behaviour of a visualization in JSON
* using reactive signals that dynamically modify a visualization in response to input event streams

Key Semantics:

The key semantics are (see the minimal spec sketch after this list):
* width, height, padding, autosize (all are for specifying the size of the visualization)
* data (an array of data definitions; each can define the type, name, stream, url, and values of the data)
* scales (configurations for how to map columns of data to pixel positions or colors, or to a type of representation, e.g. categorical ==> bands)
* axes (configuration of the axes)
* marks (graphical primitives, which are used to encode data; they have properties such as position, size, shape, color; examples are: dot, circle, rectangle (bar chart), star, etc.)
  * marks have a sub-property encode, which specifies how the graphical primitives encode the data
  * encode's sub-properties enter and exit configure the interactive parts when the mark is added or removed
  * marks' sub-properties hover and update configure the overall interactive parts
  * each of the hover and update properties can be triggered/linked to signals and changed accordingly
  * a special type of mark called group can contain other marks (for composing graphical primitives into complex ones)
* signals (act as dynamic variables, or as event listeners to use JS parlance)
  * have a sub-property for event streams
  * can set dynamically evaluated variables as values on events, as defined
  * events can be mouseover, mouseout, click, drag, etc.
  * event streams have sub-properties source, type, marktype, between, consume, filter, etc.
  * each sub-property decides which mark to change/update, based on which event type/user action/data change
  * event streams also have CSS-style selectors
* legends
  * can create legends for the visualizations
  * customize them with sub-properties type, orient, fill, opacity, shape
* transforms
  * as the name implies, they transform data streams
  * have sub-properties like filter, stack, aggregate, bin, collect, fold, impute, etc.
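As a rough sketch of how these pieces fit together, here is a minimal bar-chart-style Vega spec written as a Python dict. The tiny inline dataset and the field names ("category", "amount") are made up for illustration; consult the Vega docs for the authoritative schema:

```python
import json

# A minimal Vega-style spec: size, data, scales, axes and marks, as described above.
spec = {
    "width": 400,
    "height": 200,
    "padding": 5,
    "data": [
        {
            "name": "table",
            "values": [
                {"category": "A", "amount": 28},
                {"category": "B", "amount": 55},
                {"category": "C", "amount": 43},
            ],
        }
    ],
    "scales": [
        {"name": "xscale", "type": "band", "domain": {"data": "table", "field": "category"}, "range": "width"},
        {"name": "yscale", "domain": {"data": "table", "field": "amount"}, "range": "height", "nice": True},
    ],
    "axes": [
        {"orient": "bottom", "scale": "xscale"},
        {"orient": "left", "scale": "yscale"},
    ],
    "marks": [
        {
            "type": "rect",  # rectangle marks give a bar chart
            "from": {"data": "table"},
            "encode": {
                "enter": {
                    "x": {"scale": "xscale", "field": "category"},
                    "width": {"scale": "xscale", "band": 1},
                    "y": {"scale": "yscale", "field": "amount"},
                    "y2": {"scale": "yscale", "value": 0},
                },
                "hover": {"fill": {"value": "red"}},        # interactive part: color on hover
                "update": {"fill": {"value": "steelblue"}}, # default color when not hovered
            },
        }
    ],
}

print(json.dumps(spec, indent=2))  # paste the output into the Vega online editor to render it
```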

Share: Harry Potter and the Methods of Rationality

There was a legendary episode in social psychology called the Robbers Cave experiment. It had been set up in the bewildered aftermath of World War II, with the intent of investigating the causes and remedies of conflicts between groups. The scientists had set up a summer camp for 22 boys from 22 different schools, selecting them to all be from stable middle-class families. The first phase of the experiment had been intended to investigate what it took to start a conflict between groups. The 22 boys had been divided into two groups of 11 –
– and this had been quite sufficient.
The hostility had started from the moment the two groups had become aware of each others’ existences in the state park, insults being hurled on the first meeting. They’d named themselves the Eagles and the Rattlers (they hadn’t needed names for themselves when they thought they were the only ones in the park) and had proceeded to develop contrasting group stereotypes, the Rattlers thinking of themselves as rough-and-tough and swearing heavily, the Eagles correspondingly deciding to think of themselves as upright-and-proper.
The other part of the experiment had been testing how to resolve group conflicts. Bringing the boys together to watch fireworks hadn’t worked at all. They’d just shouted at each other and stayed apart. What had worked was warning them that there might be vandals in the park, and the two groups needing to work together to solve a failure of the park’s water system. A common task, a common enemy.
Harry had a strong suspicion Professor Quirrell had understood this principle very well indeed when he had chosen to create three armies per year.