Hoeffding's Inequality states, loosely, that $\nu$ cannot be too far from $\mu$:
$$\mathbb{P}\left[\,|\nu - \mu| > \epsilon\,\right] \le 2e^{-2\epsilon^2 N}.$$
A guarantee of this form is called probably approximately correct (PAC-learning).
Example: $N = 1000$; draw a sample and observe $\nu$.
- $|\nu - \mu| \le 0.05$ 99% of the time
- (This is implied from setting $2e^{-2\epsilon^2 N} = 0.01$ and solving for $\epsilon$ using $N = 1000$.)
- $|\nu - \mu| \le 0.1$ 99.9999996% of the time
What does this mean?
If I repeatedly pick a sample of size 1,000, observe $\nu$, and claim that $\mu = \nu \pm 0.05$
(or that the error bar is $\pm 0.05$), I will be right 99% of the time.
On any particular sample you may be wrong, but not often.
Suppose a bag is filled with marbles: one color $\times 9$, another color $\times 1$.
Calculate the odds of drawing the majority color 3 times in a row (obviously, with replacement).
Provide the formula for $N$ draws in a row of the majority color.
Show that Hoeffding's Inequality holds for six consecutive draws of the majority color.
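A sketch of the exercise, under the assumption that the bag holds 9 marbles of one color and 1 of the other, so the majority color is drawn with probability $\mu = 0.9$:

```python
import math

mu = 0.9  # probability of drawing the majority color (9 of the 10 marbles)

# Three majority-color draws in a row, with replacement:
print(mu ** 3)   # 0.9^3 = 0.729

# N draws in a row: mu ** N.  Six in a row:
p_six = mu ** 6
print(p_six)     # 0.9^6 ~ 0.531

# Hoeffding check at N = 6: six identical draws give nu = 1, so |nu - mu| = 0.1.
# Exact P[|nu - mu| >= 0.1] over k ~ Binomial(6, mu) majority-color draws:
binom = lambda n, k: math.comb(n, k) * mu**k * (1 - mu)**(n - k)
p_dev = binom(6, 6) + sum(binom(6, k) for k in range(5))  # k = 6 or k <= 4
bound = 2 * math.exp(-2 * 0.1**2 * 6)
print(p_dev, "<=", bound)  # the bound ~1.77 exceeds 1, so it holds trivially
```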
Critical requirement: samples must be independent.
If the sample is constructed in some arbitrary fashion, then indeed we cannot say anything.
Even with independence, $\nu$ can take on arbitrary values.
- Some values are way more likely than others. This is what allows us to learn something – it is likely that $\nu \approx \mu$.
- The bound $2e^{-2\epsilon^2 N}$ does not depend on $\mu$ or the size of the bag.
- The bag can be infinite.
- It’s great that it does not depend on $\mu$, because $\mu$ is unknown.
The key player in the bound is the sample size $N$.
- If $N$ is large, $\nu \approx \mu$ with very, very high probability, but not for sure.
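A quick Monte Carlo sketch (our own, with an arbitrary $\mu = 0.9$) of how $\nu$ concentrates around $\mu$ as $N$ grows:

```python
import random

random.seed(0)
mu, trials = 0.9, 2000

# For each N, estimate P[|nu - mu| <= 0.05] by repeated sampling.
coverage = {}
for N in (10, 100, 1000):
    hits = sum(
        abs(sum(random.random() < mu for _ in range(N)) / N - mu) <= 0.05
        for _ in range(trials)
    )
    coverage[N] = hits / trials
    print(N, coverage[N])
```

Larger samples pin $\nu$ down: the estimated probability climbs toward 1 as $N$ grows, as the exponential bound predicts.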
In learning, the unknown object is an entire function $f$; in the bag it was a single number $\mu$.
White area in second figure: $h(x) = f(x)$.
Green area in second figure: $h(x) \ne f(x)$.
Define the following notion: $\mu = \mathbb{P}[h(x) \ne f(x)]$.
That is, this is the "size" of the green region.
- "marble": a point $x$, colored green if $h(x) \ne f(x)$ and white if $h(x) = f(x)$.
We can re-frame Hoeffding's Inequality in terms of in-sample and out-of-sample error.
In-sample error: $E_{\text{in}}(h) = \nu$, the fraction of the sample where $h(x_n) \ne f(x_n)$.
Out-of-sample error: $E_{\text{out}}(h) = \mu = \mathbb{P}[h(x) \ne f(x)]$.
Hoeffding becomes $\mathbb{P}\left[\,|E_{\text{in}}(h) - E_{\text{out}}(h)| > \epsilon\,\right] \le 2e^{-2\epsilon^2 N}$.
Victory! If we just minimize in-sample error, we are likely to be right out of sample!
The entire previous argument assumed a FIXED hypothesis $h$, and only then came the data.
Given $h$, a sample can verify whether or not it is good (w.r.t. $f$):
- if $\nu$ is small, $h$ is good, with high confidence;
- if $\nu$ is large, $h$ is bad, with high confidence.
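A minimal sketch of verification in code (the setup is ours: the target $f$ thresholds at 0.5, the fixed hypothesis $h$ at 0.6, inputs uniform on $[0,1]$, so the out-of-sample error is exactly $0.1$):

```python
import random

random.seed(1)
f = lambda x: x > 0.5   # unknown target (known here only to run the simulation)
h = lambda x: x > 0.6   # a FIXED hypothesis, chosen before seeing any data

N = 1000
sample = [random.random() for _ in range(N)]

# In-sample error: fraction of the sample where h disagrees with f.
E_in = sum(h(x) != f(x) for x in sample) / N
print(E_in)   # close to 0.1: h and f disagree exactly on (0.5, 0.6]
```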
In this (artificial example) world, we have no control over $\nu$.
In learning, you actually try to fit the data!
- e.g., the perceptron model results from searching an entire hypothesis set for a hypothesis with small $\nu$.
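To see why fixing $h$ before the data matters, here is a toy sketch of ours: $M$ hypotheses that are all equally bad (true error $0.5$ each); picking the one with the smallest in-sample error mimics the search, and the winner looks far better than it is:

```python
import random

random.seed(2)
N, M = 100, 1000   # sample size, number of hypotheses searched
mu = 0.5           # every hypothesis has true error 0.5

# In-sample error of each hypothesis (each marble is an independent coin flip).
nus = [sum(random.random() < mu for _ in range(N)) / N for _ in range(M)]

# The selected minimum is far below the true error 0.5:
print(min(nus))
```

The single-hypothesis Hoeffding bound does not cover this selected minimum; that is the gap the fixed-$h$ caveat is pointing at.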