As previously mentioned, your "final" is a group project.
Accordingly, you need to start planning relatively soon.
To aid your planning, here are the required elements of that project:
You must let me know your team by this Sunday. This will allow us to assign teams by next Tuesday.
If you fail to report your team, then you will be added to the "willing to be randomly assigned" pool.
The course website has a survey to help us put together teams.
To discourage freeloading, we will use a reporting system. Any team member can email me to confidentially report another team member's lack of participation. Two strikes will result in a 10% grade deduction; three strikes will result in a 20% deduction.
The following are the basic requirements for statistical learning:
emissions is a response, or target, that we wish to predict.
We generically refer to the response as $Y$.
GDP is a feature, or input, or predictor, or regressor; call it $X_1$.
Likewise, let's call westernhem our $X_2$, and so on.
We can refer to the input vector collectively as $X = (X_1, X_2, \ldots, X_p)$.
We are seeking some unknown function $f$ that maps $X$ to $Y$.
Put another way, we are seeking to explain $Y$ as follows: $Y = f(X) + \varepsilon$.
We call the function $f$ the target function.
The target function is always unknown. It is the object of learning.
With a good estimate of $f$ we can make predictions of $Y$ at new points $X = x$.
We can understand which components of $X = (X_1, X_2, \ldots, X_p)$ are important in explaining $Y$, and which are (potentially) irrelevant.
yearsindustrialized may have a big impact on $Y$;
hydroutilization typically does not.
Depending on the complexity of $f$, we may be able to meaningfully understand how each component of $X$ affects $Y$.
(But we should be careful about assigning causal interpretations.)
A "solution" to the learning problem does not consist of $f$ itself.
Rather, the solution consists of the learning algorithm and the hypotheses that the algorithm may choose from---aka the hypothesis set, denoted $\mathcal{H}$.
The algorithm and hypothesis set are inseparable.
For example, if one restricts attention to hypotheses that take a linear form, then the hypothesis set could be the functions $h$ such that $h(x) = w_0 + w_1 x_1 + \cdots + w_p x_p$.
Suppose we want to minimize the squared difference between our predictions and the truth.
That is, we wish to minimize:
$$E\big[(Y - \hat{f}(X))^2\big] = \big[f(X) - \hat{f}(X)\big]^2 + \mathrm{Var}(\varepsilon).$$
Note $\mathrm{Var}(\varepsilon)$. This is the irreducible error in the learning problem.
The term $\big[f(X) - \hat{f}(X)\big]^2$ represents the reducible error in the problem.
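The decomposition into reducible and irreducible error can be checked numerically. A minimal sketch, where the target $f$, the noise variance, and the imperfect estimate $\hat{f}$ are all hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target function and noise level (assumptions for illustration).
def f(x):
    return 2.0 * x + 1.0

n = 100_000
x = rng.uniform(0.0, 1.0, n)
eps = rng.normal(0.0, 0.5, n)   # Var(eps) = 0.25: the irreducible error
y = f(x) + eps                  # Y = f(X) + eps

# A deliberately imperfect estimate of f (also hypothetical).
def f_hat(x):
    return 1.8 * x + 1.1

mse = np.mean((y - f_hat(x)) ** 2)           # total squared prediction error
reducible = np.mean((f(x) - f_hat(x)) ** 2)  # [f(X) - f_hat(X)]^2 term
irreducible = 0.25                           # Var(eps)

# The two pieces should (approximately) sum to the total error.
print(round(mse, 3), round(reducible + irreducible, 3))
```

The cross term $2\,E[(f(X) - \hat{f}(X))\,\varepsilon]$ vanishes because the noise has mean zero and is independent of $X$, which is why the two printed numbers agree up to sampling error.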
Examining binary outcomes:
signedKyotoProtocol is our response, coded as $Y \in \{-1, +1\}$.
Given some input vector $x$, we categorize some observations
as "likely" members of the Kyoto Protocol.
Give important inputs (e.g., G8country) a large weight.
A simple form of binary learning takes the following mathematical form: predict $+1$ if $\sum_{i=1}^{p} w_i x_i > \text{threshold}$, and $-1$ otherwise.
This can be formally written as $h(x) = \mathrm{sign}\big(\sum_{i=1}^{p} w_i x_i + w_0\big)$,
where the "bias weight" $w_0$ corresponds to the threshold.
This is equivalent to a hypothesis set $\mathcal{H} = \{\, h : h(x) = \mathrm{sign}(w^\top x) \,\}$, where $x_0 = 1$ absorbs the bias weight.
This hypothesis set is called the linear separator.
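The hypothesis $h(x) = \mathrm{sign}(w^\top x)$ can be written directly in code. A minimal sketch, where the weights and the input point are hypothetical:

```python
import numpy as np

# The linear-separator hypothesis h(x) = sign(w^T x), with x_0 = 1
# prepended so that the bias weight w_0 plays the role of the threshold.
def h(x, w):
    return np.sign(w @ x)

w = np.array([-0.5, 1.0, 2.0])  # hypothetical weights [w_0, w_1, w_2]
x = np.array([1.0, 0.3, 0.4])   # input with x_0 = 1 prepended

print(h(x, w))  # sign(-0.5 + 0.3 + 0.8) = sign(0.6) = 1.0
```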
A perceptron predicts the data by using a line or a plane (more generally, a hyperplane) to separate the two classes.
How to find a hyperplane that separates the data?
We want to select a hypothesis $g \in \mathcal{H}$ such that $g \approx f$.
We certainly want $g \approx f$ on the data set $\mathcal{D}$.
How do we find such a $g$ in the infinite hypothesis set $\mathcal{H}$, if it exists?
Start with some weight vector and try to improve it.
A simple iterative method: the perceptron learning algorithm (PLA).
for each iteration $t$, where the weight vector is $w(t)$:
choose one misclassified example $(x_*, y_*)$, i.e., $\mathrm{sign}(w(t)^\top x_*) \neq y_*$;
update the weight vector such that: $w(t+1) = w(t) + y_* x_*$.
PLA implements our idea: start at some weights and try to improve.
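The update loop can be sketched as a short implementation. The data below are synthetic, and the "true" separator used to label them is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic, linearly separable data (hypothetical "true" separator).
w_true = np.array([-0.2, 1.0, -1.0])
X = np.column_stack([np.ones(50), rng.uniform(-1, 1, (50, 2))])  # x_0 = 1
y = np.sign(X @ w_true)

def pla(X, y, max_iters=100_000):
    """Perceptron learning algorithm: start at w = 0, repeatedly pick one
    misclassified example (x_*, y_*) and update w <- w + y_* x_*."""
    w = np.zeros(X.shape[1])
    for _ in range(max_iters):
        misclassified = np.nonzero(np.sign(X @ w) != y)[0]
        if misclassified.size == 0:
            break                # a separating hyperplane has been found
        i = misclassified[0]     # choose one misclassified example
        w = w + y[i] * X[i]
    return w

w = pla(X, y)
accuracy = np.mean(np.sign(X @ w) == y)
print(accuracy)
```

Because the data here are separable by construction, the loop terminates with every training point classified correctly; the number of steps needed depends on how wide the margin is.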
Theorem: If the data can be fit by a linear separator, then after some finite number of steps, the perceptron learning algorithm will find one.
...but after how many steps? And what if the data cannot be separated? Is there a faster way?
An easy visual learning problem is seemingly very messy.
For every $f$ that fits the data and is $+1$ on the new point, there is one that is $-1$.
Since $f$ is unknown, it can take on any value outside the data, no matter how large the data set.
You cannot know anything for sure about $f$ outside the data without making assumptions.
Is there any hope to know anything about $f$ outside the data set without making assumptions about $f$?
Yes, if we are willing to give up the "for sure."
Within this bag of marbles are red and green marbles; let $\mu$ be the fraction of red marbles in the bag.
We are going to pick a sample of $N$ marbles (with replacement).
Consider a sample composed of red and green marbles; let $\nu$ be the fraction of red marbles in the sample.
Question: Can we say anything about $\mu$ (outside the data) after observing $\nu$ (the data)?
Question: Then why do we do polling (e.g. to predict the outcome of the presidential election)?
Hoeffding's Inequality states, loosely, that $\nu$ cannot be too far from $\mu$: for any tolerance $\epsilon > 0$, $P(|\nu - \mu| > \epsilon) \le 2e^{-2\epsilon^2 N}$.
The statement "$\nu \approx \mu$" is called probably approximately correct (PAC) learning.
Example: $N = 1{,}000$; draw a sample and observe $\nu$.
What does this mean?
If I repeatedly pick a sample of size 1,000, observe $\nu$, and claim that $\mu \in [\nu - 0.05, \nu + 0.05]$
(or that the error bar is $\pm 0.05$), I will be right 99% of the time.
On any particular sample you may be wrong, but not often.
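This claim can be checked by simulation. A sketch, assuming a hypothetical true fraction $\mu = 0.6$ (in the thought experiment $\mu$ is of course unknown):

```python
import numpy as np

rng = np.random.default_rng(2)

# mu is a hypothetical truth; N and eps match the example above.
mu, N, eps = 0.6, 1_000, 0.05

trials = 10_000
samples = rng.random((trials, N)) < mu  # each row: one sample of N marbles
nu = samples.mean(axis=1)               # sample fraction nu for each trial

within = np.mean(np.abs(nu - mu) <= eps)     # how often the error bar holds
floor = 1 - 2 * np.exp(-2 * eps**2 * N)      # Hoeffding's guaranteed floor

print(round(within, 3), round(floor, 3))
```

Hoeffding's bound is a worst-case guarantee, so the empirical coverage typically exceeds the floor it promises.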