There has been another calculus workshop. Since I’m a scrub tier data sciencer I find this stuff very valuable, so I’m going to go through it again. This time I will be more focused on the questions and my solutions, as I have a limited amount of time available to me.
Previously we defined the loss of the x,y point classifier using mean square error. This was based on the output of our network being

$$o(x, y) = \frac{1}{1 + e^{-(ax + by + c)}}$$

We set $a$, $b$ and $c$ to be the weights of the network. If we set the x, y and target values to:
x | y | target |
---|---|---|
2 | 0 | 1 |
-1 | 1 | 0 |
Then we can write out the mse as follows:

$$\text{mse} = \frac{1}{2}\left[\left(o(2, 0) - 1\right)^2 + \left(o(-1, 1) - 0\right)^2\right]$$
I wonder if derivatives are cool with being applied straight to the fractions, denominators and all? I can check my work with Wolfram Alpha. This may mean that my expanding of the equation wasn’t required.
I’ve just tested this on Wolfram Alpha and the derivative it calculates is complicated. I think this is beyond me at the moment.
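sympy can do the same symbolic check locally. Here’s a minimal sketch of the loss and its derivatives with respect to the weights, using the output and the two table rows from above:

```python
# A minimal sketch of the mse and its derivatives, checked with sympy
# rather than Wolfram Alpha.
import sympy

a, b, c = sympy.symbols("a b c")

def output(x, y):
    # The network output defined above: a sigmoid over a weighted sum.
    return 1 / (1 + sympy.exp(-(a * x + b * y + c)))

# The (x, y, target) rows from the table.
points = [(2, 0, 1), (-1, 1, 0)]

# Mean square error over the two points.
mse = sum((output(x, y) - t) ** 2 for x, y, t in points) / len(points)

# Differentiating without expanding anything first -- sympy is happy to
# work with the fractions directly.
for weight in (a, b, c):
    print(weight, sympy.diff(mse, weight))
```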
So the next question that was introduced was calculating derivatives for multiple variables at once. The core principle is that you can treat an N variable derivative as N separate derivatives, one for each variable. When calculating the derivative with respect to one variable you treat the other variables as constants.
For example, to calculate the derivative of $f(x, y) = x^2 y$ with respect to $x$ you treat $y$ as a constant, which gives $\frac{\partial f}{\partial x} = 2xy$. Differentiating with respect to $y$ instead gives $\frac{\partial f}{\partial y} = x^2$.
There were two exercises to solve. The first one is straightforward. For the second I’m not sure how to calculate the derivative of the powers. Ug. It’s the chain rule.
You multiply the derivatives of the two functions together. Remember that the derivative of the outer function is evaluated at the value of the inner function.
Also, written out in full, the chain rule is

$$\frac{d}{dx} f(g(x)) = f'(g(x)) \, g'(x)$$
Wolfram Alpha agrees with me.
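The same checks work in sympy. The functions below are stand-ins (I haven’t reproduced the actual exercises), but they show both techniques:

```python
# Partial derivatives checked with sympy. These functions are stand-ins
# for the exercises, which I haven't reproduced above.
import sympy

x, y = sympy.symbols("x y")

# Treating the other variable as a constant:
f = x**2 * y
print(sympy.diff(f, x))  # 2*x*y   (y treated as a constant)
print(sympy.diff(f, y))  # x**2    (x treated as a constant)

# The chain rule: the derivative of the outer function, evaluated at
# the inner function, multiplied by the derivative of the inner one.
g = (x**2 + y**2) ** 3
print(sympy.diff(g, x))  # 6*x*(x**2 + y**2)**2
```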
The next challenge is to compute the Jacobian matrix for a given vector valued function. A vector valued function is one that maps a vector to a vector.
A vector is a list of numbers. Here the function maps a vector of size $n$ to a vector of size $m$.
We are looking at this function at a given point and then attempting to calculate the gradient there, the gradient being the derivative of the function at that point.
I think my explanation is confused here really. The vector valued function does not have to be linear, it can be anything. The derivative is calculated by taking a specific point and finding the linear map that best approximates the function around that point. For this to work the function must be differentiable at that point, otherwise no such linear approximation exists.
The matrix of this linear map is introduced as the Jacobian Matrix, and it may not exist. It is also referred to as the total derivative of the function at $x$.
For an $f : \mathbb{R}^n \to \mathbb{R}^m$ the Jacobian is an $m \times n$ matrix of partial derivatives, with one row per output of $f$ and one column per input.
The exercise is to calculate the Jacobian matrix for a specific vector valued function.
I’m expecting the matrix of derivatives to look like this:

$$J = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}$$
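To sanity check the shape, here’s a sympy sketch. The function in it is made up, since I haven’t copied the actual exercise out, and at the end it checks the “best linear approximation” idea from above:

```python
# Computing a Jacobian with sympy and checking that it linearizes the
# function near a point. The function is made up -- a stand-in for the
# exercise, which I haven't copied out.
import sympy

x, y = sympy.symbols("x y")

# A made-up vector valued function from R^2 to R^3.
f = sympy.Matrix([x * y, x**2 + y, sympy.sin(x)])
J = f.jacobian([x, y])  # 3x2: one row per output, one column per input
print(J)

# At a specific point the Jacobian is the total derivative: for a small
# step h, f(p + h) should be close to f(p) + J(p) * h.
p = {x: 1.0, y: 2.0}
h = sympy.Matrix([0.01, -0.02])
lhs = f.subs({x: 1.0 + 0.01, y: 2.0 - 0.02})
rhs = f.subs(p) + J.subs(p) * h
print((lhs - rhs).norm())  # small, as expected
```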
My matrix formatting could use work, however this seems legit? I’ve run out of time at this point and I’m probably going to leave this here.