Coursera - Deep Learning Quiz Record: Optimization algorithms

2020. 11. 4. 13:05


Coursera - Deep Learning

Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization

Week 2 quiz record

You need to score above 80% to pass, but you can keep retrying even if you get answers wrong. Since the quiz is in English... I'm copying the questions and correct answers here as a record so I can look back at them later.

 

Q. Which of these statements about mini-batch gradient descent do you agree with?

One iteration of mini-batch gradient descent (computing on a single mini-batch) is faster than one iteration of batch gradient descent.
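A minimal sketch of why this is true, assuming a simple linear-regression cost, NumPy, and illustrative sizes (m = 50,000 examples, mini-batch size 64): one mini-batch iteration computes gradients over 64 examples instead of all m, so the step is far cheaper.

```python
import numpy as np

np.random.seed(0)
m, n = 50_000, 100                       # m examples, n features (illustrative sizes)
X = np.random.randn(m, n)
y = X @ np.random.randn(n) + 0.1 * np.random.randn(m)
w = np.zeros(n)
alpha = 0.01

def gradient(Xb, yb, w):
    """Gradient of the mean-squared-error cost on the given (mini-)batch."""
    return Xb.T @ (Xb @ w - yb) / len(yb)

# One iteration of batch gradient descent: touches all m examples.
w_batch = w - alpha * gradient(X, y, w)

# One iteration of mini-batch gradient descent: touches only 64 examples,
# so the gradient computation is roughly m/64 times cheaper.
idx = np.random.choice(m, size=64, replace=False)
w_mini = w - alpha * gradient(X[idx], y[idx], w)
```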

 

Q. Why is the best mini-batch size usually not 1 and not m, but instead something in-between?

 

  • If the mini-batch size is 1, you lose the benefits of vectorization across examples in the mini-batch.
  • If the mini-batch size is m, you end up with batch gradient descent, which has to process the whole training set before making progress.
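A minimal sketch of the shuffle-and-split step behind this trade-off (the helper name, shapes, and batch size 64 are illustrative, following the course's column-per-example convention): size 1 gives one-example slices with no vectorization, size m gives back the whole set, and an in-between size keeps vectorization while still updating often.

```python
import numpy as np

def random_mini_batches(X, Y, batch_size=64, seed=0):
    """Shuffle (X, Y) and split them into mini-batches of `batch_size` examples.

    X has shape (n_features, m) and Y has shape (1, m), one column per example.
    """
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    perm = rng.permutation(m)
    X_shuf, Y_shuf = X[:, perm], Y[:, perm]
    return [
        (X_shuf[:, k:k + batch_size], Y_shuf[:, k:k + batch_size])
        for k in range(0, m, batch_size)
    ]

# batch_size = 1 -> stochastic gradient descent: no vectorization across examples
# batch_size = m -> batch gradient descent: one update per full pass over the data
batches = random_mini_batches(np.random.randn(5, 1000), np.random.randn(1, 1000))
```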

 

Q. Suppose your learning algorithm’s cost J, plotted as a function of the number of iterations, looks like this:

 

  • If you’re using mini-batch gradient descent, this looks acceptable. But if you’re using batch gradient descent, something is wrong.
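A minimal sketch of why the two cases differ, on synthetic linear-regression data with illustrative hyperparameters: if the plotted cost is measured on the current mini-batch, it jumps around because every iteration uses a different small subset, even though the full-training-set cost trends down; batch gradient descent with a suitable learning rate should decrease the cost on every iteration.

```python
import numpy as np

np.random.seed(1)
m, n, bs, alpha = 4096, 20, 64, 0.05
X = np.random.randn(m, n)
y = X @ np.random.randn(n) + 0.5 * np.random.randn(m)
w = np.zeros(n)

mini_batch_cost, full_cost = [], []
for t in range(200):
    idx = np.random.choice(m, bs, replace=False)
    Xb, yb = X[idx], y[idx]
    err = Xb @ w - yb
    mini_batch_cost.append((err ** 2).mean() / 2)    # noisy: a different batch each step
    full_cost.append(((X @ w - y) ** 2).mean() / 2)  # smooth downward trend
    w -= alpha * Xb.T @ err / bs                     # mini-batch gradient step
```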

 

 

Q. You use an exponentially weighted average on the London temperature dataset. You use the following to track the temperature: v_t = \beta v_{t-1} + (1-\beta)\theta_t. The red line below was computed using \beta = 0.9. What would happen to your red curve as you vary \beta? (Check the two that apply)

  • Increasing \beta will shift the red line slightly to the right.

-> True. Remember that the red line corresponds to \beta = 0.9. In lecture we had a green line (\beta = 0.98) that is slightly shifted to the right.

  • Decreasing \beta will create more oscillation within the red line.

-> True. Remember that the red line corresponds to \beta = 0.9. In lecture we had a yellow line (\beta = 0.5) that had a lot of oscillations.
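A minimal sketch of this behavior (using a synthetic daily-temperature series as a stand-in for the London data): the exponentially weighted average v_t = \beta v_{t-1} + (1-\beta)\theta_t averages over roughly 1/(1-\beta) days, so a larger \beta gives a smoother curve that lags (shifts right), while a smaller \beta hugs the raw measurements and oscillates more.

```python
import numpy as np

def ewa(theta, beta):
    """Exponentially weighted average: v_t = beta * v_{t-1} + (1 - beta) * theta_t."""
    v, out = 0.0, []
    for x in theta:
        v = beta * v + (1 - beta) * x
        out.append(v)
    return np.array(out)

# Synthetic "daily temperature" series (illustrative stand-in for the London dataset).
days = np.arange(365)
theta = 10 + 10 * np.sin(2 * np.pi * days / 365) + 3 * np.random.randn(365)

v_red    = ewa(theta, beta=0.90)  # the red curve in the question
v_green  = ewa(theta, beta=0.98)  # smoother, shifted further to the right
v_yellow = ewa(theta, beta=0.50)  # noisier, follows the raw measurements closely
```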

 

 

Q. Consider this figure:

These plots were generated with gradient descent; with gradient descent with momentum (\beta = 0.5); and with gradient descent with momentum (\beta = 0.9). Which curve corresponds to which algorithm?

 

  • (1) is gradient descent. (2) is gradient descent with momentum (small \beta). (3) is gradient descent with momentum (large \beta).
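A minimal sketch of the momentum update from lecture (the parameter/gradient dictionaries and sizes are illustrative): v is an exponentially weighted average of past gradients, so a larger \beta averages out more of the oscillation, which is why curves (2) and (3) are progressively smoother.

```python
import numpy as np

def update_with_momentum(params, grads, v, beta=0.9, alpha=0.01):
    """One gradient-descent-with-momentum step.

    v[k]      = beta * v[k] + (1 - beta) * grads[k]   (EWA of the gradients)
    params[k] = params[k] - alpha * v[k]
    beta = 0 recovers plain gradient descent (curve 1 in the figure).
    """
    for k in params:
        v[k] = beta * v[k] + (1 - beta) * grads[k]
        params[k] = params[k] - alpha * v[k]
    return params, v

# Illustrative usage with a single weight matrix.
params = {"W": np.random.randn(3, 3)}
grads  = {"W": np.random.randn(3, 3)}
v      = {"W": np.zeros((3, 3))}
params, v = update_with_momentum(params, grads, v, beta=0.9)
```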

 

Q. Which of the following statements about Adam is False?

(False) Adam should be used with batch gradient computations, not with mini-batches.

  • (True) The learning rate hyperparameter \alpha in Adam usually needs to be tuned.
  • (True) Adam combines the advantages of RMSProp and momentum.
  • (True) We usually use “default” values for the hyperparameters \beta_1, \beta_2 and \varepsilon in Adam (\beta_1 = 0.9, \beta_2 = 0.999, \varepsilon = 10^{-8}).
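A minimal sketch of one Adam step for a single parameter vector, using the default hyperparameters from the question (the variable names and the random stand-in gradient are illustrative): it combines momentum's first-moment average with RMSProp's second-moment average, plus bias correction.

```python
import numpy as np

def adam_step(w, grad, m_v, s_v, t, alpha=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameter vector w at time step t (t starts at 1)."""
    m_v = beta1 * m_v + (1 - beta1) * grad          # momentum-style first moment
    s_v = beta2 * s_v + (1 - beta2) * grad ** 2     # RMSProp-style second moment
    m_hat = m_v / (1 - beta1 ** t)                  # bias correction
    s_hat = s_v / (1 - beta2 ** t)
    w = w - alpha * m_hat / (np.sqrt(s_hat) + eps)
    return w, m_v, s_v

# Illustrative usage: alpha usually still needs tuning, while beta1, beta2, and eps
# are normally left at their defaults, and Adam is typically run on mini-batches.
w = np.zeros(5)
m_v, s_v = np.zeros(5), np.zeros(5)
for t in range(1, 11):
    grad = np.random.randn(5)                       # stand-in for a mini-batch gradient
    w, m_v, s_v = adam_step(w, grad, m_v, s_v, t)
```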

 
