
Commit b56c746

Remove python2 references from Week 07 practice (yandexdataschool#486)
1 parent 24daa47 commit b56c746

File tree

2 files changed (+22, -28 lines)


week07_seq2seq/practice_tf.ipynb

+10 -13
@@ -12,7 +12,7 @@
" # https://stackoverflow.com/a/62482183\n",
" !pip uninstall -y tensorflow\n",
" !pip install tensorflow-gpu==1.13.1 keras==2.3.1\n",
-" \n",
+"\n",
" !wget -q https://raw.githubusercontent.com/yandexdataschool/Practical_RL/master/setup_colab.sh -O- | bash\n",
"\n",
" !wget -q https://raw.githubusercontent.com/yandexdataschool/Practical_RL/master/week07_seq2seq/basic_model_tf.py\n",
@@ -56,7 +56,7 @@
" * [Image captioning](https://cocodataset.org/#captions-2015) and [image2latex](https://openai.com/requests-for-research/#im2latex) (convolutional encoder, recurrent decoder)\n",
" * Generating [images by captions](https://arxiv.org/abs/1511.02793) (recurrent encoder, convolutional decoder)\n",
" * Grapheme2phoneme - convert words to transcripts\n",
-" \n",
+"\n",
"We chose simplified __Hebrew->English__ machine translation for words and short phrases (character-level), as it is relatively quick to train even without a gpu cluster."
]
},
@@ -88,10 +88,7 @@
"\n",
"This is mostly due to the fact that many words have several correct translations.\n",
"\n",
-"We have implemented this thing for you so that you can focus on more interesting parts.\n",
-"\n",
-"\n",
-"__Attention python2 users!__ You may want to cast everything to unicode later during homework phase, just make sure you do it _everywhere_."
+"We have implemented this thing for you so that you can focus on more interesting parts."
]
},
{
@@ -312,7 +309,7 @@
"\n",
"def translate(lines):\n",
" \"\"\"\n",
-" You are given a list of input lines. \n",
+" You are given a list of input lines.\n",
" Make your neural network translate them.\n",
" :return: a list of output lines\n",
" \"\"\"\n",
@@ -595,7 +592,7 @@
"\n",
" Params:\n",
" - words_ix - a matrix of letter indices, shape=[batch_size,word_length]\n",
-" - words_mask - a matrix of zeros/ones, \n",
+" - words_mask - a matrix of zeros/ones,\n",
" 1 means \"word is still not finished\"\n",
" 0 means \"word has already finished and this is padding\"\n",
"\n",
@@ -716,7 +713,7 @@
"\n",
"In this section you'll implement algorithm called self-critical sequence training (here's an [article](https://arxiv.org/abs/1612.00563)).\n",
"\n",
-"The algorithm is a vanilla policy gradient with a special baseline. \n",
+"The algorithm is a vanilla policy gradient with a special baseline.\n",
"\n",
"$$ \\nabla J = E_{x \\sim p(s)} E_{y \\sim \\pi(y|x)} \\nabla log \\pi(y|x) \\cdot (R(x,y) - b(x)) $$\n",
"\n",
@@ -893,13 +890,13 @@
"* You will likely need to adjust pre-training time for such a network.\n",
"* Supervised pre-training may benefit from clipping gradients somehow.\n",
"* SCST may indulge a higher learning rate in some cases and changing entropy regularizer over time.\n",
-"* It's often useful to save pre-trained model parameters to not re-train it every time you want new policy gradient parameters. \n",
+"* It's often useful to save pre-trained model parameters to not re-train it every time you want new policy gradient parameters.\n",
"* When leaving training for nighttime, try setting REPORT_FREQ to a larger value (e.g. 500) not to waste time on it.\n",
"\n",
"__Formal criteria:__\n",
"To get 5 points, we want you to build an architecture that:\n",
"* _doesn't consist of single GRU_\n",
-"* _works better_ than single GRU baseline. \n",
+"* _works better_ than single GRU baseline.\n",
"* We also want you to provide either learning curve or trained model, preferably both\n",
"* ... and write a brief report or experiment log describing what you did and how it fared.\n",
"\n",
@@ -908,7 +905,7 @@
" * __Vanilla:__ layer_i of encoder last state goes to layer_i of decoder initial state\n",
" * __Every tick:__ feed encoder last state _on every iteration_ of decoder.\n",
" * __Attention:__ allow decoder to \"peek\" at one (or several) positions of encoded sequence on every tick.\n",
-" \n",
+"\n",
"The most effective (and cool) of those is, of course, attention.\n",
"You can read more about attention [in this nice blog post](https://distill.pub/2016/augmented-rnns/). The easiest way to begin is to use \"soft\" attention with \"additive\" or \"dot-product\" intermediate layers.\n",
"\n",
@@ -975,4 +972,4 @@
},
"nbformat": 4,
"nbformat_minor": 1
-}
+}

week07_seq2seq/practice_torch.ipynb

+12 -15
@@ -27,7 +27,7 @@
" * [Image captioning](https://cocodataset.org/#captions-2015) and [image2latex](https://htmlpreview.github.io/?https://github.com/openai/requests-for-research/blob/master/_requests_for_research/im2latex.html) (convolutional encoder, recurrent decoder)\n",
" * Generating [images by captions](https://arxiv.org/abs/1511.02793) (recurrent encoder, convolutional decoder)\n",
" * Grapheme2phoneme - convert words to transcripts\n",
-" \n",
+"\n",
"We chose simplified __Hebrew->English__ machine translation for words and short phrases (character-level), as it is relatively quick to train even without a gpu cluster."
]
},
@@ -74,10 +74,7 @@
"\n",
"This is mostly due to the fact that many words have several correct translations.\n",
"\n",
-"We have implemented this thing for you so that you can focus on more interesting parts.\n",
-"\n",
-"\n",
-"__Attention python2 users!__ You may want to cast everything to unicode later during homework phase, just make sure you do it _everywhere_."
+"We have implemented this thing for you so that you can focus on more interesting parts."
]
},
{
@@ -289,7 +286,7 @@
"source": [
"def translate(lines, max_len=MAX_OUTPUT_LENGTH):\n",
" \"\"\"\n",
-" You are given a list of input lines. \n",
+" You are given a list of input lines.\n",
" Make your neural network translate them.\n",
" :return: a list of output lines\n",
" \"\"\"\n",
@@ -545,7 +542,7 @@
"\n",
"* __Train loss__ - that's your model's crossentropy over minibatches. It should go down steadily. Most importantly, it shouldn't be NaN :)\n",
"* __Val score distribution__ - distribution of translation edit distance (score) within batch. It should move to the left over time.\n",
-"* __Val score / training time__ - it's your current mean edit distance. This plot is much whimsier than loss, but make sure it goes below 8 by 2500 steps. \n",
+"* __Val score / training time__ - it's your current mean edit distance. This plot is much whimsier than loss, but make sure it goes below 8 by 2500 steps.\n",
"\n",
"If it doesn't, first try to re-create both model and opt. You may have changed its weight too much while debugging. If that doesn't help, it's debugging time."
]
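The validation score tracked in these plots is a per-word edit distance. A plain-Python Levenshtein distance for sanity-checking the numbers (the notebook may use a ready-made library instead):

```python
# Reference Levenshtein (edit) distance, the metric the "Val score" plots track.
def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (free if chars match)
        prev = cur
    return prev[-1]

assert edit_distance("kitten", "sitting") == 3
```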
@@ -584,7 +581,7 @@
"\n",
"In this section you'll implement algorithm called self-critical sequence training (here's an [article](https://arxiv.org/abs/1612.00563)).\n",
"\n",
-"The algorithm is a vanilla policy gradient with a special baseline. \n",
+"The algorithm is a vanilla policy gradient with a special baseline.\n",
"\n",
"$$ \\nabla J = E_{x \\sim p(s)} E_{y \\sim \\pi(y|x)} \\nabla log \\pi(y|x) \\cdot (R(x,y) - b(x)) $$\n",
"\n",
@@ -637,7 +634,7 @@
"\n",
" # compute log_pi(a_t|s_t), shape = [batch, seq_length]\n",
" logp_sample = <YOUR CODE>\n",
-" \n",
+"\n",
" # ^-- hint: look at how crossentropy is implemented in supervised learning loss above\n",
" # mind the sign - this one should not be multiplied by -1 :)\n",
"\n",
@@ -727,11 +724,11 @@
"<img src=https://github.com/yandexdataschool/Practical_RL/raw/master/yet_another_week/_resource/do_something_scst.png width=400>\n",
"\n",
" * As usual, don't expect improvements right away, but in general the model should be able to show some positive changes by 5k steps.\n",
-" * Entropy is a good indicator of many problems. \n",
+" * Entropy is a good indicator of many problems.\n",
" * If it reaches zero, you may need greater entropy regularizer.\n",
" * If it has rapid changes time to time, you may need gradient clipping.\n",
" * If it oscillates up and down in an erratic manner... it's perfectly okay for entropy to do so. But it should decrease at the end.\n",
-" \n",
+"\n",
" * We don't show loss_history cuz it's uninformative for pseudo-losses in policy gradient. However, if something goes wrong you can check it to see if everything isn't a constant zero."
]
},
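The entropy these tips refer to is the per-step entropy of the decoder's output distribution, averaged over non-padded positions. A hedged sketch of the monitored quantity (tensor names are illustrative):

```python
import torch
import torch.nn.functional as F

# logits_seq: [batch, seq_length, vocab_size], mask: [batch, seq_length] of zeros/ones.
def mean_entropy(logits_seq, mask):
    logp = F.log_softmax(logits_seq, dim=-1)
    p = logp.exp()
    step_entropy = -(p * logp).sum(dim=-1)           # entropy of each per-token distribution
    return (step_entropy * mask).sum() / mask.sum()  # average over non-padded steps only
```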
@@ -800,13 +797,13 @@
"* You will likely need to adjust pre-training time for such a network.\n",
"* Supervised pre-training may benefit from clipping gradients somehow.\n",
"* SCST may indulge a higher learning rate in some cases and changing entropy regularizer over time.\n",
-"* It's often useful to save pre-trained model parameters to not re-train it every time you want new policy gradient parameters. \n",
+"* It's often useful to save pre-trained model parameters to not re-train it every time you want new policy gradient parameters.\n",
"* When leaving training for nighttime, try setting REPORT_FREQ to a larger value (e.g. 500) not to waste time on it.\n",
"\n",
"__Formal criteria:__\n",
"To get 5 points, we want you to build an architecture that:\n",
"* _doesn't consist of single GRU_\n",
-"* _works better_ than single GRU baseline. \n",
+"* _works better_ than single GRU baseline.\n",
"* We also want you to provide either learning curve or trained model, preferably both\n",
"* ... and write a brief report or experiment log describing what you did and how it fared.\n",
"\n",
@@ -815,7 +812,7 @@
" * __Vanilla:__ layer_i of encoder last state goes to layer_i of decoder initial state\n",
" * __Every tick:__ feed encoder last state _on every iteration_ of decoder.\n",
" * __Attention:__ allow decoder to \"peek\" at one (or several) positions of encoded sequence on every tick.\n",
-" \n",
+"\n",
"The most effective (and cool) of those is, of course, attention.\n",
"You can read more about attention [in this nice blog post](https://distill.pub/2016/augmented-rnns/). The easiest way to begin is to use \"soft\" attention with \"additive\" or \"dot-product\" intermediate layers.\n",
"\n",
@@ -875,4 +872,4 @@
},
"nbformat": 4,
"nbformat_minor": 1
-}
+}
