Annotated bibliography
Books
Introduction to machine learning
« Learning From Data », Yaser S. Abu-Mostafa, Malik Magdon-Ismail and Hsuan-Tien Lin, 2012 [AMMIL12]
« Neural Networks and Deep Learning », Michael A. Nielsen, 2015 [Nie15] (free)
« Deep Learning », Ian Goodfellow, Yoshua Bengio and Aaron Courville, 2016 [GBC16]
« Introduction to Deep Learning », Eugene Charniak, 2019 [Cha19]
« Introduction au Machine Learning », Chloé-Agathe Azencott, 2022 [Aze22] (in French, with a free PDF version)
« Deep Learning for Coders with fastai and PyTorch », Jeremy Howard and Sylvain Gugger, 2020 [HG20]
« Deep Learning with Python », François Chollet, second edition, 2021 [Cho21]
scikit-learn user guide [coub]
« Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow », Aurélien Géron, 2023 [Ger23a] [Ger23b] [Ger]
« The Little Book of Deep Learning », François Fleuret, 2024 [Fle24] (free)
NLP
« A Primer on Neural Network Models for Natural Language Processing », Yoav Goldberg [Gol15]
« Natural Language Processing », Jacob Eisenstein, 2018 [Eis18]. See also the associated course materials [Eis]
« Natural Language Processing with PyTorch », Delip Rao and Brian McMahan, 2019 [DM19]
« Transformers for Natural Language Processing », Denis Rothman, 2021 [Rot21]
« Natural Language Processing with Transformers », Lewis Tunstall, Leandro von Werra and Thomas Wolf, 2022 [TvWW22]
« Speech and Language Processing », Daniel Jurafsky and James H. Martin, 2024 [JM24]
Courses
« Speech and natural language processing » course from the MVA master's program (Mathématiques, Vision, Apprentissage) at ENS Paris-Saclay [BCW+24]
« Deep Learning » course by François Fleuret at the University of Geneva [Fle22]
Stanford's « Natural Language Processing with Deep Learning » course (CS224N) [cou24]
« Apprendre les langues aux machines » (Teaching languages to machines) course by Benoît Sagot at the Collège de France, as part of the annual chair « Informatique et sciences numériques » [Sag23]
Hugging Face NLP course [coua]
Tutorials
Neural Networks: Zero to Hero
Andrej Karpathy is an engineer working on neural networks. He publishes many educational resources on the subject, such as a one-hour introduction to LLMs, Intro to Large Language Models [Kar23c], and a very thorough series of tutorials on building LLMs, Neural Networks: Zero to Hero [Kar], listed here:
The spelled-out intro to neural networks and backpropagation: building micrograd [Kar22f]
micrograd on GitHub
Jupyter notebooks on GitHub
Exercises on Google Colab
The spelled-out intro to language modeling: building makemore [Kar22e]
Building makemore Part 2: MLP [Kar22a]
Building makemore Part 3: Activations & Gradients, BatchNorm [Kar22b]
Building makemore Part 4: Becoming a Backprop Ninja [Kar22c]
Building makemore Part 5: Building a WaveNet [Kar22d]
Let’s build GPT: from scratch, in code, spelled out [Kar23a]
State of GPT [Kar23b]
Let’s build the GPT Tokenizer [Kar24a]
Let’s reproduce GPT-2 (124M) [Kar24b]
Following these tutorials and learning to build neural network architectures for NLP « from scratch » is a very rewarding experience. As Andrej puts it:
These 94 lines of code are everything that is needed to train a neural network. Everything else is just efficiency.
This is my earlier project Micrograd. It implements a scalar-valued auto-grad engine. You start with some numbers at the leafs (usually the input data and the neural network parameters), build up a computational graph with operations like + and * that mix them, and the graph ends with a single value at the very end (the loss). You then go backwards through the graph applying chain rule at each node to calculate the gradients. The gradients tell you how to nudge your parameters to decrease the loss (and hence improve your network).
Sometimes when things get too complicated, I come back to this code and just breathe a little. But ok ok you also do have to know what the computational graph should be (e.g. MLP -> Transformer), what the loss function should be (e.g. autoregressive/diffusion), how to best use the gradients for a parameter update (e.g. SGD -> AdamW) etc etc. But it is the core of what is mostly happening.
The 1986 paper from Rumelhart, Hinton, Williams that popularized and used this algorithm (backpropagation) for training neural nets [RHW86], micrograd on Github [Kar24c] and my (now somewhat old) YouTube video where I very slowly build and explain: [Kar22f]
Andrej Karpathy on X, June 2024.
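The mechanism described in the quote can be illustrated with a minimal sketch. This is not the micrograd source: the `Value` class below is a hypothetical, reduced version that supports only `+` and `*`, and the toy problem (fitting a single weight so that `w * 2` gets close to `6`, with a learning rate of `0.1`) is an assumption chosen purely for illustration.

```python
class Value:
    """A scalar that remembers how it was computed (hypothetical minimal class)."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0                 # d(loss)/d(self), filled in by backward()
        self._parents = parents         # nodes this value was computed from
        self._backward = lambda: None   # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Order the graph so that each node comes after the nodes it depends on.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0                 # d(loss)/d(loss) = 1
        for node in reversed(order):    # go backwards, applying the chain rule
            node._backward()


# Toy problem (assumed for illustration): find w such that w * 2 is close to 6.
w = Value(1.0)
for step in range(20):
    x = Value(2.0)
    err = w * x + (-6.0)    # prediction minus target
    loss = err * err        # squared error: the single value ending the graph
    w.grad = 0.0            # gradients accumulate, so reset before each pass
    loss.backward()
    w.data -= 0.1 * w.grad  # plain gradient descent: nudge w to decrease the loss

print(round(w.data, 4))  # prints 3.0
```

The loop rebuilds the graph at every step, which is wasteful but keeps the sketch short. Swapping the update line for another optimizer, or the two operations for a richer set, corresponds to the "everything else" mentioned in the quote.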
References
Hugging Face NLP Course. URL: https://huggingface.co/learn/nlp-course/chapter0/1.
Scikit-learn user guide. URL: https://scikit-learn.org/stable/user_guide.html.
Natural Language Processing with Deep Learning (CS224N). 2024. URL: https://web.stanford.edu/class/cs224n/.
Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin. Learning From Data. AML Book, 2012. URL: https://amlbook.com.
Chloé-Agathe Azencott. Introduction au Machine Learning. Dunod, second edition, 2022. URL: https://www.dunod.com/sciences-techniques/introduction-au-machine-learning-1.
Rachel Bawden, Chloé Clavel, Guillaume Wisniewski, Benoît Sagot, and Djamé Seddah. Cours du MVA "Speech and Natural Language Processing". 2024. URL: https://github.com/rbawden/MVA_2024_SL.
Eugene Charniak. Introduction to Deep Learning. The MIT Press, 2019. URL: https://mitpress.mit.edu/9780262039512/introduction-to-deep-learning/.
François Chollet. Deep Learning with Python. Manning, second edition, 2021.
Delip Rao and Brian McMahan. Natural Language Processing with PyTorch. O'Reilly, 2019.
Jacob Eisenstein. Course materials for Georgia Tech CS 4650 and 7650, "Natural Language". URL: https://github.com/jacobeisenstein/gt-nlp-class.
Jacob Eisenstein. Natural Language Processing. Self-published, November 2018.
François Fleuret. Deep Learning Course. 2022. URL: https://fleuret.org/dlc/.
François Fleuret. The Little Book of Deep Learning. May 2024. URL: https://fleuret.org/francois/lbdl.html.
Yoav Goldberg. A Primer on Neural Network Models for Natural Language Processing. October 2015. URL: http://arxiv.org/abs/1510.00726.
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. The MIT Press, 2016. URL: https://www.deeplearningbook.org/.
Aurélien Géron. Machine Learning Notebooks (Google Colab). URL: https://colab.research.google.com/github/ageron/handson-ml3/blob/main/index.ipynb.
Aurélien Géron. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. O'Reilly, third edition, 2023.
Aurélien Géron. Machine Learning Notebooks. 2023. URL: https://github.com/ageron/handson-ml3/.
Jeremy Howard and Sylvain Gugger. Deep Learning for Coders with fastai and PyTorch. O'Reilly, 2020.
Daniel Jurafsky and James H. Martin. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition with language models. Stanford University, 3rd edition, August 2024. URL: https://web.stanford.edu/~jurafsky/slp3/.
Andrej Karpathy. Neural Networks: Zero to Hero. URL: http://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ.
Andrej Karpathy. Building makemore Part 2: MLP. September 2022. URL: https://www.youtube.com/watch?v=TCH_1BHY58I.
Andrej Karpathy. Building makemore Part 3: Activations & Gradients, BatchNorm. October 2022. URL: https://www.youtube.com/watch?v=P6sfmUTpUmc.
Andrej Karpathy. Building makemore Part 4: Becoming a Backprop Ninja. October 2022. URL: https://www.youtube.com/watch?v=q8SA3rM6ckI.
Andrej Karpathy. Building makemore Part 5: Building a WaveNet. November 2022. URL: https://www.youtube.com/watch?v=t3YJ5hKiMQ0.
Andrej Karpathy. The spelled-out intro to language modeling: building makemore. September 2022. URL: https://www.youtube.com/watch?v=PaCmpygFfXo.
Andrej Karpathy. The spelled-out intro to neural networks and backpropagation: building micrograd. August 2022. URL: https://www.youtube.com/watch?v=VMj-3S1tku0&list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ&index=1&pp=iAQB.
Andrej Karpathy. Let's build GPT: from scratch, in code, spelled out. January 2023. URL: https://www.youtube.com/watch?v=kCc8FmEb1nY.
Andrej Karpathy. State of GPT | BRK216HFS. May 2023. URL: https://www.youtube.com/watch?v=bZQun8Y4L2A.
Andrej Karpathy. [1hr Talk] Intro to Large Language Models. November 2023. URL: https://www.youtube.com/watch?v=zjkBMFhNj_g&list=PLAqhIrjkxbuW9U8-vZ_s_cjKPT_FqRStI&index=1&pp=iAQB.
Andrej Karpathy. Let's build the GPT Tokenizer. February 2024. URL: https://www.youtube.com/watch?v=zduSFxRajkE.
Andrej Karpathy. Let's reproduce GPT-2 (124M). June 2024. URL: https://www.youtube.com/watch?v=l8pRSuU81PU.
Andrej Karpathy. Micrograd. June 2024. URL: https://github.com/karpathy/micrograd.
Michael A. Nielsen. Neural Networks and Deep Learning. Determination Press, 2015. URL: http://neuralnetworksanddeeplearning.com.
Denis Rothman. Transformers for Natural Language Processing. Packt, 2021.
David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, October 1986. URL: https://doi.org/10.1038/323533a0.
Benoît Sagot. Apprendre les langues aux machines. 2023. URL: https://www.college-de-france.fr/fr/chaire/benoit-sagot-informatique-et-sciences-numeriques-annual-chair.
Lewis Tunstall, Leandro von Werra, and Thomas Wolf. Natural Language Processing with Transformers. O'Reilly, revised edition, 2022.