Natural Language Processing with Python Quick Start Guide : Going from a Python Developer to an Effective Natural Language Processing Engineer
Monograph

Contents

  • Cover; Title Page; Copyright and Credits; About Packt; Contributors; Table of Contents; Preface; Chapter 1: Getting Started with Text Classification; What is NLP?; Why learn about NLP?; You have a problem in mind; Technical achievement; Do something new; Is this book for you?; NLP workflow template; Understanding the problem; Understanding and preparing the data; Quick wins -- proof of concept; Iterating and improving; Algorithms; Pre-processing; Evaluation and deployment; Evaluation; Deployment; Example -- text classification workflow; Launchpad -- programming environment setup
  • Text classification in 30 lines of code; Getting the data; Text to numbers; Machine learning; Summary; Chapter 2: Tidying your Text; Bread and butter -- most common tasks; Loading the data; Exploring the loaded data; Tokenization; Intuitive -- split by whitespace; The hack -- splitting by word extraction; Introducing Regexes; spaCy for tokenization; How does the spaCy tokenizer work?; Sentence tokenization; Stop words removal and case change; Stemming and lemmatization; spaCy for lemmatization; -PRON-; Case-insensitive; Conversion -- meeting to meet; spaCy compared with NLTK and CoreNLP
  • Correcting spelling; FuzzyWuzzy; Jellyfish; Phonetic word similarity; What is a phonetic encoding?; Runtime complexity; Cleaning a corpus with FlashText; Summary; Chapter 3: Leveraging Linguistics; Linguistics and NLP; Getting started; Introducing textacy; Redacting names with named entity recognition; Entity types; Automatic question generation; Part-of-speech tagging; Creating a ruleset; Question and answer generation using dependency parsing; Visualizing the relationship; Introducing textacy; Leveling up -- question and answer; Putting it together and the end; Summary
  • Chapter 4: Text Representations -- Words to Numbers; Vectorizing a specific dataset; Word representations; How do we use pre-trained embeddings?; KeyedVectors API; What is missing in both word2vec and GloVe?; How do we handle Out Of Vocabulary words?; Getting the dataset; Training fastText embeddings; Training word2vec embeddings; fastText versus word2vec; Document embedding; Understanding the doc2vec API; Negative sampling; Hierarchical softmax; Data exploration and model evaluation; Summary; Chapter 5: Modern Methods for Classification; Machine learning for text
  • Sentiment analysis as text classification; Simple classifiers; Optimizing simple classifiers; Ensemble methods; Getting the data; Reading data; Simple classifiers; Logistic regression; Removing stop words; Increasing ngram range; Multinomial Naive Bayes; Adding TF-IDF; Removing stop words; Changing fit prior to false; Support vector machines; Decision trees; Random forest classifier; Extra trees classifier; Optimizing our classifiers; Parameter tuning using RandomizedSearch; GridSearch; Ensembling models; Voting ensembles -- Simple majority (aka hard voting); Voting ensembles -- soft voting