About Benjamin Bengfort
Benjamin Bengfort is a Data Scientist who lives inside the beltway but ignores politics (the normal business of DC), favoring technology instead. He is currently working to finish his PhD at the University of Maryland, where he studies machine learning and artificial intelligence. His lab does have robots (though this field of study is not one he favors) and, much to his chagrin, they seem to constantly arm said robots with knives and tools, presumably to pursue culinary accolades. Having seen a robot attempt to slice a tomato, Benjamin prefers his own adventures in the kitchen, where he specializes in fusion French and Guyanese cuisine as well as BBQ of all types. A professional programmer by trade and a Data Scientist by avocation, Benjamin's writing pursues a diverse range of subjects, from natural language processing to data science with Python to analytics with Hadoop.
Books By Benjamin Bengfort
From news and speeches to informal chatter on social media, natural language is one of the richest and most underutilized sources of data. Not only does it come in a constant stream, always changing and adapting in context, but it also contains information that is not conveyed by traditional data sources. The key to unlocking natural language is the creative application of text analytics. This practical book presents a data scientist’s approach to building language-aware products with applied machine learning.
You’ll learn robust, repeatable, and scalable techniques for text analysis with Python, including contextual and linguistic feature engineering, vectorization, classification, topic modeling, entity resolution, graph analysis, and visual steering. By the end of the book, you’ll be equipped with practical methods to solve any number of complex real-world problems.
- Preprocess and vectorize text into high-dimensional feature representations
- Perform document classification and topic modeling
- Steer the model selection process with visual diagnostics
- Extract key phrases, named entities, and graph structures to reason about data in text
- Build a dialog framework to enable chatbots and language-driven interaction
- Use Spark to scale processing power and neural networks to scale model complexity
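The vectorization step in the first bullet above can be sketched in plain Python. This is a hypothetical bag-of-words illustration under simple assumptions (whitespace tokenization, raw term counts), not code from the book, which builds on libraries such as NLTK and scikit-learn:

```python
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; real pipelines use a proper tokenizer.
    return text.lower().split()

def vectorize(corpus):
    # Build a shared vocabulary, then map each document to a term-count
    # vector whose dimensionality is the vocabulary size.
    vocab = sorted({tok for doc in corpus for tok in tokenize(doc)})
    index = {term: i for i, term in enumerate(vocab)}
    vectors = []
    for doc in corpus:
        vec = [0] * len(vocab)
        for term, count in Counter(tokenize(doc)).items():
            vec[index[term]] = count
        vectors.append(vec)
    return vocab, vectors

corpus = ["the quick brown fox", "the lazy dog"]
vocab, vectors = vectorize(corpus)
```

Each document becomes one row in a document-term matrix; on a real corpus the vocabulary (and thus the vector dimension) runs into the tens of thousands, which is why the book treats these as high-dimensional feature representations.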
Ready to use statistical and machine-learning techniques across large data sets? This practical guide shows you why the Hadoop ecosystem is perfect for the job. Instead of the deployment, operations, or software development usually associated with distributed computing, you’ll focus on the particular analyses you can build, the data warehousing techniques that Hadoop provides, and the higher-order data workflows this framework can produce.
Data scientists and analysts will learn how to perform a wide range of techniques, from writing MapReduce and Spark applications with Python to using advanced modeling and data management with Spark MLlib, Hive, and HBase. You’ll also learn about the analytical processes and data systems available to build and empower data products that can handle—and actually require—huge amounts of data.
- Understand core concepts behind Hadoop and cluster computing
- Use design patterns and parallel analytical algorithms to create distributed data analysis jobs
- Learn about data management, mining, and warehousing in a distributed context using Apache Hive and HBase
- Use Sqoop and Apache Flume to ingest data from relational databases
- Program complex Hadoop and Spark applications with Apache Pig and Spark DataFrames
- Perform machine learning techniques such as classification, clustering, and collaborative filtering with Spark’s MLlib
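The map/reduce design pattern behind the bullets above can be sketched in pure Python. This is an illustrative toy, not the book's Hadoop code: the `mapper`, `reducer`, and `map_reduce` names are hypothetical stand-ins, and the sort-then-group step stands in for the shuffle phase that Hadoop performs between mappers and reducers:

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map phase: emit a (word, 1) pair for every token in a line of input.
    for word in line.lower().split():
        yield (word, 1)

def reducer(word, counts):
    # Reduce phase: sum the partial counts for a single key.
    return (word, sum(counts))

def map_reduce(lines):
    # Sorting mapper output by key and grouping it mimics the
    # shuffle/sort step Hadoop runs between the two phases.
    mapped = sorted(kv for line in lines for kv in mapper(line))
    return [reducer(key, (v for _, v in group))
            for key, group in groupby(mapped, key=itemgetter(0))]

counts = dict(map_reduce(["the quick brown fox", "the lazy dog the"]))
```

The same mapper and reducer, written against Hadoop Streaming or Spark rather than an in-memory list, scale out across a cluster; that separation of analysis logic from execution engine is the core idea of the design patterns the book covers.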