Take a look at our Data Mining books. Shulph carries a great selection of Data Mining books, and we are always adding more.
A practical guide for solving complex data processing challenges by applying the best optimization techniques in Apache Spark.
Key Features: Learn about the core concepts and the latest developments in Apache Spark; Master writing efficient big data applications with Spark's built-in modules for SQL, streaming, machine learning, and graph analysis; Get introduced to a variety of optimizations based on actual experience.
Book Description: Apache Spark is a flexible framework that allows processing of batch and real-time data. Its unified engine has made it quite popular for big data use cases. This book will help you get started with Apache Spark 2.0, one of the most popular big data processing frameworks, and write big data applications for a variety of use cases. Although this book is intended to help you get started with Apache Spark, it also focuses on explaining the core concepts. This practical guide provides a quick start to the Spark 2.0 architecture and its components. It teaches you how to set up Spark on your local machine. As we move ahead, you will be introduced to resilient distributed datasets (RDDs) and DataFrame APIs, and their corresponding transformations and actions. Then, we move on to the life cycle of a Spark application and learn about the techniques used to debug slow-running applications. You will also go through Spark's built-in modules for SQL, streaming, machine learning, and graph analysis. Finally, the book will lay out the best practices and optimization techniques that are key for writing efficient Spark applications. By the end of this book, you will have a sound fundamental understanding of the Apache Spark framework and you will be able to write and optimize Spark applications.
What you will learn: Learn core concepts such as RDDs, DataFrames, transformations, and more; Set up a Spark development environment; Choose the right APIs for your applications; Understand Spark's architecture and the execution flow of a Spark application; Explore built-in modules for SQL, streaming, ML, and graph analysis; Optimize your Spark job for better performance.
Who this book is for: If you are a big data enthusiast and love processing huge amounts of data, this book is for you. If you are a data engineer looking for the best optimization techniques for your Spark applications, then you will find this book helpful. This book also helps data scientists who want to implement their machine learning algorithms in Spark. You need to have a basic understanding of at least one programming language such as Scala, Python, or Java.
Enhance your data analysis and predictive modeling skills using popular Python tools.
Key Features: Cover the fundamental Python libraries for operating on and manipulating data in data analysis; Use real-world datasets to perform predictive analytics with Python; Access modern data analysis techniques and detailed code with scikit-learn and SciPy.
Book Description: Python is one of the most common and popular languages preferred by leading data analysts and statisticians for working with massive datasets and complex data visualizations. Become a Python Data Analyst introduces Python's most essential tools and libraries for the data analysis process, from preparing data to performing simple statistical analyses and creating meaningful data visualizations. In this book, we will cover Python libraries such as NumPy, pandas, matplotlib, seaborn, SciPy, and scikit-learn, and apply them in practical data analysis and statistics examples. As you make your way through the chapters, you will learn to efficiently use the Jupyter Notebook to operate on and manipulate data using NumPy and the pandas library. In the concluding chapters, you will gain experience in building simple predictive models and carrying out statistical computation and analysis using rich Python tools and proven data analysis techniques. By the end of this book, you will have hands-on experience performing data analysis with Python.
What you will learn: Explore important Python libraries and learn to install the Anaconda distribution; Understand the basics of NumPy; Produce informative and useful visualizations for analyzing data; Perform common statistical calculations; Build predictive models and understand the principles of predictive analytics.
Who this book is for: Become a Python Data Analyst is for entry-level data analysts, data engineers, and BI professionals who want to make complete use of Python tools for performing efficient data analysis. Prior knowledge of Python programming is necessary to understand the concepts covered in this book.
Learn quick and effective techniques for developing blockchain-based distributed ledgers with ease.
Key Features: Discover why blockchain is a game changer in the technology landscape; Set up blockchain networks using Hyperledger Fabric; Write smart contracts at speed with Hyperledger Composer.
Book Description: Blockchain and Hyperledger are open source technologies that power the development of decentralized applications. This Learning Path is your helpful reference for exploring and building blockchain networks using Ethereum, Hyperledger Fabric, and Hyperledger Composer. Blockchain Development with Hyperledger will start off by giving you an overview of blockchain and demonstrating how you can set up an Ethereum development environment for developing, packaging, building, and testing campaign-decentralized applications. You'll then explore the de facto language Solidity, which you can use to develop decentralized applications in Ethereum. Following this, you'll be able to configure Hyperledger Fabric and use it to build private blockchain networks and applications that connect to them. Toward the later chapters, you'll learn how to design and launch a network, and even implement smart contracts in chain code. By the end of this Learning Path, you'll be able to build and deploy your own decentralized applications by addressing the key pain points encountered in the blockchain life cycle. This Learning Path includes content from the following Packt products: Blockchain Quick Start Guide by Xun (Brian) Wu and Weimin Sun; Hands-On Blockchain with Hyperledger by Nitin Gaur et al.
What you will learn: Understand why decentralized applications are necessary; Develop and test a decentralized application with Hyperledger Fabric and Hyperledger Composer; Write and test a smart contract using Solidity; Design transaction models and chain code with Golang; Deploy the Composer Representational State Transfer (REST) Gateway to access Composer transactions; Maintain, monitor, and manage your blockchain solutions.
Who this book is for: This Learning Path is designed for blockchain developers who want to build decentralized applications and smart contracts from scratch using Hyperledger. Basic familiarity with or exposure to any programming language will be useful to get started with this course.
Get unique insights from your data by combining the power of SQL Server, R, and Python.
Key Features: Use the features of SQL Server 2017 to implement the data science project life cycle; Leverage the power of R and Python to design and develop efficient data models; Find unique insights from your data with powerful techniques for data preprocessing and analysis.
Book Description: SQL Server only started to fully support data science with its two most recent editions. If you are a professional from both worlds, SQL Server and data science, and interested in using SQL Server and Machine Learning (ML) Services for your projects, then this is the ideal book for you. This book is the ideal introduction to data science with Microsoft SQL Server and In-Database ML Services. It covers all stages of a data science project, from business and data understanding, through data overview, data preparation, modeling and using algorithms, model evaluation, and deployment. You will learn to use the engines and languages that come with SQL Server, including ML Services with R and Python languages and Transact-SQL. You will also learn how to choose which algorithm to use for which task, and learn how each algorithm works.
What you will learn: Use the popular programming languages T-SQL, R, and Python for data science; Understand your data with queries and introductory statistics; Create and enhance the datasets for ML; Visualize and analyze data using basic and advanced graphs; Explore ML using unsupervised and supervised models; Deploy models in SQL Server and perform predictions.
Who this book is for: SQL Server professionals who want to start with data science, and data scientists who would like to start using SQL Server in their projects will find this book to be useful. Prior exposure to SQL Server will be helpful.
Perform efficient fast text representation and classification with Facebook's fastText library.
Key Features: Introduction to Facebook's fastText library for NLP; Perform efficient word representations, sentence classification, vector representation; Build better, more scalable solutions for text representation and classification.
Book Description: Facebook's fastText library handles text representation and classification for Natural Language Processing (NLP). Most organizations have to deal with enormous amounts of text data on a daily basis, and gaining efficient data insights requires powerful NLP tools such as fastText. This book is your ideal introduction to fastText. You will learn how to create fastText models from the command line, without the need for complicated code. You will explore the algorithms that fastText is built on and how to use them for word representation and text classification. Next, you will use fastText in conjunction with other popular libraries and frameworks such as Keras, TensorFlow, and PyTorch. Finally, you will deploy fastText models to mobile devices. By the end of this book, you will have all the required knowledge to use fastText in your own applications at work or in projects.
What you will learn: Create models using the default command line options in fastText; Understand the algorithms used in fastText to create word vectors; Combine command line text transformation capabilities and the fastText library to implement a training, validation, and prediction pipeline; Explore word representation and sentence classification using fastText; Use Gensim and spaCy to load the vectors, transform, lemmatize, and perform other NLP tasks efficiently; Develop a fastText NLP classifier using popular frameworks, such as Keras, TensorFlow, and PyTorch.
Who this book is for: This book is for data analysts, data scientists, and machine learning developers who want to perform efficient word representation and sentence classification using Facebook's fastText library. Basic knowledge of Python programming is required.
Discover the power of location data to build effective, intelligent data models with geospatial ecosystems.
Key Features: Manipulate location-based data and create intelligent geospatial data models; Build effective location recommendation systems used by popular companies such as Uber; A hands-on guide to help you consume spatial data and parallelize GIS operations effectively.
Book Description: Data scientists, who have access to vast data streams, are a bit myopic when it comes to intrinsic and extrinsic location-based data and are missing out on the intelligence it can provide to their models. This book demonstrates effective techniques for using the power of data science and geospatial intelligence to build effective, intelligent data models that make use of location-based data to give useful predictions and analyses. This book begins with a quick overview of the fundamentals of location-based data and how techniques such as Exploratory Data Analysis can be applied to it. We then delve into spatial operations such as computing distances, areas, extents, centroids, buffer polygons, intersecting geometries, geocoding, and more, which add context to location data. Moving ahead, you will learn how to quickly build and deploy a geo-fencing system using Python. Lastly, you will learn how to leverage geospatial analysis techniques in popular recommendation systems such as collaborative filtering and location-based recommendations. By the end of the book, you will be a rockstar when it comes to performing geospatial analysis with ease.
What you will learn: Learn how companies now use location data; Set up your Python environment and install Python geospatial packages; Visualize spatial data as graphs; Extract geometry from spatial data; Perform spatial regression from scratch; Build web applications that dynamically reference geospatial data.
Who this book is for: Data scientists who would like to leverage location-based data and want to use location-based intelligence in their data models will find this book useful. This book is also for GIS developers who wish to incorporate data analysis in their projects. Knowledge of Python programming and some basic understanding of data analysis are all you need to get the most out of this book.
Put your Haskell skills to work and generate publication-ready visualizations in no time at all.
Key Features: Take your data analysis skills to the next level using the power of Haskell; Understand regression analysis, perform multivariate regression, and untangle different cluster varieties; Create publication-ready visualizations of data.
Book Description: Every business and organization that collects data is capable of tapping into its own data to gain insights into how to improve. Haskell is a purely functional and lazy programming language, well-suited to handling large data analysis problems. This book will take you through the more difficult problems of data analysis in a hands-on manner. This book will help you get up to speed with the basics of data analysis and approaches in the Haskell language. You'll learn about statistical computing, file formats (CSV and SQLite3), descriptive statistics, and charts, and progress to more advanced concepts such as understanding the importance of normal distribution. While mathematics is a big part of data analysis, we've tried to keep this course simple and approachable so that you can apply what you learn to the real world. By the end of this book, you will have a thorough understanding of data analysis, and the different ways of analyzing data. You will have a mastery of all the tools and techniques in Haskell for effective data analysis.
What you will learn: Learn to parse a CSV file and read data into the Haskell environment; Create Haskell functions for common descriptive statistics functions; Create an SQLite3 database using an existing CSV file; Learn the versatility of SELECT queries for slicing data into smaller chunks; Apply regular expressions in large-scale datasets using both CSV and SQLite3 files; Create a Kernel Density Estimator visualization using normal distribution.
Who this book is for: This book is intended for people who wish to expand their knowledge of statistics and data analysis via real-world examples. A basic understanding of the Haskell language is expected. If you are feeling brave, you can jump right into the functional programming style.
Solve all big data problems by learning how to create efficient data models.
Key Features: Create effective models that get the most out of big data; Apply your knowledge to Twitter and weather datasets to learn big data; Tackle different data modeling challenges with expert techniques presented in this book.
Book Description: Modeling and managing data is a central focus of all big data projects. In fact, a database is considered to be effective only if you have a logical and sophisticated data model. This book will help you develop practical skills in modeling your own big data projects and improve the performance of analytical queries for your specific business requirements. To start with, you'll get a quick introduction to big data and understand the different data modeling and data management platforms for big data. Then you'll work with structured and semi-structured data with the help of real-life examples. Once you've got to grips with the basics, you'll use the SQL Developer Data Modeler to create your own data models containing different file types such as CSV, XML, and JSON. You'll also learn to create graph data models and explore data modeling with streaming data using real-world datasets. By the end of this book, you'll be able to design and develop efficient data models for varying data sizes easily and efficiently.
What you will learn: Get insights into big data and discover various data models; Explore conceptual, logical, and big data models; Understand how to model data containing different file types; Run through data modeling with examples of Twitter, Bitcoin, IMDB, and weather data modeling; Create data models such as Graph Data and Vector Space; Model structured and unstructured data using Python and R.
Who this book is for: This book is great for programmers, geologists, biologists, and every professional who deals with spatial data. If you want to learn how to handle GIS, GPS, and remote sensing data, then this book is for you. Basic knowledge of R and QGIS would be helpful.
Step-by-step guide to building high-performing predictive applications.
Key Features: Use the Python data analytics ecosystem to implement end-to-end predictive analytics projects; Explore advanced predictive modeling algorithms with an emphasis on theory with intuitive explanations; Learn to deploy a predictive model's results as an interactive application.
Book Description: Predictive analytics is an applied field that employs a variety of quantitative methods using data to make predictions. It involves much more than just throwing data onto a computer to build a model. This book provides practical coverage to help you understand the most important concepts of predictive analytics. Using practical, step-by-step examples, we build predictive analytics solutions while using cutting-edge Python tools and packages. The book's step-by-step approach starts by defining the problem and moves on to identifying relevant data. We will also be performing data preparation, exploring and visualizing relationships, and building, tuning, evaluating, and deploying the model. Each stage has relevant practical examples and efficient Python code. You will work with models such as KNN, Random Forests, and neural networks using the most important libraries in Python's data science stack: NumPy, Pandas, Matplotlib, Seaborn, Keras, Dash, and so on. In addition to hands-on code examples, you will find intuitive explanations of the inner workings of the main techniques and algorithms used in predictive analytics. By the end of this book, you will be all set to build high-performance predictive analytics solutions using Python programming.
What you will learn: Get to grips with the main concepts and principles of predictive analytics; Learn about the stages involved in producing complete predictive analytics solutions; Understand how to define a problem, propose a solution, and prepare a dataset; Use visualizations to explore relationships and gain insights into the dataset; Learn to build regression and classification models using scikit-learn; Use Keras to build powerful neural network models that produce accurate predictions; Learn to serve a model's predictions as a web application.
Who this book is for: This book is for data analysts, data scientists, data engineers, and Python developers who want to learn about predictive modeling and would like to implement predictive analytics solutions using Python's data stack. People from other backgrounds who would like to enter this exciting field will greatly benefit from reading this book. All you need is to be proficient in Python programming and have a basic understanding of statistics and college-level algebra.
Collect and scrape data of varying complexity from the modern web using the latest tools, best practices, and techniques.
Key Features: Learn various scraping techniques using a range of Python libraries such as Scrapy and Beautiful Soup; Build scrapers and crawlers to extract relevant information from the web; Automate web scraping operations to bridge the accuracy gap and ease complex business needs.
Book Description: Web scraping is an essential technique used in many organizations to scrape valuable data from web pages. This book will enable you to delve deeply into web scraping techniques and methodologies. This book will introduce you to the fundamental concepts of web scraping techniques and how they can be applied to multiple sets of web pages. We'll use powerful libraries from the Python ecosystem (such as Scrapy, lxml, pyquery, bs4, and others) to carry out web scraping operations. We will take an in-depth look at essential tasks to carry out simple to intermediate scraping operations such as identifying information from web pages, using patterns or attributes to retrieve information, and others. This book adopts a practical approach to web scraping concepts and tools, guiding you through a series of use cases and showing you how to use the best tools and techniques to efficiently scrape web pages. This book also covers the use of other popular web scraping tools, such as Selenium, Regex, and web-based APIs. By the end of this book, you will have learned how to efficiently scrape the web using different techniques with Python and other popular tools.
What you will learn: Analyze data and information from web pages; Learn how to use browser-based developer tools from the scraping perspective; Use XPath and CSS selectors to identify and explore markup elements; Learn to handle and manage cookies; Explore advanced concepts in handling HTML forms and processing logins; Optimize web securities, data storage, and API use to scrape data; Use Regex with Python to extract data; Deal with complex web entities by using Selenium to find and extract data.
Who this book is for: This book is for Python programmers, data analysts, web scraping newbies, and anyone who wants to learn how to perform web scraping from scratch. If you want to begin your journey in applying web scraping techniques to a range of web pages, then this book is what you need! A working knowledge of the Python programming language is expected.