Take a look at our Data Mining books. Shulph carries a great selection of Data Mining books, and we are always adding more.
A practical guide for solving complex data processing challenges by applying the best optimizations techniques in Apache Spark. Key Features Learn about the core concepts and the latest developments in Apache Spark Master writing efficient big data applications with Spark's built-in modules for SQL, Streaming, Machine Learning and Graph analysis Get introduced to a variety of optimizations based on the actual experience Book Description Apache Spark is a flexible framework that allows processing of batch and real-time data. Its unified engine has made it quite popular for big data use cases. This book will help you to get started with Apache Spark 2.0 and write big data applications for a variety of use cases. It will also introduce you to Apache Spark – one of the most popular Big Data processing frameworks. Although this book is intended to help you get started with Apache Spark, but it also focuses on explaining the core concepts. This practical guide provides a quick start to the Spark 2.0 architecture and its components. It teaches you how to set up Spark on your local machine. As we move ahead, you will be introduced to resilient distributed datasets (RDDs) and DataFrame APIs, and their corresponding transformations and actions. Then, we move on to the life cycle of a Spark application and learn about the techniques used to debug slow-running applications. You will also go through Spark's built-in modules for SQL, streaming, machine learning, and graph analysis. Finally, the book will lay out the best practices and optimization techniques that are key for writing efficient Spark applications. By the end of this book, you will have a sound fundamental understanding of the Apache Spark framework and you will be able to write and optimize Spark applications. What you will learn Learn core concepts such as RDDs, DataFrames, transformations, and more Set up a Spark development environment Choose the right APIs for your applications Understand Spark's architecture and the execution flow of a Spark application Explore built-in modules for SQL, streaming, ML, and graph analysis Optimize your Spark job for better performance Who this book is for If you are a big data enthusiast and love processing huge amount of data, this book is for you. If you are data engineer and looking for the best optimization techniques for your Spark applications, then you will find this book helpful. This book also helps data scientists who want to implement their machine learning algorithms in Spark. You need to have a basic understanding of any one of the programming languages such as Scala, Python or Java.
Enhance your data analysis and predictive modeling skills using popular Python tools Key Features Cover all fundamental libraries for operation and manipulation of Python for data analysis Implement real-world datasets to perform predictive analytics with Python Access modern data analysis techniques and detailed code with scikit-learn and SciPy Book Description Python is one of the most common and popular languages preferred by leading data analysts and statisticians for working with massive datasets and complex data visualizations. Become a Python Data Analyst introduces Python's most essential tools and libraries necessary to work with the data analysis process, right from preparing data to performing simple statistical analyses and creating meaningful data visualizations. In this book, we will cover Python libraries such as NumPy, pandas, matplotlib, seaborn, SciPy, and scikit-learn, and apply them in practical data analysis and statistics examples. As you make your way through the chapters, you will learn to efficiently use the Jupyter Notebook to operate and manipulate data using NumPy and the pandas library. In the concluding chapters, you will gain experience in building simple predictive models and carrying out statistical computation and analysis using rich Python tools and proven data analysis techniques. By the end of this book, you will have hands-on experience performing data analysis with Python. What you will learn Explore important Python libraries and learn to install Anaconda distribution Understand the basics of NumPy Produce informative and useful visualizations for analyzing data Perform common statistical calculations Build predictive models and understand the principles of predictive analytics Who this book is for Become a Python Data Analyst is for entry-level data analysts, data engineers, and BI professionals who want to make complete use of Python tools for performing efficient data analysis. Prior knowledge of Python programming is necessary to understand the concepts covered in this book
Learn quick and effective techniques for developing blockchain-based distributed ledgers with ease Key Features Discover why blockchain is a game changer in the technology landscape Set up blockchain networks using Hyperledger Fabric Write smart contracts at speed with Hyperledger Composer Book Description Blockchain and Hyperledger are open source technologies that power the development of decentralized applications. This Learning Path is your helpful reference for exploring and building blockchain networks using Ethereum, Hyperledger Fabric, and Hyperledger Composer. Blockchain Development with Hyperledger will start off by giving you an overview of blockchain and demonstrating how you can set up an Ethereum development environment for developing, packaging, building, and testing campaign-decentralized applications. You'll then explore the de facto language Solidity, which you can use to develop decentralized applications in Ethereum. Following this, you'll be able to configure Hyperledger Fabric and use it to build private blockchain networks and applications that connect to them. Toward the later chapters, you'll learn how to design and launch a network, and even implement smart contracts in chain code. By the end of this Learning Path, you'll be able to build and deploy your own decentralized applications by addressing the key pain points encountered in the blockchain life cycle. This Learning Path includes content from the following Packt products: Blockchain Quick Start Guide by Xun (Brian) Wu and Weimin Sun Hands-On Blockchain with Hyperledger by Nitin Gaur et al. What you will learn Understand why decentralized applications are necessary Develop and test a decentralized application with Hyperledger Fabric and Hyperledger Composer Write and test a smart contract using Solidity Design transaction models and chain code with Golang Deploy the Composer REpresentational State Transfer (REST) Gateway to access Composer transactions Maintain, monitor, and manage your blockchain solutions Who this book is for This Learning Path is designed for blockchain developers who want to build decentralized applications and smart contracts from scratch using Hyperledger. Basic familiarity with or exposure to any programming language will be useful to get started with this course.
Build and deploy your own data pipelines on GCP, make key architectural decisions, and gain the confidence to boost your career as a data engineerKey FeaturesUnderstand data engineering concepts, the role of a data engineer, and the benefits of using GCP for building your solutionLearn how to use the various GCP products to ingest, consume, and transform data and orchestrate pipelinesDiscover tips to prepare for and pass the Professional Data Engineer examBook DescriptionWith this book, you'll understand how the highly scalable Google Cloud Platform (GCP) enables data engineers to create end-to-end data pipelines right from storing and processing data and workflow orchestration to presenting data through visualization dashboards. Starting with a quick overview of the fundamental concepts of data engineering, you'll learn the various responsibilities of a data engineer and how GCP plays a vital role in fulfilling those responsibilities. As you progress through the chapters, you'll be able to leverage GCP products to build a sample data warehouse using Cloud Storage and BigQuery and a data lake using Dataproc. The book gradually takes you through operations such as data ingestion, data cleansing, transformation, and integrating data with other sources. You'll learn how to design IAM for data governance, deploy ML pipelines with the Vertex AI, leverage pre-built GCP models as a service, and visualize data with Google Data Studio to build compelling reports. Finally, you'll find tips on how to boost your career as a data engineer, take the Professional Data Engineer certification exam, and get ready to become an expert in data engineering with GCP. By the end of this data engineering book, you'll have developed the skills to perform core data engineering tasks and build efficient ETL data pipelines with GCP.What you will learnLoad data into BigQuery and materialize its output for downstream consumptionBuild data pipeline orchestration using Cloud ComposerDevelop Airflow jobs to orchestrate and automate a data warehouseBuild a Hadoop data lake, create ephemeral clusters, and run jobs on the Dataproc clusterLeverage Pub/Sub for messaging and ingestion for event-driven systemsUse Dataflow to perform ETL on streaming dataUnlock the power of your data with Data StudioCalculate the GCP cost estimation for your end-to-end data solutionsWho this book is forThis book is for data engineers, data analysts, and anyone looking to design and manage data processing pipelines using GCP. You'll find this book useful if you are preparing to take Google's Professional Data Engineer exam. Beginner-level understanding of data science, the Python programming language, and Linux commands is necessary. A basic understanding of data processing and cloud computing, in general, will help you make the most out of this book.
Get unique insights from your data by combining the power of SQL Server, R and Python Key Features Use the features of SQL Server 2017 to implement the data science project life cycle Leverage the power of R and Python to design and develop efficient data models find unique insights from your data with powerful techniques for data preprocessing and analysis Book Description SQL Server only started to fully support data science with its two most recent editions. If you are a professional from both worlds, SQL Server and data science, and interested in using SQL Server and Machine Learning (ML) Services for your projects, then this is the ideal book for you. This book is the ideal introduction to data science with Microsoft SQL Server and In-Database ML Services. It covers all stages of a data science project, from businessand data understanding,through data overview, data preparation, modeling and using algorithms, model evaluation, and deployment. You will learn to use the engines and languages that come with SQL Server, including ML Services with R and Python languages and Transact-SQL. You will also learn how to choose which algorithm to use for which task, and learn the working of each algorithm. What you will learn Use the popular programming languages,T-SQL, R, and Python, for data science Understand your data with queries and introductory statistics Create and enhance the datasets for ML Visualize and analyze data using basic and advanced graphs Explore ML using unsupervised and supervised models Deploy models in SQL Server and perform predictions Who this book is for SQL Server professionals who want to start with data science, and data scientists who would like to start using SQL Server in their projects will find this book to be useful. Prior exposure to SQL Server will be helpful.
Perform efficient fast text representation and classification with Facebook's fastText library Key Features Introduction to Facebook's fastText library for NLP Perform efficient word representations, sentence classification, vector representation Build better, more scalable solutions for text representation and classification Book Description Facebook's fastText library handles text representation and classification, used for Natural Language Processing (NLP). Most organizations have to deal with enormous amounts of text data on a daily basis, and gaining efficient data insights requires powerful NLP tools such as fastText. This book is your ideal introduction to fastText. You will learn how to create fastText models from the command line, without the need for complicated code. You will explore the algorithms that fastText is built on and how to use them for word representation and text classification. Next, you will use fastText in conjunction with other popular libraries and frameworks such as Keras, TensorFlow, and PyTorch. Finally, you will deploy fastText models to mobile devices. By the end of this book, you will have all the required knowledge to use fastText in your own applications at work or in projects. What you will learn Create models using the default command line options in fastText Understand the algorithms used in fastText to create word vectors Combine command line text transformation capabilities and the fastText library to implement a training, validation, and prediction pipeline Explore word representation and sentence classification using fastText Use Gensim and spaCy to load the vectors, transform, lemmatize, and perform other NLP tasks efficiently Develop a fastText NLP classifier using popular frameworks, such as Keras, Tensorflow, and PyTorch Who this book is for This book is for data analysts, data scientists, and machine learning developers who want to perform efficient word representation and sentence classification using Facebook's fastText library. Basic knowledge of Python programming is required.
Discover the power of location data to build effective, intelligent data models with Geospatial ecosystems Key Features Manipulate location-based data and create intelligent geospatial data models Build effective location recommendation systems used by popular companies such as Uber A hands-on guide to help you consume spatial data and parallelize GIS operations effectively Book Description Data scientists, who have access to vast data streams, are a bit myopic when it comes to intrinsic and extrinsic location-based data and are missing out on the intelligence it can provide to their models. This book demonstrates effective techniques for using the power of data science and geospatial intelligence to build effective, intelligent data models that make use of location-based data to give useful predictions and analyses. This book begins with a quick overview of the fundamentals of location-based data and how techniques such as Exploratory Data Analysis can be applied to it. We then delve into spatial operations such as computing distances, areas, extents, centroids, buffer polygons, intersecting geometries, geocoding, and more, which adds additional context to location data. Moving ahead, you will learn how to quickly build and deploy a geo-fencing system using Python. Lastly, you will learn how to leverage geospatial analysis techniques in popular recommendation systems such as collaborative filtering and location-based recommendations, and more. By the end of the book, you will be a rockstar when it comes to performing geospatial analysis with ease. What you will learn Learn how companies now use location data Set up your Python environment and install Python geospatial packages Visualize spatial data as graphs Extract geometry from spatial data Perform spatial regression from scratch Build web applications which dynamically references geospatial data Who this book is for Data Scientists who would like to leverage location-based data and want to use location-based intelligence in their data models will find this book useful. This book is also for GIS developers who wish to incorporate data analysis in their projects. Knowledge of Python programming and some basic understanding of data analysis are all you need to get the most out of this book.
Get hands-on with deploying and managing your database services to provide scalable and high-speed data access on CockroachDBKey FeaturesGain insights into CockroachDB and build highly reliable cloud-native applicationsExplore the power of a scalable and highly available cloud-native SQL database to distribute data and workloads automaticallyBuild high-speed database services using CockroachDB and troubleshoot performance issuesBook DescriptionGetting Started with CockroachDB will introduce you to the inner workings of CockroachDB and help you to understand how it provides faster access to distributed data through a SQL interface. The book will also uncover how you can use the database to provide solutions where the data is highly available.Starting with CockroachDB's installation, setup, and configuration, this SQL book will familiarize you with the database architecture and database design principles. You'll then discover several options that CockroachDB provides to store multiple copies of your data to ensure fast data access. The book covers the internals of CockroachDB, how to deploy and manage it on the cloud, performance tuning to get the best out of CockroachDB, and how to scale data across continents and serve it locally. In addition to this, you'll get to grips with fault tolerance and auto-rebalancing, how indexes work, and the CockroachDB Admin UI. The book will guide you in building scalable cloud services on top of CockroachDB, covering administrative and security aspects and tips for troubleshooting, performance enhancements, and a brief guideline on migrating from traditional databases.By the end of this book, you'll have gained sufficient knowledge to manage your data on CockroachDB and interact with it from your application layer.What you will learnBecome well-versed with the overall architecture and design concepts of CockroachDBUnderstand how auto-rebalancing of data can avoid performance bottlenecksGet to know how CockroachDB achieves atomicity, consistency, isolation, and durabilityPartition your data across multiple geolocations to ensure very low latency when serving dataFind out how indexes are stored and the optimizations used to serve query results fasterDiscover the key concepts of deploying and managing CockroachDB clustersWho this book is forSoftware engineers, database developers, database administrators, and anyone who wishes to learn about the features of CockroachDB and how to build database solutions that are fast, highly available, and cater to business-critical applications, will find this book useful. Although no prior exposure to CockroachDB is required, familiarity with database concepts will help you to get the most out of this book.
Put your Haskell skills to work and generate publication-ready visualizations in no time at all Key Features Take your data analysis skills to the next level using the power of Haskell Understand regression analysis, perform multivariate regression, and untangle different cluster varieties Create publication-ready visualizations of data Book Description Every business and organization that collects data is capable of tapping into its own data to gain insights how to improve. Haskell is a purely functional and lazy programming language, well-suited to handling large data analysis problems. This book will take you through the more difficult problems of data analysis in a hands-on manner. This book will help you get up-to-speed with the basics of data analysis and approaches in the Haskell language. You'll learn about statistical computing, file formats (CSV and SQLite3), descriptive statistics, charts, and progress to more advanced concepts such as understanding the importance of normal distribution. While mathematics is a big part of data analysis, we've tried to keep this course simple and approachable so that you can apply what you learn to the real world. By the end of this book, you will have a thorough understanding of data analysis, and the different ways of analyzing data. You will have a mastery of all the tools and techniques in Haskell for effective data analysis. What you will learn Learn to parse a CSV file and read data into the Haskell environment Create Haskell functions for common descriptive statistics functions Create an SQLite3 database using an existing CSV file Learn the versatility of SELECT queries for slicing data into smaller chunks Apply regular expressions in large-scale datasets using both CSV and SQLite3 files Create a Kernel Density Estimator visualization using normal distribution Who this book is for This book is intended for people who wish to expand their knowledge of statistics and data analysis via real-world examples. A basic understanding of the Haskell language is expected. If you are feeling brave, you can jump right into the functional programming style.
Solve all big data problems by learning how to create efficient data models Key Features Create effective models that get the most out of big data Apply your knowledge to datasets from Twitter and weather data to learn big data Tackle different data modeling challenges with expert techniques presented in this book Book Description Modeling and managing data is a central focus of all big data projects. In fact, a database is considered to be effective only if you have a logical and sophisticated data model. This book will help you develop practical skills in modeling your own big data projects and improve the performance of analytical queries for your specific business requirements. To start with, you'll get a quick introduction to big data and understand the different data modeling and data management platforms for big data. Then you'll work with structured and semi-structured data with the help of real-life examples. Once you've got to grips with the basics, you'll use the SQL Developer Data Modeler to create your own data models containing different file types such as CSV, XML, and JSON. You'll also learn to create graph data models and explore data modeling with streaming data using real-world datasets. By the end of this book, you'll be able to design and develop efficient data models for varying data sizes easily and efficiently. What you will learn Get insights into big data and discover various data models Explore conceptual, logical, and big data models Understand how to model data containing different file types Run through data modeling with examples of Twitter, Bitcoin, IMDB and weather data modeling Create data models such as Graph Data and Vector Space Model structured and unstructured data using Python and R Who this book is for This book is great for programmers, geologists, biologists, and every professional who deals with spatial data. If you want to learn how to handle GIS, GPS, and remote sensing data, then this book is for you. Basic knowledge of R and QGIS would be helpful.