Take a look at our Data Science books. Shulph carries a great selection of Data Science books, and we are always adding more.
Use PySpark to easily crush messy data at scale and discover proven techniques to create testable, immutable, and easily parallelizable Spark jobs Key Features Work with large amounts of agile data using distributed datasets and in-memory caching Source data from all popular data hosting platforms, such as HDFS, Hive, JSON, and S3 Employ the easy-to-use PySpark API to deploy big data analytics for production Book Description Apache Spark is an open source parallel-processing framework that has been around for quite some time now. One of the many uses of Apache Spark is for data analytics applications across clustered computers. In this book, you will not only learn how to use Spark and the Python API to create high-performance analytics with big data, but also discover techniques for making Spark jobs testable, immutable, and parallelizable. You will learn how to source data from all popular data hosting platforms, including HDFS, Hive, JSON, and S3, and deal with large datasets with PySpark to gain practical big data experience. This book will help you work on prototypes on local machines and subsequently go on to handle messy data in production and at scale. This book covers installing and setting up PySpark, RDD operations, big data cleaning and wrangling, and aggregating and summarizing data into useful reports. You will also learn how to implement some practical and proven techniques to improve certain aspects of programming and administration in Apache Spark. By the end of the book, you will be able to build big data analytical solutions using the various PySpark offerings and also optimize them effectively. 
What you will learn Get practical big data experience while working on messy datasets Analyze patterns with Spark SQL to improve your business intelligence Use PySpark's interactive shell to speed up development time Create highly concurrent Spark programs by leveraging immutability Discover ways to avoid the most expensive operation in the Spark API: the shuffle operation Re-design your jobs to use reduceByKey instead of groupBy Create robust processing pipelines by testing Apache Spark jobs Who this book is for This book is for developers, data scientists, business analysts, or anyone who needs to reliably analyze large amounts of real-world data. Whether you're tasked with creating your company's business intelligence function or creating great data platforms for your machine learning models, or are looking to use code to magnify the impact of your business, this book is for you.
Optimize your marketing strategies through analytics and machine learning Key Features Understand how data science drives successful marketing campaigns Use machine learning for better customer engagement, retention, and product recommendations Extract insights from your data to optimize marketing strategies and increase profitability Book Description Regardless of company size, the adoption of data science and machine learning for marketing has been rising in the industry. With this book, you will learn to implement data science techniques to understand the drivers behind the successes and failures of marketing campaigns. This book is a comprehensive guide to help you understand and predict customer behaviors and create more effectively targeted and personalized marketing strategies. This is a practical guide to performing simple-to-advanced tasks, to extract hidden insights from the data and use them to make smart business decisions. You will understand what drives sales and increases customer engagement for your products. You will learn to implement machine learning to forecast which customers are more likely to engage with the products and have high lifetime value. This book will also show you how to use machine learning techniques to understand different customer segments and recommend the right products for each customer. Apart from learning to gain insights into consumer behavior using exploratory analysis, you will also learn the concept of A/B testing and implement it using Python and R. By the end of this book, you will be experienced enough with various data science and machine learning techniques to run and manage successful marketing campaigns for your business. 
What you will learn Learn how to compute and visualize marketing KPIs in Python and R Master what drives successful marketing campaigns with data science Use machine learning to predict customer engagement and lifetime value Make product recommendations that customers are most likely to buy Learn how to use A/B testing for better marketing decision making Implement machine learning to understand different customer segments Who this book is for If you are a marketing professional, data scientist, engineer, or a student keen to learn how to apply data science to marketing, this book is what you need! It will be beneficial to have some basic knowledge of either Python or R to work through the examples. This book will also be beneficial for beginners as it covers basic-to-advanced data science concepts and applications in marketing with real-life examples.
A hands-on guide for professionals to perform various data science tasks in R Key Features Explore the popular R packages for data science Use R for efficient data mining, text analytics and feature engineering Become a thorough data science professional with the help of hands-on examples and use-cases in R Book Description R is one of the most widely used programming languages, and when used for data science, it helps tackle the complexities involved with unstructured datasets in the real world. This book covers the entire data science ecosystem for aspiring data scientists, right from zero to a level where you are confident enough to get hands-on with real-world data science problems. The book starts with an introduction to data science and introduces readers to popular R libraries for executing routine data science tasks. This book covers all the important processes in data science such as data gathering, cleaning data, and then uncovering patterns from it. You will explore machine learning algorithms, predictive analytical models, and finally deep learning algorithms. You will learn to run the most powerful visualization packages available in R so that you can easily derive insights from your data. Towards the end, you will also learn how to integrate R with Spark and Hadoop and perform large-scale data analytics without much complexity. 
What you will learn Understand the R programming language and its ecosystem of packages for data science Obtain and clean your data before processing Master essential exploratory techniques for summarizing data Examine various machine learning prediction models Explore the H2O analytics platform in R for deep learning Apply data mining techniques to available datasets Work with interactive visualization packages in R Integrate R with Spark and Hadoop for large-scale data analytics Who this book is for If you are an aspiring data scientist keen to learn R and its ecosystem of packages for data science, this book is the ideal resource you need to get started. Some programming experience will be helpful to get the most out of this book.
Find, explore, and extract big data to transform into actionable insights Key Features Perform end-to-end data analysis—from exploration to visualization Real-world examples, tasks, and interview queries to be a proficient data scientist Understand how SQL is used for big data processing using HiveQL and SparkSQL Book Description SQL Server is a relational database management system that enables you to cover end-to-end data science processes using various inbuilt services and features. Hands-On Data Science with SQL Server 2017 starts with an overview of data science with SQL to understand the core tasks in data science. You will learn intermediate-to-advanced level concepts to perform analytical tasks on data using SQL Server. The book has a unique approach, covering best practices, tasks, and challenges to test your abilities at the end of each chapter. You will explore the ins and outs of performing various key tasks such as data collection, cleaning, manipulation, aggregations, and filtering techniques. As you make your way through the chapters, you will turn raw data into actionable insights by wrangling and extracting data from databases using T-SQL. You will get to grips with preparing and presenting data in a meaningful way, using Power BI to reveal hidden patterns. In the concluding chapters, you will work with SQL Server integration services to transform data into a useful format and delve into advanced examples covering machine learning concepts such as predictive analytics using real-world examples. By the end of this book, you will be in a position to handle the growing amounts of data and perform everyday activities that a data science professional performs. 
What you will learn Understand what data science is and how SQL Server is used for big data processing Analyze incoming data with SQL queries and visualizations Create, train, and evaluate predictive models Make predictions using trained models and establish regular retraining courses Incorporate data source querying into SQL Server Enhance built-in T-SQL capabilities using SQLCLR Visualize data with Reporting Services, Power View, and Power BI Transform data with R, Python, and Azure Who this book is for Hands-On Data Science with SQL Server 2017 is intended for data scientists, data analysts, and big data professionals who want to sharpen their skills in SQL and its applications. This book will be helpful even for beginners who want to build their career as data science professionals using the power of SQL Server 2017. Basic familiarity with the SQL language will aid in understanding the concepts covered in this book.
Big data processing and analytics at speed and scale using command-line tools. Key Features Perform string processing, numerical computations, and more using CLI tools Understand the essential components of the data science development workflow Automate data pipeline scripts and visualization with the command line Book Description The command line has been in existence on UNIX-based OSes in the form of the Bash shell for over three decades. However, very little is known to developers as to how command-line tools can be OSEMN (pronounced as awesome and standing for Obtaining, Scrubbing, Exploring, Modeling, and iNterpreting data) for carrying out simple-to-advanced data science tasks at speed. This book will start with the requisite concepts and installation steps for carrying out data science tasks using the command line. You will learn to create a data pipeline to solve the problem of working with small- to medium-sized files on a single machine. You will understand the power of the command line, learn how to edit files using a text-based and an. You will not only learn how to automate jobs and scripts, but also learn how to visualize data using the command line. By the end of this book, you will learn how to speed up the process and perform automated tasks using command-line tools. What you will learn Understand how to set up the command line for data science Use AWK programming language commands to search quickly in large datasets Work with files and APIs using the command line Share and collect data with CLI tools Perform visualization with commands and functions Uncover machine-level programming practices with a modern approach to data science Who this book is for This book is for data scientists and data analysts who have little to no knowledge of the command line but have an understanding of data science. Perform everyday data science tasks using the power of command-line tools.
Add a touch of data analytics to your healthcare systems and get insightful outcomes Key Features Perform healthcare analytics with Python and SQL Build predictive models on real healthcare data with pandas and scikit-learn Use analytics to improve healthcare performance Book Description In recent years, machine learning technologies and analytics have been widely utilized across the healthcare sector. Healthcare Analytics Made Simple bridges the gap between practising doctors and data scientists. It equips data scientists to work with healthcare data and allows them to gain better insight from this data in order to improve healthcare outcomes. This book is a complete overview of machine learning for healthcare analytics, briefly describing the current healthcare landscape, machine learning algorithms, and the Python and SQL programming languages. The step-by-step instructions teach you how to obtain real healthcare data and perform descriptive, predictive, and prescriptive analytics using popular Python packages such as pandas and scikit-learn. The latest research results in disease detection and healthcare image analysis are reviewed. By the end of this book, you will understand how to use Python for healthcare data analysis, how to import, collect, clean, and refine data from electronic health record (EHR) surveys, and how to make predictive models with this data through real-world algorithms and code examples. 
What you will learn Gain valuable insight into healthcare incentives, finances, and legislation Discover the connection between machine learning and healthcare processes Use SQL and Python to analyze data Measure healthcare quality and provider performance Identify features and attributes to build successful healthcare models Build predictive models using real-world healthcare data Become an expert in predictive modeling with structured clinical data See what lies ahead for healthcare analytics Who this book is for Healthcare Analytics Made Simple is for you if you are a developer who has a working knowledge of Python or a related programming language, although you are new to healthcare or predictive modeling with healthcare data. Clinicians interested in analytics and healthcare computing will also benefit from this book. This book can also serve as a textbook for students enrolled in an introductory course on machine learning for healthcare.
Understand the constructs of the Python programming language and use them to build data science projects Key Features Learn the basics of developing applications with Python and deploy your first data application Take your first steps in Python programming by understanding and using data structures, variables, and loops Delve into Jupyter, NumPy, Pandas, SciPy, and sklearn to explore the data science ecosystem in Python Book Description Python is the most widely used programming language for building data science applications. Complete with step-by-step instructions, this book contains easy-to-follow tutorials to help you learn Python and develop real-world data science projects. The “secret sauce” of the book is its curated list of topics and solutions, put together using a range of real-world projects, covering initial data collection, data analysis, and production. This Python book starts by taking you through the basics of programming, right from variables and data types to classes and functions. You'll learn how to write idiomatic code and test and debug it, and discover how you can create packages or use the range of built-in ones. You'll also be introduced to the extensive ecosystem of Python data science packages, including NumPy, Pandas, scikit-learn, Altair, and Datashader. Furthermore, you'll be able to perform data analysis, train models, and interpret and communicate the results. Finally, you'll get to grips with structuring and scheduling scripts using Luigi and sharing your machine learning models with the world as a microservice. By the end of the book, you'll have learned not only how to implement Python in data science projects, but also how to maintain and design them to meet high programming standards. 
What you will learn Code in Python using Jupyter and VS Code Explore the basics of coding – loops, variables, functions, and classes Deploy continuous integration with Git, Bash, and DVC Get to grips with Pandas, NumPy, and scikit-learn Perform data visualization with Matplotlib, Altair, and Datashader Create a package out of your code using poetry and test it with PyTest Make your machine learning model accessible to anyone with a web API Who this book is for If you want to learn Python or data science in a fun and engaging way, this book is for you. You'll also find this book useful if you're a high school student, researcher, analyst, or anyone with little or no coding experience with an interest in the subject and courage to learn, fail, and learn from failing. A basic understanding of how computers work will be useful.
An easy-to-follow, step-by-step guide for getting to grips with the real-world application of machine learning algorithms Key Features Explore statistics and complex mathematics for data-intensive applications Discover new developments in the EM algorithm, PCA, and Bayesian regression Study patterns and make predictions across various datasets Book Description Machine learning has gained tremendous popularity for its powerful and fast predictions with large datasets. However, the true forces behind its powerful output are the complex algorithms involving substantial statistical analysis that churn large datasets and generate substantial insight. This second edition of Machine Learning Algorithms walks you through prominent development outcomes that have taken place relating to machine learning algorithms, which constitute major contributions to the machine learning process and help you to strengthen and master statistical interpretation across the areas of supervised, semi-supervised, and reinforcement learning. Once the core concepts of an algorithm have been covered, you'll explore real-world examples based on the most widely used libraries, such as scikit-learn, NLTK, TensorFlow, and Keras. You will discover new topics such as principal component analysis (PCA), independent component analysis (ICA), Bayesian regression, discriminant analysis, advanced clustering, and Gaussian mixture. By the end of this book, you will have studied machine learning algorithms and be able to put them into production to make your machine learning applications more innovative. 
What you will learn Study feature selection and the feature engineering process Assess performance and error trade-offs for linear regression Build a data model and understand how it works by using different types of algorithms Learn to tune the parameters of Support Vector Machines (SVM) Explore the concept of natural language processing (NLP) and recommendation systems Create a machine learning architecture from scratch Who this book is for Machine Learning Algorithms is for you if you are a machine learning engineer, data engineer, or junior data scientist who wants to advance in the field of predictive analytics and machine learning. Familiarity with R and Python will be an added advantage for getting the best from this book.
Understand data science concepts and methodologies to manage and deliver top-notch solutions for your organization Key Features Learn the basics of data science and explore its possibilities and limitations Manage data science projects and assemble teams effectively even in the most challenging situations Understand management principles and approaches for data science projects to streamline the innovation process Book Description Data science and machine learning can transform any organization and unlock new opportunities. However, employing the right management strategies is crucial to guide the solution from prototype to production. Traditional approaches often fail as they don't entirely meet the conditions and requirements necessary for current data science projects. In this book, you'll explore the right approach to data science project management, along with useful tips and best practices to guide you along the way. After understanding the practical applications of data science and artificial intelligence, you'll see how to incorporate them into your solutions. Next, you will go through the data science project life cycle, explore the common pitfalls encountered at each step, and learn how to avoid them. Any data science project requires a skilled team, and this book will offer the right advice for hiring and growing a data science team for your organization. Later, you'll be shown how to efficiently manage and improve your data science projects through the use of DevOps and ModelOps. By the end of this book, you will be well versed with various data science solutions and have gained practical insights into tackling the different challenges that you'll encounter on a daily basis. 
What you will learn Understand the underlying problems of building a strong data science pipeline Explore the different tools for building and deploying data science solutions Hire, grow, and sustain a data science team Manage data science projects through all stages, from prototype to production Learn how to use ModelOps to improve your data science pipelines Get up to speed with the model testing techniques used in both development and production stages Who this book is for This book is for data scientists, analysts, and program managers who want to use data science for business productivity by incorporating data science workflows efficiently. Some understanding of basic data science concepts will be useful to get the most out of this book.
Explore Python frameworks like pandas, Jupyter notebooks, and Matplotlib to build data pipelines and data visualization Key Features Learn to set up data analysis pipelines with pandas and Jupyter notebooks Effective techniques for data selection, manipulation, and visualization Introduction to Matplotlib for interactive data visualization using charts and plots Book Description pandas is a Python library that lets you manipulate, transform, and analyze data. It is a popular framework for exploratory data visualization and analyzing datasets and data pipelines based on their properties. This book will be your practical guide to exploring datasets using pandas. You will start by setting up Python, pandas, and Jupyter Notebooks. You will learn how to use Jupyter Notebooks to run Python code. We then show you how to get data into pandas and do some exploratory analysis, before learning how to manipulate and reshape data using pandas methods. You will also learn how to deal with missing data from your datasets, how to draw charts and plots using pandas and Matplotlib, and how to create some effective visualizations for your audience. Finally, you will wrap up your newly gained pandas knowledge by learning how to export data out of pandas into some popular file formats. By the end of this book, you will have a better understanding of exploratory analysis and how to build exploratory data pipelines with Python. 
What you will learn Learn how to read different kinds of data into pandas DataFrames for data analysis Manipulate, transform, and apply formulas to data imported into pandas DataFrames Use pandas to analyze and visualize different kinds of data to gain real-world insights Extract transformed data from pandas DataFrames and convert it into the formats your application expects Manipulate and model time-series data, perform algorithmic trading, derive results on fixed and moving windows, and more Effective data visualization using Matplotlib Who this book is for If you are a budding data scientist looking to learn the popular pandas library, or a Python developer looking to step into the world of data analysis, this book is the ideal resource you need to get started. Some programming experience in Python will be helpful to get the most out of this course.