IBM Spectrum Scale: Big Data and Analytics Solution Brief

Author: Wei G. Gong, Sandeep R. Patil, IBM Redbooks
Publisher: IBM Redbooks
Total Pages: 14
Release: 2018-01-23
ISBN: 0738456632
Category: Computers
Language: EN, FR, DE, ES & NL

IBM Spectrum Scale: Big Data and Analytics Solution Brief Book Excerpt:

This IBM® Redguide™ publication describes big data and analytics deployments that are built on IBM Spectrum Scale™. IBM Spectrum Scale is a proven enterprise-level distributed file system that is a high-performance and cost-effective alternative to Hadoop Distributed File System (HDFS) for Hadoop analytics services. IBM Spectrum Scale includes NFS, SMB, and Object services and meets the performance that is required by many industry workloads, such as technical computing, big data, analytics, and content management. IBM Spectrum Scale provides world-class, web-based storage management with extreme scalability, flash-accelerated performance, and automatic policy-based storage tiering from flash through disk to the cloud, which reduces storage costs by up to 90% while improving security and management efficiency in cloud, big data, and analytics environments. This Redguide publication is intended for technical professionals (analytics consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for providing Hadoop analytics services and are interested in learning about the benefits of using IBM Spectrum Scale as an alternative to HDFS.

IBM Spectrum Scale

Author: Wei G. Gong, Sandeep R. Patil
Publisher: Unknown
Total Pages: 135
Release: 2019
ISBN: 1928374650XXX
Category: Big data
Language: EN, FR, DE, ES & NL

IBM Spectrum Scale Book Excerpt:

Making Data Smarter with IBM Spectrum Discover: Practical AI Solutions

Author: Ivaylo B. Bozhinov, Isom Crawford Jr., Joseph Dain, Mathias Defiebre, Maxime Deloche, Kiran Ghag, Vasfi Gucer, Xin Liu, Abeer Selim, Gauthier Siri, Christopher Vollmar, IBM Redbooks
Publisher: IBM Redbooks
Total Pages: 150
Release: 2020-10-19
ISBN: 0738459135
Category: Computers
Language: EN, FR, DE, ES & NL

Making Data Smarter with IBM Spectrum Discover: Practical AI Solutions Book Excerpt:

More than 80% of all data that is collected by organizations is not in a standard relational database. Instead, it is trapped in unstructured documents, social media posts, machine logs, and so on. Many organizations face significant challenges in managing this deluge of unstructured data, such as the following examples: pinpointing and activating relevant data for large-scale analytics; lacking the fine-grained visibility that is needed to map data to business priorities; removing redundant, obsolete, and trivial (ROT) data; and identifying and classifying sensitive data. IBM® Spectrum Discover is modern metadata management software that provides data insight for petabyte-scale file and object storage, both on premises and in the cloud. This software enables organizations to make better business decisions and gain and maintain a competitive advantage. IBM Spectrum® Discover provides a rich metadata layer that enables storage administrators, data stewards, and data scientists to efficiently manage, classify, and gain insights from massive amounts of unstructured data. It improves storage economics, helps mitigate risk, and accelerates large-scale analytics to create competitive advantage and speed critical research. This IBM Redbooks® publication presents several use cases that are focused on artificial intelligence (AI) solutions with IBM Spectrum Discover. This book helps storage administrators and technical specialists plan and implement AI solutions by using IBM Spectrum Discover and several other IBM Storage products.

Cloudera Data Platform Private Cloud Base with IBM Spectrum Scale

Author: Wei Gong, Linda Cham, Prashanth Shetty, John Sing, IBM Redbooks
Publisher: IBM Redbooks
Total Pages: 32
Release: 2021-04-23
ISBN: 0738459380
Category: Computers
Language: EN, FR, DE, ES & NL

Cloudera Data Platform Private Cloud Base with IBM Spectrum Scale Book Excerpt:

This IBM® Redpaper publication provides guidance on building an enterprise-grade data lake by using IBM Spectrum® Scale and Cloudera Data Platform (CDP) Private Cloud Base for performing in-place Cloudera Hadoop or Cloudera Spark-based analytics. It also covers the benefits of the integrated solution and gives guidance about the types of deployment models and considerations during the implementation of these models.

IBM Data Engine for Hadoop and Spark

Author: Dino Quintero, Luis Bolinches, Aditya Gandakusuma Sutandyo, Nicolas Joly, Reinaldo Tetsuo Katahira, IBM Redbooks
Publisher: IBM Redbooks
Total Pages: 122
Release: 2016-08-24
ISBN: 0738441937
Category: Computers
Language: EN, FR, DE, ES & NL

IBM Data Engine for Hadoop and Spark Book Excerpt:

This IBM® Redbooks® publication provides topics to help the technical community take advantage of the resilience, scalability, and performance of the IBM Power Systems™ platform to implement or integrate an IBM Data Engine for Hadoop and Spark solution for analytics solutions to access, manage, and analyze data sets to improve business outcomes. This book documents topics to demonstrate and take advantage of the analytics strengths of the IBM POWER8® platform, the IBM analytics software portfolio, and selected third-party tools to help solve customers' data analytic workload requirements. This book describes how to plan, prepare, install, integrate, manage, and use the IBM Data Engine for Hadoop and Spark solution to run analytic workloads on IBM POWER8. In addition, this publication delivers documentation to complement available IBM analytics solutions to help meet your data analytic needs. This publication strengthens the position of IBM analytics and big data solutions with a well-defined and documented deployment model within an IBM POWER8 virtualized environment so that customers have a planned foundation for security, scaling, capacity, resilience, and optimization for analytics workloads. This book is targeted at technical professionals (analytics consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for delivering analytics solutions and support on IBM Power Systems.

Implementing an Optimized Analytics Solution on IBM Power Systems

Author: Dino Quintero, Kanako Harada, Reinaldo Tetsuo Katahira, Antonio Moreira de Oliveira Neto, Robert Simon, Brian Yaeger, IBM Redbooks
Publisher: IBM Redbooks
Total Pages: 296
Release: 2016-06-01
ISBN: 0738441686
Category: Computers
Language: EN, FR, DE, ES & NL

Implementing an Optimized Analytics Solution on IBM Power Systems Book Excerpt:

This IBM® Redbooks® publication addresses topics to use the virtualization strengths of the IBM POWER8® platform to solve clients' system resource utilization challenges and maximize systems' throughput and capacity. This book addresses performance tuning topics that will help answer clients' complex analytic workload requirements, help maximize systems' resources, and provide expert-level documentation to transfer the how-to-skills to the worldwide teams. This book strengthens the position of IBM Analytics and Big Data solutions with a well-defined and documented deployment model within a POWER8 virtualized environment, offering clients a planned foundation for security, scaling, capacity, resilience, and optimization for analytics workloads. This book is targeted toward technical professionals (analytics consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for providing analytics solutions and support on IBM Power Systems™.

IBM Elastic Storage Server Implementation Guide for Version 5.3

Author: Luis Bolinches, Puneet Chaudhary, Kiran Ghag, Poornima Gupte, Vasfi Gucer, Nikhil Khandelwal, Ravindra Sure, IBM Redbooks
Publisher: IBM Redbooks
Total Pages: 102
Release: 2019-02-05
ISBN: 0738457418
Category: Computers
Language: EN, FR, DE, ES & NL

IBM Elastic Storage Server Implementation Guide for Version 5.3 Book Excerpt:

This IBM® Redpaper™ publication introduces and describes the IBM Elastic Storage™ Server as a scalable, high-performance data and file management solution. The solution is built on proven IBM Spectrum™ Scale technology, formerly IBM General Parallel File System (GPFS™). IBM Elastic Storage Servers can be implemented for a range of diverse requirements, providing reliability, performance, and scalability. This publication helps you to understand the solution and its architecture and helps you to plan the installation and integration of the environment. The following combination of physical and logical components is required: hardware, operating system, storage, network, and applications. This paper provides guidelines for several usage and integration scenarios. Typical scenarios include Cluster Export Services (CES) integration, disaster recovery, and multicluster integration. This paper addresses the needs of technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who must deliver cost-effective cloud services and big data solutions.

Data Accelerator for AI and Analytics

Author: Simon Lorenz, Gero Schmidt, TJ Harris, Mike Knieriemen, Nils Haustein, Abhishek Dave, Venkateswara Puvvada, Christof Westhues, IBM Redbooks
Publisher: IBM Redbooks
Total Pages: 88
Release: 2021-01-20
ISBN: 0738459321
Category: Computers
Language: EN, FR, DE, ES & NL

Data Accelerator for AI and Analytics Book Excerpt:

This IBM® Redpaper publication focuses on data orchestration in enterprise data pipelines. It provides details about data orchestration and how to address typical challenges that customers face when dealing with large and ever-growing amounts of data for data analytics. While the amount of data increases steadily, artificial intelligence (AI) workloads must speed up to deliver insights and business value in a timely manner. This paper provides a solution that addresses these needs: Data Accelerator for AI and Analytics (DAAA). A proof of concept (PoC) is described in detail. This paper focuses on the functions that are provided by the Data Accelerator for AI and Analytics solution, which simplifies the daily work of data scientists and system administrators. This solution helps increase the efficiency of storage systems and data processing to obtain results faster while eliminating unnecessary data copies and associated data management.

Implementation Guide for IBM Elastic Storage System 5000

Author: Brian Herr, Farida Yaragatti, Jay Vaddi, John Sing, Jonathan Terner, Luis Bolinches, Mary Jane Zajac, Puneet Chaudhary, Ravindra Sure, Ricardo D. Zamora Ruvalcaba, Robert Guthrie, Shradha Thakare, Stephen M Tee, Steve Duersch, Sukumar Vankadhara, Sumit Kumar, Todd M Tosseth, Van Smith, Vasfi Gucer, Wesley Jones, IBM Redbooks
Publisher: IBM Redbooks
Total Pages: 130
Release: 2020-12-08
ISBN: 0738459224
Category: Computers
Language: EN, FR, DE, ES & NL

Implementation Guide for IBM Elastic Storage System 5000 Book Excerpt:

This IBM® Redbooks® publication introduces and describes the IBM Elastic Storage® System 5000 (ESS 5000) as a scalable, high-performance data and file management solution. The solution is built on proven IBM Spectrum® Scale technology, formerly IBM General Parallel File System (IBM GPFS). ESS is a modern implementation of software-defined storage, making it easier for you to deploy fast, highly scalable storage for AI and big data. With the lightning-fast NVMe storage technology and industry-leading file management capabilities of IBM Spectrum Scale, the ESS 3000 and ESS 5000 nodes can scale to over a yottabyte (YB) and can be integrated into a federated global storage system. By consolidating storage requirements from the edge to the core data center (including Kubernetes and Red Hat OpenShift), IBM ESS can reduce inefficiency, lower acquisition costs, simplify storage management, eliminate data silos, support multiple demanding workloads, and deliver high performance throughout your organization. This book provides a technical overview of the ESS 5000 solution and helps you to plan the installation of the environment. We also explain the use cases where we believe it fits best. Our goal is to position this book as the starting point document for customers who plan to use the ESS 5000 as part of their IBM Spectrum Scale setups. This book is targeted toward technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for delivering cost-effective storage solutions with ESS 5000.

IBM System Storage Solutions Handbook

Author: Ezgi Coskun, Mikael Lindström, Maciej Olejniczak, Oliver Stark, Megan Gilge, IBM Redbooks
Publisher: IBM Redbooks
Total Pages: 288
Release: 2016-07-15
ISBN: 0738441740
Category: Computers
Language: EN, FR, DE, ES & NL

IBM System Storage Solutions Handbook Book Excerpt:

The IBM® System Storage® Solutions Handbook helps you solve your current and future data storage business requirements. It helps you achieve enhanced storage efficiency by design to allow managed cost, capacity of growth, greater mobility, and stronger control over storage performance and management. It describes the most current IBM storage products, including the IBM Spectrum™ family, IBM FlashSystem®, disk, and tape, as well as virtualized solutions such as IBM Storage Cloud. This IBM Redbooks® publication provides overviews and information about the most current IBM System Storage products. It shows how IBM delivers the right mix of products for nearly every aspect of business continuance and business efficiency. IBM storage products can help you store, safeguard, retrieve, and share your data. This book is intended as a reference for basic and comprehensive information about the IBM Storage products portfolio. It provides a starting point for establishing your own enterprise storage environment. This book describes the IBM Storage products as of March 2016.

IBM Private, Public, and Hybrid Cloud Storage Solutions

Author: Larry Coyne, Joe Dain, Eric Forestier, Patrizia Guaitani, Robert Haas, Christopher D. Maestas, Antoine Maille, Tony Pearson, Brian Sherman, Christopher Vollmar, IBM Redbooks
Publisher: IBM Redbooks
Total Pages: 186
Release: 2018-11-27
ISBN: 0738456845
Category: Computers
Language: EN, FR, DE, ES & NL

IBM Private, Public, and Hybrid Cloud Storage Solutions Book Excerpt:

This IBM® Redpaper™ publication takes you on a journey that surveys cloud computing to answer several fundamental questions about storage cloud technology. What are storage clouds? How can a storage cloud help solve your current and future data storage business requirements? What can IBM do to help you implement a storage cloud solution that addresses these needs? This paper shows how IBM storage clouds use the extensive cloud computing experience, services, proven technologies, and products of IBM to support a smart storage cloud solution designed for your storage optimization efforts. Clients face many common storage challenges and some have variations that make them unique. This paper describes various successful client storage cloud implementations and the options that are available to meet your current needs and position you to avoid storage issues in the future. IBM Cloud™ Services (IBM Cloud Managed Services® and IBM SoftLayer®) are highlighted as well as the contributions of IBM to OpenStack cloud storage. This paper is intended for anyone who wants to learn about storage clouds and how IBM addresses data storage challenges with smart storage cloud solutions. It is suitable for IBM clients, storage solution integrators, and IBM specialist sales representatives.

Deployment and Usage Guide for Running AI Workloads on Red Hat OpenShift and NVIDIA DGX Systems with IBM Spectrum Scale

Author: Simon Lorenz, Gero Schmidt, Thomas Schoenemeyer, IBM Redbooks
Publisher: IBM Redbooks
Total Pages: 60
Release: 2020-11-30
ISBN: 0738459097
Category: Computers
Language: EN, FR, DE, ES & NL

Deployment and Usage Guide for Running AI Workloads on Red Hat OpenShift and NVIDIA DGX Systems with IBM Spectrum Scale Book Excerpt:

This IBM® Redpaper publication describes the architecture, installation procedure, and results for running a typical training application that works on an automotive data set in an orchestrated and secured environment that provides horizontal scalability of GPU resources across physical node boundaries for deep neural network (DNN) workloads. This paper is mostly relevant for systems engineers, system administrators, or system architects that are responsible for data center infrastructure management and typical day-to-day operations such as system monitoring, operational control, asset management, and security audits. This paper also describes IBM Spectrum® LSF® as a workload manager and IBM Spectrum Discover as a metadata search engine to find the right data for an inference job and automate the data science workflow. With the help of this solution, the data location, which may be on different storage systems, and time of availability for the AI job can be fully abstracted, which provides valuable information for data scientists.

IBM Platform Computing Solutions for High Performance and Technical Computing Workloads

Author: Dino Quintero, Daniel de Souza Casali, Marcelo Correia Lima, Istvan Gabor Szabo, Maciej Olejniczak, Tiago Rodrigues de Mello, Nilton Carlos dos Santos, IBM Redbooks
Publisher: IBM Redbooks
Total Pages: 176
Release: 2015-06-19
ISBN: 0738440752
Category: Computers
Language: EN, FR, DE, ES & NL

IBM Platform Computing Solutions for High Performance and Technical Computing Workloads Book Excerpt:

This IBM® Redbooks® publication is a refresh of IBM Technical Computing Clouds, SG24-8144, Enhance Inbound and Outbound Marketing with a Trusted Single View of the Customer, SG24-8173, and IBM Platform Computing Integration Solutions, SG24-8081, with a focus on High Performance and Technical Computing on IBM Power Systems™. This book describes synergies across the IBM product portfolio by using case scenarios and showing solutions such as IBM Spectrum™ Scale (formerly GPFS™). This book also reflects and documents the IBM Platform Computing Cloud Services as part of IBM Platform Symphony® for analytics workloads and IBM Platform LSF® (with new features, such as a Hadoop connector, a MapReduce accelerator, and dynamic cluster) for job scheduling. Both products are used to help customers schedule and analyze large amounts of data for business productivity and competitive advantages. This book is targeted at technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for delivering cost-effective cloud services and big data solutions on IBM Power Systems to uncover insights among client data so that they can take actions to optimize business results, product development, and scientific discoveries.

IBM Cloud Pak for Data with IBM Spectrum Scale Container Native

Author: Gero Schmidt, Tara Astigarraga, Paulina Acevedo, JJ Miller, Dessa Simpson, Austen Stewart, Todd Tosseth, Jayson Tsingine, Israel Andres Vizcarra Godinez, IBM Redbooks
Publisher: IBM Redbooks
Total Pages: 106
Release: 2021-12-17
ISBN: 0738460095
Category: Computers
Language: EN, FR, DE, ES & NL

IBM Cloud Pak for Data with IBM Spectrum Scale Container Native Book Excerpt:

This IBM® Redpaper® publication describes configuration guidelines and best practices when IBM Spectrum® Scale Container Native Storage Access is used as a storage provider for IBM Cloud® Pak for Data on Red Hat OpenShift Container Platform. It also provides the steps to install IBM Db2® and several assemblies within IBM Cloud Pak® for Data, including Watson Knowledge Catalog, Watson Studio, IBM DataStage®, Db2 Warehouse, Watson Machine Learning, Watson OpenScale, Data Virtualization, Data Management Console, and Apache Spark. This IBM Redpaper publication was written for IT architects, IT specialists, developers, and others who are interested in installing IBM Cloud Pak for Data with IBM Spectrum Scale Container Native.

Securing Data on Threat Detection by Using IBM Spectrum Scale and IBM QRadar: An Enhanced Cyber Resiliency Solution

Author: Boudhayan Chakrabarty, Sandeep R Patil, Shashank Shingornikar, Ashish Kothekar, Praphullachandra Mujumdar, Smita Raut, Digvijay Ukirde, IBM Redbooks
Publisher: IBM Redbooks
Total Pages: 68
Release: 2021-09-13
ISBN: 073846001X
Category: Computers
Language: EN, FR, DE, ES & NL

Securing Data on Threat Detection by Using IBM Spectrum Scale and IBM QRadar: An Enhanced Cyber Resiliency Solution Book Excerpt:

Having appropriate storage for hosting business-critical data and advanced Security Information and Event Management (SIEM) software for deep inspection, detection, and prioritization of threats has become a necessity for any business. This IBM® Redpaper publication explains how the storage features of IBM Spectrum® Scale, when combined with the log analysis, deep inspection, and detection of threats that are provided by IBM QRadar®, help reduce the impact of incidents on business data. Such integration provides an excellent platform for hosting unstructured business data that is subject to regulatory compliance requirements. This paper describes how IBM Spectrum Scale File Audit Logging can be integrated with IBM QRadar. Using IBM QRadar, an administrator can monitor, inspect, detect, and derive insights for identifying potential threats to the data that is stored on IBM Spectrum Scale. When the threats are identified, you can quickly act on them to mitigate or reduce the impact of incidents. We further demonstrate how threat detection by IBM QRadar can proactively trigger data snapshots or a cyber resiliency workflow in IBM Spectrum Scale to protect the data during a threat. This third edition adds the section "Ransomware threat detection", where we describe a ransomware attack scenario within an environment that uses the integration of IBM Spectrum Scale File Audit logs with IBM QRadar. This paper is intended for chief technology officers, solution engineers, security architects, and systems administrators. This paper assumes a basic understanding of IBM Spectrum Scale and IBM QRadar and their administration.

AI and Big Data on IBM Power Systems Servers

Author: Scott Vetter, Ivaylo B. Bozhinov, Anto A John, Rafael Freitas de Lima, Ahmed (Mash) Mashhour, James Van Oosten, Fernando Vermelho, Allison White, IBM Redbooks
Publisher: IBM Redbooks
Total Pages: 162
Release: 2019-04-10
ISBN: 0738457515
Category: Computers
Language: EN, FR, DE, ES & NL

AI and Big Data on IBM Power Systems Servers Book Excerpt:

As big data becomes more ubiquitous, businesses are wondering how they can best leverage it to gain insight into their most important business questions. Using machine learning (ML) and deep learning (DL) in big data environments can identify historical patterns and build artificial intelligence (AI) models that can help businesses to improve customer experience, add services and offerings, identify new revenue streams or lines of business (LOBs), and optimize business or manufacturing operations. The power of AI for predictive analytics is being harnessed across all industries, so it is important that businesses familiarize themselves with all of the tools and techniques that are available for integration with their data lake environments. In this IBM® Redbooks® publication, we cover the best practices for deploying and integrating some of the best AI solutions on the market, including: IBM Watson Machine Learning Accelerator (see note for product naming), IBM Watson Studio Local, IBM Power Systems™, IBM Spectrum™ Scale, IBM Data Science Experience (IBM DSX), IBM Elastic Storage™ Server, Hortonworks Data Platform (HDP), Hortonworks DataFlow (HDF), and H2O Driverless AI. We map out all the integrations that are possible with our different AI solutions and how they can integrate with your existing or new data lake. We also walk you through some of our client use cases and show you how some of the industry leaders are using Hortonworks, IBM PowerAI, and IBM Watson Studio Local to drive decision making. We also advise you on your deployment options, when to use a GPU, and why you should use the IBM Elastic Storage Server (IBM ESS) to improve storage management. Lastly, we describe how to integrate IBM Watson Machine Learning Accelerator and Hortonworks with or without IBM Watson Studio Local, how to access real-time data, and security.
Note: IBM Watson Machine Learning Accelerator is the new product name for IBM PowerAI Enterprise.
Note: Hortonworks merged with Cloudera in January 2019. The new company is called Cloudera. References to Hortonworks as a business entity in this publication are now referring to the merged company. Product names beginning with Hortonworks continue to be marketed and sold under their original names.

Enterprise Data Warehouse Optimization with Hadoop on IBM Power Systems Servers

Author: Scott Vetter, Helen Lu, Maciej Olejniczak, IBM Redbooks
Publisher: IBM Redbooks
Total Pages: 82
Release: 2018-01-31
ISBN: 0738456608
Category: Computers
Language: EN, FR, DE, ES & NL

Enterprise Data Warehouse Optimization with Hadoop on IBM Power Systems Servers Book Excerpt:

Data warehouses were developed for many good reasons, such as providing quick query and reporting for business operations, and business performance. However, over the years, due to the explosion of applications and data volume, many existing data warehouses have become difficult to manage. Extract, Transform, and Load (ETL) processes are taking longer, missing their allocated batch windows. In addition, data types that are required for business analysis have expanded from structured data to unstructured data. The Apache open source Hadoop platform provides a great alternative for solving these problems. IBM® has committed to open source since the early years of open Linux. IBM and Hortonworks together are committed to Apache open source software more than any other company. IBM Power Systems™ servers are built with open technologies and are designed for mission-critical data applications. Power Systems servers use technology from the OpenPOWER Foundation, an open technology infrastructure that uses the IBM POWER® architecture to help meet the evolving needs of big data applications. The combination of Power Systems with Hortonworks Data Platform (HDP) provides users with a highly efficient platform that provides leadership performance for big data workloads such as Hadoop and Spark. This IBM Redpaper™ publication provides details about Enterprise Data Warehouse (EDW) optimization with Hadoop on Power Systems. Many people know Power Systems from the IBM AIX® platform, but might not be familiar with IBM PowerLinux™, so part of this paper provides a Power Systems overview. A quick introduction to Hadoop is provided for those not familiar with the topic. Details of HDP on Power Reference architecture are included that will help both software architects and infrastructure architects understand the design.
In the optimization chapter, we describe various topics: traditional EDW offload, sizing guidelines, performance tuning, IBM Elastic Storage™ Server (ESS) for data-intensive workload, IBM Big SQL as the common structured query language (SQL) engine for Hadoop platform, and tools that are available on Power Systems that are related to EDW optimization. We also dedicate some pages to the analytics components (IBM Data Science Experience (IBM DSX) and IBM Spectrum™ Conductor for Spark workload) for the Hadoop infrastructure.

Implementing an IBM High Performance Computing Solution on IBM Power System S822LC

Author: Dino Quintero, Luis Carlos Cruz Huertas, Tsuyoshi Kamenoue, Wainer dos Santos Moschetta, Mauricio Faria de Oliveira, Georgy E Pavlov, Alexander Pozdneev, IBM Redbooks
Publisher: IBM Redbooks
Total Pages: 342
Release: 2016-07-25
ISBN: 0738441872
Category: Computers
Language: EN, FR, DE, ES & NL

Implementing an IBM High Performance Computing Solution on IBM Power System S822LC Book Excerpt:

This IBM® Redbooks® publication demonstrates and documents that IBM Power Systems™ high-performance computing and technical computing solutions deliver faster time to value with powerful solutions. Configurable into highly scalable Linux clusters, Power Systems offer extreme performance for demanding workloads such as genomics, finance, computational chemistry, oil and gas exploration, and high-performance data analytics. This book delivers a high-performance computing solution implemented on the IBM Power System S822LC. The solution delivers high application performance and throughput based on its built-for-big-data architecture that incorporates IBM POWER8® processors, tightly coupled Field Programmable Gate Arrays (FPGAs) and accelerators, and faster I/O by using Coherent Accelerator Processor Interface (CAPI). This solution is ideal for clients that need more processing power while simultaneously increasing workload density and reducing datacenter floor space requirements. The Power S822LC offers a modular design to scale from a single rack to hundreds, simplicity of ordering, and a strong innovation roadmap for graphics processing units (GPUs). This publication is targeted toward technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) responsible for delivering cost-effective high-performance computing (HPC) solutions that help uncover insights from their data so they can optimize business results, product development, and scientific discoveries.

IBM High Performance Computing Insights with IBM Power System AC922 Clustered Solution

Author: Dino Quintero, Miguel Gomez Gonzalez, Ahmad Y Hussein, Jan-Frode Myklebust, IBM Redbooks
Publisher: IBM Redbooks
Total Pages: 352
Release: 2019-05-02
ISBN: 0738457450
Category: Computers
Language: EN, FR, DE, ES & NL

IBM High Performance Computing Insights with IBM Power System AC922 Clustered Solution Book Excerpt:

This IBM® Redbooks® publication documents and addresses topics to set up a complete infrastructure environment and tune the applications to use an IBM POWER9™ hardware architecture with the technical computing software stack. This publication is driven by a CORAL project solution. It explores, tests, and documents how to implement an IBM High-Performance Computing (HPC) solution on a POWER9 processor-based system by using IBM technical innovations to help solve challenging scientific, technical, and business problems. This book documents the HPC clustering solution with InfiniBand on IBM Power Systems™ AC922 8335-GTH and 8335-GTX servers with NVIDIA Tesla V100 SXM2 graphics processing units (GPUs) with NVLink, software components, and the IBM Spectrum™ Scale parallel file system. This solution includes recommendations about the components that are used to provide a cohesive clustering environment that includes job scheduling, parallel application tools, scalable file systems, administration tools, and a high-speed interconnect. This book is divided into three parts: Part 1 focuses on the planners of the solution, Part 2 focuses on the administrators, and Part 3 focuses on the developers. This book targets technical professionals (consultants, technical support staff, IT architects, and IT specialists) who are responsible for delivering cost-effective HPC solutions that help uncover insights among clients' data so that they can act to optimize business results, product development, and scientific discoveries.

Highly Efficient Data Access with RoCE on IBM Elastic Storage Systems and IBM Spectrum Scale

Author: Olaf Weiser, Gero Schmidt, Piyush Chaudhary, IBM Redbooks
Publisher: IBM Redbooks
Total Pages: 58
Release: 2022-02-07
ISBN: 0738460273
Category: Computers
Language: EN, FR, DE, ES & NL

Highly Efficient Data Access with RoCE on IBM Elastic Storage Systems and IBM Spectrum Scale Book Excerpt:

With Remote Direct Memory Access (RDMA), you can make a subset of a host's memory directly available to a remote host. RDMA is available on standard Ethernet-based networks by using the RDMA over Converged Ethernet (RoCE) interface. The RoCE network protocol is an industry-standard initiative by the InfiniBand Trade Association. This IBM® Redpaper publication describes how to set up RoCE to use within an IBM Spectrum® Scale cluster and IBM Elastic Storage® Systems (ESSs). This book is targeted at technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for delivering cost-effective storage solutions with IBM Spectrum Scale and IBM ESSs.