IBM Supercharges Management of Massive Amounts of Data -- A Billion Files at Lightning Speed
Posted on: Tuesday, 2 October 2007, 12:00 CDT
IBM (NYSE: IBM) today announced plans to begin shipping a new version of its General Parallel File System (GPFS) software that acts like a search engine to identify and migrate files between different storage pools and feed high-speed business intelligence and scientific computers. Featuring new policy-based automation capabilities, GPFS is a mainstay of technical computing that is increasingly finding its way into the data center and enhancing financial analytics, retail operations and other commercial applications.
The process of managing data, from where it is placed when it is created, to where it moves in the storage hierarchy based on management parameters, to where it is copied for disaster recovery or document retention, to its eventual archival or deletion, is often referred to as information lifecycle management, or ILM. GPFS tightly integrates policy-driven ILM functionality into the file system. Using file virtualization technology to analyze and identify data, this high-performance engine allows GPFS to support policy-based file operations on billions of files in hours instead of weeks.
For instance, with the pre-release version of GPFS, IBM was able to scan one billion files in less than three hours in an internal performance benchmark. Further improving policy performance through parallelization techniques, the company is working to better those performance numbers in future tests.
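As a rough illustration of what that benchmark figure implies, the short Python calculation below derives the minimum average scan rate. It assumes the full three hours as an upper bound; the names used are illustrative and not part of GPFS.

```python
# Back-of-the-envelope rate implied by "one billion files in less than three hours".
FILES_SCANNED = 1_000_000_000
SCAN_SECONDS = 3 * 60 * 60  # upper bound taken from the text: three hours


def min_scan_rate(files: int, seconds: int) -> float:
    """Average files per second needed to process `files` within `seconds`."""
    return files / seconds


rate = min_scan_rate(FILES_SCANNED, SCAN_SECONDS)
print(f"at least {rate:,.0f} files per second")
```

Because the scan finished in less than three hours, the actual average rate would have been somewhat higher than this lower bound.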
Recent enhancements to the latest edition of GPFS, Version 3.2, planned for availability October 5 (1), have vastly accelerated the file identification process for managing tiered storage. Additionally, GPFS now supports storage pools that can include tape, enabling the seamless maintenance of ever-growing storage infrastructures.
As the performance of the world's leading supercomputers and business systems continues to increase, the need to manage the massive stores of data generated by these systems is also growing at unprecedented rates. More importantly, the scale of the computers, combined with the much larger data sets supported by parallel file systems, allows new classes of questions to be answered with computer technology.
Concurrent access at lightning speed
GPFS provides concurrent access at lightning speed to multiple disk drives and storage devices, fulfilling a key requirement of powerful business intelligence and scientific computing applications that analyze vast quantities of often unstructured information, which may include video, audio, books, transactions, reports and presentations.
"We are taking business intelligence to the next level through the analysis of this metadata," said Scott Handy, vice president of marketing and strategy, IBM Power Systems. "A 500-horsepower engine needs the right mix of fuel and air at the right time to operate at top speed. It's the same with large computer systems. When dealing with massive amounts of data to get deeper levels of business insight, systems need the right mix of data at the right time to operate at full speed. GPFS achieves high levels of performance by making it possible to read and write data in parallel, distributed across multiple disks or servers."
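The general idea of reading and writing data in parallel across multiple disks can be sketched in a few lines of Python. This is a toy model only: the block size, the round-robin placement and the function names are assumptions made for illustration, in-memory lists stand in for disks, and nothing here reflects how GPFS actually lays out data.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

BLOCK_SIZE = 4  # bytes per block; deliberately tiny so the layout is easy to see


def stripe(data: bytes, n_disks: int) -> List[List[bytes]]:
    """Split data into fixed-size blocks and place them round-robin on n_disks."""
    disks: List[List[bytes]] = [[] for _ in range(n_disks)]
    for offset in range(0, len(data), BLOCK_SIZE):
        block_index = offset // BLOCK_SIZE
        disks[block_index % n_disks].append(data[offset:offset + BLOCK_SIZE])
    return disks


def read_disk(blocks: List[bytes]) -> List[bytes]:
    """Stand-in for device I/O: return every block held by one disk."""
    return list(blocks)


def parallel_read(disks: List[List[bytes]]) -> bytes:
    """Read all disks concurrently, then interleave blocks into original order."""
    with ThreadPoolExecutor(max_workers=len(disks)) as pool:
        per_disk = list(pool.map(read_disk, disks))
    longest = max(len(blocks) for blocks in per_disk)
    ordered = []
    for row in range(longest):
        for blocks in per_disk:
            if row < len(blocks):
                ordered.append(blocks[row])
    return b"".join(ordered)


data = b"abcdefghijklmnopqrstuvwxyz"
disks = stripe(data, n_disks=3)
restored = parallel_read(disks)
print(restored == data)
```

Each simulated disk holds only part of the file, so the reads can proceed at the same time and the pieces are reassembled afterward.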
Running on both AIX, IBM's UNIX-based operating system, and Linux, the newest release of GPFS offers improvements in scalability and performance, simplified manageability, monitoring and availability. GPFS provides fast, reliable, and flexible access to structured and unstructured data.
For example, GPFS can support stunning access speeds of 130+ GB/sec to a single file on a two-petabyte file system. It addresses the needs of the most demanding commercial customers by providing storage consolidation, file virtualization and simplified, high-performance, policy-based file management.
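To put those two figures side by side, the calculation below estimates how long it would take to stream two petabytes at 130 GB/sec. It assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes) and, purely for illustration, that the quoted single-file rate could be sustained across the whole file system; neither assumption comes from the announcement.

```python
# Illustrative arithmetic only; decimal units are an assumption.
GB = 10**9
PB = 10**15

file_system_bytes = 2 * PB
throughput_bytes_per_sec = 130 * GB

seconds = file_system_bytes / throughput_bytes_per_sec
hours = seconds / 3600

print(f"{seconds:,.0f} seconds, or about {hours:.1f} hours")
```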
The latest version of GPFS includes a number of innovative features, such as:
-- Policy-driven automation, a flexible rule-based processing technique that allows matching the cost, performance or reliability of storage to the value of the data to improve overall performance. Alternatively, this may also provide a cost savings to the customer by automatically and transparently moving data (without a path change) to less expensive storage when performance is not critical for that data.
-- Clustered network file system (NFS), a scalable management feature that enables storage administrators to easily deploy and manage a clustered file-serving solution.
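The rule-based idea behind policy-driven automation can be sketched as follows. This is a minimal Python model, not the GPFS policy language: the `FileInfo` and `Rule` types, the pool names "fast", "nearline" and "tape", and the age thresholds are all hypothetical. Each file is checked against an ordered list of rules, and the first match decides its target storage pool while its path stays the same.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class FileInfo:
    path: str
    size_bytes: int
    days_since_access: int
    pool: str  # storage pool currently holding the file


@dataclass
class Rule:
    name: str
    from_pool: str
    to_pool: str
    matches: Callable[[FileInfo], bool]


def plan_migrations(files: List[FileInfo], rules: List[Rule]) -> List[Tuple[str, str]]:
    """Return (path, target_pool) pairs; the first matching rule wins per file."""
    plan = []
    for f in files:
        for rule in rules:
            if f.pool == rule.from_pool and rule.matches(f):
                plan.append((f.path, rule.to_pool))
                break
    return plan


# Hypothetical policy: the most specific rule is listed first.
RULES = [
    Rule("cold-to-tape", "fast", "tape", lambda f: f.days_since_access > 365),
    Rule("cool-to-nearline", "fast", "nearline", lambda f: f.days_since_access > 30),
]

FILES = [
    FileInfo("/data/trades/today.csv", 10_000, 0, "fast"),
    FileInfo("/data/trades/last_quarter.csv", 20_000, 90, "fast"),
    FileInfo("/data/archive/2005.csv", 30_000, 700, "fast"),
]

PLAN = plan_migrations(FILES, RULES)
for path, pool in PLAN:
    print(f"{path} -> {pool}")
```

In this sketch the recently used file stays on the fast pool, while the two older files are scheduled for cheaper storage without any change to their paths.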
Handling data-intensive applications
GPFS is designed to meet the needs of data-intensive applications -- such as risk management and other forms of financial analysis, data mining to determine customer buying behaviors across massive data sets, engineering design, digital media and entertainment, seismic data processing, weather modeling and scientific research -- by providing a single consolidated view of information across multiple systems.
With GPFS, for instance, financial services companies are using analytics grids to process financial data for fraud detection. Retailers are able to analyze daily transactions and determine discount policies, optimizing revenues and improving efficiencies. Customers have used GPFS to create a scalable NFS file-serving solution that is capable of supporting hundreds of NFS file servers and petabytes of storage within a single, highly reliable file system.
GPFS continues to enable high-end technical and high-performance computing by supporting multiple petabytes of storage and hundreds or thousands of nodes accessing a single file system. For more information on GPFS, please visit http://www.ibm.com/systems/clusters/software/gpfs.html.
GPFS Version 3.2 supports the IBM System p family, including the new POWER6-based IBM System p 570 server, and machines based on Intel or AMD processors, such as those in the IBM System x family. Supported operating systems for GPFS Version 3.2 include AIX Version 5.3 and selected versions of Red Hat and SUSE Linux distributions.
About IBM
For more information, please visit www.ibm.com.
IBM is a trademark of IBM Corporation in the United States and/or other countries. All other company/product names and service marks may be trademarks or registered trademarks of their respective companies. UNIX is a registered trademark in the United States and other countries licensed exclusively through The Open Group. Linux is a trademark of Linus Torvalds.
1 - All statements regarding IBM future directions and intent are subject to change or withdrawal without notice and represent goals and objectives only.
Contact information Rick Bause IBM Media Relations (845) 892-5463 rbause@us.ibm.com
SOURCE: IBM
Source: MARKET WIRE