Petabyte
Symbol: PB
What is a Petabyte (PB)?
Formal Definition
The petabyte (symbol: PB) is a unit of digital information storage equal to 10¹⁵ bytes, or exactly 1,000,000,000,000,000 bytes (one quadrillion bytes). In the International System of Units (SI), the prefix "peta-" denotes a factor of 10¹⁵, making one petabyte equal to 1,000 terabytes, 1,000,000 gigabytes, or 10⁹ megabytes. The petabyte follows the standard decimal (base-10) convention used by the SI and by storage device manufacturers.
It is important to distinguish the petabyte from the pebibyte (PiB), which is its binary counterpart defined by the International Electrotechnical Commission (IEC). One pebibyte equals 2⁵⁰ bytes, or 1,125,899,906,842,624 bytes — approximately 12.6% more than one petabyte. Operating systems such as Windows historically reported storage sizes using binary calculations but labeled them with SI prefixes, leading to widespread confusion. Modern standards increasingly adopt the IEC binary prefixes (pebi-, tebi-, gibi-) for powers of 1024 and reserve SI prefixes (peta-, tera-, giga-) for powers of 1000.
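The size gap between the two units can be checked with a few lines of arithmetic. The following is a minimal sketch; the constant and function names are illustrative choices, not part of any standard.

```python
# Decimal (SI) and binary (IEC) definitions, as given in the text.
PETABYTE = 10**15   # 1000^5 bytes
PEBIBYTE = 2**50    # 1024^5 bytes


def percent_larger(binary_bytes: int, decimal_bytes: int) -> float:
    """Return how much larger binary_bytes is than decimal_bytes, in percent."""
    return (binary_bytes / decimal_bytes - 1) * 100


print(PEBIBYTE)                                       # 1125899906842624
print(round(percent_larger(PEBIBYTE, PETABYTE), 2))   # 12.59
# One petabyte expressed in pebibytes (what a binary-reporting tool would show):
print(round(PETABYTE / PEBIBYTE, 3))                  # 0.888
```

The last line illustrates the source of the confusion: a volume of exactly 1 PB appears as roughly 0.888 when a tool divides by 2⁵⁰ but still labels the result "PB".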
Role in Digital Storage
The petabyte sits in the upper tier of commonly used storage units, above the terabyte and below the exabyte. As data generation accelerates globally, the petabyte has transitioned from an abstract concept to a practical unit routinely used by cloud service providers, scientific research institutions, and large enterprises. Major data centers operated by companies such as Google, Amazon, and Microsoft collectively store hundreds of exabytes of data, so individual facilities commonly manage many petabytes each.
Etymology
Origin of the Prefix
The prefix "peta-" was adopted by the International System of Units in 1975, during the 15th General Conference on Weights and Measures (CGPM). It derives from the Greek word "pente" (πέντε), meaning "five," because the prefix denotes 1000⁵, so one petabyte is 1000⁵ bytes. The naming convention follows the SI pattern of using Greek-derived prefixes for large multipliers: kilo (10³), mega (10⁶), giga (10⁹), tera (10¹²), and peta (10¹⁵).
Evolution in Computing Context
The word "byte" itself was coined by Werner Buchholz at IBM in 1956 during the design of the IBM Stretch computer. Originally, a byte could vary in size, but the eight-bit byte became standard by the 1970s. As storage technology advanced through magnetic tape, hard disk drives, optical media, and solid-state drives, the need for larger unit prefixes grew. The petabyte entered common technical vocabulary in the 2000s as enterprise storage systems and scientific datasets began approaching and exceeding this scale. The term gained broader public recognition around 2010 as cloud computing and big data analytics became mainstream concepts.
Precise Definition
SI Standard
Under the International System of Units (SI), one petabyte is defined as exactly 10¹⁵ bytes, or 1,000 terabytes. This definition is used consistently by storage device manufacturers, telecommunications companies, and international standards bodies including the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). When a storage system is labeled as having a capacity of 1 PB, that figure means 10¹⁵ bytes of raw capacity, before file system and formatting overhead is subtracted.
Binary Equivalent
The binary equivalent of the petabyte is the pebibyte (PiB), defined as 2⁵⁰ bytes (1,125,899,906,842,624 bytes). The distinction exists because computers operate in binary (base-2), where memory and addressing naturally scale in powers of 1024 rather than 1000. The IEC introduced the binary prefixes in 1998 (kibi-, mebi-, gibi-, tebi-, pebi-, exbi-) to resolve the ambiguity, but adoption has been slow outside of technical documentation. In practice, when system administrators discuss petabytes in data center contexts, they may mean either 10¹⁵ or 2⁵⁰ bytes depending on the tool or platform in use.
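Because the same byte count can be labeled either way, it helps to see both conventions side by side. The sketch below is one possible formatter, not the behavior of any particular operating system or tool; `format_bytes` and the unit lists are hypothetical names introduced for illustration.

```python
SI_UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB"]        # powers of 1000
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]  # powers of 1024


def format_bytes(n: int, binary: bool = False) -> str:
    """Format a byte count with SI prefixes (default) or IEC binary prefixes."""
    base = 1024 if binary else 1000
    units = IEC_UNITS if binary else SI_UNITS
    value = float(n)
    index = 0
    # Step up one unit at a time until the value fits or units run out.
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    return f"{value:.3f} {units[index]}"


print(format_bytes(10**15))               # 1.000 PB
print(format_bytes(10**15, binary=True))  # 909.495 TiB
print(format_bytes(2**50, binary=True))   # 1.000 PiB
print(format_bytes(2**50))                # 1.126 PB
```

Note that one decimal petabyte does not even reach 1 PiB, which is why a binary-reporting tool shows it in tebibytes.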
Data Transfer Context
In data transfer and networking, petabytes are also used to measure cumulative throughput. Internet exchange points, undersea cables, and content delivery networks routinely move petabytes of data per day. The distinction between petabytes and petabits (Pb, equal to 10¹⁵ bits or one-eighth of a petabyte) is critical in networking contexts, where bandwidth is typically measured in bits per second.
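The factor of eight between bits and bytes is easy to lose when estimating transfer times. As a rough sketch, the function below converts a payload in bytes and a link speed in bits per second into days of continuous transfer; the two link speeds (100 Mbps and 100 Gbps) are illustrative assumptions, and protocol overhead is ignored.

```python
BITS_PER_BYTE = 8
SECONDS_PER_DAY = 86_400

PETABYTE = 10**15                                   # bytes
PETABIT_IN_BYTES = 10**15 // BITS_PER_BYTE          # one petabit, expressed in bytes


def transfer_days(size_bytes: int, link_bits_per_second: float) -> float:
    """Days needed to move size_bytes over a link, ignoring protocol overhead."""
    seconds = size_bytes * BITS_PER_BYTE / link_bits_per_second
    return seconds / SECONDS_PER_DAY


print(PETABIT_IN_BYTES)                               # 125000000000000 (125 TB)
print(round(transfer_days(PETABYTE, 100e6), 1))       # 925.9 days at 100 Mbps
print(round(transfer_days(PETABYTE, 100e9), 2))       # 0.93 days at 100 Gbps
```

At 100 Mbps the result is about 2.5 years, while a 100 Gbps link moves the same petabyte in under a day.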
History
The Growth of Digital Storage
The concept of a petabyte would have been incomprehensible to early computing pioneers. The ENIAC computer of 1945 had no persistent storage at all. The first commercial hard disk drive, the IBM 350 Disk Storage Unit of 1956, held approximately 3.75 megabytes, roughly 267 million times less than a single petabyte. Throughout the following decades, storage capacity grew exponentially, at times roughly doubling every 12 to 18 months. This trend is analogous to the one Gordon Moore observed for transistor density; for disk drives it is often called Kryder's law.
By the mid-1990s, the largest data repositories in the world — such as those maintained by the US National Security Agency and CERN — were reaching the petabyte scale. The Large Hadron Collider (LHC) at CERN, which began operations in 2008, generates approximately 1 petabyte of raw data per second during collisions, though only a fraction is retained after real-time filtering. The Worldwide LHC Computing Grid processes and stores roughly 200 petabytes of data annually.
Commercialization of Petabyte Storage
The commercialization of petabyte-scale storage began in earnest around 2005 to 2010 with the rise of cloud computing. Amazon Web Services launched its Simple Storage Service (S3) in 2006, and by 2012 the service stored over 1 exabyte (1,000 petabytes) of customer data. Google reported in 2008 that it was processing over 20 petabytes of data per day with MapReduce. Facebook reported storing over 300 petabytes of user data by 2014, a number that has since grown many times over.
Physical Petabyte Media
The first single storage systems capable of holding one petabyte were tape libraries. IBM's TS3500 tape library, introduced in the late 2000s, could scale to multiple petabytes using thousands of tape cartridges. The first individual hard drives to reach the terabyte mark appeared in 2007 (Hitachi Deskstar 7K1000), meaning a petabyte array required roughly 1,000 drives. By 2024, individual hard drives reached 30 TB, reducing a petabyte to approximately 34 drives. Solid-state drives have followed a similar trajectory, with enterprise SSDs reaching 100 TB in capacity by 2023.
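The drive counts above follow from a ceiling division of the target capacity by the per-drive capacity. The sketch below reproduces them under the simplifying assumption of raw decimal capacity with no redundancy, spares, or file system overhead; `drives_needed` is a hypothetical helper name.

```python
TB = 10**12
PB = 10**15


def drives_needed(target_bytes: int, drive_bytes: int) -> int:
    """Smallest number of drives whose combined raw capacity reaches target_bytes."""
    # Integer ceiling division avoids floating-point rounding.
    return (target_bytes + drive_bytes - 1) // drive_bytes


print(drives_needed(PB, 1 * TB))    # 1000 drives at 1 TB (2007-era)
print(drives_needed(PB, 30 * TB))   # 34 drives at 30 TB
```

A real array would need more drives than this, since RAID or erasure coding reserves part of the raw capacity for redundancy.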
Current Use
Cloud Computing and Data Centers
The petabyte is the standard working unit for enterprise cloud storage and data center capacity planning. Major cloud providers including Amazon Web Services, Microsoft Azure, and Google Cloud Platform offer storage tiers measured in petabytes. Organizations routinely store and process petabytes of data for analytics, machine learning model training, and archival purposes. A single large enterprise customer may maintain 10 to 100 petabytes in cloud storage.
Scientific Research
In scientific research, the petabyte is essential for high-energy physics, genomics, astronomy, and climate science. The Square Kilometre Array (SKA) radio telescope, currently under construction, is expected to generate approximately 1 exabyte of raw data per day. The Human Genome Project's data archives exceed 40 petabytes. NASA's Earth Observing System Data and Information System (EOSDIS) manages over 60 petabytes of Earth science data, growing by several petabytes per year.
Media and Entertainment
Netflix stores its entire content library — including multiple encoded versions at different resolutions and bitrates for adaptive streaming — in approximately 100-200 petabytes. YouTube receives over 500 hours of video uploads per minute, generating petabytes of new content weekly. Major film studios use petabyte-scale storage for visual effects rendering, where a single feature film may require 1 to 10 petabytes of intermediate data during production.
Government and Intelligence
Government agencies are among the largest consumers of petabyte-scale storage. The US National Security Agency's data center in Bluffdale, Utah, reportedly has a storage capacity in the exabyte range. National weather services worldwide store petabytes of observational data and model outputs. Census bureaus, tax authorities, and healthcare systems all manage datasets measured in petabytes.
Everyday Use
Putting Petabytes in Perspective
While individual consumers rarely encounter the petabyte directly, it provides a useful frame of reference for understanding the scale of digital information. One petabyte is equivalent to approximately 500 billion pages of standard printed text (at about 2,000 bytes per page), or roughly 50 years of continuous HD video (at 1080p, 5 Mbps). It would take about 694 million floppy disks (1.44 MB each) to store one petabyte. If printed single-sided on standard A4 paper about 0.1 mm thick, one petabyte of text would create a stack approximately 50,000 kilometers high.
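Comparisons of this kind are simple divisions, and the results depend heavily on the assumptions chosen. The sketch below works through them for a decimal petabyte; the 2,000 bytes per page, 0.1 mm sheet thickness, and decimal 1.44 MB floppy capacity are illustrative assumptions rather than measured values.

```python
PETABYTE = 10**15
SECONDS_PER_YEAR = 365.25 * 86_400

BYTES_PER_PAGE = 2_000        # assumption: plain text, about 2 kB per page
SHEET_THICKNESS_M = 0.0001    # assumption: 0.1 mm per sheet, printed single-sided
FLOPPY_BYTES = 1_440_000      # assumption: "1.44 MB" taken as decimal megabytes


def video_years(size_bytes: int, bitrate_bits_per_second: float) -> float:
    """Years of continuous video that fit in size_bytes at a constant bitrate."""
    seconds = size_bytes * 8 / bitrate_bits_per_second
    return seconds / SECONDS_PER_YEAR


pages = PETABYTE / BYTES_PER_PAGE
stack_km = pages * SHEET_THICKNESS_M / 1000
floppies = PETABYTE / FLOPPY_BYTES

print(f"{pages:.1e} pages")                                   # 5.0e+11 pages
print(f"{stack_km:,.0f} km stack")                            # 50,000 km stack
print(f"{video_years(PETABYTE, 5_000_000):.1f} years of video")  # 50.7 years of video
print(f"{floppies / 1e6:.0f} million floppies")               # 694 million floppies
```

Changing any one assumption, for example a higher video bitrate or a binary interpretation of the floppy size, shifts the corresponding figure proportionally.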
Consumer Data Generation
Collectively, consumers generate petabytes of data daily through social media, messaging, photography, and video. An average smartphone user generates approximately 6 to 7 GB of data per month. Multiplied across billions of smartphone users worldwide, this amounts to many exabytes per month. A single petabyte could store approximately 150,000 users' monthly smartphone data, or about 13,000 users' annual data.
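The per-user figures can be turned into a users-per-petabyte estimate with one division. This is a rough sketch that assumes 6.5 GB per user per month (the midpoint of the range above) and decimal gigabytes; `users_per_petabyte` is a hypothetical helper name.

```python
GB = 10**9
PETABYTE = 10**15


def users_per_petabyte(gb_per_user_per_month: float, months: int) -> float:
    """How many users' data for the given number of months fits in one petabyte."""
    bytes_per_user = gb_per_user_per_month * GB * months
    return PETABYTE / bytes_per_user


print(int(users_per_petabyte(6.5, 1)))    # 153846 users for one month
print(int(users_per_petabyte(6.5, 12)))   # 12820 users for one year
```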
Home Storage Trends
Consumer storage has grown dramatically but remains well below the petabyte level. A typical household in 2024 might have 2 to 10 TB of total storage across computers, external drives, and NAS devices. Reaching a petabyte of home storage would require roughly 34 hard drives of 30 TB each, or about 100 drives at a more common 10 TB. However, household data needs are growing rapidly due to 4K and 8K video, virtual reality content, and increasing numbers of connected devices.
In Science & Industry
Particle Physics
Particle physics was among the first scientific disciplines to require petabyte-scale data management. The Large Hadron Collider at CERN produces approximately 1 PB of collision data per second, though sophisticated triggering systems reduce the recorded data to roughly 50 PB per year. The data is distributed to over 170 computing centers in 42 countries through the Worldwide LHC Computing Grid. Analyzing this data led to the discovery of the Higgs boson in 2012.
Genomics and Bioinformatics
Modern genomics generates vast quantities of data. Sequencing a single human genome produces approximately 200 GB of raw data. Large-scale projects such as the UK Biobank (500,000 genomes), the All of Us Research Program (1 million+ genomes), and the 100,000 Genomes Project produce data measured in petabytes. The Sequence Read Archive (SRA) at the National Center for Biotechnology Information held over 50 petabytes of genomic data by 2023.
Astronomy and Earth Science
Astronomical surveys generate petabytes of imaging data. The Vera C. Rubin Observatory's Legacy Survey of Space and Time (LSST) will capture 20 TB of data per night and accumulate roughly 60 PB over its 10-year survey. Climate models run by organizations such as NOAA and the European Centre for Medium-Range Weather Forecasts (ECMWF) produce petabytes of simulation data for each major model run.
Artificial Intelligence
Training large AI models requires petabytes of training data. OpenAI's GPT-4 was trained on a dataset estimated at several petabytes of text and code. Large image generation models such as Stable Diffusion and DALL-E were trained on billions of images totaling multiple petabytes. The trend toward ever-larger training datasets means that AI research is a major driver of petabyte-scale storage demand.
Interesting Facts
If you tried to download one petabyte over a typical home internet connection of 100 Mbps, it would take approximately 2.5 years of continuous downloading — with no interruptions, 24 hours a day.
The entire written works of humanity — every book, article, and document ever produced — are estimated at roughly 400 petabytes when digitized. The Library of Congress, with over 170 million items, comprises approximately 20 petabytes.
A single petabyte could store approximately 3.4 years of 24/7 ultra-high-definition 4K video recording, or about 250 million high-resolution digital photographs from a modern smartphone.
Google processes over 20 petabytes of data per day, including search queries, Gmail messages, YouTube videos, and Maps data. This is equivalent to processing the entire printed collection of the Library of Congress roughly once per day.
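The human brain's theoretical storage capacity has been estimated at roughly 1 to 2.5 petabytes. The 2.5 petabyte figure is a widely quoted estimate, and a 2016 study led by researchers at the Salk Institute placed the capacity at around a petabyte, based on the number of synaptic connections and their potential states.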
In 2024, the cost of storing one petabyte on enterprise hard drives was approximately $15,000-$25,000 — a dramatic decrease from 2000, when the same storage would have cost over $100 million.
The Wayback Machine operated by the Internet Archive stores over 100 petabytes of web page snapshots, representing a significant fraction of the publicly accessible internet's history since 1996.