
Weather data = big data

Aug 3, 2025 | Expert opinion, Meteorology


This is why weather data is the epitome of Big Data.

Scott Mackaro

Chief Science Officer

What comes to your mind first when you think of "big data"? Finance, maybe? Healthcare? Entertainment? Gaming must be generating huge volumes of data, right?

What about the weather? We tend to take our daily weather forecast for granted – because, well, how difficult can it be to predict rain or shine a couple of days ahead? It doesn't sound like rocket science - but it kind of is.

To grasp the complexity of weather data, we will put it in the context of Big Data, which is generally defined by five characteristics – or the 5 Vs of Big Data: Volume, Velocity, Variety, Veracity, and Value.

Let’s go through each one to see how weather data is the epitome of Big Data.

Volume

The European Centre for Medium-Range Weather Forecasts (ECMWF), one of the key organizations providing weather data, generates 287 terabytes of data every single day. That's almost six times the data produced by the world's more than 2 billion gamers combined (roughly 50 TB/day). Add other major organizations, such as the U.S. National Oceanic and Atmospheric Administration (NOAA), and weather data is quickly approaching exascale.
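As a quick sanity check on those figures, here is a back-of-the-envelope sketch. Only the 287 TB/day and 50 TB/day numbers come from the text above; the rest is simple arithmetic for illustration:

```python
# Illustrative arithmetic only - the 287 TB/day (ECMWF) and 50 TB/day (gaming)
# figures come from the text above; nothing here is an official calculation.

ECMWF_TB_PER_DAY = 287
GAMING_TB_PER_DAY = 50

ratio = ECMWF_TB_PER_DAY / GAMING_TB_PER_DAY        # ~5.7x, i.e. "almost six times"
ecmwf_pb_per_year = ECMWF_TB_PER_DAY * 365 / 1_000  # ~105 PB per year from one center alone

print(f"ECMWF vs. gaming: {ratio:.1f}x")
print(f"ECMWF output per year: ~{ecmwf_pb_per_year:.0f} PB")
```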

So why exactly is weather data so big? Weather is measured on the surface of the Earth (over both land and water), within the atmosphere, and from space. Modern technology allows it to be measured across the planet, from pole to pole, 24/7, 365 days a year.

These observing systems - surface sensors, radars, and weather satellites - provide essential situational awareness and historical context, and they form the backbone of weather forecasting technology. Together, they generate hundreds of terabytes of data daily, processed in real time.

One primary use of this data is the creation of numerical weather prediction models, or NWPs, which are the foundation of any weather forecast. These models digitally represent the entire Earth-Ocean-Atmosphere (EOA) system, essentially aiming to create a digital twin. They then literally forecast the future using a combination of physics, dynamics, mathematics, and computer science.

Models solve numerous equations representing the EOA system in four dimensions: horizontal (twice!), vertical, and forward in time. We are talking about billions of calculations creating enormous datasets.
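To get a feel for where those billions of values come from, here is a rough back-of-the-envelope sketch. The grid spacing, number of vertical levels, variable count, and forecast length below are illustrative assumptions, not the configuration of any particular operational model:

```python
# Hypothetical global model grid - every number below is an illustrative assumption.

LAT_POINTS = 1801        # ~0.1 degree spacing, pole to pole
LON_POINTS = 3600        # ~0.1 degree spacing around the globe
VERTICAL_LEVELS = 100    # model levels from the surface to the upper atmosphere
VARIABLES = 10           # e.g. temperature, humidity, wind components, pressure...
TIME_STEPS = 240         # e.g. a 10-day forecast written out hourly

grid_points_3d = LAT_POINTS * LON_POINTS * VERTICAL_LEVELS
values_per_forecast = grid_points_3d * VARIABLES * TIME_STEPS

print(f"3D grid points:      {grid_points_3d:,}")       # ~650 million
print(f"Values per forecast: {values_per_forecast:,}")  # ~1.6 trillion
```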

Velocity

To process such an enormous amount of data, the weather industry uses tremendous computational resources (read: computers). A little-known fact: Weather is one of the original applications of Big Data technologies and the reason many supercomputers exist. 

Think about it: Weather literally cannot wait. To make use of the forecast, we need to pull all this data together and run the models very quickly, generating the latest information and making it available for decisions about approaching weather impacts. The slightest delay and the forecast risks becoming useless.

The same is true for the observation systems themselves. And all of this happens before we even involve technologies like machine learning, deep learning, or generative AI - each of which also needs to run as quickly as possible to be useful.

Beyond the forecasting process itself, weather data must also be delivered quickly using modern, responsive technologies. In some cases, we need technology like webhooks to tell us when to pay attention. Rapidly developing severe weather - hail, lightning, and tornado potential - is a prime example of data that must be delivered immediately.
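To make the webhook idea concrete, here is a minimal, entirely hypothetical sketch of a receiver for pushed severe-weather alerts. The endpoint path and payload fields are invented for illustration and do not describe any specific product API:

```python
# Minimal webhook receiver sketch (Flask). The /alerts path and the payload
# fields ("type", "severity", "location") are hypothetical examples.
from flask import Flask, request

app = Flask(__name__)

@app.route("/alerts", methods=["POST"])
def receive_alert():
    alert = request.get_json(force=True)
    # React immediately - e.g. page an operations team for hail or tornado potential.
    if alert.get("severity") == "severe":
        print(f"SEVERE: {alert.get('type')} near {alert.get('location')}")
    return {"status": "received"}, 200

if __name__ == "__main__":
    app.run(port=8080)
```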

Variety

As alluded to earlier, weather is measured in a variety of ways, by a variety of measurement devices, in a variety of formats, across a variety of time horizons. Just like the weather itself, its measurement is abundant in variety.

There are two types of weather data measurement based on the data source: in-situ and remote. In-situ data describes the atmosphere right where the sensor (e.g., a surface weather sensor or radiosonde) is located. Remote data - from satellites and radar - measures the atmosphere away from the sensor. This creates a variety of data formats, projections, and data types, all of which are translated and normalized through a process called data curation.
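Here is a tiny, hypothetical sketch of what that curation step can look like: readings from an in-situ feed and a remote feed arrive in different shapes and units and are normalized into one common record. All field names and units are invented for illustration:

```python
# Hypothetical curation step: normalize in-situ and remote readings into one schema.
from dataclasses import dataclass

@dataclass
class Observation:
    source: str        # "in_situ" or "remote"
    lat: float
    lon: float
    temperature_c: float

def from_station(raw: dict) -> Observation:
    # Hypothetical in-situ feed: temperature reported in Fahrenheit.
    return Observation("in_situ", raw["lat"], raw["lon"],
                       (raw["temp_f"] - 32) * 5 / 9)

def from_satellite(raw: dict) -> Observation:
    # Hypothetical remote feed: temperature reported in Kelvin.
    return Observation("remote", raw["latitude"], raw["longitude"],
                       raw["temp_k"] - 273.15)

obs = [
    from_station({"lat": 40.5, "lon": -80.1, "temp_f": 68.0}),
    from_satellite({"latitude": 40.5, "longitude": -80.1, "temp_k": 293.0}),
]
print(obs)
```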

After curation, this data becomes an input for a variety of forecast models, based on scale of motion, range, and/or region (global, regional, or local). The sheer variety and volume of data sources, formats, and types is a data engineer’s dream (or nightmare) - and with new data types emerging all the time, data curation is a moving target.

Veracity

Weather forecasting is an imperfect science. This likely isn’t surprising, but thankfully, we (weather scientists) know where these imperfections lie and continue to work on improvements.

A 5-day forecast today is every bit as accurate as a 3-day forecast was just 10 years ago. And while imperfect, it's still one of humanity's greatest achievements. But to keep raising the bar, we need better observations, a better understanding of the science, and access to ever more powerful computational resources.

Statistical post-processing techniques and, more recently, artificial intelligence have allowed us to push weather forecasting's veracity - its accuracy - even further. These techniques let us introduce recent and localized data, which is how we begin to close the gap in forecast error. If you truly want to forecast the conditions in your backyard, you must first measure that local microclimate and then incorporate that information into a modeling system. It may not be easy, but it will be worth it!
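One simple flavor of statistical post-processing is bias correction: compare recent forecasts against a local sensor, then remove the average error from the next forecast. The sketch below is a deliberately minimal illustration of that idea, with made-up numbers, not the method used by any particular forecast system:

```python
# Minimal bias-correction sketch: use recent forecast-vs-observation pairs from a
# hypothetical backyard sensor to adjust the next raw model forecast.

recent_forecasts_c    = [21.0, 19.5, 23.0, 18.0, 20.5]   # model temperature forecasts
recent_observations_c = [22.4, 20.8, 24.1, 19.3, 21.9]   # what the local sensor measured

# Average error over the recent window (model minus reality).
bias = sum(f - o for f, o in zip(recent_forecasts_c, recent_observations_c)) \
       / len(recent_forecasts_c)

raw_forecast_c = 20.0
corrected_forecast_c = raw_forecast_c - bias   # remove the systematic cold/warm offset

print(f"Estimated bias: {bias:+.1f} C")
print(f"Raw forecast: {raw_forecast_c:.1f} C -> corrected: {corrected_forecast_c:.1f} C")
```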

Value

It should be clear by now that weather forecasting takes a lot of work - so is it even worth it? And why do we need more and better data if it takes so much effort? Simply put, weather impacts everyone on the planet. Weather is one of the keys to protecting life and property. It's also a crucial component of the global economy, underpinning trillions of dollars in business across the globe.

A recent publication in the journal Science demonstrates just how impactful weather can be. The authors quantified the economic impact of the two strongest recent El Niño events ('82-'83 and '97-'98), estimating $4-5 trillion in global income losses.

Energy. Agriculture. Supply chain. Transportation. Operations. Insurance. Each of these requires reliable weather data. The list of applications for weather data is already long and will only keep growing - the need is so vast that we can't fathom every future use case.

It's no wonder that weather remains one of the biggest data challenges we face. Weather science is about more than just daily forecasts. It's about building resilient, safe, and sustainable societies globally. The weather business needs a new standard. The stakes are too high for Earth to continue to base crucial decisions on "good enough" data - especially when great data can save energy, resources, and ultimately, lives.


So, what do we do with all this data?

We collect data from hundreds of thousands of sources worldwide, processing over 5 petabytes (5 million GB) of information each year. We enrich already robust public datasets with our proprietary data and models, serving industries such as transportation, energy, insurance, and more. With our advanced AI tools and expertise, we are committed to extracting maximum value and insight from all available data.

Weather data, as a prime example of big data, is an ideal application for AI - and at Xweather, our AI-driven forecasting truly excels. Thanks to Vaisala's proprietary sources, we have access to quality-controlled, ground-truth data that enables our AI to deliver exceptional results. But importantly, we remain method agnostic: we use whichever approach - AI or physics-based modeling - produces the most accurate and reliable predictions. Why? Because we are dedicated to ensuring our customers are the first to know the future and prepare for whatever it brings, and we use every tool in our toolbox to deliver on that commitment.

