Many utilities are realizing that the huge amount of granular data coming at them can be leveraged to tackle their biggest challenges, such as distributed generation and engaging consumers beyond traditional power delivery.

A growing number of power companies are initiating big-data analytics projects, which include reports, dashboards and visualizations of historical data. But will they get the most out of the data by analyzing the past? Can slicing and dicing yesterday's data provide them with the maximum value they so urgently need to address emerging operational and marketing needs?

According to the technology research firm Gartner, predictive analytics can hold significantly more value for companies than simply analyzing the past. Answering questions like "What will happen?" and "How will we make it happen?" rather than "What happened?" makes a big difference in value. Yet most of the analytics projects being initiated by utilities today address only the latter, backward-looking question.

One of the main reasons for this is the lack of input from machine-learning experts and data scientists who possess a deep understanding of the energy markets. Another reason may be a lack of understanding of what predictive analytics is, how it differs from descriptive analytics, and what value it can create for utilities. 

So what is predictive analytics? It uses advanced machine-learning and statistical algorithms to learn from historical data and deliver accurate forecasts and forward-looking, actionable insights (what Gartner calls "foresights"), enabling users to be more proactive and positively influence tomorrow's bottom-line results.

On the operational side, examples in the energy industry include predicting outages and system failures, forecasting load accurately at the meter/sub-meter and sub-hour levels to balance supply and demand, optimizing demand response programs, and detecting early warnings of irregularities.

On the marketing side, examples include predicting customer churn and analyzing its root causes, predicting customers' responses to specific pricing, marketing energy-efficiency offerings, and detecting early warnings of irregularities in a household's electricity usage and alerting the household as part of customer engagement initiatives.

All these examples of predictive analytics applications can have a huge impact on utilities' marketing campaigns and customer engagement, as well as customer acquisition and retention (in competitive markets). Predictive analytics applications can also help balance the grid and reduce risky spikes during peak hours.

These benefits are attractive to utilities. However, they tend to start with descriptive analytics because predictive analytics is deemed expensive, complex and resource-intensive. This is far from the truth. Because predictive analytics projects are usually led by machine-learning experts, the algorithms they develop are almost always fully automated. Moreover, once the financial impact of these projects is taken into account, the time to realizing the full value of predictive analytics can be significantly shorter than expected.

When utilities do decide to initiate predictive analytics implementations, how can they make sure they derive maximum value from these projects? Here are four important suggestions drawn from my direct experience working on these solutions.

Correctly define the business objective

When it comes to machine learning and data mining, defining the target function correctly is crucial; the nuances can make all the difference. In many of the cases where utilities ask us to provide accurate sub-hour load forecasts at the meter level, they define the target function as maximum accuracy. But when I ask whether the penalties for overestimating and underestimating load are equal, the answer is often "no."

At this stage, I suggest that the target function should be maximum profit instead of maximum accuracy, since solving for maximum accuracy assumes that these penalties are symmetric. It is important to understand that these are two different problems that will produce different results. Another example is the importance of aligning the algorithms' definition of accuracy with the business's definition, since the way accuracy is measured affects the results the algorithms produce. My suggestion when defining the objectives is to discuss them with a machine-learning expert who has an in-depth understanding of the energy industry.
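To make the distinction concrete, here is a minimal sketch of an asymmetric cost function for load forecasting. The function name, penalty weights and the assumption that under-forecasting is the costlier error are purely illustrative; they are not figures from any real tariff or from the projects described here.

```python
import numpy as np

def asymmetric_load_loss(actual_kwh, forecast_kwh,
                         under_penalty=3.0, over_penalty=1.0):
    """Hypothetical cost function: under-forecasting load (which may force
    expensive peak purchases) is penalized more heavily than over-forecasting.
    The penalty weights are illustrative, not real tariff figures."""
    error = actual_kwh - forecast_kwh
    cost = np.where(error > 0,                # actual exceeded the forecast
                    under_penalty * error,    # under-forecast: costly shortfall
                    over_penalty * (-error))  # over-forecast: cheaper surplus
    return cost.mean()

# Two forecasts with identical mean absolute error carry very different
# business costs once the penalties are asymmetric.
actual  = np.array([10.0, 12.0, 15.0])
f_under = np.array([ 9.0, 11.0, 14.0])   # always 1 kWh short
f_over  = np.array([11.0, 13.0, 16.0])   # always 1 kWh over
print(asymmetric_load_loss(actual, f_under))  # 3.0
print(asymmetric_load_loss(actual, f_over))   # 1.0
```

Two forecasts with the same accuracy can score very differently under such a cost function, which is exactly why maximizing accuracy and maximizing profit are different optimization problems.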

Start with a proof of concept based on sampled data

The best way to start a predictive analytics implementation is with a proof of concept, which consists of only a sample of the data and does not require integration with other systems. This is a good way to learn the value that could be derived from the data with a minimum investment of time and money.

We always suggest starting with sampled data in order to demonstrate the significant business value that predictive analytics can deliver (this process usually takes us between one and three weeks to complete). Nevertheless, it is extremely important to sample the data correctly for these proofs of concept and to make sure the sample is not biased. In these cases, we either sample the data ourselves or provide very detailed instructions on how to do it correctly. The danger with biased data is that accuracy measures may indicate that the results and algorithms are on target when, in actuality, the predictions (as well as the accuracy measures themselves) are wrong.
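As one sketch of what sampling correctly can mean in practice, the example below uses stratified sampling so that each customer segment keeps roughly its original share in the proof-of-concept sample. The table, column names and segment mix are hypothetical, invented for illustration; real sampling instructions would depend on the utility's data.

```python
import pandas as pd

# Hypothetical meter master table; the column names and segment mix are
# assumptions made up for this illustration.
meters = pd.DataFrame({
    "meter_id": range(1, 101),
    "segment":  ["residential"] * 70 + ["commercial"] * 25 + ["industrial"] * 5,
})

# A purely random sample can under-represent small but important segments.
# Stratified sampling preserves each segment's share in the sample.
poc_sample = (meters
              .groupby("segment", group_keys=False)
              .sample(frac=0.2, random_state=42))

print(poc_sample["segment"].value_counts(normalize=True))
```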

Always insist on receiving the accuracy measures

Requiring accuracy measures is very common among utilities when it comes to granular load forecasting implementations. However, I have found that when we are asked to forecast customer responses to energy-efficiency offerings, predict customer churn, or identify system failures, utilities don't always require them.

Obtaining and understanding these measures is crucial for evaluating the project and for understanding how to deal with the results. When predicting customer responses to a demand response program, for example, it is important to know the false positive measures (that is, what percentage of customers were predicted to say "yes" but instead said "no"), as well as the false negative measures (what percentage of customers were predicted to say "no" but instead said "yes").
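Here is a minimal sketch of how those measures fall out of a confusion matrix, using made-up labels for a hypothetical demand response campaign (1 = customer accepts the offer, 0 = declines):

```python
from sklearn.metrics import confusion_matrix

# Illustrative labels only: 1 = accepted the demand response offer, 0 = declined.
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]

tn, fp, fn, tp = confusion_matrix(actual, predicted).ravel()
print(f"False positives (predicted 'yes', actually 'no'): {fp}")
print(f"False negatives (predicted 'no', actually 'yes'): {fn}")
print(f"False positive rate: {fp / (fp + tn):.0%}")
print(f"False negative rate: {fn / (fn + tp):.0%}")
```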

Opt for transparent predictive algorithms

Algorithms fall into two categories: "black boxes," in which no one can see what leads the algorithm to its results, and "transparent algorithms," in which the method by which the results are achieved is readily recognizable. For system failure predictions, for example, a black-box algorithm could simply provide the failure probability of a system at a certain time. A transparent algorithm would also present and rank the importance of the parameters that led to that result, or generate a decision tree that shows how the result was produced. In most cases, this information has as much business value as the predictions themselves, so you always get more value with transparent predictive algorithms.
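As an illustrative sketch (not any specific vendor's method), a shallow decision tree trained on synthetic data shows the kind of transparency described above: ranked feature importances plus a readable decision path. The feature names and data are invented for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic, invented data: three made-up transformer health indicators.
rng = np.random.default_rng(0)
X = rng.random((200, 3))   # columns: load_factor, age_scaled, oil_temp_scaled
# By construction, failures here are driven mostly by oil temperature, then age.
y = (0.6 * X[:, 2] + 0.3 * X[:, 1] + 0.1 * rng.random(200) > 0.5).astype(int)

feature_names = ["load_factor", "age_scaled", "oil_temp_scaled"]
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# A transparent model exposes both the drivers and the decision path,
# not just a failure probability.
for name, importance in zip(feature_names, model.feature_importances_):
    print(f"{name}: {importance:.2f}")
print(export_text(model, feature_names=feature_names))
```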

There is certainly value to be gained from dashboards and graphs that analyze the past. But the big promise of data analytics can only be realized with granular, accurate predictions.

***

Dr. Noa Ruschin-Rimini is the founder and CEO of Grid4C. She holds a Ph.D. in machine learning and data mining from Tel Aviv University's engineering department, where she specialized in predictive analytics and anomaly detection of time series.

To learn how to effectively manage, analyze and take action on the data generated by grid networks, join Greentech Media at The Soft Grid: Data, Analytics and the Software-Defined Utility conference on September 10-11, 2014 in Menlo Park, Calif.