I realized after my post on big data last week that I probably needed to take a step backwards and define “big data” in the first place. What are the characteristics of big data?

  • At a minimum, it’s a big data problem when the size of the data itself is part of the problem (Mike Loukides’ definition), with potentially petabytes or exabytes of data to process. Raw transaction data over a long enough period of time can scale to this size.
  • Frequently, the structure of the data is part of the problem as well. Processing unstructured data requires technologies different from the relational database technologies we’ve been accustomed to working with in the past. The number of data sources and the potential need to infer relationships among them can also come into play. Sentiment analysis, for example, leverages unstructured social media commentary.

Big data technology has the potential to help on the revenue and risk management fronts in two ways:

  • Decreasing the time it takes to perform disk- and compute-intensive processes handled by traditional database and analytic technologies, such as customer profitability calculations. In-memory processing is an example of big data technology that greatly reduces processing time, with speed improvements of up to 1,000x (yes, 1,000).
  • Increasing the amount and variety of information that can be used in these processes. New big data technologies can leverage real-time data as well as unstructured data to improve processes such as fraud detection (combining real-time transaction and geolocation data to score transactions, for example) and cross-sell (combining transaction history, propensity-to-buy models, and geolocation data to present mobile offers, for example).
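
To make the fraud detection example a bit more concrete, here is a minimal sketch in Python of how a transaction might be scored by combining its location with the last known location of the customer's mobile device. Everything in it is an assumption for illustration only: the function names, the 50 km plausibility threshold, the "typical amount" baseline, and the 70/30 weighting are all hypothetical, and a production scoring model would be trained on real data rather than hand-tuned like this.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def fraud_score(txn_amount, txn_lat, txn_lon, device_lat, device_lon,
                typical_amount=100.0, max_plausible_km=50.0):
    """Return a risk score between 0.0 (low) and 1.0 (high).

    Hypothetical scoring rule: risk grows with the distance between the
    transaction and the customer's device, and with how large the amount
    is relative to the customer's typical spend.
    """
    distance = haversine_km(txn_lat, txn_lon, device_lat, device_lon)
    distance_risk = min(distance / max_plausible_km, 1.0)
    amount_risk = min(txn_amount / (10 * typical_amount), 1.0)
    # Illustrative weights only; not derived from any real model.
    return 0.7 * distance_risk + 0.3 * amount_risk


# A small purchase made where the customer's phone is located
low_risk = fraud_score(50.0, 40.7, -74.0, 40.7, -74.0)

# A large purchase in London while the phone is in New York
high_risk = fraud_score(2000.0, 51.5, -0.1, 40.7, -74.0)
```

The point of the sketch is the combination of inputs: neither the amount nor the location alone says much, but together, and scored while the transaction is in flight rather than in an overnight batch, they become a useful signal.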

Big data can also help answer unstructured questions, such as exploring patterns of customer behavior to determine why customers buy additional products or leave the institution. Account history coupled with raw delivery channel data (teller/FSR visits, call center calls, ATM calls) and customer contact data (email) can be analyzed for patterns to determine whether sales behaviors or offers are working, and to identify potential sources of dissatisfaction as well.

My article last week explored the use of transaction data to target web advertising and merchant rewards. Traditional relational database and analytic technologies can do this, but big data improves the targeting by increasing the number of potential input sources for these models and decreasing the time it takes to run them (think real-time vs. overnight batch). I will be digging deeper into the underlying technologies as well as real-world applications of big data in coming posts and look forward to sharing some success stories.