Knowing what to ask, and being organized to act on the information gleaned from the data, are moving to the fore of our industry. Though data has been around for eons, the huge data sets (5 petabytes and more) now available by combining internally generated data with purchased data from social networks, blogs, etc. are fundamentally changing the landscape. Traditional relational databases and business intelligence software were not designed to handle an inflow of this magnitude, let alone to do so in real time.
Assume that you are involved with one of the dozens of companies with more than 1 billion page views per month and more than 20 million unique visitors per month. The amount of data generated around user behavior is close to infinite. You can slice it by demographics, on-site relationships, browser, location, OS, engagement, etc. Moreover, you can correlate it with external events like the weather and press releases. Finally, 'special' runs that enable targeting for advertisers or data customers add myriad layers of complexity.
In the same way that operating systems have evolved to hide the complexity of computing, it seems to me that the next generation's successes in 'big data' will combine masking complexity with a layer of intelligence that highlights the critical data containing really useful information. We are seeing an early wave of this nascent trend in the successes of companies in the IT space such as Splunk, SolarWinds and Puppet Labs.