Unstructured Data: a Challenge for IT Decision-Makers
Terabytes of unstructured data
I found a great post by Paul Weinberg of eChannelLine covering a recent survey report by Taneja Group, an analyst group focused on infrastructure technologies. The company surveyed 238 IT decision-makers across different industries in North America and Great Britain. To give you an idea of the audience, I’d quote that “53 percent of users had 11 terabytes or more of unstructured data in their environments.”
According to the survey, 62 percent of the respondents reported that the unstructured data within their companies was growing by between 16 and 75 percent per year. The range in this result is too wide to be precise, but the overall trend seems plausible.
Taneja found that the major drivers of unstructured data growth among survey respondents are Microsoft Office (78 percent), e-mail attachments (66 percent), and backup and archival (81 percent combined).
As you can see, despite the adoption of Salesforce CRM and other platforms, enterprises still suffer from data scattered across disconnected files. For them, it is still a challenge to manage e-mail attachments and MS Excel spreadsheets for analysis, BI, data warehousing, etc. According to a separate study by Gartner, 75 percent of the leading companies are incapable of creating a unified view of a customer.
Too expensive to solve
Furthermore, Steve Norall, a senior analyst at Taneja Group, is inclined to think that people are not going to move all of their data into a single storage location. Why? Because of the huge expense. (Yes, open-source middleware could be the joker in the pack here.) So, he predicts that file management and integration companies will benefit from this and “should prosper.”
Finally, the majority of respondents expected their file management and control budgets would grow by up to 20 percent in the next 12 months.
This means the problem is a real headache and executives are ready to pay for a solution. Besides the adoption of open-source data integration and file management software, I expect that related services are going to be on the rise as well. One probable solution is to take an open-source toolkit (like Apatar) and allocate a budget for customizing the software to your unique integration needs.
In this case, the growth in unstructured data management costs may stay far below the 20-percent increase expected by the executives.
Integrating unstructured data from MS Office
In addition to the e-mail and MS Excel connectors, we at Apatar are currently working on releasing Apatar Merge, a lightweight add-on to integrate MS Word with Salesforce CRM. It will greatly simplify creating templates and documents (Word letters, e-mails, faxes, etc.) that draw on Salesforce.com’s Customers and Leads tables.
Gartner estimates that as much as 80 percent of actually or potentially mission-critical enterprise information takes the form of unstructured or semi-structured data. Addressing this challenge, Apatar Merge will enable users to build reports efficiently by retrieving information on their contacts, companies, titles, campaigns, etc., and inserting it right into an e-mail template or any other document for analysis. By using Apatar Merge, corporate managers will be able to save a great deal of time and effort by automating their marketing campaigns, from preparation and research to implementation and analysis.
(Check out this video tutorial for more on this add-on facilitating daily correspondence and reporting.)
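To make the merge idea more concrete, here is a minimal sketch of filling a letter template with contact fields. It is only an illustration in plain Python and does not show how Apatar Merge itself works: the contact records, the field names (first_name, title, company), and the merge helper are all hypothetical, and in practice the records would come from a CRM export rather than a hard-coded list.

```python
from string import Template

# Hypothetical contact records standing in for rows pulled from a CRM.
contacts = [
    {"first_name": "Ada", "title": "CTO", "company": "Example Corp"},
    {"first_name": "Lin", "title": "IT Director", "company": "Sample LLC"},
]

# A letter template with $placeholders for the contact fields.
LETTER = Template(
    "Dear $first_name,\n\n"
    "As $title at $company, you may find our latest report useful.\n"
)


def merge(template, records):
    """Return one filled-in document per record.

    substitute() raises KeyError if a record lacks a field the
    template needs, so incomplete records surface immediately.
    """
    return [template.substitute(record) for record in records]


letters = merge(LETTER, contacts)

if __name__ == "__main__":
    for letter in letters:
        print(letter)
```

Using substitute() rather than safe_substitute() is a deliberate choice in this sketch: a missing field stops the run instead of silently producing a letter with a raw placeholder in it.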
The bottom line
So, while enterprises are struggling to integrate unstructured and semi-structured data, with cost as one of the major barriers, open-source technologies can be a rescue strategy.
At the same time, even after succeeding in extracting customer data from multiple sources and normalizing it (which is a challenge for 54 percent of organizations, according to Aberdeen Group), other issues may follow. More than half of companies (52 percent) still have trouble verifying data accuracy or completeness, which may require implementing data quality initiatives as well.