Three of Pure Storage’s top executives, including CEO Charles Giancarlo, participated in an unscripted session where they took questions from press and analysts on a wide range of topics including the expected impact of the company’s new Enterprise Data Cloud, the future of flash versus hard drives, and how the company is dealing with fast-moving AI environments.
The company, best known for its wide range of flash storage arrays, this week highlighted just how far it has shifted the conversation from storage and storage management to a focus on data management with the introduction of its Enterprise Data Cloud.
[Related: Pure Storage CEO: New Enterprise Data Cloud Gives Partners A ‘Huge Consulting Opportunity’]
CEO Charles Giancarlo, co-founder and Chief Visionary Officer John “Coz” Colgrove, and CTO Rob Lee fielded questions about Pure Storage’s new Enterprise Data Cloud (EDC) strategy, which aims to shift the management of data from physical arrays to a cloud-like architecture to improve data management, governance, and compliance.
They also took questions about how AI plays into the Enterprise Data Cloud, and plans for expansion of the technology.
The relationship between storage infrastructures and AI was also front and center, with Giancarlo noting that the shifts caused by AI are still in the early stages.
“I would say last year the conversation was still around reducing expenses,” he said. “I think like most fundamental transformations in technology, the real benefit comes from changing business processes and changing products, and I think we’re just starting to hear about that now this year. We’ve gone through the same path as every other company in terms of how we’re going to use it. And increasingly, we’re using it to change business processes.”
There’s a lot going on at Pure Storage and in the storage industry in general. Here are highlights from the session.
On Whether Enterprise Data Cloud Will Work With Third-Party Vendors’ Storage
Giancarlo: Currently, we’re very focused on enabling all of our platforms with all the features to go from features that provide capabilities within a single array to features that operate globally under these presets or recipes that are created and then automatically implemented. We have plans in the future to be able to extend it to third-party arrays. Now, that will always be limited by their functionality, which not only won’t be identical to ours but won’t be identical to [other vendors’] as well. So there’ll be some limitations there.
Types Of AI Workloads Suitable For Enterprise Data Cloud
Giancarlo: The trouble when talking about AI is it’s multi-segmented, and so you generally can get into the technology and into different environments pretty quickly. We’ve been shipping into AI for six years, but up until GenAI, it was so-called parameter-based AI. And in that environment, it may not take a lot of GPUs. It may not take a GPU at all. There, it was really about speed of access and the ability to handle both small and large file types. As we go forward, there’s a much greater range of AI environments. One of the things is that when you go from training to inference to RAG (retrieval-augmented generation), some of the relationships between performance, capacity, read/write speeds, and so forth change in terms of what’s required. And so to the extent that you want to share it across those environments, flexibility really becomes key.
Lee: I think certainly there are performance aspects. Whether you’re looking at training, at inference, or at other techniques, they’re going to stretch performance in different ways. We’re working very closely with Nvidia. [If you] look at the broader enterprise, I actually think that’s where storage and data management play a key role, not directly in the performance path. We’re going to keep working on performance. But that’s not where the bottleneck is. I think most enterprises, most clients I speak with, the bottleneck is actually just figuring out where their data sits. It’s very common that I’ll speak with a customer or CIO, and they have great ambitions: ‘I’ve got this great whiz-bang model I want to deploy. It’s all my historical data, and I want to make better decisions on fraud detection or this or that.’ And I’ll ask them, ‘Great, this sounds awesome. Where is all this data sitting today? You have this historical data, your transactional data.’ And you get this look across their face. [It’s] spread across six, seven different systems, each of which was probably designed and conceived of in isolation. They were never really contemplated to share data amongst them. They were built by different teams for their own requirements. Go one step further, and a lot of customers in the enterprise don’t necessarily know where all their data is and where it came from. And if we play this forward a little bit in enterprise deployment of AI, not necessarily folks building these models, but sourcing and deploying them, I think that access to data, data tracking, governance, provenance, etc., are going to be the most important pieces. If you’re a big bank, you’re going to get the best model you can. You deploy it. You’re not going to just let it loose. You’re going to measure the heck out of that thing, right? You’re going to see what inputs went in, what decisions did it make, and when it goes off the rails, right? You’re going to want to understand how did that happen, and what guardrails do I have to put in place? How do I have to adjust the inputs so that we don’t make this bad trade? That all involves a tremendous amount of data generation and data processing, but also data tracking.
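To make Lee’s point about measuring deployed models concrete, here is a minimal, hypothetical sketch of the kind of decision tracking he describes. It is an illustration only, not Pure Storage code or a specific product feature: every model input and decision is appended to an audit log so a team can later reconstruct what the model saw and did. The file name, model name, and fields are assumptions for the example.

```python
# Hypothetical illustration: record every model input and decision so the
# enterprise can audit what the model saw and did after the fact.
import json
import time
import uuid


def log_decision(log_path, model_name, features, decision, score):
    """Append one audited model decision to a JSON-lines file."""
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model": model_name,
        "inputs": features,    # what went in
        "decision": decision,  # what the model decided
        "score": score,        # confidence, useful when tuning guardrails later
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")


# Example: a fraud-detection call whose inputs and output are kept for review.
log_decision(
    "decisions.jsonl",
    model_name="fraud-detector-v1",
    features={"amount": 4200.0, "merchant": "example-merchant"},
    decision="flag_for_review",
    score=0.87,
)
```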
On The State Of Infrastructure Modernization For AI
Giancarlo: [One attendee] described it as a baseball game. Excuse me for using a USA reference, but it’ll be in the next Olympics in Los Angeles. For the international team here, you’re going to need to learn baseball. But in baseball, there are nine innings, and he described [the current situation] as the audience still taking their seats before the game. … I would say last year the conversation was still around reducing expenses. I think like most fundamental transformations in technology, the real benefit comes from changing business processes and changing products, and I think we’re just starting to hear about that now this year. We’ve gone through the same path as every other company in terms of how we’re going to use it. And increasingly, we’re using it to change business processes.
On How Quickly Channel Partners Will Be Ready For Pure Storage’s New Technologies
Giancarlo: One of the largest investments we make is training channel partners. If you look at the success we’ve had with Evergreen One, which is where we provide our capabilities as a service rather than as products that are sold to the customer, one might think, ‘Well, OK, that’s for large enterprise customers.’ As a matter of fact, we have roughly as much business coming from sales to our mid-size customers as to our large customers. We introduced the second version of Fusion last November. And since then, there are many trials going on, a lot of experimentation. … Often the market thinks of commercial customers as being either less sophisticated than enterprises or not taking on some of the newer technologies. We don’t find that at all. We really do see equal numbers of both. There are an equal number of early adopters in both segments. I think the training around this, shifting mindsets and shifting paradigms, actually takes a while, takes a lot of work, and so we’re going to be putting a lot of effort into that.
What Other Features Are Needed For The Enterprise Data Cloud
Giancarlo: What we’re very excited about, which is something we’re planning to introduce early next year, is a catalog that keeps a record of all copies, all snapshots, all replication of data as it takes place. Why is that exciting? Data is managed very manually, and almost anybody in a company can make a copy of data and put it somewhere for some use. It is remarkable how many cyber events occur on a copy of data the IT organization didn’t know they had, didn’t know where it was, had been sitting by itself without any new cybersecurity policies in place around it, where there was data exfiltration based on credentials that hadn’t been updated in 18 months. Why is that the case? Because a lot of this stuff is manual, and once copied, there was no policy, or maybe the person left the company and the data was never erased. By doing everything in software with policies set for things like cyber, you get data lifecycle management where you won’t have a copy of data that no one knows about anymore. … [Data] now has a much larger role inside an organization, not just for AI, and in the case of attack, you want it to be managed in software with policies. Certainly in my career, I have never really talked about compliance with enthusiasm. I mean, it’s just not something you talk about with enthusiasm. In this case, though, I actually get enthusiastic about having compliance and policy applied in software.
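To illustrate the kind of catalog Giancarlo describes, here is a minimal, hypothetical sketch, written for illustration only and not drawn from Pure Storage’s product: each copy of a dataset gets a record with an owner and a retention policy, and copies that are unowned or past their retention window are flagged instead of sitting forgotten. All names, fields, and values are assumptions for the example.

```python
# Hypothetical sketch: a catalog that records every copy of a dataset and flags
# copies with no owner or an expired retention window, so "unknown" copies
# can't linger unmanaged.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class CopyRecord:
    dataset: str
    location: str             # array, site, or cloud bucket where the copy lives
    owner: Optional[str]      # team responsible for the copy, if any
    created: datetime
    retention_days: int       # policy: review or delete after this many days


def out_of_policy(catalog: list, now: datetime) -> list:
    """Return copies that are unowned or have outlived their retention window."""
    return [
        c for c in catalog
        if c.owner is None or now - c.created > timedelta(days=c.retention_days)
    ]


catalog = [
    CopyRecord("customer-db", "array-01", "payments-team", datetime(2025, 4, 1), 90),
    CopyRecord("customer-db", "test-cluster", None, datetime(2024, 6, 1), 30),
]
for copy in out_of_policy(catalog, datetime(2025, 6, 1)):
    print(f"Out of policy: {copy.dataset} at {copy.location}")
```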
On Whether There’s A ‘Moore’s Law’ Of Storage
Coz: That’s how [Pure Storage] was effectively started, by looking at the improvement rate of hard drives, seeing it was flattening in the long term. It used to be an ‘OK, my hard drives will double every 18 months in capacity’ kind of thing, and cutting costs, obviously. And as that started to flatten in the early 2000s, that’s what allowed flash to take over at the performance end. And those same long-term curves of how fast flash improves in density, in power, in performance and reliability and cost compared to hard drives, are why we’re so convinced that in a few years, flash will be able to wipe out all of the hard drives. It’s just your standard exponential improvement curve, and we look at that all the time. That’s why we made the set of investments we have, and that’s why we started the company.
Lee: The other interesting curve I’d look at is the growth of data and what’s driving that growth of data. A lot of [that growth] is driven by core business applications, databases, et cetera, and that basically scales with GDP. [Companies] hire more employees, provision more VMs, transact more business, have more credit card transactions to pump through a database, and it grows. If you look at what’s driving unstructured data growth and consumption, that more or less scales with something like Moore’s Law. … We talk about object storage, certainly we talk about AI. When we look at growth drivers in terms of unstructured data generation, [that’s] why that’s sustainable and really what’s behind it.
Giancarlo: One of the reasons I came to the company was I recognized that flash was improving in price-performance faster than disk. It may change on a year-by-year basis. But the long-term trend is very, very stable. Like processors, flash is decreasing in price literally on Moore’s Law. But the other thing that happened to processors about 15 years ago was this idea of virtualization, that you could do more than one thing on them, and that you could manage them on a data center scale or even a global scale by virtualizing them rather than having them be bespoke for every specific application. That’s another analogy that we felt held true for storage. Instead of bespoke storage arrays for specific applications, make storage more horizontal. Allow it to provide lots of different services, and then organize it on a global scale rather than on individual scales.
What An Archetypal Customer Looks Like For The New Technology
Giancarlo: Geoffrey Moore wrote a book on this where you have your early adopters, your innovators, and then your late adopters. The early adopters are enthusiastic about technology. They tend to be technology-deep, meaning that they understand the new things. They have confidence in their own ability to analyze new technologies as to whether they will work for them. Later adopters, perhaps because they’re more conservative, perhaps because their organizations are more conservative, are looking for the rest of the industry to prove out the vendor or the technology before using it. You know, those who buy the first model year of the car, versus those who wait for the second or third model year of the car. I would say it certainly is the early adopters that are looking at these new capabilities that we’re putting out. For example, anybody involved in AI right now is an early adopter because there isn’t any technology that has been proven out in that space. Anyone experimenting with AI in any meaningful way is an early adopter. Some of that tends to be true even in security, because there is so much attacker activity that you have to have some of the latest tools. Storage, in general, has been the laggard of the industry, because it’s very conservative. Very few people get promoted for having great storage, and people get fired when things fail. That tends to build a somewhat conservative mindset. It’s one of the reasons why we’ve had to develop extraordinarily reliable products. And we have extraordinarily reliable products that customers can depend on, that they can upgrade with confidence. But I would say it still means that in this industry, we have put more time and effort and work into getting customers to understand these new capabilities. I would say the early adopters tend to be the same people: people that are confident in their own ability to run their business and evolve in it.
Coz: The customers who are like, ‘Yeah, I’ve always done it this way, and I don’t care,’ or are not interested in change, they’re obviously not early innovators. The customers who are trying to do better for their organizations, who are trying to improve, whether it’s the profitability of their company, the profitability of the services they’re delivering, the services they’re delivering to their internal customers, the ones who are looking to improve, they’re not on the bleeding edge. They’re not taking risks there, but they’re saying, ‘Hey, how can I do better all the time?’ And that’s the best kind of customer.
The Impact Of Enterprise Data Cloud On Future Pure Storage Development
Giancarlo: I believe the future of Pure is increasingly in what you would call the data management side of things. Now, we are very dedicated to and will continue developing the data storage side, but what we want to do, from a customer perspective, is make data storage effectively disappear from a management standpoint, to have auto provisioning, auto management, easy operation, and to allow our customers more tools to manage the data that sits in their environment. And with our products holding the data, we think we’re in an excellent position to help customers manage the data that is held in those systems better than they can do today. Today, data management is literally done manually. It’s in people’s minds, it’s on paper or on spreadsheets, and everything is done for the most part manually. We want them to be able to automate that. They’ll still have to build their policies, but we want to be able to turn those policies into actions in as simple a way as possible. The impact will be efficiency of operation and better management of their data, which results in better security: security from a reduced number of incidents of data being exfiltrated or lost, more efficiency in terms of fewer copies of their data that exist, fewer mistakes, whether that mistake results in a cyber incident or whether the mistake meant a data center failure or a power failure and they lost service because the disaster recovery wasn’t set up properly. And lastly, enabling their production data to be utilized by AI for insights, which is generally not possible today.
Lee: How does it impact our future R&D roadmap, and how we’re thinking about innovation? A couple things. Clearly this is a key part of our strategy expansion as a company. You should expect that as we continue to evolve our core platforms, enabling the Enterprise Data Cloud will become the primary mechanism that will drive all of the IT operations. As we look at any sort of multi-array, multi-site kinds of workflows, we want to push all that stuff into Fusion. We want to expand Fusion beyond just IT management, beyond just driving towards policy-driven IT operations, to get into logical data workflows, et cetera. So I see a lot more focus on Fusion. I think probably there will be some consolidation as we watch how customers interact with their products today to really put Fusion at the forefront as the primary mode. More importantly, there will be more automation systems to interact with the Pure platform.
Coz: The other really big impact will be better adaptability and agility. Virtually every customer who has a group of arrays has some that are very underutilized. With the Enterprise Data Cloud, you’ll be able to get more even utilization and raise the utilization on everything because you have more agility and more adaptability, and you can more easily bring in new products and get them into that cloud. So maybe instead of 30 [percent] or 40 percent utilized, the storage resources will be 60 [percent] or 70 percent utilized. There’ll be a lot of efficiency there. One thing AI is doing is changing the needs of people to access data. Where do they need it? How fast do they need it? How often do they need it? The agility will enable the infrastructure to react to and deliver that so organizations will be able to bring much greater value to their customers, whether it’s their external customers or their internal customers. And that’ll help every company roll out the things they really want from their IT, which is the value-add that generates profits.