Building the Foundation: AI as Organizational Infrastructure

Reba Habib

There is a conceptual distinction that separates organizations that build AI well from those that struggle to scale it, and it is not about model quality, data quantity, or engineering talent. It is about how the organization thinks about what AI is. In most product organizations, AI is treated as a feature layer: something that sits on top of existing products and adds intelligent behavior to specific interactions. In the organizations that are getting AI right at scale, it is treated as infrastructure: a foundational capability that other products, features, and workflows are built on top of.

This distinction has profound consequences for how AI is designed, funded, governed, and evolved. Infrastructure thinking changes what you build, how you build it, who owns it, and how you measure its success. It also changes the role of design within the organization, because infrastructure requires design at a different level of abstraction than features do. Understanding the difference between AI as a feature and AI as infrastructure is one of the most important conceptual moves a product leader or design director can make right now, and it is the subject of this article.

The Feature Layer Trap

When organizations first begin investing seriously in AI, they almost always start by building features. This is rational. Features are bounded, deliverable, and measurable. They map cleanly onto product roadmaps and sprint cycles. They produce demos that executives can see and metrics that investors can evaluate. A conversational assistant, a smart search bar, a content recommendation widget: these are things that can be specified, built, shipped, and declared done.

The problem is not that these features are unworthy. The problem is that building them as features, rather than as expressions of underlying infrastructure, creates a pattern of redundancy, inconsistency, and brittleness that becomes progressively more expensive over time. When each team builds its own AI capabilities independently, the organization ends up with multiple models serving similar purposes with different data, different quality standards, and different maintenance requirements. When the underlying models are embedded directly in product features rather than abstracted into shared services, updating a model requires updating every feature that uses it. When there is no shared infrastructure for data collection, model evaluation, or safety review, every team reinvents those processes, usually imperfectly.

This is the feature layer trap. It is the AI equivalent of every team in a company building its own authentication system rather than using a shared identity service. It works in the short term and creates compounding problems at scale.

Netflix's engineering organization has been transparent about its own experience with this trap. In the early years of Netflix's recommendation system, individual product teams built recommendation models for specific surfaces: the home screen, the search results page, the "more like this" feature. Each model was optimized for its specific surface and built and maintained by the team that owned that surface. As Netflix scaled, the redundancy became untenable. The organization eventually invested in building shared recommendation infrastructure, a set of platform-level capabilities for training, serving, and evaluating recommendation models that individual product teams could build on top of. The transition from feature-level models to infrastructure-level platforms was expensive and disruptive, but it was what allowed Netflix to scale its AI capabilities across a growing product surface without growing its AI engineering organization proportionally.

What AI Infrastructure Actually Means

Infrastructure, in the conventional software sense, refers to the foundational systems that other systems depend on: compute, networking, storage, identity, observability. These systems are not directly visible to end users, but they determine what is possible at the product layer. AI infrastructure operates at the same level of abstraction, but with components that are specific to the AI development and deployment lifecycle.

AI infrastructure includes several distinct layers, each of which has design implications.

The data layer is the foundation. It encompasses the systems for collecting, storing, labeling, versioning, and governing the data that AI models are trained and evaluated on. Organizations that treat AI as infrastructure invest in data infrastructure as a first-class concern, building pipelines that produce high-quality, well-documented, consistently formatted training data across the organization rather than in individual teams. The design implication here is significant: the quality of the data infrastructure directly determines the quality of the AI capabilities built on top of it, and the quality of those capabilities directly determines the quality of the user experiences that designers can create. Data infrastructure is not a technical concern that sits outside the design process; it is a constraint and an enabler that shapes what design can achieve.
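One way to make "data as a first-class concern" concrete is to give every training dataset an explicit, reproducible identity. The sketch below is a hypothetical illustration, not a real API: the names `DatasetVersion` and `register_dataset` are invented. The idea it demonstrates is that each dataset carries its schema, its source, and a content hash, so any model built on it can later be traced back to exactly the data it saw.

```python
import hashlib
import json
from dataclasses import dataclass

# Hypothetical sketch of first-class data infrastructure: every training
# dataset gets a versioned, provenance-carrying record. All names here
# are illustrative, not an actual platform's API.
@dataclass(frozen=True)
class DatasetVersion:
    name: str
    schema: tuple        # ordered column names
    source: str          # which pipeline produced the records
    records_digest: str  # content hash, for reproducibility and auditing

def register_dataset(name, schema, source, records):
    # Hash the canonicalized records so identical content always yields
    # the same version, regardless of who registers it.
    digest = hashlib.sha256(
        json.dumps(records, sort_keys=True).encode()
    ).hexdigest()
    return DatasetVersion(name, tuple(schema), source, digest)
```

Because the record is content-addressed, two teams registering the same data get the same version, and any change to the underlying records is immediately visible as a new digest.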

The model layer sits above the data layer and encompasses the systems for training, evaluating, versioning, and serving models. Infrastructure thinking at the model layer means building shared model registries, standardized evaluation frameworks, and common serving platforms that any product team can use rather than requiring each team to build those capabilities independently. Google's approach to this, documented in its research on machine learning infrastructure, involves a set of shared platforms that allow teams across the organization to train and deploy models using common tooling and governance processes. This is what allows Google to maintain model quality and safety standards across a product surface that spans billions of users and dozens of distinct products.
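A standardized evaluation framework, in its simplest form, is just a harness that every model must pass through on the same terms before it is registered for serving. The sketch below is a minimal, hypothetical illustration of that idea; the function name, threshold, and accuracy metric are assumptions for the example, not a description of Google's tooling.

```python
# Hypothetical sketch of a shared evaluation gate: any team's model is
# scored against a common evaluation set and a common threshold before
# it can be registered. Names and the 0.9 threshold are illustrative.
def evaluate(model, eval_set, threshold=0.9):
    """model: any callable; eval_set: list of (input, expected) pairs.
    Returns (accuracy, passed) so the registry can record both."""
    correct = sum(1 for x, expected in eval_set if model(x) == expected)
    accuracy = correct / len(eval_set)
    return accuracy, accuracy >= threshold
```

The value of the shared harness is not the arithmetic; it is that "passed" means the same thing for every model in the organization, which is what makes quality standards enforceable at the platform level.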

The feature layer sits above the model layer and is where most product teams live. This is the layer of embeddings, APIs, and ML-powered services that product engineers and designers interact with directly. When AI infrastructure is well-designed, this layer presents a clean, well-documented interface that abstracts away the complexity of the layers below it. Product teams can build intelligent features without needing to understand the details of model training or data pipeline architecture. When AI infrastructure is poorly designed or absent, product teams are forced to interact directly with the underlying complexity, which produces the redundancy and inconsistency described earlier.
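The "clean, well-documented interface" the feature layer should present can be sketched as follows. This is a hypothetical illustration (the class name `ClassificationClient` and the registry shape are invented): product code asks for a capability by name, while model selection, versioning, and serving details remain infrastructure decisions behind the interface.

```python
# Hypothetical sketch of a feature-layer interface: product teams see one
# stable call; which model answers it is an infrastructure decision.
# Names are illustrative, not a real internal API.
class ClassificationClient:
    def __init__(self, registry):
        # registry maps a capability name to whatever callable currently
        # serves it; the infrastructure team can swap models without any
        # change to product code.
        self._registry = registry

    def classify(self, capability: str, text: str) -> str:
        model = self._registry[capability]
        return model(text)
```

A product team calls `client.classify("sentiment", user_message)` and never learns, or needs to learn, which model version produced the answer.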

The observability layer cuts across all of the others. It encompasses the systems for monitoring model performance, detecting distribution shift, tracking user outcomes, and auditing model behavior in production. This layer is frequently neglected in early-stage AI development, and its absence is consistently one of the primary reasons AI products degrade over time in ways that are difficult to diagnose. Infrastructure thinking requires treating observability as a first-class concern from the beginning, not as something to be added once problems appear.
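Distribution-shift detection, one of the observability jobs named above, can be made concrete with the Population Stability Index, a common heuristic for comparing a model's live score distribution against its training-time baseline. The sketch below is a minimal illustration of that technique; the binning scheme and the conventional "investigate above 0.2" threshold are heuristics, not a specific platform's implementation.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline score distribution
    and a live one. Values near 0 mean no shift; by common heuristic,
    values above ~0.2 warrant investigation."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a degenerate range

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        # Floor each bin at a tiny fraction to avoid log(0) below.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A monitoring job that computes this daily against the training baseline turns "the model degraded in ways that are difficult to diagnose" into an alert with a date attached.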

The Design Role in AI Infrastructure

Design's relationship to infrastructure is an area where most organizations are still finding their footing. The conventional design role is organized around user-facing experiences: the interfaces, interactions, and information architectures that users encounter directly. Infrastructure, by definition, is not user-facing, which creates a conceptual gap between where designers are trained to operate and where AI infrastructure decisions are made.

Closing this gap is one of the more important organizational challenges in AI product development, because the design decisions that matter most in AI systems are frequently infrastructure decisions. How the data collection pipeline is designed determines what the model knows about users, which determines what personalization is possible, which determines what experiences designers can create. How the model evaluation framework is designed determines what quality standards the model is held to, which determines how reliably it performs in edge cases, which determines how much the interface needs to compensate for model uncertainty. How the observability layer is designed determines how quickly the team can detect and respond to model degradation, which determines how durable the user experience is over time.

These are design decisions in the fullest sense of the term: they involve tradeoffs, they have user consequences, and they require judgment about what matters. But they are made upstream of the product design process in most organizations, often by data scientists and engineers who are not thinking about their design implications. The organizations that are building AI well are ones where design has found ways to extend its influence into these upstream decisions, either by embedding designers in platform and infrastructure teams or by creating explicit processes for design input into infrastructure-level decisions.

Microsoft Research's work on human-centered AI development has articulated this as a challenge of "shifting left" in the AI development process, borrowing a concept from software quality assurance. Just as quality assurance in conventional software development benefits from moving testing earlier in the development process rather than treating it as a post-development activity, human-centered design in AI development benefits from moving design input earlier in the process, into the data, model, and infrastructure layers rather than treating design as something that happens after the AI capabilities are already fixed.

Infrastructure as a Product Strategy

Treating AI as infrastructure is not just an engineering decision; it is a product strategy decision with significant competitive implications. Organizations that build strong AI infrastructure create compounding advantages over time that are difficult for competitors to replicate quickly.

The first advantage is speed. When AI infrastructure is mature, shipping a new AI-powered feature requires assembling existing capabilities rather than building new ones. A product team that can call a well-designed internal API for personalization, natural language understanding, or content classification can ship AI-powered features much faster than one that has to build those capabilities from scratch for each feature. This compounding speed advantage becomes more significant as the product surface grows.

The second advantage is quality consistency. When AI capabilities are built on shared infrastructure with common quality standards and evaluation frameworks, the quality of those capabilities is consistent across the product. Users do not encounter dramatically different levels of AI quality depending on which part of the product they are using. This consistency is both a quality advantage and a trust advantage, because users' trust in an AI system is significantly influenced by the consistency of its performance.

The third advantage is governance scalability. As AI regulation matures and organizational AI governance requirements grow, the organizations that will be able to respond most efficiently are those with centralized AI infrastructure where governance controls can be applied at the infrastructure level rather than having to be implemented individually in each AI-powered feature. The EU AI Act's requirements for documentation, risk assessment, and human oversight are much more tractable for an organization that has a shared model registry and observability platform than for one where AI capabilities are scattered across dozens of independent feature implementations.

Amazon's AI strategy illustrates this infrastructure-first approach as clearly as any example in the industry. Amazon Web Services exists in part because Amazon built internal infrastructure for its own operations, recognized that the infrastructure was strategically valuable in its own right, and made it available as a product. The same logic applies to Amazon's AI capabilities. The recommendation systems, the demand forecasting models, the natural language processing capabilities that power Alexa and Amazon search: these are built on shared internal infrastructure that Amazon has invested in over years. Individual product surfaces benefit from that infrastructure, and the infrastructure improves as more product surfaces use it and generate more data and feedback. This is the compounding advantage that infrastructure thinking creates.

Designing for Reuse and Abstraction

The practical design challenge of AI infrastructure is designing for reuse: building AI capabilities at a level of abstraction that makes them genuinely useful across multiple product contexts rather than optimized for one specific use case.

This is harder than it sounds. AI models are typically developed to solve specific problems in specific contexts. A model trained to recommend movies on the Netflix home screen is not, without significant adaptation, also the right model for recommending movies in a "because you watched" context or for recommending new releases to users who have been on the platform for less than a week. The specific context matters enormously for what a good recommendation looks like, what data is available, and what the optimization objective should be.

Designing for reuse requires finding the right level of abstraction: specific enough that the capability is genuinely powerful and not trivially generalizable, but general enough that multiple product contexts can benefit from it. This is a design problem in the same way that designing a good API is a design problem. It requires understanding the full range of use cases that the infrastructure will need to serve, identifying the common patterns across those use cases, and designing an interface that serves the common patterns well without being so general that it loses the specificity that makes AI capabilities valuable.
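The shape of that "right level of abstraction" can be sketched in code. In the hypothetical example below (all names invented for illustration), the shared infrastructure owns one ranking path, while each product surface passes a context object carrying its own objective and filters. The toy scorer stands in for a real model call; in production it would route to a model chosen from the context.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: one shared ranking capability, parameterized by
# the surface that calls it. All names are illustrative.
@dataclass
class SurfaceContext:
    surface: str                 # e.g. "home", "because_you_watched"
    objective: str = "engagement"
    candidate_filters: list = field(default_factory=list)

class RankingService:
    """Shared infrastructure: one serving path, many surfaces."""

    def rank(self, user_id: str, candidates: list, ctx: SurfaceContext) -> list:
        scored = [(self._score(user_id, c, ctx), c) for c in candidates]
        return [c for _, c in sorted(scored, reverse=True)]

    def _score(self, user_id, candidate, ctx):
        # Toy scorer standing in for a model call; a real service would
        # select a model based on ctx.surface and ctx.objective.
        base = len(candidate)
        return base - 1000 if candidate in ctx.candidate_filters else base
```

The common pattern (score candidates, apply surface policy, return a ranking) lives in the shared layer; what counts as a good candidate for a given surface stays with the team that owns that surface.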

The Nielsen Norman Group's research on design systems provides a useful analogy here. A well-designed component in a design system is specific enough to be immediately useful but abstract enough to work across multiple contexts. A button component that is perfectly styled for one specific page is not a component; it is a hardcoded element. A button component that is too generic to carry any design intent is not useful. The right level of abstraction is the one that captures the essential design decisions while leaving context-specific decisions to the teams that use the component. The same logic applies to AI infrastructure components.

For design leaders, this analogy points toward a productive reframing of the AI infrastructure challenge. Building AI infrastructure is, in important respects, a design systems problem. It requires the same skills: understanding use cases across the organization, identifying patterns, designing abstractions, creating documentation, and managing the governance of a shared resource that many teams depend on. Design organizations that have maturity in design systems are well-positioned to extend that competency into AI infrastructure design.

Real-World Infrastructure Models

Several organizations have published enough about their AI infrastructure that it is possible to draw concrete lessons about what infrastructure-level AI design looks like in practice.

Google's Vertex AI platform represents one model: a centralized ML infrastructure that Google uses internally and also offers as a commercial product. The design of Vertex AI reflects infrastructure thinking at every level, from the data labeling tools to the model evaluation frameworks to the serving infrastructure. What is instructive about Google's approach is not just the technical architecture but the organizational logic behind it: Google treats its AI infrastructure as a product with its own users (internal engineering teams), its own quality standards, and its own roadmap. This product thinking applied to infrastructure is what makes the infrastructure genuinely useful rather than a lowest-common-denominator platform.

Spotify's personalization infrastructure, as described in the company's engineering blog, offers a different model. Spotify's recommendation and personalization capabilities are built on a shared infrastructure that serves multiple product surfaces: Discover Weekly, Daily Mixes, podcast recommendations, concert recommendations. Each surface has different requirements, but they share underlying infrastructure for understanding user taste, modeling content similarity, and generating personalized rankings. The design challenge Spotify has navigated is building an infrastructure layer that is powerful enough to support sophisticated personalization across all of those surfaces while remaining flexible enough that each surface team can customize its use of the infrastructure for its specific context.

OpenAI's API is, in a sense, an AI infrastructure product designed for external consumption. The design decisions embedded in the API, about how to structure prompts, how to handle context, how to manage model behavior through system prompts, are infrastructure design decisions with significant downstream consequences for the AI-powered products built on top of it. The fact that millions of developers are building on OpenAI's API means that those design decisions propagate across an enormous number of products and user experiences, which is a sobering illustration of how consequential infrastructure design choices can be.

Practical Considerations for Design Leaders

For design leaders working in organizations that are transitioning from feature-level to infrastructure-level AI thinking, there are several practical considerations worth attending to.

The first is the question of where design has a seat in infrastructure decisions. If your organization's AI infrastructure is being designed entirely by engineering and data science teams without design input, the resulting infrastructure will likely produce design constraints that are difficult to work around at the product layer. Making the case for design involvement in infrastructure decisions requires articulating what design can contribute to those decisions, which is primarily an understanding of the user contexts that the infrastructure needs to serve and the quality standards that user-facing applications will require.

The second consideration is design documentation for AI infrastructure. Infrastructure is only as useful as its documentation, and AI infrastructure documentation presents unique challenges because the behavior of AI components is not fully specifiable in advance. A well-documented AI infrastructure component should include not just its API contract but its performance characteristics, its known failure modes, its behavior in edge cases, and the design contexts in which it has been tested and validated. Creating this documentation is partly a design responsibility, because designers are best positioned to specify the user contexts and quality standards that the documentation needs to address.
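One way to operationalize this kind of documentation is a machine-readable "component card" attached to each shared AI capability, extending the API contract with the behavioral facts described above. The sketch below is hypothetical; the class name, fields, and example values are invented for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical "component card" for a shared AI capability: the API
# contract plus performance, failure modes, and validated contexts.
# Field names and values are illustrative, not a real standard.
@dataclass
class ComponentCard:
    name: str
    api_contract: str                                  # e.g. a schema link
    performance: dict = field(default_factory=dict)    # metric -> value
    known_failure_modes: list = field(default_factory=list)
    validated_contexts: list = field(default_factory=list)

    def is_validated_for(self, context: str) -> bool:
        """Lets a product team check fitness before adopting the component."""
        return context in self.validated_contexts
```

Because the card is structured data rather than a wiki page, adoption checks ("has this component been validated in my context?") can be automated, and the design-owned fields, failure modes and validated contexts, become part of the infrastructure's contract rather than tribal knowledge.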

The third consideration is the governance of shared AI infrastructure. When AI capabilities are shared across multiple product teams, decisions about how those capabilities evolve have consequences for all of the products that depend on them. This requires governance processes that balance the needs of individual product teams against the integrity of the shared infrastructure. Design leaders should be involved in those governance processes, because changes to AI infrastructure frequently have user experience implications that are not immediately visible to the engineering teams managing the infrastructure.

Conclusion

Designing AI as infrastructure is not a technical recommendation. It is a strategic orientation that has implications for how organizations are structured, how design is practiced, and how AI products create value over time. The organizations that are building AI well at scale are not doing so because they have better models or more data. They are doing so because they have made the organizational and design investments required to treat AI as foundational infrastructure rather than as a collection of features.

For design leaders and AI product designers, this means extending the design conversation upstream, into the data systems, model evaluation frameworks, and platform architectures that determine what AI can do in product. It means advocating for infrastructure thinking in organizations that default to feature thinking. And it means developing the conceptual vocabulary to participate in infrastructure decisions that have traditionally been the exclusive domain of engineering and data science.

The user experience of an AI product is not determined by the interface alone. It is determined by the entire stack of infrastructure decisions that produce the AI's behavior. Designing that behavior well requires engaging with that entire stack, which is what infrastructure thinking makes possible.
