By Barak Engel
Founder/Chief Geek: EAmmune, CISO: StubHub, CISO: Amplitude, CISO: Bond.Tech, vCISO: TraceData
Barak is a highly experienced CISO, established security expert, and respected industry veteran. As the originator of the "virtual CISO" concept, Barak, with his company EAmmune, has built and managed, as the named CISO, dozens of security organizations across many industries, for notable brands such as MuleSoft and StubHub. A frequent and sought-after public speaker, his highly pragmatic, no-nonsense, and occasionally whimsical approach to security management is captured in his recently published book "Why CISOs Fail: The Missing Link in Security Management, and How to Fix It".
Over the last couple of decades, rapid technology development and the shift to the cloud have placed tremendous strain on engineering and technology departments, as organizations rush to develop tools and software that make use of all the juicy opportunities the cloud has to offer.
Moreover, according to Gartner, 83% of all internet traffic in 2020 was API calls. This has become a new challenge for security professionals across industries.
The biggest challenge of operating in the cloud is the introduction of abstraction layers. Processes used to be hardware-based: the owner of a device managed its operating system top to bottom. Now it is the opposite. Every layer of the stack is suddenly managed for us. Customers of any cloud have no access to the network; the hypervisor is managed for them, so there is no access to the virtualization layer either. This lack of visibility makes it much harder to understand the environment in detail. The same is happening in applications: APIs, with XML being their earliest form, are abstraction layers over the underlying application.
As applications move to the cloud, enterprise customers of SaaS or PaaS vendors become more interested in how their data is handled in a cloud environment managed on their behalf. Yet the fact that hundreds or even thousands of cloud platform employees could have access to every component of the products running in that cloud is routinely ignored. Those employees could capture the encryption keys if they had the intention to do so. That is why fuzzy boundaries in data, systems, operations, and responsibilities are the challenge we're all trying to deal with.
The shared responsibility model is an attempt to capture this problem. Broadly, it asserts that there is a safe sandbox to interact with and an element of trust that has to be built: the vendor does not look at what happens inside the application sandbox, but if illegal activity occurs, it will cooperate with law enforcement, so clients remain responsible for what they do. In this context, the notion of clear control and strict responsibility boundaries no longer holds.
Coming back to the theme of how to protect an API. Web application traffic is fairly standardized, so you can build a WAF that relies on signatures. At the end of the day, the protocol is standard: ports 80 and 443 (HTTP/HTTPS) operate the same way everywhere, and the transactions within them are typically standardized too. But that is no longer true for an API. While an API might connect on 443, the language that drives it is completely customized. So what a straightforward WAF can do, an API firewall can't: it cannot rely on prior knowledge of the protocol.
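To make this concrete, here is a minimal sketch of why a signature-based check breaks down on a custom API. The signature patterns and the custom payload encoding are hypothetical, purely for illustration:

```python
import re

# Simplified signature rules of the kind a traditional WAF ships with
# (hypothetical patterns for illustration only).
SIGNATURES = [
    re.compile(r"(?i)\bunion\s+select\b"),   # classic SQL injection
    re.compile(r"(?i)<script\b"),            # reflected XSS
]

def signature_waf(raw_body: bytes) -> bool:
    """Return True if any known signature matches the raw request body."""
    text = raw_body.decode("utf-8", errors="ignore")
    return any(sig.search(text) for sig in SIGNATURES)

# A classic attack in a standard HTML form body is caught:
form_body = b"user=admin&query=1 UNION SELECT password FROM users"
print(signature_waf(form_body))   # True

# The same intent expressed in the application's own custom encoding
# slips through, because the WAF has no knowledge of that "language":
custom_body = b'{"op":"q","f":[{"t":"u+s","cols":["password"],"src":"users"}]}'
print(signature_waf(custom_body))  # False
```

The second request is just as malicious in intent, but no amount of signature tuning helps when the protection mechanism cannot parse the dialect it is inspecting.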
Technology stacks shift constantly. Some technology becomes trendy, everybody copies it, and eventually it is replicated across most organizations. Additionally, engineers move from company to company quite rapidly and carry their virtual stack with them. That stack gets implemented because there is no time to explore others; engineers have to develop quickly. All of this creates another strain on mature technology management.
Furthermore, development increasingly depends on extreme specialization, with distributed codebases containing components from many different contributors, authors, and sources. So if a team member leaves, in most cases nobody wants to touch their code since it seems to work well. The problem then lies in the security world: hackers love such code, because nobody is maintaining it and they might find a way to break it.
Another complexity hides in the way we write code. Many engineering teams now assemble code rather than writing it from scratch, out of building blocks and libraries that have already been built to help create applications. But how many engineers actually inspect the code they incorporate from those libraries? That discipline is hard to develop when the engineering team is incentivized to release, not to maintain. And human mistakes will happen in any case.
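Even a crude automated check helps here. Below is a minimal sketch of comparing pinned dependencies against a known-advisory list; the package names, versions, and advisories are entirely hypothetical:

```python
# Hypothetical advisory data: package name -> versions with known issues.
KNOWN_ADVISORIES = {
    "left-padder": {"1.0.0", "1.0.1"},
    "fast-xml": {"2.3.0"},
}

def vulnerable_deps(lockfile: dict) -> list:
    """Return 'name==version' strings for pinned deps with known advisories."""
    return [
        f"{name}=={version}"
        for name, version in lockfile.items()
        if version in KNOWN_ADVISORIES.get(name, set())
    ]

deps = {"left-padder": "1.0.1", "fast-xml": "2.4.0", "requests-like": "0.9"}
print(vulnerable_deps(deps))  # ['left-padder==1.0.1']
```

Real-world equivalents exist as dependency-audit tooling; the point is that assembled code needs the same scrutiny as code you wrote yourself.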
Next, continuous integration and delivery create other boundary challenges. The discipline of DevOps can be split into two different flavors: DEVops and devOPS.
Which portion is stressed is specific to each organization and is defined, typically by implication and hidden assumption, inside the company. But at the end of the day, engineering now runs operations in the flavor of modern CI/CD. And for more traditional technology management, this is a challenge, because there is often no clear boundary between the team that creates the product and the team that manages the manifestation, the running environment, of that product. APIs and the underlying applications iterate rapidly. SaaS or PaaS providers applying CI/CD may release 6, 12, or 15 times a day. That means a formal approval chain of command for changes is impossible to apply; there is simply no time for a committee to evaluate every change. It's all done via automation.
In a way, it can feel really nice to put a traditional WAF in front of an API endpoint. The WAF, not understanding the underlying API language, will not report much, and you'll get a false sense of security. The absence of alerts from the API endpoint will make you think you are protected, which is not true.
Secondly, API endpoints are public. So they are both public and customized, conforming to nothing else that we know. They misleadingly seem easy to protect by applying the same controls that would suit anything else sitting on a public interface. And that is ideal for attackers, who can constantly test various attack scenarios without raising any alarms, because the protection mechanisms do not even recognize the underlying protocol they are trying to protect. The false sense of security created this way is probably one of the bigger risks in the world of API management.
And the last bit is that you can no longer rely on any tool that uses signatures. Since everything is written for a particular application, in a particular environment, there’s no real way to build a product that relies on pre-acquired knowledge. And this, ultimately, leads you to use artificial intelligence.
The second leg of that stool is API endpoint testing, which is difficult and costly. You can't easily automate a tool to test a custom language; it's like checking English grammar with a deep knowledge of Swahili. Hunting for OWASP issues will not help either, because the API interactions are written in a language your tool simply doesn't understand. Unique, customized data-exchange mechanisms are impossible to test effectively with any sort of automated tooling alone, without the addition of manual effort. Moreover, the handling of the data is masked by multiple layers of abstraction, and the tool doesn't even know where to look. Scans fail for the same reasons.
The way you typically test an API endpoint is with the help of Burp Suite, iterating through all of the known calls. Typically the organization will test all the public and private calls it can; a hacker or pentester aiming at the API will perform exactly the same iterations. But the problem here is the high cost, since the analysis has to be performed case by case by experts with specialized skills. Proper penetration testing of API endpoints in an organization publishing hundreds or thousands of APIs, performed regularly, gets expensive quite quickly.
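The iteration itself is mechanical, which is exactly why both testers and attackers perform it the same way. Here is a minimal sketch of generating single-parameter mutations for a set of API calls, the kind of request list one might feed into Burp Suite or any HTTP client; the call inventory and payloads are hypothetical placeholders:

```python
# Hypothetical inventory of known API calls: (method, path, params).
API_CALLS = [
    ("GET", "/v1/users", {"id": "123"}),
    ("GET", "/v1/orders", {"order_id": "555", "expand": "items"}),
]

# Typical probe payloads: injection characters, boundary values, traversal.
MUTATIONS = ["'", "-1", "999999999", "%00", "../../etc/passwd"]

def mutated_requests(calls, payloads):
    """Yield (method, path, query string) for every single-parameter mutation."""
    for method, path, params in calls:
        for key in params:
            for payload in payloads:
                mutated = {**params, key: payload}
                qs = "&".join(f"{k}={v}" for k, v in mutated.items())
                yield method, path, qs

# Two calls with 1 and 2 parameters, 5 payloads each -> 15 probe requests.
probes = list(mutated_requests(API_CALLS, MUTATIONS))
print(len(probes))  # 15
```

The expensive part is not generating these requests; it is having an expert interpret each response in the context of that one custom API, which is what drives the cost up at scale.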
What sometimes results is a situation where an organization orders a pentest against its API endpoints, which, as we know, are fully customized.
And without the necessary prior knowledge, the tools detect nothing. So all the automated (recon, or discovery) tests fail, and the report contains no findings, which looks like a wonderful pentest report: the public interface in your environment, your website, is so secure that we could not find any holes, issues, weaknesses, or vulnerabilities, and we effectively can't even "see" the endpoint! This is sometimes called a "stealth" website, implying that hackers targeting your environment will ignore it since it's invisible. That is total nonsense. These situations happen much more frequently than anybody likes to admit.
And the third leg of the stool is monitoring API endpoints, which is also difficult. We mentioned that signatures are generally useless; there are no ready-made patterns we can use, because if the API is customized, the malicious activity is going to be customized as well. So to start detecting malicious activity against a particular API endpoint, we first have to recognize the patterns of that endpoint under normal conditions. And then repeat that for every API endpoint.
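A minimal sketch of what "learning normal conditions" per endpoint could look like, assuming we baseline just two features, the set of parameter names and typical value lengths. The endpoints, thresholds, and features are illustrative assumptions, not a production design:

```python
from collections import defaultdict

class EndpointBaseline:
    """Learn per-endpoint 'normal' traffic, then flag deviations."""

    def __init__(self):
        self.param_names = defaultdict(set)  # endpoint -> seen parameter names
        self.max_len = defaultdict(int)      # endpoint -> longest seen value

    def learn(self, endpoint, params):
        self.param_names[endpoint] |= set(params)
        for value in params.values():
            self.max_len[endpoint] = max(self.max_len[endpoint], len(value))

    def anomalies(self, endpoint, params):
        issues = []
        unknown = set(params) - self.param_names[endpoint]
        if unknown:
            issues.append(f"unknown params: {sorted(unknown)}")
        for key, value in params.items():
            if len(value) > 2 * self.max_len[endpoint]:  # crude length heuristic
                issues.append(f"oversized value for {key!r}")
        return issues

baseline = EndpointBaseline()
for _ in range(100):  # training window of normal requests
    baseline.learn("/v1/users", {"id": "12345", "fields": "name,email"})

print(baseline.anomalies("/v1/users", {"id": "12345"}))               # []
print(baseline.anomalies("/v1/users", {"id": "1" * 80, "dbg": "1"}))  # flags both
```

Even this toy version shows the scaling problem the text describes: the baseline is only valid for the one endpoint it was trained on, so the whole exercise has to be repeated for every API you publish.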
Let's project into the future. Let's say you tried to put a control mechanism in front of an API. But the underlying application that the API serves includes artificial intelligence that constantly adjusts the way it interacts with other parties. So not only do you have no patterns and no usable signatures; even if you analyze the API properly today, your analysis might be outdated a week from now. It's probably fair to predict that in the future we will end up with a scenario where one AI attempts to attack and another AI attempts to protect the same application.
We talked about the three legs of the stool: why APIs are so difficult to protect, so difficult to test, and so difficult to monitor.
What would be a list of traits we would desire from an API protection mechanism? We could call it an API firewall, though I think that term may already be misused.
The first trait, clearly, is that it has to be adaptive: it has to learn what the API endpoint is trying to do and adapt itself to that API, with all of its internal complexity as an abstraction layer, before we can pretend to build some sort of protection around it. And we can't rely on humans; we have to find some clever automation, some sort of artificial intelligence, to perform this function rapidly.
It has to be data-centric: it needs to look at the data impact of the underlying payload, digging through all of those abstraction layers to actually understand what it is trying to protect. And again, none of that is known in advance, because each API is custom and the protocol it sits on is custom. What really matters is the impact on the data, and that has practical consequences: if we try to protect against everything, the API endpoint itself will slow to a crawl and become unusable. What is helpful about API customizations is that they are usually very narrow, so once you understand them, they become really well defined. It's just that they are only well defined within their own context.
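As a sketch of that data-centric digging, consider recursively walking an arbitrarily nested payload to surface leaf fields whose values look sensitive, regardless of how many abstraction layers wrap them. The payload shape and the detection patterns here are illustrative assumptions:

```python
import json
import re

# Hypothetical sensitive-data patterns (illustrative, not exhaustive).
SSN = re.compile(r"^\d{3}-\d{2}-\d{4}$")
CARD = re.compile(r"^\d{13,16}$")

def sensitive_leaves(node, path=""):
    """Yield (json-path, label) for leaf values matching sensitive patterns."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from sensitive_leaves(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from sensitive_leaves(value, f"{path}[{i}]")
    elif isinstance(node, str):
        if SSN.match(node):
            yield path, "ssn-like"
        elif CARD.match(node):
            yield path, "card-like"

payload = json.loads(
    '{"op":"sync","batch":[{"meta":{"ref":"a1"},"data":{"tax_id":"123-45-6789"}}]}'
)
print(list(sensitive_leaves(payload)))  # [('.batch[0].data.tax_id', 'ssn-like')]
```

Nothing about the field name `tax_id` or its nesting depth was known in advance; the walk focuses on what actually matters, the data itself.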
But you can create something that automatically adjusts, rapidly wrapping itself as tightly as possible around one API, in its own particular unique way that will fit no other, and then repeating the same process for the next API. This approach will work. It also satisfies one of the big wishes: we're not generating a lot of noise, because the tighter we wrap around each API endpoint, the fewer false positives we get. If you've been in security, you've seen plenty of tools that generate enormous amounts of noise; this sort of approach takes care of that.
It has to be targeted. Each protection has to be targeted specifically at that protocol layer, then the data layer, then the API endpoint. And you cannot build these in advance; they have to build themselves to some degree. Otherwise, implementation costs skyrocket once you try to scale beyond one or two API endpoints.
And so all of that leads to the conclusion that it has to be behavioral. The idea of behavioral AI is not new. In the early days of web application firewalls, some had a cool behavioral implementation: because protocols were well understood, WAFs could look at the protocol implementation, learn traffic patterns and behavior, and flag when a pattern fell out of the norm. But they could all rely on the fundamental protocol (HTTP/S) behavior.
A number of challenges, one solution
To sum up, we discussed a number of challenges security professionals face while developing an API and web application security strategy for their infrastructure.
The Wallarm WAF solution addresses all the issues discussed above in an effective way, using a single platform. Regardless of where your applications run, what tech stack they are built on, and how much traffic they consume, you can get your applications, APIs, websites, and any other workloads protected. And that is pretty remarkable.