
Sunday, September 24, 2017

Azure Event Grid WebHooks - Retries (Part 3)

Building distributed systems is challenging. If not carefully designed and implemented, a failure in one component can cause cascading failures that affect the whole system. That's why patterns like Retry and Circuit Breaker should be considered to improve system resilience. When sending WebHooks, the situation can be even worse: your system is calling a completely external system that has no availability guarantees, over the internet, which is less reliable than your internal network.
Continuing from the previous parts of this series (Part 1, Part 2), I'll show how Azure Event Grid helps overcome this challenge.

Azure Event Grid Retry Policy

Azure Event Grid provides a built-in capability to retry failed requests with exponential backoff: if a WebHook request fails, it is retried with increasing delays.
According to the documentation, a failed request is retried after 10 seconds, and if it fails again, it keeps being retried after 30 seconds, 1 minute, 5 minutes, 10 minutes, 30 minutes, and 1 hour. However, these are not exact intervals, as Azure Event Grid adds some randomization to them.
Events that are not delivered within 2 hours expire. This duration is expected to increase to 24 hours after the preview phase.
This behavior is not trivial to implement, which is another reason to consider a service like Azure Event Grid as an alternative to implementing its capabilities from scratch.
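To make the schedule concrete, the C# sketch below models it in a few lines. It is only an illustration of the documented behavior, not Event Grid's implementation: RetryScheduleSketch is a hypothetical name, the delay list comes from the documentation quoted above, and both the jitter formula (each delay varied by up to +/- a given fraction) and the choice to keep retrying hourly once the list is exhausted are assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the retry schedule described above; not Event Grid's code.
public static class RetryScheduleSketch
{
    // Delays between delivery attempts, as listed in the (preview) documentation.
    private static readonly TimeSpan[] Delays =
    {
        TimeSpan.FromSeconds(10), TimeSpan.FromSeconds(30), TimeSpan.FromMinutes(1),
        TimeSpan.FromMinutes(5), TimeSpan.FromMinutes(10), TimeSpan.FromMinutes(30),
        TimeSpan.FromHours(1)
    };

    // Returns the offsets, measured from the first (failed) attempt, at which retries happen.
    // jitter = 0 gives the nominal schedule; jitter = 0.1 varies each delay by up to +/-10%.
    public static List<TimeSpan> Compute(TimeSpan timeToLive, double jitter, Random random)
    {
        var retries = new List<TimeSpan>();
        var elapsed = TimeSpan.Zero;
        for (int attempt = 0; ; attempt++)
        {
            // Assumption: once the list is exhausted, keep using the last (1 hour) delay.
            TimeSpan delay = Delays[Math.Min(attempt, Delays.Length - 1)];
            double factor = 1 + jitter * (2 * random.NextDouble() - 1);
            elapsed += TimeSpan.FromTicks((long)(delay.Ticks * factor));
            if (elapsed > timeToLive)
            {
                break; // the event has expired, so no further delivery attempts
            }
            retries.Add(elapsed);
        }
        return retries;
    }
}
```

With a 2-hour time-to-live and no jitter, Compute returns seven retries, the last one 6,400 seconds (about 1 hour 47 minutes) after the first attempt; the next hourly retry would fall past the 2-hour limit.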

Testing Azure Event Grid Retry

To try this capability, I built on the example used in Part 1 and changed the AWS Lambda function that receives the WebHook to introduce random failures:

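public object Handle(Event[] request)
{
    Event data = request[0];
    // Subscription validation handshake: echo the validation code back.
    if(data.Data.validationCode!=null)
    {
        return new {validationResponse = data.Data.validationCode};
    }

    // Pick an integer from 1 to 10 (the upper bound of Next is exclusive).
    var random = new Random(Guid.NewGuid().GetHashCode());
    var value = random.Next(1, 11);

    // Values 6 to 10 (half of the range) simulate a failed delivery.
    if(value > 5)
    {
        throw new Exception("Failure!");
    }

    return "";
}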

The random value > 5 check produces roughly a 50% failure rate. When I pushed an event (as shown in the previous posts) to 1,000 WebHook subscribers, the result was the chart below, which shows the number of API calls per minute and the number of 500 errors per minute:


Number of requests per minute (Blue) - Number of 500 Errors per minute (Orange)

We can observe the following:
  • The number of errors (orange) is almost half the number of requests (blue).
  • The number of requests per minute is around 1,500 in the first minute. My explanation is that, with 1,000 listeners and a 50% failure rate, Azure made about 500 extra requests as retries.
  • After a bit less than 2 hours (not shown in the chart due to space constraints), the number of errors dropped to 5 and no more requests were made. This is due to the expiration period during the preview.
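The roughly 50% failure rate follows directly from the handler code: Random.Next(1, 11) returns one of the ten integers 1 to 10 (the upper bound is exclusive), and five of those (6 to 10) are greater than 5. The small C# simulation below, using a hypothetical FailureRateSketch class, checks that estimate empirically with the same draw as the handler.

```csharp
using System;

// Hypothetical simulation of how often the handler's "value > 5" branch throws.
public static class FailureRateSketch
{
    public static double Estimate(int trials, int seed)
    {
        var random = new Random(seed);
        int failures = 0;
        for (int i = 0; i < trials; i++)
        {
            // Same draw as the handler: an integer from 1 to 10 inclusive.
            int value = random.Next(1, 11);
            if (value > 5)
            {
                failures++; // 6..10: the handler would throw here
            }
        }
        return (double)failures / trials;
    }
}
```

Over 100,000 trials the estimate stays very close to 0.5, which matches the orange line being about half of the blue one.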

Summary

Azure Event Grid is a scalable and resilient service that can be used to handle thousands (maybe more) of WebHook receivers. Whether your solution is hosted on premises or on Azure, you can use this service to offload a lot of work and effort.
I wish Azure Event Grid gave some insight into how events are pushed and received, which would help a lot in troubleshooting, as the subscriber is usually not under your control. I hope this will become an integrated part of the Azure portal.
It's worth mentioning that other cloud providers offer services with functionality similar to Event Grid that are worth checking out, specifically Amazon Simple Notification Service (SNS) and Google Cloud Pub/Sub. Both have functionality that overlaps with Azure Event Grid.

Sunday, August 27, 2017

Azure Event Grid WebHooks - Filtering (Part 2)

In my previous post I introduced Azure Event Grid. I demonstrated how simple it is to use Event Grid to push hundreds of events to subscribers using WebHooks.
In today's post I'll show a powerful capability of Event Grid: filters.

What are Filters?

Subscribing to a topic means that all events pushed to this topic will be pushed to the subscriber. But what if the subscriber is interested only in a subset of the events? For example, in my previous post I created a blog topic, and all subscribers to this topic receive notifications about new and updated blog posts, new comments, etc. But some subscribers might be interested only in posts and want to ignore comments. Instead of creating a separate topic for each type of event, which would require separate subscriptions, Event Grid has the concept of filters. Filters are applied to the event content, and events are only pushed to subscribers with matching filters.
The diagram below demonstrates this capability:

Filtering based on Subject prefix/suffix

Azure Event Grid supports two types of filters:
  • Subject prefix and suffix filters.
  • Event type filters.

Subject prefix and suffix filters

In this example I'll use a prefix filter to receive only events whose subject starts with "post", using the --subject-begins-with post parameter.

az eventgrid topic event-subscription create --name postsreceiver --subject-begins-with post --endpoint https://twzm3c5ry2.execute-api.ap-southeast-2.amazonaws.com/prod/post -g rg --topic-name blog

Similarly:
az eventgrid topic event-subscription create --name commentsreceiver --subject-begins-with comment --endpoint https://twzm3c5ry2.execute-api.ap-southeast-2.amazonaws.com/prod/comment -g rg --topic-name blog

An event that looks like:
[
    {
        "id": "2134",
        "eventType": "new",
        "subject": "comments",
        "eventTime": "2017-08-20T23:14:22+1000",
        "data":{
            "content": "Azure Event Grid",
            "postId": "123"
        }
    }
]

will only be pushed to the second subscriber, because its subject "comments" starts with "comment" and therefore matches that filter.
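The prefix/suffix matching described above can be sketched as follows. This is a hypothetical model (SubjectFilterSketch is not part of any Azure SDK); it assumes that an empty prefix or suffix matches every subject and that comparisons are case-insensitive unless isSubjectCaseSensitive is set.

```csharp
using System;

// Hypothetical model of an Event Grid subject filter; not the service's implementation.
public sealed class SubjectFilterSketch
{
    public string SubjectBeginsWith { get; set; } = "";
    public string SubjectEndsWith { get; set; } = "";
    public bool IsSubjectCaseSensitive { get; set; } // false unless set explicitly

    public bool Matches(string subject)
    {
        StringComparison comparison = IsSubjectCaseSensitive
            ? StringComparison.Ordinal
            : StringComparison.OrdinalIgnoreCase;
        // An empty prefix or suffix matches any subject.
        return subject.StartsWith(SubjectBeginsWith, comparison)
            && subject.EndsWith(SubjectEndsWith, comparison);
    }
}
```

With this model, the sample event's subject "comments" matches the "comment" prefix but not the "post" prefix.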


Filtering based on event type

Another way for the subscriber to filter pushed messages is to specify event types. By default, when a new subscription is added, the subscriber's filter data looks like this:
"filter": {
  "includedEventTypes": [
    "All"
  ],
  "isSubjectCaseSensitive": null,
  "subjectBeginsWith": "",
  "subjectEndsWith": ""
}

The includedEventTypes attribute is set to "All", which means that the subscriber will get all events regardless of type.
You can filter on multiple event types by passing space-separated values to the --included-event-types parameter:
az eventgrid topic event-subscription create --name newupdatedreceiver --included-event-types new updated --endpoint https://twzm3c5ry2.execute-api.ap-southeast-2.amazonaws.com/prod/newupdated -g rg --topic-name blog

which results in:
"filter": {
  "includedEventTypes": [
    "new",
    "updated"
  ],
  "isSubjectCaseSensitive": null,
  "subjectBeginsWith": "",
  "subjectEndsWith": ""
}

This means that only events with type "new" or "updated" will be pushed to this subscriber. For example, this event won't be pushed:
[
    {
        "id": "123456",
        "eventType": "deleted",
        "subject": "posts",
        "eventTime": "2017-08-20T23:14:22+1000",
        "data":{
            "postId": "123"
        }
    }
]
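Event type filtering amounts to a membership check. In the hypothetical C# sketch below, "All" is treated as a wildcard and event types are compared case-insensitively; the case-insensitivity is an assumption, not something verified against the service.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of the includedEventTypes filter; not the service's implementation.
public sealed class EventTypeFilterSketch
{
    private readonly List<string> includedEventTypes;

    public EventTypeFilterSketch(params string[] includedEventTypes)
    {
        this.includedEventTypes = includedEventTypes.ToList();
    }

    // "All" matches every event type; otherwise the type must be listed explicitly.
    public bool Matches(string eventType) =>
        includedEventTypes.Any(t =>
            string.Equals(t, "All", StringComparison.OrdinalIgnoreCase)
            || string.Equals(t, eventType, StringComparison.OrdinalIgnoreCase));
}
```

A subscription whose filter sets both event types and a subject prefix or suffix only receives events that pass both checks.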

Summary

Enabling the subscriber to control which events it receives based on subject prefix, suffix, or event type (or a mix of these options) is a powerful capability of Azure Event Grid. Routing events declaratively, without writing any logic on the event source side, significantly simplifies this scenario.

Tuesday, August 22, 2017

Azure Event Grid WebHooks (Part 1)

A few days ago, Microsoft announced the new Event Grid service. The service is described as:
"... a fully-managed intelligent event routing service that allows for uniform event consumption using a publish-subscribe model."
Although not directly related, I see this service as a complement to Microsoft's other serverless offerings, Azure Functions and Logic Apps.

Event Grid has many capabilities and scenarios. In brief, it's a service that is capable of listening to multiple event sources using topics and publishing them to subscribers or handlers that are interested in these events.
Event sources can be Blob storage events, Event Hubs events, custom events, etc., and subscribers can be Azure Functions, Logic Apps, or WebHooks.
In this post I'll focus on pushing WebHooks in a scalable, reliable, pay-as-you-go, and easy manner using Event Grid.

Topics and WebHooks

Topics are a way to categorize events. A publisher defines topics and sends specific events to these topics. Subscribers can subscribe to topics to listen and respond to events published by event sources.
The concept of WebHooks is not new. WebHooks are HTTP callbacks that respond to events that originated in other systems. For example, you can create HTTP endpoints that listen to WebHooks published by GitHub when code is pushed to a specific repository. This creates an almost endless number of integration possibilities.
In this post we'll simulate a blogging engine that pushes events when new posts are published. And we'll create a subscriber that listens to these events.

Creating a topic

The first step to publishing a custom event is to create a topic. Like other Azure resources, Event Grid topics are created in resource groups. To create a new resource group named "rg", we can execute this command using Azure CLI 2.0:

az group create --name rg --location westus2
I chose the westus2 region because Event Grid currently has limited region availability, but this changes all the time.
The next step is to create a topic in the resource group. We'll name our topic "blog":
az eventgrid topic create --name blog -l westus2 -g rg

When you run the above command, the response should look like:

{
  "endpoint": "https://blog.westus2-1.eventgrid.azure.net/api/events",
  "id": "/subscriptions/5f1ef4e8-6358-4a75-b171-58904114fb57/resourceGroups/rg/providers/Microsoft.EventGrid/topics/blog",
  "location": "westus2",
  "name": "blog",
  "provisioningState": "Succeeded",
  "resourceGroup": "rg",
  "tags": null,
  "type": "Microsoft.EventGrid/topics"
}
Observe the endpoint attribute. Now we have the URL to be used to push events: https://blog.westus2-1.eventgrid.azure.net/api/events.


Subscribing to a topic

To show the capabilities of Event Grid, I need to create hundreds of subscribers. You can create your subscribers in any HTTP-capable framework. I chose to use AWS Lambda functions + API Gateway hosted in the Sydney region. This shows that there is no Azure magic involved: just pure HTTP WebHooks sent from Azure's data centers in the western US to AWS data centers in Sydney.
The details of creating Lambda functions and exposing them using API Gateway aren't relevant to this post. The important thing is that I have an endpoint that listens to HTTP requests on https://twzm3c5ry2.execute-api.ap-southeast-2.amazonaws.com/prod/{id} and forwards them to an AWS Lambda function implemented in C#.
The command to create a subscription looks like:

az eventgrid topic event-subscription create --name blogreceiver   --endpoint https://twzm3c5ry2.execute-api.ap-southeast-2.amazonaws.com/prod/   -g rg  --topic-name blog 

I created 100 subscriptions using this simple PowerShell script:

for ($val = 1; $val -le 100; $val++) { az eventgrid topic event-subscription create --name blogreceiver$val --endpoint https://twzm3c5ry2.execute-api.ap-southeast-2.amazonaws.com/prod/$val -g rg --topic-name blog }

One important thing to notice is the security implications of this model. If I were able to register any URL as a subscriber to my topic, I'd be able to use Azure Event Grid as a DDoS attack tool. That's why subscription verification is very important.

Subscription verification

To verify that the subscription endpoint is a real URL and is really willing to subscribe to the topic, a verification request is sent to the subscription endpoint when the subscription is created. This request looks like:

[
{
    "Id": "dbb80f11-6fbb-4fc3-9c1f-034f00da3b5f",
    "Topic": "/subscriptions/5f1ef4e8-6358-4a75-b171-58904114fb57/resourceGroups/rg/providers/microsoft.eventgrid/topics/blog",
    "Subject": "",
    "Data": {
        "validationCode": "4fc3f59c-2d03-41f4-b466-da65a81f8ba5"
    },
    "EventType": "Microsoft.EventGrid/SubscriptionValidationEvent",
    "EventTime": "2017-08-20T11:11:00.0101361Z"
}
]

The validationCode attribute has a unique key to identify the subscription request. The endpoint should respond to the verification request with the same code:

{"validationResponse":"4fc3f59c-2d03-41f4-b466-da65a81f8ba5"}
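The handshake itself does not depend on AWS Lambda. As a framework-agnostic illustration, the C# sketch below (ValidationSketch is a hypothetical name) parses the raw request body with System.Text.Json, which assumes .NET Core 3.0 or later, and builds the validation response only when the event type is Microsoft.EventGrid/SubscriptionValidationEvent. Property names are matched case-insensitively because the sample request above uses PascalCase (EventType, Data) while the events published later use camelCase; it also assumes the event type and validation code are JSON strings.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper illustrating the subscription validation handshake.
public static class ValidationSketch
{
    private const string ValidationEventType = "Microsoft.EventGrid/SubscriptionValidationEvent";

    // Returns the JSON body to send back for a validation request,
    // or null when the batch contains no validation event.
    public static string TryBuildValidationResponse(string requestBody)
    {
        using JsonDocument doc = JsonDocument.Parse(requestBody);
        // Event Grid posts an array of events.
        foreach (JsonElement evt in doc.RootElement.EnumerateArray())
        {
            JsonElement? eventType = FindProperty(evt, "eventType");
            if (eventType?.GetString() != ValidationEventType)
            {
                continue;
            }

            JsonElement? data = FindProperty(evt, "data");
            JsonElement? code = data.HasValue ? FindProperty(data.Value, "validationCode") : null;
            if (code?.GetString() is string validationCode)
            {
                // Echo the same code back: {"validationResponse":"<code>"}
                return JsonSerializer.Serialize(new { validationResponse = validationCode });
            }
        }
        return null;
    }

    // Looks up a property ignoring case (JsonElement.TryGetProperty is case-sensitive).
    private static JsonElement? FindProperty(JsonElement obj, string name)
    {
        if (obj.ValueKind != JsonValueKind.Object)
        {
            return null;
        }
        foreach (JsonProperty property in obj.EnumerateObject())
        {
            if (string.Equals(property.Name, name, StringComparison.OrdinalIgnoreCase))
            {
                return property.Value;
            }
        }
        return null;
    }
}
```

Checking the event type, rather than only the presence of a validationCode field, keeps ordinary events that happen to carry a similar field from being treated as validation requests.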

The subscriber

The subscriber is very simple. It checks whether the request has a validation code. If so, it responds with the validation response. Otherwise it just returns 200 or 202.


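    public class Event
    {
        public Data Data { get; set; }
    }
    public class Data
    {
        // Only present in the subscription validation event.
        public string validationCode { get; set; }
    }

    public class Receiver
    {
        public object Handle(Event[] request)
        {
            Event data = request[0];
            // Subscription validation handshake: echo the validation code back.
            if(data.Data.validationCode!=null)
            {
                return new {validationResponse = data.Data.validationCode};
            }
            // Any other event: return an empty body (API Gateway sets the 200 status code).
            return "";
        }
    }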

Note that the AWS API Gateway is responsible for setting the status code to 200.

 

Pushing events

As I showed above, I created 100 subscribers. Now it's time to start pushing events, which is a simple POST request, but of course the request must be authenticated. The supported authentication methods are Shared Access Signature (SAS) and keys. I'll use the latter for simplicity.
To retrieve the key, you can use the management portal or this command:

az eventgrid topic key list --name blog --resource-group rg
To configure my .NET Core console application that pushes the events, I created two environment variables using PowerShell:
$env:EventGrid:EndPoint = "https://blog.westus2-1.eventgrid.azure.net/api/events"
$env:EventGrid:Key = "HQI2Ff7MoqlV8RFc/U........."
I created a class to bind the configuration values to:

class EventGridConfig
{
    public string EndPoint { get; set; }
    public string Key { get; set; }
}
The rest is simple: read the configuration variables and post an event to the endpoint.

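// Needs the Microsoft.Extensions.Configuration.EnvironmentVariables
// and Microsoft.Extensions.Configuration.Binder NuGet packages.
using System.Net.Http;
using System.Text;
using Microsoft.Extensions.Configuration;

// Read EventGrid:EndPoint and EventGrid:Key from the environment variables set above.
var builder = new ConfigurationBuilder().AddEnvironmentVariables();
IConfiguration configuration = builder.Build();
var config = new EventGridConfig();
configuration.GetSection("EventGrid").Bind(config);

var http = new HttpClient();
string content = @"
    [
        {
            ""id"": ""123"",
            ""eventType"": ""NewPost"",
            ""subject"": ""blog/posts"",
            ""eventTime"": ""2017-08-20T23:14:22+1000"",
            ""data"":{
                ""title"": ""Azure Event Grid"",
                ""author"": ""Hesham A. Amin""
            }
        }
    ]";

// Authenticate with the topic key.
http.DefaultRequestHeaders.Add("aeg-sas-key", config.Key);
var result = http.PostAsync(config.EndPoint,
    new StringContent(content, Encoding.UTF8, "application/json")).Result;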
Now it's Azure Event Grid's turn to push this event to the 100 subscribers.
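Instead of a hand-written JSON string, the payload can also be serialized from objects, which avoids quoting and escaping mistakes. The C# sketch below assumes System.Text.Json (.NET Core 3.0 or later) and uses a hypothetical PublishSketch class; the endpoint and key would come from the EventGridConfig values shown above, and the aeg-sas-key header is the same one used in the console application.

```csharp
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper for publishing custom events to an Event Grid topic endpoint.
public static class PublishSketch
{
    // Builds the POST request: a JSON array of events plus the topic key header.
    public static HttpRequestMessage BuildRequest(string endpoint, string key, object[] events)
    {
        string json = JsonSerializer.Serialize(events);
        var request = new HttpRequestMessage(HttpMethod.Post, endpoint)
        {
            Content = new StringContent(json, Encoding.UTF8, "application/json")
        };
        request.Headers.Add("aeg-sas-key", key);
        return request;
    }

    // Sends the events and throws if the response status code does not indicate success.
    public static async Task PublishAsync(HttpClient http, string endpoint, string key, params object[] events)
    {
        using HttpRequestMessage request = BuildRequest(endpoint, key, events);
        using HttpResponseMessage response = await http.SendAsync(request);
        response.EnsureSuccessStatusCode();
    }
}
```

Separating BuildRequest from PublishAsync makes the request easy to inspect without calling Azure, and PublishAsync awaits the call instead of blocking on .Result.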

The result

Running the above console application sends a request to Azure Event Grid, which in turn sends the event to the 100 subscribers I created.
To see the result, I used the AWS API Gateway CloudWatch graphs, which show the number of requests to my endpoint. I ran the application a few times and the result was this graph:
Requests per minute


Summary

In this post I've shown how to use Azure Event Grid to push WebHooks to HTTP endpoints and how to subscribe to these WebHooks.
In the next posts I'll explore more capabilities of Azure Event Grid.

Thursday, August 17, 2017

My AWS IaaS playlist for Arabic speakers

If you're an Arabic speaker and interested in learning about AWS IaaS, check out my AWS IaaS [Arabic] YouTube playlist. In this series of videos I go step by step through creating a scalable, secure web application using AWS's infrastructure as a service offering.
I'm following a problem-solution approach. I start with a very basic but functional solution, I identify the challenges the solution has, then I move to the next step in a logical progression towards achieving the end goal.




And if you have no idea what capabilities AWS has, you can check my introductory video. It's a bit dated but still relevant.

Sunday, July 23, 2017

My talk at DDDSydney 2017

It was very exciting to attend and speak at DDDSydney 2017. A lot of interesting topics were presented, and the organizers did a good job of grouping the sessions into tracks that one could follow to get a complete picture of a particular area of interest. For example, my session "Avoiding death by a thousand containers. Kubernetes to the rescue!" was the last in a track that had sessions about microservices and Docker. That made it a logical conclusion on how to host containerized microservices in a highly available and easy-to-manage environment.

In my demos I used AWS. This choice was intentional: AWS doesn't support Kubernetes out of the box the way both Google Container Engine (GKE) and Azure Container Service (ACS) do, and I wanted to show that Kubernetes can be deployed to other environments as well. Kops (Kubernetes Operations) made it relatively easy to deploy the Kubernetes cluster on AWS.
In this session I showed how to expose services using an external load balancer and how Deployments make it easy to declare the desired state of the Pods deployed to Kubernetes. I also demonstrated the very powerful concept of Labels and Selectors, which is a loosely coupled way to connect services to the Pods that contain the service logic.
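The idea behind Labels and Selectors can be illustrated with an equality-based selector: a Service selects every Pod whose labels include all of the selector's key/value pairs, and any extra labels on the Pod are ignored. The C# sketch below is only a model of that rule, not Kubernetes code; LabelSelectorSketch is a hypothetical name, and label names such as app=web are illustrative.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of an equality-based label selector; not Kubernetes code.
public static class LabelSelectorSketch
{
    // A Pod matches when it carries every key/value pair in the selector;
    // additional labels on the Pod do not affect the result.
    public static bool Matches(
        IReadOnlyDictionary<string, string> selector,
        IReadOnlyDictionary<string, string> podLabels) =>
        selector.All(pair =>
            podLabels.TryGetValue(pair.Key, out string value) && value == pair.Value);
}
```

Because a Service only knows about labels, Pods can be replaced or scaled freely; any new Pod carrying matching labels is picked up without changing the Service.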


I also demonstrated how easy it is to update a deployment by switching from Nginx to Apache (httpd).
In another demo I wanted to show how to connect services inside the cluster. I made a simple .NET Core web application that counts the number of hits each frontend gets. The hit count is stored in a Redis instance that's exposed through a service.


The interesting part is how the web application determines the address of the Redis instance. As a Docker image should be immutable once created, configuration should be stored in the environment.
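A minimal C# sketch of reading the Redis address from the environment follows. Kubernetes injects NAME_SERVICE_HOST and NAME_SERVICE_PORT variables for services that already exist when a Pod starts; the fallback to a DNS name assumes the service is named "redis" in the same namespace, and 6379 is simply Redis's default port. RedisAddressSketch is a hypothetical name, and the environment lookup is passed in as a function so the logic can be exercised without setting real variables.

```csharp
using System;

// Hypothetical helper: resolve the Redis address from Kubernetes-injected variables.
public static class RedisAddressSketch
{
    public static string Resolve(Func<string, string> getEnv)
    {
        // Set by Kubernetes only if the Redis service existed before the Pod started;
        // otherwise fall back to DNS-based discovery (assumed service name "redis").
        string host = getEnv("REDIS_SERVICE_HOST") ?? "redis";
        // 6379 is Redis's default port, used here as an assumed fallback.
        string port = getEnv("REDIS_SERVICE_PORT") ?? "6379";
        return $"{host}:{port}";
    }
}
```

In the application itself, the call would be RedisAddressSketch.Resolve(Environment.GetEnvironmentVariable).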

The web application uses the environment variable REDIS_SERVICE_HOST to get the address of the Redis service. Kubernetes populates this environment variable automatically because the Redis service is created before the web application deployment; otherwise, DNS-based service discovery could be used. I used a simple script to hit the web API, and I also manually deleted Pods hosting the web API; thanks to Kubernetes' desired-state magic, it kept creating new instances automatically. This was the result of hitting the service:


Requests go through AWS load balancing to Kubernetes nodes. The service passes the requests to Pods hosting the API.

Kubernetes is one of the fast-moving open source projects, and I think the greatest thing about it is the community and wide support. So if you're planning to host containerized workloads, give it a try!



Saturday, May 20, 2017

Detecting applications causing SQL Server locks

In one of our test environments, login attempts to a legacy web application that uses MS SQL Server were timing out and failing. I suspected that another process might be locking one of the tables needed in the login process.
I ran a query similar to this:

SELECT request_mode,
 request_type,
 request_status,
 request_session_id,
 resource_type,
 resource_associated_entity_id,
 CASE resource_associated_entity_id 
  WHEN 0 THEN ''
  ELSE OBJECT_NAME(resource_associated_entity_id)
 END AS Name,
 host_name,
 host_process_id,
 client_interface_name,
 program_name,
 login_name
FROM sys.dm_tran_locks
JOIN sys.dm_exec_sessions
 ON sys.dm_tran_locks.request_session_id = sys.dm_exec_sessions.session_id
WHERE resource_database_id = DB_ID('AdventureWorks2014')


Which produces a result similar to:



It shows that an application has been granted an exclusive lock on the EmailAddress table, and another query is waiting for a shared lock to read from the table. But who is holding this lock? In my case, by checking the client_interface_name and program_name columns in the result, we could identify that a long-running VBScript import job was locking the table. I created a simple application that simulates a similar condition, which you can check out on GitHub. You can run the application and then run the query to see the results.

It's a good practice to include the "Application Name" property in your connection strings (as in the provided application source code) to make diagnosing this kind of problem easier.
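As an illustration, the C# sketch below sets the Application Name keyword when building a connection string, so the value shows up in the program_name column returned by the query above. It uses the generic DbConnectionStringBuilder from System.Data.Common to stay provider-neutral; ConnectionStringSketch is a hypothetical name, and the server, database, and application names are placeholders.

```csharp
using System.Data.Common;

// Hypothetical helper: build a SQL Server connection string that identifies the caller.
public static class ConnectionStringSketch
{
    public static string Build(string server, string database, string applicationName)
    {
        var builder = new DbConnectionStringBuilder();
        builder["Data Source"] = server;
        builder["Initial Catalog"] = database;
        builder["Integrated Security"] = "true";
        // Reported by SQL Server as program_name in sys.dm_exec_sessions.
        builder["Application Name"] = applicationName;
        return builder.ConnectionString;
    }
}
```

For example, Build(".", "AdventureWorks2014", "LegacyImportJob") would make a long-running import job easy to spot in the lock query's results.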

Saturday, February 18, 2017

Abuse of Story Points

Relative estimates are usually recommended in Agile teams. However, nothing mandates specific sizing units like story points or T-shirt sizes. I believe that, used correctly, relative estimation is a powerful and flexible tool.
I usually prefer T-shirt sizing for road-mapping, to determine which features will be included in which releases. When epics are too large and subject to many changes, it makes sense to use an estimation technique that is quick and fun and doesn't give a false indication of accuracy.
At the release level, estimating backlog items using story points helps with planning and creates a shared understanding among all team members. However, when story points are used incorrectly, the team can get really frustrated and might try to avoid them in favor of another estimation technique.

In a team I'm working with, one of the team members suggested during a sprint retrospective to change the estimation technique from story points to T-shirt sizing. The reasons were:
  • Velocity (measured by story points achieved in a sprint) is sometimes used to compare the performance of different teams.
  • Story points are used as a tool to force the team to do a specific amount of work during a sprint.
Both reasons make a good case against the use of story points.

The first one clearly contradicts the relative nature of story points, as each team has a different capacity and baseline for its estimates. Also, the fact that some teams use velocity as a primary success metric is a sign of a crappy agile implementation.
The second point is also a bad sign. You simply get what you ask for: if the PO/SM/Manager wants higher velocity, then inflated estimates are what (s)he gets, quite similar to the observer effect.

Fortunately, in our case both of these concerns were based on observations from other teams. Both the Product Owner and Scrum Master were knowledgeable enough to avoid these pitfalls, and they explained that our team uses velocity just as a planning tool. However, the fact that some team members might be affected by the surrounding atmosphere in the organization is interesting and draws attention to the importance of having a consistent level of maturity and education.

What is your experience with using story points or any other estimation technique? What worked for you and what didn’t? Share your thoughts in a comment below.