Domain Events With Convention-Based Registration And Deferred Execution Support

In DDD, we establish bounded contexts and define how they will exchange information, which often occurs by raising events. According to Eric Evans, you should:

“Describe the points of contact between the models, outlining explicit translation for any communication and highlighting any sharing.”

These contexts will often communicate as part of the same database transaction. That’s actually desired, since it ensures the integrity of the whole process. However, in the real world things can be tricky. Some actions might depend on committed transactions in order to be triggered. This happens, for instance, with services that run outside of the ongoing database transaction. You don’t want to send an email notifying the user that something changed and have the whole operation rolled back seconds later. The same thing happens when our application consumes distributed services that do not support distributed transactions.

What We Want To Achieve

Let’s see our main goals. There are many good ways to implement domain event handlers. This implementation will use a dependency container to control the lifetime of the events raiser, register handlers with the container in a very transparent way and support delayed execution. Let’s review:

Effortless registration

In order to create a new handler for an event, we don’t want to look for a distant class in another project that holds hundreds of event-to-handler bindings and add new lines to it. To that end, we might want to consider using a dependency container.

Deferred execution support

Handlers that invoke code that is not bound to the transaction should be fired only once the transaction is committed.

Practical Example: Paying an Order on an E-Commerce App

Let’s take as an example the payment of an order in an e-commerce application. Once the payment is processed, an event is raised. Then the bounded context that processes customer orders handles the event and places the order for shipment. Furthermore, the bounded context that manages the inventory handles the same event to subtract the ordered items from the stock.

[Figure: the payment event flowing to the orders and inventory bounded contexts. Created with Balsamiq Mockups.]

Firing The Event

PaymentService is instantiated by the dependency container. An event is raised on the last line of the PayOrder method.

    public class PaymentService : IPaymentService
    {
        private readonly IDomainEventsRaiser _events;
        private readonly IRepository<Order> _orderRepository;
        private readonly IRepository<Payment> _paymentRepository;

        public PaymentService(IRepository<Payment> paymentRepository, IRepository<Order> orderRepository, IDomainEventsRaiser events)
        {
            _paymentRepository = paymentRepository;
            _orderRepository = orderRepository;
            _events = events;
        }

        public void PayOrder(int orderId, decimal amount)
        {
            Order order = _orderRepository.GetById(orderId);

            Payment payment = new Payment()
            {
                OrderId = orderId,
                Amount = amount
            };

            _paymentRepository.Insert(payment);

            _events.Raise(new OrderPaidEvent(payment, order.Items));
        }
    }

IDomainEventsRaiser has the responsibility of raising events, and it needs only one method to do so:

    public interface IDomainEventsRaiser
    {
        /// <summary>
        /// Raises the given domain event
        /// </summary>
        /// <typeparam name="T">Domain event type</typeparam>
        /// <param name="domainEvent">Domain event</param>
        void Raise<T>(T domainEvent) where T : IDomainEvent;
    }

What makes OrderPaidEvent an event is the fact that it implements the IDomainEvent interface. This interface does not actually expose any service; its only purpose is to serve as a generic constraint on the Raise method of IDomainEventsRaiser, grouping the classes that represent an event.
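
For reference, the marker interface is just an empty declaration:

    public interface IDomainEvent
    {
    }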

    public class OrderPaidEvent : IDomainEvent
    {
        public OrderPaidEvent(Payment payment, IEnumerable<OrderItem> orderItems)
        {
            Payment = payment;
            OrderItems = new List<OrderItem>(orderItems);
        }

        public Payment Payment { get; private set; }

        public List<OrderItem> OrderItems { get; set; }
    }

Handling The Event

As we said earlier, we want effortless registration. To that end, we will use some conventions. Every handler will have to implement the IHandles<T> interface, as described below:

    /// <summary>
    /// Handles an event. If there is a database transaction, the
    /// execution is delayed until the transaction is complete.
    /// </summary>
    /// <typeparam name="T">Domain event type</typeparam>
    public interface IHandles<T> where T : IDomainEvent
    {
        void Handle(T domainEvent);
    }

Once the order is paid, its related items are subtracted from the stock. In other words, once OrderPaidEvent is raised, SubtractInventaryWhenOrderPaidEventHandler will handle it. The IHandles<T> interface makes the bridge between a domain event and its handlers.

    public class SubtractInventaryWhenOrderPaidEventHandler : IHandles<OrderPaidEvent>
    {
        private readonly IInventoryService _inventory;

        public SubtractInventaryWhenOrderPaidEventHandler(IInventoryService inventory)
        {
            _inventory = inventory;
        }

        public void Handle(OrderPaidEvent domainEvent)
        {
            foreach (var item in domainEvent.OrderItems)
                _inventory.SubtractAvailability(item.InventoryItemId, item.Quantity);
        }
    }

The handler is also created by the dependency container. That allows dependencies to be passed more easily.

We can have multiple handlers for the same event. In our example, once the order is paid, it can be placed for delivery, which means that a PlaceOrderWhenPaidEventHandler will handle this same event (we can also have one single handler that handles multiple events).

    public class PlaceOrderWhenPaidEventHandler : IHandles<OrderPaidEvent>
    {
        private readonly IOrderPlacementService _orderPlacement;

        public PlaceOrderWhenPaidEventHandler(IOrderPlacementService orderPlacement)
        {
            _orderPlacement = orderPlacement;
        }

        public void Handle(OrderPaidEvent domainEvent)
        {
            _orderPlacement.PlaceOrder(domainEvent.Payment.OrderId);
        }
    }

The Domain Events Raiser

Now that we understand how events are fired and handled, let’s see how the events raiser is implemented:

    /// <summary>
    /// Simple domain events raiser that is functional, but doesn't support deferred execution
    /// </summary>
    class DomainEventsRaiser : IDomainEventsRaiser
    {
        /// <summary>
        /// Locator of event handlers
        /// </summary>
        private readonly IServiceProvider _locator;

        internal DomainEventsRaiser(IServiceProvider locator)
        {
            _locator = locator;
        }

        /// <summary>
        /// Raises the given domain event
        /// </summary>
        /// <typeparam name="T">Domain event type</typeparam>
        /// <param name="domainEvent">Domain event</param>
        public void Raise<T>(T domainEvent) where T : IDomainEvent
        {
            //Get all the handlers that handle events of type T
            IHandles<T>[] allHandlers = (IHandles<T>[])_locator.GetService(typeof(IHandles<T>[]));

            if (allHandlers != null && allHandlers.Length > 0)
                foreach (var handler in allHandlers)
                    handler.Handle(domainEvent);
        }
    }

As we can see, a locator finds all the handlers that handle a specific event, then fires each handler sequentially. The domain events raiser is also created by the dependency container, which allows services to receive its reference through their constructors.

These same principles could be used for other needs, such as handling commands in CQRS.

Convention-Based Registration

You might want to use convention-based registration for services that are part of the same concept, have a similar structure and follow the same pattern of registration and resolution. A more “manual” registration can be more interesting for services that don’t fit this rule and have more specific needs.

We will be using Unity in this section, but any decent container should give you similar capabilities.

Container.RegisterTypes(new EventHandlersConvention());

This makes the container register all classes that implement IHandles<T>, which allows the DomainEventsRaiser to use a locator that finds the handlers for a specific event.

    /// <summary>
    /// Registers the conventions that allow domain events to be raised and handled
    /// </summary>
    class EventHandlersConvention : RegistrationConvention
    {
        public override Func<Type, IEnumerable<Type>> GetFromTypes()
        {
            return WithMappings.FromAllInterfaces;
        }

        public override Func<Type, IEnumerable<InjectionMember>> GetInjectionMembers()
        {
            return (t) => new InjectionMember[0];
        }

        public override Func<Type, LifetimeManager> GetLifetimeManager()
        {
            return WithLifetime.Transient;
        }

        public override Func<Type, string> GetName()
        {
            return WithName.TypeName;
        }

        public override IEnumerable<Type> GetTypes()
        {
            Type handlerType = typeof(IHandles<>);

            return AllClasses.FromLoadedAssemblies(skipOnError: false).
                        Where(t => !t.IsAbstract &&
                            t.GetInterfaces().Any(i => i.IsGenericType && i.GetGenericTypeDefinition().Equals(handlerType)));
        }
    }

Important: the lifetime of DomainEventsRaiser is not transient. It can be a singleton, but I suggest scoping it to the duration of a full transaction, since the state of this object (the events queue that we will see in the next section) is not reusable between one transaction and another.
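
As a sketch of such a registration (assuming Unity, where a HierarchicalLifetimeManager yields one instance per child container, a common way to scope a service to a request or transaction):

    // Handlers: transient, registered by convention
    Container.RegisterTypes(new EventHandlersConvention());

    // Events raiser: one instance per resolution scope, so its state is
    // never shared between unrelated transactions
    Container.RegisterType<IDomainEventsRaiser, DomainEventsRaiser>(
        new HierarchicalLifetimeManager());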

Adding Deferred Execution Support

We already have a fully working example of domain events registered using conventions. Now let’s add a teaspoon of real life.

Let’s say that, once the order is paid, an e-mail is sent to the customer informing him that his payment has been processed. This is what our process now looks like:

[Figure: the flow now also notifies the customer by e-mail once the order is paid. Created with Balsamiq Mockups.]

We will handle the e-mail sending with the same domain events raiser. However, the e-mail should be sent only after the transaction is committed.

So, in our IHandles<T> interface, we add a Deferred property:

    /// <summary>
    /// Handles an event. If there is a database transaction, the
    /// execution is delayed until the transaction is complete.
    /// </summary>
    /// <typeparam name="T">Domain event type</typeparam>
    public interface IHandles<T> where T : IDomainEvent
    {
        void Handle(T domainEvent);

        bool Deferred { get; }
    }
One single class can handle many events. NotificationsHandler demonstrates how this behavior could be achieved:

    public class NotificationsHandler : IHandles<OrderPaidEvent>, IHandles<OrderShippedEvent> //<- multiple events handled by a single handler
    {
        private readonly IUserNotifier _notifier;

        public NotificationsHandler(IUserNotifier notifier) // <-- instantiated by the dependencies container
        {
            _notifier = notifier;
        }

        public bool Deferred { get { return true; } } //<- 'true' here indicates that all these events should be invoked only after the transaction is committed

        public void Handle(OrderPaidEvent domainEvent)
        {
            _notifier.Notify("Yay! Your payment has been processed.");
        }

        public void Handle(OrderShippedEvent domainEvent)
        {
            _notifier.Notify(string.Format("Your order has finally been shipped. Address : \"{0}\"", domainEvent.Order.ShipmentAddress));
        }
    }

DeferredDomainEventsRaiser

And DeferredDomainEventsRaiser replaces the simple DomainEventsRaiser that I mentioned earlier. It adds deferred handlers to a queue and dispatches them once the transaction is committed. Please notice that:

  • Not all events are deferred, only those whose handlers have IHandles<T>.Deferred = true.
  • In addition, events are never deferred if there is no transaction in progress.

    /// <summary>
    /// Domain events raiser that supports deferred execution
    /// </summary>
    class DeferredDomainEventsRaiser : IDomainEventsRaiser
    {
        /// <summary>
        /// Locator of event handlers
        /// </summary>
        private readonly IServiceProvider _resolver;

        /// <summary>
        /// Collection of events queued for later execution
        /// </summary>
        private readonly ConcurrentQueue<Action> _pendingHandlers = new ConcurrentQueue<Action>();

        /// <summary>
        /// Data access state manager
        /// </summary>
        private readonly IDbStateTracker _dbState;

        public DeferredDomainEventsRaiser(IServiceProvider resolver, IDbStateTracker dbState)
        {
            _resolver = resolver;
            _dbState = dbState;

            _dbState.TransactionComplete += this.Flush;
            _dbState.Disposing += this.FlushOrClear;
        }

        /// <summary>
        /// Raises the given domain event
        /// </summary>
        /// <typeparam name="T">Domain event type</typeparam>
        /// <param name="domainEvent">Domain event</param>
        public void Raise<T>(T domainEvent) where T : IDomainEvent
        {
            //Get all the handlers that handle events of type T
            IHandles<T>[] allHandlers = (IHandles<T>[])_resolver.GetService(typeof(IHandles<T>[]));

            if (allHandlers != null && allHandlers.Length > 0)
            {
                IHandles<T>[] handlersToEnqueue = null;
                IHandles<T>[] handlersToFire = allHandlers;

                if (_dbState.HasPendingChanges())
                {
                    //if there is a transaction in progress, events are enqueued to be executed later

                    handlersToEnqueue = allHandlers.Where(h => h.Deferred).ToArray();

                    if (handlersToEnqueue.Length > 0)
                    {
                        lock (_pendingHandlers)
                            foreach (var handler in handlersToEnqueue)
                                _pendingHandlers.Enqueue(() => handler.Handle(domainEvent));

                        handlersToFire = allHandlers.Except(handlersToEnqueue).ToArray();
                    }
                }

                foreach (var handler in handlersToFire)
                    handler.Handle(domainEvent);
            }
        }

        /// <summary>
        /// Fires all the events in the queue
        /// </summary>
        private void Flush()
        {
            Action dispatch;
            lock (_pendingHandlers)
                while (_pendingHandlers.TryDequeue(out dispatch))
                    dispatch();
        }

        /// <summary>
        /// Executes all pending events if there is no open transaction. Otherwise, clears the queue without executing them
        /// </summary>
        private void FlushOrClear()
        {
            if (!_dbState.HasPendingChanges())
                Flush();
            else
            {
                //If the state manager was disposed with a transaction in progress, we clear
                //the queue without firing the events because this flow is pretty inconsistent
                //(it could be caused, for instance, by an unhandled exception)
                Clear();
            }
        }

        /// <summary>
        /// Clears the pending events without firing them
        /// </summary>
        private void Clear()
        {
            Action dispatch;
            lock (_pendingHandlers)
                while (_pendingHandlers.TryDequeue(out dispatch)) ;
        }
    }

This new interface, IDbStateTracker, simply exposes some events about the state of the database, and it could be implemented by a Unit of Work (see the source code attached to this post, also available on GitHub).

    /// <summary>
    /// Tracks the database's state
    /// </summary>
    public interface IDbStateTracker : IDisposable
    {
        /// <summary>
        /// Triggered when disposing
        /// </summary>
        event Action Disposing;

        /// <summary>
        /// Triggered when the transaction is completed
        /// </summary>
        event Action TransactionComplete;

        /// <summary>
        /// Returns true if there are uncommitted pending changes. Otherwise, returns false.
        /// </summary>
        bool HasPendingChanges();
    }
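
As a rough sketch of that idea (the class and member names below are hypothetical, not the attached source), a unit of work could implement the interface along these lines:

    public class DbUnitOfWork : IDbStateTracker
    {
        private bool _hasPendingChanges;

        public event Action Disposing;
        public event Action TransactionComplete;

        public void RegisterChange() // called by repositories whenever they write
        {
            _hasPendingChanges = true;
        }

        public bool HasPendingChanges()
        {
            return _hasPendingChanges;
        }

        public void Commit()
        {
            // ...commit the underlying database transaction here...
            _hasPendingChanges = false;

            var completed = TransactionComplete;
            if (completed != null)
                completed(); // the raiser flushes its deferred handlers at this point
        }

        public void Dispose()
        {
            var disposing = Disposing;
            if (disposing != null)
                disposing(); // the raiser flushes or clears, depending on pending changes
        }
    }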

Putting It All Together

This approach allows event handlers to be registered automatically by using conventions. Also, the same design allows event handlers to be fired immediately or have their execution delayed until the transaction is over. The source code (available on GitHub) contains a full working example of the principles discussed here.

Choosing The Right Collection

Everyone has had the experience of not having the right tool for a job. You can get the job done, but it will cost you more effort, impose some limitations and maybe even cause some unwanted damage. But what if a person had all the right tools, knew how to use most of them, but still proactively decided to stop thinking about them because he got that generic one that is “ok” for most tasks? What would the consequences be?

A good wrench can be enough to hit a nail on the wall, but it will never be as efficient as the simplest hammer.

In this post I will cover some well-known basic data structures, discuss the importance of knowing how each one works and propose a systematic, time-complexity-oriented approach for choosing a proper data structure in a given scenario. I will be using .NET collections as examples. Warning: be aware that there are tons of data structures out there and that each one is a perfect fit for a specific context. In this post we will be covering the data structures that are usually available as built-in components in many languages.

How Many Times Did You Use a Collection Other Than List, Array and Dictionary?

When we need to group entities of the same type in a collection for general purposes, List<T> is a default choice for many developers, for a good reason: it’s a practical and versatile data structure. Furthermore, there is a good chance that, in a random circumstance, it’s a good choice.

So Why Should I Bother About It?!

The simple answer is pretty straightforward: we shouldn’t be using a tool just because it’s capable of handling the job. When choosing a data structure, it’s important to consider how it will be used and the complexity of the operations that will be performed against it. With that in mind, choosing the best tool for the job is a matter of knowing the proper usage of each one and the complexity of their common operations.

The elaborate answer is still as simple as the simple answer: you should think about the proper data structure because it will improve code quality and readability; not rarely, it will be much simpler and more productive to implement; and, last but not least, you will avoid hidden performance issues that are hard to track.

Know How They Work

The number of developers who don’t know (or know, but just don’t care) how the collections they use work under the hood is significant. This basic knowledge is the key to understanding the complexity of their operations. Actually, you don’t have to know the details. A simple glance can be just enough to support your decision about which tool to use under the proper circumstances.

By understanding our environment and making fortunate choices, we avoid common collateral damage that haunts developers on a regular basis and ensure best practices. For instance:

We Avoid Hidden Performance Issues Due To a Collection Misuse

List<T>.Add has O(1) amortized time, which is nice, but List<T>.Insert() and List<T>.Remove() have O(n) complexity. During these operations, the elements after the target index are shifted accordingly in order to reorganize the dynamic array. That said, writing anywhere other than at the end of the collection can be computationally expensive, and an innocent Insert can be a potential source of performance issues.

[Figure: elements being shifted after an Insert in the middle of a List<T>]

When we understand how the collection works, we can anticipate these potential issues and be saved from being cursed by other developers for introducing a problem that is hard to track.
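
To make the cost visible, here is a hypothetical micro-benchmark (absolute numbers will vary by machine):

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;

    class ListInsertBenchmark
    {
        static void Main()
        {
            const int n = 100000;

            var addAtEnd = new List<int>();
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < n; i++)
                addAtEnd.Add(i);            // O(1) amortized
            Console.WriteLine("Add:       {0} ms", sw.ElapsedMilliseconds);

            var insertAtFront = new List<int>();
            sw.Restart();
            for (int i = 0; i < n; i++)
                insertAtFront.Insert(0, i); // O(n): shifts every element to the right
            Console.WriteLine("Insert(0): {0} ms", sw.ElapsedMilliseconds);
        }
    }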

We Ensure Proper Usage of Interfaces And Code is Easier to Understand

When you use a more dedicated collection, you explicitly specify how it should be used. A mere declaration naturally reduces the set of usage possibilities and keeps its affordance clear. For instance, if someone inspects code he doesn’t know and finds a class that manipulates a Queue<T>, it becomes naturally clear that this attribute is not used for inserting or removing elements in the middle of the collection. It also means that the first element added is always the first to be read, and that at some point each element is probably removed after its value is obtained. All that information from reading one single line of code.

By improving affordance, we ensure that the user (i.e. someone who analyses the code) doesn’t need to think (or think a lot) in order to understand what’s going on.

Affordance: a situation where an object’s sensory characteristics intuitively imply its functionality and use. Source: Usability First.

That said, it becomes clear that if we used a List<T> in a scenario where a Queue<T> would be a fit, even if the performance matches, we lose affordance by not using the right tool, thus compromising readability and interpretation.

Choosing The Right Data Structure: A Flowchart

I created this flowchart as a simple tool to support decision when choosing a data structure. As I said at the beginning, there are tons of data structures out there and each one is a perfect fit for a specific context. This chart contains only common data structures supported natively by many languages.

[Figure: a flowchart for choosing among the common data structures]

There Is No Free Lunch

There is no data structure that will fit better in every context. In order to select a proper collection, you need to know what you can give away.

  • A dictionary, HashSet or Hashtable will give you O(1) amortized complexity for insert / add / remove, but you can’t guarantee the order of the elements.
  • A binary search tree will give you O(log n) for insert / remove and a predictable order. It’s far more efficient than O(n) for big sets, but it’s not as fast as a dictionary.

O(log n): a logarithmic running-time function is faster than O(n) and slower than O(1). For instance, if you have a collection of 16 elements and need to find a specific element, the worst-case scenario of an O(n) function would be to find the desired element after 16 iterations, against 4 iterations of an O(log n) function (log2 16 = 4) and the direct access of an O(1) function. Example of an O(log n) algorithm: binary search.
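
For illustration, a minimal binary search over a sorted array, halving the search space at every step:

    // Returns the index of 'target' in a sorted array, or -1 if absent: O(log n)
    static int BinarySearch(int[] sorted, int target)
    {
        int low = 0, high = sorted.Length - 1;
        while (low <= high)
        {
            int mid = low + (high - low) / 2;
            if (sorted[mid] == target) return mid;
            if (sorted[mid] < target) low = mid + 1;  // discard the lower half
            else high = mid - 1;                      // discard the upper half
        }
        return -1;
    }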

That said, if you need fast access and you don’t need order, the “limitations” of a dictionary are not even an issue. However, if you need order and elements are constantly entering and leaving at unpredictable positions, most search trees will have O(log n) complexity for both writing and retrieval, which is pretty decent, but not as fast as the dictionary’s amortized constant time.

Everything depends on your needs. The perfect fit will always be achieving maximum performance with minimum collateral damage given the intrinsic limitations of a data structure.

Amortization is a method for analyzing a given algorithm’s time complexity. (…) The motivation is that looking at the worst-case run time per operation can be too pessimistic.

Example: if we started out with a dynamic array (List<T>) of size 4, it would take constant time to push four elements onto it. Yet pushing a fifth element onto that array would take longer, as the array would have to create a new array of double the current size (8), copy the old elements onto the new array, and then add the new element. (Thus, despite the eventual array copy, we still consider that List.Add has a constant amortized time.) Source: Wikipedia.
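
You can watch the doubling happen through List<T>.Capacity (a small sketch; the growth factor is an implementation detail of .NET's List<T>):

    var list = new List<int>(4); // initial capacity of 4
    for (int i = 0; i < 9; i++)
    {
        list.Add(i);
        Console.WriteLine("Count = {0}, Capacity = {1}", list.Count, list.Capacity);
        // Capacity stays 4 until the 5th Add, then jumps to 8, then to 16
    }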

List, Array

Time complexity

                    List             Array   SortedList
Access by index     O(1)             O(1)    O(1)
Access by key       –                –       O(log n)
Search              O(n)             O(n)    O(log n)
Insert              O(n)             –       O(n)
Delete              O(n)             –       O(n)
Insert at the end   O(1) amortized   –       O(log n)
Delete at the end   O(1)             –       O(log n)
  • Lists and arrays are preferable when you need indexed or random access to elements.
  • They are also versatile and commonly used when you just want to group data for general purposes.
  • .NET lists have a Sort() method, which implements a Quick Sort. On average, this algorithm is an O(n*log n) operation. If you need to sort the elements only once and then just add elements at the end, it will probably be better to use this method than to use a SortedList, since the latter has O(log n) for insertions at the end, against the O(1) amortized of List<T>.
  • If you don’t need to add or remove elements after the collection is created, consider using an array instead of a List<T>.
  • Avoid making heavy use of Insert() and Remove() on a List<T>. They are O(n) operations because the underlying array is shifted. Removing an element at the end is O(1) because there is no shifting.
  • Adding at the end is O(1) amortized. When the internal array reaches its limit, a new instance is created with double the size and the elements are copied. If you know how many elements are to be inserted at the creation of the collection, you can define its size during initialization and save time and space (see the sketch after this list).
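
A short sketch of the last two points; 'source', 'expectedCount' and 'newId' are placeholders:

    // Pre-sizing avoids intermediate array copies; sorting once and then only
    // appending avoids paying SortedList's per-insert cost.
    static List<int> BuildSortedIds(IEnumerable<int> source, int expectedCount, int newId)
    {
        var ids = new List<int>(expectedCount); // element count known up front
        ids.AddRange(source);
        ids.Sort();      // Quick Sort, O(n log n) on average
        ids.Add(newId);  // later appends at the end are O(1) amortized
        return ids;
    }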

Stack, Queue

Time complexity

                                  Stack / Queue
Access to the object at the top   O(1)
Search                            O(n)
Insert                            O(1) amortized
Delete                            O(1)
  • These are perfect when you need a sequential list where elements are discarded after their value is retrieved.
  • Consider using a Queue<T> for First-In, First-Out behavior (FIFO), or its thread-safe version ConcurrentQueue<T>.
  • Consider using a Stack<T> for Last-In, First-Out behavior (LIFO), or its thread-safe version ConcurrentStack<T>. Both behaviors are shown in the sketch after this list.
  • Push / Enqueue is O(1) amortized. When the internal array reaches its limit, a new instance is created with double size and the elements are copied. If you know how many elements are to be inserted during creation of the collection, you can define its size during initialization and save time and space.
  • Pop / Dequeue is O(1).
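
A minimal illustration of the two behaviors:

    var queue = new Queue<string>();
    queue.Enqueue("first");
    queue.Enqueue("second");
    Console.WriteLine(queue.Dequeue()); // "first"  (FIFO)

    var stack = new Stack<string>();
    stack.Push("first");
    stack.Push("second");
    Console.WriteLine(stack.Pop());     // "second" (LIFO)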

Linked List (Singly, Doubly, Circular)

Time complexity

                                             Linked List
Access to a node itself or its adjacencies   O(1)
Search                                       O(n)
Insert                                       O(1)
Delete                                       O(1)
  • Linked lists are preferable when you need constant-time insertion and removal in the middle of the collection.
  • They are also useful when memory usage is critical and you don’t know the number of elements to be inserted, since there is no array copy in a linked list when it grows.
  • You should use a circular linked list if the problem is inherently circular (for instance, to control whose turn it is in a multi-player board game).
  • It should be doubly-linked if you need to go back in the list, or singly-linked if you just need to move forward.
  • Linked lists are not indexed. Thus, if you want to keep track of specific nodes, you might consider keeping references to them.
  • .NET’s LinkedList<T> is a doubly-linked list. Thus, you can naturally use it as a singly-linked list. If you need it to be circular, just treat the first node as the next of the last (see the sketch after this list).
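
Since LinkedList<T> does not expose a setter for a node's Next, the wrap-around is done by hand. A sketch of the board game example:

    var players = new LinkedList<string>(new[] { "Ann", "Bob", "Carl" });
    LinkedListNode<string> current = players.First;

    for (int turn = 0; turn < 7; turn++)
    {
        Console.WriteLine("{0}'s turn", current.Value);
        current = current.Next ?? players.First; // after the last node, wrap to the first
    }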

Dictionary, HashSet

Time complexity (Average)

                Dictionary / HashSet
Access by key   O(1) amortized
Search          O(n)
Insert          O(1) amortized
Delete          O(1) amortized

Time complexity (Worst)

                Dictionary / HashSet
Access by key   O(n)
Search          O(n)
Insert          O(n)
Delete          O(n)
  • Dictionaries and HashSets are great for fast access, but the order of the elements is not guaranteed.
  • HashSet is preferable over Dictionary when you need a collection of unique values.
  • The complexity of insert / get / remove includes the complexity of calculating the hash, so keep GetHashCode() simple and constant-time (see the sketch after this list).
  • The worst-case scenario is GetHashCode() returning the same value for all the entries. Elements with the same hash are grouped in the same bucket to handle collisions. In this case, the elements in the bucket are iterated and the Equals() of the key passed as argument is called against the key of each element in the bucket. The complexity of accessing an element is then the complexity of finding it inside the bucket, which can be O(n).
  • ListDictionary is faster than Hashtable for small collections (10 items or fewer). The Dictionary<TKey, TValue> generic class provides faster lookup than the SortedDictionary<TKey, TValue> generic class. The multi-threaded implementation is ConcurrentDictionary<TKey, TValue>. ConcurrentBag<T> provides fast multi-threaded insertion for unordered data. Source: MSDN.
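
A sketch of a key type with a cheap, constant-time hash (the class and its field are illustrative):

    public sealed class CustomerKey
    {
        private readonly int _id;

        public CustomerKey(int id) { _id = id; }

        public override int GetHashCode()
        {
            return _id; // constant time: keeps insert / get / remove O(1) amortized
        }

        public override bool Equals(object obj)
        {
            var other = obj as CustomerKey;
            return other != null && other._id == _id; // resolves same-bucket collisions
        }
    }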

Search Tree

Time complexity

                             SortedDictionary   SortedList
Access by key                O(log n)           O(log n)
Access by zero-based index   –                  O(1)
Search                       O(log n)           O(log n)
Insert                       O(log n)           O(n)
Delete                       O(log n)           O(n)
Insert at the end            O(log n)           O(log n)
Delete at the end            O(log n)           O(log n)
  • Trees are great when you need fast insertion, deletion and retrieval.
  • They are ideal in cases where data is constantly entering / leaving at unpredictable positions.
  • “Search tree” is a family of data structures, and there are tons of trees. You should check which tree is the best for your scenario.
  • The SortedDictionary<TKey, TValue> generic class is a binary search tree, while SortedList<TKey, TValue> keeps a sorted array and retrieves by binary search. The two classes have similar object models, and both have O(log n) retrieval, but they differ in memory use and in the speed of insertion and removal.
  • SortedList<TKey, TValue> uses less memory than SortedDictionary<TKey, TValue>.
  • SortedList<TKey, TValue> supports efficient indexed retrieval of keys and values through the collections returned by the Keys and Values properties.
  • SortedDictionary<TKey, TValue> has faster insertion and removal operations for unsorted data: O(log n), as opposed to O(n) for SortedList<TKey, TValue>.
  • If the list is populated all at once from sorted data, SortedList<TKey, TValue> is faster than SortedDictionary<TKey, TValue>.

Putting It All Together

We shouldn’t use a data structure just because it’s capable of handling the job. It’s important to consider the scenario and the operations that will be performed. Understanding how things work under the hood helps us judge which data structure will perform better and ensures the application’s performance. It will also improve code quality and readability.

Quality assurance in software development: When should you start the testing process?

“A procedure intended to establish the quality, performance, or reliability of something, especially before it is taken into widespread use” — Test definition at the Oxford dictionary.

Customers don’t like to deal with defective software. They want their demands delivered with high quality and in the shortest possible timebox. That testing phase that starts only a few days before releasing the next version of the product might not (and probably won’t!) be able to ensure product quality.

Below is an example of a typical SDLC:

  • Planning – Goals and objectives are defined, requirements are gathered, costs and resources are estimated and feasibility of alternative solutions is analyzed.
  • Analysis & Design – Features are detailed taking into account the user needs. Wireframes and business rules are defined. Other relevant documents are attached.
  • Construction – Code is written for real in this phase.
  • Testing – Pieces are put together in a test environment to verify the system. Different types of tests can be performed, but that is a conversation for another day.

[Figure: a typical SDLC, with testing starting only after construction]

This cycle is exposed to some problems. As we can see, each activity starts at the end of the previous phase. First, let’s think about bugs found during tests: many of them have existed in the system since the design or even the planning phase, and it will probably be much more expensive to fix these bugs after development is completed than it would have been if the problems were identified in previous steps. Furthermore, in predictive planning, a tight deadline combined with a delay in the completion of the construction phase can reduce the available time for testing, which might significantly undermine the quality of the product.

It is observed that most of the errors found in the testing phase were introduced during requirements gathering or design.

Why should testing start early in the software development life cycle?

The process starts with requirements and, as the project evolves through the SDLC, more effort is allocated to create or modify the solution, more people are involved and the cost of the project increases. Bugs detected at the end of the process tend to require significantly more effort to fix. The sooner a bug is identified, the cheaper it is to fix. In Software Testing, Ron Patton says that the cost of fixing a bug can be represented by something close to a logarithmic function, where the cost can increase by more than 10 times as the project progresses through the phases of the SDLC.

For instance, a bug identified during conception costs something around zero, but when this same bug is found only after implementation or testing, the average cost of repair can be between 10 and 1000 times higher than in the previous step. When customers find the bug in the production environment, the cost of the problem includes all the side effects related to it. That is where things can get serious.

[Figure: the cost of fixing a bug grows through the SDLC phases]

Main advantages of testing in earlier phases

  • Many problems are introduced into the system during planning or design. Requirements testing anticipates future problems at a significantly lower cost.
  • Since the testing process is involved with all phases of the SDLC, management will not feel like testing is the bottleneck to releasing the product.
  • Testers will be more familiar with the software, as they are more involved with the evolution of the product from the earlier phases.
  • Test cases written during requirements gathering and shared with the dev team before the construction phase can help developers think outside the box and evaluate more chances of failure in their code.
  • The test environment can be prepared in advance, anticipating risks and preventing delays.
  • The risk of having a short time for testing is greatly reduced, increasing test coverage and the types of tests performed.
  • Involving quality assurance in all phases of the SDLC helps create a ‘quality culture’ inside the organization.

Defect prevention: Quality is built in, not added on

Inspection does not improve the quality, nor guarantee quality. Inspection is too late. The quality, good or bad, is already in the product. As Harold F. Dodge said, “You cannot inspect quality into a product.” — Out of the Crisis, pg. 29

Start a test plan at the beginning of the project and identify test requirements. Test requirements are not test cases, as they do not describe the data being used for the tests. Data is irrelevant at this level. These requirements should be used as input documents for generating test cases. Testing should start in the planning phase and continue throughout the analysis and design phases. At the end of the design phase, integration and unit test cases should be completed. Some examples of test requirements:

  • “Validate that you can insert an entry to the repository”
  • “Validate that you can’t insert an entry when the repository already contains one with the same unique identification”
  • “Validate that you can’t insert an entry when the repository reaches 300 entries”
  • “Validate that you can insert an entry to the repository when it is empty (initial test)”
  • “Validate that the full repository can be loaded to the screen in 2 seconds”
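
To make the idea concrete, the second requirement above could later become a unit test along these lines (a hypothetical sketch using NUnit; Repository, Entry and their API are placeholders, not code from this post):

using System;
using NUnit.Framework;

[TestFixture]
public class RepositoryTests
{
    [Test]
    public void Insert_DuplicateUniqueId_IsRejected()
    {
        // "Validate that you can't insert an entry when the repository
        //  already contains one with the same unique identification"
        var repository = new Repository();
        repository.Insert(new Entry(1));

        Assert.Throws<InvalidOperationException>(() => repository.Insert(new Entry(1)));
    }
}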

[Figure: the SDLC with testing activities running through all phases]

Verify that requirements are clear and consistent. It is important to eliminate ambiguities in interpretation caused by general terms. Some customers may use terms that have different meanings, which compromises the analysis of the document.

Discover missing requirements. In many cases, project designers have no clear understanding of modules and assume certain requirements. Requirements should cover all aspects of the system without any assumptions.

Ask the client about the need for requirements that are not related to the project goals. It is important that these requirements are identified and that the client is asked whether they are really necessary. A requirement can be considered irrelevant when its absence causes no significant impact on the project goal.

Some other tests include (Source: link):

  • Does the specification contain a definition of the meaning of every essential subject matter term within the specification?
  • Is every reference to a defined term consistent with its definition?
  • Is the context of the requirements wide enough to cover everything we need to understand?
  • Is every requirement in the specification relevant to this system?
  • Does the specification contain solutions posturing as requirements?
  • Is the stakeholder value defined for each requirement?
  • Is each requirement uniquely identifiable?
  • Is each requirement tagged to all parts of the system where it is used? For any change to requirements, can you identify all parts of the system where this change has an effect?

Defect detection

The tester should report the defect detection efficiency on completion of the project. It measures the process efficiency within the SDLC. It helps to understand and track the phases of the SDLC that are generating more problems and compromising product quality.

The role of developers in defect prevention

Developers must be aligned with the expectations regarding the requirements. In many cases, in order to keep up with the schedule, developers do not invest enough time in reviewing the specification and often ignore important documents or misunderstand some requirements. This kind of ambiguity generates more bugs to be identified at the end of the project, and the cost of repair ends up more expensive.

Developers should also create unit tests and review code (and/or have their code reviewed) before commits. Together, these small daily activities make a great contribution to defect prevention during the construction phase.

In addition, some types of tests are certainly worth considering for automation, and an automation team would get involved with the process. The execution of automated tests (UI, load, performance, unit, etc.) can be strongly linked to developers’ commits during the construction phase (see Continuous Integration), but that’s a topic for another conversation.

Putting it all together

Defect prevention is an important investment with a short-term return. These joint actions not only increase product quality by anticipating issues, but also reduce the maintenance cost of the product, increase overall productivity and reduce the development time of the project. As a consequence of this combination of factors, customer satisfaction increases, as well as the reliability and reputation of the organization.

A Typescript library for Google Knowledge Graph (Freebase)

Have you ever seen a little box that occasionally appears (depending on the query) as part of Google’s search results? Type “Paris” and see the magic in action. The structured content that is displayed is part of the Google Knowledge Graph, a structured database that enhances the search engine.

[Figure: Google search results for “Paris”, showing the Knowledge Graph box]

Google Knowledge Graph

Google Knowledge Graph is a huge database that contains structured data about almost everything. The intention is to enhance the search results with content that includes structured details of a particular topic. You can read more about Google Knowledge Graph at the official blog and Wikipedia.

We know that keywords are the components that form the basis of SEO, but what might catch your attention is that the Knowledge Graph seems to be very well structured, in a way that could not have been built from search terms entered by users. In fact, the knowledge base is supported by Freebase, an open database developed by Metaweb. This company was acquired by Google in 2010.

Freebase is a massive knowledge base. It is structured, searchable, editable and open-licensed (free for commercial and non-commercial use under Creative Commons). It is an ambitious initiative to create a base of semantic data on virtually everything. It is collaborative and was built with the information of many sources, including individual contribution of the online community.

The Knowledge Graph is open, powerful and Google provides an API for remote access

API documentation can be found on Google Developers. If you want direct access to the database, a data dump is available for download here. As we can see, this knowledge base creates endless possibilities for new promising applications.

Some important services:

  • search – useful for recommendation or displaying auto-complete results.
  • topic – useful for displaying a structured article about a specific subject.

You can play with these services using a web interface, here and here.

A Typescript library for Google Knowledge Graph (Freebase)

I developed a TypeScript library that wraps the Freebase API and exposes some of its services. The library contains type definitions to provide compile-time type checking and IntelliSense. I also published the resulting JavaScript. The project is open source under the MIT license and is available for download on GitHub.

I hope it is useful for someone. ;)

Developing secure web applications: XSS attack, the Confused Deputy and Over-Posting

The reality is that securing applications is not as much fun as we would like it to be. Nobody comes and says “Wow! Thumbs up for securing my personal data!”. However, security is undoubtedly important in many contexts and cannot be underestimated. We do not want to end up in an embarrassing position caused by a security breach.

In this article, I cover XSS, XSRF and Over-Posting, a fraction of the most common web attacks, and explain how to protect your applications against them.

Some web security principles

  • Implement security in depth: the Web tier responsible for receiving HTTP requests should not be the only layer to perform security checks. Each subsystem must take responsibility for their own security, disregarding the quality of the security checks of the upper layers.
  • Never trust user input: never assume that you do not need to perform a security check just because it has already been done elsewhere. Moreover, validations in JavaScript can improve the user experience by avoiding unnecessary HTTP requests, but never assume that these client-side validations are secure. An attacker can bypass client validation without any difficulty.
  • Reduce surface area: Minimize the amount of exposed operations and restrict the inputs of each action to contain only the data required to process the action. Do not expose to the end user exception details or information about the database schema.
  • Enforce the principle of Least Privilege: Perform operations using only the privileges required for the task. If for some reason you need to elevate permissions, grant them but remove when these permissions are no longer needed.
  • Disable unnecessary features: Platforms can provide security flaws. Uninstall features that are not relevant to the execution of your application.
  • Evaluate a model of authentication and authorization that fits well to your application: Authentication is the process of identifying the user who accesses the application. Authorization is the process of verifying what operations the authenticated user can perform. There are different strategies for authentication and authorization. Check the one that best suits your application.

Defending your code against the most common attacks

Users of your site are likely to discover bugs. Some of these bugs reveal security holes. And when an attacker discovers one of these gaps, we have a problem.

Cross-Site Scripting (XSS)

XSS is one of the most common web attacks. According to WhiteHat Security, 40% of websites have at least one breach that allows cross-site scripting. In XSS, some client script code is injected into the remote server through user inputs or query strings.

Open http://www.insecurelabs.org/Talk/Details/6 to start playing with XSS. Pay attention to the content written inside the input of the form. Everything that you write in a comment will be rendered after the form is submitted. In a real application, a malicious script inside a comment could inject code into the page, and every user that opens the page would run that script.

[Figure: a comment containing a malicious script tag is submitted]

This is the result:

[Figure: the injected script executes when the page is rendered]

This is a classical XSS attack. My posted comment containing a bad script will be rendered within the list of comments on the page. When someone opens the page, the script is executed. In this particular case, the attacker is stealing the cookies related to the domain where the script was injected. Read the article Session hijacking (Wikipedia) to understand the implications of exposing your cookies.

In the words of Jeff Atwood (CodingHorror.com), in the article Protecting Your Cookies: HttpOnly:

Imagine, then, the surprise of my friend when he noticed some enterprising users on his website were logged in as him and happily banging away on the system with full unfettered administrative privileges.

Preventing XSS

Never trust user input. Encode everything. Instead of displaying content as-is, use HTML encoding to render content.

In ASP.NET Web Forms, the code below will not be encoded.

  <%= Model.Comment %> 

Instead, use one of these approaches:

 <%: Model.Comment %> 

 

 <%= Html.Encode(Model.Comment) %> 

Razor encodes by default. The sample below gives an encoded output:

 @Model.Comment 

In ASP.NET MVC, if you want to write raw content as-is (without any encoding), you should write it explicitly and decorate your action with a [ValidateInput(false)] attribute. In this case, security concerns are at your own risk.

 @Html.Raw(Model.HtmlContent) 
[ValidateInput(false)]
public ViewResult Save(FormDataWithRawContent form)
{
...
}

Preventing Cookie Theft – Use HttpOnly and Require SSL

In the Web.config file, write:

<httpCookies domain="" httpOnlyCookies="true" requireSSL="true" />

The httpOnlyCookies attribute will prevent cookies from being accessed by client scripts. The requireSSL attribute will prevent cookies from being sent back to the server when the connection is not secured with SSL. This configuration stops most cookie-theft attacks that rely on XSS and should be used whenever possible.

Cross-Site Request Forgery (XSRF) – The confused deputy

Cross-site request forgery is an attack where the attacker creates a page that includes a form that posts to the attacked site.

<form id="someform"
action="http://attackedTargetSite/Entity/Update"
method="post">

<input type="text" />
<input type="text" />
</form>

If the user who submitted the form is authenticated on the target site, the target site will receive and process the post without presenting any complications related to authentication, since the action is being performed by an authenticated user.

The confused deputy attack

XSRF is a form of confused deputy attack.

If an attacker wants to perform administrative actions on a web application for which he does not have the proper privileges, a feasible approach is to confuse a user into doing these things for him. In the case of an XSRF attack, the confused deputy is the user’s browser. Here (Haacked.com) we can see a nice practical example of an XSRF attack.

A confused deputy is a computer program that is innocently fooled by some other party into misusing its authority. It is a specific type of privilege escalation. In information security, the confused deputy problem is often cited as an example of why capability-based security is important.

From Wikipedia

Preventing XSRF

Reduce the surface area and minimize the number of exposed actions. Furthermore, create a hidden HTML input field that stores a token generated by your server, and validate whether the submitted token is valid. This token can be stored in a session variable or a cookie.

ASP.NET MVC includes some helpers to protect against this kind of attack. It generates an anti-forgery token inside the form and validates this token on the server when the form is submitted:

@using (Html.BeginForm(...))
{
@Html.AntiForgeryToken("salt")
...
}

 

[ValidateAntiForgeryToken(Salt="salt")]
public ViewResult Update()
{
...
}

Thus, if the source of the request is unknown (i.e. the antiforgery token does not exist or it is not consistent), the action will not be triggered and an error will be thrown to the client application.

Known limitations

  • Users must have cookies enabled, or requests will be rejected.
  • This method works with POST, but not with GET. This should not be a very big problem, since GET requests should only read information and never change the state of the database.
  • XSS holes create possibilities for cookie hijacking and access to the anti-forgery token.

Over-Posting

ASP.NET MVC maps request data to model properties in GET and POST operations by using naming conventions. However, this convenience introduces a security flaw: an attacker can populate properties of the model even if they are not displayed in the form.

Consider a model PlaceReview bound to an action.


public class PlaceReview
{
	public int ReviewID { get; set; } // PK
	public int PlaceID { get; set; } // FK
	public Place Place { get; set; } // FK
	public int NumberOfLikes { get; set; }
	public string UserName { get; set; }
	public string UserComment { get; set; }
}

Now consider that the action is only interested in the UserName and UserComment properties of the model.


<ul>
		<li>
			@Html.LabelFor(m => m.UserName)
			@Html.TextBoxFor(m => m.UserName)
		</li>
		<li>
			@Html.LabelFor(m => m.UserComment)
			@Html.TextBoxFor(m => m.UserComment)
		</li>
	</ul>

A malicious user can manipulate the query string or the post data to add information and change other properties of the model. For instance, he could add “NumberOfLikes=5000” and the corresponding property would be populated accordingly in the server-side application.

Preventing Over-Posting

You can use a BindAttribute on the model class or action parameter to explicitly control the properties that should be mapped by the model binder.

[System.Web.Mvc.Bind(Include="UserName, UserComment")]
public class PlaceReview
{
	public int ReviewID { get; set; } // PK
	public int PlaceID { get; set; } // FK
	public Place Place { get; set; } // FK
	public int NumberOfLikes { get; set; }
	public string UserName { get; set; }
	public string UserComment { get; set; }
}

A similar approach involves calling UpdateModel or TryUpdateModel within the Action.

UpdateModel(placeReview, "PlaceReview",
new string[] { "UserName", "UserComment" });

A better approach is not to bind to the domain model at all, but to create specific view model classes that contain only the properties required for the action and nothing else.

public class PlaceReviewModel
{
	public string UserName { get; set; }
	public string UserComment { get; set; }
}

This last approach significantly reduces security holes related to over-posting and eliminates a problem of stamp coupling.

Stamp coupling occurs when modules share objects and use only a part of them. Sharing more data than needed allows the called module to access more data than it really needs.

From Cohesion and coupling: Principles of orthogonal, scalable design

Putting it all together

No application is totally secure, but there are best practices that can be used to avoid several vulnerabilities. We talked about XSS, XSRF and Over-Posting. Fortunately, there are enough resources available and ready to protect web applications from the dangerous attacks that frequently occur out there.

The absolute bare minimum every programmer should know about memory management

Advances in technology come with increased complexity; that is just the nature of how technology evolves. However, things get simpler for the users, as they grow more demanding about alignment with their intentions. In “The Invisible Computer: Why Good Products Can Fail, the Personal Computer Is So Complex, and Information Appliances Are the Solution”, Donald Norman says:

Most technology goes through cycles of development and change in both internal and external complexity. Often, the very first device is simple, but crude. As the device undergoes the early stages of development, its power and efficiency improve, but so does its complexity. As the technology matures, however, simpler, more effective ways of doing things are developed, and the device becomes easier to use, although usually by becoming more complex inside.

The memory management of the .NET Framework is a good example of how complexity can be hidden under a layer of simplicity. However, when we assume that we don’t need to worry about how memory is managed and how garbage collection works, focusing only on developing skills in syntax and classes, we risk making bad design decisions that lead to performance and memory issues. Furthermore, knowing how memory is managed helps us understand how the variables are behaving in our application. In this article I cover the basics of this topic.

Stack

The stack stores the state of a method. Every time a method is called, .NET creates a container (i.e. a stack frame) that stores parameters, local variables and the address of the return point. When the method completes, the frame is removed and the thread continues at the return point defined in the stack frame. Also, each thread has its own stack.

void MethodA()
{
	int a1 = 2;
	MethodB(a1, 8);
}
void MethodB(int data, int valueToSum)
{
	int sum = data + valueToSum;
	MethodC(sum);
}
void MethodC(int value)
{
	Console.WriteLine(value);
}

[Figure: the call stack with frames for MethodA, MethodB and MethodC]

Obviously, the representation of the stack is simplified to ease understanding and the return point is not the code line number.

In the above example, if we consider that the thread starts executing at the first line of MethodA and there is a breakpoint on the last line of MethodC, when the thread reaches the breakpoint the stack will look like the image above.

As we can see, it’s like a pile of boxes: in order to access the contents of a box that is under another, you must first remove all the boxes above it. And since the contents of a box cannot be accessed from the outside, they can never have global scope. In addition, memory in the stack is self-maintained: when a method exits, the whole stack frame is thrown out and the memory is freed automatically.

Furthermore, as we can see, a stack does more than store variables: since it stores the return point, it keeps track of the execution flow.

Variables stored in the stack are local by nature

We cannot access variables anywhere other than in the last container (i.e. the top of the stack). So, when a new stack frame is created, variables declared in other frames cannot be accessed anymore. It means that types stored on the stack are local by nature. When you pass a parameter, it is copied to the next stack frame in order to be accessed.

Content copying isn’t usually a big deal, unless you are copying large value types. If you pass large structs as parameters between method calls inside a big loop or recursive operation, you make successive copies of the struct, which leads to high copying overhead and the risk of running into performance issues.
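
A sketch of the usual mitigation: passing the struct by reference so that only an address is copied (BigStruct is illustrative):

// A struct with many fields: every pass-by-value copies all 64 bytes of it
struct BigStruct
{
	public long A, B, C, D, E, F, G, H;
}

class StructCopyDemo
{
	// Called by value inside a loop, the whole struct is copied on every call
	static long SumByValue(BigStruct s) { return s.A + s.H; }

	// Called by ref, only a reference to the caller's struct is copied
	static long SumByRef(ref BigStruct s) { return s.A + s.H; }
}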

Summary

  • A stack frame grows as methods declare variables, and variables exist while the method that owns the frame is running.
  • Everything stored in a stack has local scope.
  • We do not need to worry about allocating and deallocating memory in the stack, because memory is managed automatically. When a method exits, the whole stack frame is disposed.
  • Reading, allocating and deallocating memory is faster in a stack when compared to a heap. The operations are much simpler and efficiently organized by the CPU.
  • Space is well managed by the CPU and memory is not fragmented.
  • A stack is bound to a thread, and each thread handles one stack.
  • There is a limit for the stack size, which is OS-dependent and defined when a thread is created.
  • Variables cannot be resized.

Heap

Everything that is stored in the heap can be accessed at any time. However, unlike the stack, heap memory is managed by a garbage collector (GC), the resource responsible for optimizing the available memory space and deallocating memory that is no longer referenced.

public class SomeClass
{
	public SomeClass(int v)
	{
		this.value = v;
	}
	public int value { get; set; }
}

void SomeMethod()
{
	SomeClass a = new SomeClass(3);
	SomeClass b = new SomeClass(1);
}

[Figure: 'a' and 'b' on the stack referencing two SomeClass objects in the heap]

In the above example, ‘a’ and ‘b’ refer to two instances of SomeClass. Because SomeClass is a reference type, its instances are stored in the heap, while ‘a’ and ‘b’ are declared on the stack as references to this heap memory. Pay special attention to the “value” property of SomeClass: despite being an int32, it is also stored in the heap, because it is part of the content of a reference type.

When SomeMethod finishes, the stack frame is deleted and there will be no pointer referencing the objects in the heap. The heap will then be storing orphaned objects, which will be disposed of during the next execution of the GC.

Summary

  • No limit on memory size. Although the initial size is predefined on application startup, more space can be requested from the OS.
  • Variables stored in the heap can be accessed globally.
  • Memory is managed by the runtime’s memory manager; in .NET, the garbage collector.
  • As blocks are allocated and deallocated, memory might become fragmented.
  • Heap is allocated for the application by the runtime and disposed when the application process exits.
  • Allocation and deallocation in heap is more expensive when compared to stack.

Where things are stored

Short and brief: reference types are always stored in the heap. Value types are stored where they are declared. If we declare a value type inside a method, it is placed on the stack. If a value type is boxed, or if it is a member of a reference type (as in the example of SomeClass with an int property named value), it is stored on the heap.
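To make the rules concrete, here is a compact, hypothetical sketch (Point, Shape and WhereThingsLive are illustrative names) covering the three cases:

public struct Point { public int X, Y; } // value type
public class Shape { public Point Origin; } // reference type with a value-type member

void WhereThingsLive()
{
	int n = 42; // value type declared in a method: stack
	Point p = new Point(); // same: stack
	Shape s = new Shape(); // 's' lives on the stack; the object, including Origin, lives on the heap
	object boxed = n; // boxing: a copy of 'n' is placed on the heap
}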

Value types

  • struct
  • enum
  • bool, char
  • byte, sbyte
  • Int16, Int32, Int64
  • UInt16, UInt32, UInt64
  • float, double, decimal

Reference types

  • class
  • instances of object
  • interface
  • delegate
  • string

Pointers

A pointer refers to the address of an item in memory. “Reference types” are types that are accessed through a pointer. They are managed by the CLR (Common Language Runtime) and take up space in memory like any other type.

Performance issues in Boxing operations

Boxing is the conversion of a value type to an object or interface. It is a computationally expensive process that should be avoided whenever possible.


int v = 10;
object obj = v; // boxing (implicit)
int v2 = (int)obj; // unboxing (explicit)

StackHeap2

When a value type is boxed, a new object is allocated on the heap. This can be up to 20 times slower than a simple reference assignment; unboxing can be up to 4 times slower than an assignment (MSDN reference). Thus, repeating these operations in a loop can have a significant impact on performance.
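As a hedged illustration of that cost (CompareBoxing is a hypothetical method; the non-generic ArrayList is used here precisely because it stores object and therefore boxes value types):

using System.Collections; // non-generic ArrayList stores object
using System.Collections.Generic; // generic List<T> stores T directly

void CompareBoxing()
{
	ArrayList boxed = new ArrayList();
	for (int i = 0; i < 1000000; i++)
		boxed.Add(i); // boxes each int into a new heap object

	List<int> unboxed = new List<int>();
	for (int i = 0; i < 1000000; i++)
		unboxed.Add(i); // no boxing: the int is stored directly
}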

Summary

  • Accessing boxed values is slower: first the pointer is read from the stack, then its content is fetched from the heap.
  • Boxed values take up more memory: since the value is stored in the heap, a pointer to it is also needed in the stack (taking 4 or 8 bytes, depending on the architecture).
  • Unboxed variables are disposed of with the stack frame. Boxed values are kept in memory until the GC collects them.
  • A boxing operation requires significant CPU time, because space must be allocated in the heap and the value must be copied into it. Unboxing requires some CPU time to copy the content from the heap back to the stack.

When (and Why) Recursive operations can be a problem

When a method is called, a new stack frame is added to the call stack. Thus, recursive operations add a new stack frame for each recursive call: the deeper the recursion, the greater the overhead. Moreover, as explained before, each method call involves copying the parameters into the stack frame of the called method. This can become a problem when the parameter is a large value type, such as a big struct, and the method is called many times in succession.

And what about the memory consumption?

Earlier in this article I said that we do not need to worry about allocating and deallocating memory in the stack, because memory is managed automatically: when a method exits, the whole stack frame is disposed. However, the stack grows and grows while calls are chained in a recursive operation. Remember that each method retains its own stack frame, with local variables, copies of the parameters that were passed, and everything else, until the recursion unwinds and the method finishes.
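A minimal sketch of the problem (SumDownToZero is a hypothetical method): each call pushes a new frame holding its own copy of n, so a deep enough recursion exhausts the stack and the runtime throws a StackOverflowException.

	static long SumDownToZero(long n)
	{
		if (n == 0) return 0;
		return n + SumDownToZero(n - 1); // not a tail call: the addition runs after the inner call returns
	}

SumDownToZero(10000) is fine; a call with a large enough argument overflows the stack.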

If you go with recursion, use tail call whenever possible

A tail call occurs when a method call is the last action of a method, so that its result is returned directly.

	public int TailRecursiveFunction(...)
	{
		//... some business logic here
		return TailRecursiveFunction(...);
	}

Here is the magic: when the thread gets to the tail call, there is no need to add a new frame to the stack, copy values, and so on. Since most of the state of the current frame is no longer necessary, many compilers optimize this scenario by reusing the current stack frame for the next method execution. Method calls in tail position can be compiled into efficient “goto”-like jumps, which is far cheaper than a traditional recursive call.
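For instance, the hypothetical SumDownToZero above can be rewritten with an accumulator so that the recursive call is the last action of the method. Note that the C# compiler does not guarantee tail-call elimination; this form merely makes the optimization possible:

	static long SumDownToZero(long n, long acc)
	{
		if (n == 0) return acc;
		return SumDownToZero(n - 1, acc + n); // tail call: nothing happens after the inner call
	}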

Cohesion and coupling: Principles of orthogonal, scalable design

The sections below are about enabling an application to evolve and be maintained with minimal risk and effort. Interpreting the large amount of complex information embedded in the organizational structure of source code is not easy. By separating concerns, we minimize complexity: different responsibilities are maintained in different places. Separation of concerns is about dividing to conquer; about modularity, encapsulation and layering; about individual pieces of code that are developed and maintained independently.

Instead of worrying about different sections of the code, we need to focus on localized changes in the right (and expected) places.

Orthogonality

In geometry, Euclidean vectors are orthogonal if they are perpendicular, i.e. they form a right angle. Even if these vectors grow infinitely in space, they will never cross. Well-designed software is orthogonal: its components can grow or be modified without affecting other components.

Orthogonal design is built upon two pillars: cohesion and coupling. These concepts form the basis of software design. However, although well known, they are constantly ignored or misunderstood.

Orthogonal

Coupling

Coupling (also known as dependency) is the degree to which one program unit (e.g. a class, module or subsystem) relies on other units. It is a measure of the strength of the interconnections between elements, and it should be minimized.

We want elements that are independent of each other. In other words, we want to develop applications that exhibit loose (rather than tight) coupling.

However, since the parts need to communicate among themselves, we never have completely independent modules. As the interconnections between the parties grow, one module needs more information about the other, increasing the dependency between them.

CohesionNCoupling

The code below is a sample of content coupling. It occurs when one component depends on (by modifying or relying on) the internal data or behavior of another component. Changing the elementary structure or behavior of one component then forces refactoring of the other components.

	public class LoggedUsersController
	{
		public Dictionary<int, DateTime> LastUserLoginDateAndTime { get; set; }
		public List<User> Users { get; set; }
	}

	public class BusinessRule
	{
		private LoggedUsersController loggedUsers =
					new LoggedUsersController();

		public User RegisterUserLogin(int userId)
		{
			User user = getUserFromDatabase(userId);

			if (loggedUsers.Users.Exists(u => u.Id == userId))
				throw new UserAlreadyExistsException();

			loggedUsers.Users.Add(user);

			if (!loggedUsers.LastUserLoginDateAndTime.ContainsKey(user.Id))
				loggedUsers.LastUserLoginDateAndTime.Add(user.Id,DateTime.Now);
			else
				loggedUsers.LastUserLoginDateAndTime[user.Id] = DateTime.Now;

			return user;
		}
	}

Since RegisterUserLogin directly accesses the inner content of LoggedUsersController, it creates a tight coupling between the caller and the behavior of LoggedUsersController. A better approach is to isolate the behavior inside LoggedUsersController.

	public class LoggedUsersController
	{
		private Dictionary<int, DateTime> LastUserLoginDateAndTime = new Dictionary<int, DateTime>();
		private List<User> Users = new List<User>();

		public void AddUser(User user)
		{
			if (this.Users.Exists(u => u.Id == user.Id))
				throw new UserAlreadyExistsException();

			this.Users.Add(user);
			if (!this.LastUserLoginDateAndTime.ContainsKey(user.Id))
				this.LastUserLoginDateAndTime.Add(user.Id, DateTime.Now);
			else
				this.LastUserLoginDateAndTime[user.Id] = DateTime.Now;
		}
	}

	public class BusinessRule
	{
		private LoggedUsersController loggedUsers =
					new LoggedUsersController();

		public User RegisterUserLogin(int userId)
		{
			User user = getUserFromDatabase(userId);
			loggedUsers.AddUser(user);
			return user;
		}
	}

Now the BusinessRule class is not tied to the implementation of LoggedUsersController. Instead, it is interested only in the responsibilities exposed by its interface. It no longer knows details about the implementation of LoggedUsersController, which contributes to a looser coupling. Moreover, all the logic related to the data of LoggedUsersController is handled next to that data, eliminating the inappropriate intimacy and increasing the cohesion of the class.

Types of coupling

The types are listed from the highest to the lowest degree of coupling.

  • Content coupling (worst) occurs when one component depends on the internal data or behavior of another component. This is the worst degree of coupling, since changes to one component will almost certainly require modifications to the other.
  • Common coupling occurs when modules share common data, such as global variables. As we all know, globals are evil: changing the shared resource implies changing all the modules that use it.
  • Control coupling occurs when one service or module knows something about the implementation of another and passes information to control its logic.
  • Stamp coupling occurs when modules share objects but use only a part of them. Sharing more data than needed allows the called module to access data it should not really see (see the sketch below).
  • Data coupling occurs when modules share data with each other, such as parameters passed in a function call. Although services with many parameters are a sign of bad design, well-handled data coupling is preferable to the other forms of coupling.
  • No coupling (best) – No intersection between modules.

If two or more components need to communicate, they should exchange as little information as possible.
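As a hypothetical sketch of the difference between stamp and data coupling (User, Mailer and their members are illustrative, not from the original example):

	public class User
	{
		public string Email { get; set; }
		public string PasswordHash { get; set; } // the mailer has no business seeing this
	}

	public class Mailer
	{
		// Stamp coupling: receives the whole User but uses only the e-mail address.
		public void SendWelcome(User user) { Send(user.Email); }

		// Data coupling: receives only the data it actually needs.
		public void SendWelcome(string email) { Send(email); }

		private void Send(string email) { /* ... */ }
	}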

Cohesion

Cohesion is a measure of responsibility and focus of an application component. It is the degree to which the elements of a module belong together, which should be maximized.

We want strong-related responsibilities in a single component. Thus, we want to develop highly cohesive code.

In highly cohesive code, all data, methods and responsibilities are kept close together, and the services of a component tend to be related in many aspects.

A simple and intuitive way to test the cohesion of a class is to check whether all the data and methods it contains have a close relationship with the class name. With this in mind, be aware that generic class names tend to create cohesion problems, because such classes accumulate many different responsibilities over time. In fact, a class with a vague name may one day become a god object: an anti-pattern describing an “all-knowing” object that contains tons of features and services with many different purposes, which dramatically compromises the cohesion and coupling of the components around it.
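As an illustration, the hypothetical class below fails that test: none of its methods relate to the vague class name, and each new feature gives the class yet another reason to change.

	public class AppManager // vague name: an invitation to unrelated responsibilities
	{
		public void SendEmail(string to, string body) { /* ... */ }
		public decimal CalculateOrderTax(decimal amount) { /* ... */ return 0; }
		public void LogError(string message) { /* ... */ }
		public void RenderReport(int reportId) { /* ... */ }
	}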

The Law of Demeter: Talk only to your closest friends

Demeter

Also known as the Principle of Least Knowledge, or just LoD, the Law of Demeter governs the interconnection between components. It reinforces loose coupling and high cohesion by stating that your object-oriented entities should talk only to their closest friends.

The Law of Demeter states that a method of a given object should only access methods and accessors belonging to:

  • The object itself
  • Parameters passed in to the method
  • Any object created within the method
  • Direct component elements of the object

A long chain of accessors and methods is a sign of bad design.

For example:

	public class BusinessRule
	{
		public Bid GetCarAuctionBestBid(int carId)
		{
			//... some logic here
			return bidsLogic.Bids.AllBids.GetBestBid(b => b.CarId == carId);
		}
	}

Even if the particular piece of information you need sits at the end of the chain, digging through the methods and accessors yourself is a terrible idea.

	public class BusinessRule
	{
		public Bid GetCarAuctionBestBid(int carId)
		{
			//... some logic here
			return bidsLogic.GetBestBid(carId);
		}
	}

Now you are talking only to your closest friend. “bidsLogic” resolves its properties internally and exposes only the services that are explicitly needed by other components. There is another thing going on here: the Law of Demeter is not just about chaining. When you do not have to navigate accessors and methods yourself (i.e., when you get what you need by just calling a method or property of a nearby object), you are telling the object what to do. The Principle of Least Knowledge is closely related to the principle “Tell, don’t ask”!

Tell, Don’t Ask

The problem is not using a “get” accessor to inspect the state of an object; the problem is making decisions based on that state. You do not want to ask an object about its inner state, make decisions about that state and then perform some obscure operation on its behalf. Object-oriented programming tells objects what to do.

The sample below is an example of code that asks too much. It makes decisions about the state of the bill by adding up the prices contained in the list of Items, assuming that the sum represents the total value of the bill.

	public class BusinessRule
	{
		public double CalculateDinnersCost(List<Bill> bills)
		{
			if (bills == null) return 0;

			double totalCost = 0;
			foreach (var bill in bills)
				if (bill.Items != null && bill.Items.Count > 0)
					foreach (var item in bill.Items)
						totalCost += item.Price;

			return totalCost;
		}
	}
	public class Bill
	{
		public List<Item> Items { get; set; }

		public void AddItem(Item item) {...}
		public void Remove(int itemId) {...}
	}

Instead of asking that much, when the state of an object can be computed from its own accessors and methods, we should consider relocating the logic inside that object. This is the “Tell, don’t ask” principle: instead of writing procedural code, we tell the objects what to do.

	public class BusinessRule
	{
		public double CalculateDinnersCost(List<Bill> bills)
		{
			if (bills == null) return 0;

			double totalCost = 0;
			foreach (var bill in bills)
				totalCost += bill.CalculateTotalCost();

			return totalCost;
		}
	}
	public class Bill
	{
		public List<Item> Items { get; set; }

		public void AddItem(Item item) {...}
		public void Remove(int itemId) {...}

		public double CalculateTotalCost()
		{
			double totalCost = 0;
			if (this.Items != null && this.Items.Count > 0)
				foreach (var item in this.Items)
					totalCost += item.Price;

			return totalCost;
		}
	}

Single Responsibility Principle

The Single Responsibility Principle is straightforward:

Every class should have a single responsibility and have one, and only one, reason to change.

Imagine an entity called MessagesRegister, which is responsible for registering alerts and notifications.

This is how MessagesRegister works:

  • It reads a configuration file
  • It processes some business logic to create notifications.

This class can change for two different reasons. First, the source of the configuration can evolve into something more elaborate over time (e.g., from an XML file to configuration through a graphical user interface). Second, the actual processing rules might change.

It is a bad design decision to keep together two responsibilities that change for different reasons. The cohesion (i.e., the focus and responsibility of the class) is reduced and the principle of separation of concerns is violated.
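A hypothetical sketch of how MessagesRegister could be split so that each class keeps a single reason to change (the class and member names below are illustrative, not from the original example):

	public class MessagesConfiguration { /* settings loaded from the source */ }

	public class MessagesConfigurationReader
	{
		// Changes only when the configuration source changes (XML file, GUI, ...).
		public MessagesConfiguration Read() { /* ... */ return new MessagesConfiguration(); }
	}

	public class NotificationProcessor
	{
		// Changes only when the business rules for notifications change.
		public void CreateNotifications(MessagesConfiguration configuration) { /* ... */ }
	}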

Putting it all together

So… where should I put this code? While coupling measures the degree of dependency between different entities of the system, cohesion measures how focused the responsibilities of a given entity are.

By following these principles, we can enumerate some major achievements in our design:

  • Maintenance is child’s play: we keep together things that need to be maintained together and things that are not directly related can be changed without affecting other components.
  • The code is readable and algorithms become more self-documenting.
  • Classes have well-defined responsibilities and code duplication is drastically reduced.

CohesionNCoupling2
