Quality assurance in software development: When should you start the testing process?

“A procedure intended to establish the quality, performance, or reliability of something, especially before it is taken into widespread use” — definition of “test” in the Oxford Dictionary.

Customers don’t like dealing with defective software. They want their demands delivered with high quality and in the shortest possible time. A testing phase that starts only a few days before the next version of the product is released might not (and probably won’t!) be able to ensure product quality.

Below is an example of a typical SDLC:

  • Planning – Goals and objectives are defined, requirements are gathered, costs and resources are estimated and feasibility of alternative solutions is analyzed.
  • Analysis & Design – Features are detailed taking into account the user needs. Wireframes and business rules are defined. Other relevant documents are attached.
  • Construction – Code is written for real in this phase.
  • Testing – Pieces are put together in a test environment to verify the system. Different types of tests can be performed, but that is a conversation for another day.

[Figure: a typical SDLC, with testing as the last phase]

This cycle is exposed to some problems. As we can see, each activity starts only when the previous phase ends. First, consider the bugs found during testing: many of them have existed in the system since the design or even the planning phase, and fixing them after development is completed will probably be much more expensive than it would have been if the problems had been identified in the earlier steps. Furthermore, with predictive planning, a tight deadline combined with a delay in the construction phase can reduce the time available for testing, which might significantly undermine the quality of the product.

It is observed that most of the errors found in the testing phase were introduced during requirements gathering or design.

Why should testing start early in the software development life cycle?

The process starts with the requirements, and as the project evolves through the SDLC, more effort is allocated to creating or modifying the solution, more people get involved and the cost of the project increases. Bugs detected at the end of the process tend to require significantly more effort to fix: the sooner a bug is identified, the cheaper it is to fix. In Software Testing, Ron Patton says that the cost of fixing a bug can be represented by something close to a logarithmic curve, where the cost can increase more than tenfold as the project progresses through the phases of the SDLC.

For instance, a bug identified during conception costs close to nothing to fix, but when the same bug is found only after implementation or testing, the average cost of repair can be 10 to 1,000 times higher than in the earlier steps. When customers find the bug in the production environment, the cost of the problem includes all the side effects related to it. That is where things can get serious.

[Figure: average cost of fixing a bug at each phase of the SDLC]

Main advantages of testing in earlier phases

  • Many problems are introduced into the system during planning or design. Testing the requirements anticipates future problems at a significantly lower cost.
  • Since the testing process is involved in all phases of the SDLC, management will not feel that testing is the bottleneck for releasing the product.
  • Testers become more familiar with the software, as they are involved with the evolution of the product from the earlier phases.
  • Test cases written during requirements gathering and shared with the Dev team before the construction phase can help developers think outside the box and consider more failure scenarios in their code.
  • The test environment can be prepared in advance, anticipating risks and preventing delays.
  • The risk of having too little time for testing is greatly reduced, increasing test coverage and the variety of tests performed.
  • Involving quality assurance in all phases of the SDLC helps create a ‘quality culture’ inside the organization.


Defect prevention: Quality is built in, not added on

“Inspection does not improve the quality, nor guarantee quality. Inspection is too late. The quality, good or bad, is already in the product. As Harold F. Dodge said, ‘You cannot inspect quality into a product.’” — W. Edwards Deming, Out of the Crisis, p. 29

Start a test plan at the beginning of the project and identify the test requirements. Test requirements are not test cases: they do not describe the data used in the tests, because data is irrelevant at this level. These requirements are later used as input documents for generating test cases (a sketch of one such test case follows the examples below). Testing should start in the planning phase and continue throughout analysis and design. By the end of the design phase, integration and unit test cases should be completed. Some examples of test requirements:

  • “Validate that you can insert an entry to the repository”
  • “Validate that you can’t insert an entry when the repository already contains one with the same unique identification”
  • “Validate that you can’t insert an entry when the repository reaches 300 entries”
  • “Validate that you can insert an entry to the repository when it is empty (initial test)”
  • “Validate that the full repository can be loaded to the screen in 2 seconds”
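
As a rough illustration of the difference, the second requirement above could later turn into a test case like the one below, where the concrete data is finally pinned down (the repository API, the test attribute and the exception type are hypothetical, used only to make the idea concrete):

	[Test]
	public void Insert_EntryWithDuplicateId_IsRejected()
	{
		// Hypothetical repository used only for illustration.
		var repository = new EntryRepository();
		repository.Insert(new Entry { Id = 42 });

		// The test case fixes the concrete data (Id = 42),
		// which the test requirement intentionally left open.
		Assert.Throws<DuplicateEntryException>(
			() => repository.Insert(new Entry { Id = 42 }));
	}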

[Figure: SDLC with testing activities embedded in every phase]

Verify that requirements are clear and consistent. It is important to eliminate ambiguities of interpretation caused by vague, general terms. Some customers may use terms that have different meanings, which compromises the analysis of the document.

Discover missing requirements. In many cases, project designers have no clear understanding of modules and assume certain requirements. Requirements should cover all aspects of the system without any assumptions.

Ask the client about requirements that are not related to the project goals. It is important that these requirements are identified and that the client is asked whether they are really necessary. A requirement can be considered irrelevant when its absence causes no significant impact on the project goal.

Some other checks include (source: link):

  • Does the specification contain a definition of the meaning of every essential subject matter term within the specification?
  • Is every reference to a defined term consistent with its definition?
  • Is the context of the requirements wide enough to cover everything we need to understand?
  • Is every requirement in the specification relevant to this system?
  • Does the specification contain solutions posturing as requirements?
  • Is the stakeholder value defined for each requirement?
  • Is each requirement uniquely identifiable?
  • Is each requirement tagged to all parts of the system where it is used? For any change to requirements, can you identify all parts of the system where this change has an effect?

Defect detection

The tester should report the defect detection efficiency on completion of the project. This metric measures the efficiency of the process within the SDLC and helps to understand and track which phases of the SDLC are generating more problems and compromising product quality.
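
A common way to compute this metric (the exact definition varies between organizations) is the percentage of defects caught before release: DDE = defects found before release ÷ (defects found before release + defects reported after release) × 100. For example, if the team finds 90 defects internally and customers report another 10 in production, the DDE is 90 / (90 + 10) = 90%.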

The role of developers in defect prevention

Developers must be aligned with the expectations set by the requirements. In many cases, in order to keep up with the schedule, developers do not invest enough time in reviewing the specification, and often ignore important documents or misunderstand some requirements. This kind of ambiguity generates more bugs that will only be identified at the end of the project, when repairs are more expensive.

Developers should also create unit tests and review code (and/or have their code reviewed) before committing. Together, these small daily activities make a great contribution to defect prevention during the construction phase.

In addition, some types of tests are certainly worth considering for automation, and an automation team would then get involved in the process. The execution of automated tests (UI, load, performance, unit, etc.) can be strongly linked to developers’ commits during the construction phase (see Continuous Integration), but that’s a topic for another conversation.


Putting it all together

Defect prevention is an important investment with short-term return. These combined actions not only increase product quality by anticipating issues, but also reduce the maintenance cost of the product, increase overall productivity and reduce the development time of the project. As a consequence, customer satisfaction increases, as do the reliability and reputation of the organization.


A Typescript library for Google Knowledge Graph (Freebase)

Have you ever seen the little box that sometimes appears (depending on the query) as part of Google’s search results? Type “Paris” and see the magic in action. The structured content displayed there is part of the Google Knowledge Graph, a structured database that enhances the search engine.

[Figure: Google search results for “Paris”, showing the Knowledge Graph panel]

Google Knowledge Graph

Google Knowledge Graph is a huge database that contains structured data about almost everything. The intention is to enhance the search results with content that includes structured details of a particular topic. You can read more about Google Knowledge Graph at the official blog and Wikipedia.

We know that keywords are the components that form the basis of SEO, but what might catch your attention is that the Knowledge Graph seems to be very well structured, in a way that could not have been built only from search terms entered by users. In fact, the knowledge base is supported by Freebase, an open database developed by Metaweb, a company acquired by Google in 2010.

Freebase is a massive knowledge base. It is structured, searchable, editable and open-licensed (free for commercial and non-commercial use under Creative Commons). It is an ambitious initiative to create a base of semantic data on virtually everything. It is collaborative and was built with information from many sources, including individual contributions from the online community.

The Knowledge Graph is open, powerful and Google provides an API for remote access

API documentation can be found on Google Developers. If you want direct access to the database, a data dump is available for download here. As we can see, this knowledge base creates endless possibilities for new promising applications.

Some important services:

  • search – useful for recommendation or displaying auto-complete results.
  • topic – useful for displaying a structured article about a specific subject.

You can play with these services using a web interface, here and here.
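
Since these are plain HTTP services, you can also call them directly from any platform. A minimal sketch in C# follows; the endpoint and query-string parameters are my assumptions here, so always check the official documentation and use your own API key:

	using System;
	using System.Net;

	class FreebaseSearchSample
	{
		static void Main()
		{
			// Assumed v1 search endpoint; verify against the official docs.
			string query = Uri.EscapeDataString("Paris");
			string url = "https://www.googleapis.com/freebase/v1/search?query=" + query + "&key=YOUR_API_KEY";

			using (var client = new WebClient())
			{
				// Returns a JSON document with the matching topics.
				Console.WriteLine(client.DownloadString(url));
			}
		}
	}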

A Typescript library for Google Knowledge Graph (Freebase)

I developed a Typescript library that wraps the Freebase API and exposes some of its services. The library contains type definitions to provide compile-time type checking and IntelliSense. I also published the resulting Javascript. The project is open source under the MIT license and is available for download on Github.

[Figure: code samples from the Freebase Typescript library]

I hope it is useful for someone. ;)


Developing secure web applications: XSS attack, the Confused Deputy and Over-Posting

The reality is that securing applications is not as much fun as we would like it to be. Nobody comes over and says “Wow! Thumbs up for securing my personal data!”. However, security is undoubtedly important in many contexts and cannot be underestimated. We do not want to end up in an embarrassing position caused by a security breach.

In this article, I cover XSS, XSRF and Over-Posting, a fraction of the most common web attacks, and explain how to protect your applications against them.

Some web security principles

  • Implement security in depth: the web tier responsible for receiving HTTP requests should not be the only layer to perform security checks. Each subsystem must take responsibility for its own security, regardless of the quality of the security checks in the upper layers.
  • Never trust user input: never assume that you do not need to perform a security check just because it has already been done elsewhere. Validations in JavaScript can improve the user experience by avoiding unnecessary HTTP requests, but never assume that client-side validations are secure. An attacker can bypass client validation without any difficulty.
  • Reduce the surface area: minimize the number of exposed operations and restrict the inputs of each action to contain only the data required to process the action. Do not expose exception details or information about the database schema to the end user.
  • Enforce the principle of least privilege: perform operations using only the privileges required for the task. If for some reason you need to elevate permissions, grant them but remove them when they are no longer needed.
  • Disable unnecessary features: platform features can introduce security flaws. Uninstall or disable features that are not relevant to the execution of your application.
  • Evaluate a model of authentication and authorization that fits your application well: authentication is the process of identifying the user who accesses the application; authorization is the process of verifying which operations the authenticated user can perform. There are different strategies for authentication and authorization; check which one best suits your application.

Defending your code against the most common attacks

Users of your site are likely to discover bugs. Some of these bugs reveal security holes. And when an attacker discovers one of these gaps, we have a problem.

Cross-Site Scripting (XSS)

XSS is one of the most common web attacks. According to WhiteHat Security, 40% of websites have at least one cross-site scripting vulnerability. In an XSS attack, client-side script is injected into the remote server through user inputs or query strings.

Open http://www.insecurelabs.org/Talk/Details/6 to start playing with XSS. Pay attention to the content written inside the form input. Everything that you write in a comment will be rendered after the form is submitted. In a real application, a malicious script inside a comment would be injected into the page, and every user that opens that page would run the script.

[Figure: a comment containing a script being submitted through the form]

This is the result

[Figure: the injected script executing when the page is rendered]

This is a classical XSS attack. My posted comment containing a bad script will be rendered within the list of comments of the page. When someone opens the page, the script is executed. In this particular case, the attacker is stealing the cookies related to the domain where the script was injected. Open the article Session hijacking (Wikipedia) to understand the implications of exposing your cookies.

In the words of Jeff Atwood (CodingHorror.com), in the article Protecting Your Cookies: HttpOnly:

Imagine, then, the surprise of my friend when he noticed some enterprising users on his website were logged in as him and happily banging away on the system with full unfettered administrative privileges.

Preventing XSS

Never trust user input. Encode everything. Instead of displaying content as-is, HTML-encode it before rendering.

In ASP.NET Web Forms, the code below will not be encoded.

  <%= Model.Comment %> 

Instead, use one of these approaches:

 <%: Model.Comment %> 

 

 <%= Html.Encode(Model.Comment) %> 

Razor encodes by default. The sample below gives an encoded output:

 @Model.Comment 

In ASP.NET MVC, if you want to accept and render raw content as-is (without any encoding), you have to do it explicitly: render it with Html.Raw and decorate the action that receives the content with the [ValidateInput(false)] attribute. In this case, the security concerns are at your own risk.

 @Html.Raw(Model.HtmlContent) 
[ValidateInput(false)]
public ViewResult Save(FormDataWithRawContent form)
{
...
}

Preventing Cookie Theft – Use HttpOnly and Require SSL

In the Web.config file, write:

<httpCookies domain="" httpOnlyCookies="true" requireSSL="true" />

The httpOnlyCookies attribute will prevent cookies from being accessed by client scripts. The requireSSL attribute will prevent cookies from being sent to the server when the connection is not secured with SSL. This configuration mitigates the damage of most XSS attacks (an injected script can no longer read the cookies) and should be used whenever possible.
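
If you need to set these flags on an individual cookie instead of (or in addition to) the global configuration, the same idea applies in code. A minimal sketch, assuming it runs inside an ASP.NET controller action or page (the cookie name and the token variable are illustrative):

	// The cookie cannot be read by client script and is only sent over HTTPS.
	var cookie = new HttpCookie("SessionToken", token)
	{
		HttpOnly = true,
		Secure = true
	};
	Response.Cookies.Add(cookie);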

Cross-Site Request Forgery (XSRF) – The confused deputy

Cross-site request forgery is an attack where the attacker creates a page that includes a form that posts to the attacked site.

<form id="someform"
      action="http://attackedTargetSite/Entity/Update"
      method="post">

  <!-- field names here would match the properties expected by the attacked action -->
  <input type="hidden" name="SomeProperty" value="SomeValue" />
  <input type="hidden" name="OtherProperty" value="OtherValue" />
</form>

<!-- the page can even submit the form automatically -->
<script>document.getElementById("someform").submit();</script>
If the user who submits the form is authenticated on the target site, the target site will receive and process the post without raising any authentication issues, since the action is performed by an authenticated user.

The confused deputy attack

XSRF is a form of confused deputy attack.

If an attacker wants to perform administrative actions on a web application for which he does not have the proper privileges, a feasible approach is to confuse a user into doing those things for him. In the case of an XSRF attack, the confused deputy is the user’s browser. Here (Haacked.com) we can see a nice practical example of an XSRF attack.

A confused deputy is a computer program that is innocently fooled by some other party into misusing its authority. It is a specific type of privilege escalation. In information security, the confused deputy problem is often cited as an example of why capability-based security is important.

From Wikipedia

Preventing XSRF

Reduce the surface area: minimize the number of exposed actions. In addition, create a hidden HTML input field that stores a token generated by your server, and validate the submitted token on every post. The token can be stored in a session variable or cookie.

ASP.NET MVC includes helpers to protect against this kind of attack. They generate an anti-forgery token inside the form and validate it on the server when the form is submitted:

@using (Html.BeginForm(...))
{
@Html.AntiForgeryToken("salt")
...
}

 

[ValidateAntiForgeryToken(Salt="salt")]
public ViewResult Update()
{
...
}

Thus, if the source of the request is unknown (i.e. the antiforgery token does not exist or it is not consistent), the action will not be triggered and an error will be thrown to the client application.

Known limitations

  • Users must have cookies enabled, or requests will be rejected.
  • This method works with POST, but not with GET. This should not be a very big problem, since GET requests should only read information and never change the state of the database.
  • XSS holes create possibilities for cookie hijacking and access to the anti-forgery token.

Over-Posting

ASP.NET MVC maps request data to model properties in GET and POST operations by using naming conventions. However, this convenience introduces a security flaw: an attacker can populate properties of the model even if they are not displayed in the form.

Consider a model PlaceReview bound to an action.


public class PlaceReview
{
	public int ReviewID { get; set; } // PK
	public int PlaceID { get; set; } // FK
	public Place Place { get; set; } // FK
	public int NumberOfLikes { get; set; }
	public string UserName { get; set; }
	public string UserComment { get; set; }
}

Now consider that the action is only interested on the UserName and UserComment properties of the model.


<ul>
		<li>
			@Html.LabelFor(m => m.UserName)
			@Html.TextBoxFor(m => m.UserName)
		</li>
		<li>
			@Html.LabelFor(m => m.UserComment)
			@Html.TextBoxFor(m => m.UserComment)
		</li>
	</ul>

A malicious user can manipulate the query string or post data to add information and change other properties of the model. For instance, he could add "NumberOfLikes=5000" and the corresponding property would be populated accordingly on the server side.

Preventing Over-Posting

You can use a BindAttribute on the model class or action parameter to explicitly control the properties that should be mapped by the model binder.

[System.Web.Mvc.Bind(Include = "UserName,UserComment")]
public class PlaceReview
{
	public int ReviewID { get; set; } // PK
	public int PlaceID { get; set; } // FK
	public Place Place { get; set; } // FK
	public int NumberOfLikes { get; set; }
	public string UserName { get; set; }
	public string UserComment { get; set; }
}

A similar approach involves calling UpdateModel or TryUpdateModel within the Action.

UpdateModel(placeReview, "PlaceReview",
new string[] { "UserName", "UserComment" });

A better approach is to avoid binding the domain model directly and to create specific view model classes that contain only the properties required for the action and nothing else.

public class PlaceReviewModel
{
	public string UserName { get; set; }
	public string UserComment { get; set; }
}

This last approach significantly reduces security holes related to over-posting and eliminates a problem of stamp coupling.

Stamp coupling occurs when modules share objects and use only a part of it. Sharing more data than what was needed allows the called module to access more data than it really needs.

From Cohesion and coupling: Principles of orthogonal, scalable design
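
To close the loop on the view model approach, here is a hedged sketch of what the action could look like: only the whitelisted fields are bound, and the mapping to the domain entity is explicit (the repository and the redirect target are illustrative only):

	[HttpPost]
	public ActionResult Save(PlaceReviewModel model)
	{
		// Only UserName and UserComment can ever arrive from the request.
		var review = new PlaceReview
		{
			UserName = model.UserName,
			UserComment = model.UserComment
		};

		reviewRepository.Add(review); // illustrative repository
		return RedirectToAction("Index");
	}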

Putting it all together

No application is totally secure, but there are some best practices that can be used to avoid several vulnerabilities. We talked about XSS, XSRF and Over-Posting. Fortunately, there are enough resources available and ready to protect web applications from dangerous attacks that occur frequently out there.



The absolute bare minimum every programmer should know about memory management

Advances in technology come with increased complexity. That is just the nature of how technology evolves. However, things get simpler for users, who are increasingly demanding that products align with their intentions. In “The Invisible Computer: Why Good Products Can Fail, the Personal Computer Is So Complex, and Information Appliances Are the Solution”, Donald Norman says:

Most technology goes through cycles of development and change in both internal and external complexity. Often, the very first device is simple, but crude. As the device undergoes the early stages of development, its power and efficiency improve, but so does its complexity. As the technology matures, however, simpler, more effective ways of doing things are developed, and the device becomes easier to use, although usually by becoming more complex inside.

The memory management of the .NET framework is a good example of how complexity can be hidden under a layer of simplicity. However, when we assume that we don’t need to worry about how memory is managed and how garbage collection works, focusing only on developing skills in syntax and classes, we risk making bad design decisions that lead to performance and memory issues. Furthermore, knowing how memory is managed helps us understand how the variables behave in our application. In this article I cover the basics of this topic.

Stack

The stack stores the state of a method. Every time a method is called, the .NET runtime creates a container (a stack frame) that stores the parameters, the local variables and the address of the return point. When the method completes, the frame is removed and the thread continues at the return point defined in the stack frame. Also, each thread has its own stack.

void MethodA()
{
	int a1 = 2;
	MethodB(a1, 8);
}
void MethodB(int data, int valueToSum)
{
	int sum = data + valueToSum;
	MethodC(sum);
}
void MethodC(int value)
{
	Console.WriteLine(value);
}

[Figure: the call stack with frames for MethodA, MethodB and MethodC]

Obviously, the representation of the stack is simplified for ease of understanding; in particular, the return point is not really the code line number.

In the example above, if the thread starts executing at the first line of MethodA and there is a breakpoint on the last line of MethodC, then when the thread reaches the breakpoint the stack will look like the image above.

As we can see, it’s like a pile of boxes: in order to access the contents of a box that is under another, you must first remove all the boxes above it. And since the content of a box cannot be accessed from the outside, it can never have a global scope. In addition, memory in the stack is self-maintained: when a method exits, the whole stack frame is thrown out and the memory is freed automatically.

Furthermore, stacks do more than store variables: since they store the return point, they also keep track of the execution flow.

Variables stored in the stack are local by nature

We cannot access variables from anywhere other than the last container (the top of the stack). So, when a new stack frame is created, variables declared in other frames can no longer be accessed. It means that types stored on the stack are local by nature. When you pass a parameter, it is copied into the next stack frame so that it can be accessed there.

Copying content isn’t usually a big deal, unless you are copying large value types. If you pass large structs as parameters between method calls inside a big loop or a recursive operation, you make successive copies of the struct, which leads to high copying overhead and the risk of running into performance issues.
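
When that cost becomes relevant, one option is to pass the struct by reference so that only a reference is copied into the new frame. A minimal sketch (the struct and its size are made up for illustration):

	struct LargeStruct
	{
		// Imagine several fields adding up to a large value type.
		public long A, B, C, D, E, F, G, H;
	}

	// Passed by value: the whole struct is copied into the callee's stack frame.
	static long SumByValue(LargeStruct data) { return data.A + data.H; }

	// Passed by reference: only a reference is copied.
	static long SumByRef(ref LargeStruct data) { return data.A + data.H; }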

Summary

  • A stack frame grows as methods declare variables, and variables exist while the method that owns the frame is running.
  • Everything stored in a stack has local scope.
  • We do not need to worry about allocating and deallocating memory in the stack, because memory is managed automatically. When a method exits, the whole stack frame is disposed.
  • Reading, allocating and deallocating memory is faster in a stack when compared to a heap. The operations are much simpler and efficiently organized by the CPU.
  • Space is well managed by the CPU and memory is not fragmented.
  • A stack is bound to a thread, and each thread handles one stack.
  • There is a limit for the stack size, which is OS-dependent and defined when a thread is created.
  • Variables cannot be resized.

Heap

Everything stored in the heap can be accessed at any time. However, unlike the stack, heap memory is managed by the garbage collector (GC), which is responsible for optimizing the available memory space and deallocating memory that is no longer referenced.

public class SomeClass
{
	public SomeClass(int v)
	{
		this.value = v;
	}
	public int value { get; set; }
}

SomeClass a = new SomeClass(3);
SomeClass b = new SomeClass(1);

[Figure: ‘a’ and ‘b’ on the stack referencing two SomeClass instances on the heap]

In the example above, ‘a’ and ‘b’ are two instances of SomeClass. Because SomeClass is a reference type, its instances are stored in the heap, while ‘a’ and ‘b’ are declared on the stack as references to that heap memory. Pay special attention to the “value” property of SomeClass: despite being an int32, it is also stored in the heap, because it is part of the content of a reference type.

When the method that declared ‘a’ and ‘b’ finishes, its stack frame is deleted and there is no longer any reference pointing to the objects in the heap. The heap is then storing orphaned objects, which will be disposed of during the next execution of the GC.

Summary

  • No limit on memory size. Although the initial size is predefined on application startup, more space can be requested from the OS.
  • Variables stored in the heap can be accessed globally.
  • Memory is managed by the OS or the memory management library.
  • As blocks are allocated and deallocated, memory might become fragmented.
  • Heap is allocated for the application by the runtime and disposed when the application process exits.
  • Allocation and deallocation in heap is more expensive when compared to stack.

Where things are stored

Short and brief: reference types are always stored in the heap. Value types are stored where they are declared. If we declare a value type inside a method, it is placed on the stack. If a value type is boxed, or if it is a member of a reference type (as in the example of SomeClass with an int property named value), it is stored in the heap.

Value types

  • struct
  • enum
  • numeric types: byte, sbyte, int16, int32, int64, uint16, uint32, uint64, float, double, decimal
  • bool
  • char

Reference types

  • class
  • instances of object
  • interface
  • delegate
  • string

Pointers

A pointer refers to the address of an item in memory. Reference types are types that are accessed through a pointer. They are managed by the CLR (Common Language Runtime) and take up space in memory like any other type.

Performance issues in Boxing operations

Boxing is the conversion of a value type to an object or interface. It is a computationally expensive process that should be avoided whenever possible.


int v = 10;
object obj = v; // boxing (implicit)
int v2 = (int)obj; // unboxing (explicit)

[Figure: the boxed value on the heap, with obj on the stack referencing it]

When a value type is boxed, a new object is allocated on the heap. This can be 20 times slower than a simple reference assignment. Unboxing can be 4 times slower than an assignment (MSDN reference). Thus, repeating these operations in a loop can cause a significant performance impact.
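
A typical place where this shows up is in non-generic collections. The sketch below contrasts ArrayList, which boxes every int it stores, with the generic List<int>, which stores the values directly:

	// Every Add boxes the int; every read unboxes it.
	var boxed = new System.Collections.ArrayList();
	for (int i = 0; i < 1000000; i++)
		boxed.Add(i);

	// The generic list stores the ints directly; no boxing occurs.
	var unboxed = new System.Collections.Generic.List<int>();
	for (int i = 0; i < 1000000; i++)
		unboxed.Add(i);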

Summary

  • Accessing boxed values is slower: first the reference is read from the stack, then its content is read from the heap.
  • Boxed values take up more memory, because the value is stored in the heap and a reference to it is needed on the stack (which takes 32 or 64 bits, depending on the architecture).
  • Unboxed variables are disposed with the stack frame. Boxed values are kept in memory until the GC collects them.
  • The boxing operation requires significant CPU time, because space must be allocated in the heap and the value must be copied into it. Unboxing requires some CPU time to copy the content from the heap back to the stack.

When (and Why) Recursive operations can be a problem

When a method is called, a new stack frame is added to the call stack. Thus, recursive operations add a new stack frame for each recursive call: the more recursive calls are expected, the more expensive the overhead. Moreover, as explained before, each method call involves copying the parameters into the stack frame of the called method. This can become a problem when the method parameter is a large value type, such as some structs, and there are successive calls to this method.

And what about the memory consumption?

Earlier in this article I said that we do not need to worry about allocating and deallocating memory in the stack, because memory is managed automatically: when a method exits, the whole stack frame is disposed. However, the stack keeps growing while operations are chained in recursive calls. Remember that each call retains its own stack frame, with local variables, copies of the parameters that were passed and everything else, until the recursive operation is over and the method finishes.

If you go with recursion, use tail call whenever possible

A tail call occurs when the last statement of a method is a method call.

	public int TailRecursiveFunction(...)
	{
		//... some business logic here
		return TailRecursiveFunction(...);
	}

Here is the magic: when the thread gets to the point of the tail call, there is no need to add a new frame to the stack, copy values, and so on. Since most of the state of the current frame is no longer necessary, many compilers optimize performance by reusing the current stack frame for the next method execution. Method calls in tail position can be compiled into efficient “goto” statements, which is far more efficient than the traditional recursive operation.
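
As a sketch, here is a sum written with an accumulator so that the recursive call sits in tail position. Keep in mind that the C# compiler does not guarantee tail-call optimization; whether the frame is actually reused depends on the JIT and platform:

	// The recursive call is the very last thing the method does,
	// so the current frame is no longer needed when it happens.
	static int SumUpTo(int n, int accumulator)
	{
		if (n == 0)
			return accumulator;
		return SumUpTo(n - 1, accumulator + n); // tail call
	}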


Cohesion and coupling: Principles of orthogonal, scalable design

The sections below are about enabling an application to evolve and be maintained with minimal risk and effort. It is not easy to interpret a lot of complex information derived from the organizational structure of source code. By separating concerns (link), we minimize complexity: different responsibilities are maintained in different places. Separation of concerns is about dividing to conquer, about modularity and encapsulation, about defining layers, about individual pieces of code that are developed and maintained individually and independently.

Instead of worrying about different sections of the code, we need to focus on localized changes in the right (and expected) places.

Orthogonality

In geometry, Euclidean vectors are orthogonal if they are perpendicular, i.e., they form a right angle. Even if these vectors grow infinitely in space, they will never cross. Well-designed software is orthogonal: its components can grow or be modified without affecting other components.

Orthogonal design is built upon two pillars: cohesion and coupling. These concepts form the basis of software design. However, although well known, they are constantly ignored or misunderstood.

[Figure: orthogonal vectors]

Coupling

Coupling (also known as dependency) is the degree to which one program unit (e.g. a class, module or subsystem) relies on other units. It is a measure of the strength of the interconnections between elements, and it should be minimized.

We want elements that are independent of each other. In other words, we want to develop applications that exhibit loose (rather than tight) coupling.

However, since parts need to communicate with each other, we do not have completely independent modules. As the interconnections between the parties involved grow, one module needs more information about the other, increasing the dependency between them.

[Figure: modules with tight and loose coupling]

The code below is a sample of content coupling. It occurs when one component depends on (by modifying or relying on) the internal data or behavior of another component. Changing the elementary structure or behavior of one component leads to refactoring of the other components.

	public class LoggedUsersController
	{
		public Dictionary<int, DateTime> LastUserLoginDateAndTime { get; set; }
		public List<User> Users { get; set; }
	}

	public class BusinessRule
	{
		private LoggedUsersController loggedUsers =
					new LoggedUsersController();

		public User RegisterUserLogin(int userId)
		{
			User user = getUserFromDatabase(userId);

			if (loggedUsers.Users.Exists(u => u.Id == userId))
				throw new UserAlreadyExistsException();

			loggedUsers.Users.Add(user);

			if (!loggedUsers.LastUserLoginDateAndTime.ContainsKey(user.Id))
				loggedUsers.LastUserLoginDateAndTime.Add(user.Id,DateTime.Now);
			else
				loggedUsers.LastUserLoginDateAndTime[user.Id] = DateTime.Now;

			return user;
		}
	}

Since RegisterUserLogin directly accesses the inner content of LoggedUsersController, it creates a tighter coupling between the caller and the behavior of LoggedUsersController. A better approach is to isolate that behavior inside LoggedUsersController.

	public class LoggedUsersController
	{
		private Dictionary<int, DateTime> LastUserLoginDateAndTime = new Dictionary<int, DateTime>();
		private List<User> Users = new List<User>();

		public void AddUser(User user)
		{
			if (this.Users.Exists(u => u.Id == user.Id))
				throw new UserAlreadyExistsException();

			this.Users.Add(user);
			if (!this.LastUserLoginDateAndTime.ContainsKey(user.Id))
				this.LastUserLoginDateAndTime.Add(user.Id, DateTime.Now);
			else
				this.LastUserLoginDateAndTime[user.Id] = DateTime.Now;
		}
	}

	public class BusinessRule
	{
		private LoggedUsersController loggedUsers =
					new LoggedUsersController();

		public User RegisterUserLogin(int userId)
		{
			User user = getUserFromDatabase(userId);
			loggedUsers.AddUser(user);
			return user;
		}
	}

Now the BusinessRule class is not tied to the implementation of LoggedUsersController. Instead, it is interested only in the responsibilities exposed by its interface. It no longer knows details about the implementation of LoggedUsersController, which contributes to looser coupling. Moreover, all the logic related to the data of LoggedUsersController is handled close to that data, eliminating the inappropriate intimacy and increasing the cohesion of the class.

Types of coupling

The types are listed in order from the highest to the lowest coupling.

  • Content coupling (worst) occurs when one component depends on the internal data or behavior of another component. This is the worst degree of coupling, since changes to one component will almost certainly require modifications to the others.
  • Common coupling occurs when modules share common data, like global variables. As we all know, globals are evil. Changing the shared resource implies changing all the modules that use it.
  • Control coupling occurs when one service or module knows something about the implementation of another and passes information to control its logic.
  • Stamp coupling occurs when modules share objects and use only a part of them. Sharing more data than is needed allows the called module to access more data than it really needs.
  • Data coupling occurs when modules or services share data with each other. Data passed as parameters in a function call is included in this type of coupling. Although services with many parameters are a sign of bad design, well-handled data coupling is preferable to the other forms of coupling.
  • No coupling (best) – no intersection between modules.

If two or more components need to communicate, they should exchange as little information as possible.

Cohesion

Cohesion is a measure of the responsibility and focus of an application component. It is the degree to which the elements of a module belong together, and it should be maximized.

We want strong-related responsibilities in a single component. Thus, we want to develop highly cohesive code.

In a highly cohesive code, all data, methods and responsibilities are kept close. Services tend to be similar in many aspects.

A simple and intuitive way to test the cohesion of a class is to check whether all the data and methods it contains have a close relationship with the class name. With that in mind, be aware that generic class names tend to generate cohesion problems, because such classes acquire many different responsibilities over time. In fact, classes with vague names might one day become god objects (an anti-pattern describing an “all-knowing” object that contains tons of features and services with many different purposes, which dramatically compromises the cohesion and coupling of components).

The Law of Demeter: Talk only to your closest friends


Also known as the Principle of Least Knowledge, or just LoD, the Law of Demeter governs the interconnection between components. It reinforces loose coupling and high cohesion by stating that your object-oriented entities should only talk to their closest friends.

The Law of Demeter states that a method of a given object should only access methods and accessors belonging to:

  • The object Itself
  • Parameters passed in to the method
  • Any object created within the method
  • Direct component elements of the object

A long chain of accessors and methods is a sign of bad design.

For example:

	public class BusinessRule
	{
		public Bid GetCarAuctionBestBid(int carId)
		{
			//... some logic here
			return bidsLogic.Bids.AllBids.GetBestBid(b => b.CarId == carId);
		}
	}

Even if you need a particular piece of information that is at the end of the chain, digging through the accessors and methods yourself is a terrible idea. A better approach:

	public class BusinessRule
	{
		public Bid GetCarAuctionBestBid(int carId)
		{
			//... some logic here
			return bidsLogic.GetBestBid(carId);
		}
	}

Now you are only talking to your closest friend. “bidsLogic” resolves its properties internally and exposes only the services that are explicitly needed by other components. There is another thing going on here: the Law of Demeter is not just about chaining. When you do not have to worry about navigating accessors and methods (i.e., when you get what you need by just calling a method or property of a nearby object), we say that you are telling the object what to do. The principle of least knowledge is closely related to the principle “Tell, don’t ask”!

Tell, Don’t Ask

The problem is not using a “get” accessor to inspect the behavior of an object. The problem is making decisions based on it. You do not want to ask the object about its inner state, make decisions about that state and then perform some dark operation. Object-oriented programming tells objects what to do.

The sample below is an example of code that asks too much. It makes decisions about the state of the bill by adding up the prices contained in the list of Items, assuming that the sum represents the total value of the bill.

	public class BusinessRule
	{
		public double CalculateDinnersCost(List<Bill> bills)
		{
			if (bills == null) return 0;

			double totalCost = 0;
			foreach (var bill in bills)
				if (bill.Items != null && bill.Items.Count > 0)
					foreach (var item in bill.Items)
						totalCost += item.Price;

			return totalCost;
		}
	}
	public class Bill
	{
		public List<Item> Items { get; set; }

		public void AddItem(Item item) {...}
		public void Remove(int itemId) {...}
	}

Instead of asking that much, if the state of an object can be inferred by examining nearby accessors and methods, we should consider relocating the logic inside the right object. This is the “Tell, don’t ask” principle: instead of writing procedural code, we tell the objects what to do.

	public class BusinessRule
	{
		public double CalculateDinnersCost(List<Bill> bills)
		{
			if (bills == null) return 0;

			double totalCost = 0;
			foreach (var bill in bills)
				totalCost += bill.CalculateTotalCost();

			return totalCost;
		}
	}
	public class Bill
	{
		public List<Item> Items { get; set; }

		public void AddItem(Item item) {...}
		public void Remove(int itemId) {...}

		public double CalculateTotalCost()
		{
			double totalCost = 0;
			if (this.Items != null && this.Items.Count > 0)
				foreach (var item in this.Items)
					totalCost += item.Price;

			return totalCost;
		}
	}

Single Responsibility Principle

The Single Responsibility Principle is straightforward:

Every class should have a single responsibility and have one, and only one, reason to change.

Imagine an entity called MessagesRegister, which is responsible for registering alerts and notifications.

This is how MessagesRegister works:

  • It reads a configuration file
  • It processes some business logic to create notifications.

This class can change for two reasons. First, the source of the configuration can evolve into something more elaborate over time (for instance, a configuration through a graphical user interface instead of XML). Second, the actual processing rule might change.

It is a bad design decision to keep together two pieces of behavior that change for different reasons. The cohesion (i.e., the focus and responsibility of the class) is reduced and the principle of separation of concerns is violated.
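
A hedged sketch of how MessagesRegister could be split so that each class has one reason to change (all names below are illustrative):

	// Changes only when the configuration source changes (XML file, GUI, ...).
	public class MessagesConfigurationReader
	{
		public MessagesConfiguration Read() { ... }
	}

	// Changes only when the notification business rules change.
	public class NotificationProcessor
	{
		public void CreateNotifications(MessagesConfiguration configuration) { ... }
	}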

Putting it all together

So… where should I put this code? While coupling measures the degree of dependence between different entities of the system, cohesion measures how focused the responsibilities of a given entity are.

By following these principles, we can enumerate some major achievements in our design:

  • Maintenance is child’s play: we keep together things that need to be maintained together and things that are not directly related can be changed without affecting other components.
  • The code is readable and algorithms become more self-documenting.
  • Classes have well-defined responsibilities and code duplication is drastically reduced.

[Figure: components with high cohesion and low coupling]


How to handle faults in WCF without declaring them explicitly

In WCF, exception handling between client and server requires specifying the faults that each service might throw. This is a good thing, because it allows fault declarations inside the WSDL (enhancing code generation on the client side) and also works as documentation about the validations that can interrupt the execution flow.

However, imagine a scenario where you have a large standalone .NET application with hundreds of services that process complex business rules. Distributing this application in tiers while tracing all the possible exceptions that each service might throw can be challenging as hell.

For this particular scenario, an effective way to handle the problem is to implement a WCF architecture that does not declare the faults explicitly. Obviously, this approach assumes that some other layer is handling the exceptions appropriately. Furthermore, since the exceptions will not be specified in the WSDL, developers should consider moving those that travel between client and server to a shared base project.

Known limitations:

This approach is recommended for scenarios where interoperability is not a requirement. The best practices for service references state that everything (operations and data contracts) that the client needs to invoke a service must be predefined in the WSDL. In this case, the exceptions will not be defined in the interface, which creates a coupling between the client and the server.

Also, I recommend this approach for interprocess distribution (e.g. using named pipes) or intranet applications. For security purposes, sending real exceptions to the client is not recommended in scenarios where the binding is unsafe, the code is not obfuscated and/or the services are available over the internet. You do not want to provide details of your exceptions so easily.

Considering that, we can do the following:

  1. Define a generic fault which encapsulates the real exception thrown by the server-side application.
  2. Create a WCF extension (server-side) that handles exceptions globally, bundles and stores it inside the generic fault.
  3. Unpack the exception (client-side) that comes from the server.

You can also adapt this behavior to bundle not all exceptions, but only a specific subset (e.g. exceptions that inherit from a particular class, or exceptions defined in one or more specific namespaces).
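
As a sketch of that adaptation, the ProvideFault implementation shown later could pack only the exceptions that belong to your own namespaces and let WCF handle everything else in the default way (the namespace prefix below is illustrative):

	// Inside ProvideFault: wrap only exceptions from the application's namespaces.
	string ns = error.GetType().Namespace ?? string.Empty;
	if (!ns.StartsWith("MyCompany.MyApplication"))
		return; // leave 'fault' untouched; WCF returns its default, undetailed fault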

Define a fault to encapsulate exceptions:

	[DataContract]
	public class PackedFault
	{
		public PackedFault(byte[] serializedFault) { this.Fault = serializedFault; }

		[DataMember]
		public byte[] Fault { get; set; } //Real exception will be stored here.
	}

Create a serializer to pack and unpack the exception:

This serializer must be declared in a project shared between the client and the server. You could also use the XmlObjectSerializer instead of the BinaryFormatter, but the latter is known to achieve better performance.


	public class BinarySerializer
	{
		public static byte[] Serialize(object obj)
		{
			using (System.IO.MemoryStream serializationStream = new System.IO.MemoryStream())
			{
				IFormatter formatter = new BinaryFormatter();
				formatter.Serialize(serializationStream, obj);
				return serializationStream.ToArray();
			}
		}

		public static TObj Deserialize<TObj>(byte[] data)
		{
			using (System.IO.MemoryStream serializationStream = new System.IO.MemoryStream(data, false))
			{
				IFormatter formatter = new BinaryFormatter();
				return (TObj)formatter.Deserialize(serializationStream);
			}
		}
	}

Create an ErrorHandler responsible for centralizing exception handling on the server:


	public class ErrorHandler : IErrorHandler
	{
		public bool HandleError(Exception error)
		{
			//TODO Register (log) the exception (e-mail, eventviewer, databases, whatever).
			//Because the HandleError method can be called from many different places there are no guarantees made about which thread the method is called on. Do not depend on HandleError method being called on the operation thread.

			return false; //return true if WCF should not abort the session
		}

		public void ProvideFault(Exception error, System.ServiceModel.Channels.MessageVersion version, ref System.ServiceModel.Channels.Message fault)
		{
			error.Data.Add("StackTrace", error.StackTrace); //For security purposes, the client should not need to know the server stack trace. But in case you want to preserve that, here is an approach.
			PackedFault pack = new PackedFault(BinarySerializer.Serialize(error));
			FaultException<PackedFault> packedFault = new FaultException<PackedFault>(pack, new FaultReason(error.Message), new FaultCode("Sender"));
			fault = Message.CreateMessage(version, packedFault.CreateMessageFault(), packedFault.Action);
		}
	}

Create a WCF extension responsible for handling errors:


	public class ErrorHandlerServiceBehavior : IServiceBehavior
	{
		public void AddBindingParameters(ServiceDescription serviceDescription, ServiceHostBase serviceHostBase,
			Collection<ServiceEndpoint> endpoints, BindingParameterCollection bindingParameters) { return; }

		public void ApplyDispatchBehavior(ServiceDescription serviceDescription, ServiceHostBase serviceHostBase)
		{
			var errorHandler = new ErrorHandler();
			foreach (ChannelDispatcher chanDisp in serviceHostBase.ChannelDispatchers)
				chanDisp.ErrorHandlers.Add(errorHandler);
		}

		/// <summary>
		/// Validate if every OperationContract included in a ServiceContract related to this behavior declares PackedFault as FaultContract
		/// </summary>
		/// <param name="serviceDescription"></param>
		/// <param name="serviceHostBase"></param>
		public void Validate(ServiceDescription serviceDescription, ServiceHostBase serviceHostBase)
		{
			foreach (ServiceEndpoint se in serviceDescription.Endpoints)
			{
				// Must not examine any metadata endpoint.
				if (se.Contract.Name.Equals("IMetadataExchange") && se.Contract.Namespace.Equals("http://schemas.microsoft.com/2006/04/mex"))
					continue;

				foreach (OperationDescription opDesc in se.Contract.Operations)
					if (opDesc.Faults.Count == 0 || !opDesc.Faults.Any(fault => fault.DetailType.Equals(typeof(PackedFault))))
						throw new InvalidOperationException(
							string.Format("{0} requires a FaultContractAttribute(typeof({1})) in each operation contract. The \"{2}\" operation contains no FaultContractAttribute.",
							this.GetType().FullName, typeof(PackedFault).FullName, opDesc.Name));
			}
		}
	}

Create an ExtensionElement to ease service configuration:


	public class ErrorHandlerExtensionElement : BehaviorExtensionElement
	{
		public override Type BehaviorType { get { return typeof(ErrorHandlerServiceBehavior); } }
		protected override object CreateBehavior() { return new ErrorHandlerServiceBehavior(); }
	}

Declare the extension as a behaviorExtension in the configuration file:

<system.serviceModel>
	<extensions>
		<behaviorExtensions>
			<add name="errorHandler" type="WcfService1.ServiceModel.ErrorHandlerExtensionElement, WcfService1" />
		</behaviorExtensions>
	</extensions>
	<behaviors>
		<serviceBehaviors>
			<behavior>
				<errorHandler />
			</behavior>
		</serviceBehaviors>
	</behaviors>
</system.serviceModel>

Service (server-side):


	[ServiceContract]
	public interface IWCFService
	{
		[FaultContract(typeof(PackedFault)), OperationContract]
		void ThrowUndeclaredException();
	}

	public class WCFService : IWCFService
	{
		public void ThrowUndeclaredException()
		{
			throw new NotImplementedException();
		}
	}

Service invoke (client-side):


		try
		{
			using (WCFServiceClient client = new WCFServiceClient())
				client.ThrowUndeclaredException();
		}
		catch (FaultException<PackedFault> e)
		{
			Exception exc = BinarySerializer.Deserialize<Exception>(e.Detail.Fault);
			throw exc; //this is the real exception thrown by the server
		}

Download sample code here.


Why every developer should have a blog

Scott Hanselman once said: “Every developer should have a blog”. When I think about it, I can see that the motives are noble. We spend days, months, years developing professional code. Some solutions are simple to implement; others present themselves as true mind-blowers! Yet, as if all our knowledge and contributions were forgotten boxes in a garage, we give up potential contributions when good ideas for interesting problems are dispersed across multiple projects without any systematic and social criteria of organization.

Why not write about an interesting design pattern that was successfully applied in a given context? …Why should you write?

Writing is a way of backing up the knowledge acquired from problems you once solved (and even a way of going deeper into the topic). It is a testament that you do what you do because you have a passion for it, and not just for the money. It is a tool for sharing ideas and knowledge. It is even a way of marketing yourself.

Blogs, GitHub, Google Code, SourceForge, etc. There are many ways of organizing potentially interesting content.

