Faceted search in WordPress

About a month ago I talked to a friend of mine about WordPress and PHP. Being a staunch .NET loyalist he quickly concluded that he had no time for sloppy PHP.

PHP supports both procedural and object-oriented paradigms, and WordPress is a prime example of why this is not always a good idea: the WordPress API is partly based on procedures and partly based on globally accessible God objects. Pragmatic – but not pretty.

But this does not mean that you cannot create a tidy and strictly object-oriented WordPress plugin. In this post I'll show you how to structure your plugins.

The plugin that I am going to showcase adds facets to WordPress' search. We made this plugin at The Royal Library for a Danish music research portal, and it is a pretty neat piece of software – clean, tightly coded and reasonably fast.

Faceted search explained

You probably know search facets from Google. Google's result page allows you to show only certain types (or facets) of results – like images or videos.

You choose which facet to show by clicking on one of the buttons in the right side panel. Google will then show only the selected result type – filtering out all others.

Search facets make Google simple to use: you just enter a search string and click search.

You do not have to choose what kind of results you are looking for before you get the actual results.

Understanding the WordPress search

The anatomy of a WordPress search is simple. When you enter the search string foobar and click search, WordPress requests the URL /?s=foobar.

Every time WordPress receives a request where the key s is set to some value, the request is treated as a search and redirected to the result page.

The wp_query object

The results are fetched from the data layer using the object wp_query. The wp_query maintains a collection of predefined search conditions in a key/value collection called query variables. The collection is manipulated by the two methods set and get.

The wp_query works by example: most predefined query variables mirror the properties of a WordPress page. If you e.g. wish to fetch all pages with the title Frontpage, you set the query variable post_title to Frontpage.
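
A quick sketch of the idea, following the post's own example (assuming a wp_query instance at hand):

// Query-by-example: describe the page you want, then fetch
$wp_query->set('post_title', 'Frontpage');
$frontpage = $wp_query->get_posts();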

But the wp_query also includes some special query variables not directly related to page properties. The search uses its own variable s to do full text searches. After this variable has been set, the search continues and calls the method get_posts.

The method get_posts executes the following steps:

1. Cleans and sanitizes the query variables by calling the method parse_query.
2. Fires the WordPress event pre_get_posts.
3. Translates the query variables into a SQL SELECT query.
4. Executes the SQL query and fetches the result. The database interaction is done by the data abstraction object wpdb.

Implementing faceted search in WordPress

My plugin consists of two parts: a widget and an applier. The widget shows a list of all facets on the search result page:

Each facet is identified by an id, and when the user clicks a facet, this id is added to the query string of a new search: /?s=foobar&facet_id=4.

Hence my facets are not filters but extra conditions that are added to the wp_query before performing a second search: in most cases it is simply faster to let the database do another search than to start filtering the existing results in PHP.

The signature of a facet

I have implemented three types of facet – but they all inherit from the same base class:
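
The base class itself is not shown in the text above; here is a minimal sketch of what it must look like, reconstructed from the constructor call in Ancestor_Facet further down (the exact property list is an assumption):

abstract class Facet
{
    public $id;    // facet id used in the query string
    public $name;  // display name shown in the widget
    public $icon;  // icon shown next to the name
    public $key;   // query variable key
    public $value; // query variable value

    public function __construct($id = null, $name = null, $icon = null, $key = null, $value = null)
    {
        $this->id = $id;
        $this->name = $name;
        $this->icon = $icon;
        $this->key = $key;
        $this->value = $value;
    }

    // Each facet type adds its conditions to the wp_query in its own way
    abstract public function add_facet(&$wp_query);
}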

What differentiates the three types is the add_facet method. Each takes the wp_query and adds its key/value pair to it – but in slightly different ways.

Pages where a property has a specific value

The most basic type of facet simply takes the key/value pair and adds it to the list of query variables:

class Property_Facet extends Facet
{
    /* (non-PHPdoc)
    * @see Facet::add_facet()
    */
    public function add_facet(&$wp_query)
    {
        // Adding value to wp_query
        $wp_query->set($this->key, $this->value);
    }
}

This type can be used to create many different facets:

$a = new Property_Facet();
$a->key = 'post_type';
$a->value = 'post';

$b = new Property_Facet();
$b->key = 'post_parent';
$b->value = '230';

$c = new Property_Facet();
$c->key = 'post_author';
$c->value = '25';

Facet $a shows only blog posts, facet $b shows only child pages of a specific page, and facet $c shows only content from a specific author.

Understanding post_type property

WordPress’s two types of content – pages and posts – are both stored in the wp_posts table. The post_type property is used to differentiate the two.

When WordPress introduced custom types (e.g. post_type='article' or post_type='record'), a set of methods was needed that would work on all post types. The result is a rather confusing API where the term post can mean content with post_type='post' as well as any kind of content in the wp_posts table.

Pages that have a specific set of metadata

Besides having properties, a WordPress page has a key/value collection of custom meta data. My second type of facet adds conditions to this collection:

class Meta_Facet extends Facet
{
    /* (non-PHPdoc)
    * @see DVM_Facets_Facet::add_facet()
    */
    public function add_facet(&$wp_query)
    {
        $meta_query = $wp_query->get('meta_query');
        $meta_query[] =
            array
            (
                'key' => $this->key,
                'value' => $this->value,
                'compare' => 'LIKE'
            );
        $wp_query->set('meta_query', $meta_query);
    }
}

I use this type to create a facet that shows only pages with a specific template:

$d = new Meta_Facet();
$d->key = '_wp_page_template';
$d->value = 'article.php';

Pages that descend from a specific page

The last type of facet shows only descendants of a specific page. I use this kind of facet to show results from different sections of my site.

To do this I need to get a list of all the descendants' ids. Then I add this list to the query variable post__in. This narrows down the search to the list of descendants.

So an ancestor facet is much like a property facet – we just need the list of ids. So I extend the property facet and inject a data access object that can create the list:

class Ancestor_Facet extends Property_Facet
{
    public $ancestor_id;
    public $dal;
    /**
    * Constructs a new facet
    * @param int $id Facet id
    * @param string $name Facet name
    * @param string $icon Facet icon
    * @param int $ancestor_id Ancestor post id
    * @param DAL $dal Data access layer
    */
    public function __construct($id, $name, $icon, $ancestor_id, $dal)
    {
        $this->ancestor_id = $ancestor_id;
        $this->dal = $dal;
        parent::__construct($id, $name, $icon, 'post__in', '');
    }

    /* (non-PHPdoc)
    * @see Property_Facet::add_facet()
    */
    public function add_facet(&$wp_query)
    {
        $this->value =
            $this->dal->get_descendant_ids($this->ancestor_id);
        parent::add_facet($wp_query);
    }
}
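
Constructing an ancestor facet might then look like this (the argument values are arbitrary, and the concrete DAL class is the one used by the applier below):

$e = new Ancestor_Facet(4, 'Articles', 'article.png', 230, new DVM_Facets_Dal());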

Applying the facet

The next part of the puzzle is the mechanism that adds the facet to the search when the request /?s=foobar&facet_id=4 is made. This is done by the Applier, which is hooked up to the pre_get_posts event and hence runs each time the wp_query is about to fetch pages.

class Applier
{
    /**
    * Applies any selected facets to the wp_query
    * @param WP_Query $wp_query
    */
    function on_pre_get_posts(&$wp_query)
    {
        if ($wp_query->is_search())
        {
            $facet_query = new DVM_Facets_Facet_Query();
            $facet_query->from_query_string($_SERVER['QUERY_STRING']);
            if ($facet_query->has_facet_id())
            {
                $dal = new DVM_Facets_Dal();
                $facet = $dal->get_facet($facet_query->facet_id);
                $facet->add_facet($wp_query);
            }
        }
    }
}

The wp_query is used extensively in WordPress, but most of the time the applier does not do anything, either because the wp_query is not prepared for a search or because no facets are specified in the query string.

But when a search is done and a facet is requested, the applier gets the facet and adds it to the wp_query.

The applier uses a Facet_Query to interact with the query string. This object encapsulates the query string and the parsing of it.
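
For completeness, the applier has to be registered with WordPress before it can intercept searches. The post does not show this wiring; a minimal sketch of what the plugin's bootstrap file might do:

// Hook the applier into WordPress' pre_get_posts event
$applier = new Applier();
add_action('pre_get_posts', array($applier, 'on_pre_get_posts'));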

Posted in PHP.


Rambling in the trenches – the other side

Half a year ago I went from a job as a .NET developer to a job as a research librarian. While my new job still revolves around IT, my daily routines have changed somewhat: As something new, I am spending many hours a day using web based backend tools like OpenCMS and Apache Roller.

There are a lot of benefits to developing web based tools: they are potentially available from any computer, and deployment and maintenance can be centralized. Web based tools also reduce the complexity of having multiple platforms and versions active.

But for the people using these online tools, the experience can be frustrating!

So slow and so quirky!

One thing that strikes me is how slow most web based tools are to work with over time. And it really adds up when you work with web based tools for hours each day.

Another thing that seems to characterize a lot of web based tools is the way you have to tweak them. My OpenCMS administration gives me an error when I delete files in certain folders. And Roller messes up my HTML if I change back to the WYSIWYG editor before saving. The editor for my private WordPress blog seems to be just as lousy.

Web based tools made on the fly

I think that there are a couple of reasons why so many web based tools suck: in many of the projects I've been involved in as a programmer, the backend has not been properly prioritized. While the frontend was prototyped, tested, reviewed by usability folks and so on, the specifics of the backend part were often made up on the fly.

Sometimes I made small administration tools intended for me only – but they ended up in the hands of the customer.

The browser is made for browsing

But even some of the best web based tools, like e.g. the Google Analytics administration, are no match for a well designed desktop application. There are a lot of techniques for making a web based tool behave like a Windows application: partial updates via AJAX, client side logic with JavaScript and all kinds of fancy controls to make up an appealing GUI.

The point is to avoid time consuming post backs, and make the GUI update itself on the fly. But even with all these techniques, a web based tool is simply no match for a local application. An example is the CMS Sitecore. Every measure has been taken to make Sitecore work like a multi window application. It looks really nice – but it is still painfully slow.

The cloud

Most of the tools that I have mentioned here are somehow hosted locally. I host my blogs myself, and OpenCMS and the Roller blog are hosted by the library.

But many programs that I use are hosted by ‘strangers’ and available as a service over the Internet. Like e.g. FogBugz or Google Calendar.

The notion of the cloud seems to be tied to web based tools: the ideal cloud based application is independent of the computer running it. It only requires a browser. You can access your Google documents from anywhere. But some cloud based applications actually require a small dedicated application to be installed. Dropbox works like this, even though it can also be used as a pure web based tool.

My current project

In my new hobby project (a geocaching application) I am making the administration interface as a WinForm application. The real work is still carried out on the web server, but the entire administration interface is handled locally. The WinForm application uses WebServices to communicate with the website.

Sadly, if you are using a Mac, or have restricted access to your computer and/or the net, you simply cannot use the desktop application. But if you can, you actually get an application that works properly, with a responsive interface that uses worker threads and standard Windows controls – not some strange system dreamt up by a developer at the last minute before launch.

Posted in C#.


Echo server

This spring, I am following a course in mobile and distributed systems at the IT University of Copenhagen. During the course, I am going to post some of my homework to this blog.

The course is part of the Master of Science in Software Development and Technology programme, and focuses on distributed programming, concurrency, Web Services, network communications and so on. And although most of the actual code will be written in Java, I figure that most of the material is highly relevant in .NET as well.

Back to school

One of the first assignments is to write three echo services: one using TCP, another using UDP and a third using a JAX Web Service. The services should be able to handle multiple concurrent clients. Here is my shot at a TCP echo server.

Echo server using TCP

First of all, I encapsulated the client connection in a class called TcpConnection. The naming is probably a bit off: good naming should hide internal implementation details, and hence the class should be refactored to something like ClientConnection, I guess.

Well, whatever the name of the class, instances of it encapsulate a socket and contain methods for reading from and writing to the socket using strings instead of the socket's input and output streams. To accomplish this, the class uses decorated readers and writers.

Finally, notice that the class contains no exception handling, and merely forwards exceptions to a more appropriate place. This allows the class to be used in different environments:

package echoservice.model;

import java.net.*;
import java.io.*;

public class TcpConnection {

    Socket socket = null;
    InputStream inputStream = null;
    OutputStream outputStream = null;
    PrintWriter outputWriter = null;
    BufferedReader inputReader = null;
    String clientAddress = null;
    int port;

    public TcpConnection(Socket socket) throws IOException
    {
        this.socket = socket;
        clientAddress = String.format("%s:%d",
            socket.getInetAddress().getHostAddress(),
            socket.getPort());
        inputStream = socket.getInputStream();
        outputStream = socket.getOutputStream();
        outputWriter = new PrintWriter(outputStream, true);
        inputReader = new BufferedReader(
            new InputStreamReader(inputStream));
    }

    public String readLine() throws IOException
    {
        return inputReader.readLine();
    }

    public void println(String x)
    {
        outputWriter.println(x);
    }

    public String getClientAddress()
    {
        return clientAddress;
    }

    public void disconnect() throws IOException
    {
        outputWriter.flush();
        // this will cause output stream to close as well
        outputWriter.close();
        // this will cause input stream to close as well
        inputReader.close();
        if (socket != null) socket.close();
    }
}

Next I created a thread to do the actual message echoing on a TcpConnection. The TcpConnectionThread takes a TcpConnection in the constructor, and extends the java.lang.Thread class.

This allows me to execute it as a new thread. This class includes exception handling as well, and writes exception messages to the console.

package echoservice.model;

import java.io.*;

public class TcpConnectionThread extends Thread {

    TcpConnection connection = null;

    public TcpConnectionThread(TcpConnection connection)
    {
        this.connection = connection;
    }

    public void run()
    {
        System.out.println(String.format("Connected to: %s",
            connection.getClientAddress()));
        String input = null;

        while (true)
        {
            System.out.println(String.format("Waiting for: %s",
                connection.getClientAddress()));
            try {
                input = connection.readLine();
            } catch (IOException ex) {
                System.out.println(String.format("Error reading from %s: %s",
                    connection.getClientAddress(), ex.getMessage()));
                break;
            }

            if (input == null || input.isEmpty()) break;

            System.out.println(String.format("Received: %s", input));
            connection.println(input);
            System.out.println(String.format("Sent: %s", input));
        }

        try {
            connection.disconnect();
            System.out.println(String.format("Disconnected from: %s",
                connection.getClientAddress()));
        }
        catch (IOException ex)
        {
            System.out.println(String.format("Error disconnecting from %s: %s",
                connection.getClientAddress(), ex.getMessage()));
        }
    }
}

Finally we need the server class. Whereas there can exist multiple parallel instances of TcpConnectionThread, each handling a single client, there exists only one server instance. The server instance listens on a port, and when a client connects, creates a TcpConnection and passes it on to a new instance of TcpConnectionThread. The server is started by calling its listen() method, which causes the server to block while listening and starting threads. The only way to stop the server is to shut it down.

package echoservice.model;

import java.io.*;
import java.net.*;

public class TcpHandler {

    int port;
    ServerSocket serverSocket = null;

    public TcpHandler(int port) throws IOException {
        this.port = port;
        serverSocket = new ServerSocket(port);
    }

    public void listen() throws IOException
    {
        while (true)
        {
            // Block until a client connects, then hand the connection
            // to a new thread and continue accepting.
            new TcpConnectionThread(new TcpConnection(serverSocket.accept())).start();
        }
    }

    public int getPort()
    {
        return port;
    }
}
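
The post does not show how the server is started; a minimal sketch of a main program (the class name and port number are assumptions):

package echoservice;

import java.io.IOException;
import echoservice.model.TcpHandler;

public class Main {
    public static void main(String[] args) throws IOException {
        TcpHandler handler = new TcpHandler(7007);
        System.out.println(String.format("Echo server listening on port %d", handler.getPort()));
        handler.listen(); // blocks, accepting clients until the process is killed
    }
}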

The UDP service is actually quite different, because UDP is not connection based. This means that my server handles incoming datagrams from all clients in a single worker thread, and echoes to the socket address where the datagram came from.

In regard to concurrency, this actually makes the server somewhat simpler. On the other hand, the low level nature of UDP makes the actual reading and writing of datagrams a bit messy.
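
The UDP code itself is not included in the post; a minimal sketch of the single-threaded echo loop described above (the class name and buffer size are assumptions):

package echoservice.model;

import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;

public class UdpHandler {

    DatagramSocket socket;

    public UdpHandler(int port) throws IOException {
        socket = new DatagramSocket(port);
    }

    // A single worker thread echoes each datagram back to its sender.
    public void listen() throws IOException {
        byte[] buffer = new byte[1024];
        while (true) {
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            socket.receive(packet);
            // The packet carries the sender's socket address; reuse it for the reply.
            socket.send(new DatagramPacket(packet.getData(), packet.getLength(),
                packet.getSocketAddress()));
        }
    }
}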

My Web Service based echo service is connectionless as well, as Web Services should be stateless if possible. Well, that's all – until next Tuesday, when it's back to school again!

Posted in C#.


WCF exception handling

I recently read an old post on http://www.techinterviews.com containing interview questions for C# developers. One question goes: “Why is it a bad idea to throw your own exceptions?” And the answer is: “Well, if at that point you know that an error has occurred, then why not write the proper code to handle that error instead of passing a new Exception object to the catch block? Throwing your own exceptions signifies some design flaws in the project.”

Now, there is an important difference between knowing that an exception occurred, and knowing enough to be able to write the proper code to handle it.

Handling exceptions in WCF services

This is especially true when working with WCF services, which require special care regarding exception handling. This is because exceptions are normally not allowed to pass through a WCF channel. Instead WCF uses SOAP fault messages to pass exceptions.

Fault messages are part of the SOAP specifications: a SOAP fault is a piece of XML within the SOAP envelope that includes elements like a fault reason and code, and a detail element that can contain custom XML. The layout of fault messages differs somewhat between SOAP 1.1 and 1.2, but WCF hides these differences. (You can however interact with the fault message directly using the MessageFault class.) And most important: SOAP faults are platform independent.

A simplified SOAP 1.2 fault message could look like this:

<?xml version='1.0' ?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope" xmlns:rpc='http://www.w3.org/2003/05/soap-rpc'>
 <env:Body>
  <env:Fault>
   <env:Code>
    <env:Value>10</env:Value>
   </env:Code>
   <env:Reason>
    <env:Text xml:lang="en-US">Error reason</env:Text>
   </env:Reason>
   <env:Detail>
    <e:FaultDetails xmlns:e="http://example.org/faultsdetails">
     <e:Description>Exception thrown</e:Description>
     <e:Occured>999</e:Occured>
    </e:FaultDetails>
   </env:Detail>
  </env:Fault>
 </env:Body>
</env:Envelope>

WCF encapsulates the SOAP fault message in the FaultException, located in the System.ServiceModel namespace. The FaultException comes in two flavors: the non-generic version contains the basic fault message elements Action, Code and Reason as properties, as well as the properties inherited from System.Exception.

Besides this simple version, WCF also ships with a generic version, FaultException<TDetail>, which takes an instance of a custom class and serializes it into the detail element of the fault message using the DataContractSerializer. This of course requires that the detail class is DataContract serializable, and that an operation that may throw e.g. a FaultException<FaultDetails> is marked with the FaultContract attribute. Here my FaultDetails class includes two properties, a timestamp and a description (the constructor is overloaded, so that FaultReason and FaultCode can be specified as well):

[OperationContract]
[FaultContract(typeof(FaultDetails))]
public void ThrowException()
{
    throw new FaultException<FaultDetails>(new FaultDetails
    {
        Occured = DateTime.Now,
        Description = "Exception thrown"
    });
}

This exception can be caught on the client side:

try
{
    client.ThrowException(); // call through the generated service proxy
}
catch (FaultException<FaultDetails> fault)
{
    Console.WriteLine(String.Format("{0}: {1}",
        fault.Detail.Occured,
        fault.Detail.Description));
}
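
For reference, the FaultDetails class itself is not shown in the post; a minimal sketch consistent with the snippets above:

[DataContract]
public class FaultDetails
{
    [DataMember]
    public DateTime Occured { get; set; }

    [DataMember]
    public string Description { get; set; }
}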

Fault exceptions over the wire

The point of FaultExceptions is that they can be sent over the WCF channel as SOAP to the client. Should any other CLR exception ever reach the WCF channel, WCF will instead send a general FaultException to the client. But this has some side effects: besides sending a general FaultException, WCF will also kill the session (all WCF bindings except BasicHttpBinding are session based), and put the calling client in a faulted state, meaning that the client will have to reconnect!

It has become increasingly clear to me just how important this kind of behavior is – and not only in WCF services: an unhandled exception indicates that something bad has happened, which may leave the application in an unknown or unpredictable state – hence the only safe thing to do is to shut it down as fast as possible. This also means that you should never claim to be able to handle an exception without being absolutely sure that it can be done safely. There is no shame in simply giving up.

Including exception details in the fault exception

The general fault message emitted by WCF does not include details about the server side exception. This can be enabled by marking the service with [ServiceBehavior(IncludeExceptionDetailInFaults=true)]. WCF will then include the exception details in a FaultException<ExceptionDetail> instance. This behavior somewhat mimics the normal (WinForms) way of exception handling: on an unhandled exception, the application gets killed, and the user gets the exception details. Like all debugging information, the exception details may include stuff that the client should not know about. Therefore they should only be included in debugging scenarios.

Throwing your own fault exceptions

A much better approach is to catch the exception before it reaches the channel, and rethrow it as a FaultException. Let's say that we actually know enough to safely handle the exception, and we do not want to end the session and fault the client. If the exception indicates something like an invalid parameter in a service operation call, I may want to handle the exception on the server, and then send a fault to the client (instead of returning some custom ”error” result that the client then has to actively interpret).
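
A minimal sketch of this pattern, reusing the FaultDetails contract from above (the operation and the caught exception type are assumptions):

public void DoWork(string parameter)
{
    try
    {
        // ... the actual work ...
    }
    catch (ArgumentException ex)
    {
        // We know exactly what went wrong, so translate the CLR exception
        // into a typed SOAP fault instead of letting it fault the channel.
        throw new FaultException<FaultDetails>(
            new FaultDetails
            {
                Occured = DateTime.Now,
                Description = String.Format("Invalid parameter: {0}", ex.Message)
            },
            new FaultReason("Invalid parameter"));
    }
}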

Of course, the post on techinterviews.com clearly dates back to before WCF. But it still holds that one should never try to handle an exception unless you are sure that you have the information needed. This is especially true in layered code, whether it's client-server code like WCF or multi-tier.

Posted in C#.


Microsoft Team Foundation Server: Moving from Visual Source Safe

As a developer at Magnetix A/S, I am currently spearheading the implementation of Microsoft Team Foundation Server (TFS) across the organization. This includes moving most of our codebase from our old Visual Source Safe, setting up custom project templates and describing best practices and workflows.

The process has not been without problems, but my general experience is that TFS, once up and running, offers a substantial step forward compared to our previous setup.

Team Foundation Server is built upon Sharepoint and Reporting Services and combines advanced source control with reporting, task and document management. The server is exposed via Web Services and the TFS Client API. The most important client tool is the Team Explorer add-on for Visual Studio, but other clients exist, including some third party applications. I am not going further into the product details, as these are readily available on the web. Rather, I am going to describe some of the choices and problems we faced during the implementation, and some of the decisions we took.

First of all, as a company developing custom Web Applications, we maintain a rather large collection of source code. Most of our solutions are Visual Studio 2008, but we do have some active projects written in Visual Studio 2005 and even Visual Studio 2003.

Although it is possible to use TFS Source Control (called Team Foundation Version Control) from Visual Studio 2003 via the MSSCCI provider, the Team Explorer is available only for Visual Studio 2005 and 2008. Hence we chose to leave our 2003 projects in our old Visual Source Safe.

Team Projects

Team Foundation Server organizes projects in Team Projects. Each Team Project is based upon a project template, and includes a Sharepoint site (called the Project Portal), a collection of documents, a source code folder and a collection of work items. Work items are a central concept in TFS: work items are pieces of work that need to be dealt with. This could be a bug, a change request, a risk assessment or some other task. The types and layout of the available work items are defined in a project template. The project template describes many other aspects of the Team Project, including policies and workflows, the initial structure of the document collection and so on. The template also includes a process guidance, that is, a description of the project model in the form of a collection of web pages.

Creating custom project templates

TFS ships with two project templates: one describing a Microsoft Solutions Framework process for Agile software development, and another describing a heavyweight CMMI process. But using the Team Foundation Server Power Tools, you can define your own templates – either from scratch or based upon an existing template. Other third party templates are also available.

At Magnetix A/S, we use our own process model, mainly informed by Agile software development. Therefore I created a template based upon the Agile template, but modified according to our model. These modifications included changing the available tasks, which is done using XML.

Visual Studio solutions versus Team Projects

A Team Project defines a Sharepoint site, called the Project Portal. As we are planning to expose this portal to our customers, we generally opted for a single Team Project (and Portal) for each customer. This means that our Team Projects contain multiple Visual Studio solutions, as we typically develop more than one product for each of our customers. I believe that this decision is the right one, as Team Projects cannot be nested hierarchically, and we really did not want to maintain a Team Project per Visual Studio solution. Instead we created a Project area for each solution, so that e.g. work items can be targeted to a specific solution within the collection of solutions that makes up the Team Project. A few of our customers are so large that a single portal is not viable. These customers were divided according to the divisions within the customer.

Adding source code

With the Team Projects in place, I was ready to move our code base. The solutions were downloaded from our Visual Source Safe, and the source control bindings were removed. We initially left the solutions in Visual Source Safe as a precaution, but set them all to read only, so that no one accidentally checked new code into our now deprecated Visual Source Safe.

The folder structure of our Visual Source Safe repository was not consistently applied. On our TFS we want to store each solution in a more uniform fashion. We defined a general solution template, somewhat inspired by the way in which solutions are stored in Subversion: each solution is stored in a Main folder, containing folders for source code, third party libraries and builds. Parallel with the Main folder, we maintain a separate folder for branches and a folder for releases.

As most of our solutions are Web Applications, we generally do not support more than the most recent version, stored in the Main branch. Therefore, I predict that the Releases folder will be only sparsely used. Instead I recommend that our developers apply a TFS label to their releases. The label marks a certain version of the source code, and in this way the labelled version can be restored later and then branched if the need arises.

Files that should placed under source control

When adding solutions to source control, you want to make sure that only the relevant files are placed in the source control repository. As a rule of thumb, files that are generated by the build process must not be placed under source control. Also, we recommend that documentation is placed somewhere else, e.g. on the Project Portal. This is because TFS does not handle changes done outside Visual Studio very well: if you want to edit a Word document, you need to check out the document in Visual Studio, do the editing and finally check the document back in via Visual Studio, as Visual Studio does not recognize any change done from the outside (a behavior that hopefully will change in 2010). Also, you do not want to add the following file types: solution user option files (*.suo), project user option files (*.csproj.user or *.vbproj.user) and WebInfo files (*.csproj.webinfo or *.vbproj.webinfo).

The content of the bin folder is generally created during the build, and hence the bin folder should not be placed under source control. One exception to this rule is the .refresh files placed in the Bin folder of Web Sites, as these files control the inclusion of references during the build. Also, some systems (like Sitecore or EPiServer CMS) place large amounts of libraries in the Web Application's bin folder. You may want to check these files in – if your policy is to keep the complete system in the repository (this can be somewhat problematic in Sitecore 5.3, as the complete Web Application includes more than 25,000 files!).

A special case occurs if you later reference some of these CMS libraries. As soon as you reference a file in the bin folder, you should move the file outside the bin folder, and place it in a dedicated folder for third party assemblies (e.g. libs).

Finally, it is worth mentioning that the Clean Solution option in Visual Studio is generally not fine-grained enough to prepare a solution for source control. On the other hand, the default filter applied when adding files to TFS may be too restrictive, and exclude files that you do want to place in the repository. Hence my favorite way to add files is to clear the filter (it is applied dynamically), sort the files by type, and then go through the files manually.

What’s next

Using TFS is a whole chapter in itself. A lot can be said about the way in which files are checked out and placed on the local harddrive via Workspaces, how to branch files, and how to expose the Web Access and the Project Portals (we are in the process of creating a work item webpart for the Project Portal, so that our customers can report issues directly from their portals – I found a proof of concept on Richard Fennell's blog). I will return to these subjects in a later post. Also, you may want to explore the possibilities for TFS integration in programs other than Visual Studio – e.g. Excel and Microsoft Project. A good starting point is the Team System Developer Center.

Posted in C#.


Modifiers

Modifiers in C# are keywords added to class and member declarations (and in some cases enums, interfaces and structs as well). C# supports a wide range of modifiers, and in this post I will go through some of the advanced ones and explain how they can be used.

The concept of modifiers is in itself not that hard to understand: most modifiers say something about how you are allowed to use a class or method. While some modifiers forbid you to do something, e.g. inherit from a class marked as sealed, other modifiers permit you to do something, e.g. override a method marked as virtual.

Whether a modifier permits or forbids something is a result of the default state of things: by default a class can be inherited from – so we need a modifier to forbid this. And by default a method cannot be overridden, so a modifier is needed to allow this.

But modifiers can be confusing when more than one modifier is added to the same class or method: some modifiers implicitly include other modifiers – a method marked with the override modifier is implicitly virtual, and you are therefore not permitted to add both modifiers. But other modifiers explicitly require another modifier, and you have to supply both; e.g. the sealed modifier explicitly requires the override modifier.

Enough said, let's get started…

Abstract (class): An abstract class is a class that cannot be instantiated. Thus the only way to use an abstract class is as a base class. Abstract classes are the opposite of sealed classes, so the two modifiers cannot be used together.

Abstract (method): An abstract method is like a virtual method. But unlike a virtual method, which is allowed to be overridden, an abstract method must be overridden. Thus an abstract method contains no implementation (it would always be overridden). This also means that a class containing abstract methods cannot be allowed to be instantiated (the instance would contain empty methods). Thus abstract methods can only exist inside abstract classes. Since an abstract method is already implicitly virtual, the virtual modifier cannot be used together with the abstract modifier.

Virtual (class): Does not exist; you cannot override a class, only inherit from it.

Virtual (method): Allows a method to be overridden. Cannot be used together with the private modifier, because if the method is private, it is impossible to override it. Cannot be used together with the abstract modifier, because an abstract method is already implicitly virtual. Last, the virtual modifier cannot be used with the override modifier, because overriding methods are implicitly virtual.

New (class): Does not exist; you cannot override a class, only inherit from it.

New (method): Provides a new implementation of a base class method in a derived class, hiding the existing method. The base class method must have the same name and signature.

Override (class): Does not exist; you cannot override a class, only inherit from it.

Override (method): Does the same as the new modifier, with a twist: unlike with the new modifier, the method in the base class must be virtual or abstract to be overridden. You can also override methods that themselves override some method. This is because a method marked with the override modifier is implicitly marked as virtual.

The difference between new and override modifiers

To understand the difference between overriding and supplying a new implementation, we need an example: we have two classes, one inheriting from the other. Both classes have the same two methods, but while the derived class replaces the implementation of Method1, it overrides the implementation of Method2.

public class Base
{
    public void Method1()
    {
        Console.WriteLine("Base.Method1");
    }
    public virtual void Method2()
    {
        Console.WriteLine("Base.Method2");
    }
}

public class Derived : Base
{
    public new void Method1()
    {
        Console.WriteLine("Derived.Method1");
    }
    public override void Method2()
    {
        Console.WriteLine("Derived.Method2");
    }
}

class Program
{
    static void Main(string[] args)
    {
        Base b = new Base();
        Derived d = new Derived();
        Base d_in_b = new Derived();
        b.Method1();
        b.Method2();
        d.Method1();
        d.Method2();
        d_in_b.Method1();
        d_in_b.Method2();
    }
}

The result of running this code is the following:

Base.Method1
Base.Method2
Derived.Method1
Derived.Method2
Base.Method1
Derived.Method2

The difference is that if you mark a method with the new keyword, the base method is still accessible on the derived class – you just have to cast the instance to the base class. The override keyword overrides the base method's implementation, rendering it impossible to use on the derived class – even if an instance of the derived class is cast to the base class. This is like the difference between implicit and explicit implementation of an interface, but that is another story.

Sealed (class): A sealed class cannot be inherited from; hence a sealed class cannot serve as a base class for a derived class.

Sealed (method): A sealed method cannot be overridden. But the sealed modifier can only be used when overriding an existing method (this is because an ordinary non-overriding method is implicitly sealed). Hence the sealed modifier only makes sense together with the override modifier.
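
A quick sketch of the combination (the classes are hypothetical):

public class A
{
    public virtual void M() { }
}

public class B : A
{
    // Overrides A.M and forbids further overriding in classes derived from B.
    public sealed override void M() { }
}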

Static (class): A static class is a class that cannot be instantiated. But a static class includes methods that can be used from the type itself. If a class is static, all its methods must be marked as static as well (this does not happen implicitly). A static class cannot be inherited from, and is therefore implicitly sealed – hence it cannot be marked as either abstract or virtual.

Static (method): A static method is a method that must be used from the type, and not from an instance. A static method is not allowed to interact with instance specific fields.

Posted in C#.


Using generic methods and constraints

Generic methods are a powerful way to provide type safety while still creating flexible methods that support multiple types. The idea behind a generic method is simple: a generic method accepts parameters and returns values of a range of different types. Hence it is generic, as opposed to specific. But there is a little more to the story.

Let us first look at a non-generic way to accomplish this, namely making methods that accept and return objects of some base class or interface: e.g. many methods accept a Stream as a parameter – this means that they will work on all the specific streams in .NET (FileStream, MemoryStream and so on).

This kind of non-generic, yet type flexible, method works great in many scenarios. But in some scenarios, generic methods offer stronger type safety and better performance:

A common case would be that you need to return an object of the same specific type you accept as a parameter: where accepting a FileStream as a Stream parameter provides flexibility, returning a FileStream as a Stream, like in this example, may mean that the calling code has to cast the returned Stream back to a FileStream:

public Stream NonGeneric(Stream stream)
{
    //do something
    return stream;
}

So why is this bad? What happens when you supply this method with a FileStream is that the method returns the same FileStream, but now typed as a Stream. The returned object is still a FileStream, but for all the calling code knows, it could be any kind of Stream.

So to use some of FileStream's methods, the calling code may have to cast the Stream back to a FileStream. Since the returned Stream is in fact a FileStream, this cast goes well. But first the CLR checks whether the Stream is in fact a FileStream (or can be cast to one) by examining the object's metadata.

This is done at runtime and is in fact time consuming. Also, if you for some reason change the input parameter to a MemoryStream, the code including the FileStream cast will still compile – but at runtime you will get an InvalidCastException. Hence this makes the code more error-prone.
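
To make the problem concrete, a sketch of what the calling code ends up doing (the file name is arbitrary):

Stream s = NonGeneric(new FileStream("data.bin", FileMode.Open));
// The CLR verifies this cast at runtime and throws InvalidCastException
// if NonGeneric ever starts returning another kind of Stream.
FileStream fs = (FileStream)s;
fs.Lock(0, 1024); // a FileStream-specific member, available again after the cast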

Generic methods

Generic methods offer a less error-prone and more efficient way to do the exact same thing. The idea is that you simply supply the specific type (FileStream) as a parameter together with the ordinary parameter. This extra parameter is called a type parameter.

Now the method knows exactly what kind of Stream it should return, and yet you can still use the same method later with some other stream, by supplying some other type of stream.

Inside the method, you still have to treat the stream parameter as a Stream – otherwise the method would not be generic (work on different streams). But outside the method, a specific stream goes in, and the same specific type of stream comes out. In C# syntax, this looks like:

public T GenericMethod<T>(T stream) where T : Stream
{
    // do something
    return stream;
}

The type parameter is conventionally called T, but this is just a convention, so in my next example I will call it TheType. Also notice the where clause concluding the method's signature. This constrains the types accepted to types that inherit from Stream. Otherwise the method would accept all kinds of types, which would make the method more generic than we would like. Often constraints are needed, so let's take a look at another constraint:

public TheType New<TheType>() where TheType : new()
{
    return new TheType();
}

This method is called New, and takes one type parameter, TheType. There are no ordinary parameters. The method returns an object of the type supplied as the type parameter. Because of the where clause, this method will only accept types that have a parameterless constructor.

This constraint is needed because the method constructs a new instance of the supplied type, and returns this object. Because this is done with the new keyword, we have to make sure that the supplied type parameter is in fact a type with a parameterless constructor. As before, this constraint is checked at compile time: your code will simply not compile if the supplied type does not meet the constraint.

Calling generic methods

When you call a generic method, you supply the class name as the type parameter. If the method above is wrapped in a class called GenericsExamples, I would call:

GenericsExamples ge = new GenericsExamples();
DateTime dateTime = ge.New<DateTime>();

On a side note, you may find it a little strange that the type parameters are not System.Type instances, but simply class names. This is to make the code easy to read.

The next method demonstrates how to constrain the type parameters to types implementing a specific interface. This is needed because I am using Convert.ChangeType, which requires that the source type implements IConvertible:

public U Convert<T, U>(T value) where T : IConvertible
{
    // Fully qualified, since this method's own name hides the Convert class
    return (U)System.Convert.ChangeType(value, typeof(U));
}

Calling this method will convert a supplied instance of the type T to an instance of the type U:
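
A call might look like this (a sketch, reusing the GenericsExamples wrapper class; note that U cannot be inferred, so both type parameters are supplied explicitly):

GenericsExamples ge = new GenericsExamples();
double d = ge.Convert<int, double>(42); // d == 42.0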

Other kinds of constraints

We have now seen three ways to constrain the type parameters accepted: by base class, by interface, or by requiring a parameterless constructor. There are three more constraints: where T : struct means that the type parameter T must be a value type (excluding nullable types). The constraint where T : class means that the type parameter must be a reference type. The last constraint is where T : U. This simply means that the type parameter T must derive from some other type parameter U.

Generic classes

Type parameters can also span an entire class; the class is then called generic. The type parameter is moved to the declaration of the class. This means that you do not have to supply the type parameter in each method call, but only once, when you declare the class:

public class GenericClass<T> where T : new()
{
    public T New()
    {
        return new T();
    }
}

When declaring this class, you have to supply a type that meets the new() constraint:

GenericClass<DateTime> gc = new GenericClass<DateTime>();

Afterwards, the generic class can be used without type parameters:

gc.New().AddMonths(1);

Posted in C#.


Formatting enums using CustomFormatters

C# enums are named constants, making code easier to write and read. But the enum names should not be written to the UI. This post shows two ways of formatting an enum to a string suitable for the UI.

A C# enum is simply a named constant. Let's say you are working on an application where you need to represent the fundamental forces in some way (if you cannot remember what those are, don't worry – this could be anything really: fruits, cars, countries). The four forces are represented with a number from 1 to 4, and the C# enum construct lets you attach a name to each number:

public enum Force : byte
{
    [Description("Gravity")]
    Gravity = 1,
    [Description("Electromagnetic force")]
    Electromagnetic = 2,
    [Description("Weak nuclear force")]
    Weak = 3,
    [Description("Strong nuclear force")]
    Strong = 4
}

Each enum member is also given a description using the DescriptionAttribute (located in System.ComponentModel). I will return to this attribute later.

Now, if you want to output an enum name to the UI (e.g. the console or a webpage), the simplest way is to do the following:

Force f = Force.Strong;
Console.WriteLine(String.Format("Force: {0}", f));

The String.Format method simply calls f.ToString(), and inserts the resulting string in the overall string. The result is as expected:

Force: Strong

But this solution should be avoided. The enum name is a coding “shortcut” and should never be displayed in the UI: special characters and spaces are not allowed in enum names, and localization is not supported.

This means that if I would like to output Strong nuclear force instead of the enum name Strong, I need another way. This can be accomplished by outputting the description instead of the name. It requires some coding, but can be done like this:

public static string GetDescription<T>(T value)
{
    DescriptionAttribute[] d = (DescriptionAttribute[])typeof(T)
        .GetField(value.ToString())
        .GetCustomAttributes(typeof(DescriptionAttribute), false);
    return d[0].Description;
}

This method takes an enum type and an enum value, and returns the DescriptionAttribute as a string:

Console.WriteLine(String.Format("Force: {0}", GetDescription<Force>(f)));

And the result is:

Force: Strong nuclear force

While this method is rather common, it is not a favorite of mine. The reason is that the DescriptionAttribute is not a suitable place to put UI strings, because it can only be accessed through reflection. The method is also not very flexible: say you need to format the same enum in two different ways in different parts of your application – you cannot, since an enum member can only have one description.

The best way is, in my opinion, to use a CustomFormatter. This needs a little explanation: formatting in C# is done by objects that implement the IFormatProvider interface. This interface contains only one method, object GetFormat(Type formatType). When a method employs an IFormatProvider, it will ask the GetFormat method for the specific formatter needed. If the method needs to format some numbers, it will call GetFormat and ask for a NumberFormatInfo formatter. This object contains all the methods needed to format numbers.

On a side note, notice that the most common IFormatProviders are CultureInfos: they have a GetFormat method, and when asked for a NumberFormatInfo object, a CultureInfo will return a culture specific NumberFormatInfo object. The CultureInfo object contains other formatters as well: formatters for formatting dates, formatters for comparing strings and so on. An IFormatProvider is a factory that will construct a whole range of different formatters when asked to.
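
A quick illustration of the factory behavior (using System.Globalization; the culture name is arbitrary):

NumberFormatInfo nfi = (NumberFormatInfo)CultureInfo
    .GetCultureInfo("da-DK")
    .GetFormat(typeof(NumberFormatInfo)); // returns that culture's number formatter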

Well, back to the enums: when the String.Format method is called with an IFormatProvider, it will first try to retrieve a formatter implementing the ICustomFormatter interface. It will then use this CustomFormatter to format each of the parameters.

The ICustomFormatter interface contains a single method: string Format(string format, object o, IFormatProvider formatProvider). This method is called by String.Format for each parameter. The first parameter takes a format string. This string can be added in the String.Format call, like:

String.Format(myFormatProvider, "Force: {0:R}", f);

Then the string "R" is passed to the CustomFormatter. This can be used to support different kinds of formatting with the same CustomFormatter. The object o is the object to be formatted, and formatProvider is the format provider that constructed this CustomFormatter.

Now, let’s create a CustomFormatter that will format the Force enum:

public class ForceCustomFormatter : ICustomFormatter
{
    public string Format(string format, object o, IFormatProvider formatProvider)
    {
        if (!o.GetType().Equals(typeof(Force)))
        {
            return o.ToString();
        }
        else
        {
            Force force = (Force)o;
            switch (force)
            {
                case Force.Electromagnetic:
                     return "Electromagnetic force";
                case Force.Strong:
                     return "Strong nuclear force";
                case Force.Weak:
                     return "Weak nuclear force";
                default:
                     return force.ToString();
            }
        }
    }
}

The formatter is simple: if the supplied object is anything but a Force enum, the formatter will simply call the object's ToString method. Force objects are, on the other hand, formatted using a switch statement.

Last, we will need a provider that will construct a ForceCustomFormatter:

public class ForceFormatProvider : IFormatProvider
{
    public object GetFormat(Type formatType)
    {
        return (formatType == typeof(ICustomFormatter))
            ? new ForceCustomFormatter()
            : null;
    }
}

This provider constructs a ForceCustomFormatter when asked for an ICustomFormatter.

Let’s summarize: The call

Console.WriteLine(String.Format(new ForceFormatProvider(), "Force: {0}", f));

needs to format the Force enum f. It will call the ForceFormatProvider to retrieve a CustomFormatter. Then it will pass the parameters one at a time to the CustomFormatter, and use the resulting strings. This is, in my opinion, the best way to format enums – and other objects as well. In reality, you will want to make formatters that can format more than one type of object. You will also want to take a look at this part:

if (!o.GetType().Equals(typeof(Force)))
{
     return o.ToString();
}

This is the most basic formatting option, and if the object – like e.g. DateTime – supports formatting with ToString(string format, IFormatProvider), you should delegate to that method instead.
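
A sketch of that fallback, assuming it replaces the plain ToString call inside the Format method (DateTime implements IFormattable):

if (o is IFormattable)
{
    // Let the object format itself, honoring the format string and provider.
    return ((IFormattable)o).ToString(format, formatProvider);
}
return o.ToString();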

Posted in C#.


First experience with Sitecore 6

I just started a new project, my first one in Sitecore 6. Sitecore has improved on 5.3 in a number of ways.

The installation was not without its problems. The installer complained about a missing ASPNET user, although the user was clearly present. Hence I ignored the warning and continued. After installing the files, I tried to install the databases via the installer, but for some reason it failed and rolled back the whole installation. So I installed again and reverted to my old procedure of attaching the mdf file manually, and creating a login for the Sitecore user.

Databases

But here came the first pleasant surprise: the number of databases has been reduced from 7 to 3, leaving only Core, Master and Web. This clearly simplifies the process of attaching and deploying databases. And in an environment like ours, where we host multiple Sitecore instances on a single Microsoft SQL server, the aesthetic aspect of not having 7 databases per instance popping up in the enterprise manager should not be underestimated.

Inline editing

Sitecore 6 features inline editing, a feature found in other Content Management Systems. The page can be edited directly through editable panels on the page itself, and not via the simple input forms used in previous Sitecore versions' administration interfaces. I predict that this feature will be very well received among our clients, and make it a lot easier for Sitecore to target not-so-technical administrators.

The interface

The overall impression of my administration interface of choice (the desktop) is that the interface has been simplified a bit, and hence made a little more responsive. A new detail that I greatly enjoyed is the icon that appears beside the titles of locked items. This makes it very simple to ensure that your work has been checked in.

Sitecore WebControls

Sitecore 6 ships with a collection of Sitecore .NET WebControls, located in the Sitecore.Web.UI.WebControls namespace. If you prefer sublayouts over renderings, these will come in very handy. You can simply insert a Sitecore field value in the markup like any other WebControl – without writing a single line of code-behind. These WebControls are configured under the sc tag prefix. The collection includes WebControls for rendering ImageFields, TextFields and so on in the typesafe and intellisense-enabled manner common to ASP.NET developers.
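
A sketch of what this looks like in a sublayout's markup (the field names are assumptions):

<%@ Register TagPrefix="sc" Namespace="Sitecore.Web.UI.WebControls" Assembly="Sitecore.Kernel" %>
<sc:Text ID="Title" runat="server" Field="Title" />
<sc:Image ID="Photo" runat="server" Field="Photo" />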

All said, Sitecore is not an 'out of the box' system, nor should it be (in my opinion): Sitecore's advantage is that it is a flexible platform for development more than a classic CMS. And you still need to manually tweak configuration files and permissions to get the instance online. But my first experience with Sitecore 6 was definitely a positive one.

Posted in C#.


Overflow, checked and unchecked

The CLR supports both unchecked and checked integer arithmetic. This post explains how this is exposed in the C# language.

By default, C# does integer arithmetic in an unchecked context. In an unchecked context, division by zero throws an exception, whereas overflow does not. Overflow happens when the range of a type is exceeded, e.g. by adding to a byte b so that the result exceeds the range 0-255. When this happens, the CLR truncates the result and simply counts over from zero: 254, 255, 0, 1, 2 … Hence the following loop will not throw an exception:

byte b = 0;
for (int i = 0; i < 256; i++)
{
    b++; // wraps from 255 back to 0 without an exception
}

Checked context

But the CLR actually supports integer arithmetic with overflow checking. You can mark a portion of your code as checked, or use the /checked compiler option to compile all your code in a checked context, causing the following code to throw an OverflowException:

checked
{
    for (int i = 0; i < 256; i++)
    {
        b++;
    }
}
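
The checked keyword also works on single expressions, and unchecked can be used to opt out locally when compiling with /checked; a quick sketch:

byte b = 255;
try
{
    b = checked((byte)(b + 1)); // throws OverflowException
}
catch (OverflowException)
{
    Console.WriteLine("Overflow detected");
}
int wrapped = unchecked(int.MaxValue + 1); // wraps silently, even under /checked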

But the overflow checking comes with a performance penalty – code like the above (when not overflowing) runs about 3.5 times faster in an unchecked context than in a checked one. As an alternative, you can check for overflow yourself, given that you know the precise circumstances under which an overflow will happen:

for (int i = 0; i < 255; i++)
{
    if (b < 255) b++;
}

This check runs a bit faster than the all-encompassing CLR check, and reduces the performance penalty to a factor of about 2.5.

Posted in C#.
