MongoDb

MongoDb C# Driver 2 - A Nice(ish) Abstraction by James Heywood

I've been doing a bit of work in the evenings lately to put together something that better demonstrates my back-end development, as I realised there was no example data access code in my Bitbucket repos.

So I've resurrected part of an old project: an online car owners' portal for Porsche. Some previous posts on this blog relate to it. I had to remove the name Porsche from those articles (and screen grabs) under threat from a previous employer, but I feel enough time has passed now that it's OK to mention their name. If not, well, we'll see what happens!

You can see my efforts in the following repository: https://github.com/jdheywood/4um. A rough estimate of the time I've spent on this is somewhere between 10 and 15 hours over the course of a week.

Overview

Anyway the part I have chosen to rewrite to show what I can do is a mini forum, where users can ask questions and other users can reply. There is a search function and the ability to favourite questions and answers so users can revisit these as and when needed. 

When we built this for the client it was a single .NET MVC application using forms authentication, knockout.js to enhance the user interface and MongoDb for the data storage. I've decided to rewrite this as an API and an SPA so that I can better separate the data from the interface. My plan is to recreate the knockout interface once the API is ready, and after that have a go at writing interfaces in a variety of JS frameworks to compare and contrast these approaches.

The user interface (when ready) will consist of two forms, one for users and another for admins (or maybe one form that adapts based on the user's role?), with a JS framework using ajax to call the API for data operations. A stretch goal would be to add OAuth support to ease registration and access, but that's way down the line.

 

Progress

For now I've been concentrating on getting the back end up and running. I thought this would be relatively quick to do, but Mongo have rewritten their C# driver completely since I last looked at it, so I've ended up rewriting the data access code from scratch. Also, when I built this a couple of years ago we had very little test coverage, so I've tried to address this shortcoming.

So what I have so far is a nice(ish) abstraction over the Mongo C# drivers (if I do say so myself!) and the beginnings of an API that uses the repositories I've created to access the MongoDb collections.

Yes I know repositories are so 2000s and in the 20 teens it's all about CQRS, but hey I can always refactor these at a later date, I wanted to concentrate on getting the mongo abstraction in place first before I changed too much.

I have written unit tests for the various factories that abstract the MongoDb drivers and integration tests to demonstrate and prove the repositories' functionality.

I've also created a simple console application to load sample data into Mongo. The integration tests set up and tear down the data they use, so when they finish you have no data, hence the need to load sample data to demonstrate the API.

If you have access to a Windows machine and Visual Studio 2013 you can pull down the repo, build the solution and see this running. You'll need to install MongoDb first though, instructions/links can be found in the readme of my github repo.

If you run the integration tests you can see the repositories in action. Similarly, if you run the Forum.Tools.Samples console application, fire up the API and hit one of the endpoints (/api/questions, /api/answers, /api/searchterms), you can see the API returning data from MongoDb.

 


Good, Bad, Better

Some things I like about this
- The abstraction over MongoDb via MongoContext, MongoClientFactory, MongoDatabaseFactory and Generic MongoCollectionFactory
- The use of SimpleInjector for IOC/Dependency Injection, especially the ability to verify the container
- The use of MongoContext in the API Startup as a single entry point to set up the data access
- The integration tests covering the repositories

Some things I would change given time;
- Refactor the repositories into Commands and Queries to allow easier testing
- Introduction of API data transfer objects and mapping between domain objects and these DTOs
- Swap out the use of an installed MongoDb for an embedded or hosted version
- Use AutoFixture customisations to build test and sample data instead of semi-hand rolling these objects

The next things for me to do to get on with this project;
- Define the API routes and build out the API layer
- Change the API to return JSON instead of XML
- Unit and Integration tests of the API
- Install swagger to self-document the API; https://github.com/swagger-api/swagger-ui 
- Set up a simple web application
- Markup and styling for the web app
- Introduction of knockout to provide UI functionality

So quite a few things to do before this is a functioning application, however the data access layer is well defined, covered by tests and helps to demonstrate the kinds of applications I've built, I hope you like it.

 

Summary

One final point to note is that I had trouble unit testing my repositories due to issues substituting the MongoCollection objects that the repository methods use, so I have used integration tests to add test coverage instead. We need integration tests anyway to actually test data storage and access, but we should also be able to write unit tests for our repositories that run quickly and are less expensive in terms of maintenance, deployment and execution. So how do we do this?

The issue stems from the collection objects and the fluent API that the MongoDb C# driver provides. When trying to set the .Returns(object) for the chained fluent API methods off a collection, NSubstitute throws errors as it doesn't know which method you are trying to set the .Returns() on.

After some googling on this and how best to resolve it, I've determined that this is a problem with my code and abstraction rather than with NSubstitute (way to state the bleeding obvious, eh?). The way to resolve this is to introduce another layer to the abstraction, something along the lines of an IForumMongoCollection, and have this implement the methods of IMongoCollection that the repositories use. That way my MongoCollectionFactory can return my ForumMongoCollections and I can mock/substitute these in my unit tests.
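The idea is not specific to C#. As a rough sketch only, here it is in JavaScript (the language of the client-side snippets further down this page). All names here (createForumCollection, findRecent, and the chained find/sortByDescending/limit/toList calls) are hypothetical stand-ins, not the real driver API. The point is that the repository only ever sees one narrow method, which is trivial to replace in a unit test.

```javascript
// Hypothetical narrow wrapper: the only place that touches the fluent driver API.
// The chained method names are stand-ins, not real driver calls.
function createForumCollection(driverCollection) {
    return {
        findRecent: function (filter, sortField, limit) {
            return driverCollection.find(filter).sortByDescending(sortField).limit(limit).toList();
        }
    };
}

// The repository depends on the wrapper, never on the driver's fluent chain.
function createAnswerRepository(forumCollection) {
    return {
        getNRecent: function (maxResults) {
            return forumCollection.findRecent({ Public: true, Removed: false }, 'DateTime', maxResults);
        }
    };
}

// In a unit test the wrapper is replaced by a hand-written stub with one method.
var recordedCalls = [];
var stubCollection = {
    findRecent: function (filter, sortField, limit) {
        recordedCalls.push({ filter: filter, sortField: sortField, limit: limit });
        return [{ Id: 'a1', Text: 'stubbed answer' }];
    }
};

var repository = createAnswerRepository(stubCollection);
var stubbedResult = repository.getNRecent(2);
```

Because the stub has a single flat method there is no chain for the mocking library to get confused by, which is the whole reason for adding the extra layer.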

So it's a simple idea when you step back and think it through. It eluded me at the time I was writing this code, but that's what you get for coding late into the night. That's now top of my to-do list; once it's in place I can start building out the application properly.

Anyway enough blogging for now folks, hope you like this article and the repository, as ever throw comments and questions at me below if you are so inclined.

Cheers

 

Knockout.js, .NET MVC & MongoDb by James Heywood

So this is a little overview of a project I worked on a while ago utilising these three technologies. The idea of this post is to show how a request/response cycle is handled from the initial request of a page by the browser, through the server code, to the data store and back again.

The stack is an ASP.NET MVC 3 web application, using the Razor view engine, Knockout.js for client side UI functionality and MongoDb for data storage. The web application makes use of a Windsor container to provide IOC, server side logic is separated using a service and repository pattern and constructor dependency injection is used throughout.

The application is hosted on a Windows 2008 R2 Server provided by Rackspace. We also have a staging server of the same spec which we can bring up and take down as and when needed to save costs; we do this by snapshotting an image of the staging box and backing it up to storage, again provided by Rackspace.

I have discussed the application in a previous post here but from the point of view of MongoDb schema design, in summary it is a responsive web application for Porsche owners and the area of the application that we will discuss in this post is the Knowledge Base. 

As a note when I first published this article my previous employer rang me up to complain about my use of the name Porsche (as if somehow mentioning this was a sin?!) so I amended the article at their request, with hindsight I feel that I should have just stuck to my guns, after all I did the work and have a right to discuss it, so long as I don't give away trade secrets, anyway rant over!

Knowledge Base

This is an area where owners can access useful information about their cars as well as ask, search and bookmark answers to questions asked either by themselves or other owners. There is an administrative interface to the knowledge base so that Car staff can respond to and manage questions, however we are only concerned with the user or owner side of the knowledge base for now, and in particular the landing page for this area that shows recently answered questions as well as controls to search for answers and ask a question.

Initial Request

To enter the knowledge base area from the site navigation the user requests the page /knowledgebase/findanswers. This request is routed to the FindAnswers method of the KnowledgeBaseController by the "Default" routing rule shown below. This will be familiar to you if you have worked with ASP.NET MVC, as it is the out-of-the-box routing rule set up for you in the project template.

public class RouteConfig
{
    public static void RegisterRoutes(RouteCollection routes)
    {
        routes.MapRoute(
            name: "Default",
            url: "{controller}/{action}/{id}",
            defaults: new { controller = "Home", action = "Index", id = UrlParameter.Optional },
            namespaces: new string[] { "CarPlus.Web.Controllers" }
        );

    }
}

As you can see the route carries defaults and namespaces properties. The defaults fill in any controller or action that is missing from the URL, so a request for the bare site root is routed to HomeController.Index() and handled there appropriately, while the namespaces property restricts the controller lookup to CarPlus.Web.Controllers.
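To make the matching concrete, here is a deliberately simplified model of how a URL path lines up with the "{controller}/{action}/{id}" pattern and its defaults. It is an illustration only, not how ASP.NET MVC implements routing, and matchDefaultRoute is a made-up name.

```javascript
// Simplified model of the "{controller}/{action}/{id}" route and its defaults.
// Illustration only: the real routing engine does considerably more than this.
function matchDefaultRoute(path) {
    var defaults = { controller: 'Home', action: 'Index', id: null };
    var segments = path.split('/').filter(function (segment) {
        return segment.length > 0;
    });
    if (segments.length > 3) {
        return null; // too many segments for this pattern
    }
    return {
        controller: segments[0] || defaults.controller,
        action: segments[1] || defaults.action,
        id: segments.length === 3 ? segments[2] : defaults.id
    };
}

// The knowledge base landing page: both segments come from the URL.
var findAnswersRoute = matchDefaultRoute('/knowledgebase/findanswers');

// The bare site root: both segments come from the defaults.
var rootRoute = matchDefaultRoute('/');
```

In this model '/knowledgebase/findanswers' yields the controller 'knowledgebase' and the action 'findanswers', while '/' falls back to 'Home' and 'Index'.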

The Controller

Back to our example, if we take a quick look at a snippet of our KnowledgeBaseController we can see that the Index() method simply redirects to the FindAnswers() method.

public class KnowledgeBaseController : JsonController
{
    private readonly IKnowledgeBaseService _knowledgeBaseService;

    public KnowledgeBaseController(IOwnerService ownerService,
                                                          IKnowledgeBaseService knowledgeBaseService)
        : base(ownerService)
    {
        _knowledgeBaseService = knowledgeBaseService;
    }

    public ActionResult Index()
    {
        return RedirectToAction("findanswers");
    }

    public ActionResult FindAnswers()
    {
        return View();
    }
}

To digress briefly you can see that our KnowledgeBaseController inherits from JsonController, this is the parent controller for all of the owner accessible areas of the site and it provides a method to return Json encoded data, which is used by the various AJAX requests that are made by our Knockout.js ViewModels, but more on that later. Our JsonController is shown below for reference. 

[NoCache]
public abstract class JsonController : AuthController
{
    public JsonController(IOwnerService ownerService)
        : base(ownerService)
    { }

    protected override JsonResult Json(object data, 
                                                              string contentType, 
                                                              System.Text.Encoding contentEncoding, 
                                                              JsonRequestBehavior behavior)
    {
        return new JsonNetResult
        {
            Data = data,
            ContentType = contentType,
            ContentEncoding = contentEncoding,
            JsonRequestBehavior = behavior
        };
    }
}

As you can see this in turn inherits from AuthController and passes a reference to the OwnerService to this so that this can manage user and session data, but I'll detail more on that in another post some time. 

Getting back to the point, as we can see in the KnowledgeBaseController code above, the FindAnswers() method simply returns the relevant view, by convention this is the view in our project /Views/KnowledgeBase/FindAnswers.cshtml

The View

The key parts of the view FindAnswers.cshtml are shown below. I have edited this code for clarity to just show the parts relevant to our use of knockout, for example I have removed any reference to classes and some of the non-semantic markup needed for layout and styling purposes. What is left should just be what I need to explain to you how the knockout bindings work.

FYI the views are written using the Razor view engine that is part of ASP.NET MVC, again if you have used ASP.NET MVC you should be familiar with this, although having said that there is precious little we are using from this engine in the sample code I show below, bonus points if you spot the Razor syntax!

<div id="knowledge-base-wrapper">

    <section id="searchResults">

        <div data-bind="visible: showFindAnswers">
            
            <h1>Knowledge Base</h1>
            
            <div id="search-results">

                <h2>Recent questions</h2>
                <p data-bind="visible: recentAnswers().length === 0">There are currently no recently answered questions.</p>

                <ul data-bind="foreach: recentAnswers">
                    <li data-bind="visible: answers().length > 0">
                        <h3 data-bind="text: questionText"></h3>
                        <ul>
                            <li>
                                <div data-bind="foreach: answers">
                                    <span data-bind="text: dateTime"></span>
                                    <a href="#" data-bind="click: $root.createBookmark.bind($root, $data), css: { selected: isBookmarked }">Bookmark this answer</a>
                                    <h4>Answer</h4>
                                    <p data-bind="text: answerTextSummary"></p>
                                    <button data-bind="click: $root.selectAnswer.bind($root, $parent, $data)">Read More</button>
                                </div>
                            </li>
                        </ul>
                    </li>
                </ul>
            </div>
        
        </div>
        
    </section>
    
</div>


@section scripts {
    <script type="text/javascript">
        $(document).ready(function () {
            ko.applyBindingsWithValidation(
                new window.CarPlus.ViewModels.KnowledgeBaseViewModel("@ViewBag.URLSiteRoot", 'findAnswers'),
                document.getElementById("knowledge-base-wrapper")
            );
        });
    </script>
}

There is more in this view, for example we have a search function as well as the ability to submit a new question to the knowledge base, however for our purposes we are going to focus on how the recently answered questions are loaded into the DOM by knockout post page load.

As you can see at the bottom of the view code is the scripts section (Razor syntax, plus one if you spotted it). I have removed the references to our scripts for clarity so we can concentrate on the important part which is where we bind our knockout view model to the DOM. As you can see the ko.applyBindingsWithValidation call does this, and contains another bit of Razor syntax which is to pass in the site root or domain to the ViewModel constructor from a ViewBag property, (plus two if you spotted both). 

The ViewModel

Below is a snippet of our ViewModel, with just the relevant parts to this article shown to keep this easier to follow;

window.CarPlus.ViewModels.KnowledgeBaseViewModel = function (siteRoot, dataContext) {
    var self = this;
    self.siteRoot = siteRoot;

    // Observables

    self.recentAnswers = ko.observableArray([]);
    self.showFindAnswers = ko.observable(true);
    self.currentSection("knowledge-base");

    self.search = ko.observable().extend({ required: { message: 'Please enter a search term.'} });

    // Load view model data.
    if (dataContext == 'findAnswers') {

        self.getRecentAnswers();
    }
    if (dataContext == 'getMyQuestions') {
        self.getMyQuestions(true);
    }
    if (dataContext == 'getMyBookmarks') {
        self.getMyBookmarks();
    }
    // Set the ko validation group if necessary.
    if (ko.validation) {
        self.Errors = ko.validation.group(self);
    }
};

As you can see the constructor of the ViewModel takes two parameters. The first is the siteRoot, which is the environment-specific URL of the site domain; this allows us to alter the domain for the various AJAX calls that the client side code makes depending on whether we are in the development, staging or live environment. The second parameter is the dataContext, the context in which we are using the ViewModel.

This ViewModel is used on three different pages of the application, this parameter allows the ViewModel to load data and model behaviour relevant to the page the user is viewing. For our purposes we are concentrating on the FindAnswers page so our dataContext is 'findAnswers'. 

Just a quick thought on this, it may be tidier to split the ViewModel into three, one for each page, rather than have a single ViewModel tailor its data and behaviour for multiple pages. This would be an ideal candidate for refactoring if we had the time to revisit this code. As it stands we have a large and complex ViewModel in use across multiple pages, which although it keeps code in one place, makes it more difficult to manage in the long term as changes can affect multiple pages so require greater testing.
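Short of a full split, one smaller step would be to replace the chain of if statements with a lookup from dataContext to loader. The sketch below is only an illustration: loadersByContext and loadForContext are made-up names, and fakeViewModel stands in for the real ViewModel so the snippet runs without Knockout.

```javascript
// Hypothetical lookup from dataContext to the data each page needs loading.
var loadersByContext = {
    findAnswers: function (viewModel) { viewModel.getRecentAnswers(); },
    getMyQuestions: function (viewModel) { viewModel.getMyQuestions(true); },
    getMyBookmarks: function (viewModel) { viewModel.getMyBookmarks(); }
};

// Runs the loader for the given context; returns false for an unknown context.
function loadForContext(viewModel, dataContext) {
    if (!Object.prototype.hasOwnProperty.call(loadersByContext, dataContext)) {
        return false;
    }
    loadersByContext[dataContext](viewModel);
    return true;
}

// A stand-in view model that just records which loader was called.
var calls = [];
var fakeViewModel = {
    getRecentAnswers: function () { calls.push('getRecentAnswers'); },
    getMyQuestions: function (flag) { calls.push('getMyQuestions:' + flag); },
    getMyBookmarks: function () { calls.push('getMyBookmarks'); }
};

loadForContext(fakeViewModel, 'findAnswers');
loadForContext(fakeViewModel, 'getMyQuestions');
var handledUnknown = loadForContext(fakeViewModel, 'somethingElse');
```

Adding a fourth page then means adding one entry to the lookup, and an unrecognised context is reported rather than silently ignored.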

Namespacing

Back to our explanation. As our dataContext is 'findAnswers' we can see that the first thing our ViewModel does after initialising various observables/observableArrays is to call 'getRecentAnswers'. This function is declared against a namespace that we have added to the window object, in the same way that the ViewModel itself is namespaced. Our namespaces are managed in a JavaScript file imaginatively called CreateNamespaces.js, which ensures that the various namespaces we use exist. A brief snippet of this relating to our case study is shown below;

window.CarPlus = window.CarPlus || {};
window.CarPlus.Utilities = window.CarPlus.Utilities || {};
window.CarPlus.ViewModels = window.CarPlus.ViewModels || {};
window.CarPlus.KnowledgeBase = window.CarPlus.KnowledgeBase || {};
window.CarPlus.KnowledgeBase.Models = window.CarPlus.KnowledgeBase.Models || {};
window.CarPlus.KnowledgeBase.Constants = window.CarPlus.KnowledgeBase.Constants || {};
window.CarPlus.KnowledgeBase.DataServices = window.CarPlus.KnowledgeBase.DataServices || {};
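The same "create it only if it does not already exist" pattern can be wrapped in a small helper that takes a dotted path. This is a sketch, not part of the original code: ensureNamespace is a made-up name, and a plain object (fakeWindow) stands in for window so the snippet runs outside a browser.

```javascript
// Walks a dotted path, creating each level only if it is missing,
// and returns the innermost namespace object.
function ensureNamespace(root, path) {
    return path.split('.').reduce(function (parent, name) {
        parent[name] = parent[name] || {};
        return parent[name];
    }, root);
}

// fakeWindow stands in for the browser's window object.
var fakeWindow = { CarPlus: { existing: true } };

var models = ensureNamespace(fakeWindow, 'CarPlus.KnowledgeBase.Models');
ensureNamespace(fakeWindow, 'CarPlus.KnowledgeBase.Constants');
ensureNamespace(fakeWindow, 'CarPlus.ViewModels');
```

Anything already attached to an existing namespace is left alone, which is what makes it safe to run regardless of script load order.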

Function and Promise

The ViewModel defines the function getRecentAnswers, which can be seen below; 

window.CarPlus.ViewModels.KnowledgeBaseViewModel.prototype.getRecentAnswers = function () {
    var self = this;
    self.recentAnswers([]);
    var promise = window.CarPlus.KnowledgeBase.DataServices.getRecentAnswers(
        this.siteRoot, self.recentAnswers, self.answerTextSummaryLength);
    promise.done(function (data) {
    });
    promise.fail(function () {
        toastr.error('Sorry, there was a problem getting recent answers.');
    });
    promise.always(function () {
    });
};

This uses a jQuery promise to call into the knowledge base data service, which requests the most recently answered questions from the server. The observable array self.recentAnswers is passed into this function and is populated by the function when the promise is resolved. As this observable array is bound to the DOM, once it is populated the DOM element(s) bound to its contents are updated and the recently answered questions are displayed for the user of the application.

If there is a failure, that is if the promise is not resolved by the data service, we use a JavaScript library called toastr to inform the user. I highly recommend toastr: it's a tiny library, easy to use and a nice user experience. The demo also uses a quote from The Princess Bride, so the author has great taste in movies. We planned to use it only in development, but the client really liked the feedback toasts so it ended up in the finished application.

DataService and Deferred

The data service getRecentAnswers function is shown below for reference. The data service functions are responsible for resolving (or rejecting, if something goes wrong) the promises made by the view model functions, so we can see the use of jQuery Deferred() here to keep or break the promises.

This also makes use of our constants, which are defined in a separate JavaScript file, again imaginatively called 'Constants.js'. This basically holds the controller names for the various parts of the application so as to allow the data services to build up the URLs they need to call to get and post data.

getRecentAnswers = function (url, recentAnswersObservable) {
    var deferred = $.Deferred();

    // Request the recently answered questions from the KnowledgeBaseController.
    window.CarPlus.DataServices.AjaxHelperService.ajaxRequest(
        "get", url + window.CarPlus.KnowledgeBase.Constants.url + 'getRecentAnswers')
        .done(getSucceeded)
        .fail(getFailed);

    function getSucceeded(data) {
        // Map each returned question, and its answers, to the client side models.
        $.each(data.recentQuestions, function (index, item) {
            if (item.Answers) {
                var question = new window.CarPlus.KnowledgeBase.Models.Question(
                    item.Id, item.Text, item.DateTime, item.NiceDateTime);
                recentAnswersObservable.push(question);
                $.each(item.Answers, function (answerIndex, answer) {
                    question.answers.push(new window.CarPlus.KnowledgeBase.Models.Answer(
                        answer.Id, answer.Text, question.id, question.questionText,
                        answer.Public, answer.IsBookmarked));
                });
            }
        });
        deferred.resolve(recentAnswersObservable);
    }

    function getFailed() {
        // reject() breaks the promise so the ViewModel's fail handler runs;
        // calling fail() here would only register another callback.
        deferred.reject();
    }

    return deferred.promise();
}
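For readers without jQuery to hand, the same keep-or-break flow can be sketched with a native Promise. This is an illustration under assumptions rather than the application's code: requestJson is a hypothetical callback-style stand-in for the AJAX helper, the two fake request functions and the example.test domain are invented, and unlike the original this version drops questions that have no answers before resolving.

```javascript
// requestJson is a hypothetical callback-style stand-in for the AJAX helper.
function getRecentAnswers(requestJson, siteRoot) {
    return new Promise(function (resolve, reject) {
        requestJson('get', siteRoot + '/knowledgebase/' + 'getRecentAnswers',
            function onSuccess(data) {
                // Keep the promise: hand back only questions that have answers.
                resolve((data.recentQuestions || []).filter(function (item) {
                    return Array.isArray(item.Answers) && item.Answers.length > 0;
                }));
            },
            function onError(error) {
                // Break the promise so the caller's failure handler runs.
                reject(error);
            });
    });
}

// Two fake requests for illustration: one succeeds, one fails.
function fakeSuccess(method, url, onSuccess) {
    onSuccess({ recentQuestions: [
        { Id: 'q1', Text: 'Answered', Answers: [{ Id: 'a1', Text: 'Yes' }] },
        { Id: 'q2', Text: 'Unanswered', Answers: [] }
    ] });
}
function fakeFailure(method, url, onSuccess, onError) {
    onError(new Error('network down'));
}

getRecentAnswers(fakeSuccess, 'http://example.test').then(function (questions) {
    console.log(questions.length + ' answered question(s)');
});
getRecentAnswers(fakeFailure, 'http://example.test').catch(function (error) {
    console.log('Sorry, there was a problem getting recent answers: ' + error.message);
});
```

The caller never needs to know how the request was made; it only reacts to the promise being kept or broken, which is the same division of labour as between the ViewModel and the data service.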

For reference the part of Constants.js we are concerned with looks like below; 

window.CarPlus.KnowledgeBase.Constants = function() {
    return {
        url: '/knowledgebase/',
        imageUploadLimit: 2000,
        imageUploadTypes: ['png', 'jpeg', 'bmp', 'tiff', 'jpg']
    };
}();

Server Side

So what does the request for recent answers look like when we hit the server? If you have been following carefully (or if I have explained this well, the jury is out on this latter part so let me know in the comments if any of this makes sense please!) the data service will be calling the following url for its recent answer data; <domain>/knowledgebase/getRecentAnswers.

This is routed to the GetRecentAnswers() method of our KnowledgeBaseController via the ASP.NET MVC routing rule we have in place and this action looks like this; 

[HttpGet]
public JsonResult GetRecentAnswers()
{
    var recentQuestions = _knowledgeBaseService.GetNRecentlyAnswered(
        _numberOfRecentlyAnsweredQuestionsToFetch, true);

    return Json(new { recentQuestions = recentQuestions });
}

The private variable _numberOfRecentlyAnsweredQuestionsToFetch is defined in the controller and is set by a configuration setting so it can be easily changed without having to rebuild the server side .NET code. At present this is set to 2.

The KnowledgeBaseService is injected into the controller through our use of the Windsor IOC container and constructor dependency injection. The service implements the interface IKnowledgeBaseService and the concrete class for this is declared in our container, which itself is initialised on Application_Start() in Global.asax.cs. 

Service

The service call looks something like this; 

public List<Question> GetNRecentlyAnswered(int maxResults, bool hideRemoved)
{
    // get top N answers ordered by datetime
    var answers = _answerRepository.GetNRecent(maxResults);

    // use these answers to get questions (by question id against answer)
    var questionIds = new string[answers.Count];
    for (int i = 0; i < answers.Count; i++)
    {
        questionIds[i] = answers[i].QuestionId;                
    }
    var questions = _questionRepository.GetByIdArray(questionIds, hideRemoved);

    // sort the questions retrieved by the answers associated with them (descending).
    var sortedQuestions = questions
                            .OrderByDescending(q => q.Answers
                                .Where(a => a.Public == true)
                                .Max(a => a.DateTime))
                            .ToList();

    return sortedQuestions;
}

As you can see the service makes use of repositories, as we have a nice separation of concerns in our application. These repositories are declared against their interfaces in the service constructor and, as with the controller, our container is responsible for tying these interfaces to their implementation on application start. When the service is constructed the dependent repositories are injected and available for use by the service.

Repository

Just to follow this through then, the repository calls used by the service above are shown below. Please note the nice syntax of the MongoDb C# driver, I can't praise these drivers enough, they make working with Mongo in C# a breeze.

public List<Answer> GetNRecent(int maxResults)
{
    var query = Query.And(Query.EQ("Public", true), Query.EQ("Removed", false));
    var result = collection.FindAs<Answer>(query)
                                .SetSortOrder(SortBy.Descending("DateTime"))
                                .SetLimit(maxResults);

    return result.ToList();
}

public List<Question> GetByIdArray(string[] questionIds, bool hideRemoved)
{
    if (!hideRemoved)
    {
        var query = Query.In("_id", new BsonArray(questionIds));
        var result = collection.FindAs<Question>(query);
        return result.ToList();
    }
    else
    {
        var query = Query.And(
                        Query.In("_id", new BsonArray(questionIds)), 
                        Query.EQ("Removed", false));
        var result = collection.FindAs<Question>(query);
        return result.ToList();
    }
}
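Underneath the builder syntax these calls produce ordinary MongoDB query documents. The sketch below shows what the equivalent documents look like. buildQuestionFilter and recentAnswersQuery are made-up names for illustration, and the helper also shows one way the two near-identical branches of GetByIdArray could share a single query.

```javascript
// Builds the query document equivalent to both branches of GetByIdArray.
// Listing several fields in one document is an implicit AND in MongoDB.
function buildQuestionFilter(questionIds, hideRemoved) {
    var filter = { _id: { $in: questionIds } };
    if (hideRemoved) {
        filter.Removed = false;
    }
    return filter;
}

// Equivalent of GetNRecent: public, not removed, newest first.
var recentAnswersQuery = {
    filter: { Public: true, Removed: false },
    sort: { DateTime: -1 }
};

var visibleOnly = buildQuestionFilter(['q1', 'q2'], true);
var includingRemoved = buildQuestionFilter(['q1', 'q2'], false);
```

With hideRemoved set, the filter is { _id: { $in: ['q1', 'q2'] }, Removed: false }; without it, only the _id clause remains.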

One final point to note about our repositories, regarding tying these to the Mongo database and collection. If we look at the constructor of our AnswerRepository we can see that we declare and initialise private variables for our Mongo database (or document store, although the driver refers to this as a database, perhaps to make it easier for developers new to NoSQL?) and collection and then set these via configuration settings. There has to be a better way to tie these together yet maintain some configuration over this, perhaps I could hook something up to the container, any suggestions or thoughts on this are most welcome.

public class AnswerRepository : IAnswerRepository
{
    private MongoDatabase database;
    private MongoCollection collection;

    public AnswerRepository()
    {
        string connectionString = ConfigurationSettings.AppSettings["mongo.connstr"] as string;
        var server = MongoServer.Create(connectionString);

        string dbname = ConfigurationSettings.AppSettings["mongo.dbname"] as string;
        database = server.GetDatabase(dbname);

        string collectionName = ConfigurationSettings.AppSettings["mongo.collection.answers"] as string;
        collection = database.GetCollection<Answer>(collectionName);
    }

    ...
}

POCO

As you will have seen in the repository and service code above we deserialise the Mongo JSON (or BSON) data into .NET objects. We refer to these as POCO, which I believe stands for Plain Old CLR Objects, as that is what they are. They're not tied to any ORM as such (other than our implicit Mongo code which can be thought of as an ORM of sorts, albeit a very basic one), they are simply objects we use to work with data in our application.

We have two main objects, Question and Answer. One question can have many answers, so we have a nested document/object structure. This relates nicely back to my previous blog post on this application and issues around schema design for MongoDb, have a read here if you're interested.

Below are the classes for our Question and Answer POCOs for reference. The use of POCOs may seem odd, as we have the data in JSON and ultimately pass it to the ViewModel as JSON, but it does allow us to add a few nice things. Custom constructors let us create new objects from posted data (or from stored data, for example if we want to duplicate or clone an object), and helpers such as the NiceDateTime property on our Question object provide formatted date strings for use in the JSON we pass to the client.

namespace CarPlus.Models.POCO
{
    public class Question : Entity
    {
        public Question()
        { }

        public Question(string jsonData)
        {
            ...
        }

        public Question(BsonValue bsonValue)
        {
            ...
        }

        public string Id { get; set; }
        public int OwnerIdAsked { get; set; }
        public int OwnerIdAnswered { get; set; }
        public DateTime? DateTime { get; set; }
        public string Text { get; set; }
        public string Model { get; set; }
        public string Area { get; set; }
        public int Views { get; set; }
        public bool Approved { get; set; }
        public bool Referred { get; set; }
        public bool Archived { get; set; }
        public bool Removed { get; set; }
        public bool Replied { get; set; }

        public Answer[] Answers { get; set; }
        
        #region Helpers

        public string NiceDateTime
        {
            get
            {
                if (DateTime == null)
                {
                    return "";
                }
                else
                {
                    return DateTime.Value.ToString("HH:mm dd-MM-yyyy");
                }
            }
        }

        #endregion
    }

    public class Answer : Entity
    {
        public Answer()
        { }

        public Answer(string jsonData)
        {
            ...
        }

        public string Id { get; set; }
        public string QuestionId { get; set; }
        public int OwnerId { get; set; }
        public DateTime? DateTime { get; set; }
        public string Text { get; set; }
        public string[] Tags { get; set; }
        public int Views { get; set; }
        public bool Public { get; set; }

        public bool IsBookmarked { get; set; }

    }
}

An Odd Marriage

Once we have the data we need in POCO form from Mongo we then serialise this back to JSON (via the Json() method of the JsonController) and return it to the ViewModel on the client, which then consumes this data and binds it to the DOM. So we have an odd marriage of technology here, in that we store data as JSON in Mongo and work with data as JSON in Knockout, but in the middle we have to deserialise to objects to work with the code in C# and then re-serialise to JSON for the ViewModels.
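The round trip, and the value the middle step can add, can be sketched in a few lines of JavaScript. This is an illustration only: the Question function below is a stand-in for the C# POCO, the sample document is invented, and UTC is used for the formatted date so the output does not depend on the machine's time zone.

```javascript
function pad2(n) { return n < 10 ? '0' + n : String(n); }

// Stand-in for the POCO: stored fields plus one computed helper.
function Question(stored) {
    this.Id = stored.Id;
    this.Text = stored.Text;
    this.DateTime = stored.DateTime ? new Date(stored.DateTime) : null;
}

// Mirrors the "HH:mm dd-MM-yyyy" format of NiceDateTime (in UTC here).
Question.prototype.niceDateTime = function () {
    if (this.DateTime === null) { return ''; }
    var d = this.DateTime;
    return pad2(d.getUTCHours()) + ':' + pad2(d.getUTCMinutes()) + ' ' +
        pad2(d.getUTCDate()) + '-' + pad2(d.getUTCMonth() + 1) + '-' + d.getUTCFullYear();
};

// Re-serialise for the client, adding the computed value to the stored ones.
Question.prototype.toJSON = function () {
    return { Id: this.Id, Text: this.Text, DateTime: this.DateTime, NiceDateTime: this.niceDateTime() };
};

// Stored JSON -> object -> JSON for the client.
var storedJson = '{"Id":"q1","Text":"How often should I service the car?","DateTime":"2013-05-04T09:07:00Z"}';
var question = new Question(JSON.parse(storedJson));
var clientJson = JSON.stringify(question);
```

The JSON that goes out carries a NiceDateTime field that was never stored, which is the kind of benefit the object layer in the middle buys.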

We decided to use .NET as it played to our strengths as a development team and allowed us to get started quickly. However, if I had my time again I would look at using another platform for the web application. In particular I would use a dynamically typed, interpreted language such as Python or maybe Node, rather than a statically typed, compiled language like C#, as this fits better with the data we store and the way we manipulate it in the UI.

I doubt whether this will ever be re-written, and it's no real problem as the application works well and can be supported by the development team as is. Further to this, the use of ASP.NET MVC and C# was fun and, as stated above, the C# drivers for Mongo are great, certainly much better than the python drivers I have used briefly on other projects. Another benefit is that we can add value to the data and application via the POCO in the form of helpers and constructors, so it's not a bad marriage, just perhaps a little odd but with nice benefits.

ViewModel Models

So we serialise the POCO back to JSON and return this to the Data Service, which then satisfies the promise made by the ViewModel. The data returned from the Data Service is a representation of the POCO but in a form that can be easily bound to the DOM by the ViewModel. To demonstrate this, the Question and Answer models we work with client side are shown below;

window.CarPlus.KnowledgeBase.Models.Question = function (id, text, dateCreated, niceDateTime) {
    var self = this;
    self.id = id;
    self.questionText = text;
    self.dateCreated = moment(dateCreated).startOf('day').fromNow();
    self.niceDateCreated = niceDateTime;
    self.answers = ko.observableArray([]);
};

window.CarPlus.KnowledgeBase.Models.Answer = function (id, answerText, questionId, questionText, isPublic, isBookmarked) {
    var self = this;
    self.id = id;
    self.questionId = questionId;
    self.questionText = questionText;
    self.answerText = answerText;
    self.isBookmarked = isBookmarked;
    self.public = ko.observable(isPublic);
};

As you can see one Question can have many Answers; in our Question model above this is represented by the observable array 'answers'. Also, if you compare this to our Question POCO you can see we only map the properties that we want to use in the DOM and our ViewModel operations, so some properties of the POCO may be ignored when we return data via the Data Service. One quick point to note is that our Question model makes use of a library called moment.js, which is a great way to display blog-like time stamps (1 day ago, 10 minutes ago, etc). We also make use of the value of our POCO helper property NiceDateTime, so that we have both dates available in usable formats should we wish to display them to the user.
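As a rough illustration of the mapping described above, here is a minimal sketch of how a serialised Question POCO might be reduced to just the values the client models need. The helper name toQuestionModelArgs, the sample values and the JSON property names (assumed to mirror the C# property names, including an Id and Text on the Question) are assumptions for illustration rather than code from the CarPlus application.

```javascript
// Hypothetical helper: picks only the POCO properties the client models use.
// Property names are assumed to match the C# POCO; the real casing depends
// on how the serialiser is configured.
function toQuestionModelArgs(poco) {
    return {
        id: poco.Id,
        text: poco.Text,
        dateCreated: poco.DateTime,
        niceDateTime: poco.NiceDateTime,
        answers: (poco.Answers || []).map(function (a) {
            return {
                id: a.Id,
                answerText: a.Text,
                questionId: a.QuestionId,
                questionText: poco.Text,
                isPublic: a.Public,
                isBookmarked: a.IsBookmarked
            };
        })
    };
}

// Made-up response, shaped like a serialised Question POCO.
var response = {
    Id: "q1",
    Text: "How do I reset the service light?",
    DateTime: "2013-05-01T10:00:00Z",
    NiceDateTime: "10:00 01-05-2013",
    Removed: false,   // present on the POCO, ignored by the client model
    Replied: true,    // present on the POCO, ignored by the client model
    Answers: [
        { Id: "a1", QuestionId: "q1", Text: "Hold the trip button.", Views: 12, Public: true, IsBookmarked: false }
    ]
};

var args = toQuestionModelArgs(response);
console.log(Object.keys(args));
```

In the ViewModel the resulting values would then be handed to the Question and Answer constructors shown above, which is where the observables and the moment.js formatting get applied.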

The DOM

If we now return to our DOM we can see that we have a <ul> bound to the observable array 'recentAnswers', as well as a <p> which is only visible when there are no recent answers to show. The 'recentAnswers' observable array is an array of window.CarPlus.KnowledgeBase.Models.Question objects, and each of these has a property called 'answers' which is itself an array of window.CarPlus.KnowledgeBase.Models.Answer objects, so this allows us to show a list of questions and for each question, the related answers, as shown below;

<div id="search-results">
    <h2>Recent questions</h2>
    <p data-bind="visible: recentAnswers().length === 0">There are currently no recently answered questions.</p>

    <ul data-bind="foreach: recentAnswers">
        <li data-bind="visible: answers().length > 0">
            <h3 data-bind="text: questionText"></h3>
            <ul>
                <li>
                    <div data-bind="foreach: answers">
                        <span data-bind="text: dateTime"></span>
                        <a href="#" data-bind="click: $root.createBookmark.bind($root, $data), css: { selected: isBookmarked }">Bookmark this answer</a>
                        <h4>Answer</h4>
                        <p data-bind="text: answerTextSummary"></p>
                        <button data-bind="click: $root.selectAnswer.bind($root, $parent, $data)">Read More</button>
                    </div>
                </li>
            </ul>
        </li>
    </ul>
</div>

For each question (recentAnswer) we display the question text and then we iterate through each answer to this question and display these answers within a nested <ul>.

You may have noticed that we have some extra functionality available to the user here, for each answer the user can create a bookmark, or select the answer (to show a detailed view), these are bound to the ViewModel functions createBookmark and selectAnswer respectively. These are worth a quick mention as they help to demonstrate binding context.

Root and Parent

As the elements that the createBookmark and selectAnswer functions are bound to are nested within a foreach binding, the context of these functions needs specifying. If we simply wrote createBookmark.bind() we would be trying to bind to the createBookmark function of an Answer, which does not exist, so this would throw an error.

What we need to do is ensure that we bind at the right level, hence the $root prefix on these function bindings. We might be tempted to use $parent here, however this would also fail: as we are nested within two foreach bindings, the $parent of the Answer is the Question, which has no function called createBookmark. Using $root places us at the root of the ViewModel, which is where these functions live.
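To make the difference between $data, $parent and $root concrete, here is a small simulation of how the context chain is built up by the two nested foreach bindings. This is not Knockout's own code; createRootContext and createChildContext are hypothetical stand-ins that only mimic the three context properties discussed here.

```javascript
// Simplified stand-in for Knockout's binding context: each nested foreach
// creates a child context whose $data is the current item.
function createRootContext(viewModel) {
    return { $data: viewModel, $root: viewModel, $parent: undefined };
}

function createChildContext(parentContext, item) {
    return { $data: item, $root: parentContext.$root, $parent: parentContext.$data };
}

// Cut-down ViewModel with one question and one answer (made-up data).
var viewModel = {
    recentAnswers: [{ questionText: "Q1", answers: [{ answerText: "A1" }] }],
    createBookmark: function (answer) { return "bookmarked " + answer.answerText; }
};

var rootCtx = createRootContext(viewModel);
var questionCtx = createChildContext(rootCtx, viewModel.recentAnswers[0]);      // foreach: recentAnswers
var answerCtx = createChildContext(questionCtx, questionCtx.$data.answers[0]);  // foreach: answers

console.log(typeof answerCtx.$data.createBookmark);    // "undefined": the Answer has no such function
console.log(typeof answerCtx.$parent.createBookmark);  // "undefined": neither does the Question
console.log(answerCtx.$root.createBookmark(answerCtx.$data)); // "bookmarked A1"
```

Only the last lookup finds the function, which is why the bindings in the markup go through $root.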

The knockout binding context is discussed in more detail in the knockout documentation and is pretty self explanatory, however it has caught me out a few times during development so it's well worth reading and remembering.

Summary

To summarise then, we have seen how we can structure an application using ASP.NET MVC, Mongo and Knockout and we have followed a use case (to show recently answered questions in our knowledge base) from initial request to response, through the ViewModel initialisation, AJAX data requests and responses and finally data binding to the DOM and display of the results.

Along the way I have thrown in a few random thoughts and plenty of snippets of code to demonstrate how it all hangs together. I hope this has been interesting and also maybe of some use to someone out there working on a similar application. As always your comments and thoughts are most welcome, and finally thanks for reading this somewhat long-winded post.

Oh and as I wrote this post I learnt a couple of things, a) I may need to split my rambling posts into smaller chunks and b) I need to figure out how to better format code snippets, perhaps some custom CSS will help. Anyway enough for now.

Cheers,
James

 

MongoDb Schema Design by James Heywood

Before we get started, if you haven't heard of MongoDb or are unfamiliar with document stores (as opposed to relational databases), have a quick read through the MongoDb site as a primer for this post. All done? Good, then as long as we're all sitting comfortably I'll begin.

So recently, (well I say recently, it was actually a few months ago) I worked on a customer engagement platform for a high end car manufacturer (Porsche) which I may or may not be at liberty to discuss, so let's just call them Car, to avoid any litigation from previous employers and third parties. The application is known as CarPlus.

As a note when I first published this article my previous employer rang me up to complain about my use of the name Porsche (as if somehow mentioning this was a sin?!) so I amended the article at their request, with hindsight I feel that I should have just stuck to my guns, after all I did the work and have a right to discuss it, so long as I don't give away trade secrets, anyway rant over!

Overview

This platform took the form of a responsive web application which selected Car owners were invited to (it is in the second phase of invites now, hopefully it will roll out to all UK owners soon) so they could gain access to knowledge and information about their car(s) and be a central place to store useful information about their vehicle(s) such as MOT, insurance and warranty details amongst other things.

The site took a mobile first design. Initially the css/html framework was hand rolled by my front-end developer colleagues at Bite; for the second phase this was replaced with a customised implementation of bootstrap by the front-end developer from Brilliant Noise, one of three partner agencies in the project, the final partner being Endless, a design agency who put the branding, look and feel together.

I could describe how the project was organised and the challenges we faced collaborating across three agencies, but this would be a post in its own right and for now I would rather concentrate on the technical side of things, in particular MongoDb and my experience with this technology on this project.

Technology

To summarise the technology, we decided to use a .NET platform as Bite at the time was primarily a .NET shop. Most of the sites we built used Castle Monorail and were variations on a core custom CMS built around Monorail, Windsor, Active Record & NHibernate, however for the CarPlus project we decided to try our hand with Microsoft ASP.NET MVC 4 and the Entity Framework. 

Shortly after development started we decided to swap Entity Framework for MongoDb as it seemed a good fit for the application. Read and write performance was important as was the ability to easily mutate the schema and scale out the back end given that the initial project was a bit of a proof of concept and could easily change direction in terms of data storage and scope. 

The final part of the stack was knockout.js which we decided to use as a way to provide a rich user interface that could optimise the data interface between the client and server. Further to this as the application would primarily be used on mobile devices having an optimal way to pass data back and forth and minimise the need for page refreshes was a key requirement, something that knockout deals with nicely, once you get to know it. If you haven't used knockout then give it a whirl, they have some excellent tutorials on their site.

Back to the project. One point to note is that having a .NET site between the client side knockout and MongoDb document store was a good fit for our skill set at Bite but from a technical point of view seemed a bit of an odd marriage. We store data as JSON (or more accurately BSON) in Mongo then deserialise this to concrete objects in .NET to work with in the server side code, then we serialise this back to JSON and pass it to the client for knockout to use. So we have this odd change from dynamic to static typing and then back again, which although odd actually works really well especially because the C# driver for Mongo is really nice to work with. There's probably a post in here somewhere, perhaps I'll get round to supplying some example code to show the path of data from the front to back end and vice-versa.

Schema Design

For now though let's look at MongoDb, specifically the way I designed the schema for the application and the pros and cons of my decisions.

This post is partly inspired by a MongoDb Webinar I attended last week, the 2nd in a series of 8. This last instalment was on the subject of schema design and issues to consider when designing an application, something I could have done with last year but you live and learn. It was interesting hearing from the guys at Mongo about this topic as it helped validate some of our thoughts and decisions on this subject as well as highlight some of the down sides. You can see the slide shows of the previous sessions and sign up for the rest of the series, and I urge you to do this as they are really interesting and will help you if you plan on using MongoDb or indeed if you are already using it.

Anyway back to CarPlus, the application had several key user journeys identified, with the main area of the application centring around the owners' cars. This became known as the 'Vehicle Manager' and it allows users to swipe between the cars they own (oh to have more than one high end sports Car!) and see related information such as when their MOT is due, images of their vehicle and their preferred and/or nearest service centre amongst other things.

To structure this data in a traditional RDBMS I would follow the methods I was taught at University and identify the entities and relationships involved and produce an ERM to hold the data in a normalised structure, 3rd/4th normal form, no repeating groups, primary and foreign keys, referential integrity and all that jazz. This would be fine for a traditional transactional based system, however it would not be an optimal structure for the needs of this application and as we were taking data from the client's system that would remain the primary store for managing this data, we had no need to model and maintain a highly transactional database.

Enough of the background, what we ended up with was four main collections of documents:

  • Owners 
  • Vehicles 
  • Documents (Tax, MOT etc.) 
  • Centres

Owners and Vehicles have a variety of properties as well as nested sub-documents to allow for flexible access of data in the context of both owners and vehicles. Documents are related to vehicles and owners and have keys to allow us to relate the data to the relevant car and person. Centres are basically lookup data for use in the application.

A brief structure of an Owner document is shown below; 

The Owner Document

> db.owners.findOne()
{
        "_id" : 12345678,
        "Title" : "Mr",
        "FirstName" : "James",
        "LastName" : "Heywood",
        "Email" : "jdheywood@yahoo.co.uk",
        "EncryptedPassword" : "01a3089435c0394fc8653af0c65f81ef",
        "Address1" : "123 Some Street",
        "City" : "Brighton",
        "Postcode" : "BN1 1AB",
        "MobileNumber" : "07999999999",
        "VehicleCount" : 1,
        "Vehicles" : [
                {
                        "_id" : "16a1902b-1570-413e-b30b-ec734181a8e6",
                        "VIN" : "AB9AAA99AAAA99999",
                        "Model" : "Car Turbo",
                        "ModelCode" : "XXXXXX",
                        "Registration" : "ABC 123",
                        "CurrentMileage" : 50000,
                        "RegDate" : ISODate("2014-02-09T09:00:00Z"),
                        "Documents" : [
                                    {
                                            "_id" : "7da032cf-3bc7-41d8-8fd1-370a13fd11cb",
                                            "Type" : "Tax",
                                            "Name" : "Car Tax",
                                            "ValidFrom" : ISODate("2012-01-04T22:12:01.683Z"),
                                            "ValidTo" : ISODate("2013-01-04T22:12:01.683Z"),
                                            "Band" : "M"
                                    },
                                    {
                                            "_id" : "4619c928-c3e2-424e-a27e-7e3b00345136",
                                            "Type" : "Insurance",
                                            "Name" : "Car Insurance",
                                            "ValidFrom" : ISODate("2012-01-04T22:12:01.683Z"),
                                            "ValidTo" : ISODate("2013-01-04T22:12:01.683Z"),
                                            "Provider" : "Direct Line",
                                            "ReferenceNumber" : "REF-NO",
                                            "ContactNumber" : "0800 123 456",
                                            "Premium" : 450,
                                            "Excess" : 150,
                                            "YearsNoClaims" : 5
                                    }
                        ]
                }
        ],
        "LastServiceCentre" : {
                "_id" : "ABC123",
                "Name" : "Car Centre Brighton",
                "Code" : "0987654321",
                "Address1" : "Address line one",
                "City" : "Brighton",
                "Postcode" : "BN1 1AB",
                "Phone" : "01273 123456",
                "Loc" : [
                        -0.14179,
                        50.822959
                ]
        },
        "VisitedCentres" : [ "0987654321", "1234567890", "2314568790", "02847462919" ]
}


As you can see an Owner has a collection of Vehicles, each Vehicle within an Owner has a collection of Documents which are populated by the user via the application, to help bring together useful information. 

Vehicles are duplicated in their own collection as I initially thought having another, flatter collection of this data would be useful to the application, more on this in a moment.

Documents are also duplicated in their own collection again for the same reason, flexibility of data access. 

The Owner has a property LastServiceCentre which is itself a nested document, again we have a separate collection of Centres for lookup purposes, the one held against the owner being the last place the owner visited.

Relation

As well as the last visited Centre we can also see that the Owner has an array of visited Centre identifiers. This is an example of a relationship, each identifier in this array relates to a Centre in the Centres collection. It should be noted that there is no referential integrity, these identifiers are managed by the application and could be any strings as far as Mongo is concerned.
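Because the relationship is managed by the application, resolving the identifiers is the application's job too. The sketch below shows one way that lookup could work, using in-memory arrays in place of the real collections. It assumes the identifiers in VisitedCentres match the Code property of a Centre (the sample document suggests this, but it is an assumption), and the function name resolveVisitedCentres and the sample centres are made up.

```javascript
// Made-up lookup data standing in for the Centres collection.
var centres = [
    { Code: "0987654321", Name: "Car Centre Brighton" },
    { Code: "1234567890", Name: "Car Centre London" },
    { Code: "5555555555", Name: "Car Centre Leeds" }
];

// Cut-down Owner; the last identifier deliberately matches no Centre.
var owner = { _id: 12345678, VisitedCentres: ["0987654321", "1234567890", "0000000000"] };

// Application-side "join": Mongo does not check that each identifier exists,
// so unknown identifiers are simply skipped here.
function resolveVisitedCentres(owner, centres) {
    var byCode = {};
    centres.forEach(function (c) { byCode[c.Code] = c; });
    return owner.VisitedCentres
        .map(function (code) { return byCode[code]; })
        .filter(function (c) { return c !== undefined; });
}

// Against a live database the same lookup could be a single $in query, e.g.
// db.centres.find({ Code: { $in: owner.VisitedCentres } })
var visited = resolveVisitedCentres(owner, centres);
console.log(visited.map(function (c) { return c.Name; }));
```

The dangling identifier is the price of having no referential integrity: nothing stops it being stored, so the code reading it has to cope.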

My reason for storing these as identifiers is that if we kept each Centre visited by the Owner as a nested document this would result in a large amount of duplicated Centre data inside the documents of the Owner collection. I am prepared to accept a certain level of duplication in relation to the last visited Centre as this information is of particular use to the application, however information on all of the Centres ever visited is of less use so I have no need to tolerate duplication, in this case a set of identifiers will suffice and our application will just have to look these up if and when needed.

Nesting

This nested approach to the Owner data allows us to access data not just about the Owner but also related items such as Vehicles, Documents and Centres in a single request for data from the store. In a relational system we would have to make multiple requests for data from different tables to piece together the whole picture of the owner. Nesting data in this way reduces the chat between client and server, however it also means that data requests can be heavier than with a more piecemeal approach unless you are careful to project only those properties you need in your queries. 
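As a sketch of what projecting only the properties you need might look like, the snippet below builds a projection document for an Owner query. The helper buildProjection and the choice of fields are assumptions for illustration; the field names themselves come from the Owner document shown earlier.

```javascript
// Hypothetical helper: turns a list of field paths into a Mongo projection document.
function buildProjection(fields) {
    var projection = {};
    fields.forEach(function (field) { projection[field] = 1; });
    return projection;
}

// Fields needed for, say, a landing page after login (an assumed use case).
var projection = buildProjection([
    "FirstName",
    "LastName",
    "Vehicles.Model",
    "Vehicles.Registration"
]);

// In the mongo shell this would be passed as the second argument to find:
// db.owners.find({ _id: 12345678 }, projection)
console.log(JSON.stringify(projection));
```

The dotted paths reach into the nested Vehicles array, so the response carries the model and registration of each vehicle without dragging the nested Documents along.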

In our case, as we access data about the Owner when they log in having this data in one document is a good thing, fewer requests makes for a more performant application which in turn keeps the user happy and aids in the adoption and use of the application. This is especially important considering that this is a mobile application first and foremost, the fewer page loads and data requests the faster and slicker the application feels and the better the user experience.

Duplication

As well as nesting Vehicles within Owners I also decided to maintain a separate duplicate collection of just Vehicles. My thinking at the time was that this would provide an alternative way to access the data, as initially the requirements of the system were a little vague due to the fact it was a prototype. With hindsight this was a bad decision. The duplication of this data in itself is not really an issue as space is cheap and the size of the documents is minimal; the real issue is in the maintenance of this data. If a Vehicle needs updating it has to be changed in two places, which adds complexity to the application that it could do without. Further to this, as we have had several developers on the project there is a risk that data is accessed and managed from one source or the other but not kept in sync, therefore a review of the code is necessary.

With a separate Vehicles collection we do gain some advantage from simpler queries, for example if we want to identify all Vehicles of a particular model registered this year, (for an administrative purpose rather than for the use of the Owners) having the Vehicle collection allows us to write a simpler query than if we only had this data nested within Owners. I'm not convinced that this outweighs the overhead of maintaining the data in two collections though at the present time. Perhaps if the data analysis requirements expand in a later stage it will be more useful, but there is wisdom in coding for the here and now rather than for what might happen or may be required further down the line. If I had the chance to revisit this I would remove the Vehicles collection even though it may well be a bit of a tricky task at this stage.
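To show the difference in query complexity, here is a sketch of the two shapes the "all Vehicles of a model registered this year" query might take. It assumes the flat Vehicles collection uses the same Model and RegDate field names as the nested vehicles; the function names are made up for illustration.

```javascript
// Start (inclusive) and end (exclusive) of a calendar year as UTC dates.
function yearRange(year) {
    return {
        $gte: new Date(Date.UTC(year, 0, 1)),
        $lt: new Date(Date.UTC(year + 1, 0, 1))
    };
}

// Flat Vehicles collection: a plain find() filter is enough.
// db.vehicles.find(flatVehicleQuery("Car Turbo", 2014))
function flatVehicleQuery(model, year) {
    return { Model: model, RegDate: yearRange(year) };
}

// Vehicles nested in Owners: unwind the array, then match each vehicle.
// db.owners.aggregate(nestedVehiclePipeline("Car Turbo", 2014))
function nestedVehiclePipeline(model, year) {
    return [
        { $match: { "Vehicles.Model": model } },   // pre-filter owners with a matching vehicle
        { $unwind: "$Vehicles" },                  // one document per owner/vehicle pair
        { $match: { "Vehicles.Model": model, "Vehicles.RegDate": yearRange(year) } },
        { $project: { _id: 0, Vehicle: "$Vehicles" } }
    ];
}

console.log(JSON.stringify(flatVehicleQuery("Car Turbo", 2014)));
console.log(JSON.stringify(nestedVehiclePipeline("Car Turbo", 2014)));
```

The nested version is not difficult, but it is a four stage pipeline against a one line filter, which is the trade-off being weighed here.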

Similarly the duplication of Documents has only really added unnecessary complexity to the system, we only ever access this data in the context of a specific Vehicle so there is no real benefit to storing this data in its own collection, again if I had the time to revisit and refactor I would definitely remove the Documents collection altogether and ensure access is only via the Owner.

Lookup

The Centre collection is of real use as a lookup, particularly due to the fact we have geolocation data (lat and long). In order to use Mongo's geospatial indexing and querying this had to be stored in a certain format, at the time (prior to version 2.4 of MongoDb) this was an array of two values [ x, y ] or more accurately [ longitude, latitude ]. This is now known as a legacy coordinate pair as the latest version of MongoDb supports GeoJSON format data which I have yet to have a play with. The geospatial query allows us to find the nearest Centres to the Owner by passing the lat and long of the user from their browser, provided their browser supports the W3C geolocation specification.
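The main thing to get right with legacy coordinate pairs is the order, so here is a sketch of building the query from a browser position. The function name nearestCentresQuery is made up, and the position object is a stand-in for what the W3C geolocation API passes to its success callback; the Loc field name comes from the Centre document shown earlier.

```javascript
// Builds a legacy coordinate pair $near query from a geolocation position.
// Note the order: Mongo expects [longitude, latitude], not [lat, long].
function nearestCentresQuery(position) {
    return {
        Loc: { $near: [position.coords.longitude, position.coords.latitude] }
    };
}

// Stand-in for the object navigator.geolocation.getCurrentPosition hands
// to its callback (values taken from the sample Centre).
var position = { coords: { latitude: 50.822959, longitude: -0.14179 } };

// Assumes a 2d index on Loc, e.g. db.centres.createIndex({ Loc: "2d" }),
// then: db.centres.find(nearestCentresQuery(position)).limit(5)
var query = nearestCentresQuery(position);
console.log(JSON.stringify(query));
```

Results from $near come back sorted nearest first, so limiting the cursor gives the closest few Centres.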

Summary

So to wrap this up this whole post is a rather long winded way of saying that duplication of data is fine as long as you are aware of the overhead(s) this presents to you when managing this data. If you would rather not have the trouble of managing more than one set of data then you can avoid this by nesting documents.

Relating collections is another approach you may wish to take, however this doesn't really play to Mongo's strengths and can feel like a safety net for those of us familiar with RDBMS who are perhaps a little nervous about making the leap to a document store.

I would strongly recommend that you consider the access paths to your data, what do you need and when? Further to this consider how many requests you want to make to read data and if appropriate write data, does your application need to worry about being chatty or not? The answers to these questions will dictate the best approach, nesting and/or relating data. For our purposes we have both although as I mention above we could and probably should do away with the duplicate collections and simply nest our documents to keep our code clean and maintain DRY principles.

And finally refactoring and revisiting code is a luxury, one which I suspect very few of us have in our day jobs. We often build up a technical debt that rarely gets paid off until the project rolls back around for an update. The more consideration we have for architecture and design up front the less debt we should incur.


Tl;dr: nesting is good, relating may be of use, duplication can cause headaches, code for access paths and requirements known now, refactor later.


Thanks for reading, next time I'll try and document some of the code used in a request/response cycle in the CarPlus application and discuss some of the challenges, wins and fails we had developing this.

Cheers,

James