Monthly Archives: September 2012

IndexTank–Search Engine

You might not have heard of IndexTank. It started life as a SaaS startup that provided a nice RESTful search API. LinkedIn purchased the company and closed it down, but they did the right thing and open sourced both the search engine and the back-end tools used to host the engine for multiple clients. A couple of great companies, Searchify and IndexDen, then started offering IndexTank as a service.

The IndexTank search engine uses Lucene under the covers and supports all the normal search options (AND, OR, etc.). The big thing with IndexTank is that as soon as you call the API to add text, it becomes available in the search results. If you’ve played with SharePoint, you’ll know that you need to wait for the crawler to run on its schedule. Having the results added to the search index instantly is great.
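As a rough sketch of that add-then-search flow, the helpers below only build the HTTP requests and send nothing. The function names, base URL and index name are hypothetical, and the endpoint paths and field names (`docid`, `fields`, `categories`, `q`) are my recollection of the IndexTank REST API, so treat them as assumptions and check them against your provider’s documentation.

```javascript
// Hypothetical helpers: they describe the requests but do not send them.
// The paths and JSON field names are assumptions, not verified against a live service.
function buildAddDocumentRequest(baseUrl, indexName, docId, text, categories) {
  return {
    method: "PUT",
    url: `${baseUrl}/v1/indexes/${encodeURIComponent(indexName)}/docs`,
    headers: { "Content-Type": "application/json" },
    // categories is optional; JSON.stringify drops it when undefined.
    body: JSON.stringify({ docid: docId, fields: { text }, categories }),
  };
}

function buildSearchRequest(baseUrl, indexName, query) {
  // URLSearchParams takes care of escaping the query text.
  const params = new URLSearchParams({ q: query });
  return {
    method: "GET",
    url: `${baseUrl}/v1/indexes/${encodeURIComponent(indexName)}/search?${params}`,
  };
}
```

Either description can then be handed to whatever HTTP client you already use.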

The engine supports categories, which in SharePoint terminology would be called ‘refiners’, and it also supports adding other metadata.

We use IndexTank for LearnWeaver. Here you can see the categories on the right (Images, Video, Blog post, Canvas):

image

Each of the items has metadata associated with it, such as the URL, image location and the item type.

IndexTank has a number of client libraries, including Ruby, .NET, Python and Java.

I looked at Elasticsearch for LearnWeaver, but found that IndexTank was easier to work with and look after.

ELB SSL Certificates–Setup

Setting up Amazon’s Elastic Load Balancer (ELB) to perform SSL termination is pretty easy. Performing all the SSL work on a dedicated device reduces the processing overhead on your web servers. Sure, it’s probably not that much anyway, but it also allows for other flexibility.

To set up ELB to do SSL termination you will need enough information to complete this dialog from the AWS console:

image

The Certificate Name is easy: that’s just what you want to refer to it as. The rest need some work.

The process I follow to generate a certificate is this:

1. Use a site like http://www.namecheap.com/ to purchase a cert. I’ve found that the Comodo EssentialSSL works across all the modern browsers: IE8, Chrome, Firefox, iPad, Android (Nexus 7).

2. The site will ask for a CSR (Certificate Signing Request). Rather than use IIS, I like to use a web tool such as http://www.gogetssl.com/eng/support/online_csr_generator/ which will email you the CSR once you’ve entered the relevant information into the form.

3. Upload the CSR to Namecheap. They will then process it and email you the certificate. It will start with:

-----BEGIN PKCS7-----

4. To use this in Amazon you will need to convert it from PKCS7 to PEM format. This can be done with the command line OpenSSL tools, or again you can use an online tool:

https://www.sslshopper.com/ssl-converter.html

5. Now that you have your PEM-encoded certificate, you can add it to the Public Key Certificate section of the ELB form. The Private Key section takes the private key that was generated along with the CSR, which also needs to be in PEM format.

6. If you leave the certificate chain blank, you will find that your SSL mostly works, until you use Firefox or Chrome on Android and receive a nice red error screen stating that your site isn’t trusted. You really need to include the chain. To complete this step, download the intermediate certificate from https://support.comodo.com/index.php?_m=downloads&_a=view&parentcategoryid=1&pcid=0&nav=0 (pick the one that matches the SSL cert you purchased). This file can also be converted to PEM format with the online converter.

Time zone challenges

Building a system that is used by users across the world can be challenging when you have to deal with times. Here are a couple of things that I’ve found while developing LearnWeaver.

Always store dates as UTC. Don’t store dates as local time; it’s so much harder when your dates are in different time zones.

Where possible I like to show textual dates to users, like: ‘5 minutes ago’. This doesn’t work in all situations, but generally speaking I think it works well.
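A minimal sketch of how such a textual date could be produced on the client. The function name `timeAgo` and the choice of units are my own for illustration, not taken from LearnWeaver.

```javascript
// Hypothetical helper: turns a past Date into text like "5 minutes ago".
// "now" is a parameter so the result can be tested deterministically.
function timeAgo(date, now = new Date()) {
  const seconds = Math.floor((now - date) / 1000);
  if (seconds < 60) return "just now"; // also covers dates slightly in the future

  // Largest unit first, so 90 minutes reads as "1 hour ago".
  const units = [
    ["day", 86400],
    ["hour", 3600],
    ["minute", 60],
  ];
  for (const [name, size] of units) {
    const count = Math.floor(seconds / size);
    if (count >= 1) return `${count} ${name}${count === 1 ? "" : "s"} ago`;
  }
  return "just now";
}
```

Because the result is computed from the difference between two instants, it reads the same whatever time zone the user is in.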

You probably want to capture the user’s current time zone. This can be done with client-side JavaScript:

$("#Timezone").val(new Date().getTimezoneOffset());

Here the user’s current time zone offset is written to a hidden field. It’s important to remember that the value is in minutes and that its sign is inverted: here in Australia (eastern seaboard, outside daylight saving) it returns -600, since UTC is 10 hours behind, rather than the usual +10 hours.
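Since `getTimezoneOffset()` reports minutes with the sign flipped relative to the usual "UTC+10" notation, a small conversion helper makes the value easier to reason about. This is a sketch; the name `toUtcOffset` is hypothetical.

```javascript
// Hypothetical helper: converts a getTimezoneOffset() value (minutes, sign
// inverted) into the conventional "+HH:MM" / "-HH:MM" offset from UTC.
function toUtcOffset(timezoneOffsetMinutes) {
  const total = -timezoneOffsetMinutes; // flip the sign back
  const sign = total < 0 ? "-" : "+";
  const abs = Math.abs(total);
  const hours = String(Math.floor(abs / 60)).padStart(2, "0");
  const minutes = String(abs % 60).padStart(2, "0");
  return `${sign}${hours}:${minutes}`;
}
```

For example, an input of -600 yields "+10:00", and an input of 300 yields "-05:00".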

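With the .NET Framework you can then find the time zones this offset corresponds to by enumerating:

TimeZoneInfo.GetSystemTimeZones()

You can then store or make use of a TimeZoneInfo object whose BaseUtcOffset matches. Remember to flip the sign and convert from minutes first, and note that several time zones can share the same base offset.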

Displaying a time back to a user in their local timezone is easy:

var timezone = TimeZoneInfo.FindSystemTimeZoneById( "timezone info stored earlier" );

var localDate = TimeZoneInfo.ConvertTimeFromUtc(date, timezone);

If a user has to provide a date in a form (for a blog post, say), they will provide that date in their local time. You must then convert it to UTC (since we store dates in UTC). This can be done by:

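// Kind must be Unspecified: ConvertTimeToUtc throws an ArgumentException if the
// Kind is Local and the source time zone is not the server's own local zone.
var localDate = DateTime.SpecifyKind(userDate, DateTimeKind.Unspecified);

var timezone = TimeZoneInfo.FindSystemTimeZoneById("stored timezone");

var utcDate = TimeZoneInfo.ConvertTimeToUtc(localDate, timezone);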

The biggest thing is storing dates in UTC. Since our servers are hosted on AWS, their default time setting is UTC.

CloudFront Setup

In my last post I mentioned that I used a custom origin with CloudFront. To set this up I used the following C# code and the AWS .NET SDK.

// Replace "key" and "secret" with your AWS credentials.
AmazonCloudFrontClient client = new AmazonCloudFrontClient("key", "secret");

CreateDistributionRequest request = new CreateDistributionRequest();

// Point the distribution at our own web server (a custom origin) rather than an S3 bucket.
request.DistributionConfig = new CloudFrontDistributionConfig();
request.DistributionConfig.CustomOrigin = new CustomOrigin();
request.DistributionConfig.CustomOrigin.DNSName = "website.com";
request.DistributionConfig.CustomOrigin.HttpPort = 80;
request.DistributionConfig.CustomOrigin.ProtocolPolicy = OriginProtocolPolicy.HttpOnly;

request.DistributionConfig.Comment = "Static assets";
request.DistributionConfig.Enabled = true;

client.CreateDistribution(request);

All that’s left now is to configure your assets to be served from the CloudFront URL, which means that your markup must reference the CloudFront address.

With LearnWeaver I use the SquishIt project. This tool minifies my JavaScript and CSS resources. I can use the following in my views:

@MvcHtmlString.Create(
    Bundle.JavaScript()
        .Add("~/Scripts/somefile.js")
        .WithOutputBaseHref(LWExtensions.CDNROOT())
        .Render("/Content/js/someoutputfile.js"))

The LWExtensions.CDNROOT() method returns the base URL, such as https://d206rufdr120az.cloudfront.net, which is read from a config option so that I can have different settings for different environments.

CloudFront–gzip Compression with Windows Server

CloudFront is the CDN solution provided by AWS, and it’s really easy to work with. We use it to deliver static assets for LearnWeaver. We have set up a custom origin which points back to our web servers. If CloudFront has a cache miss, it requests the resource from our servers, passing along the client browser’s headers and using them as part of the cache key. So if a browser supports gzip (signalled by its Accept-Encoding header), this is forwarded on to the web servers, which return the content gzip-compressed. If different headers are passed to CloudFront, say from a client that doesn’t support gzip, a cache miss occurs and the content is requested again.
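The behaviour described above can be pictured with a toy model. This is not CloudFront’s actual implementation; the names `cacheKey` and `createEdgeCache` are hypothetical, and the only header considered is Accept-Encoding.

```javascript
// Toy model only: shows why gzip and non-gzip clients get separate cache entries.
function cacheKey(path, headers) {
  const encoding = (headers["accept-encoding"] || "").toLowerCase();
  const variant = encoding.includes("gzip") ? "gzip" : "identity";
  return `${path}|${variant}`;
}

function createEdgeCache(fetchFromOrigin) {
  const store = new Map();
  return function get(path, headers) {
    const key = cacheKey(path, headers);
    if (!store.has(key)) {
      // Cache miss: go back to the origin, forwarding the client's headers.
      store.set(key, fetchFromOrigin(path, headers));
    }
    return store.get(key);
  };
}
```

Two gzip-capable browsers asking for the same path share one entry, while a client without gzip support triggers a second origin request.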

So the fact that CloudFront doesn’t explicitly support gzip compression isn’t too much of an issue if you’re using a smart origin like your web servers. But here’s the rub with Windows Server 2008: CloudFront makes its origin requests using HTTP 1.0, and the default settings of IIS disable compression for HTTP 1.0 clients. So even if your browser sends the proper headers indicating that it supports gzip, CloudFront will request the resource from your web server using HTTP 1.0, and your web server will return the resource uncompressed.

Fortunately there is an easy fix. On your web server, open:

C:\Windows\System32\inetsrv\config\applicationHost.config

Look for the section: <httpCompression>

There will be an attribute: noCompressionForHttp10="true"

Set this value: noCompressionForHttp10="false"

Now your web server and CloudFront will work like a dream.

LearnWeaver Tech Stack

LearnWeaver is hosted on Amazon’s EC2 (Elastic Compute Cloud). We use Elastic Load Balancing (ELB) to do SSL termination and general load balancing to our web servers. All the web servers are in a separate security group. Our web servers are launched from a prebuilt AMI and, once running, pull down the latest code version via git. They can then be added to the load balancer.

We make use of Redis, which runs on Linux servers (based on the Amazon Linux image). We use Redis’s ability to have a slave, which gives us redundancy should an availability zone go down. We use an EBS volume to write the Redis data file to disk. All of these servers are in a dedicated security group.

We make use of Amazon’s Relational Database Service (RDS) for managing our MySQL database. RDS gives us far more flexibility than running and maintaining our own SQL instances. I look forward to the day when I launch read replicas in multiple availability zones.

We use S3 for storage of our media, while our web assets such as JavaScript and image files are served up using CloudFront. We have a custom origin pointing at our web server, so if CloudFront has a cache miss, it will request the asset from our server. We append a query string containing a hash of the asset file to each asset URL, so when we deploy new code we don’t need to invalidate the CDN.
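The query string trick can be sketched like this. It is an illustration rather than the LearnWeaver implementation: the function name, the `v` parameter, the SHA-256 hash and the 8-character truncation are all my assumptions, and it relies on the CDN treating the query string as part of the cache key.

```javascript
const crypto = require("node:crypto");

// Hypothetical helper: builds an asset URL whose query string changes
// whenever the file contents change, so a new deploy gets a fresh CDN entry.
function versionedAssetUrl(cdnRoot, assetPath, contents) {
  const hash = crypto
    .createHash("sha256")
    .update(contents)
    .digest("hex")
    .slice(0, 8); // a short prefix is enough to tell versions apart
  return `${cdnRoot}${assetPath}?v=${hash}`;
}
```

Unchanged files keep the same URL between deploys, so they stay cached at the edge.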

The speed improvement is noticeable with the CDN enabled. Having assets served from Sydney as opposed to the US makes a dramatic difference (for us Australians, that is). There are a few gotchas with gzip compression and headers that need to be sorted out for it to be seamless, but once configured we haven’t had any issues.

We use Route 53 for DNS. Again, the ability to resolve our domain quickly and locally is the main goal here.

State of things

In March I started working on LearnWeaver which is an eLearning tool for educators, students and parents. It has a lot of features, some of which I would like to dedicate an entire blog post to. But the important thing for me is that it has given me an opportunity to work on a modern web application again.

I’ve been able to dive deep into the latest JavaScript frameworks and see the power of the modern browser.

I’ve been able to work with ‘cloud’ technologies such as AWS. Little things like setting up CloudFront and delivering my minified and compressed assets from a CDN are exciting to me.

I’ve been working with data storage systems such as Redis and MongoDB along with traditional databases such as MySQL.

After spending the past six months working on a modern web application, I’m feeling pretty happy about the state of the web. It’s a very exciting time to be a developer, and it seems that the rest of the world is beating a path to us. App stores are thriving and will continue to grow as more people purchase smart devices. Web-based software as a service is now mainstream, and most people don’t think twice about paying for a useful service.

In any case, I’m proud of my work on LearnWeaver. Here’s to the next challenge: keeping teachers, students and parents happy!