Posts

Two ways to transparently make use of SQL Azure row-level security from Entity Framework

Announced on January 29, 2015, SQL Azure now natively supports row-level security. I’m not going to explain how to implement it; the link I just provided does an excellent job at that. I think it is a really cool feature because it supports a lot of scenarios where one user/department/tenant/etc isn’t allowed to see data from another user/department/tenant/etc. It prevents you from having to write queries throughout your codebase that end with: where UserId = 'johndoe' or where DepartmentId = 42. There’s always a risk you forget to do this somewhere, causing data leakage. Instead, you provide a context to run your queries in and let a security policy decide what data to return to the client.

But, you might ask, the problem of appending a where-clause to each query is now replaced by the problem of specifying a context before each query. What have I actually gained? Well, when using Entity Framework, it is relatively easy to intercept all queries as they’re sent to the database and modify them. We’re going to use this technique to make sure a context is provided before each query. There is also another way to accomplish the same thing that may even be better, although I haven’t tested it thoroughly yet. I’ll describe that technique at the end.

Set the stage

But first we need to set the stage. I’ll describe all the steps here so that you can repeat them on your own SQL Azure database. First, we create a simple table that has the data we wish to secure with a security policy.

create table SecretTenantInfo
(
  Id int primary key identity(1, 1),
  SecretInfo nvarchar(50) not null,
  Context nvarchar(8) not null
)

The column of interest is Context, where we store the context for a specific tenant. Info for all tenants is stored in the same table, so it is important that no tenant ever sees the SecretInfo of the other tenants.

Next, we create a stored procedure that allows us to easily set the CONTEXT_INFO for a particular tenant. To separate security concerns from business concerns, this stored procedure is created in another schema: security.

create schema [security]
go
create procedure [security].sp_setContextInfo
  @context [nvarchar](8)
as
begin
  declare @contextBinary varbinary(128) = convert(varbinary(128), @context, 0);
  set context_info @contextBinary;
end
go

Note that the information stored in CONTEXT_INFO has a maximum length of 128 bytes so in this example I cast the incoming nvarchar(8) to a varbinary(128). Next we need a table-valued function that takes a context as input and returns a row for each target table row that satisfies a certain condition.

create function [security].fn_contextSecurityPredicate(@context [nvarchar](8))
returns table
with schemabinding
as
  return
    select 1 as fn_contextSecurityPredicateResult
    where convert(varbinary(128), @context, 0) = context_info()

Note that this predicate function is similar to a function you could use in a CROSS or OUTER APPLY statement. In fact, let’s try and see what happens if we use this function in an OUTER APPLY. First we add two rows to our table:

insert into SecretTenantInfo (SecretInfo, Context)
values ('Tenant_0', 'Ctx_0'), ('Tenant_1', 'Ctx_1')

And then we run the following SQL statement:

select i.*, o.fn_contextSecurityPredicateResult
from SecretTenantInfo i
outer apply [security].fn_contextSecurityPredicate(Context) o

The result is as follows:

Outer apply result

As you can see, NULL is returned as the predicate value for each row in the table since we haven’t set a context yet. If we changed the OUTER APPLY to a CROSS APPLY, no data would be returned. So internally, SQL Azure uses a CROSS APPLY (or something very similar) to enforce a security policy. Speaking of security policies, this is the last part of our database setup.

create security policy security.contextFilter
    add filter predicate security.fn_contextSecurityPredicate(Context) on dbo.SecretTenantInfo
    with (state = on)

Now that we have a security policy in place, no data is returned from any select statement on SecretTenantInfo unless we have provided the right context.
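To convince yourself the policy works before involving Entity Framework, you can reproduce the behavior from plain ADO.NET. The sketch below is just an illustration of the mechanism (the connection string is a placeholder); the important part is that the stored procedure and the subsequent select run on the same open connection, because CONTEXT_INFO only lives for the duration of the session.

using System;
using System.Data;
using System.Data.SqlClient;

class ContextInfoDemo
{
  static void Main()
  {
    // Placeholder connection string: point this at the SQL Azure database created above.
    var connectionString = "Server=tcp:<yourserver>.database.windows.net;Database=<yourdb>;User ID=<user>;Password=<password>;Encrypt=True;";

    using (var connection = new SqlConnection(connectionString))
    {
      connection.Open();

      // Set CONTEXT_INFO for tenant 'Ctx_0' on this session.
      using (var setContext = new SqlCommand("[security].[sp_setContextInfo]", connection))
      {
        setContext.CommandType = CommandType.StoredProcedure;
        setContext.Parameters.AddWithValue("@context", "Ctx_0");
        setContext.ExecuteNonQuery();
      }

      // The security policy filters the result set: only the Tenant_0 row is returned.
      using (var select = new SqlCommand("select SecretInfo from SecretTenantInfo", connection))
      using (var reader = select.ExecuteReader())
      {
        while (reader.Read())
        {
          Console.WriteLine(reader.GetString(0)); // prints: Tenant_0
        }
      }
    }
  }
}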

Entity Framework (solution 1)

Our first solution uses the IDbCommandInterceptor interface. This interface has three method pairs: NonQueryExecuting/NonQueryExecuted, ReaderExecuting/ReaderExecuted and ScalarExecuting/ScalarExecuted. We’d like to modify queries before they arrive at the database, so we need the Executing methods. In the example below I use the ReaderExecuting method; the other two have a similar implementation.

public class RLSCommandInterceptor : IDbCommandInterceptor
{
  public void ReaderExecuting(DbCommand command,
                              DbCommandInterceptionContext<DbDataReader> interceptionContext)
  {
    // Check if we are running the query on the right context.
    if (interceptionContext.DbContexts.Count() == 1 &&
        interceptionContext.DbContexts.Single() is ConnectorContext)
    {
      var rowLevelSecurityContext = ...get context...;

      command.CommandText =
        "EXEC [security].[sp_setContextInfo] '" + rowLevelSecurityContext + "';\r\n" +
        command.CommandText;
    }
  }
  ......
}

First we check whether the query is actually run against the right Entity Framework context (which in my case is called ConnectorContext). This check isn’t absolutely necessary but if you enable migrations, Entity Framework runs a number of queries on startup that do not really need a context. Next we need to acquire our current context. It depends on your application what that is. It may be a user id obtained from the current thread or a tenant id obtained from the environment. And finally we modify the SQL that gets sent to SQL Azure. All queries now automatically have the right context set.
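How you obtain that context is entirely up to your application, which is why the code above just says ...get context.... Purely as an illustration (the claim type and environment variable name below are made up), it might look something like this:

internal static class RowLevelSecurityContextProvider
{
  // Illustrative only: read a tenant id from the current claims principal and fall
  // back to an environment variable. Substitute whatever your application uses.
  public static string GetCurrentContext()
  {
    var principal = System.Security.Claims.ClaimsPrincipal.Current;
    var tenantClaim = principal != null ? principal.FindFirst("tenantId") : null;
    if (tenantClaim != null)
    {
      return tenantClaim.Value;
    }
    return System.Environment.GetEnvironmentVariable("TENANT_CONTEXT");
  }
}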

Configuration of a command interceptor can be done in code or in a configuration file and is described here.
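For reference, the code-based variant boils down to a single call to EF6’s DbInterception class, typically at application startup; the wrapper class below is mine, not something Entity Framework requires.

using System.Data.Entity.Infrastructure.Interception;

public static class RLSInterceptorRegistration
{
  // Call once at application startup (Application_Start, Program.Main, ...).
  public static void Register()
  {
    DbInterception.Add(new RLSCommandInterceptor());
  }
}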

Entity Framework (solution 2)

As I promised, there is another solution to the same problem. A disadvantage of the approach described above is that a context is set for every query, which isn’t necessary. Context info is session- or connection-scoped, so we need to set it just once per session or connection. PLEASE NOTE that I haven’t thoroughly tested the next part yet. If I encounter any problems I’ll update this post.

When opening an Entity Framework DbContext, there is an opportunity to run SQL statements as illustrated in the following code.

public class ConnectorContext : DbContext
{
  public ConnectorContext() : base("ConnectorContext")
  {
    // Open connection to database to enable setting context info.
    var conn = Database.Connection;
    conn.Open();

    var rowLevelSecurityContext = ...get context...;
    Database.ExecuteSqlCommand(
      "EXEC [security].[sp_setContextInfo] {0}", rowLevelSecurityContext);
  }
  ......
}

So now, instead of providing context for every query, we provide it once on the connection when opening the database context. As I said, I haven’t really begun testing this, but I think it is a problem-free way to make use of SQL Azure row-level security without bothering the rest of your codebase with this concern.
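With either solution in place, the rest of your code queries the context as it always did and only ever sees its own tenant’s rows. A hypothetical consumer (the SecretTenantInfos DbSet below is assumed, it is not part of the post) would look like this:

using System;
using System.Linq;

public class SecretInfoReader
{
  public void PrintSecrets()
  {
    // Note: no tenant filter anywhere. The security policy filters the rows based on
    // the CONTEXT_INFO that was set when the context/connection was opened.
    using (var context = new ConnectorContext())
    {
      foreach (var info in context.SecretTenantInfos.ToList())
      {
        Console.WriteLine(info.SecretInfo);
      }
    }
  }
}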

UPDATE

As I already said, solution 2 hadn’t been tested very well yet. Well, it turns out there are two problems. First of all, in the DbContext constructor I call a stored procedure. This procedure is created by one of my migrations. If you start with an empty database, this migration has not yet been applied, so the stored procedure does not yet exist (a classic chicken-and-egg problem). Second of all, opening a database connection in the constructor of a DbContext breaks Entity Framework migrations. It took me a while to figure this out because the error messages point in an entirely different direction. When you attempt an Add-Migration, you get the following error:

Unable to generate an explicit migration because the following explicit migrations
are pending: [201507021605136_InitialCreate, 201507030629460_SamlTokenXmlRequired, ...,
201507081458529_CreateSecurityPolicy]. Apply the pending explicit migrations before
attempting to generate a new explicit migration.

That’s weird, because I’m pretty sure all these migrations have already been applied. My next step was to try an Update-Database. This gave me the following message:

System.InvalidOperationException: This operation requires a connection to the
'master' database. Unable to create a connection to the 'master' database because the
original database connection has been opened and credentials have been removed from the
connection string. Supply an unopened connection.
---> System.Data.SqlClient.SqlException: Login failed for user ''.

This actually hints in the right direction, but when you paste the error message into Google you are pointed somewhere else entirely. Finally I tried providing an explicit target migration to Update-Database, but that resulted in yet another error:

There is already an object named 'SsoInfo' in the database.

In my InitialCreate I happen to create a table named SsoInfo, so Update-Database apparently just attempts to start at the first migration and continue from there. I’m still in a situation where I can simply drop my database and start over, so I tried that as well, but the above errors kept appearing.

After backtracking my changes I figured out that opening a connection in the DbContext constructor is the problem. So what we actually want is to ‘know’ in the constructor whether we are using the context at runtime or at design time (when adding migrations, for example). It turns out this is possible by adding an implementation of the IDbContextFactory<TContext> interface to the same assembly that contains your DbContext implementation. Mine is as follows:

public class MigrationDbContextFactory : IDbContextFactory<ConnectorContext>
{
  public ConnectorContext Create()
  {
    return new ConnectorContext(isMigrationsMode: true);
  }
}

The ConnectorContext constructor has changed as well:

public ConnectorContext(bool isMigrationsMode = false) : base("ConnectorContext")
{
  if (!isMigrationsMode)
  {
    // Open connection to database to enable setting context info.
    var conn = Database.Connection;
    conn.Open();

    var rowLevelSecurityContext = ...get context...;
    Database.ExecuteSqlCommand(
      "EXEC [security].[sp_setContextInfo] {0}", rowLevelSecurityContext);
  }
}

By the way, note that this solves both our problems: the call to sp_setContextInfo is never made when updating our database and no connection is opened in design-time mode.

Azure DocumentDB performance

UPDATE: I promised testing from within the data center and more documents to test against. Check out the bottom of the post for additional interesting results. Spoiler: data center performance is really good!

On August 21 Microsoft released Azure DocumentDB. As the name implies, it’s a document database. It stores schema-less JSON documents that can then be queried using a SQL-like language. It’s still in preview mode, but that doesn’t prevent me from doing some testing 🙂 ScottGu’s announcement promises linear performance scaling with low single-digit latencies, thanks to automatic indexing. Now that’s interesting, because usually indexing a data store requires careful planning and is one of the more complex database design topics. So I thought I’d take DocumentDB through a performance test. More specifically, a query performance test. Data is usually read much more than it is written, so read performance is more interesting than write performance (at least in my test).

The test is designed as follows:

  1. Generate a large number of random JSON documents (1048576 in my case). The fact that we have random documents should make it pretty hard for DocumentDB to index them efficiently (assumption!) This also means the test does not represent a real-world scenario. Usually you’d see similarly structured documents inside the same document collection.
  2. Repeat the following process:
    1. Import a batch of documents into Azure DocumentDB. After each import the number of documents is respectively 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576.
    2. Determine two sets of test queries against the imported documents. Each set contains 1000 queries. The first set has queries that find a document based on a property at root level in the document, the second set has queries that find a document based on a child-level property. I was curious whether there would be a difference between the two.
    3. Run each set of queries two times against the current document collection. I was curious whether there would be some kind of learning effect if you repeat the same set of queries. And running each set two times allows us to determine average performance.

To elaborate on bullet 2.2: suppose I have the following JSON document:

{
  "key1" : "value1",
  "key2" : {
    "subKey1" : "subValue1"
  }
}

I would generate two queries:

SELECT d["key1"] FROM DataSet d WHERE d["key1"] = "value1"
SELECT d["key2"]["subKey1"] FROM DataSet d WHERE d["key2"]["subKey1"] = "subValue1"

It will be interesting to see whether there is a difference between root-level property indexing and child-level property indexing.
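For completeness, this is roughly how each query in a set was issued and timed. It is a sketch based on the preview DocumentDB .NET SDK as I understand it; the endpoint, key and collection self-link are placeholders and the exact client calls may differ slightly from what my test program uses.

using System;
using System.Diagnostics;
using System.Linq;
using Microsoft.Azure.Documents.Client;

class QueryTimer
{
  static void Main()
  {
    // Placeholders: use your own DocumentDB account endpoint/key and the self-link
    // of the collection that holds the random documents.
    var client = new DocumentClient(new Uri("https://<account>.documents.azure.com"), "<authKey>");
    var collectionLink = "<collection self-link>";

    var query = "SELECT d[\"key1\"] FROM DataSet d WHERE d[\"key1\"] = \"value1\"";

    var stopwatch = Stopwatch.StartNew();
    var results = client.CreateDocumentQuery(collectionLink, query).ToList();
    stopwatch.Stop();

    Console.WriteLine("{0} result(s) in {1}ms", results.Count, stopwatch.ElapsedMilliseconds);
  }
}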

The test itself was run on a laptop with 16GB memory and an Intel Core i7-4600U processor. During the test the CPU utilization of the process did not go above 15% and memory pressure was negligible.

Results

The results are, well, rather uninteresting… 🙂 Summarizing: it doesn’t matter whether you run queries against one document or one million documents; query performance does not degrade when you increase the number of documents to query against. Also, it doesn’t matter whether you query a root property or a child property; these are treated similarly. And there is no learning effect: if you run the same batch of queries again, performance is nearly the same. So far, Azure DocumentDB performance seems to scale pretty well.

DocumentDB query performance

In the graph you see two lines: one for the root property queries and one for the child property queries. Since I could not detect a learning effect, I averaged the results of the two root and the two child runs. If you’re wondering where data point 1296618 comes from, check out update 1 below.

One interesting observation concerns absolute performance. On average my test program took 86 seconds to perform 1000 queries. That’s 86ms per query, which is not very fast, especially since I live about 30km from the West Europe data center where my database is hosted. This seems to have something to do with location: I ran all tests from the office, but when I tested from home, everything was suddenly twice as fast. I suppose I should test from within the West Europe data center to see if performance gets better.

What’s next?

First of all, I’m going to keep adding documents. I have a million of them now and am going to add another million and repeat the same test. And again, and again… Second, ScottGu’s blog post promises low single-digit latency, which I don’t see in my test. I’m assuming this is because of network latency at my test location. I’ll run other tests from the West Europe data center directly instead of from my own laptop to see if latency drops.

Other observations

Some things I noticed while setting up the test (I had not read the documentation too carefully before starting, since everything mentioned below is actually documented…):

  • There’s a maximum document size of 16KB. If you try to create a larger document you receive an error (RequestEntityTooLargeException). There is already a post on the DocumentDB forum explaining why there’s a limit and why it has this value. There is also a very popular feature request for larger document sizes (currently under review) on the DocumentDB feedback site.
  • You cannot create a query that selects on the keywords IS and NOT (and, as I later discovered, BY, JOIN, AND, OR). I found this out by accident because of the size of my random test set. Somewhere in the set there are (valid) JSON documents that have property names IS and NOT, leading to queries of the form: SELECT * FROM DataSet d WHERE d.IS = "SomeValue". This leads to a BadRequestException: Syntax error, incorrect syntax near 'IS'. Here is my post on the forum asking how to escape these keywords in a query.
    The answer is actually quite easy: just use JSON syntax for getting property values. You can escape any property name simply by using d["IS"] instead.
  • I had one document that was valid according to both JSON.NET and JSONlint.com but that generated a BadRequestException: The request payload is invalid. Ensure to provide a valid request payload. Since this is a valid document, I have no idea why DocumentDB rejects it so I asked about this on the forum.
    The problem with my document was that it uses a property name, in this case _ts, that is also used internally. There are a few more internal properties that I assume you can’t use; they can be found here.
  • If your document has a property at root level that is named id, it must be a string. If it’s either an object or an array you get an error message: Can not convert Object to String (or Can not convert Array to String respectively). This error is generated on the client, not on the server.
    An id property is not required: DocumentDB will generate a GUID if you don’t provide one. Check out my forum question here.

Update 1: Document collection quota exceeded

My intention was to keep adding documents to the collection and see if performance degrades. My next batch was supposed to be another 1048576 documents. However, DocumentDB in preview mode allows a maximum of 3.3GB per document collection, as described here. The error I receive is a Microsoft.Azure.Documents.ForbiddenException: "Quota exceeded. Consider creation of a new collection". I’m not sure why I receive this error, because the portal reports I’m only using 3.15GB. I asked another question on the forum.

DocumentDB Usage

However, I now have 1296618 documents, which is still a respectable number. I suppose I could create another collection and add more documents there, but I assume the performance of another document collection is the same. Queries are scoped per document collection, not per document database, so performance between collections is probably similar.

I ran a final test against the 1296618 documents I have now and added the results to the graph. No spectacular results.

Update 2: Run the test set from the data center

I was a little disappointed by the absolute performance results: each query takes 86ms on average when run from my office location. However, when I ran the same test set from home it suddenly dropped to about 40ms; location seems to matter. So what if the test is run directly inside the West Europe data center? I created a simple Azure Website with one button and ran the same test as described above. The results are a lot better: we are much closer to the low single-digit performance that was promised. Again, there is no difference between root and child queries and there is no learning effect.

The results are (average results per query from one test run):

  • Root queries: 7.6838ms
  • Child queries: 7.8093ms

That’s about 10 times better than my office location results and 5 times better than my home location results. Obviously, proximity matters. All data center tests were run against the full set of 1296618 documents.

Since this test is run from a simple website with a button, it’s pretty easy to run it simultaneously from separate browser tabs. So I opened ten tabs and clicked the button in each one (not a very scientific approach, I agree). Performance in each browser tab is comparable, consistently between 7ms and 10ms per query. Remember that each test consists of 4000 queries. Multiply this by the number of tabs and you have 40000 queries within approximately 40 seconds, with an average query response time within 10ms. I think this is quite impressive, especially considering the complete JSON junk I threw at DocumentDB.

A simple PowerShell DSC resource for configuring an Azure Virtual Network

PowerShell Desired State Configuration (DSC) is a new PowerShell feature that ships with Windows Server 2012 R2 and Windows 8.1. In short, it allows you to specify what a machine configuration should look like and leaves it to the so-called Local Configuration Manager on the target machine to get into this desired state. There are plenty of resources available if you’d like to learn more, so I won’t get into PowerShell DSC details here. I can particularly recommend the PowerShell DSC Book from PowerShell.org.

I was configuring an Azure Virtual Network (VNet) through PowerShell as part of a larger script and thought it would be a nice exercise to write a basic PowerShell DSC resource for configuring the network. The PowerShell options for configuring an Azure Virtual Network are quite limited: the most important cmdlet is Set-AzureVNetConfig, which accepts an XML file that must describe your entire network configuration. There are no cmdlets for adding a virtual network site or DNS server, for example. This is somewhat limiting, but it makes the initial job of writing a DSC resource a lot easier.

First, you should download the latest PowerShell DSC Resource Kit (at the time of writing this is Wave 5). Among a lot of other DSC resources, it contains two things we need: the xAzure module, containing the xAzureSubscription DSC resource for setting up an Azure subscription before configuring our virtual network, and the xDscResourceDesigner module, which makes it easy to create DSC resources. Unzip the resource kit to C:\Program Files\WindowsPowerShell\Modules like this:

DSC Resource Kit Install Directory

The DSC resource we’re creating mimics the behavior of Set-AzureVNetConfig in that it only accepts a path to the .netcfg file. In addition, we add one property for later use: the current XML configuration. The script to create the DSC resource template is as follows (run as administrator):

Import-Module xDSCResourceDesigner

$ConfigurationPath =
    New-xDscResourceProperty -Name ConfigurationPath -Type String -Attribute Key
$CurrentVNetConfig =
    New-xDscResourceProperty -Name CurrentVNetConfig -Type String -Attribute Write

New-xDscResource -Name cAzureVNetConfig `
                 -Property $ConfigurationPath,$CurrentVNetConfig `
                 -Path 'C:\Program Files\WindowsPowerShell\Modules' `
                 -ModuleName cAzure -FriendlyName cAzureVNetConfig -Force

Running this script results in the following files and folders in C:\Program Files\WindowsPowerShell\Modules:

C:\...\cAzure\cAzure.psd1
             \DSCResources\cAzureVNetConfig\cAzureVNetConfig.psm1
                                           \cAzureVNetConfig.schema.mof

The cAzure.psd1 file is the module manifest. The cAzureVNetConfig.schema.mof file is the WMI Managed Object Format file that describes our resource to WMI. And finally, cAzureVNetConfig.psm1 contains the actual scripts for the DSC resource. It consists of three functions:

  • Get-TargetResource: this function usually retrieves the current state of the target system. For example, the built-in File DSC resource may check to see whether a specific file or directory exists and return this information.
  • Set-TargetResource: this function is responsible for manipulating the target system so that it gets into desired state.
  • Test-TargetResource: this function should return $true or $false, indicating whether the target system is in the desired state or not.

Let’s start with the simplest: Test-TargetResource. For now, we just let it return $false, indicating that the Azure Virtual Network is not in the desired state. We could implement a comparison between the XML document we provide and the current configuration to check for differences but all that would gain us is preventing an unnecessary Azure network configuration update. So Test-TargetResource is as follows:

function Test-TargetResource
{
    [CmdletBinding()]
    [OutputType([System.Boolean])]
    param
    (
        [parameter(Mandatory = $true)]
        [System.String]$ConfigurationPath,

        [parameter()]
        [System.String]$CurrentVNetConfig
    )
    $result = $false
    $result
}

Next is Get-TargetResource. Its main job is to get the current network configuration, although we don’t do anything with it in this first version:

function Get-TargetResource
{
    [CmdletBinding()]
    [OutputType([System.Collections.Hashtable])]
    param
    (
        [parameter(Mandatory = $true)]
        [System.String]$ConfigurationPath
    )

    #Get current VNet config.
    $currentVNetConfig = (Get-AzureVNetConfig).XMLConfiguration

    #Return both configuration path and the current configuration.
    $returnValue = @{
        ConfigurationPath = $ConfigurationPath
        CurrentVNetConfig = $currentVNetConfig
    }
    $returnValue
}

The function takes one parameter, the configuration path for our new network configuration, and it returns a hash table with the configuration path and the current configuration.

The last function we need is Set-TargetResource which applies our new network configuration:

function Set-TargetResource
{
    [CmdletBinding()]
    param
    (
        [parameter(Mandatory = $true)]
        [System.String]$ConfigurationPath,

        [parameter()]
        [System.String]$CurrentVNetConfig
    )

    #Set the new VNet config from the specified configuration path.
    $operation = Set-AzureVNetConfig -ConfigurationPath $ConfigurationPath -Verbose -Debug
    Write-Verbose "Setting Azure VNet config result: $($operation.OperationStatus)"
}

It takes two parameters, the configuration path and the current configuration, and then calls Set-AzureVNetConfig using the specified configuration path. Pretty easy, all in all. Now on to how to use all of this. And a tip: before using the new resource, despite it being in the correct folder (on PSModulePath), you may need to reboot. I have no idea why.

To use the new resource you have to write a configuration, use that configuration to generate a MOF file for the target system, and apply the MOF file. The target system in this case is localhost, since we run the configuration from the current machine. To be able to retrieve the current Azure VNet config you need an Azure publishsettings file. You can download yours here: https://windows.azure.com/download/publishprofile.aspx. Once you have your publishsettings file and a valid Azure VNet config, you can run the following script to exercise our newly created resource.

Configuration SetAzureVNet
{
    Import-DscResource -Module cAzure
    Import-DscResource -Module xAzure

    xAzureSubscription MSDN
    {
        Ensure = "Present"
        AzureSubscriptionName = "Windows Azure MSDN - Visual Studio Premium"
        AzurePublishSettingsFile = "C:\Temp\Azure.publishsettings"
    }

    cAzureVNetConfig VNet
    {
        DependsOn = "[xAzureSubscription]MSDN"
        ConfigurationPath = "C:\Temp\vnet.netcfg"
    }
}

SetAzureVNet -OutputPath C:\Temp\DSC -Force

Start-DscConfiguration -ComputerName 'localhost' -Path C:\Temp\DSC -Wait -Verbose

Some explanation is in order. First, the Configuration keyword: it allows you to specify a PowerShell DSC configuration, which is a declarative way of describing the desired state of a node.

Import-DscResource looks like a PowerShell cmdlet but it actually isn’t and it can only be used inside a configuration. We use Import-DscResource to import both our new module and the xAzure module from the DSC Resource Kit.

Next we use the xAzureSubscription resource to initialize our Azure subscription. Note that you need the name of your subscription and your publishsettings file. Finally, we call our new resource to configure the Azure VNet. Note that our resource has a dependency on xAzureSubscription, meaning it is only executed after that resource has run.

That’s it! When you run the script from PowerShell ISE, this is what you should get:
Run VNet config script from PowerShell ISE

I didn’t go into DSC details here but I’ve included enough references so that you can find this information on your own.

Combine Node.js Tools for Visual Studio with Windows Azure Mobile Services

I’m experimenting with Windows Azure Mobile Services (WAMS) and, as you may or may not know, its server-side scripts are written in Node.js. Unfortunately my favorite editor, Visual Studio (did you know it even won a Jolt Award), doesn’t natively support Node.js. To remedy this there is the open-source Node.js Tools for Visual Studio (NTVS), currently in alpha, but I haven’t seen any problems so far.

But back to WAMS. I’d like to develop this inside Visual Studio and deploy from there as well. For that to work you have to combine Node.js and Git deployment with your WAMS scripts. I’ll walk you through it from the beginning.

Create a new mobile service

We start with a mobile service. Go to the Azure Management Portal and create a new Mobile Service. Once you have that, configure Git support for the service: go to the dashboard and click Set up source control.

Configure Git repository

If you’ve never configured Git before, the portal will ask for a username and password to control access to the repository. If you’ve configured Git before but no longer remember your credentials, go to the dashboard and Reset your source control credentials. Once Git is configured, go to the configure tab of your service and copy the Git URL that is now available.

Two more things before we continue in Visual Studio. We now have an empty Mobile Service, without any tables. To see anything interesting we should at least have some script files, so let’s create a table on the data tab. With this table come four Node.js scripts for insert, update, delete and read. If we leave these untouched and clone our Git repository in the next step, they are not included, so make a small change to each script; I simply added a comment at the top of each one.

Update scripts

Inside Visual Studio

We now have a Git repository for our source files and we can use Visual Studio to clone it. To do this we open up Team Explorer. In Visual Studio 2013, Team Explorer gives access to both ‘traditional’ TFS repositories and Git repositories. Under Local Git Repositories, click Clone and enter the repository URL copied earlier.

Clone Git Repository

If everything went well we now have a local Git repository, as you can see in the screenshot below. What’s even better, there are also subfolders for custom API scripts, scheduler scripts, etc. This is something you don’t get when you access your Mobile Service from Server Explorer in Visual Studio.

Local Git repository

Node.js Tools for Visual Studio

So far so good. We have a local Git repository containing all Node.js script files for our Mobile Service and when we update one of the script files, the change will be reflected inside Visual Studio and we can push it back to the master branch on the server, updating our Mobile Service.

However, the local Git repository is simply a collection of script files and although you can open these all separately inside Visual Studio, this isn’t quite the workflow I want. Besides my Mobile Service I have a number of other projects (class libraries, a Windows Phone app, etc.) and it would be great if these were all part of the same Visual Studio solution. For this there is an NTVS project template called From Existing Node.js code.

Node.js from existing code

When you choose this project, you can either create a new solution or add the project to an existing solution. A wizard starts that lets you select a folder that contains your Node.js files. Simply select the Git repository folder.

Select Git repo

When you click Finish, you’re done. You now have a Node.js project inside Visual Studio that you can use to edit all your Mobile Service script files. Do not forget to push the newly added NTVS .njsproj file to the master branch.

Making your Mobile Service scripts into an NTVS project has at least two major benefits. First, you have the entire Mobile Service in a single project that you can open inside Visual Studio alongside other projects in your solution. Second, you get all the NTVS goodies like IntelliSense and npm support. For example, when I type require(, I get a list of all available Node.js modules. Also check out the native Git support in Solution Explorer: testtable.delete.js is marked as changed.

NTVS IntelliSense

Walkthrough: Hosting FTP on IIS 7.5 in a Windows Azure VM

I have been struggling yesterday and today to get FTP working on IIS 7.5 in a Windows Azure Virtual Machine, and I just achieved victory! To remember all the steps myself and to help others achieve the same, I’ll describe how to accomplish this.

Spin up a virtual machine

First of all, you need a virtual machine. I needed a Windows machine with a SQL Server database so I chose ‘Microsoft SQL Server 2012 Evaluation Edition’ from the available templates.

VM choice

Once the machine has booted, you can RDP into it via the connect option at the bottom of the management portal.

Connect via RDP


When you’re in, you need to configure IIS. A summary of the required steps:

  • Add the ‘Web Server (IIS)’ role to the server.
  • Add the IIS features you need.
  • Add a TCP endpoint to your VM in the management portal with public and private port 80.

To enable FTP, make sure you enable the ‘FTP Server’ role services for your IIS role:

FTP Server role services

Add and configure FTP site

The next step is to create the actual FTP site in IIS. Right-click on ‘Sites’ in IIS Manager and select ‘Add FTP Site…’:


Add FTP site

Specify the name and the local path for the site:

Site information

Specify binding and SSL information:

Bindings


And finally, specify who should have access to the FTP site. Note that I selected Basic Authentication and the administrator user. This corresponds to the local administrator account on the VM (the same account you use when you RDP into the machine). This is definitely not the best solution: if you do not use SSL to secure access to the FTP site, your FTP credentials are sent in cleartext when logging in.

Authentication

Local testing

You should now be able to access the FTP site from within the VM. Open a command prompt, type ‘ftp 127.0.0.1’ and log in with your administrator account.

Local test

Well, that was the easy part. You now have an FTP site that you can access locally. When you try to access it from another machine, you will notice that you can’t get a connection.

We are now getting into the nitty-gritty details of the FTP protocol. While you may think that FTP only uses port 21, it actually doesn’t. I’m not going into the details, but there’s a good explanation here.

Configuring remote connectivity

First of all, for active FTP, in theory you need to allow access to ports 21 (FTP command port) and 20 (FTP data port). So you need to add two endpoints to your VM:

FTP Active Endpoints

So much for the theory. When attempting to connect to the FTP site using FileZilla, explicitly indicating that we’d like to use active mode, a connection still cannot be established. I haven’t figured out exactly why…

But of course we can still try to configure passive FTP. For this to work, we need to tell the IIS FTP server the port range it can use for data connections and we need to add endpoints to the VM that correspond to this port range.

First of all, configure the port range and external IP address for passive data connections. This can be found in IIS Manager:

Firewall support


Firewall support


The external IP address should be the Virtual IP address you can find in the Azure Management portal. Unfortunately, it seems impossible to specify the data channel port range here. To set this, we need the appcmd utility, which can be found in %windir%\system32\inetsrv.


appcmd set config /section:system.ftpServer/firewallSupport ^
    /lowDataChannelPort:7000 /highDataChannelPort:7014


In the example, I chose ports 7000 to 7014 but you can choose any port range you like as long as it corresponds to the endpoints you configure for your Azure VM.

For configuring 15 extra endpoints for my VM I decided to use the Windows Azure PowerShell cmdlets, which you can download here. You can also add the 15 endpoints in the management portal, but you can only add them one by one, which takes a considerable amount of time. To be able to use these cmdlets, you first need the publish settings file for your Azure account. There are a number of ways to download the publish settings file; one way is to start Windows Azure PowerShell and use the cmdlet Get-AzurePublishSettingsFile. It opens a browser and allows you to download the publish settings file that corresponds to your Windows Live ID.

When you have downloaded the publish settings file, you can import it using the Import-AzurePublishSettingsFile cmdlet, and then we’re ready to start adding endpoints. I simply created a text file containing the list of commands I wanted to run and copied that into the PowerShell window:


Get-AzureVM -ServiceName 'myServiceName' -Name 'ftpportal' |
    Add-AzureEndpoint -Name 'FTPPassive00' -Protocol 'TCP' `
                      -LocalPort 7000 -PublicPort 7000 |
    Update-AzureVM
Get-AzureVM -ServiceName 'myServiceName' -Name 'ftpportal' |
    Add-AzureEndpoint -Name 'FTPPassive01' -Protocol 'TCP' `
                      -LocalPort 7001 -PublicPort 7001 |
    Update-AzureVM
...

We’re almost there. Although the Windows firewall seems to allow all traffic that’s required, you also need to enable stateful FTP filtering on the firewall:


netsh advfirewall set global StatefulFtp enable

Finally, restart the FTP Windows service and we should be up and running:

net stop ftpsvc
net start ftpsvc

Testing with FileZilla confirms that we can now successfully connect to our new FTP site, hosted on a Windows Azure VM:


Filezilla test

Adding extra user accounts

As I said before, using the default administrator account for accessing your FTP site is a BAD idea because credentials are sent in cleartext. Therefore, create a new local user account on the VM and add an FTP authorization rule to allow access to your FTP site.

References

I had some help writing this article, mainly from this article by Angelo Laris that describes how to add active and passive FTP functionality to an Azure Web or Worker Role.

Comparing WIF claim sets from AppFabric Access Control Service V2

Windows Identity Foundation (WIF) is a Microsoft technology that allows you to move authentication and authorization logic out of your application. Suppose you are building a web site: using WIF you no longer have to build your own login page and user store. Instead, you let a so-called identity provider, sometimes called a Security Token Service (STS), authenticate your users. Your web application, called the Relying Party (RP), has a trust relationship with the STS, and when the STS presents a user token to the RP, the RP knows that the user represented by the token is authenticated.

To clarify the example a little, suppose you choose Windows Live as your STS. When I hit a secure section of my web site, WIF intercepts the request and redirects it to the Windows Live authentication page. I sign in with my Windows Live credentials, after which Windows Live creates a security token for me and redirects me back to my web site with this token. WIF checks the token and I’m authenticated.

Multiple identity providers

In the example above I used a single STS, but suppose I want to allow users from multiple identity providers (Google, Yahoo, Facebook, etc). Microsoft has a solution for this called AppFabric Access Control Service V2, available on portal.appfabriclabs.com. It acts as a proxy between your application and, currently, five identity providers.

Identity providers

In the screenshot I have activated two identity providers.

Claim sets

When you sign in to an identity provider, it generates a token that represents who you are. So what exactly is a token? A token contains a set of claims about a user and is digitally signed by the STS. So when I log in to Windows Live, a token is generated that contains some claims about me. A claim can be an e-mail address, a last name or anything else that the identity provider may wish to disclose about you. And that’s what we are interested in today. Each identity provider provides a different set of claims: some may produce many claims, others may produce just an identifier. So the question is: what claims does each identity provider currently supported by AppFabric ACS provide?

For this post I have not tested Microsoft Active Directory Federation Services 2.0 because I haven’t set it up yet. When I have, I’ll update this post or write a new one describing the possibilities that ADFS offers.

Before we start it is important to note that AppFabric ACS simply passes all claims from each identity provider through unchanged.

Windows Live ID

Windows Live is the provider that offers the fewest claims. It only gives your application a unique user identifier. The other claim is added by AppFabric ACS to let you know which identity provider was used (you’ll see this claim type for every STS).

Windows Live Claims

Windows Live does not offer any options for getting more user information. This is a deliberate choice to protect user privacy. This is also the reason that Windows Live does not ask the user for confirmation to share information with a third-party application: no personal information is actually shared.

Google

Google offers a little more user information when asked for it. In addition, as an extra step Google explicitly asks your permission to share your information with AppFabric ACS. In my case this means that I have to confirm that rwwilden-appfabric-labs.accesscontrol.appfabriclabs.com is allowed access to information from my Google account.

Google claims

The reason Google asks for your permission to share information is that they actually provide information that can be traced back to a person.

Google does not offer any options for getting additional information.

Yahoo!

Yahoo! provides the same claims as Google. They also have a confirmation step to allow the user to think again before sharing information with a third-party application.

Yahoo! claims

Facebook

Facebook does not have a very good record of keeping its users’ data private (I inserted some random links to sites I found when searching for ‘facebook privacy violation’). However, the set of claims you get when using Facebook as your STS is limited. The extra claims are an access token and an expiration date.

Facebook claims

Like Google and Yahoo!, Facebook asks for your permission when you sign in for the first time.

However, you can configure AppFabric ACS to ask the Facebook STS to grant access to an extensive set of permissions based on the access token claim. For example, I can let AppFabric ACS ask Facebook for permission to read a user’s birthday.

Facebook AppFabric ACS

The above screenshot is from the AppFabric ACS portal. When I sign in this time via Facebook I get a new request for permission (in Dutch) asking me to allow the third-party application access to the birthday field.

Facebook request for permission

When I allow this, no new claims are added to the claim set. However, I can use the provided access token to get additional user data via Facebook’s Graph API. The URL
https://graph.facebook.com/rwwilden?fields=birthday&access_token=…
gives me a small JSON document:

{
   "birthday": "04/11/1977",
   "id": "100001960422926"
}

containing my birthday. So far so good. My application has requested permission to access my birthday, I have given this permission, and using the provided access token claim I can access the birthday field.
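Just to illustrate how little code this takes (this snippet is not from the post and the token value is a placeholder), fetching that document from .NET is essentially a one-liner with WebClient:

using System;
using System.Net;

class GraphApiDemo
{
  static void Main()
  {
    // Placeholder: use the value of the access token claim issued via AppFabric ACS.
    var accessToken = "<access token claim value>";
    var url = "https://graph.facebook.com/rwwilden?fields=birthday&access_token=" + accessToken;

    using (var client = new WebClient())
    {
      // Returns the small JSON document shown above.
      string json = client.DownloadString(url);
      Console.WriteLine(json);
    }
  }
}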

What about other fields? The documentation for the Facebook API User object specifies that I need the user_work_history permission to read the work field. I have never given this permission, so the following URL should generate an error, or at least return no work history.

https://graph.facebook.com/rwwilden?fields=work&access_token=…

And it works as expected:

{
   "id": "100001960422926"
}


Conclusion

The differences between the identity providers currently supported by AppFabric ACS are large. On one end there is Microsoft with Windows Live, providing only a user identifier. On the other end there is Facebook, with a lot of configuration options. If you want to support Windows Live ID, the only information the STS gives you is that a user is authenticated; additional user information must be stored inside your application.

Even if you use Google or Yahoo! you probably need to store additional user information. The Facebook API offers all the personal information you may ever need to know about your users, so there is no need to store any additional user information inside your application. Facebook clearly wins when you look at it from an application builder’s standpoint.

From a privacy standpoint it is clear that Microsoft wins. For a third-party application making use of Facebook as an STS it is very easy to know all about its users. It’s only one button click away.