MonthAugust 2013

Reading Excel files in .Net using C#

Customers and project managers seem to have one thing in common, they both like working with Excel. Though Excel isn’t really my favorite application, there is one thing very nice about it. It is very easy to read an Excel file in .Net.

In this post, I will try to describe how you can read and parse an Excel file.

This post is based on a computer which has Office 2010 installed. If you try to read excel files on a computer or server without office installed, you first need to install this.

Interfaces

Nobody seems to care about interfaces anymore, that’s why I’m telling you that I do think it is important. It makes your application loosely coupled and unit testable! It also makes you think about generic solutions for problems.

Program to an interface, not an implementation!!

The interface I created for my reader file:


public interface IFileReader<T>
{
     T Read(string filename);
}

I created the following class which implements the IFileReader interface


public class ExcelReader : IFileReader<DataTable>
{
     public DataTable Read(string filename)
     {
      ....
     }
}

The connection string

The Excel file can be read as a database, using an OleDb  (Object Linking and Embedding, Database) connection. This is a connection designed by Microsoft for accessing a variety of sources.

I use two different connection strings for reading Excel files, one for the good old xls files and one for the xlsx files.

I check on file extension for detecting the right connection string (I know this is dirty!!!)


 string connectionString;
 var filename = filePath.Trim().ToLower();
 if (filename.EndsWith(".xlsx"))
 {
    connectionString = string.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0 Xml;HDR=YES;IMEX=1\";", filePath);
 }
 else if (filename.EndsWith(".xls"))
 {
    connectionString = string.Format("Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=1\";", filePath);
 }

Please put these connection strings in the connectionstrings section of your app.config or web.config!!

Reading sheets

Now that we have a connection string, we can create a connection.


OleDbConnection connection = new OleDbConnection(connectionString);
connection.Open();

Make sure you use try/catches for the connection.
Now that the connection is open, the next step is reading the tables. The tables in Excel are the available sheets.


DataTable tableSheets = connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);

Every row in the tableSheets is a sheet


string dataTableName = tableSheets.Rows[0]["TABLE_NAME"].ToString(); 

The above code will give you the name of the first sheet. The sheet name can be used to retrieve the data of this sheet. You can also use a foreach or for loop to retrieve all sheet names if you need to collect data of all sheets.

Reading data


OleDbCommand command = new OleDbCommand(string.Format("SELECT * FROM [{0}]", dataTableName), connection);
command.CommandType = CommandType.Text;

This Command reads the data of the first sheet.
I like to use a DataTable to store the retrieved data. A DataTable is a representation of one table and contains both Columns and Rows.
The first row in a excel sheet are read as columns. The data in the Rows can be retrieved by using the name of the column as index key (like a dictionary).

So we need to fill the DataTable with the result of the Command:


var dataTable = new DataTable();
OleDbDataAdapter dataAdapter = new OleDbDataAdapter(command);
dataAdapter.Fill(dataTable);

The OleDbDataAdapter executes the command and fills the DataTable with the resultset.

The datatable is filled and ready to use. Make sure you handle exceptions properly!

Clean up!

The only thing left to do is clean up! Close the connection and dispose all disposable objects! Why? Because you don’t want memory leaks or other strange behavior in your application! Always dispose disposable objects!  

if (connection.State == ConnectionState.Open)
connection.Close();
connection.Dispose();
command.Dispose();
dataAdapter.Dispose();

I recommend using usings instead of calling dispose!

Setting readonly variables

A lot of people don’t seem to realize that setting readonly variables is indeed possible, it can only be done from inside the constructor:

public class SomeClass
{
   private readonly ISomeService _someService;
   private readonly ISomeOtherService _someOtherService;

   public SomeClass(ISomeService someService, ISomeOtherService someOtherService)
   {
      _someService = someService;
      _someOtherService = someOtherService;
   }
}

This is perfectly fine working code and I strongly recommend it. I use this a lot with dependency injection (DI). This makes sure that the injected instances cannot be overridden, because they can only be set in the constructor.
That’s why I really don’t like the requirement of a parameterless constructor in UserControls in WebForms. There is no possibility for injection, only resolving is possible.

Opening VS 2012 Web projects in VS 2010

Have you ever met a front end developer who doesn’t mind working in Visual Studio? Well, I’m one of the lucky ones who have!
He doesn’t even own an iPhone and still his work is pretty awesome! (no, he did not pay me to write this!)

We work together on several projects. He works uses a basic version of visual studio 2010, while I work with visual studio 2012.
I created some .net 4 web projects for him to work in, and asked him to get the latest version from TFS.

He asked me if he should install WebMatrix and I answered with a simple no. This caused the projects to not being able to load.
I started checking the project references, nothing shocking. The project properties were also pretty basic, using custom webserver.

I discovered that the problem was IIS Express. This is not installed in VS 2010. Probably didn’t exists at the time of release.

So I checked the web server settings to see if IIS Express is enabled, but the project settings were set to custom webserver and the already disabled IIS Express checkbox was not checked.

I finally checked the project file, by using edit project file. This will unload the project and allows you to edit the project file, which is basically just an xml file. You should definetly check this out if you have never done that before.

And that’s where I saw the problem:

<UseIISExpress>true</UseIISExpress>

Setting this to false and reloading and committing the project file solved the problem!

Sitecore user serialization

A colleague gave me an excel file with usernames, password and profile information and asked me if I could import them in Sitecore.
After writing an import tool (read about reading excel files here), I thought to be clever and used Sitecore serialization to import all users to the test environment.
Serialization works quite well. It creates a user file for every user (in the data\serialization\security folder) and even the roles can be serialized.
I copied these files to the test environment and started the deserialization process. All sitecore users were visible in sitecore, but none of the users where able to login.
That’s when I found out that passwords are not serialized!
So remember! Serialization of users looks nice, but you do need to create new passwords!