Thursday, January 26, 2012

Art of teaching computer science

I was studying for my final master's exams at CTU Prague, which meant revising more or less everything I had studied before. Thanks to some earlier work experience, I am more or less able to judge whether what I have learned will be useful to me in the future, and also what was missing from my studies.

These reflections led to this post, which describes what a school should give to computer science students. During my studies I also spent time at ENSTA ParisTech and UPV Valencia, and I will mix my experiences from all these universities here, so it's not only about the Faculty of Electrical Engineering at CTU Prague (my alma mater).

I have structured the ideas into groups:
  • Importance of math
  • Understand the full stack
  • Don't bet too much on general engineering
  • The importance of being polyglot
  • Start at the lower level
  • Adoption of new technologies in courses
  • Teaching project management
  • The importance of the world around

The importance of math

Math is important. We don't like studying it because it hurts: you have to think, and it generally takes a lot of time (depending on your talent). Which areas are important for a future CS engineer?

I think these are: algebra, analysis, probability, statistics and graph theory.
These subjects should be studied during the first years, so that the later theory can be built on these foundations.

There is a small difference between the French and the Czech approach. In France, before entering an engineering school, you have to go through two years of "preparatory classes". These classes are common to all future engineers and contain a lot of general science, especially math, which gives French students a better starting position for the later courses. I took a cryptography course at ENSTA that was completely based on information theory and on algebraic concepts such as groups and rings, which we had only scratched the surface of in my algebra course at CTU, and I had a hard time passing it.

I passed the course only because it had a practical part, implementing the RSA algorithm in C, which most of the French students failed to finish (more about that later).

Understand the full stack

To me this means having a complete understanding of how a computer works. If you are a software engineer, you should know how your programs are compiled and then executed by the hardware. You should know the structure of the processor, the memory and the arithmetic unit. You should understand that an arithmetic unit can be built entirely from NAND gates, and so on.

What is important is the notion of abstraction. We do not have to be experts on all levels, but we should know what each lower layer contains and understand how it works.
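To make the NAND remark concrete, here is a small Python simulation (a sketch, not a hardware description) in which every gate is derived from a single `nand` function and then chained into a ripple-carry adder. The function names are my own.

```python
def nand(a: int, b: int) -> int:
    """The only primitive gate: 0 when both inputs are 1, otherwise 1."""
    return 0 if (a and b) else 1


# Every other gate is built only from nand.
def not_(a: int) -> int:
    return nand(a, a)


def and_(a: int, b: int) -> int:
    return not_(nand(a, b))


def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))


def xor(a: int, b: int) -> int:
    # Classic four-NAND XOR.
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def full_adder(a: int, b: int, carry_in: int) -> tuple:
    """Add three bits; return (sum_bit, carry_out)."""
    s1 = xor(a, b)
    total = xor(s1, carry_in)
    carry_out = or_(and_(a, b), and_(s1, carry_in))
    return total, carry_out


def ripple_add(x: int, y: int, width: int = 8) -> tuple:
    """Add two unsigned integers bit by bit, like a ripple-carry adder.

    Returns (result truncated to `width` bits, final carry)."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry
```

For example, `ripple_add(5, 9)` gives `(14, 0)`, and `ripple_add(255, 1)` overflows an 8-bit adder to `(0, 1)`, with the final carry set, just as the hardware would.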

Don't bet too much on general engineering

I had general engineering in my first year at CTU, and, as I mentioned, in France students even have two years of it (to be honest, in France it is more "general science"). It is important to learn the general engineering approach to solving problems, but it has to stay within reasonable limits.

At CTU I remember having first-year courses in circuit theory. Even the mathematical analysis course was taught so that the math could later be used to describe circuits by differential equations and solve them. This was of no use to me later. As I said, I believe knowing the entire computer stack is important, but not to this extent.

In France I experienced even more of this "too much general engineering" effect. I know some guys who, after two years of higher studies towards a computer science degree, did not really understand the object-oriented approach. That is wrong: learning OO programming takes time and practice, so we have to start early. If you spend two years of your CS studies on general sciences, with maybe a little bit of algorithmics on top, it is already too late.

Coming back to the cryptography course I took in France: implementing RSA in pure C was no piece of cake, especially since you have to handle large numbers using a library such as GMP (http://gmplib.org/). I managed it thanks to the amount of code I had written before.

As I remember, the French students, despite their better understanding of the underlying mathematical principles, did not succeed. You need to have programming in your blood to solve complicated tasks, and it was not quite in their blood yet.
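The core of that RSA exercise fits in a few lines of Python, whose integers are arbitrary-precision and so play the role GMP plays in C (there, the modular exponentiation and inverse would be calls such as `mpz_powm` and `mpz_invert`). This is textbook RSA for illustration only: no padding, toy key sizes, and the function names are my own, so do not use it for real encryption.

```python
from math import gcd


def make_keys(p: int, q: int, e: int = 65537):
    """Build a textbook RSA key pair from two distinct primes p and q."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime with (p - 1) * (q - 1)")
    d = pow(e, -1, phi)  # modular inverse of e (Python 3.8+)
    return (n, e), (n, d)


def encrypt(message: int, public_key) -> int:
    n, e = public_key
    if not 0 <= message < n:
        raise ValueError("message must be an integer in [0, n)")
    return pow(message, e, n)  # message^e mod n


def decrypt(cipher: int, private_key) -> int:
    n, d = private_key
    return pow(cipher, d, n)  # cipher^d mod n
```

With the classic small example `make_keys(61, 53, e=17)`, the public key is `(3233, 17)`, the private exponent is 2753, and the message 65 encrypts to 2790. The group theory from the course is exactly what guarantees that decryption undoes encryption.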

The importance of being polyglot

You cannot really understand the benefits of one language if you cannot compare it to other languages. From my point of view, students should meet at least the following kinds of languages during their studies:

  • Imperative languages: C
  • Object oriented languages: C++, Java, C# (yes I know that they are not pure)
  • Dynamic languages (completely interpreted): Python, Ruby or even only JavaScript
  • Functional languages: Scala, F#
  • And some assembly language

OO programming has to be mastered; the others should at least be touched on!

When a young CS graduate leaves school, he might be asked to develop a small web application that performs CRUD operations on some data in a database. If the only languages he met during his studies are Java and C, then he will probably pick Java, start defining the different layers, take some web framework and start wiring all the pieces together.

If he had heard about Ruby before, he could probably code the same application in half the time, thanks to the scaffolding in Ruby on Rails and the dynamic nature of the language.

Start at the lower level

For some reason, Java is taught as the primary language at universities. That is wrong: it is much harder to grasp pointer arithmetic when you only start with C/C++ later. I think part of the reason is laziness. When you know that most things can be managed without C++, why bother? (Yes, I am talking from personal experience.)

Maybe start with C and teach OO programming with Java/C# or C++?

Adoption of new technologies in courses

This is a big problem. At ENSTA as well as at CTU, we spent too much time talking about SOA, web services, WSDL and UDDI, concepts that were IN maybe five years ago. So what is the point of learning the structure of the SOAP protocol? Well, there are some messages and operations, sure. But technologies come and go. Schools have to react FAST to new technologies and also to technologies leaving the spotlight. It is not useful to know how UDDI works when it is almost never used in the real world.

The technologies have to be carefully selected, and once they are out, they can simply be withdrawn from the syllabus and replaced with new ones. I had a J2EE course in which JSF and the JavaBeans approach were taught as the way to build Java web apps. I think this came from the fact that around 2009, when JSF 2.0 was released, it was considered the way to do J2EE web application development.

Later I realized that it is not the only way; actually, it is not even "the way" to do it. There are other frameworks, and maybe all of them should be presented to students equally instead of betting on only one.

For a couple of years now there has been quite a buzz around big-data processing, and a lot of news is coming from NoSQL databases. It would be great to have a course about HBase, Cassandra, MongoDB, MapReduce and big-data processing in general... but to offer that, schools would have to be fast.

Maybe following Gartner or Forrester predictions would help them pick the right topics (of course, these predictions have to be filtered by someone with a bit of distance; there is still too much marketing in what comes from Gartner and Forrester).

Teaching project management

This is a topic of its own. I remember that in Valencia and later in Paris I was forced to make estimates using the COCOMO model, without anyone really proving to me that it works. I had to sit through too many overcrowded slides describing the traditional waterfall approach. I also had a short course on agile methods, but again it was completely useless because it was TOO THEORETICAL.
When taught, project management always tends to become too verbose: too many slides, too much talking and no practice!

But project management is important, and there is just one way to teach it: through projects. Why not have projects that last the whole year and in which all students have to participate?

There are things such as Continuous Integration and Configuration Management that are of great importance to the success of a project, and these things are not really taught at school. At best they are presented on slides.

I would like to see a course in which the first week or two are dedicated to building the Continuous Integration platform, using, say, Jenkins, with automatic builds, unit testing, acceptance testing (FitNesse, GreenPepper, Selenium or whatever), notifications, etc. The rest of the course could be dedicated to developing the project on THAT platform, and everyone would see the benefits.

The importance of the world around

IT students, and geeks in general, tend to ignore the world around them. We use the computer for just about everything: studying, learning, communicating on social networks, watching movies.

But there are some social skills that you cannot pick up at the computer desk.

School should find a way to make students spend more time together.
At ENSTA the school has an area with pool tables, baby-foot (foosball) and a small bar, run by the students. There are regular parties, and students sometimes spend the whole day at school: you go to your classes, spend time in the library and later just switch to the bar. You can meet peers from all the other fields of study...

Last but not least, professors should encourage students to go and study abroad for a year, an option that is not that popular at IT faculties.

Summary

That's about all I have on my mind. Maybe someone will pick some points from these lists that are useful when creating a new computer science program or syllabus.

Monday, January 16, 2012

Bindable layer for Bing Maps

I had a special requirement for showing items on a Bing map.

I needed to show a collection of collections of objects, in this case bike routes. Each bike route was itself composed of a collection of route segments (which could be interconnected at some places).

This cannot be achieved with MapItemsControl alone, because it can only render a one-dimensional collection.

So what I really needed was a way to dynamically add a collection of MapItemsControls to the map.

Instead of using MapItemsControl, I decided to use the MapLayer class. I created a new layer that exposes a DependencyProperty called "Routes".

This property can be bound to a two-dimensional collection. I use a List of Lists, but I guess I should have used IEnumerable to make it more general. When the collection changes, the routes are drawn on the map by creating MapPolyline objects.

Of course, a similar class could be created for pushpins, for displaying sets of sets of places.

Taken further, this approach could even expose a template specifying how each item of a group is rendered.

Here is the code:

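using System.Collections.Generic;
using System.Windows;
using System.Windows.Media;
// Plus the Bing Maps control namespace for your platform
// (e.g. Microsoft.Maps.MapControl in the Silverlight control).

public class BikeRoutesLayer : MapLayer
{
    // Palette cycled per bike route; all segments of one route share a color.
    private static readonly Color[] _colors = { Colors.Blue, Colors.Green, Colors.Orange, Colors.Gray };

    public List<List<LocationCollection>> Routes
    {
        get { return (List<List<LocationCollection>>)GetValue(RoutesProperty); }
        set { SetValue(RoutesProperty, value); }
    }

    public static readonly DependencyProperty RoutesProperty =
        DependencyProperty.Register("Routes", typeof(List<List<LocationCollection>>), typeof(BikeRoutesLayer),
            new PropertyMetadata(new PropertyChangedCallback(RoutesChangedCallBack)));

    private static void RoutesChangedCallBack(DependencyObject sender, DependencyPropertyChangedEventArgs args)
    {
        var layer = sender as BikeRoutesLayer;
        if (layer == null)
            return;

        // Remove the polylines drawn for the previous value before redrawing.
        layer.Children.Clear();

        var list = args.NewValue as List<List<LocationCollection>>;
        if (list == null)
            return;

        // Local counter: colors restart for each new collection instead of
        // depending on a static field shared by every layer instance.
        int colorIndex = 0;
        foreach (var bikeRoute in list)
        {
            var color = _colors[colorIndex % _colors.Length];
            foreach (var route in bikeRoute)
            {
                // One polyline per connected segment of the bike route.
                var line = new MapPolyline();
                line.Locations = route;
                line.StrokeThickness = 1;
                line.Stroke = new SolidColorBrush(color);
                layer.Children.Add(line);
            }
            colorIndex++;
        }
    }
}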
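The drawing logic in the callback does not really depend on Bing Maps. Here is a platform-neutral sketch of the same idea in Python, taken one step further with a pluggable "template" function, as suggested above. `Polyline`, `render_groups` and the palette strings are hypothetical stand-ins for MapPolyline and the Silverlight colors, not part of any Bing Maps API.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Callable, List, Sequence, Tuple

Location = Tuple[float, float]  # (latitude, longitude)


@dataclass
class Polyline:
    """Hypothetical stand-in for MapPolyline."""
    locations: Sequence[Location]
    color: str
    thickness: int = 1


PALETTE = ["Blue", "Green", "Orange", "Gray"]


def default_template(segment: Sequence[Location], color: str) -> Polyline:
    """Default 'template': one thin polyline per segment."""
    return Polyline(locations=segment, color=color)


def render_groups(
    groups: Sequence[Sequence[Sequence[Location]]],
    template: Callable[[Sequence[Location], str], object] = default_template,
    palette: Sequence[str] = PALETTE,
) -> List[object]:
    """Flatten a collection of groups (bike routes) into drawable items.

    Each group gets the next palette color and every segment in it shares
    that color. A fresh list is returned each time, like clearing the
    layer's children before redrawing."""
    items = []
    for group, color in zip(groups, cycle(palette)):
        for segment in group:
            items.append(template(segment, color))
    return items
```

Passing a different `template` (for example one that builds pushpins) changes how each item is rendered without touching the traversal, which is the design the post hints at.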

Monday, January 9, 2012

NHibernate, Fluent NHibernate and a custom HiLo generator

Azure SQL is not completely compatible with SQL Server; all the limitations are described over here. One of them is that every table in Azure SQL needs a CLUSTERED INDEX.

If you are using NHibernate with Fluent NHibernate, then any identity mapping will create a clustered index if it can.

If you want to use the HiLo generator to get the IDs, you need a special table for the generator. You can let NHibernate create this table:
Id(x => x.Id).GeneratedBy.HiLo("1000");
However, this way it creates only one table holding a single value shared by all entities. In a typical scenario you want one table that stores a separate value, in its own row or column, for each entity in the database:
Id(x => x.Id).GeneratedBy.HiLo("hiloTable", "myentity", "1000");
This assumes that you have a table called "hiloTable" which contains a "myentity" column.

However, you then have to write the table creation script yourself, so you lose the ability to let NHibernate generate your database.

The solution to these two issues is to create your own generator based on the HiLo generator.
Here is the mapping that uses the custom generator:
Id(x => x.Id).GeneratedBy.Custom<UniversalHiloGenerator>(
    x => x.AddParam("table", "NH_HiLo")
          .AddParam("column", "NextHi")
          .AddParam("max_lo", "10000")
          .AddParam("where", "TableKey='BalancePoint'"));
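To see what a HiLo generator does with the maximum lo value (the "1000" or "10000" in the mappings above) and with a per-entity row, here is a minimal in-memory sketch in Python. It uses the common hi/lo scheme, id = hi * (max_lo + 1) + lo; the `HiLoGenerator` class and its `store` dictionary, which stands in for the NH_HiLo table, are hypothetical, and NHibernate's exact arithmetic may differ between versions.

```python
class HiLoGenerator:
    """Minimal hi/lo sketch (not NHibernate's actual code).

    `store` stands in for the HiLo table: one "next hi" value per key,
    like one row per entity. The store is hit only once per
    max_lo + 1 generated ids."""

    def __init__(self, store: dict, key: str, max_lo: int):
        self.store = store
        self.key = key
        self.max_lo = max_lo
        self.hi = 0
        self.lo = max_lo + 1  # forces a fetch on the first call
        self.round_trips = 0

    def _fetch_next_hi(self) -> int:
        # In a database this is a SELECT plus UPDATE in one transaction.
        self.round_trips += 1
        value = self.store[self.key]
        self.store[self.key] = value + 1
        return value

    def next_id(self) -> int:
        if self.lo > self.max_lo:
            hi_value = self._fetch_next_hi()
            self.hi = hi_value * (self.max_lo + 1)
            self.lo = 1 if hi_value == 0 else 0  # never hand out id 0
        new_id = self.hi + self.lo
        self.lo += 1
        return new_id
```

With `max_lo=9` and a stored value of 1, the first 25 ids are 10 through 34 and cost only three round trips. Two generators sharing the same key receive different hi blocks, so their ids never collide.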

When deriving from NHibernate.Id.TableHiLoGenerator, we can override the script used to create the table containing the IDs. This is done by overriding the SqlCreateStrings method, which returns an array of strings that are executed as SQL scripts against the database.

using System;
using System.Collections.Generic;
using System.Linq;

public class UniversalHiloGenerator : NHibernate.Id.TableHiLoGenerator
{
    public override string[] SqlCreateStrings(NHibernate.Dialect.Dialect dialect)
    {
        var commands = new List<string>();

        // SQLite has neither OBJECT_ID nor clustered indexes.
        bool isSqlite = dialect is NHibernate.Dialect.SQLiteDialect;

        if (!isSqlite)
        {
            // No "GO" here: GO is a batch separator understood by SSMS/sqlcmd,
            // not a T-SQL statement, so it fails when sent as a command.
            commands.Add("IF OBJECT_ID('dbo.NH_HiLo', 'U') IS NOT NULL DROP TABLE dbo.NH_HiLo;");
        }

        commands.Add("CREATE TABLE NH_HiLo (TableKey varchar(50), NextHi int)");

        if (!isSqlite)
        {
            // Azure SQL requires a clustered index on every table.
            commands.Add("CREATE CLUSTERED INDEX NH_HiLoIndex ON NH_HiLo (TableKey)");
        }

        // One row per entity; each mapping selects its row with the "where" parameter.
        string[] tables = { "Operation", "Account" };

        return commands.Concat(GetInserts(tables)).ToArray();
    }

    private static IEnumerable<string> GetInserts(string[] tables)
    {
        foreach (var table in tables)
        {
            yield return String.Format("insert into NH_HiLo values ('{0}',1)", table);
        }
    }
}

This code is quite simple. The SQL scripts create the table that stores the IDs for all the entities in the database. Each row of the HiLo table has two columns: the first holds the name of the table (entity), and the second holds the next "hi" value from which that entity's IDs are computed.

The code also checks the database dialect. This way it creates a CLUSTERED index on the table for SQL Server and Azure SQL (where it runs fine, and where it is REQUIRED for Azure) and skips the index for SQLite, where clustered indexes do not exist.

In the example above, two entities are envisaged, Operation and Account, each stored in its own table.

This way several issues are solved:
  • The schema of the database can be created automatically by NHibernate
  • The HiLo table holds one row per entity. To add an entity, simply add its name to the list of tables.
  • A clustered index is created on the HiLo table whenever the script is not run against SQLite.
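The SQLite branch of UniversalHiloGenerator can be tried without NHibernate. The Python sketch below mirrors its create scripts using the standard `sqlite3` module, then reads and increments NextHi for a single row, roughly what the generator does when its "where" parameter selects one entity's row. `create_strings` and `next_hi` are hypothetical helpers, and this single-connection version ignores concurrent access, which a real generator has to handle.

```python
import sqlite3
from typing import List, Sequence


def create_strings(is_sqlite: bool, tables: Sequence[str]) -> List[str]:
    """Sketch of the dialect-dependent script list from SqlCreateStrings."""
    commands = []
    if not is_sqlite:
        commands.append("IF OBJECT_ID('dbo.NH_HiLo', 'U') IS NOT NULL DROP TABLE dbo.NH_HiLo;")
    commands.append("CREATE TABLE NH_HiLo (TableKey varchar(50), NextHi int)")
    if not is_sqlite:
        commands.append("CREATE CLUSTERED INDEX NH_HiLoIndex ON NH_HiLo (TableKey)")
    # Table names are fixed in code here, so plain formatting is acceptable.
    commands.extend("insert into NH_HiLo values ('{0}',1)".format(t) for t in tables)
    return commands


def next_hi(conn: sqlite3.Connection, table_key: str) -> int:
    """Read the current NextHi for one entity row and bump it."""
    with conn:  # commits on success, rolls back on error
        (value,) = conn.execute(
            "SELECT NextHi FROM NH_HiLo WHERE TableKey = ?", (table_key,)
        ).fetchone()
        conn.execute(
            "UPDATE NH_HiLo SET NextHi = ? WHERE TableKey = ?", (value + 1, table_key)
        )
    return value
```

Running the SQLite scripts against an in-memory database gives one row per entity, each starting at 1; every call to `next_hi` hands out the current value and bumps only that entity's row.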