CodeBetter.Com

Patrick Smacchia [MVP C#]

  • Lessons learned from a real-world focus on performance

     
    I am glad to announce that we just released a new version of NDepend where the analysis phase duration has been divided by 4 and the memory consumption has been divided by 2.

     

    An interesting question is: does such a massive performance gain mean that the code was badly written the first time? The previous, slower versions met several thousand functional requirements needed to properly analyze any .NET application. In this sense, it was nicely coded. Still, there was room for improvement. Even in the current, much faster version we have identified several significant optimizations still to be done; the downside is that they will require a lot of work. So the first lesson learned is that there is always room for better performance. We can complement this rule with another one: the amount of work needed to make a program run faster grows exponentially with the expected gain.

     

    We quantified each source of performance gain and obtained this chart. As said, we obtained a 75% gain thanks to algorithm optimization, micro-optimization and parallelization. The current analysis duration (25% of the original analysis duration) splits into 17% of the time spent in our own code and 8% of the time spent in third-party code. In our case, third-party code is the code of Mono.Cecil, used to build object models of the analyzed assemblies.

     

     

     

     

    The bulk of the performance gain came from algorithmic improvements, and this is not a surprise. IMHO the surprise is the relatively small gain obtained from parallelization. We did some massive refactoring to parallelize the analysis with the parallel monologue idea in mind: each assembly is now analyzed as a task that has no synchronization needs with any other task. In other words, each task uses its own state, which is not visible to the other tasks. Unfortunately, this is not enough to ensure proper scaling with the number of processors. While we get a 15% gain going from 1 to 2 processors, the gain is almost zero going from 2 to 4 processors. We identified some potential IO contention and memory issues that will require more attention in the future. This leads to another lesson: don't expect scaling across many processors to be easy, even if you don't share state and don't use synchronization.
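    The task structure described above can be sketched in C# as follows. This is a minimal illustration under stated assumptions, not NDepend's code: AssemblyReport, AnalyzeAssembly and the placeholder metric are hypothetical, and Task.Run requires .NET 4.5 or later. Each task creates and owns all of its state and only hands back a result, so no lock is needed; as noted above, that alone does not guarantee scaling, because tasks can still contend for disk IO or memory.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical result of analyzing one assembly; each task creates its own instance.
public sealed class AssemblyReport
{
    public AssemblyReport(string assemblyName, int distinctCharCount)
    {
        AssemblyName = assemblyName;
        DistinctCharCount = distinctCharCount;
    }

    public string AssemblyName { get; }

    // Placeholder metric standing in for real analysis results.
    public int DistinctCharCount { get; }
}

public static class ParallelMonologue
{
    // Hypothetical stand-in for the real per-assembly analysis: every piece of
    // state it touches is created inside the call and never shared with other tasks.
    private static AssemblyReport AnalyzeAssembly(string assemblyName)
    {
        var localState = new HashSet<char>(assemblyName);
        return new AssemblyReport(assemblyName, localState.Count);
    }

    // One task per assembly and no locks: tasks only communicate through their return values.
    public static AssemblyReport[] AnalyzeAll(IEnumerable<string> assemblyNames)
    {
        Task<AssemblyReport>[] tasks = assemblyNames
            .Select(name => Task.Run(() => AnalyzeAssembly(name)))
            .ToArray();
        Task.WaitAll(tasks);
        return tasks.Select(t => t.Result).ToArray();
    }
}
```

    Results come back in the same order as the input names, because the tasks array preserves that order.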

     

    When it comes to improving performance there is only one way to do things right: measure, and focus your work on the part of the code that really takes the bulk of the time, not the part you think takes the bulk of the time. Here is a quick real-world illustration. I have always been amazed by the performance of the C# and VB.NET compilers: they can compile, in a few seconds, thousands of source files that took dozens or even hundreds of man-years to write. At analysis time, NDepend also needs to parse source files, but only partially, to obtain comment and Cyclomatic Complexity metrics. Measuring file loading time gave us a good surprise: loading source files into memory is much cheaper than expected. For example, loading the 1,728 C# source files of NDepend takes less than 0.2 seconds out of the 13 seconds needed for a complete analysis of its own code. Now that I know it is almost free to load source files into memory, I am a bit less impressed by the performance of the C# and VB.NET compilers. This leads to an important lesson: NEVER anticipate a performance cost, ALWAYS measure it. And the only professional way to measure is to rely on performance profiler tools. We use both dotTrace from JetBrains and Red-Gate ANTS Profiler, and are happy with both.
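    As a complement to a profiler (not a replacement), a coarse measurement like the file-loading one above can be taken with System.Diagnostics.Stopwatch. The sketch below is hypothetical, not NDepend's code: it simply reads every *.cs file under a root directory into memory and times the whole operation.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class SourceLoadTiming
{
    // Loads every *.cs file under rootDir into memory and measures the elapsed wall-clock time.
    // Returns the number of files, the total number of characters read and the elapsed time.
    public static (int FileCount, long TotalChars, TimeSpan Elapsed) LoadAllSources(string rootDir)
    {
        var stopwatch = Stopwatch.StartNew();
        int fileCount = 0;
        long totalChars = 0;
        foreach (string path in Directory.EnumerateFiles(rootDir, "*.cs", SearchOption.AllDirectories))
        {
            totalChars += File.ReadAllText(path).Length;
            fileCount++;
        }
        stopwatch.Stop();
        return (fileCount, totalChars, stopwatch.Elapsed);
    }
}
```

    Comparing the elapsed time with the total analysis time tells whether file loading is worth optimizing at all; for anything finer-grained, the profiler remains the right tool.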

     

    Something important when measuring code performance is the percentage of time spent inside third-party code, or in things like DB or network access. As shown in the previous chart, in our case study 35% of the analysis time is spent inside the Cecil code. We can infer from this number that if we want to divide the analysis duration by 2 in the future, our own code will need to run 4.3 times faster! ((100 - 35) / (50 - 35) ≈ 4.3). This clearly shows the importance of another lesson: assess your limits by measuring the percentage of time spent in third-party code.
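    The computation above is Amdahl's law applied to the share of time spent in third-party code. Writing c for that share (c = 0.35 here) and k for the targeted overall reduction factor (k = 2 here), and assuming that only our own code gets faster, the required speedup s of our own code satisfies:

```latex
\frac{1 - c}{s} + c = \frac{1}{k}
\quad\Longrightarrow\quad
s = \frac{1 - c}{\frac{1}{k} - c}
  = \frac{1 - 0.35}{0.5 - 0.35}
  = \frac{0.65}{0.15} \approx 4.3
```

    Because the denominator must stay positive, the overall reduction factor k can never reach 1/c ≈ 2.86 unless the third-party code itself gets faster.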

     

    In the breakdown of the performance gain, an interesting point is the 15% gain obtained from micro-optimization. By that I mean things like using the right kind of loop, using the right kind of collection, preferring to deal with primitive CLR types such as int and string (POCO idea), choosing carefully between structures and classes, buffering property getter results into local variables, or even exposing non-private fields. I could summarize the lesson learned as follows: micro-optimizations are worth it, while premature optimization is the root of all evil (as Donald Knuth said a long time ago). The difference between micro-optimizations and premature optimizations is that the former are driven by measurement while the latter are driven by guesses, intuitions and hunches.
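    Here is a minimal sketch of one of the micro-optimizations listed above, buffering a property getter result into a local variable. Node and GetterBuffering are hypothetical names. In the spirit of this lesson, whether it pays off must be confirmed by measurement: the JIT often inlines trivial getters like this one, in which case the gain may be negligible.

```csharp
using System.Collections.Generic;

// Hypothetical tree node exposing its children through a property getter.
public sealed class Node
{
    private readonly List<Node> _children = new List<Node>();
    public List<Node> Children { get { return _children; } }
}

public static class GetterBuffering
{
    // Before: node.Children is re-read on every iteration, in the loop condition and in the body.
    public static int CountGrandChildrenNaive(Node node)
    {
        int total = 0;
        for (int i = 0; i < node.Children.Count; i++)
            total += node.Children[i].Children.Count;
        return total;
    }

    // After: the getter result and its Count are buffered into locals once.
    // This is only valid because the loop does not modify the list.
    public static int CountGrandChildrenBuffered(Node node)
    {
        List<Node> children = node.Children;
        int count = children.Count;
        int total = 0;
        for (int i = 0; i < count; i++)
            total += children[i].Children.Count;
        return total;
    }
}
```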


    Something not quantified in our performance gain is the fact that memory consumption has been roughly divided by 2. Less memory doesn't necessarily mean fewer managed objects, but in our particular case we do allocate fewer objects. The nice consequence is that fewer objects means less time spent in the GC. Less memory also means fewer virtual memory page faults, which IMHO are the worst thing when it comes to polishing the performance of a program.

     

    Finally, we are now enjoying a great side benefit of optimization: automated tests naturally run much faster, which is a good motivation to run them more often!

     

     

     

    I'll end with an illustration of the tenet that there is always room for better performance: the Amiga demo scene, in which I participated in the early nineties. Every demo ran on the same, fixed hardware, which was the perfect incentive for demo developers to produce ever more optimized code. For several years, records were beaten every month, such as the number of 3D polygons, sprites or dots displayed simultaneously at 50 frames per second. As far as I can estimate, the performance factor obtained in a few years was around x50! Imagine what that means: running in one second a computation that initially took a minute. As far as I know, this massive gain was the result of both better algorithms (with many pre-computations and delegation of work to the custom chips) and micro-optimizations at the assembly language level (better use of the registers, better use of the instruction set...).

     

     

  • Composing Code Metrics Values

    NDepend provides 82 different code metrics which are all explained on this page. Several software development topics are addressed by these metrics, like:

     

    Metrics are one feature of NDepend among several others, such as comparing code base snapshots, digging into the structure of a code base through visual artifacts like the matrix and the graph, or defining active rules.

     

     

    I would like to focus here on some tricks to get the most out of NDepend code metrics. Sometimes the value of a metric is an immediate indicator of code flaws: methods with more than 6 parameters and classes with a depth of inheritance greater than 8 are obvious flaws.

     

    At other times, however, the value of a code metric might be meaningless without some contextual information. For example, a class 100% covered by tests is certainly a good thing, but this information is more relevant for a large class with 300 lines of code than for a mini-class with 10 lines of code.

     

    The same goes for a large method with 100 lines of code: this is a bad thing! But if the method has minimal cyclomatic complexity (that is, if it has no loop, if, else, switch, try/catch…), this is not such a big deal, since the method might still be easy to understand and maintain. On the other hand, a small method with 10 lines of code can become a real nightmare to maintain, especially if it writes to some fields.

     

    To handle such cases properly, we need to correlate different metrics and possibly other properties of the code base. This is possible in NDepend thanks to the flexibility of the Code Query Language.

     

    For example, you might want to know which classes are more than 95% covered by tests but not yet 100% covered. To avoid noise, you might wish to focus on larger classes. With CQL you can use an ORDER BY clause to sort classes by their number of lines of code and a TOP clause to limit the number of matches.
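    NDepend expresses this with a CQL query. The sketch below only mirrors the same WHERE / ORDER BY / TOP logic with LINQ over a hypothetical in-memory list of class metrics, to show how the coverage and size metrics are combined. ClassCoverageInfo and CoverageQuery are illustrative names, not part of NDepend's API.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-class metrics record, for illustration only.
public sealed class ClassCoverageInfo
{
    public ClassCoverageInfo(string name, int nbLinesOfCode, double percentageCoverage)
    {
        Name = name;
        NbLinesOfCode = nbLinesOfCode;
        PercentageCoverage = percentageCoverage;
    }

    public string Name { get; }
    public int NbLinesOfCode { get; }
    public double PercentageCoverage { get; }
}

public static class CoverageQuery
{
    // WHERE coverage is above 95% but below 100%, ORDER BY lines of code (largest first), TOP n.
    public static List<ClassCoverageInfo> AlmostFullyCovered(IEnumerable<ClassCoverageInfo> classes, int top)
    {
        return classes
            .Where(c => c.PercentageCoverage > 95 && c.PercentageCoverage < 100)
            .OrderByDescending(c => c.NbLinesOfCode)
            .Take(top)
            .ToList();
    }
}
```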

     

     

     

    As you can see, the query result automatically contains all metrics involved in the CQL query, and it is up to you to add more metrics to the query. As a bonus, for each metric displayed in the query result you get statistical information such as the average and the standard deviation.

     



     

     

    Another example: classes that use many other types (in other words, classes with a high Efferent Coupling, Ce) are likely to be problematic. Indeed, a high Ce often means that a class has too many responsibilities. Pinpointing classes with a high Ce is easy.

     

     

     

    But doing so, you will certainly match mostly UI classes, like forms and controls. Indeed, UI frameworks often foster classes with a high Ce, because some kind of large mediator class is needed to make the different UI components communicate. You might then want to discard form and control classes:
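    The same idea (high Ce, minus UI classes) can be mirrored outside CQL. The sketch below uses LINQ over a hypothetical type model in which each type knows its Ce and its base type; TypeNode and CouplingQuery are illustrative names, not NDepend's API, and the threshold is arbitrary. Checking both base type names keeps the sketch independent of how the UI hierarchy is modeled.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical type descriptor: full name, efferent coupling (Ce) and base type (null for a root).
public sealed class TypeNode
{
    public TypeNode(string fullName, int ce, TypeNode baseType)
    {
        FullName = fullName;
        Ce = ce;
        BaseType = baseType;
    }

    public string FullName { get; }
    public int Ce { get; }
    public TypeNode BaseType { get; }

    // True if this type derives, directly or indirectly, from the type with the given full name.
    public bool DerivesFrom(string baseFullName)
    {
        for (TypeNode t = BaseType; t != null; t = t.BaseType)
            if (t.FullName == baseFullName) return true;
        return false;
    }
}

public static class CouplingQuery
{
    // Types with Ce >= minCe that are neither forms nor controls, highest Ce first.
    public static List<TypeNode> HighCeNonUiTypes(IEnumerable<TypeNode> types, int minCe)
    {
        return types
            .Where(t => t.Ce >= minCe)
            .Where(t => !t.DerivesFrom("System.Windows.Forms.Form")
                     && !t.DerivesFrom("System.Windows.Forms.Control"))
            .OrderByDescending(t => t.Ce)
            .ToList();
    }
}
```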

     

     

     

    And this last screenshot highlights a cool feature of CQL: depth of usage metrics. You can see that each DeriveFrom XXX condition leads to a DepthOfDeriveFrom XXX metric. Obviously, here the values of this metric are N/A since we specifically ask for classes that do not derive from XXX. But you can use this ability to build depth of usage metrics on the fly, for example to find out where your UI forms are and sort the list by their DepthOfDeriveFrom Form value.

     

     

     

    The depth of usage trick is not limited to inheritance: you can use it for object creation with the condition DepthOfCreateA, for field writes with the condition DepthOfIsWritingField and, more generally, for any kind of usage with the conditions DepthOfIsUsing and DepthOfIsUsedBy.
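    Conceptually, a depth of usage value can be seen as a shortest-path length in the dependency graph. The sketch below is an assumption about how such a value could be computed, not NDepend's implementation: given a hypothetical map from each code element to the elements it uses directly, a breadth-first search from the target yields, for every direct or indirect user, the length of its shortest usage chain (1 for direct users in this sketch).

```csharp
using System.Collections.Generic;

public static class UsageDepth
{
    // usesDirectly: hypothetical graph mapping each element to the elements it uses directly.
    // Returns the shortest usage-chain length from each direct or indirect user to 'target'.
    public static Dictionary<string, int> DepthOfIsUsing(
        IReadOnlyDictionary<string, List<string>> usesDirectly, string target)
    {
        // Reverse the edges: for each element, which elements use it directly?
        var usedBy = new Dictionary<string, List<string>>();
        foreach (KeyValuePair<string, List<string>> pair in usesDirectly)
        {
            foreach (string used in pair.Value)
            {
                if (!usedBy.TryGetValue(used, out List<string> users))
                {
                    users = new List<string>();
                    usedBy[used] = users;
                }
                users.Add(pair.Key);
            }
        }

        // Breadth-first search from the target: the first time an element is reached
        // gives the length of its shortest usage chain.
        var depth = new Dictionary<string, int>();
        var queue = new Queue<string>();
        queue.Enqueue(target);
        while (queue.Count > 0)
        {
            string element = queue.Dequeue();
            int elementDepth = element == target ? 0 : depth[element];
            if (!usedBy.TryGetValue(element, out List<string> directUsers)) continue;
            foreach (string user in directUsers)
            {
                if (user == target || depth.ContainsKey(user)) continue; // already reached
                depth[user] = elementDepth + 1;
                queue.Enqueue(user);
            }
        }
        return depth;
    }
}
```

    With A using B, B using C, and D using both C and A, DepthOfIsUsing(graph, "C") gives B = 1, D = 1 and A = 2.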

     

     

    As I explained previously on this blog, the depth of usage trick has several interesting applications such as getting all paths from a code element A to another code element B

     

     

    … or building call graphs or class diagrams, as explained here.

     


     

  • An easy and efficient way to improve .NET code performances

     
    Currently, I am getting serious about measuring and improving the performance of .NET code, and I'll post more about this topic in the coming weeks. Today I would like to present an efficient optimization I found for something we all use massively: loops on collections. I am talking here about the cost of the loop itself, i.e. the for and foreach instructions, not the cost of the code executed in the loop. I basically found that:

     

    • for loops on List<T> are a bit more than 2 times cheaper than foreach loops on List<T>.
    • Looping on an array is around 2 times cheaper than looping on List<T>.
    • As a consequence, looping on an array using for is 5 times cheaper than looping on List<T> using foreach (which, I believe, is what we all do).

     

    The foreach syntactic sugar is a blessing, and for many complex loops it is not worth transforming foreach into for. But for certain kinds of loops, like short loops made of one or two instructions nested in other loops, this factor of 5 can become a huge bonus. And the good news is that the code of NDepend has plenty of these nested short loops, because of algorithms such as Ranking, Level and the computation of the transitive closure of dependencies.
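    The author's benchmark code is linked from this post. What follows is an independent, minimal sketch of the same kind of measurement, with hypothetical names, summing the elements of a List<int> and of an int[] with for and foreach. The numbers it prints depend on the CLR and JIT version and on the machine, so it should be run in a Release build on your own target runtime rather than taken as a reproduction of the figures above.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class LoopBenchmark
{
    private static int s_sink; // keeps results observable so the loops are not optimized away

    public static int ForOnList(List<int> list)
    {
        int sum = 0;
        for (int i = 0; i < list.Count; i++) sum += list[i];
        return sum;
    }

    public static int ForeachOnList(List<int> list)
    {
        int sum = 0;
        foreach (int x in list) sum += x;
        return sum;
    }

    public static int ForOnArray(int[] array)
    {
        int sum = 0;
        for (int i = 0; i < array.Length; i++) sum += array[i];
        return sum;
    }

    public static int ForeachOnArray(int[] array)
    {
        int sum = 0;
        foreach (int x in array) sum += x;
        return sum;
    }

    // Calls 'body' once to warm up the JIT, then times 'repetitions' calls.
    public static long TimeMs(Func<int> body, int repetitions)
    {
        body();
        var stopwatch = Stopwatch.StartNew();
        int sink = 0;
        for (int r = 0; r < repetitions; r++) sink += body();
        stopwatch.Stop();
        s_sink = sink;
        return stopwatch.ElapsedMilliseconds;
    }

    public static void Run()
    {
        const int size = 1_000_000;
        const int repetitions = 100;
        int[] array = new int[size];
        for (int i = 0; i < size; i++) array[i] = i % 7;
        var list = new List<int>(array);

        long forList = TimeMs(() => ForOnList(list), repetitions);
        long foreachList = TimeMs(() => ForeachOnList(list), repetitions);
        long forArray = TimeMs(() => ForOnArray(array), repetitions);
        long foreachArray = TimeMs(() => ForeachOnArray(array), repetitions);

        Console.WriteLine($"for on List<T>:     {forList} ms");
        Console.WriteLine($"foreach on List<T>: {foreachList} ms");
        Console.WriteLine($"for on array:       {forArray} ms");
        Console.WriteLine($"foreach on array:   {foreachArray} ms");
    }
}
```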

     

    Here is the code used to benchmark all this:

     

     

     
    Here are the results obtained with the System.Diagnostics.Stopwatch class:

     


     

    for is more efficient than foreach because foreach uses an enumerator behind the scenes, which involves a MoveNext() call and a Current read on each iteration.
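    To make the enumerator cost concrete, here is a sketch of what a foreach over a List<int> roughly expands to, following the pattern described in the C# specification (GetEnumerator, MoveNext, Current, then Dispose in a finally block). For List<T> the enumerator is a struct (List<T>.Enumerator), so the cost is not a heap allocation but the extra calls and the modification check performed on each MoveNext().

```csharp
using System.Collections.Generic;

public static class ForeachExpansion
{
    // What you write.
    public static int SumWithForeach(List<int> list)
    {
        int sum = 0;
        foreach (int x in list) sum += x;
        return sum;
    }

    // Roughly what the compiler generates for the loop above.
    public static int SumExpanded(List<int> list)
    {
        int sum = 0;
        List<int>.Enumerator e = list.GetEnumerator();
        try
        {
            while (e.MoveNext()) // checks that the list was not modified, then advances
            {
                int x = e.Current;
                sum += x;
            }
        }
        finally
        {
            e.Dispose();
        }
        return sum;
    }
}
```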

     

    Iterating on an array is cheaper than iterating on List<T> because IL has instructions dedicated to array access (such as ldlen and ldelem), whereas List<T> access goes through method calls to its Count property and its indexer. Here is the Reflector view of the methods:

     

     

     

    In the console output, we can see that foreach on an array is a tiny bit more costly than for on an array. Interestingly enough, I checked that the C# compiler doesn't use an enumerator behind the scenes for foreach on an array.

     

    Another interesting finding: the profiler dotTrace v3.1 gives unrealistic results for this benchmark. IterateForeachOnList is deemed more than 40 times more costly than IterateForOnArray (instead of 5 times). I tested both the RecordThreadTime and RecordWallTime modes of the profiler:

     

     

     

    So I also ran the benchmark with ANTS Profiler v4.1 and got more realistic results, though still not perfect (IterateForeachOnList is now deemed 8 times more costly than IterateForOnArray, instead of 5 times).

     

     

     

    You can download the code of the benchmark here. All tests were run with a Release build.

  • 2 talks on Architecture and on Regression at Prio Conference

    I'll give 2 talks, on Monday 10 November and Tuesday 11 November, at Prio Conference in Baden-Baden, Germany. Here are the abstracts:

     

     

    Architecture and Dependencies  (Monday 15h30 - 16h45)

    Why do we architect our software? Why are concepts such as component, abstraction, cohesion, layering or IoC so popular nowadays? In this session, we will address these questions through the prism of dependencies. With appropriate tooling, a code base architecture can be concretely seen as a graph of dependencies. Good architectural practices and patterns result in enforcing some simple structural properties on this graph. Among the properties presented, we will focus on the directed acyclic graph and on the need to keep components at a low level.

    This session aims to be practical. Several real-world code bases will be dissected with the tool NDepend to illustrate the principles and explanations presented.



    Avoid architecture regression with active conventions  (Tuesday 11h45 - 13h00)

    What the last decades have taught us is that the source code is the design. Documentation on the reasons why a particular architectural decision was taken sits apart from the code base, and such architectural documentation quickly becomes obsolete because the cost of keeping it in sync with the code base is too high. The intentions behind design decisions get lost and, as a result, the implementation ends up violating the initial guidance. This phenomenon is known as architecture regression or design erosion.

    In this session we will propose a way to avoid architecture erosion through the idea of custom active conventions. Concretely, the custom conventions presented are written in a dedicated language, Code Query Language (CQL), supported by the tool NDepend. CQL lets you write conventions about a large range of architectural principles such as IoC, layering, cohesion, immutability/purity or encapsulation. Such a convention is called active because, once integrated into a build process, it automatically warns as soon as it is violated.

     

     
     

  • The (near) future of Code Correctness

    I just watched this amazing PDC presentation (the most impressive I have seen so far): Contract Checking and Automated Test Generation with Pex by Nikolai Tillmann and Mike Barnett.

     

     

     

    It covers .NET 4 support and tooling around contracts. I strongly encourage you to take an hour to watch it. .NET 4 will come with a contract API, System.Diagnostics.Contracts, much more compelling than the simple (yet very useful) System.Diagnostics.Debug.Assert(…). What is not said (unless I missed it) is whether there will be tooling to automatically transform existing Debug.Assert(…) calls into calls to the new contract API. I hope so! The NDepend code base currently contains 3,470 calls to Debug.Assert(…) for 69,256 lines of code.
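    For comparison, here is a sketch of the same method written with Debug.Assert and with the .NET 4 contract API. IndexOfMax is a hypothetical example. Note that Contract.Requires(bool) and Contract.Ensures(bool) are marked [Conditional("CONTRACTS_FULL")], so they are compiled out unless that symbol is defined, and they are only enforced at run time once the Code Contracts binary rewriter (ccrewrite) has processed the assembly.

```csharp
using System.Diagnostics;
using System.Diagnostics.Contracts;

public static class ContractStyles
{
    // Debug.Assert style: the checks are compiled in only when the DEBUG symbol is defined.
    public static int IndexOfMaxWithAsserts(int[] values)
    {
        Debug.Assert(values != null, "values must not be null");
        Debug.Assert(values.Length > 0, "values must not be empty");
        int best = 0;
        for (int i = 1; i < values.Length; i++)
            if (values[i] > values[best]) best = i;
        Debug.Assert(best >= 0 && best < values.Length, "result must be a valid index");
        return best;
    }

    // Contract style: the same intent stated as preconditions and a postcondition.
    // These calls are conditional on CONTRACTS_FULL and need the Code Contracts rewriter.
    public static int IndexOfMaxWithContracts(int[] values)
    {
        Contract.Requires(values != null);
        Contract.Requires(values.Length > 0);
        Contract.Ensures(Contract.Result<int>() >= 0 && Contract.Result<int>() < values.Length);
        int best = 0;
        for (int i = 1; i < values.Length; i++)
            if (values[i] > values[best]) best = i;
        return best;
    }
}
```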

     

    I'll take this opportunity to praise the usefulness of contracts, and more specifically of Debug.Assert(…). I have always felt that the community puts too much emphasis on automated testing compared to contracts. For me, these 2 correctness techniques are equally effective. And what the presentation shows well, thanks to the super-promising Pex tool, is that automated testing and contracts are 2 sides of the same discipline, which consists of finding code defects automatically. Pex will promote contracts among developers addicted to automated tests, and that is a good thing. As a side note, I explained in a previous post why and how Debug.Assert(…) checks must be activated during automated test runs: Unit Test vs. Debug.Assert().

     

    My feeling has always been that automated tests and the code itself are 2 different ways to express the same thing: how a piece of code should behave. Code tells the machine how to do it, step by step (do this to compute the result from an input), while automated tests are a more declarative way of saying what the code should do (this result must be computed from this input), which can be seen as a specification. Generating tests automatically from code, which is the job of Pex, makes sense because, as just said, the code already contains the information needed to test it. There is a problem, however: if the code contains behavioral defects, Pex will generate a flawed specification. This is why contracts are needed: to assert specifications that cannot be inferred from the code, and to check that the code abides by the contract rules.


    Place your bets: C# 5 will come with some syntactic sugar that generates IL calling the .NET 4 contract API under the hood, a bit like the using {...} syntactic sugar calls the IDisposable API.
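    For reference, here is the kind of translation this analogy refers to: the C# compiler turns a using block into a try/finally that calls IDisposable.Dispose(). TrackedResource is a hypothetical type used only to make the disposal observable; no contract syntax is shown, since that part of the post is a prediction.

```csharp
using System;

// Hypothetical resource that records whether it has been disposed.
public sealed class TrackedResource : IDisposable
{
    public bool IsDisposed { get; private set; }
    public void Dispose() => IsDisposed = true;
}

public static class UsingExpansion
{
    // What you write.
    public static TrackedResource WithUsing()
    {
        TrackedResource observed;
        using (var resource = new TrackedResource())
        {
            observed = resource; // keep a reference so the caller can check IsDisposed
        }
        return observed;
    }

    // Roughly what the compiler emits for it: a try/finally that calls IDisposable.Dispose().
    public static TrackedResource Expanded()
    {
        var resource = new TrackedResource();
        try
        {
            // work with resource
        }
        finally
        {
            if (resource != null) ((IDisposable)resource).Dispose();
        }
        return resource;
    }
}
```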

     

     

  • Advice for developers on starting an Independent Software Vendor (ISV) business

     

    As a developer (and if you read my blog, chances are you are a developer) you have enormous potential to start a company. I think most developers don't realize how big this potential is. I started an ISV business by selling the tool NDepend for .NET developers. I am still in the process of growing the business and it is a great adventure. In this blog post I would like to share some advice that could help a developer do the same.

     

     

    The cost of developing and selling software

     

    What do you need to start developing software? A good PC, some development tools and an internet connection: at most a 3,000 US$ investment (which you have certainly already made). Most other engineering industries need millions of dollars of investment. Think of the cost of starting a business based on physical goods like electronic devices.

     

    Minimal investment is a good incentive, but the biggest advantage of software over other engineering industries comes from the unit cost. What does it cost to build one license of a piece of software? 0 US$. What does it cost to build 1,000 licenses? 0 US$.

     

    I am caricaturing, and the provocation is intentional. Of course, the cost of making software is the investment made in development. Also, with 10,000 users, you might expect support to cost 10 times as much as supporting 1,000 users. Support is all about making it scale. Our experience with our tool shows that following agile practices pays off handsomely. Releasing often to make sure that all reported bugs are fixed, investing in quality, investing in a neat UI by listening carefully to feedback, making sure to update the documentation each time a user asks a question: yes, it pays. The number of questions asked and bugs reported stays small and does not increase linearly with the number of users.

     

     

    Advantages of becoming an ISV over doing another software development job

     

    Before digging further, I would like to underline the advantages of being an ISV over being a consultant, an employee or a trainer. Here are my thoughts:

    ·         You spend most of the time doing what you really like, coding.

    ·         Freedom! Nobody tells you what you should work on, when you should work or how much quality you should put into your code. You can also (more or less) choose who you want to work with and where you work from. South of France or a tropical island, nobody cares except you. The availability of a decent and reliable internet connection is the only limit.

    ·         Day after day, you build up a code base and you can begin to dream. Successful small ISV businesses sometimes end up being bought by bigger companies.

    ·         The amount of incoming money is not proportional to the work put into it. If the software doesn't sell well, this is a very bad point. But in the case of successful sales, it can bring much more income than any consulting position. The reason is clear: as explained above, you can arrange things so that selling 1 or 1,000 licenses costs you the same. Software is the only industry that makes this magic possible. Is it by chance that the richest man in the world for the last 13 years was primarily a software developer?

    ·         Starting a business tends to be a plus on a CV. Even if your business fails, it doesn't mean you lost time in your career. During a job interview you can talk about the courage and motivation it took to start such an adventure and what you learned along the way. Many interviewers you'll meet are lifelong employees who might be impressed by what you dared to do.

     

    Controlling the human factor

    The only human factor you can control is your own motivation. How can you prevent anyone on earth from potentially ruining your business? The answer I found is to multiply the number of clients. This is only possible if the price of a license is affordable, like a few hundred US$. The other option is to have a few clients and sell expensive licenses. But then, instead of having a powerful boss, you have a powerful client, and here comes the human factor.

     

    Don't get me wrong about the term powerful client. For example, we constantly strive to give power to NDepend users by listening very carefully to all client feedback and doing our best to satisfy each of their requests. Satisfying one user results in satisfying all users. On the other hand, no user has enough power to impair our business: no user has invested more than a few hundred or a few thousand US$. If a company happened to ask for its money back, this would not be a problem. Fortunately, it has never even come close to happening.

     

    Sometimes you don't get to choose your range of clients: your software idea will only help a few dozen users, maybe important users who can pay licenses worth hundreds of thousands of US$ (like design tools for semiconductors). My opinion on this: forget it! With fewer than several tens of thousands of potential users and independent companies, you will expose yourself to the human factor.

     

    Reducing the risk to almost 0

    What is the biggest risk? Quitting your job for a poor business and not being able to recover a job as good as you had.

     

    I would be scared to do B2C (Business to Customer) software, even though the range of customers is virtually millions: something like video games for mobile phones, a real-estate web agency or a web search engine. The last decade has proven that successful B2C software can earn billions of US$ per year. On the other hand, the risk is extremely high. It is like playing the lottery. How many web search engines do you use? How many web search engine projects have been started and failed?

     

    The safer solution is to find a B2B (Business to Business) market and target professional users. You need to provide software with an affordable license price (100 to 500 US$ per seat) that can save professional users a few hours a week. Professional users won't hesitate to spend a few hundred US$ to save a few hours a week. To reduce the risk to almost 0, my advice is to start by providing a free version of your software on the web. Then you can wait 6 to 12 months to see whether your software indeed gets a few thousand downloads. This will show whether there is real demand for what you want to do or whether the usefulness of your software is just a product of your imagination. A bit of free marketing (articles on specialized web sites, reviews by enthusiastic bloggers, user group presentations…) is needed to kick-start what you are looking for: word of mouth.

     

    This is what happened, accidentally, with NDepend. At the beginning, four and a half years ago, I developed a simple tool in a few days to compute a few metrics on a .NET code base. At that time, I was consulting on a large code base and part of my job was to help improve its quality. Then I put the tool on the web simply because there was no reason not to share it with other developers. A year and a half later, a few thousand developers were interested in it, and some of them were very enthusiastic about it. I realized that a small (but real) percentage of them would be happy to pay a few hundred bucks for a more sophisticated tool to precisely control their code structure and quality.

     

    Do the math: a small percentage of a few thousand users a year, each ready to spend a few hundred US$, is enough to start making a living from it, and consequently enough to quit your job, work on it full time and, with a bit of luck, start hiring.

     

    Deciding what your software will do

    This is by far the most critical point. You absolutely need 2 fields of expertise: the ability to develop good software, but also strong competence in the field targeted by your product. A seasoned software developer usually lacks the second one. Which kind of professional users will your software help? Architect? Manager? Health pro? Insurance pro? Transportation pro? Financial pro? Administrator? Accountant? Security pro? Lawyer? Then you need to partner with someone as motivated as you are and with many years of professional experience in the targeted field.

     

    We have all seen successful projects (in terms of sales) made by non-expert development teams. In the past I consulted for a software company that made many millions of US$ a year by developing and selling a tool that made the life of Lotus Notes administrators a bit easier! What your software will do, and how it will do it, is much more important than how it is developed. A very well-coded piece of software that is not useful to anyone is a failure. That said, a sane development process certainly helps satisfy more users and add important features often.

     

    The market of software tools for developers is unique. Indeed, a developer can have both the ability to build the software and the expertise to imagine how to make developers' lives easier. The downside is that building a tool for developers is very challenging, because plenty of talented people have already had many ideas. I wouldn't even think of starting a project to compete with tools like Reflector, Resharper, CodeRush, DXPerience, dotTrace, ANTS Profiler, Dotfuscator, Llblgen, SmartFTP or Beyond Compare. Each of these tools is the result of many man-years of work by super-talented developers. If you start a promising tool, competitors will come and challenge you anyway. But at the beginning, it is up to you to avoid any field where solid, recognized competitors are already present.

     

    However, each new technology represents an opportunity. There is certainly still time to build the next super-useful tool for Silverlight developers. Within the next years there will be hundreds of thousands of Silverlight developers, and the most enthusiastic ones will be ready to pay a few hundred US$ for a tool that saves them a few hours a week.

     

    A promising case-study of this tenet is the initiative of Ayende to develop a profiler for NHibernate users.

     

    Conclusion

    ·         Start small and take the time to assess the real-world sales potential of your software before quitting your job.

    ·         Target professional users who do spend money to save time and pain.

    ·         Don't even start if you can't count the potential number of professional users in the thousands.

    ·         Don't even start if solid competitors already properly address the need for your product.

    ·         The software should be able to pay the development bill with an affordable license price, like a few hundred US$ per license.

    ·         Don't even start if you need to borrow money to crank the business.

    ·         Once started, make sure to follow agile practices to limit the cost of support. Support is the only recurring cost in software that can potentially increase linearly with the number of users.

    ·         And the most important by far: be sure to have super-solid expertise in the field your software will target. If you need a partner for this expertise, ensure that his or her motivation is as strong as yours.

     

    After being a developer, a team leader, a consultant, a trainer and a software book writer, I have found starting an ISV business a very exciting adventure. I hope this advice helps you do the same.

     

  • Comparing Hash Table Implementations Performances

     
    One of my favorite programming tools is the hash table. I consider it almost magic to be able to test whether a collection contains a particular object in a time that does not depend on the size of the collection (hence the term constant time, on average, often used to characterize hash tables). I use hash tables intensively, and they are the cornerstone of the performance of VisualNDepend and of the Code Query Language (CQL) implementation. Performance is critical here, since the goal of NDepend is to let .NET developers know anything about their code bases in real time, even for applications with millions of lines of code spanning multiple Visual Studio solutions (anything about their code bases includes architecture/dependencies/layering facts, 82 code metric values, changes between 2 versions, state mutability facts, encapsulation facts...).

     

    I won't go through all the details of hash tables here (load factor/collision/bucket/hash computation/hash on immutable state/caveats etc.). Jeff Atwood covered them very well in his post Hashtables, Pigeonholes, and Birthdays. Frankly, if you are not sure about hash table concepts or if you don't use them on a daily basis, go read this post and understand why you are about to use hash tables extensively from now on. And if you don't believe me that hash tables are central to good programming, believe the Microsoft engineers who provided the method Object.GetHashCode(), which means that the hash concept applies to every .NET object.
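    To show why containment does not depend on the size of the collection, here is a deliberately minimal hash set using separate chaining. It is an illustration, not how System.Collections.Generic.HashSet<T> is actually implemented, and ChainedHashSet is a hypothetical name: GetHashCode() selects a single bucket, only that bucket's short chain is scanned, and the table doubles its bucket count to keep the load factor at or below 1.

```csharp
using System.Collections.Generic;

// Minimal separate-chaining hash set, for illustration only.
public sealed class ChainedHashSet<T>
{
    private List<T>[] _buckets = new List<T>[8];
    private readonly IEqualityComparer<T> _comparer = EqualityComparer<T>.Default;

    public int Count { get; private set; }

    // Maps an item to a bucket: clear the sign bit of the hash, then take it modulo the bucket count.
    private int BucketIndex(T item, int bucketCount)
    {
        int hash = item == null ? 0 : _comparer.GetHashCode(item);
        return (hash & 0x7FFFFFFF) % bucketCount;
    }

    // Only the items that landed in the same bucket are compared.
    public bool Contains(T item)
    {
        List<T> bucket = _buckets[BucketIndex(item, _buckets.Length)];
        if (bucket == null) return false;
        foreach (T candidate in bucket)
            if (_comparer.Equals(candidate, item)) return true;
        return false;
    }

    public bool Add(T item)
    {
        if (Contains(item)) return false;
        if (Count >= _buckets.Length) Resize(); // keep the load factor <= 1
        int index = BucketIndex(item, _buckets.Length);
        (_buckets[index] ??= new List<T>()).Add(item);
        Count++;
        return true;
    }

    // Rehashing touches every element: this is the occasional expensive insertion.
    private void Resize()
    {
        var newBuckets = new List<T>[_buckets.Length * 2];
        foreach (List<T> bucket in _buckets)
        {
            if (bucket == null) continue;
            foreach (T item in bucket)
            {
                int index = BucketIndex(item, newBuckets.Length);
                (newBuckets[index] ??= new List<T>()).Add(item);
            }
        }
        _buckets = newBuckets;
    }
}
```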

     

    I was curious to compare the performance of the different hash table implementations available to .NET developers. My motivation was to see whether it was worth switching from the .NET Framework hash table implementations to other ones.

     

    Basically there are 2 classes in the .NET Framework, System.Collections.Generic.Dictionary<K,V> and System.Collections.Generic.HashSet<T> (I don't count the non-generic System.Collections.Hashtable). There are also the implementations available in 3 libraries: C5 (by the IT University of Copenhagen), PowerCollections (by Wintellect) and Mono (by Novell), except that PowerCollections doesn't seem to provide its own hash dictionary.

     

    The detailed results are below. The conclusion is that the .NET Framework implementations are slightly faster than the Mono ones, while the C5 and PowerCollections implementations lag behind. The .NET implementations are between 1.4 and 1.7 times faster on insertion than the Mono ones. Both are equivalent when it comes to testing containment in HashSet<T>, and the Mono implementation of Dictionary<K,V>.TryGetValue(...) is a tiny bit faster than the .NET one. Keep in mind that the main motivation for using hash tables is a workload with many containment tests/retrievals, since these operations are done in constant time. Thus the Mono weakness on insertion is not that serious. This is good news for those of us who don't have access to the class System.Collections.Generic.HashSet<T> because we are still running on .NET v2.0: we can just use the Mono HashSet<T> implementation.
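    Because retrieval dominates such workloads, a single-lookup retrieval like Dictionary<K,V>.TryGetValue(...) is worth using. A ContainsKey test followed by the indexer hashes the key twice. The rough Java analogue below is a sketch only. It assumes the map never stores null values, so that get() returning null means the key is absent, and the method names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

class SingleLookupDemo {
    // Two lookups: containsKey() hashes the key, then get() hashes it again.
    static int lengthTwoLookups(Map<String, Integer> map, String key) {
        if (map.containsKey(key)) {
            return map.get(key);
        }
        return -1;
    }

    // One lookup, in the spirit of TryGetValue: get() returns null when the
    // key is absent (valid only because this map never stores null values).
    static int lengthOneLookup(Map<String, Integer> map, String key) {
        Integer value = map.get(key);
        return value != null ? value : -1;
    }

    public static void main(String[] args) {
        Map<String, Integer> lengths = new HashMap<>();
        for (String word : new String[] {"hash", "table", "bucket"}) {
            lengths.put(word, word.length());
        }
        System.out.println(lengthOneLookup(lengths, "table"));   // 5
        System.out.println(lengthOneLookup(lengths, "missing")); // -1
    }
}
```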

     

    Concerning memory, if you look at the results you'll see that Mono collections seem up to 2 times cheaper than .NET ones, but these results are biased. Indeed, some array objects are maintained and I cannot distinguish which collection references which array. I tried to do the memory tests separately, but some arrays are maintained internally by the CLR and it is still not possible to infer a clear result.

     

    Here is the C# source file used to do the tests. Notice that I prefixed Mono collections with Mono. to avoid collisions with the .NET Framework collections. All hash tables tested use string as generic parameter. We also ran the tests with int32 values, to get a glimpse of how they behave with a value type. If you have feedback on this testing methodology, please leave it in the comments. I know that the biggest concern is that a single insertion in a hash table can sometimes take a lot of time, because internally it sometimes triggers a re-arrangement of keys proportional to the number of elements. The only way to mitigate these unwanted effects was to run the tests for several collection sizes.
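    The original C# test file is not reproduced here. As a rough illustration of the methodology, the Java sketch below times insertions at several collection sizes, so that an occasional resize does not dominate a single measurement. It also shows that pre-sizing the table avoids the internal re-arrangement of keys (rehashing) while it grows. The sizes, the key format and the use of System.nanoTime() are assumptions. A real measurement would add warm-up runs or use a harness such as JMH.

```java
import java.util.HashMap;
import java.util.Map;

class InsertionTiming {
    // Fills a map with n string keys. When presize is true, the initial
    // capacity is chosen so that n entries fit under HashMap's default load
    // factor (0.75), which avoids rehashing while the map grows.
    static Map<String, Integer> fill(int n, boolean presize) {
        Map<String, Integer> map = presize
                ? new HashMap<>((int) (n / 0.75f) + 1)
                : new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put("key" + i, i);
        }
        return map;
    }

    // Times one fill and returns the elapsed nanoseconds. This is a single,
    // un-warmed run that also includes key creation: treat it as indicative only.
    static long timeFill(int n, boolean presize) {
        long start = System.nanoTime();
        Map<String, Integer> map = fill(n, presize);
        long elapsed = System.nanoTime() - start;
        if (map.size() != n) {
            throw new IllegalStateException("unexpected size");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Several sizes, as in the post, to smooth out the cost of resizes.
        for (int n : new int[] {10_000, 100_000, 1_000_000}) {
            System.out.printf("%,d keys: default %d ms, presized %d ms%n",
                    n, timeFill(n, false) / 1_000_000, timeFill(n, true) / 1_000_000);
        }
    }
}
```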

     

    The results for both performance and memory profiling were obtained with the great JetBrains dotTrace profiler. For memory profiling, I usually use the SciTech .NET Memory Profiler, which has some awesome real-time profiling capabilities with almost no overhead. I chose dotTrace here because its result presentation is nicer and consistent with the performance results I wanted to present.

    Concerning performance profiling, Red Gate recently released .NET Performance Profiler v4.1, which has some great visualization features. Honestly, I haven’t found the time yet to dig into it and see whether I would prefer it over my good friend dotTrace.


     

     

     

    Hash Table: Performance Profiling Results

     

    Performance for 1M strings.

     

     

    Performance for 100K strings.

     

     

    Performance for 10K strings.

     

     

    Performance for 1000 int32.


     

    Performance for 10K int32.

     

     

     

    Hash Table: Memory Profiling

     

    Memory for 100 elements

     

     

    Memory for 1000 elements

     

     

    Memory for 10M elements

     

     

    Memory for 100M elements

     

     

    Memory for 1000M elements

     

     

     


     

  • Controlling the usage of libraries

     
    Recently, Scott Hanselman launched a survey to find out which parts of the .NET Framework real-world developers are using. The results will be interesting mostly for the Microsoft engineers in charge of the .NET Framework. However, it might also be interesting for a team to know which tier code its code base really uses. You certainly know roughly whether your code base uses libraries such as ADO.NET, WPF, Windows Forms and ASP.NET, but also NHibernate, NUnit or Log4Net. But can you answer these 2 questions?

    ·         Which part of the code base uses which library?

    ·         Which types and members of the library are really used by your code base?

     

    Before rambling on about why you should care about these questions and what perspectives an accurate control of the usage of tier code offers, let's illustrate how NDepend can help answer such questions, and more, in seconds.

     

    Let’s analyze the code of the OSS (yet very professional) application Paint.NET v3.0. Assemblies of the application are in black while library/tier/framework assemblies are in blue. The Dependency Matrix shows which assemblies of the application are using which tier assemblies. For example, we can see that the library SharpZipLib is only used by the assembly Paint.NET.
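    At the assembly level, the Dependency Matrix answers the question "which application assemblies use which tier assembly?". The toy Java model below represents this as a map from each application assembly to the set of tier assemblies it references. It then inverts the map to list the users of a given library. The assembly names are hypothetical, and this is only a conceptual sketch, not how NDepend stores its model.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class LibraryUsage {
    // Inverts "application assembly -> tier assemblies it uses" into
    // "tier assembly -> application assemblies that use it".
    static Map<String, Set<String>> usersByLibrary(Map<String, Set<String>> usesByAssembly) {
        Map<String, Set<String>> users = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : usesByAssembly.entrySet()) {
            for (String library : e.getValue()) {
                users.computeIfAbsent(library, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        return users;
    }

    public static void main(String[] args) {
        // Hypothetical application assemblies and their tier dependencies.
        Map<String, Set<String>> uses = Map.of(
                "MyApp.UI", Set.of("System.Windows.Forms", "SharpZipLib"),
                "MyApp.Data", Set.of("System.Data"));
        System.out.println(usersByLibrary(uses).get("SharpZipLib")); // [MyApp.UI]
    }
}
```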

     

     

     

    Things get really interesting when we unfold tier assemblies. Indeed, during analysis, NDepend only focuses on the code elements of tier assemblies that are really used by the application. Here, the ability to unfold assemblies in the Dependency Matrix is a powerful way to dig into who is using what. For example, the following screenshot shows that 8 members of the class List<T> are used by 7 methods of the assembly Paint.DotNet.Data.

     

     

     

    One can easily browse this coupling with the Dependency Matrix…

     

     

     

    …but also, with the Dependency Graph which is clearly more appropriate here:

     

     

     

     

    Diff in the usage of tier code

     

    NDepend allows comparing 2 different builds of an application, to know which code has been refactored, added or removed. As a code base evolves, the way it uses libraries evolves too. Some libraries are discarded, some tier classes that weren’t used are now used… When comparing 2 builds, NDepend also compares the tier code usage. Hence, while unfolding library code in the class browser, we can easily pinpoint which tier code elements that weren’t used are now used (in bold), which tier code elements are not used anymore (struck through) and which tier assemblies/namespaces/classes contain some usage change (underlined). The following screenshot illustrates the usage diff of collections between Paint.NET v2.72 and Paint.NET v3.0.

     

     

     

    We can see at a glance, for example, that between these 2 releases the developers of Paint.NET decided that using the interface IDictionary<TKey,TValue> was a good idea.

     

    If you prefer, you can use the 3 dedicated Code Query Language conditions IsUsedRecently / IsNotUsedAnymore / IsUsedDifferently to query the diff in the usage of tier code:

     

    SELECT TYPES FROM NAMESPACES "System.Collections.Generic" WHERE IsUsedRecently OR IsNotUsedAnymore OR IsUsedDifferently
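    One way to read these three conditions is as set operations between two builds. The Java sketch below summarizes each build as a map from a tier type to the members of that type used by the application. This is an interpretation of the conditions as described above, not NDepend's implementation, and the type and member data are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class UsageDiff {
    // Tier types used by the new build but not by the old one (cf. IsUsedRecently).
    static Set<String> usedRecently(Map<String, Set<String>> oldUse, Map<String, Set<String>> newUse) {
        Set<String> result = new TreeSet<>(newUse.keySet());
        result.removeAll(oldUse.keySet());
        return result;
    }

    // Tier types used by the old build but not by the new one (cf. IsNotUsedAnymore).
    static Set<String> notUsedAnymore(Map<String, Set<String>> oldUse, Map<String, Set<String>> newUse) {
        return usedRecently(newUse, oldUse);
    }

    // Tier types used by both builds, but through a different set of members
    // (cf. IsUsedDifferently).
    static Set<String> usedDifferently(Map<String, Set<String>> oldUse, Map<String, Set<String>> newUse) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : newUse.entrySet()) {
            Set<String> before = oldUse.get(e.getKey());
            if (before != null && !before.equals(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical builds: tier type -> members of that type used by the application.
        Map<String, Set<String>> v1 = Map.of(
                "List<T>", Set.of("Add"),
                "ArrayList", Set.of("Add"));
        Map<String, Set<String>> v2 = Map.of(
                "List<T>", Set.of("Add", "Remove"),
                "IDictionary<TKey,TValue>", Set.of("TryGetValue"));
        System.out.println(usedRecently(v1, v2));    // [IDictionary<TKey,TValue>]
        System.out.println(notUsedAnymore(v1, v2));  // [ArrayList]
        System.out.println(usedDifferently(v1, v2)); // [List<T>]
    }
}
```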

     

    Constraining the usage of tier code

    While it is cool to be able to control the usage of tier code with such accuracy, I would like to explain how this capability is useful for developers.

     

    One direct application is to forbid some incorrect usages of a library. For example, as explained here, when running on the .NET v2.0 runtime, the ultra-popular method String.IsNullOrEmpty(String) can raise a NullReferenceException. I don’t really know whether this bug has been fixed, but in fact I don’t care, because we wrote the following CQL rule to make sure that we don’t use this method (we actually use our own implementation):

     

    // <Name>String.IsNullOrEmpty() is bugged and must not be used</Name>
    WARN IF Count > 0 IN SELECT METHODS
      WHERE IsDirectlyUsing "OPTIONAL:System.String.IsNullOrEmpty(String)"

     

    Notice the OPTIONAL: prefix. Without it, this query wouldn’t compile. Indeed, as the method IsNullOrEmpty() is not used by our code base, the analysis doesn’t reference it. Without the OPTIONAL: prefix the CQL compiler would emit a ‘Cannot find method named …’ error.
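    Here is a toy Java model of such a rule that makes the role of the OPTIONAL: prefix concrete. It is not NDepend's CQL engine, and all method and member names are hypothetical. The analysis only knows the tier members that the code base actually uses. A reference to an unknown member is therefore an error, unless it is marked optional, in which case the rule simply matches nothing.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ForbiddenUsageRule {
    // Returns the application methods that directly use `member`.
    // `callsByMethod` maps each application method to the tier members it calls.
    // If no method uses `member`, the analysis never recorded it: that is an
    // error unless the reference is optional (cf. "Cannot find method named ...").
    static Set<String> methodsUsing(Map<String, Set<String>> callsByMethod, String member, boolean optional) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : callsByMethod.entrySet()) {
            if (e.getValue().contains(member)) {
                result.add(e.getKey());
            }
        }
        if (result.isEmpty() && !optional) {
            throw new IllegalArgumentException("Cannot find method named " + member);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> calls = Map.of(
                "Parser.Read", Set.of("System.String.Trim()"),
                "Parser.Check", Set.of("MyStringHelper.IsNullOrEmpty(String)"));
        // WARN IF Count > 0: an empty result means the rule passes.
        Set<String> offenders = methodsUsing(calls, "System.String.IsNullOrEmpty(String)", true);
        System.out.println(offenders.isEmpty() ? "rule passes" : "warning: " + offenders);
    }
}
```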

     

     

    Let's dig into another real-world example. We rely on the DevExpress DXperience Windows Forms library to enhance the usability of our product. Among plenty of cool stuff, this library lets you control the skinning of controls.

     


     

    This is only possible if you use the DXperience buttons (and other controls) instead of the classic Windows Forms button. To prevent developers from using a classic Windows Forms button, and thus to guarantee coherent skinning of our UI, we wrote the following rule:

     

    // <Name>All buttons should be of type DevExpress.XtraEditors.SimpleButton and not System.Windows.Forms.Button</Name>
    WARN IF Count > 0 IN SELECT FIELDS FROM ASSEMBLIES WHERE
      IsOfType "OPTIONAL:System.Windows.Forms.Button"

     

     

    The possibilities to control how your code uses libraries are endless. By default, we provide a set of around 30 rules about the usage of the .NET Framework (some borrowed from FxCop, others from our experience). We expect to release more default rules in the future, also covering popular frameworks such as NHibernate or Log4Net. What is really cool is that each time a developer detects a particular library usage issue, he can very easily define some custom rules about it. The rules can be named appropriately and contain further comments detailing the issue. This way his co-workers will be aware of the issue and won't repeat the same mistake in the future. Hence such a set of CQL rules helps capitalize on the experience acquired the hard way with a library. By the way, don’t hesitate to report to us the coolest rules you come up with for any popular library (.NET Framework included).

     

     

    Separation of Concerns and usage of tier code

    I would like to mention that a few months ago, in the blog post Dependencies and Concerns, I explored an interesting application of controlling the usage of tier code. Basically the idea is: tell me what you are using and I can tell you what you are concerned with. The post explains how you can write rules to forbid, for example, using some ADO.NET stuff at the same time as some UI stuff. In other words, you get a means to write rules that check that the concerns of your application are properly separated (for example, the UI code shouldn’t use the DB code directly).

     

    Another interesting field to explore is making sure that special concerns such as threading, error handling, logging, user authentication, security permissions... are used strictly within some bounded regions of your code. For example, the following rule confines the usage of System.Threading to two dedicated helper namespaces:

    WARN IF Count > 0 IN SELECT METHODS OUT OF NAMESPACES
    "Product.UI.ThreadingHelper",
    "Product.DB.ThreadingHelper" WHERE
    IsDirectlyUsing
    "System.Threading"
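    The rule above can be modeled as a toy Java check. Each method is mapped to the namespaces it uses, and the check flags methods that use the concern while declared outside every allowed namespace. The Product.* helper namespaces come from the rule. The method names, and the assumption that a method's full name starts with its namespace, are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ConcernConfinement {
    // Flags methods that use `concern` (e.g. "System.Threading") while being
    // declared outside every allowed namespace.
    static Set<String> violations(Map<String, Set<String>> namespacesUsedByMethod,
                                  String concern, List<String> allowedNamespaces) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : namespacesUsedByMethod.entrySet()) {
            String method = e.getKey();
            // Assumption: a method's full name is "<namespace>.<Type>.<Method>".
            boolean allowed = allowedNamespaces.stream().anyMatch(ns -> method.startsWith(ns + "."));
            if (!allowed && e.getValue().contains(concern)) {
                result.add(method);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> uses = Map.of(
                "Product.UI.ThreadingHelper.RunAsync", Set.of("System.Threading"),
                "Product.UI.MainForm.OnLoad", Set.of("System.Threading", "System.Windows.Forms"),
                "Product.DB.Query.Execute", Set.of("System.Data"));
        List<String> allowed = List.of("Product.UI.ThreadingHelper", "Product.DB.ThreadingHelper");
        System.out.println(violations(uses, "System.Threading", allowed)); // [Product.UI.MainForm.OnLoad]
    }
}
```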

     

     

  • Listen to your users

     

    While listening to users to improve the usability of your product seems to be a controversial issue, I lean toward the Listen to your users camp. My opinion is biased by the fact that I have the exceptional chance to share the same job and the same concerns as the users of the product I am responsible for. Like all of us on the NDepend team, our users are developers and architects with a solid interest in software quality and maintainability.

     

    I recently came across a nice illustration of the Listen to your users tenet. It is about the Structural Dependency Matrix, one of the most polished features of NDepend. It was released 2 years ago and we have been continuously improving it since then. We believe that the matrix is a unique tool to detect structural flaws in code, to understand and explore the real architecture of a code base and to forecast the cost and feasibility of large refactorings. The matrix works hand-in-hand with the graph view, the CQL language and most other features of NDepend. It comes with dozens of options organized with the Make simple things simple and hard things possible idea in mind. It has been profiled to the point where every action executes in real time, even on matrices with hundreds of thousands of cells representing the inter-dependencies of millions of lines of code. We dog-food it daily to improve the structure of the NDepend code itself. Despite all this work and expertise, a user recently came to us with a very useful option that we had never thought about! The icing on the cake is that everything was already in place to let us implement and test it readily.

     

    The idea is to let the user remove empty rows/columns. It is especially useful when the dependency matrix is sparse. Here is a matrix representing the coupling between the 113 types of the namespace System.Drawing and the 205 methods of the class System.String. This matrix is sparse and not very handy for browsing who uses whom in this coupling.

     

     

     

    Here is the same coupling with empty rows and columns removed. The sparse (113x205) matrix became a dense (27x29) matrix. It is now much easier to gather useful information about the System.Drawing to System.String coupling.
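    Removing empty rows and columns amounts to keeping only the row and column indices that have at least one non-empty cell, then copying those cells into a smaller matrix. Below is a minimal Java sketch over a dense boolean matrix, where true means "depends on". The representation is an assumption; NDepend's internal matrix model is not described here.

```java
import java.util.ArrayList;
import java.util.List;

class MatrixCompaction {
    // Returns a new matrix keeping only the rows and columns that contain at
    // least one dependency (true cell). Row/column labels would be filtered
    // with the same index lists.
    static boolean[][] removeEmptyRowsAndColumns(boolean[][] m) {
        int width = m.length == 0 ? 0 : m[0].length;
        List<Integer> rows = new ArrayList<>();
        List<Integer> cols = new ArrayList<>();
        for (int r = 0; r < m.length; r++) {
            for (int c = 0; c < width; c++) {
                if (m[r][c]) { rows.add(r); break; }
            }
        }
        for (int c = 0; c < width; c++) {
            for (int r = 0; r < m.length; r++) {
                if (m[r][c]) { cols.add(c); break; }
            }
        }
        boolean[][] result = new boolean[rows.size()][cols.size()];
        for (int i = 0; i < rows.size(); i++) {
            for (int j = 0; j < cols.size(); j++) {
                result[i][j] = m[rows.get(i)][cols.get(j)];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        boolean[][] sparse = new boolean[4][5];
        sparse[0][1] = true;
        sparse[2][1] = true;
        sparse[2][4] = true;
        boolean[][] dense = removeEmptyRowsAndColumns(sparse);
        System.out.println(dense.length + "x" + dense[0].length); // 2x2
    }
}
```

This kind of filtering is what turns the 113x205 matrix above into a 27x29 one. Only the empty rows and columns are dropped, so no dependency is lost.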

     

     

     

     

    I believe that such a dense matrix is the optimal way to browse complex coupling. The following coupling graph, which shows exactly the same information, is partially unreadable:

     


     

     

    So why didn’t we think of this feature before, and why had no user asked for it so far? The answer is that this feature actually already existed, more or less, through the possibility of opening a dependency within the matrix while keeping only the involved code elements. Here is an illustration of this possibility: