Category Archives: Language

编程语言漫谈

写在前边:我们知道现有语言的编程范式有:过程式,面向对象,函数式,逻辑式。随着软件工业化程度的普及,以及软件的复杂度越来越高,编程语言的发展历程 也是从最初的过程式(命令式)语言c,发展到以java语言为代表的面向对象编程语言。而逻辑编程语言(以prolog为代表)和函数式语言(lisp系 列)还多用在学术和人工智能领域中。近几年,随着多核,云计算时代的到来。函数式编程语言逐渐浮出水面,最经典的语言以scheme,common- lisp,ml,clojure,go为代表.而且最近的jdk8也逐步加入了functional,closure,lambda等语法,而且 scala的作者也越来越推崇编码者以函数式语言的思维来coding。可见编程语言的发展也是满足时代的变化不断变化着。从其中的发展历程中我们可以看 到:技术的发展都是在围绕着解决“软件的复杂度”这个基本的需求而发展的。

一、 编程语言概述

编程语言是计算机的符号,是人和计算机交流的语言。我们学习一门新的编程语言时,应该观察这门语言的那些特性呢?《SICP》一书的作者列举了一下三点:
* primitive elements. (基本元素)
* means of combination. (组合手段)
* means of abstraction. (抽象手段)
以上3个特性,基本上涵盖了所有编程语言的特性,并且也是一个语言设计者从开始就要考虑的。我对这三点的理解:primitive elements表示语言的基本符号(基本数据类型,关键字等)也就是词法部分。means of combination利用基本元素通过组合的过程构建大型程序的手段,不同的语言提供的组合手段是不同的。means of abstraction表示抽象,抽象是解决软件复杂度的重要手段,让软件的可读性,可扩展,可重复利用等得到提升。下边会从组合元素和抽象手段来对比语言特性。

二、 组合手段

汇编语言:算是最简单的词法和语法形式了,被称作低级语言。汇编器通过直译的过程将汇编代码翻译为native code(cpu指令集)。 提供的primitive elements有:数字,字符,-,+,*,/ , case,if, break, go,指令等基本元素;通过这些元素组合成计算机执行序列符号。
c语言相比汇编语言更高级,让程序员脱离和cpu,寄存器,内存直接打交道的工作,提供了更多的组合手段:比如数组,结构体等数据结构。
java语言自称是面向对象语言,所以比c语言更进一步,通过强大的类型系统手段来组合属性和方法。
go语言和ML语言非常相仿,“接口”,”高阶函数“,”闭包“,”duck type”,”返回多值”,并提供goroutine等来组合。
prolog语言完全是模式匹配的逻辑语言。他的思想基于:世间所有的定理都给予一个最简单的定理推理而来,就像数学的基础是“1+1=2”,然后才能推导出“万有引力”,“相对论”等定律。
lisp方言以s-expression(著名的S表达式)来组合数据和函数。在lisp中不区分数据和函数,一切皆为数据。
题外话:lisp方言是和图灵机的思想一脉相承的,编码的时候完全感受不到计算机体系结构。而其他语言更多的是冯·洛伊曼的计算,存储思想而设计,要么是“堆栈”结构,要么是“寄存器”结构;

三、 抽象手段

从c语言开始,以函数为单元提供了对程序的抽象。这样大大的提高了程序的可复用,模块化等。让团队合作编码也成为可能。
面向对象编程:基本上隐藏了计算机的细节,开发者通过对象来抽象具体业务。但严格意义上来说java也属于imperative-lang的范畴而且都是传值调用。相比来说,python,ruby更面向对象一些,python融合了面向对象和函数式编程范式,更接近自然语言些。
以lisp为代表的函数式语言:以函数为基本和唯一的抽象;common-lisp也基于宏开发了一套object-oriented的编程方式。我比较倾向于函数式编程理念:函数的无副作用(不用考虑线程安全,特别是对于变态的Haskell),高阶函数,闭包,lambda等。

四、 类型系统

大家平时经常会说:javascript是一个弱类型的语言,java语言是强类型的语言。将编程语言从类型系统的角度区分语言也很有趣。一般来说弱类型语言更偏向自然语言一些,语法也很自由活泼些。而现今语言的走势也趋向与弱类型方向.
计算机是结构化很强的,堆栈上一个二进制位的错误就会导致溢出,bus等错误。所以语言层面的自由得益于编译器或者解释器的功劳。比如java,c等语言有很强的编译时类型检测机制,强类型的好处驱使编码人员写出很少有语法,语义错误的代码,对IDE的支持也便捷,是大型技术团队的合作基石。
弱类型语言让我们获取了自由(不需要类型信息),让程序员少敲了许多键盘。自由是有代价的,编译器或解释器中内含类型推理(infer type); (类型推理是利用归一方法,基于上下文中的变量显式类型,操作符,返回值等信息,利用栈和逐渐替换的过程来推到出类型。) 弱类型虽然可以轻松编译通过(或者不需要编译而是解释执行),但也是有类型检查过程的,只是将此过程延迟到运行时了。所以弱类型语言结构化不强,编码时很难确保类型无误,IDE支持较难。但是通过一些分析器可以不断的检测语法,语义错误,相当于达到了强类型语言的IDE效果。近几年语言的方向朝着逐渐脱离计算机体系结构,向自然语言方向在演进,程序员像艺术家一样用代码自由描绘。

五、编译/解释

java语言是解释型还是编译型的呢? 这个很难说,从java source code -> class byte code 的过程式javac编译器的过程。但是byte code 在jvm上执行的过程可能是解释执行也可能是编译执行的。解释型和编译型的内部都遵从编译原理的过程:词法分析-> 语法分析->语义解析 -> 编译器后端-> native code的过程。 但有各自的优点:
解释器:加载code速度快;解释器需要维护运行时上下文等信息。所以加载必要的代码,片段解释执行。但是对于相同的代码都经过编译过程就很多余,造成时间浪费。
编译器:执行速度快。而且编译器后端也更容易优化中间代码,因为优化过程式一个结构化过程:往往需要遍历整个中间代码,整体优化代码,提高运行效率。
runtime:一般来说解释型语言需要在内存维护运行时上下文信息,服务于运行过程中变量的查找,绑定,scope等。 而编译语言基于寄存器堆栈模型执行代码,基本上数据绑定都在栈结构中完成,运行速度稍微快一些;
hotspot-jvm结合了解释和编译的各自优点;最先解释执行过程,如果方法被频繁执行,而且达到热点(hotspot),jvm会启动编译过程,将次代码编译为native-code,然后缓存起来,下一次的调用直接执行即可。hotspot-jvm执行基于堆栈指令bytecode, 这一点也是基于跨平台的考虑而牺牲了寄存器指令; (而基于android操作系统上的dalvik虚拟机是基于寄存器指令的);
所以说,语言的设计往往是一个权衡过程;获取的“自由”越多,”牺牲“也更大。

六、 总结:

最初从图灵为了解决莱布尼茨提出的:是否存在一个通用模型来解决一切计算任务这个命题,发明了图灵机理论。到冯洛伊曼仿真人脑神经元思考过程产生第一台基 于存储器,运算器的计算机ENIAC,至今,计算机硬件技术并没有实质性的变化,只是随着摩尔定律的破灭,人们发展了多级高速缓存,多核,多cpu技术来 支撑越来越大的计算任务。 在这个过程中,随着人们对逻辑学,符号学,算法的不断研究,用来和计算机交互的语言也越来越抽象和丰富。我们通过这个形象的符号来抽象时间和空间,通过这 个形象的符号来解决软件的复杂性问题,通过这个符号来和计算机传达我们的思想。

from:http://tech.youzan.com/programming-language/

Dynamic Type Using Reflection.Emit

Table of Contents

Hi Folks, Its long since I am writing this for you. I have been writing technical blogs though for you but really wanted to share one tutorial which would help the database. Finally thought of starting one.

 

After putting my efforts with Reflection classes, I thought I could make some research on code generation. I took the CodeDom being the best alternative to generate code. Sooner or later, I found out, CodeDom actually allows you to build your assembly but it is does not allow you to dynamically compile a part of the assembly at runtime, but rather it invokes the compiler to do that. So rather than CodeDom, I thought there must be something else which fruits my needs.

Next I found out one, using Expression Trees. If you are already following me, I think you know, few days back I have already written about Expression Trees and Lamda Decomposition. So it is not a good time to recap the same. Later on, I did some research on MSIL, and found it worth learning. If you are going to grow with .NET, it would be your added advantage if you know about MSIL. Hence I started looking at the MSIL. Finally I found out a number of classes which might help you to build a Type dynamically. Let me share the entire thing with you.

Introduction

Reflection.Emit like CodeDom allows you to build your custom assembly and provides you a number of Builder classes which might be compiled during Runtime, and hence invoke DLR capabilities of C#. The library also exposes one ILGenerator which might be used later to produce the actual MSIL by putting efforts to emit Operation codes. So finally after you write your OpCodes correctly, you could easily able to compile the type dynamically during runtime. In this post, I would use ILDASM to see the IL generated from our own class, that I define, and later on I would try to build the same class dynamically.

What is Reflection?

If you are struck with Reflection, then you need to really gear yourself a bit to go further. Let me give a brief explanation of Reflection. Reflection is actually a technique to read a managed dll which are referenced or not being referenced from the application and invoke its types. In other words, it is a mechanism to discover the types and call its properties at runtime. Say for instance, you have an external dll which writes logger information and sends to the server. In that case, you have two options.

 

  • You refer to the assembly directly and call its methods.
  • You use Reflection to load the assembly and call it using interfaces.

If you want to build really a decoupled architecture for your application, something like which could be plugged in later in the application, it is always better to choose the 2nd option. Let me clarify a bit more, say you want your customer to download the logging dll from your server and plugin to the application when needed. Believe me, there is no other alternative than using Reflection. Reflection classes allows you to load an external assembly to your application and call its types at run time.

To know more try Reflection Overview.

What is Reflection.Emit?

Being a part of Reflection, Reflection.Emit namespace list you a number of classes which you can use to build your type. As I have already told you, Reflection.Emit actually provides you some Builder classes like AssemblyBuilder, ModuleBuilder, ConstructorBuilder, MethodBuilder, EventBuilder, PropertyBuilder etc. which allows you to build your IL dynamically during run time. The ILGenerator provides you the capabilities to generate your IL and place the same for a method. Generally, it is very rare that a developer need these capabilities to generate an assembly at runtime, but it is great to find these capabilities present in the framework.

 

Now lets see what is required to build an assembly.

Steps to generate an Assembly

Now lets jump back to identify the steps to create the assembly :

  1. Create an Assembly in an Application Domain.AssemblyBuilder will help you in that.
  2. Create a Module inside the Assembly
  3. Create a number of Type inside a Module
  4. Add Properties, Methods, Events etc inside the Type.
  5. Use ILGenerator to write inside the Properties, Methods etc.

Basically these are the common steps to create your own dynamically created Assembly.

structure.JPG

From the above figure, you can clear how the entire structure of an CLR assembly goes. The AppDomain is the root of the hierarchy which creates Assembly, then Module, and then Type. If you see the IL more thoroughly, you will understand, Delegate is also a class inherited from System.MultiCastDelegate and Struct is derived from System.ValueTypes. Each type might contain its members, and each member Method or Properties can have its OPCodes, Locals and Parameters. Locals define the local variables you define inside a method body and OpCodes are the instruction codes for the IL.

Steps to create Dynamic Assembly

Now lets go step by step with IL to build one dynamic assembly yourself.

STEP 1. Create an Assembly Dynamically

Hide   Copy Code
public AssemblyBuilder GetAssemblyBuilder(string assemblyName)
{
    AssemblyName aname = new AssemblyName(assemblyName);
    AppDomain currentDomain = AppDomain.CurrentDomain; // Thread.GetDomain();     AssemblyBuilder builder = currentDomain.DefineDynamicAssembly(aname, 
                               AssemblyBuilderAccess.Run);
    return builder;
}

To create an assembly you need to have the name of the assembly to uniquely identify the assembly. I have used AssemblyName class to name the assembly for us. AppDomain is the place where the Assembly will be created. This is very important, as application mght have issues while calling cross domain objects. To make it more simplistic, rather than creating a new AppDomain, I am using CurrentDomain where the program is running. Finally I have created an object of AssemblyBuilder, which eventually builds up the Assembly with unique name aname. The AssemblyBuilderAccess specifies the accessibility of the assembly to us. As I did, if you use Run, that means the assembly could only be run dynamically using Reflection, it cannot be saved for future use. Browse through each of the values to see the output.

Note : If you have defined a custom attribute for the assembly, you can easily go for SetCustomAttriburte to add your custom attribute to the assembly.

Few features of AssemblyBuilder (For further reading)

AssemblyBuilder allows you to define a number of features like :

  • AddResourceFile : Allows you to specify a file to be added as resource to the assembly.
  • DefineUnmanagedResource / DefineResource : Adds one Unmanaged Resource for the assembly
  • EntryPoint/SetEntryPoint : A special subroutine / method to be defined which will be called automatically when the Assembly is invoked.
  • SetCustomAttribute : Lets you specify Attributes for the assembly.
  • DefineDynamicModule : Defines the Module for the assembly where the actual code will contain.

There are lot more flexibility while building the assembly. .NET put everything that we could have with the library and exposed methods to ensure we can do that from Reflection.Emit. You can try MSDN to read more about the methods it exposes.

STEP 2 : Create a Module

Module is a part of the object where the actual classes will remain. A module is container for all the classes we place therein. Let us create a module for us.

Hide   Copy Code
public ModuleBuilder GetModule(AssemblyBuilder asmBuilder)
{
    ModuleBuilder builder = asmBuilder.DefineDynamicModule("EmitMethods", 
                                           "EmitMethods.dll");
    return builder;
}

Thus the method is actually expecting a ModuleName, a unique module name and the Filename to which the assembly will be exported to.

Few useful methods

Module exposes few methods like :

  • DefineEnum : Lets you define an Enum, it returns back an EnumBuilder.
  • DefineType : Lets you to define a type / class.
  • DefineManifestResource : A dll contains a binary manifest. This method lets you define the manifest for you.
  • DefinePInvokeMethod : Allows you to define a PInvoke method (COM) for the assembly.

STEP 3 : Create a Type

This is the main thing. To create a class, structure, delegate etc you need to define a TypeBuilder. Now from here onwards I will look into the actual IL generated using ILDASM and then produce the same output for you.

Hide   Copy Code
public TypeBuilder GetType(ModuleBuilder modBuilder, string className)
{
    TypeBuilder builder = modBuilder.DefineType(className, TypeAttributes.Public);
    return builder;
}

public TypeBuilder GetType(ModuleBuilder modBuilder, string className, 
                                  params string[] genericparameters)
{
    TypeBuilder builder = modBuilder.DefineType(className, TypeAttributes.Public);
    GenericTypeParameterBuilder[] genBuilders = builder.DefineGenericParameters(
                                                    genericparameters);

    foreach (GenericTypeParameterBuilder genBuilder in genBuilders) 
       // We take each generic type T : class, new()     {
        genBuilder.SetGenericParameterAttributes(
                 GenericParameterAttributes.ReferenceTypeConstraint | 
                              GenericParameterAttributes.DefaultConstructorConstraint);
        //genBuilder.SetInterfaceConstraints(interfaces);     }

    return builder;
}

The above GetType method has two overloads. As you can see the first one is simple one where I have just specified the name of the class and the ModuleBuilder and the method returns the TypeBuilder.

In the second overload, I have put an additional param array of string which defines each Generic type for the class. GenericTypeParameterBuilder allows you to define the GenericTypeParameter. Once you define the GenericTypeParameters and set its constraint attributes, you can the builder.

Few useful methods

Compared to classes, TypeBuilder allows you to define full fledged structure with all the options you have. Some of them are :

  • DefineField / DefineMethod / DefineProperties / DefineEvent : You might use these methods to generate class members.
  • DefineMethodOverride : Allows you to override an existing method when the Type is inherited from another base type
  • DefineConstructor / DefineDefaultConstructor : Specifies the constructor for the current type.
  • AddInterfaceImplementation : Allows you to implement the current type from another interface.

STEP 4 : Create Method

Method are the building block of any program. We will define a number of Methods to clear up the concepts on how easily you could build a method from IL. For the time being, lets we create a dynamic method using MethodBuilder.

Hide   Shrink   Copy Code
public MethodBuilder GetMethod(TypeBuilder typBuilder, string methodName)
{
    MethodBuilder builder = typBuilder.DefineMethod(methodName, 
                        MethodAttributes.Public | MethodAttributes.HideBySig);
    return builder;
}
public MethodBuilder GetMethod(TypeBuilder typBuilder, string methodName, 
                      Type returnType, params Type[] parameterTypes)
{
    MethodBuilder builder = typBuilder.DefineMethod(methodName, 
                      MethodAttributes.Public | MethodAttributes.HideBySig, 
                             CallingConventions.HasThis, returnType, parameterTypes);
    return builder;
}

public MethodBuilder GetMethod(TypeBuilder typBuilder, string methodName, 
                   Type returnType, string[] genericParameters, params Type[] 
                                                                  parameterTypes)
{
    MethodBuilder builder = typBuilder.DefineMethod(methodName, 
         MethodAttributes.Public | MethodAttributes.HideBySig, 
                     CallingConventions.HasThis, returnType, parameterTypes);

    GenericTypeParameterBuilder[] genBuilders = 
                           builder.DefineGenericParameters(genericParameters);

    foreach (GenericTypeParameterBuilder genBuilder in genBuilders) 
                  // We take each generic type T : class, new()     {
        genBuilder.SetGenericParameterAttributes(
                   GenericParameterAttributes.ReferenceTypeConstraint | 
                       GenericParameterAttributes.DefaultConstructorConstraint);
        //genBuilder.SetInterfaceConstraints(interfaces);     }
    return builder;
}

So the above methods will return you the MethodBuilder which lets you to define your IL code. You can see I have specified 3 overloads for this. The overloads allows you to put parameters and also to put Generic Type parameters for the methods.

Now, after building your type you need to create Locals (local variables) and use OpCode instructions.

Using ILGenerator to Emit OpCodes

To define your OpCodes you will need ILGenerator. ILGenerator allows you to Emit IL for your method body and hence lets you create the instruction set for the method that you are about to build. Let me first introduce some of the few instruction sets that helps you to add two integer variables passed into the method and return a floating value as a result.

Hide   Copy Code
public void CreateMethod()
{
    AppDomain currentDomain = AppDomain.CurrentDomain;
    AssemblyBuilder asmbuilder = this.GetAssemblyBuilder("MyAssembly");
    ModuleBuilder mbuilder = this.GetModule(asmbuilder);
    TypeBuilder tbuilder = this.GetTypeBuilder(mbuilder, "MyClass");

    Type[] tparams = { typeof(System.Int32), typeof(System.Int32) };
    MethodBuilder methodSum = this.GetMethod(tbuilder, "Sum", typeof(System.Single), 
                                                                 tparams);

    ILGenerator generator = methodSum.GetILGenerator();

    generator.DeclareLocal(typeof(System.Single));  
    generator.Emit(OpCodes.Ldarg_1);    
    generator.Emit(OpCodes.Ldarg_2);    
    generator.Emit(OpCodes.Add_Ovf);    
    generator.Emit(OpCodes.Conv_R4);    
    generator.Emit(OpCodes.Stloc_0);    

    generator.Emit(OpCodes.Ldloc_0);    
    generator.Emit(OpCodes.Ret);        
}

If you minutely see the above code, the code actually creates an Assembly in CurrentDomain and having a dynamically created type MyClass in it. The class hence created will contain the method Sum in it.

The lines with ILGenerator.Emit actually emits the IL to the method body Sum. Every method must declare local stack element to run its data. In IL, we declare Local variables before calling any instruction codes. Just like this, I have used DeclareLocal to declare a float32 Local in the method. The DeclareLocal method actually returns a LocalBuilder which you might use as well to manipulate the index of this variable. After we declare all the locals, we first load argument list Ldarg_1 and Ldarg_2(as first argument is implicit object this). The Add_Ovf actually adds the two loaded arguments and pass it to the local variable Stloc_0 (which represents the top element of the Stack or the Local variable we created at Index 0). Next the Ldloc_0 pops the value and returns it back to the external world.

Now here is the most easiest sample of producing your own Type. Now let me go a bit further to build a more concrete Type

A more concrete example

As we have already built our own Type, its time to give you an example in building a more concrete Type. Before we demonstrate let me show you the code which we are going to create dynamically during runtime and call its method to get the output. Please note that, I have tried to simplify the code in an extent so that it helps you to understand the code better..

Hide   Shrink   Copy Code
public interface IBuilder
{
    float Sum(int firstnum, int secondnum);
    float Substract(int firstnum, int secondnum);
    float Multiply(int firstnum, int secondnum);
    float Divide(int firstnum, int secondnum);
}

public class Builder : IBuilder
{

    # region Event
    public delegate void BuilderDelegate(string message);

    public event BuilderDelegate InvokeMessage;

    public virtual void OnInvokeMessage(string message)
    {
        if (this.InvokeMessage != null)
            this.InvokeMessage(message);
    }

    # endregion

    # region Fields
    private int firstNum, secondNum;
    public int FirstNum
    {
        [DebuggerStepThrough()]
        get { return this.firstNum; }
        set { this.firstNum = value; }
    }

    [DebuggerBrowsable(DebuggerBrowsableState.Never)]
    public int SecondNum
    {
        get { return this.secondNum; }
        set { this.secondNum = value; }
    }

    # endregion

    # region Constructors
    public Builder(int firstnum, int secondnum)
    {
        this.FirstNum = firstnum;
        this.SecondNum = secondnum;
    }

    # endregion

    #region IBuilder Members

    public float Sum(int firstnum, int secondnum)
    {
        return firstnum + secondnum;
    }

    public float Substract(int firstnum, int secondnum)
    {
        return firstnum - secondnum;
    }

    public float Multiply(int firstnum, int secondnum)
    {
        return firstnum * secondnum;
    }

    public float Divide(int firstnum, int secondnum)
    {
        try
        {
            return firstnum / secondnum;
        }
        catch (DivideByZeroException ex)
        {
            Console.WriteLine("ZeroDivide exception : {0}", ex.Message);
            return 0;
        }
    }

    #endregion

    # region Methods

    public float GetProduct()
    {
        return this.Multiply(this.FirstNum, this.secondNum);
    }
    public override string ToString()
    {
        return string.Format("FirstNum : {0}, SecondNum : {1}", 
                                           this.FirstNum, this.SecondNum);
    }

    # endregion

}

In the above class, I have actually declared an Interface IBuilder which later I have implemented to produce a class Builder. The class contains few methods, events, properties etc. to allow you understand each and every flexibilities you have with it.

ILDASM1.JPG

After you see the code lets open ILDASM and see how the IL looks like the picture above. Basically it contains two type .class

  1. .class interface for IBuilder
  2. .class for Builder implementing IBuilder.

Other than that you will see another type for the Main method for my console application which you can omit for time being as we are going to concentrate on the Type.

To build a dynamic type, the most important thing that we have discussed already are Builder classes. The BCL exposes a number of Builder classes that enables us to generate MSIL code dynamically during runtime and hence you can compile the same to produce the output.

From the above figure I have put some of the most important Builder classes marked in Red. But ultimately, you need instructions to run for your application. To write your IL Expression, Reflection.Emit provides a class call ILGenerator. ILGenerator (marked in blue) enables you to write your IL for a method or property. OpCodes are Operation code that determined Computer instructions. So while writing your instructions you need to pass your OpCodes and generate the instruction set for the Method.

Now going further with our example, let me demonstrate the code one by one so that you could easily build your own Code generator.

Implement IBuilder for your Assembly

Lets move to the actual implementation of IBuilder interface. As per our discussion, IBuilder contains 4 members, Sum, Divide, Multiply & substract.

Hide   Copy Code
public interface IBuilder
{
    float Sum(int firstnum, int secondnum);
    float Substract(int firstnum, int secondnum);
    float Multiply(int firstnum, int secondnum);
    float Divide(int firstnum, int secondnum);
}

So, as I am new to IL, lets use ILDASM to see what exactly the IL looks like and hence we will try to implement the same for our assembly.

Hmm… After I build and Open ILDASM to disassemble my assembly the IL it produces looks like

Hide   Copy Code
.class interface public abstract auto ansi EmitMethodSample.IBuilder
{
    .method public hidebysig newslot abstract virtual 
                    instance float32  Divide(int32 firstnum,
                        int32 secondnum) cil managed
    {
    }
    .method public hidebysig newslot abstract virtual 
                    instance float32  Sum(int32 firstnum,
                    int32 secondnum) cil managed
    {
    }
    .method public hidebysig newslot abstract virtual 
                    instance float32  Multiply(int32 firstnum,
                        int32 secondnum) cil managed
    {
    }
    .method public hidebysig newslot abstract virtual 
                    instance float32  Substract(int32 firstnum,
                        int32 secondnum) cil managed
    {
    }

}

Let me explain the IL a bit.

  • In MSIL any type is defined with .class, hence our Type IBuilder gets one .class for it.
  • interface, abstract keyword identifies the Type to be abstract and hence you cannot create object of the same.
  • Auto specifies the LPSTR(Long Pointer to string) interpreted automatically.
  • Ansi specifies the LPSTR(Long Pointer to String) interpreted as ANSI.
  • Methods in IBuilder is identified using .method keyword.
  • Because of being member of an interface, the methods appear as abstract virtual.
  • instance keyword specifies the object to be non-static
  • hidebysig specifies the method can be hidden by both name and signature. Any normal method you define gets this flexibility in .NET.
  • NewSlot makes the member to get a slot in vtable. (A vtable is the memory area for the whole object. So whenever an object is created, a vtable is created each for each object and any object created within it gets an entry of vtable. The first member being the hidden pointer can be used to find members of the vtable)
  • cil managed is used to determine that the method is implemented in managed environment.

Now as you now understand the IL generated by the Interface, its time to get through to create the Type IBuilder. Lets build the code for it :

Hide   Shrink   Copy Code
private Type CreateIBuilder(ModuleBuilder mbuilder)
{

    TypeBuilder tbuilder = mbuilder.DefineType("IBuilder", TypeAttributes.Interface | 
        TypeAttributes.Public | 
        TypeAttributes.Abstract | 
        TypeAttributes.AutoClass | 
        TypeAttributes.AnsiClass);

    //Define Divide
    Type[] tparams = { typeof(System.Int32), typeof(System.Int32) };
    MethodBuilder metDivide = tbuilder.DefineMethod("Divide", MethodAttributes.Public | 
        MethodAttributes.Abstract | 
        MethodAttributes.Virtual |
        MethodAttributes.HideBySig | 
        MethodAttributes.NewSlot, 
        CallingConventions.HasThis, 
        typeof(System.Single), tparams);
    metDivide.SetImplementationFlags(MethodImplAttributes.Managed);

    MethodBuilder metSum = tbuilder.DefineMethod("Sum", MethodAttributes.Public |
        MethodAttributes.Abstract |
        MethodAttributes.Virtual |
        MethodAttributes.HideBySig |
        MethodAttributes.NewSlot,
        CallingConventions.HasThis,
        typeof(System.Single), tparams);
    metSum.SetImplementationFlags(MethodImplAttributes.Managed);

    MethodBuilder metMultiply = tbuilder.DefineMethod("Multiply", 
        MethodAttributes.Public |
        MethodAttributes.Abstract |
        MethodAttributes.Virtual |
        MethodAttributes.HideBySig |
        MethodAttributes.NewSlot,
        CallingConventions.HasThis,
        typeof(System.Single), tparams);
    metMultiply.SetImplementationFlags(MethodImplAttributes.Managed);

    MethodBuilder metSubstract = tbuilder.DefineMethod("Substract", 
        MethodAttributes.Public |
        MethodAttributes.Abstract |
        MethodAttributes.Virtual |
        MethodAttributes.HideBySig |
        MethodAttributes.NewSlot,
        CallingConventions.HasThis,
        typeof(System.Single), tparams);
    metSubstract.SetImplementationFlags(MethodImplAttributes.Managed);

    Type tIBuilder = tbuilder.CreateType();

    return tIBuilder;
}

In the above code we first create the TypeBuilder from ModuleBuilder.DefineType. You should note, I have added the TypeAttributes in the same way as it was in MSIL. After we create the TypeBuilder, we can next add up the methods. The DefineMethod method helps in building the methods as we define the MethodAttributes correctly. CallingConvensions.HasThis will make the method as instance method.. We also need to mention the ReturnType and argument types specifically. In this case I have specified the ReturnType as System.Single(float) and arguments as integers for our code. It should be noted, we need to use SetImplementationFlags to specify the methods to be cil managed.

So as our IBuilder interface is ready, its time to build our actual Type.

Implementing the Builder Class

Hmm, now its time to go final workaround with this. First I will create the basic class with methods only required to implement the interface IBuilder, later on we will add the delegate, event, a new method, a static method etc.

To create the basic Builder class, the first thing that we need is a Constructor. But as our constructor also adds few lines to initialize properties FirstNum and SecondNum, let me define them first.

1. Implementing the Type

In IL most of the type is .class, as the whole body depends on the Type header, it is good to start with by building the Type signature. In terms of IL, as in ILDASM it looks like:

Hide   Copy Code
.class public auto ansi beforefieldinit EmitMethodSample.Builder
    extends [mscorlib]System.Object
    implements EmitMethodSample.IBuilder
{
}

So basically the class extends System.Object (as for any class which does not inherit form other class) and in our case it implements IBuilder.

I guess, building the type using TypeBuilder should be easy for you, let me build it again for you :

Hide   Copy Code
Type[] interfaces = { parentBuilder };
TypeBuilder tbuilder = mbuilder.DefineType("Builder", TypeAttributes.Public |
    TypeAttributes.AutoClass |
    TypeAttributes.AnsiClass |
    TypeAttributes.BeforeFieldInit,
    typeof(System.Object),
    interfaces);

So here in the code I have implemented the Builder from parentBuilder which is the Type object of IBuilder interface. If you focus on the code, you will see I have specified BeforeFieldInit for the type, which means you can call the static members without initializing the object. I have also implemented the Type from System.Object according to IL.

2. Implementing the Field & Properties

As our type is ready now, let us add some field and properties on the Type. As shown in the Builder Type we have two fields to store numeric values each of which is wrapped around using property wrappers. Lets see what exactly I am talking about :

Hide   Copy Code
private int firstNum, secondNum;
public int FirstNum
{
    get { return this.firstNum; }
    set { this.firstNum = value; }
}

public int SecondNum
{
    get { return this.secondNum; }
    set { this.secondNum = value; }
}

So FirstNum and SecondNum are the two Properties that we need to declare for our type. Now if you look back for the IL implementation, it looks like :

Hide   Shrink   Copy Code
.field private int32 firstNum
.property instance int32 FirstNum()
{
    .set instance void EmitMethodSample.Builder::set_FirstNum(int32)
    .get instance int32 EmitMethodSample.Builder::get_FirstNum()
}
.method public hidebysig specialname instance int32 
        get_FirstNum() cil managed
{
    .custom instance void 
         [mscorlib]System.Diagnostics.DebuggerStepThroughAttribute::.ctor() = 
                                                             ( 01 00 00 00 ) 
    // Code size 12 (0xc)     .maxstack  1
    .locals init (int32 V_0)
    IL_0000:  nop
    IL_0001:  ldarg.0
    IL_0002:  ldfld      int32 EmitMethodSample.Builder::firstNum
    IL_0007:  stloc.0
    IL_0008:  br.s       IL_000a
    IL_000a:  ldloc.0
    IL_000b:  ret
}
.method public hidebysig specialname instance void 
        set_FirstNum(int32 'value') cil managed
{
    // Code size 9 (0x9)     .maxstack  8
    IL_0000:  nop
    IL_0001:  ldarg.0
    IL_0002:  ldarg.1
    IL_0003:  stfld      int32 EmitMethodSample.Builder::firstNum
    IL_0008:  ret
}

So considering the IL, property seems to be a wrapper for two methods one with get_PropertyName and another with set_PropertyName where get_PropertyName returns the value and set_PropertyName sets the value. So according to IL defination if you are going to implement the code it will look like :

Hide   Copy Code
FieldBuilder fFirst = tbuilder.DefineField("firstNum", typeof(System.Int32), 
   FieldAttributes.Private);
PropertyBuilder pFirst = tbuilder.DefineProperty("FirstNum", 
        PropertyAttributes.HasDefault, typeof(System.Int32), null);
//Getter MethodBuilder mFirstGet = tbuilder.DefineMethod("get_FirstNum", MethodAttributes.Public | 
    MethodAttributes.SpecialName | 
    MethodAttributes.HideBySig, typeof(System.Int32), Type.EmptyTypes);
ILGenerator firstGetIL = mFirstGet.GetILGenerator();

firstGetIL.Emit(OpCodes.Ldarg_0);
firstGetIL.Emit(OpCodes.Ldfld, fFirst);
firstGetIL.Emit(OpCodes.Ret);

//Setter MethodBuilder mFirstSet = tbuilder.DefineMethod("set_FirstNum", MethodAttributes.Public |
    MethodAttributes.SpecialName |
    MethodAttributes.HideBySig, null, new Type[] { typeof(System.Int32) });

ILGenerator firstSetIL = mFirstSet.GetILGenerator();

firstSetIL.Emit(OpCodes.Ldarg_0);
firstSetIL.Emit(OpCodes.Ldarg_1);
firstSetIL.Emit(OpCodes.Stfld, fFirst);
firstSetIL.Emit(OpCodes.Ret);

pFirst.SetGetMethod(mFirstGet);
pFirst.SetSetMethod(mFirstSet);

Ohh, its huge…. .Yes, let me explain. First I have added a Field firstNum which is a numeric private variable. A FieldBuilder helps you to add a field into the IL. To define a property you need to first define the property itself and then you have to define two methods one for Getter and one for Setter such that the Getter returns System.Int32 and the Setter takes System.Int32 as argument.

The OpCodes provide the entire expression set. ldarg loads the argument and Ldfld and Stfld loads and sets the fields into the field fFirst.

A Nice thing to talk in this regard

So you must understand now, a property actually reserves the methods with get_Property and set_Property with the same signature. Say for instance you define a class with the following :

Hide   Copy Code
private string first;
public string First { get { return this.first; } set { this.first = value; } }

public string get_First()
{
    return this.first;
}
public void set_First(string value)
{
    this.first = value;
}

The class will not compile, as get_First and set_First is already reserved and the compiler throws warning doing this.

weird.JPG

Isn’t it interesting to know ?

3. Implementing the Constructor

If your class does not define a constructor in it, the C# compiler automatically puts in a default constructor for you. How it does? Actually for any class, the default constructor for System.Object is automatically inherited to the object and hence you do not need to define a default constructor in such case. It will be written in IL only when you define the default constructor explicitly.

In our case, I have explicit declaration of a parametrized constructor, as it is good to show you the code for that.

Hide   Copy Code
public Builder(int firstnum, int secondnum)
{
    this.FirstNum = firstnum;
    this.SecondNum = secondnum;
}

To do this let me quickly show you how the constructor looks like in terms of IL.

Hide   Copy Code
.method public hidebysig specialname rtspecialname 
        instance void  .ctor(int32 firstnum,
                int32 secondnum) cil managed
{
    // Code size 26 (0x1a)     .maxstack  8
    IL_0000:  ldarg.0
    IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
    IL_0006:  nop
    IL_0007:  nop
    IL_0008:  ldarg.0
    IL_0009:  ldarg.1
    IL_000a:  call       instance void EmitMethodSample.Builder::set_FirstNum(int32)
    IL_000f:  nop
    IL_0010:  ldarg.0
    IL_0011:  ldarg.2
    IL_0012:  call       instance void EmitMethodSample.Builder::set_SecondNum(int32)
    IL_0017:  nop
    IL_0018:  nop
    IL_0019:  ret
}

To build a constructor, we need to create object of ConstructorBuilder which you can get from DefineConstructor method of TypeBuilder. If you see the IL, you can see, the IL actually calls the constructor of System.Object first. This is needed, as any object internally inherited from Base System.Object.

The setFirstNum and set_SecondNum is called from the IL to set the values for FirstNum and SecondNum of the class.

Hide   Copy Code
Type[] parameters = { typeof(System.Int32), typeof(System.Int32) };
ConstructorBuilder cBuilder = tbuilder.DefineConstructor(MethodAttributes.Public | 
    MethodAttributes.HideBySig | 
    MethodAttributes.SpecialName | 
    MethodAttributes.RTSpecialName, 
    CallingConventions.Standard, 
    parameters);

ConstructorInfo conObj = typeof(object).GetConstructor(new Type[0]);

ILGenerator cil = cBuilder.GetILGenerator();
cil.Emit(OpCodes.Ldarg_0);
cil.Emit(OpCodes.Call, conObj);
cil.Emit(OpCodes.Nop);
cil.Emit(OpCodes.Nop);
cil.Emit(OpCodes.Ldarg_0);
cil.Emit(OpCodes.Ldarg_1);
cil.Emit(OpCodes.Call, mFirstSet);
cil.Emit(OpCodes.Nop);
cil.Emit(OpCodes.Ldarg_0);
cil.Emit(OpCodes.Ldarg_1);
cil.Emit(OpCodes.Call, mSecondSet);
cil.Emit(OpCodes.Nop);
cil.Emit(OpCodes.Nop);
cil.Emit(OpCodes.Ret);

The SpecialName in MethodAttribute for Constructor lets the method to be special to CLR. Hence the name of the method ctor makes it a constructor of the class.

To call the constructor of System.Object, we need to fetch the constructor of the object. I have used Reflection to get ConstructorInfo from Type and passed in to Call OpCode. We emit the code as specified in the IL and hence the constructor will be generated.

An Interesting thing to remember

One interesting thing to remember about Reflection.Emit is that, it internally sends a hidden object to every method it calls. This is the implicit object call which we identify as “this” in C# or “Me” in Vb. Thus when we call Ldarg_0 for OpCodes, we are actually mentioning to the implicit object passed into the constructor as first argument. So any parameter we specify starts with Index 1.

The only difference between the Constructor and a normal method is that, a Constructor does not return a value. In CLR, a method actually returns the top element in the stack immediately if OpCodes.Ret is received. So if your stack loads a value into stack before calling Ret, you will get “Invalid Program” exception when you create object of the type. So in such cases Nop should be invoked before calling Ret to consume a processing cycle.

Now that we have defined the constructor let me move forward to define the methods.

4. Implementing the Methods from IBuilder

Now as our constructor is ready, its time to implement the IBuilder object and define the Methods for us. As we are going through with code, I think it must be clear to you how to build your own custom objects. Lets try out the Divide method of IBuilder and implement the same for us.

The Divide method that we have declared earlier looks like

Hide   Copy Code
public float Divide(int firstnum, int secondnum)
{
    try
    {
        return firstnum / secondnum;
    }
    catch (DivideByZeroException ex)
    {
        Console.WriteLine("ZeroDivide exception : {0}", ex.Message);
        return 0;
    }
}

So basically from this method, you can understand how you could call an external member function from an object like what I have done using Console.WriteLine here, and also lets you understand how you could use try/Catch block during your Code generation. So, without wasting time, lets Open up ILDASM again and see how different the code looks like

Hide   Shrink   Copy Code
.method public hidebysig newslot virtual final 
            instance float32  Divide(int32 firstnum,
                        int32 secondnum) cil managed
    {
        // Code size 39 (0x27)         .maxstack  2
        .locals init (class [mscorlib]System.DivideByZeroException V_0,
                float32 V_1)
        IL_0000:  nop
        .try
        {
        IL_0001:  nop
        IL_0002:  ldarg.1
        IL_0003:  ldarg.2
        IL_0004:  div
        IL_0005:  conv.r4
        IL_0006:  stloc.1
        IL_0007:  leave.s    IL_0024
        }  // end .try         catch [mscorlib]System.DivideByZeroException 
        {
        IL_0009:  stloc.0
        IL_000a:  nop
        IL_000b:  ldstr      "ZeroDivide exception : {0}"
        IL_0010:  ldloc.0
        IL_0011:  callvirt   instance string [mscorlib]System.Exception::get_Message()
        IL_0016:  call       void [mscorlib]System.Console::WriteLine(string,
                                                                        object)
        IL_001b:  nop
        IL_001c:  ldc.r4     0.0
        IL_0021:  stloc.1
        IL_0022:  leave.s    IL_0024
        }  // end handler         IL_0024:  nop
        IL_0025:  ldloc.1
        IL_0026:  ret
    }

Now the implmentation states that the method loads 1st and 2nd argument and use Div operation to divide the values and Conv.r4 actually converts the result to float32 value. The Local stack element declared as float32 is used here and the converted result is pushed back to the stack again using stloc.1. If everything works fine, the applicaion passes its control to IL_0024 resulting the method to return the local stack value in 1st position.

So let us implement the same using Builder objects

Hide   Shrink   Copy Code
MethodBuilder mDivide = tbuilder.DefineMethod("Divide", MethodAttributes.Public |
    MethodAttributes.HideBySig |
    MethodAttributes.NewSlot |
    MethodAttributes.Virtual |
    MethodAttributes.Final,
    CallingConventions.Standard,
    typeof(System.Single),
    new Type[] { typeof(System.Int32), typeof(System.Int32) });
mDivide.SetImplementationFlags(MethodImplAttributes.Managed);
ILGenerator dil = mDivide.GetILGenerator();

dil.Emit(OpCodes.Nop);
Label lblTry = dil.BeginExceptionBlock();

dil.Emit(OpCodes.Nop);
dil.Emit(OpCodes.Ldarg_1);
dil.Emit(OpCodes.Ldarg_2);
dil.Emit(OpCodes.Div);
dil.Emit(OpCodes.Conv_R4); // Converts to Float32
dil.Emit(OpCodes.Stloc_1);
dil.Emit(OpCodes.Leave, lblTry);

dil.BeginCatchBlock(typeof(DivideByZeroException));
dil.Emit(OpCodes.Stloc_0);
dil.Emit(OpCodes.Nop);
dil.Emit(OpCodes.Ldstr, "ZeroDivide exception : {0}");
dil.Emit(OpCodes.Ldloc_0);
MethodInfo minfo = typeof(DivideByZeroException).GetMethod("get_Message");
dil.Emit(OpCodes.Callvirt, minfo);
MethodInfo wl = typeof(System.Console).GetMethod("WriteLine", new Type[] 
                                      { typeof(string), typeof(object) });
dil.Emit(OpCodes.Call, wl);
dil.Emit(OpCodes.Nop);
dil.Emit(OpCodes.Ldc_R4, 0.0);
dil.Emit(OpCodes.Stloc_1);
dil.Emit(OpCodes.Leave_S, lblTry);

dil.EndExceptionBlock();
dil.Emit(OpCodes.Nop);
dil.Emit(OpCodes.Ldloc_1);
dil.Emit(OpCodes.Ret);

To open a Try block in IL, you need to use BeginExceptionBlock. Be sure to store the Label so that you could move to specific IL instruction code whenever required. Now Before starting the Catch block which is notified using BeginCatchBlock we need to Leave the Try block using OpCodes.Leave with the LabelName. This will ensure that the application maintains Scope.

You can see, we could have more than one catch block, and each of them will be identified by the Type passed into BeginCatchBlock. Hence, we just Load the string inside Catch block and call the Method WriteLine of Console to show the string. Finally, again before calling EndExceptionBlock, we leave the Try/catch block.

You should note, whenever you are about to call a method, you need to use a MethodInfo object.

Building a Delegate

As it is easy to create the other methods for your Type, lets move further to create a Delegate for you. It would be a good idea to show you how to build a Delegate for a class. Building a delegate differs from building other members. Lets say we need to declare a delegate for our Type as :

Hide   Copy Code
public delegate void BuilderDelegate(string message);

Now after seeing the one liners, dont think that the declaration of IL would be as simple as this. The IL looks like :

Hide   Copy Code
.class auto ansi sealed nested public BuilderDelegate
    extends [mscorlib]System.MulticastDelegate
{
    .method public hidebysig specialname rtspecialname 
        instance void  .ctor(object 'object',
                        native int 'method') runtime managed
    {
    }
    .method public hidebysig newslot virtual 
        instance class [mscorlib]System.IAsyncResult 
        BeginInvoke(string message,
            class [mscorlib]System.AsyncCallback callback,
            object 'object') runtime managed
    {
    }
    .method public hidebysig newslot virtual 
        instance void  EndInvoke(class [mscorlib]System.IAsyncResult result) 
                                                               runtime managed
    {
    }
    .method public hidebysig newslot virtual 
        instance void  Invoke(string message) runtime managed
    {
    }
}

Oh my god, a delegate is actually defined as a nested Type (.class) inherited from System.MulticastDelegate inside the actual Type Builder. It also declares methods like Invoke, BeginEnvoke and EndEnvoke inside the new Type declaration. So lets Build the Type to help you in this even though I think you can create this yourself now.

Hide   Copy Code
TypeBuilder tdelegate = tbuilder.DefineNestedType("", TypeAttributes.AutoClass | 
    TypeAttributes.AnsiClass | 
    TypeAttributes.Sealed | 
    TypeAttributes.Public, typeof(System.MulticastDelegate));

MethodBuilder methodBeginInvoke = tdelegate.DefineMethod("BeginInvoke",
    MethodAttributes.Public |
    MethodAttributes.HideBySig |
    MethodAttributes.NewSlot |
    MethodAttributes.Virtual,
    typeof(IAsyncResult), new Type[] { typeof(string), typeof(AsyncCallback), 
      typeof(object) });
methodBeginInvoke.SetImplementationFlags(MethodImplAttributes.Runtime | 
MethodImplAttributes.Managed);

MethodBuilder methodEndInvoke = tdelegate.DefineMethod("EndInvoke", 
 MethodAttributes.Public | 
    MethodAttributes.HideBySig | 
    MethodAttributes.NewSlot | 
    MethodAttributes.Virtual,null, new Type[] { typeof(IAsyncResult)});
methodEndInvoke.SetImplementationFlags(MethodImplAttributes.Runtime | 
 MethodImplAttributes.Managed);

MethodBuilder methodInvoke = tdelegate.DefineMethod("Invoke", MethodAttributes.Public |
    MethodAttributes.HideBySig |
    MethodAttributes.NewSlot | MethodAttributes.Virtual, CallingConventions.Standard, 
                   null, new Type[] { typeof(string) });

Now moving further, I would recommend you to try out other methods too, you can try using EventBuilder class to build events, CustomAttributeBuilder to build your own user defined Custom Attributes etc.

Wrapping up the Whole thing

Now after we have implemented all the methods, let me check whether I have correctly produced the IL or not.

Hide   Copy Code
//Step 1 : Create the Assembly AssemblyBuilder asmBuilder = this.GetAssemblyBuilder("MyBuilder");

//Step 2: Add A Module to the Assembly ModuleBuilder mbuilder = this.GetModule(asmBuilder);

//Step 3: Add the Type IBuilder Type iBuilder = this.CreateIBuilder(mbuilder);

//Step 4 : Implement IBuilder to create Builder Type Builder = this.CreateBuilderImpl(mbuilder, iBuilder);

dynamic variable = Activator.CreateInstance(Builder, new object[] { 20, 10 });
float result = variable.Sum(30, 40);
Console.WriteLine("Result for Sum(30, 40) : {0}", result);
result = variable.Substract(50, 25);
Console.WriteLine("Result for Substract(50, 25) : {0}", result);
result = variable.Multiply(3, 5);
Console.WriteLine("Result for Multiply(3, 5) : {0}", result);
result = variable.Divide(30, 5);
Console.WriteLine("Result for Divide(30, 5) : {0}", result);

You should note, I have used dynamic to avoid unnecessary usage of Reflection classes again.

So it looks fine and after I compile I get output like

ilcode.JPG

So the IL generated works great for us. You can also save the IL as Assembly using

Hide   Copy Code
asmBuilder.Save("MyBuilder.dll");

Dealing with Bad IL

While building your IL, you may often come accross a situation that your IL does not compile.

invalidprogram.JPG

Never worry, .NET comes with a free tool which gets installed with your Visual Studio will help you rescue from these scenarios. The common type of exception that takes place is “Common Language Runtime detected an invalid program”. The tool is named as PeVerify.exe.

Once you install the Visual Studio, you can open console and try invoking the following statement

peverify.exe <assemblypath>\yourassembly.dll After it compiles the assembly again, it will give you the actual exception that took place while building the Type. You can read more about PEVerify from : MSDN Reference for PEVerify [^]

References

There are quite a number of references over internet which might help you in this topic. Lets enumerate them for you

History

Initial Post – 26th October 2010

Conclusion

Its fun creating the article for you. Even I am excited to put this effort to write one article for you. Its all new to me as well, but I tried my level best to put as many as I can. I hope you would appreciate my post.

If you think I did any mistake, please let me know, as I am no masterpiece in this, so that we could make this article better. Thank you for reading the article.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

from:http://www.codeproject.com/Articles/121568/Dynamic-Type-Using-Reflection-Emit

王垠:如何掌握程序语言

学习程序语言是每个程序员的必经之路。可是这个世界上有太多的程序语言,每一种都号称具有最新的“特性”。所以程序员的苦恼就在于总是需要学习各种稀奇古怪的语言,而且必须紧跟“潮流”,否则就怕被时代所淘汰。

作为一个程序语言的研究者,我深深的知道这种心理产生的根源。程序语言里面其实有着非常简单,永恒不变的原理。看到了它们,就可以在很短的时间之内就能学会并且开始使用任何新的语言,而不是花费很多功夫去学习一个又一个的语言。

对程序语言的各种误解

学习程序语言的人,经常会出现以下几种心理,以至于他们会觉得有学不完的东西,或者走上错误的道路。以下我把这些心理简要分析一下。

1. 程序语言无用论。这是国内大学计算机系的教育常见的错误。教授们常常对学生灌输:“用什么程序语言不重要,重要的是算法。”而其实,程序语言却是比算法更加精髓的东西。任何算法以及它的复杂度分析,都是相对于某种计算模型,而程序语言就是描述这种计算模型的符号系统。算法必须用某种语言表述出来,通常算法设计者使用伪码,这其实是不严谨的,容易出现推理漏洞。算法设计再好,如果不懂得程序语言的原理,也不可能高效的实现。即使实现了,也可能会在模块化和可扩展性上面有很大问题。某些算法专家或者数学家写出来的程序极其幼稚,就是因为他们忽视了程序语言的重要性。

2. 追求“新语言”。基本的哲学告诉我们,新出现的事物并不一定是“新事物”,它们有可能是历史的倒退。事实证明,新出现的语言,可能还不如早就存在的。其实,现代语言的多少“新概念”不存在于最老的一些语言里呢?程序语言就像商品,每一家都为了拉拢程序员作广告,而它们绝大多数的设计都可能是肤浅而短命的。如果你看不透这些东西的设计,就会被它们蒙蔽住。很多语言设计者其实并不真的懂得程序语言设计的原理,所以常常在设计中重复前人的错误。但是为了推销自己的语言和系统,他们必须夸夸其谈,进行宗教式的宣传。

3. “存在即是合理”。记得某人说过:“不能带来新的思维方式的语言,是没有必要存在的。”他说的是相当正确的。世界上有这么多的语言,有哪些带来了新的思维方式呢?其实非常少。绝大部分的语言给世界带来的其实是混乱。有人可能反驳说:“你怎么能说A 语言没必要存在?我要用的那个库L,别的语言不支持,只能用A。”但是注意,他说的是存在的“必要性”。如果你把存在的“事实”作为存在的“必要性”,那就逻辑错乱了。就像如果二战时我们没能打败希特勒,现在都做了他的奴隶,然后你就说:“希特勒应该存在,因为他养活了我们。”你的逻辑显然有问题,因为如果历史走了另外一条路(即希特勒不存在),我们会过上自由幸福的生活,所以希特勒不应该存在。对比一个东西存在与不存在的两种可能的后果,然后做出判断,这才是正确的逻辑。按照这样的推理,如果设计糟糕的A 语言不存在,那么设计更好的B 语言很有可能就会得到更多的支持,从而实现甚至超越L 库的功能。

4. 追求“新特性”。程序语言的设计者总是喜欢“发明”新的名词,喜欢炒作。普通程序员往往看不到,大部分这些“新概念”其实徒有高深而时髦的外表,却没有实质的内涵。常常是刚学会一个语言A,又来了另一个语言B,说它有一个叫XYZ 的新特性。于是你又开始学习B,如此继续。在内行人看来,这些所谓的“新特性”绝大部分都是新瓶装老酒。很多人写论文喜欢起这样的标题:《XYZ:A Novel Method for …》。这造成了概念的爆炸,却没有实质的进步。

5. 追求“小窍门”。很多编程书喜欢卖弄一些小窍门,教你如何让程序显得“短小”。比如它们会跟你讲 “(i++) –(++i)” 应该得到什么结果;或者追究运算符的优先级,说这样可以少打括号;要不就是告诉你“if 后面如果只有一行代码就可以不加花括号”,等等。殊不知这些小窍门,其实大部分都是程序语言设计的败笔。它们带来的不是清晰的思路,而是是逻辑的混乱和认知的负担。比如C 语言的++ 运算符,它的出现是因为C 语言设计者们当初用的计算机内存小的可怜,而 “i++” 显然比 “i=i+1″ 少2 个字符,所以他们觉得可以节省一些空间。现在我们再也不缺那点内存,可是++ 运算符带来的混乱和迷惑,却流传了下来。现在最新的一些语言,也喜欢耍这种语法上的小把戏。如果你追求这些小窍门,往往就抓不住精髓。

6. 针对“专门领域”。很多语言没有新的东西,为了占据一方土地,就号称自己适合某种特定的任务,比如文本处理,数据库查询,WEB编程,游戏设计,并行计算。但是我们真的需要不同的语言来干这些事情吗?其实绝大部分这些事情都能用同一种通用语言来解决,或者在已有语言的基础上做很小的改动。只不过由于各种政治和商业原因,不同的语言被设计用来占领市场。就学习而言,它们其实是无关紧要的,而它们带来的“学习负担”,其实差不多掩盖了它们带来的好处。其实从一些设计良好的通用语言,你可以学会所有这些“专用语言”的精髓,而不用专门去学它们。

7. 宗教信仰。很多人对程序语言有宗教信仰。这跟人们对操作系统有宗教信仰很类似。其实如果你了解程序语言的本质,就会发现其实完全没必要跟人争论一些事情。某个语言有缺点,应该可以直接说出来,却被很多人忌讳,因为指出缺点总是招来争论和憎恨。这原因也许在于程序语言的设计不是科学,它类似于圣经,它没法被“证伪”。没有任何实验可以一下子断定那种语言是对的,那种是错的。所以虽然你觉得自己有理,却很难让人信服。没有人会去争论哪家的汉堡更好,却有很多人争论那种语言更好。因为很多人把程序语言当成自己的神,如果你批评我的语言,你就是亵渎我的神。解决的办法也许是,不要把自己正在用的语言看得太重要。你现在认为是对的东西,也许不久就会被你认为是错的,反之亦然。

如何掌握程序语言

看到了一些常见的错误心理,那么我们来谈一下什么样的思维方式会更加容易的掌握程序语言。

1. 专注于“精华”和“原理”。就像所有的科学一样,程序语言最精华的原理其实只有很少数几个,它们却可以被用来构造出许许多多纷繁复杂的概念。但是人们往往忽视了简单原理的重要性,匆匆看过之后就去追求最新的,复杂的概念。他们却没有注意到,绝大部分最新的概念其实都可以用最简单的那些概念组合而成。而对基本概念的一知半解,导致了他们看不清那些复杂概念的实质。比如这些概念里面很重要的一个就是递归。国内很多学生对递归的理解只停留于汉诺塔这样的程序,而对递归的效率也有很大的误解,认为递归没有循环来得高效。而其实递归比循环表达能力强很多,而且效率几乎一样。有些程序比如解释器,不用递归的话基本没法完成。

2. 实现一个程序语言。学习使用一个工具的最好的方式就是制造它,所以学习程序语言的最好方式就是实现一个程序语言。这并不需要一个完整的编译器,而只需要写一些简单的解释器,实现最基本的功能。之后你就会发现,所有语言的新特性你都大概知道可以如何实现,而不只停留在使用者的水平。实现程序语言最迅速的方式就是使用一种像Scheme 这样代码可以被作为数据的语言。它能让你很快的写出新的语言的解释器。我的GitHub 里面有一些我写的解释器的例子(比如这个短小的代码实现了Haskell 的lazy 语义)。

几种常见风格的语言

下面我简要的说一下几种常见风格的语言以及它们的问题。

1. 面向对象语言

事实说明,“面向对象”这整个概念基本是错误的。它的风靡是因为当初的“软件危机”(天知道是不是真的存在这危机)。设计的初衷是让“界面”和“实现”分离,从而使得下层实现的改动不影响上层的功能。可是大部分面向对象语言的设计都遵循一个根本错误的原则:“所有的东西都是对象(Everything is an object)。”以至于所有的函数都必须放在所谓的“对象”里面,而不能直接被作为参数或者变量传递。这导致很多时候需要使用繁琐的设计模式(design patterns) 来达到甚至对于C 语言都直接了当的事情。而其实“界面”和“实现”的分离,并不需要把所有函数都放进对象里。另外的一些概念,比如继承,重载,其实带来的问题比它们解决的还要多。

“面向对象方法”的过度使用,已经开始引起对整个业界的负面作用。很多公司里的程序员喜欢生搬硬套一些不必要的设计模式,其实什么好事情也没干,只是使得程序冗长难懂。

那 么如何看待具备高阶函数的面向对象语言,比如Python, JavaScript, Ruby, Scala? 当然有了高阶函数,你可以直截了当的表示很多东西,而不需要使用设计模式。但是由于设计模式思想的流毒,一些程序员居然在这些不需要设计模式的语言里也采用繁琐的设计模式,让人哭笑不得。所以在学习的时候,最好不要用这些语言,以免受到不必要的干扰。到时候必要的时候再回来使用它们,就可以取其精华,去其糟粕。

2. 低级过程式语言

那么是否C 这样的“低级语言”就会好一些呢?其实也不是。很多人推崇C,因为它可以让人接近“底层”,也就是接近机器的表示,这样就意味着它速度快。这里其实有三个问题:

1) 接近“底层”是否是好事?

2)“速度快的语言”是什么意思?

3) 接近底层的语言是否一定速度快?

对于第一个问题,答案是否定的。其实编程最重要的思想是高层的语义(semantics)。语义构成了人关心的问题以及解决它们的算法。而具体的实现(implementation),比如一个整数用几个字节表示,虽然还是重要,但却不是至关重要的。如果把实现作为学习的主要目标,就本末倒置了。因为实现是可以改变的,而它们所表达的本质却不会变。所以很多人发现自己学会的东西,过不了多久就“过时”了。那就是因为他们学习的不是本质,而只是具体的实现。

其次,谈语言的“速度”,其实是一句空话。语言只负责描述一个程序,而程序运行的速度,其实绝大部分不取决于语言。它主要取决于1)算法和2)编译器的质量。编译器和语言基本是两码事。同一个语言可以有很多不同的编译器实现,每个编译器生成的代码质量都可能不同,所以你没法说“A 语言比B 语言快”。你只能说“A 语言的X 编译器生成的代码,比B 语言的Y 编译器生成的代码高效”。这几乎等于什么也没说,因为B 语言可能会有别的编译器,使得它生成更快的代码。

我举个例子吧。在历史上,Lisp 语言享有“龟速”的美名。有人说“Lisp 程序员知道每个东西的值,却不知道任何事情的代价”,讲的就是这个事情。但这已经是很久远的事情了,现代的Lisp 系统能编译出非常高效的代码。比如商业的Chez Scheme 编译器,能在5秒钟之内编译它自己,编译生成的目标代码非常高效。它可以直接把Scheme 程序编译到多种处理器的机器指令,而不通过任何第三方软件。它内部的一些算法,其实比开源的LLVM 之类的先进很多。

另外一些函数式语言也能生成高效的代码,比如OCaml。在一次程序语言暑期班上,Cornell 的Robert Constable 教授讲了一个故事,说是他们用OCaml 重新实现了一个系统,结果发现OCaml 的实现比原来的C 语言实现快了50 倍。经过C 语言的那个小组对算法多次的优化,OCaml 的版本还是快好几倍。这里的原因其实在于两方面。第一是因为函数式语言把程序员从底层细节中解脱出来,让他们能够迅速的实现和修改自己的想法,所以他们能够迅速的找到更好的算法。第二是因为OCaml 有高效的编译器实现,使得它能生成很好的代码。

从上面的例子,你也许已经可以看出,其实接近底层的语言不一定速度就快。因为编译器这种东西其实可以有很高级的“智能”,甚至可以超越任何人能做到的底层优化。但是编译器还没有发展到可以代替人来制造算法的地步。所以现在人需要做的,其实只是设计和优化自己的高层算法。

3. 高级过程式语言

很早的时候,国内计算机系学生的第一门编程课都是Pascal。Pascal 是很不错的语言,可是很多人当时都没有意识到。上大学的时候,我的Pascal 老师对我们说:“我们学校的教学太落后了。别的学校都开始教C 或者C++ 了,我们还在教Pascal。”现在真正理解了程序语言的设计原理以后我才真正的感觉到,原来Pascal 是比C 和C++ 设计更好的语言。它不但把人从底层细节里解脱出来,没有面向对象的思维枷锁,而且有一些很好的设计,比如强类型检查,嵌套函数定义等等。可是计算机的世界真是谬论横行,有些人批评Pascal,把优点都说成是缺点。比如Brain Kernighan 的这篇《Why Pascal is Not My Favorite Programming Language》,现在看来真是谬误百出。Pascal 现在已经几乎没有人用了。这并不很可惜,因为它被错怪的“缺点”其实已经被正名,并且出现在当今最流行的一些语言里:Java, Python, C#, ……

4. 函数式语言

函数式语言相对来说是当今最好的设计,因为它们不但让人专注于算法和对问题的解决,而且没有面向对象语言那些思维的限制。但是需要注意的是并不是每个函数式语言的特性都是好东西。它们的支持者们经常把缺点也说成是优点,结果你其实还是被挂上一些不必要的枷锁。比如  OCaml 和SML,因为它们的类型系统里面有很多不成熟的设计,导致你需要记住太多不必要的规则。

5. 逻辑式语言

逻辑式语言(比如Prolog)是一种超越函数式语言的新的思想,所以需要一些特殊的训练。逻辑式语言写的程序,是能“反向运行”的。普通程序语言写的程序,如果你给它一个输入,它会给你一个输出。但是逻辑式语言很特别,如果你给它一个输出,它可以反过来给你所有可能的输入。其实通过很简单的方法,可以不费力气的把程序从函数式转换成逻辑式的。但是逻辑式语言一般要在“pure”的情况下(也就是没有复杂的赋值操作)才能反向运行。所以学习逻辑式语言最好是从函数式语言开始,在理解了递归,模式匹配等基本的函数式编程技巧之后再来看Prolog,就会发现逻辑式编程简单了很多。

从何开始

可是学习编程总要从某种语言开始。那么哪种语言呢?就我的观点,首先可以从Scheme 入门,然后学习一些Haskell (但不是全部),之后其它的也就触类旁通了。你并不需要学习它们的所有细枝末节,而只需要学习最精华的部分。所有剩余的细节,会在实际使用中很容易的被填补上。现在我推荐几本比较好的书。

《The Little Schemer》(TLS):我觉得Dan Friedman 的The Little Schemer 是目前最好,最精华的编程入门教材。这本书很薄,很精辟。它的前身叫《The Little Lisper》。很多资深的程序语言专家都是从这本书学会了Lisp。虽然它叫“The Little Schemer”,但它并不使用Scheme 所有的功能,而是忽略了Scheme 的一些毛病,直接进入最关键的主题:递归和它的基本原则。

《Structure and Interpretation of Computer Programs | 计算机程序的构造和解释》(SICP):The Little Schemer 其实是比较难的读物,所以我建议把它作为下一步精通的读物。SICP 比较适合作为第一本教材。但是我需要提醒的是,你最多只需要看完前三章。因为从第四章开始,作者开始实现一个Scheme 解释器,但是作者的实现并不是最好的方式。你可以从别的地方更好的学到这些东西。不过也许你可以看完SICP 第一章之后就可以开始看TLS。

《A Gentle Introduction to Haskell》:对于Haskell,我最开头看的是A Gentle Introduction to Haskell,因为它特别短小。当时我已经会了Scheme,所以不需要再学习基本的函数式语言的东西。我从这个文档学到的只不过是Haskell 对于类型和模式匹配的概念。

过度到面向对象语言

那么如果从函数式语言入门,如何过渡到面向对象语言呢?毕竟大部分的公司用的是面向对象语言。如果你真的学会了函数式语言,就会发现面向对象语言已经易如反掌。函数式语言的设计比面向对象语言简单和强大很多,而且几乎所有的函数式语言教材(比如SICP)都会教你如何实现一个面向对象系统。你会深刻的看到面向对象的本质以及它存在的问题,所以你会很容易的搞清楚怎么写面向对象的程序,并且会发现一些窍门来避开它们的局限。你会发现,即使在实际的工作中必须使用面向对象语言,也可以避免面向对象的思维方式,因为面向对象的思想带来的大部分是混乱和冗余。

深入本质和底层

那么是不是完全不需要学习底层呢?当然不是。但是一开头就学习底层硬件,就会被纷繁复杂的硬件设计蒙蔽头脑,看不清楚本质上简单的原理。在学会高层的语言之后,可以进行“语义学”和“编译原理”的学习。

简言之,语义学(semantics) 就是研究程序的符号表示如何对机器产生“意义”,通常语义学的学习包含lambda calculus 和各种解释器的实现。编译原理(compilation) 就是研究如何把高级语言翻译成低级的机器指令。编译原理其实包含了计算机的组成原理,比如二进制的构造和算术,处理器的结构,内存寻址等等。但是结合了语义学和编译原理来学习这些东西,会事半功倍。因为你会直观的看到为什么现在的计算机系统会设计成这个样子:为什么处理器里面有寄存器(register),为什么需要堆栈(stack),为什么需要堆(heap),它们的本质是什么。这些甚至是很多硬件设计者都不明白的问题,所以它们的硬件里经常含有一些没必要的东西。因为他们不理解语义,所以经常不明白他们的硬件到底需要哪些部件和指令。但是从高层语义来解释它们,就会揭示出它们的本质,从而可以让你明白如何设计出更加优雅和高效的硬件。

这就是为什么一些程序语言专家后来也开始设计硬件。比如Haskell 的创始人之一Lennart Augustsson 后来设计了BlueSpec,一种高级的硬件描述语言,可以100% 的合成(synthesis) 为硬件电路。Scheme 也被广泛的使用在硬件设计中,比如Motorola, Cisco 和曾经的Transmeta,它们的芯片设计里面含有很多Scheme 程序。

这基本上就是我对学习程序语言的初步建议。以后可能会就其中一些内容进行更加详细的阐述。

 

转自:http://www.2cto.com/kf/201208/147722.html

类型的本质和函数式实现

在上一篇文章《二叉树迭代器算法》中,我介绍了一种基于栈的二叉树迭代器实现。程序设计语言和Haskell大牛@九瓜 在看过之后评论到:

这里用了 stack 来做,有点偷懒,所以错失了一个抽象思考机会。如果我们能够理解二叉树到线性表的转换过程,完全可以把 Iterator 当作抽象的线性表来看,只要定义了关于 Iterator 的 empty, singleton, 还有 append 操作,实现二叉树的 Iterator 就变得非常直观。

“错失了一个抽象思考机会”是什么意思呢?我理解九瓜的意思是基于栈的实现虽然是正确的,但它缺乏对于迭代器类型本质的理解,不具有通用性。如果能对迭代器进行合适地抽象就可以像二叉树递归遍历一样自然地得出二叉树迭代器,甚至其他更复杂的数据结构,只要我们能写出它的遍历算法,迭代器算法都可以自然推出。

类型的本质

九瓜提到了通过empty, singleton和append操作对Iterator进行抽象,我本来打算直接根据这个思路介绍函数式的二叉树迭代器实现,但是考虑到其实首要的问题在于理解类型的本质,而并不是所有人都具备这个基础,不如先普及一下类型基础再进入具体实现。那么下面我们就先来认识一下类型到底是什么?我们先以来看看表示元素对的Pair类型,可能有人一提到Pair类型马上就会在脑海中浮现出下面的结构:

 

struct Pair {
    int left;
    int right;
}

其实,这种理解是非本质的,Pair完全可以用2个元素的数组来表示,第一个元素表示left,第二个元素表示right:

struct Pair {
    int elements[2];
}

上面的两种不同表示是类型的不同实现,而类型的本质是由操作(Operation)和操作间的关系或不变式(Invariant)所定义的,我们称之为类型规范(Type Specification)。比如,Pair类型是这样定义的:

Type Pair:
    Operations:
        Pair make_pair(int x, int y)
        int get_left(Pair pair)
        int get_right(Pair pair)
    Invariants:
        get_left(make_pair(x, y)) == x  //对x, y构造的Pair取左元素等于x
        get_right(make_pair(x, y)) == y  //对x, y构造的Pair取右元素等于y

也就是说只要是满足Pair类型规范,即定义了make_pair,get_left, get_right这3种操作,并且这些操作满足上面两个不变式,那么它这就是Pair类型。我们再来看看稍微复杂一点的Stack类型:

Type Stack:
    Operations:
        Stack make_stack()  //构造空栈
        Stack push(Stack stack, int x)  //将元素x压栈,返回新栈
        int top(stack)  //获取栈顶元素
        Stack pop(Stack stack)  //将栈顶元素弹栈,返回新栈
    Invariants:
        top(push(stack, x)) == x  //栈顶元素为最后一次压栈值
        pop(push(stack, x)) == stack  //对stack压栈后再弹栈等于原来的stack

Stack类型规范简言之就是FILO(先入后出),如果要形式化就是上面的不变式。为了加深理解,我们现在切换到测试视角来看一看,如果请你来编写一个Stack类的单元测试用例,你应该怎么写呢?许多朋友都不清楚单元测试到底测什么?怎么测?我见过不少人用一个测试用例单独对push做测试,用另一个测试用例对pop单独做测试,其主要原因就在于缺乏对类型本质的理解。其实,只要理解了类型的本质我们就知道孤立地看push或pop是没有任何意义的,它们的意义是在FILO关系下相互解释的,所以测试当然是基于类型规范来测试FILO不变式!这种基于类型规范的测试是一种黑盒测试,与类型的内部实现细节无关,只要单元测试覆盖了类型所定义的所有操作和不变式,那么不管内部怎么实现或优化,测试用例都不需要调整。反之,如果深入到了类型的内部实现做白盒测试,那这样的测试用例实际上就不再是反映其类型规范了,它会随着实现细节的调整而失效。

更深一层来看,不仅是在Pair,Stack这样的微观层面,在一个系统的宏观层面同样可以采用类型视角,即考察系统定义了哪些操作?这些操作之间有什么样的关系或不变式?比如,你如何从类型的角度来看待MySQL这样一套数据库系统?MySQL系统定义了哪些操作?这些操作之间必须满足怎样的关系和不变式?不仅如此,类型视角除了可以应用于计算机系统,甚至还可以应用于生活中的事物,比如,你到超市购物可以寄存自己的包,在寄包的时候会获得一张密码条,取包时可以通过密码条打开箱子。你能从超市寄包这个例子中看出类型来吗?如果你看出来了,说明你对类型的理解真正融会贯通了!

类型的函数式实现

上面我们介绍了类型的本质在于操作和操作间的关系,下面我们要关注的是类型的实现。在上面的例子中,Pair的内部结构到底是什么,是一个left和一个right成员?还是一个两元素的数组?没有讲,也没关系,就好像Windows的Handle和Linux的FileDescriptor一样,它们都是一个标识,你并不需要关心它的值本身,你只需要用几个相关的函数创建和操作它就行了(上面超市寄包例子中的密码条和Windows中的Handle是什么关系,你看出来了吗?你需要理解密码条的内容吗?)。换句话说,只要满足类型规范,具体实现是可变的,使用者只依赖于类型规范而不依赖于其具体实现。这在面向对象语言中意味着接口保持不变而具体实现可以变化(这里把public方法视为一种广义的接口)。

下面,我们还会看到的是不仅类型的内部实现可以变化,而且可以根本没有什么所谓的内部实现。这是什么意思呢?让我们来思考一下,是不是Pair内部一定要有什么东西来保存构造函数传入的left和right?我们能跳出这个定势吗?在函数式编程中,我们能做到:

//Javascript
function make_pair(x, y) {
    // 返回一个支持get_left和get_right操作的闭包(Closure)
    return {
        get_left : function() { return x },
        get_right : function() { return y }
    }
}
function get_left(pair) {
    return pair.get_left();
}
function get_right(pair) {
    return pair.get_right();
}
// Test case
console.log(get_left(make_pair(1, 2))) //1
console.log(get_right(make_pair(1, 2))) //2

上面的关键代码在于make_pair的内部返回的不是一种具体的数据结构,而是一个支持get_left和get_right操作的闭包(Closure),将来可以通过get_left和get_right来提取x, y。这种基于闭包的实现和我们通常采用的基于数据结构的实现的本质区别在哪里呢?不难发现,基于闭包的实现和类型规范是直接对应的,它并没有引入类型规范之外的东西,而基于数据结构的实现则隐藏了实现的细节。换句话说,如果要验证实现代码的正确性,对于前者只需要比对着类型规范,对于后者我们可能需要去仔细理解推敲其所采用的数据结构。对于Pair这样简单的结构二者差别不大,甚至基于数据结构的实现更简单,但是对于复杂的类型就容易体现出闭包实现的优势了。为了加深理解,我们再来看一个Stack的函数式实现:

//Javascript
function make_stack() {
    return null
}
function push(stack, x) {
    return {
        top : function() { return x },
        pop : function() { return stack }
    }
}
function top(stack) {
    return stack.top()
}
function pop(stack) {
    return stack.pop()
}
// Test case
var stack = make_stack()
stack = push(stack, 1)
stack = push(stack, 2)
stack = push(stack, 3)
console.log(top(stack)) //3
stack = pop(stack)
console.log(top(stack)) //2
stack = push(stack, 4)
console.log(top(stack)) //4

上面的所有函数都是采用了无副作用的纯函数式设计,可能习惯面向对象编程的朋友不是很习惯,不过这不影响我们对类型的讨论,而且它也很容易改造成面向对象的风格,感兴趣的朋友可以自己尝试对上面的代码进行简单的包装让它看起来像面向对象的样子。

函数式二叉树迭代器

上面我们介绍了类型的本质和函数式实现,下面我们再来看看Iterator类型又该如何定义和实现呢? 思路当然还是从操作入手,考虑Iterator类型对应了哪些操作,它们的关系是什么?上面九瓜提示了Iterator类型可以抽象为线性表List类型,或者说Iterator本质上是一个List。为什么呢?其实,只要跳出“如何表示数据结构”的思维,从类型角度思考就很容易理解,因为Iterator和List都定义了相同的操作,Iterator的使用者完全不关心也不知道它背后到底是链表还是二叉树,你对Iterator的操作和一个List的操作完全一样。正是这个原因,STL等范型库才能通过Iterator将算法和数据结构解耦。

怎么定义一个List类型呢?九瓜提到的empty(), singleton()和append()实际上就是和List打交道最多的Lisp语言的经典定义方式。Lisp是基于s-expression的,s-expression既可以视为线性表又可以视为树,本质上Lisp为List类型了构造、取首元素和取剩余元素等几种操作:

Type List:
    Operations:
        List empty()  //构造空表,通常由()这个文字量表示
        List singleton(Element e)  //构造包含一个元素的表,通常由(e)这个文字量表示
        Element first(List list)   //取list的第一个元素,通常又叫car操作
        List rest(List list)  //取list除第一个元素外的剩余部分,通常又叫cdr操作
        List append(List list1, List list2) //连接两个表
    Invariants:
        append(empty(), list) == list  //空表和表list连接后等于表list
        append(list, empty()) == list  //空表和表list连接后等于表list
        first(singleton(e)) == e  //对singleton(e)取首元素等于e
        rest(singleton(e)) == empty()  //对singleton(e)取首元素外的剩余部分的结果为空表
        append(first(list), rest(list)) == list  //对list的首尾两部分进行连接等于list本身
        if list1 is not empty then
            first(append(list1, list2)) == first(list1)  //对非空表list1于表list2的连接取首元素等于对非空表list1取首元素
        if list1 is not empty then
            rest(append(list1, list2)) == append(rest(list1), list2)  //对非空表list1于表list2的连接取首元素等于对非空表list1取首元素

有了上面的分析,我们相应地写出下面的List实现:

//Javascript
function empty() {
    return null
}
function singleton(e) {
    return {
        first: function() { return e },
        rest: function() { return null }
    }
}
function first(list) {
    return list.first()
}
function rest(list) {
    return list.rest()
}
function append(list1, list2) {
    if (null == list1) return list2
    if (null == list2) return list1

    return {
        first : function() { return first(list1) },
        rest : function() { return append(rest(list1), list2) }
    }
}

在此基础上可以进一步实现二叉树迭代器:

function make_binary_tree_iterator(node) {
    return {
        first : function() {
            return null != node.left ? first(make_binary_tree_iterator(node.left)) : node
        },
        rest : function() {
            var left_it = (null == node.left ? null : make_binary_tree_iterator(node.left))
            var root_it = singleton(node)
            var right_it = (null == node.right ? null : make_binary_tree_iterator(node.right))
            var it = append(append(left_it, root_it), right_it)
            return rest(it)
        }
    }
}
//======== Test case ========
var tree = {
    value : 1,
        left : {
            value : 2,
            left : { value : 4, left : null, right : null },
            right : null
        },
        right : {
            value : 3,
            left : null,
            right : { value : 7, left : null, right : null }
    }
}
for (var it = make_binary_tree_iterator(tree); null != it; it = rest(it)) {
    console.log(first(it).value)
}

上面的make_binary_tree_iterator在List类型的基础上按照二叉树遍历过程构造了一个List。不知道你是否注意到了,为什么它不像下面这个例子一样直接返回一个List,而要构造一个闭包呢?

function make_binary_tree_iterator(node) {
    var left_it = (null == node.left ? null : make_binary_tree_iterator(node.left))
    var root_it = singleton(node)
    var right_it = (null == node.right ? null : make_binary_tree_iterator(node.right))
    return append(append(left_it, root_it), right_it)
}

这里关键的区别在于闭包是惰性求值的,也就是说只有当真正开始迭代遍历的时候才会逐渐对各个函数进行求值,而上面的函数递归调用是非惰性的,会从一开始就把所有结点展开成线性表。如果你对这一点还不能很好地理解,可以尝试在各个函数中加log跟踪函数的调用过程。

总结

本文介绍了类型的本质在于它所定义的操作以及操作之间的关系和不变式。类型的实现关键在于满足类型规范的要求,而具体实现是可以变化的,使用者和测试用例都应该只依赖于类型规范而不依赖于具体实现。函数式的类型实现往往和类型规范是直接对应的,简单通用,但可能有性能问题,而命令式的类型实现往往会引入复杂的内部数据结构,但是其优点是高效。这两种实现并不是完全互斥的,有时候可以将二者相结合达到简单与高效的结合。

from:http://coolshell.cn/articles/10169.html  酷 壳 – CoolShell.cn