Reading Thinking Writing

Saturday, May 9, 2009

A First Look at Event-driven Architecture

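I have recently been reading the MINA source code, whose I/O handling is event-driven. Although we use an event-driven design in our architecture at work, I never understood the principles behind it very well. I took this opportunity to read a good deal of related material, and I now have a reasonably clear picture. I still have not found a definitive answer, though, to why the event-driven approach is so efficient and uses so few resources. Nginx also handles requests in an event-driven way, with very good results.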

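The following is drawn from Wikipedia. The quoted passages are the original text, and the lines marked "Note:" are my own understanding.
Event-driven Architecture
An event is a significant change in state.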

Event structure
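An event can be made of two parts, header and body. The event header might include information such as event name, timestamp for the event, and type of event. The event body is the part that describes the fact that has happened in reality.
Note: this is how the data inside an event is organized. It includes information describing the event itself (name, time, type, status, and so on) and the data that the business logic needs to process.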

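Event flow layers (the layered structure of an event-driven architecture)
1. Event generator

Note: this layer generates events, each one capturing the current fact. A major design and implementation challenge at this layer is converting the data produced by many different applications into a standardized form.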

2. Event channel
An event channel is a mechanism whereby the information from an event generator is transferred to the event engine or sink.
Note: this is the channel through which events travel. It can be implemented in many ways.

3. Event processing engine
The event processing engine is where the event is identified, and the appropriate reaction is selected and executed.

Note: the engine receives events, chooses a processing strategy according to the event type, and processes the event.

4. Downstream event-driven activity
This is where the consequences of the event are shown.

Event processing styles

Simple event processing

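Simple event processing concerns events that are directly related to specific, measurable changes of condition.
Note: in this style, an event is handled directly according to its type as soon as it is received; that is, classifying the event and processing it happen in the same place.

the application has a main loop which is clearly divided down to two sections: the first is event selection (or event detection), and the second is event handling

Note: this style separates event classification from the actual business processing. The classification step can be seen as an event dispatcher that routes each event to the appropriate handler. As I understand it, this is both more efficient and easier to extend.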
The diagram that originally appeared here came from http://eventdrivenpgm.sourceforge.net/
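The split between event selection and event handling can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names Event, Dispatcher, and on_order_created, and the "order.created" event type, are hypothetical and do not come from MINA, Nginx, or the site above. A deque stands in for the event channel.

```python
from collections import deque
from dataclasses import dataclass, field
import time


@dataclass
class Event:
    # Header: information describing the event itself.
    name: str
    event_type: str
    timestamp: float = field(default_factory=time.time)
    # Body: the data the business logic works on.
    body: dict = field(default_factory=dict)


class Dispatcher:
    """Selects handlers by event type, then hands the event to them."""

    def __init__(self):
        self._handlers = {}
        self._queue = deque()  # stands in for the event channel

    def register(self, event_type, handler):
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event):
        self._queue.append(event)

    def run_until_empty(self):
        results = []
        while self._queue:
            event = self._queue.popleft()  # event selection
            for handler in self._handlers.get(event.event_type, []):
                results.append(handler(event))  # event handling
        return results


def on_order_created(event):
    # Hypothetical business handler.
    return f"reserve stock for order {event.body['order_id']}"


dispatcher = Dispatcher()
dispatcher.register("order.created", on_order_created)
dispatcher.publish(Event(name="new order", event_type="order.created", body={"order_id": 42}))
dispatcher.publish(Event(name="noise", event_type="unknown"))  # no handler, silently dropped
print(dispatcher.run_until_empty())
```

Adding a new kind of event only requires another register call; the main loop itself does not change, which is the extensibility argument made above.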

Below are some key points compiled from another series of articles.
Architectures are composed of the following:
1. Components: The elements that form a system. For example, the components of a bridge are beams, cables, and trusses.
2. Compositional operators: The mechanisms for plugging components together to get other components. For example, smaller trusses can be welded together to form larger trusses.
3. Contracts: The specifications for what users of components can expect from them.

EDA has two components: streams and agents (also called processes). A stream is a sequence of messages that have a common schema. Streams are either event streams or control streams. Messages in event streams contain information about the system state.

There are three types of agents:
* Sensors (a sensor is a source of data)
* Responders
* Processing agents (or EPAs for event processing agents)

Sensors monitor the environment and generate event messages that contain information about the attributes that they monitor.
A responder receives an event stream and modifies the state of the environment based on the stream.
One can think of responders in two ways: either as a software module that receives an event stream and controls a device based on the events it receives, or as the device itself.
Event-processing agents receive multiple streams of messages, process them, and generate message streams in turn.

In animals, sensors are eyes and other sensory organs, the nervous system is a network of EPAs, and muscles are responders.
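To make the three agent types concrete, here is a small sketch that chains a sensor, a processing agent, and a responder using Python generators as the streams. The thermometer scenario, the message fields, and all function and class names are assumptions made up for illustration; the articles do not prescribe them.

```python
def temperature_sensor(readings):
    """Sensor: turns raw readings into event messages with a common schema."""
    for seq, celsius in enumerate(readings):
        yield {"source": "thermometer", "seq": seq, "celsius": celsius}


def overheating_agent(stream, threshold):
    """Processing agent: consumes an event stream and emits a derived stream."""
    for message in stream:
        if message["celsius"] > threshold:
            yield {"alert": "overheating", "seq": message["seq"]}


class FanResponder:
    """Responder: changes the state of the environment based on a stream."""

    def __init__(self):
        self.fan_on = False
        self.handled = []

    def consume(self, stream):
        for message in stream:
            self.fan_on = True
            self.handled.append(message["seq"])


fan = FanResponder()
fan.consume(overheating_agent(temperature_sensor([20.5, 31.0, 29.9, 35.2]), threshold=30.0))
print(fan.fan_on, fan.handled)
```

Each stage only knows the schema of the stream it reads, so stages can be swapped or combined without touching the others.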

A key point in such an event-driven system is that the absence of messages also conveys information.

references:
http://www.developer.com/design/article.php/3490671
http://www.developer.com/design/article.php/10925_3499031_1
http://elementallinks.typepad.com/bmichelson/2006/02/eventdriven_arc.html

Thursday, April 30, 2009

Installing OpenLDAP

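I had been reading about LDAP for a while, and after recently reinstalling Ubuntu I set out to install OpenLDAP on my own machine.
I ran into quite a few problems along the way, but with Google's help all of them were solved. The installation process is recorded below: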

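Things to watch for when installing Berkeley DB
Today I was installing OpenLDAP, which requires Berkeley DB to be installed first, so I started with BDB.
I followed the usual three steps for installing software on Linux (configure, make, make install), but the very first step, configure, failed with this error: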

Berkeley DB should not be built in the top-level or dist directories.

Adding -prefix=/usr/local/BerkeleyDB made no difference, so I searched online for a solution. The procedure is as follows:
# tar xvfz db-4.4.16.tar.gz
# cd db-4.4.16/build_unix/
# ../dist/configure -prefix=/usr/local/BerkeleyDB
# make
# sudo make install

With Berkeley DB installed, I went on to install OpenLDAP, but configure failed again, this time with:
configure: error: BDB/HDB: BerkeleyDB not available

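Berkeley DB was already installed, so why did this error still appear? More Googling turned up the following fix:
$ export CPPFLAGS="-I/usr/local/BerkeleyDB/include"
$ export LDFLAGS="-L/usr/local/BerkeleyDB/lib"

Note that the two commands above need to be run as root.
After running them I ran configure again. It still failed, but the error had changed to: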
configure: error: Berkeley DB version mismatch

After adding the following line, the installation went through correctly.
export LD_LIBRARY_PATH="/usr/local/ssl/lib:/usr/local/BerkeleyDB/lib"

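[Update, 2008-12-15]
Today I reinstalled OpenLDAP and hit another problem. I found someone's solution through Google and am adding it here in the hope that it helps others who run into the same thing.
When Berkeley DB 4.7 is used as the backend database, the make step of the OpenLDAP installation fails with the following error: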
/bin/sh ../../..//libtool --tag=disable-shared --mode=compile cc -g -O2 -I../../../include -I../../../include -I.. -I./.. -I/ceno/lab/berkeleydb/include -c init.c
cc -g -O2 -I../../../include -I../../../include -I.. -I./.. -I/ceno/lab/berkeleydb/include -c init.c -o init.o
init.c: In function `bdb_db_open':
init.c:509: structure has no member named `lk_handle'
init.c: In function `bdb_back_initialize':
init.c:752: warning: passing arg 1 of `db_env_set_func_yield' from incompatible pointer type
make[3]: *** [init.lo] Error 1
make[3]: Leaving directory `/work/ceno/lab/openldap-2.4.11/servers/slapd/back-bdb'
make[2]: *** [.backend] Error 1
make[2]: Leaving directory `/work/ceno/lab/openldap-2.4.11/servers/slapd'
make[1]: *** [all-common] Error 1
make[1]: Leaving directory `/work/ceno/lab/openldap-2.4.11/servers'
make: *** [all-common] Error 1

The mailing list currently has a workaround, but it is not an official fix and may cause other problems in use:
http://www.openldap.org/lists/openldap-bugs/200805/msg00154.html

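The specific cause of the error above appears to be that Berkeley DB 4.7 no longer has the lk_handle member, and OpenLDAP has not been updated to match.
The workaround is as follows:
cd openldap-2.4.11/servers/slapd/back-bdb/
In init.c and cache.c, change

#if DB_VERSION_FULL >= 0x04060012
to
#if 0 && DB_VERSION_FULL >= 0x04060012
After that, make completes successfully.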

OpenLDAP installation guide: http://www.openldap.org/doc/admin24/install.html

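Update, 2009/03/13
Today I looked up some material and worked out what the parameters set during installation mean.
Variables such as CPPFLAGS, LD_LIBRARY_PATH, and LDFLAGS need to be set when configure is run, because they affect the generated Makefile. Their meanings are as follows:

CPPFLAGS: command-line options for the C/C++ preprocessor.
LDFLAGS: command-line options for the linker.

LD_LIBRARY_PATH and the other PATH-style variables mean the following: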

CLASSPATH = specifies where the computer searches for java class libraries

LD_LIBRARY_PATH = specifies where the computer looks for dynamically-loaded libraries

PATH = specifies where the computer looks for executables

LIBPATH = also specifies where the computer looks for dynamically-loaded libraries (usually set just in case something doesn't support LD_LIBRARY_PATH)

Update:
This article was moved from my JavaEye blog to the current blog.

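A Clever Index

On a forum I came across what I think is a very clever way to create an index, and the discussion that followed was even better, so I have collected the essence of that thread here.
Here is the statement that creates the index:
create index idx_prodd on prodd (status,'1');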

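What is the point of creating this index? It turns out the DBA did it so that SQL statements whose WHERE clause tests status can still use an index for a fast lookup when status is null.
When Oracle builds a B-tree index, a row whose indexed column is null is not indexed, so a query with "this column is null" as its condition has to do a full table scan.
Creating a composite index with this statement guarantees that rows with a null status are still indexed, so the query can go through the index. The auxiliary constant column '1' also adds little to the size of the index. It really is a clever trick.
This suggests that, when designing table columns, null values should be avoided as far as possible in columns that will appear in query conditions. If a column could be null, give it a default value such as 'NULL', so that such rows can still be found through the index.
For this example, the following approach could achieve the same effect (existing null rows have to be updated before the not null constraint can be added):
alter table prodd modify status default 'NULL';
update prodd set status = 'NULL' where status is null;
alter table prodd modify status not null;
Then change where status is null in the SQL to where status = 'NULL'.
This uses a little more storage, but in my view the benefit far outweighs that cost.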
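The sentinel-default pattern can be tried out with Python's built-in sqlite3 module. This is only a sketch of the pattern: SQLite is used as a stand-in because it needs no server, and it does not reproduce Oracle's behaviour of leaving null keys out of a B-tree index. The index name idx_prodd_status and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prodd (id INTEGER PRIMARY KEY, "
    "status TEXT NOT NULL DEFAULT 'NULL')"
)
conn.execute("CREATE INDEX idx_prodd_status ON prodd (status)")

# Rows inserted without a status pick up the sentinel default.
conn.execute("INSERT INTO prodd (id) VALUES (1)")
conn.execute("INSERT INTO prodd (id, status) VALUES (2, 'ACTIVE')")
conn.execute("INSERT INTO prodd (id) VALUES (3)")

# The former "status IS NULL" query becomes an ordinary equality test.
missing = [row[0] for row in conn.execute(
    "SELECT id FROM prodd WHERE status = 'NULL' ORDER BY id")]
real_nulls = conn.execute(
    "SELECT COUNT(*) FROM prodd WHERE status IS NULL").fetchone()[0]
print(missing, real_nulls)
```

One thing to keep in mind with this design is that the string 'NULL' is now ordinary data, so every query and report has to agree on what it means.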

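objectClass and Attribute in LDAP

When I first started learning LDAP, the relationship between objectClass and Attribute kept confusing me, and none of the many Chinese-language materials I found answered the question. I have finally understood it thoroughly and decided to write it down so that later learners can avoid the detours. Interestingly, the relationship is very similar to some concepts in Java, so I will explain LDAP's objectClass and Attribute in Java terms.
Every Entry in LDAP must belong to some objectClass. In Java terms, the Entry corresponds to an instance, and the objectClass is naturally the class.
In Java, classes fall roughly into two kinds, abstract and concrete, and only a concrete class can be instantiated. In LDAP there are three kinds of objectClass: Abstract, Structural, and Auxiliary. They are defined as follows: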
* Abstract object classes are only intended to be extended by other object classes. An entry must not contain any abstract class unless it also contains a structural or auxiliary class that derives from that abstract class (i.e., includes a non-abstract object class which has the abstract class in its inheritance chain). All entries must contain at least the "top" abstract object class, in the inheritance chain for their structural class. They may or may not contain other abstract classes in the inheritance chains for their structural class or any of their auxiliary classes.

* Structural object classes are intended to define the crux of what an entry represents. Every entry must include exactly one structural object class chain, and the root of that chain must ultimately be the "top" abstract object class. The structural object class for an entry is not allowed to be changed.

* Auxiliary object classes are intended to define additional qualities of entries. An entry may contain zero or more auxiliary classes, and the set of auxiliary classes associated with an entry may change over time.

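Put simply: an Abstract class exists only to be inherited by other object classes, and it appears in an entry only when a Structural (or Auxiliary) object class inherits from it. To define an Entry there must be exactly one Structural ObjectClass. Top is the top-level Abstract ObjectClass. It defines a single MUST Attribute, ObjectClass, which is why some other Structural ObjectClass is needed before an Entry can be defined. ObjectClasses can also inherit from one another, in a way somewhat similar to Java: a child ObjectClass inherits all the Attributes of its parent ObjectClass.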

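Next, let us look at the relationship between ObjectClass and Attribute.
A Java class can contain several fields, and the business rules may make some of them required and others optional. LDAP has a similar arrangement: every ObjectClass defines a set of Attributes, and ObjectClass is itself one of those Attributes. The Attributes come in two kinds, MUST and MAY: MUST marks attributes the Entry is required to have, and MAY marks optional ones. The Attributes of an ObjectClass include both those inherited from its parent ObjectClasses and those it defines itself.
Here is an example: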
objectclass ( 2.5.6.0 NAME 'top' ABSTRACT
MUST objectClass )
objectclass ( 1.3.6.1.4.1.1466.344 NAME 'dcObject'
DESC 'RFC2247: domain component object'
SUP top AUXILIARY
MUST dc )
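Above are the definitions of two objectclasses. top is ABSTRACT and dcObject is AUXILIARY, and neither kind can define an Entry on its own. Importing the following LDIF file into LDAP fails:
dn: dc=java,dc=com
objectClass:dcObject
dc: java

To define this Entry, a STRUCTURAL ObjectClass has to be found.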
objectClasses: ( 2.5.6.4 NAME 'organization'
DESC 'RFC2256: an organization' SUP top STRUCTURAL
MUST o
MAY ( userPassword $ searchGuide $ seeAlso $ businessCategory
$ x121Address $ registeredAddress $ destinationIndicator
$ preferredDeliveryMethod $ telexNumber $ teletexTerminalIdentifier
$ telephoneNumber $ internationaliSDNNumber $ facsimileTelephoneNumber
$ street $ postOfficeBox $ postalCode $ postalAddress
$ physicalDeliveryOfficeName $ st $ l $ description ) )
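This objectClass is STRUCTURAL, so it can be used to define an Entry. The definition looks like this (the dc value is java so that it matches the leading dc=java component of the dn):
dn: dc=java,dc=com
objectClass:dcObject
objectClass:organization
dc: java
o: java.com

Here dc: java is the MUST Attribute of dcObject, and o: java.com is the MUST Attribute of organization.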
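The rules above can be expressed as a small checker. This is a simplified sketch, not how any LDAP server actually validates entries: the SCHEMA dictionary holds only the three classes discussed here, with only their MUST attributes, and the function names are hypothetical. It also counts structural classes directly, which is only valid here because no structural class in this toy schema inherits from another structural class.

```python
# Toy schema: only the three object classes discussed above, MUST attributes only.
SCHEMA = {
    "top": {"kind": "ABSTRACT", "sup": None, "must": {"objectClass"}},
    "dcObject": {"kind": "AUXILIARY", "sup": "top", "must": {"dc"}},
    "organization": {"kind": "STRUCTURAL", "sup": "top", "must": {"o"}},
}


def required_attributes(class_name):
    """Collect MUST attributes along the inheritance chain up to 'top'."""
    required = set()
    while class_name is not None:
        definition = SCHEMA[class_name]
        required |= definition["must"]
        class_name = definition["sup"]
    return required


def validate(entry):
    """Return a list of problems; an empty list means the entry passes."""
    problems = []
    classes = entry.get("objectClass", [])
    structural = [c for c in classes if SCHEMA[c]["kind"] == "STRUCTURAL"]
    if len(structural) != 1:
        problems.append("need exactly one structural objectClass")
    for c in classes:
        for attr in sorted(required_attributes(c)):
            if attr not in entry:
                problems.append(f"{c} requires attribute {attr}")
    return problems


auxiliary_only = {"objectClass": ["dcObject"], "dc": ["java"]}
with_structural = {
    "objectClass": ["dcObject", "organization"],
    "dc": ["java"],
    "o": ["java.com"],
}
print(validate(auxiliary_only))
print(validate(with_structural))
```

The first entry is rejected for lacking a structural class, and the second passes, which mirrors the two LDIF examples.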

P.S. This article was originally on my JavaEye blog. That blog is no longer updated, so I have moved it here.

Wednesday, April 29, 2009

Solving a Character Encoding Problem

A character set problem

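Over the past two days I was doing integration testing with a third-party platform and ran into a character encoding problem. After my own analysis and some communication with the other side, it was resolved properly. Character encoding can be infuriating, and only a real understanding of the mechanism lets you find the root cause quickly. Luckily I read a number of articles on character encoding last year, which gave me a clear picture of the subject, and that is what made a quick fix possible this time.
I am recording how I analysed the problem, and reflecting on what could be improved.
Background: the interface with the other platform exchanges data as XML messages sent by HTTP POST, and the message BODY is a DES-encrypted string. Because of a project requirement, the string is converted once before encryption: its byte[] is turned into hex characters, which are then DES-encrypted.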

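Symptoms: in the company's test environment, messages from the other side were parsed into the correct XML structure and processed normally, but the Chinese text in the resulting data was garbled. I also noticed, however, that on the production test machine the terminal showed the Chinese characters correctly.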

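Analysis and resolution: since the messages parsed correctly and the business processing was also correct, with only the Chinese text garbled, I concluded that the error was in our own internal processing. Because the internal processing also involves two different services talking to each other, I first assumed the problem arose when data was passed between them (I had no solid basis for this guess at the time, and it turned out to be wrong), so I debugged the receiving service. Debugging showed that the data it received was already garbled (every character had the value 65533, that is, U+FFFD, the Unicode replacement character). I asked the developer responsible for the interface between the two services; he said the data is passed over a socket, so an encoding problem there was impossible. That narrowed the problem down to the parsing step.
I then debugged the parsing code and found that in the string produced by DES decryption the Chinese characters were already garbled (all 65533). So I read through the decryption code, and there I found the key to the whole character set problem. Fortunately the other party had given us their encryption and decryption implementation; otherwise the problem would have been much harder to locate. The code converts a String to byte[] with str.getBytes(), which uses the JVM's default encoding. That reminded me that I had earlier seen correct Chinese printed on the HP-UX terminal, whereas on the Linux machine in our test environment it was garbled. I concluded that the garbling was caused by a mismatch between the JVM encoding used by the other side when encrypting and the encoding used by our test system when decrypting.
The other side's staff had already gone home, so I could not ask them which encoding they used. But since HP-UX parsed the data correctly, finding the encoding set when the JVM started on HP-UX would reveal the cause. Unfortunately HP-UX has no /proc file system like Linux's to record process information, and its ps command shows only a few characters of the command line, so after a lot of fiddling I still could not see the full cmdline. Then it occurred to me that the JVM startup parameters on Linux should be the same as on HP-UX, so I looked at the cmdline on Linux. It turned out that the JVM was started without a file.encoding parameter, which means it was using the operating system's default character set. On Linux that was C. I am not familiar with HP-UX and could not work it out there, so my attempt to find the encoding on my own failed, and I decided to ask the other side the next day.
The next morning their developer told me they use GBK. I immediately added the startup parameter -Dfile.encoding=GBK to the JVM and sent a test message again. The Chinese text in the newly generated data was no longer garbled, and the problem was finally solved.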

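Reflections:
The problem is solved, but reflecting on how it was solved matters more.
One new risk the fix may introduce: will switching the JVM to GBK affect other parts of the service?
Looking back, the culprit was String's getBytes() method. Perhaps the encoding should be specified explicitly whenever that method is used? When all the code runs in a single JVM this may not matter, because the encoding is always the same. But when several JVMs exchange data, this method needs some care.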
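The effect of a mismatched default charset can be reproduced in a few lines. The sketch below is a Python analogue, not the Java code from the project: encode and decode stand in for getBytes() and new String(bytes), "ascii" stands in for the default charset of a JVM running under the C locale, and the DES step is left out because it does not change the bytes being decoded.

```python
text = "中文"

# Sender side: bytes produced with the sender's charset (GBK in this story).
wire_bytes = text.encode("gbk")
hex_payload = wire_bytes.hex()  # the bytes-to-hex step done before encryption

# Receiver side: the same bytes come back, then get decoded.
received = bytes.fromhex(hex_payload)
wrong = received.decode("ascii", errors="replace")  # mismatched charset
right = received.decode("gbk")                      # charset agreed explicitly

print([ord(ch) for ch in wrong])  # every character is 65533 (U+FFFD)
print(right == text)
```

Naming the charset on both sides, as in the last decode call, removes the dependence on whatever default each machine happens to have. In Java the corresponding step would be passing a charset to getBytes and to the String constructor rather than relying on file.encoding.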
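Looking back at the process, there was also a lot of room for improvement in the analysis itself. For example, I ignored at the start the key fact that the encoding was correct on HP-UX, and only remembered it after much fiddling, once I had already traced the problem to the JVM encoding. The different decryption results on HP-UX and Linux were enough to conclude that no default character set was specified at JVM startup, without spending so much time hunting for the process's cmdline. With clear, rigorous thinking, these answers were only one step of inference away, yet I missed them while working on the problem. When facing a problem, keeping a clear head and thinking it through carefully and rigorously is more efficient than starting to experiment with a half-formed idea. These are things I need to improve in future work.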

Monday, April 27, 2009

Something about Database Design

I read some articles about how to design a relational database. Below are notes extracted from those articles; together they make a good guide to designing a database.

Normalization is the term used to describe how you break a file down into tables to create a database.
The goals of normalization:
1. Minimization of data redundancy
2. Minimization of data restructuring
3. Minimization of I/O by reduction of transaction sizes
4. Enforcement of referential integrity

The First Normal Form (1NF) addresses the structure of an isolated table.
The Second (2NF), Third (3NF), and Boyce-Codd (BCNF) Normal Forms address one-to-one and one-to-many relationships.
The Fourth (4NF) and Fifth (5NF) Normal Forms deal with many-to-many relationships.

Normal form definitions:
A table is said to be in First Normal Form (1NF), if all entries in it are scalar-valued. Relational database tables are 1NF by construction since vector-valued entries are forbidden.
A table is in Second Normal Form (2NF) if every non-key field is a fact about the entire key. In other words, a table is 2NF if it is 1NF and all non-key attributes are functionally dependent on the entire primary key (that is, the dependency is irreducible).
A relation is in Third Normal Form (3NF) if it is 2NF and none of its attributes is a fact about another non-key field. In other words, no non-key field functionally depends on any other non-key field.

Every field in a record must depend on The Key (1NF), the Whole Key (2NF), and Nothing But The Key (3NF).
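The "fact about the entire key" test can be checked mechanically against sample rows. The sketch below uses a hypothetical order-line table with the composite key (order_id, product_id); the table, column names, and the depends_on helper are made up for illustration. Note that sample data can only refute a functional dependency, never prove that it holds in general.

```python
def depends_on(rows, determinant, dependent):
    """True if equal determinant values always come with equal dependent values."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        value = tuple(row[c] for c in dependent)
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical table with composite key (order_id, product_id).
order_lines = [
    {"order_id": 1, "product_id": "A", "product_name": "Bolt", "qty": 10},
    {"order_id": 1, "product_id": "B", "product_name": "Nut", "qty": 5},
    {"order_id": 2, "product_id": "A", "product_name": "Bolt", "qty": 7},
]

# product_name is determined by part of the key alone: a 2NF violation.
name_on_part_of_key = depends_on(order_lines, ["product_id"], ["product_name"])
# qty is not determined by part of the key; it needs the whole key.
qty_on_part_of_key = depends_on(order_lines, ["product_id"], ["qty"])
qty_on_whole_key = depends_on(order_lines, ["order_id", "product_id"], ["qty"])
print(name_on_part_of_key, qty_on_part_of_key, qty_on_whole_key)
```

Moving product_name into a separate table keyed by product_id would remove the partial dependency and the repeated "Bolt" value.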

Some tips:
1. Write table names in all caps.
2. An attribute is a descriptive or quantitative characteristic of an entity. Write attribute names with an initial capital.
3. A primary key (PK) uniquely identifies each instance of an entity and should not change. It should be non-intelligent, meaning it carries no business meaning. An integer datatype is more efficient than a CHAR datatype.
4. A relationship is a logical link between entities. A one-to-many relationship can be implemented with a foreign key (FK).
5. A many-to-many relationship may be resolved by creating an intermediate entity known as a cross-reference (XREF) entity.
6. The value of an FK depends on the PK it references.
7. A relationship is either identifying or non-identifying.
8. Cardinality answers the question "How many instances of the child entity relate to each instance of the parent entity?"

The process of designing a database:
Define the tables
Define the attributes
Define the relationships

Everything here is taken from the articles referred to in this post: 10-useful-articles-about-database.

Saturday, April 11, 2009

Thoughts about Programming Languages

I have read an article about programming languages that impressed me a great deal. The author told the story of how he built a startup with Lisp and explained why people say Lisp is the most powerful language. He said that programming languages are not just technology but what programmers think in; they are half technology and half religion. I really agree with him.

A programming language affects your thinking. When you have a problem, you naturally want to solve it with the language you know best. Sometimes that is not the optimal solution, and it may even be the worst one. Every language has some good features. I am familiar with Java, so when I have a problem my spontaneous reaction is to look for a Java-style solution. Because I am not familiar with other languages, I cannot find another way to deal with the problem. That is my blind spot. But once I had learned a little shell, I could get things done with shell scripts, and sometimes the result is simpler and more compact than the Java version. It gave me a chance to look at problems from another side, which is very valuable. The lesson I learned is that the more you know, the wider you can think. Lisp has more powerful features than other languages, and if you are skilled in Lisp you will know more techniques than you could learn from other languages.

But wait a minute. Sometimes knowledge is not good for your thinking. There is a saying: if all you have is a hammer, everything looks like a nail. A familiar language (or familiar knowledge) can shape your thinking and sometimes leads you down the wrong road. You have a problem and you want to solve it with the knowledge you already have, so much so that you stop thinking about the essence of the problem. Yet we must have some knowledge to get anything done. How can we avoid this blind spot? I think the answer is critical thinking. When we learn a language or anything else, we should ask some questions: what are its advantages, what are its disadvantages, what is it good for, and what is it not suited to? If we know that, we can do the right things with the right tools.