Reading Thinking Writing: April 2009

Thursday, April 30, 2009

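Installing LDAP

I had been reading about LDAP for a while, and after recently reinstalling Ubuntu I set out to install OpenLDAP on my own machine.
I ran into quite a few problems along the way, but with Google's help all of them were solved. The installation process is recorded below:

Things to watch out for when installing Berkeley DB
Today I was installing OpenLDAP, which requires Berkeley DB to be installed first, so I installed BDB first.
I followed the usual three-step routine for installing software on Linux (configure, make, make install), but the very first step, configure, failed with this error: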

Berkeley DB should not be built in the top-level or dist directories.

Adding --prefix=/usr/local/BerkeleyDB did not help, so I looked for a solution online. The fix is to run configure from the build_unix directory:
$ tar xvfz db-4.4.16.tar.gz
$ cd db-4.4.16/build_unix/
$ ../dist/configure --prefix=/usr/local/BerkeleyDB
$ make
$ sudo make install

With Berkeley DB installed I went back to installing OpenLDAP, but configure failed again:
configure: error: BDB/HDB: BerkeleyDB not available

Berkeley DB was already installed, so why did this error still appear? More Googling turned up the following fix:
$ export CPPFLAGS="-I/usr/local/BerkeleyDB/include"
$ export LDFLAGS="-L/usr/local/BerkeleyDB/lib"

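Note that I ran the two commands above as root; what matters is that they are exported in the same shell that then runs configure.
After that I ran configure again. It still failed, but this time the error had changed to: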
configure: error: Berkeley DB version mismatch

After adding the following line, the installation succeeded.
export LD_LIBRARY_PATH="/usr/local/ssl/lib:/usr/local/BerkeleyDB/lib"

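[Update 2008-12-15]
Today I reinstalled OpenLDAP and hit another problem. I found someone's solution through Google and am adding it here in the hope that it helps others who run into the same issue.
With Berkeley DB 4.7 as the backend database, the make step of the OpenLDAP build fails with the following error: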
/bin/sh ../../..//libtool --tag=disable-shared --mode=compile cc -g -O2 -I../../../include -I../../../include -I.. -I./.. -I/ceno/lab/berkeleydb/include -c init.c
cc -g -O2 -I../../../include -I../../../include -I.. -I./.. -I/ceno/lab/berkeleydb/include -c init.c -o init.o
init.c: In function `bdb_db_open':
init.c:509: structure has no member named `lk_handle'
init.c: In function `bdb_back_initialize':
init.c:752: warning: passing arg 1 of `db_env_set_func_yield' from incompatible pointer type
make[3]: *** [init.lo] Error 1
make[3]: Leaving directory `/work/ceno/lab/openldap-2.4.11/servers/slapd/back-bdb'
make[2]: *** [.backend] Error 1
make[2]: Leaving directory `/work/ceno/lab/openldap-2.4.11/servers/slapd'
make[1]: *** [all-common] Error 1
make[1]: Leaving directory `/work/ceno/lab/openldap-2.4.11/servers'
make: *** [all-common] Error 1

A workaround has been posted on the mailing list. It is not an official fix, though, and may cause other problems in use:
http://www.openldap.org/lists/openldap-bugs/200805/msg00154.html

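The cause appears to be that Berkeley DB 4.7 no longer has the lk_handle structure member, while OpenLDAP has not been updated to match.
The workaround is as follows:
cd openldap-2.4.11/servers/slapd/back-bdb/
In init.c and cache.c, change

#if DB_VERSION_FULL >= 0x04060012
to
#if 0 && DB_VERSION_FULL >= 0x04060012
After that, make completes successfully.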

OpenLDAP installation guide: http://www.openldap.org/doc/admin24/install.html

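Update 2009/03/13
Today I did some reading and worked out what the variables set during installation actually mean.
CPPFLAGS and LDFLAGS need to be set when configure runs, and they affect the generated Makefile. LD_LIBRARY_PATH is not written into the Makefile; it tells the dynamic loader where to find shared libraries at run time, which includes the test programs that configure itself runs. Their meanings are:

CPPFLAGS: command-line options for the C/C++ preprocessor.
LDFLAGS: command-line options for the linker.

LD_LIBRARY_PATH and the other PATH-style variables mean the following: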

CLASSPATH = specifies where the computer searches for Java class libraries

LD_LIBRARY_PATH = specifies where the computer looks for dynamically-loaded libraries

PATH = specifies where the computer looks for executables

LIBPATH = also specifies where the computer looks for dynamically-loaded libraries (usually set just in case something doesn't support LD_LIBRARY_PATH)

Update:
Moved this post from my JavaEye blog to this blog.

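A Clever Index

On a forum I came across what I think is a very clever way of creating an index, and the discussion that followed was even better, so here is a summary of the key points of that thread.
Here is the statement that creates the index:
create index idx_prodd on prodd (status,'1');

What is the point of this index? It turns out the DBA created it so that queries whose WHERE clause tests status for null can still use an index for a fast lookup.
When Oracle builds a B-tree index, rows in which the indexed column is null are not indexed, so a query with the condition status is null has to do a full table scan.
This statement instead creates a composite index, which guarantees that rows with a null status are still indexed and can therefore be found through the index. The auxiliary constant '1' adds little to the size of the index, so it really is a smart trick.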
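This suggests that when designing a table, columns that will appear in query conditions should avoid null values where possible. If a column could otherwise be null, give it a default value such as 'NULL' so that queries on it can use the index.
For this example, the same effect could be achieved as follows (the default is set and existing nulls are filled in first, since the not null constraint cannot be added while null rows exist):
alter table prodd modify status default 'NULL';
update prodd set status = 'NULL' where status is null;
alter table prodd modify status not null;
Then change where status is null in the SQL to where status = 'NULL'.
This uses a little more storage, but the benefit far outweighs that cost.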

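objectClass and Attribute in LDAP

When I first started learning LDAP, the relationship between objectClass and Attribute kept confusing me, and none of the many Chinese-language resources I found answered the question. I have finally worked it out and decided to write it down so that later learners can avoid the detour. Interestingly, the relationship closely resembles some concepts in Java, so I will use Java to explain objectClass and Attribute in LDAP.
Every Entry in LDAP must belong to some objectClass. In Java terms, the Entry corresponds to an instance, and the objectClass is naturally the class.
In Java, classes fall roughly into two kinds, abstract and concrete, and only concrete classes can be instantiated. In LDAP there are three kinds of objectClass: Abstract, Structural and Auxiliary. They are defined as follows: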
* Abstract object classes are only intended to be extended by other object classes. An entry must not contain any abstract class unless it also contains a structural or auxiliary class that derives from that abstract class (i.e., includes a non-abstract object class which has the abstract class in its inheritance chain). All entries must contain at least the "top" abstract object class, in the inheritance chain for their structural class. They may or may not contain other abstract classes in the inheritance chains for their structural class or any of their auxiliary classes.

* Structural object classes are intended to define the crux of what an entry represents. Every entry must include exactly one structural object class chain, and the root of that chain must ultimately be the "top" abstract object class. The structural object class for an entry is not allowed to be changed.

* Auxiliary object classes are intended to define additional qualities of entries. An entry may contain zero or more auxiliary classes, and the set of auxiliary classes associated with an entry may change over time.

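In short: an Abstract class exists only to be inherited by other object classes, and it appears in an entry only when a Structural or Auxiliary object class in that entry derives from it. Defining an Entry requires exactly one Structural objectClass. top is the top-level Abstract objectClass, and it defines one MUST attribute, objectClass, which is why some other Structural objectClass is needed before an Entry can be defined. Object classes can also inherit from one another, somewhat as in Java: a child objectClass inherits all the attributes of its parent objectClass.

Next, the relationship between objectClass and Attribute.
Just as a Java class can contain several fields, some of which the business rules make mandatory and others optional, each objectClass in LDAP defines a set of attributes, one of which may be objectClass itself. These attributes come in two kinds, MUST and MAY: MUST attributes are required in the Entry, MAY attributes are optional. The attributes of an objectClass comprise everything inherited from its parent objectClasses plus those it defines itself.
Here is an example: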
objectclass ( 2.5.6.0 NAME 'top' ABSTRACT
MUST objectClass )
objectclass ( 1.3.6.1.4.1.1466.344 NAME 'dcObject'
DESC 'RFC2247: domain component object'
SUP top AUXILIARY
MUST dc )
These are two objectclass definitions: top is ABSTRACT and dcObject is AUXILIARY, and neither kind can define an Entry on its own. Importing the following LDIF file into LDAP fails:
dn: dc=java,dc=com
objectClass: dcObject
dc: java

To define this Entry, a STRUCTURAL objectClass is needed.
objectClasses: ( 2.5.6.4 NAME 'organization'
DESC 'RFC2256: an organization' SUP top STRUCTURAL
MUST o
MAY ( userPassword $ searchGuide $ seeAlso $ businessCategory
$ x121Address $ registeredAddress $ destinationIndicator
$ preferredDeliveryMethod $ telexNumber $ teletexTerminalIdentifier
$ telephoneNumber $ internationaliSDNNumber $ facsimileTelephoneNumber
$ street $ postOfficeBox $ postalCode $ postalAddress
$ physicalDeliveryOfficeName $ st $ l $ description ) )
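This objectClass is STRUCTURAL, so it can be used to define an Entry. The definition looks like this:
dn: dc=java,dc=com
objectClass: dcObject
objectClass: organization
dc: java
o: java.com

Here dc: java is the MUST attribute of dcObject (its value has to match the dc=java RDN in the dn), and o: java.com is the MUST attribute of organization.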
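Since the post leans on the Java analogy, the "exactly one Structural objectClass" rule can be sketched in Java. This is only a simplified model, not the API of any LDAP library: the class EntryCheck, the SCHEMA map and the method hasExactlyOneStructural are hypothetical names made up for illustration, and the check counts structural classes directly instead of following inheritance chains the way a real server does.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the rule described above; not an LDAP library API.
class EntryCheck {
    enum Kind { ABSTRACT, STRUCTURAL, AUXILIARY }

    // Kinds of the three object classes quoted in this post.
    static final Map<String, Kind> SCHEMA = Map.of(
            "top", Kind.ABSTRACT,
            "dcObject", Kind.AUXILIARY,
            "organization", Kind.STRUCTURAL);

    // Simplification: accept an entry only if exactly one listed class is STRUCTURAL.
    // A real server compares whole inheritance chains, which this sketch ignores.
    static boolean hasExactlyOneStructural(List<String> objectClasses) {
        long structural = objectClasses.stream()
                .filter(name -> SCHEMA.get(name) == Kind.STRUCTURAL)
                .count();
        return structural == 1;
    }

    public static void main(String[] args) {
        // The first LDIF in this post: dcObject only.
        System.out.println(hasExactlyOneStructural(List.of("dcObject")));                 // false
        // The second LDIF: dcObject plus organization.
        System.out.println(hasExactlyOneStructural(List.of("dcObject", "organization"))); // true
    }
}
```

The two calls in main mirror the two LDIF files: the entry with only dcObject is rejected, and adding organization makes it acceptable.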

P.S. This post was originally on my JavaEye blog. That blog is no longer updated, so I have moved it here.

Wednesday, April 29, 2009

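Solving a Character Encoding Problem

A character set problem

Over the past two days I was doing integration testing with a third-party platform and ran into a character encoding problem. After my own analysis and some communication with the other side, it was resolved properly. Character encoding issues are infuriating, and only by really understanding the underlying mechanism can you find the root cause quickly. Fortunately I read quite a few articles on character encoding last year, which gave me a clear picture of the topic and is the main reason I could solve this one quickly.
Here I record my analysis while solving the problem, and reflect on what could be improved.
Background: the interface with the other platform exchanges data as XML messages sent by HTTP POST, and the message BODY is a DES-encrypted string. Because of a project requirement, the string is converted once before encryption: its byte[] is turned into hex characters, and DES is applied to the result.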
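The byte[] to hex step is where the charset enters the picture, so here is a minimal sketch of it. It is not the partner's actual code, which is not shown in this post: HexCodec, toHex and fromHex are hypothetical names, the DES step is left out entirely, and the charset is passed in explicitly so that both sides have to agree on it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; the DES step is intentionally omitted.
class HexCodec {
    // Encode text to bytes with an explicit charset, then render each byte as two hex digits.
    static String toHex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Reverse step: parse pairs of hex digits back into bytes and decode with the same charset.
    static String fromHex(String hex, Charset charset) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String hex = toHex("AB", StandardCharsets.US_ASCII);
        System.out.println(hex);                                      // 4142
        System.out.println(fromHex(hex, StandardCharsets.US_ASCII)); // AB
    }
}
```

The hex string itself is plain ASCII and survives any transport; what can go wrong is the charset used on either end of it.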

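Symptom: in our company test environment, messages from the other side were parsed into the correct XML structure and processed normally, but the Chinese text in the resulting data was garbled. On the other hand, on the production test machine I had seen the correct Chinese characters in the terminal.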

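Analysis and resolution: since the message was parsed correctly and the business processing was also fine, with only the Chinese text garbled, I concluded that the error was somewhere in our own processing. Our processing involves two different services talking to each other, so my first assumption was that the problem lay in the data transfer between them (I had no solid basis for this guess at the time, and it turned out to be wrong). I debugged the receiving service and found that the data it received was already garbled (every character had the value 65533). The developer of the interface between the two services told me the data is passed over a socket and could not have an encoding problem, so I narrowed the problem down to the parsing step.
Debugging the parsing code, I found that in the string produced by DES decryption the Chinese characters were already garbled (again all 65533). So I read the decryption code, and there I found the key to the whole character set problem. Fortunately the other side had given us their encryption and decryption implementation, otherwise this would have been much harder to locate. When converting a String to a byte[], the code calls str.getBytes(), which encodes the String using the JVM's default charset. That reminded me that I had seen correct Chinese in the HP-UX terminal earlier, while it was garbled on Linux in our test environment. I concluded that the garbling came from a mismatch between the JVM encoding on the other side when encrypting and the encoding our test system used when decrypting.
The other side's staff had already left for the day, so I could not ask them directly which encoding they used. But since HP-UX decoded the text correctly, finding the encoding the JVM was started with on HP-UX would reveal the cause. Annoyingly, HP-UX has no /proc file system like Linux's to record process information, and its ps command shows only a short stretch of the command line, so after a lot of fiddling I still could not see the full cmdline. Then it occurred to me that the JVM startup parameters on Linux should be the same as on HP-UX, so I looked at the cmdline on Linux. The JVM had been started without a file.encoding parameter, which means it was using the operating system's default character set. On Linux that turned out to be C. I am not familiar with HP-UX and could not work it out there, so my attempt to find the encoding myself failed, and I decided to simply ask the other side the next day.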
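The next morning I talked to the other side's developers, who told me they use GBK. I immediately added the startup parameter -Dfile.encoding=GBK to the JVM and sent the test message again. The Chinese text in the newly generated data was no longer garbled, and the problem was finally solved.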
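In hindsight, the encoding a JVM actually picked can be printed from inside the JVM, without digging through ps or /proc. A tiny sketch, using only standard JDK calls; the class name ShowDefaultCharset is made up for illustration. It would have to be run with the same environment and startup options as the service for the answer to be meaningful.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic class; prints what this JVM uses as its default charset.
class ShowDefaultCharset {
    // Name of the JVM's default charset, i.e. what String.getBytes() with no arguments uses.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + defaultCharsetName());
    }
}
```

On the JDKs of that time the default follows the operating system locale unless -Dfile.encoding is given; from JDK 18 on the default is UTF-8 regardless of the locale, so the output depends on the JDK version as well.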

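Reflection:
The problem is solved, but reflecting on how it was solved matters more.
One new risk the fix may introduce: will switching the JVM to GBK affect other parts of this service?
Looking back, the culprit was String's getBytes() method. Perhaps the encoding should be specified explicitly whenever it is called. If all the code runs in a single JVM this may not matter, because the encoding is always consistent. But with multiple JVMs that exchange data, this method needs care.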
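The mismatch is easy to reproduce in isolation. The sketch below assumes that the JDK ships the GBK charset and that the C locale made our Linux JVM fall back to US-ASCII, which I did not verify at the time; CharsetMismatch and roundTrip are hypothetical names. It encodes two Chinese characters as GBK and decodes them once with the wrong charset and once with the right one.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical reproduction of the problem; not the partner's code.
class CharsetMismatch {
    // Encode with one explicitly named charset and decode with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] wire = text.getBytes(encodeWith);
        return new String(wire, decodeWith);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available in this JDK
        String original = "\u4e2d\u6587";     // the two characters of the word for "Chinese"

        // Wrong charset on the receiving side: every GBK byte here is outside ASCII,
        // so each one is replaced by U+FFFD, which is 65533 in decimal.
        String wrong = roundTrip(original, gbk, StandardCharsets.US_ASCII);
        for (char c : wrong.toCharArray()) {
            System.out.println((int) c);      // prints 65533 four times
        }

        // Matching charset on both sides recovers the text.
        System.out.println(roundTrip(original, gbk, gbk).equals(original)); // true
    }
}
```

Naming the charset in both getBytes(charset) and new String(bytes, charset) takes the JVM default out of the equation, which avoids having to change file.encoding for the whole service.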
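Looking back at the process, there is a lot I could have done better. For example, at the start I overlooked the key fact that the encoding was correct on HP-UX, and only remembered it after a lot of fiddling, once I had already narrowed the problem down to the JVM encoding. The moment HP-UX and Linux produced different decryption results, I could have concluded that no default character set was specified at JVM startup, without spending so much time hunting for the process's cmdline. With clear, rigorous thinking the answer was only one step further, yet I did not see it while working on the problem. Staying clear-headed and thinking a problem through carefully and rigorously is more efficient than starting to experiment halfway through a thought. These are things I need to improve in future work.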

Monday, April 27, 2009

Something about Database Design

I read some articles about how to design a relational database. Here are some notes extracted from them; they make a good guide to database design.

Normalization is the term used to describe how you break a file down into tables to create a database.
The goals of normalization:
1. minimization of data redundancy
2. minimization of data restructuring
3. minimization of I/O by reduction of transaction sizes
4. enforcement of referential integrity

The First Normal Form (1NF) addresses the structure of an isolated table.
The Second (2NF), Third (3NF), and Boyce-Codd (BCNF) Normal Forms address one-to-one and one-to-many relationships.
The Fourth (4NF) and Fifth (5NF) Normal Forms deal with many-to-many relationships.

Normal form definitions:
A table is said to be in First Normal Form (1NF), if all entries in it are scalar-valued. Relational database tables are 1NF by construction since vector-valued entries are forbidden.
A table is in Second Normal Form (2NF) if every non-key field is a fact about the entire key. In other words, a table is 2NF if it is 1NF and all non-key attributes are functionally dependent on the entire primary key (that is, the dependency is irreducible).
A relation is in Third Normal Form (3NF) if it is 2NF and none of its attributes is a fact about another non-key field. In other words, no non-key field functionally depends on any other non-key field.

Every field in a record must depend on The Key (1NF), the Whole Key (2NF), and Nothing But The Key (3NF).

Some tips:
1. Table names are in all caps.
2. An attribute is a descriptive or quantitative characteristic of an entity. Attribute names use Initial Caps.
3. A PK uniquely identifies each instance of an entity and should not change. An integer datatype is more efficient than a CHAR datatype. The key should be non-intelligent, that is, carry no business meaning.
4. A relationship is a logical link between entities. A one-to-many relationship can be implemented with an FK.
5. A many-to-many relationship may be resolved by creating an intermediate entity known as a cross-reference (XREF) entity.
6. An FK's value depends on the PK it references.
7. A relationship is either identifying or non-identifying.
8. Cardinality answers "How many instances of the child entity relate to each instance of the parent entity?"

The process of designing a database:
Define the tables
Define the attributes
Define the relationships

All of this was picked up from the articles referred to in this post: 10-useful-articles-about-database.

Saturday, April 11, 2009

Thoughts about Programming Languages

I have read an article about programming languages that impressed me very much. The author told the story of building a startup in Lisp and explained why people say Lisp is the most powerful language. The author said that programming languages are not just technology but what programmers think in; they are half technology and half religion. I really agree.

A programming language affects your thinking. When you have a problem, you naturally want to solve it in the language you are familiar with. Sometimes that is not the optimal solution, and it may even be the worst. Each language has some good features. I'm familiar with Java, so when I have a problem my spontaneous reaction is to look for a Java-style solution. Because I'm not familiar with other languages, I can't find another way to deal with the problem. That is my blind spot. But after I learned a little shell, I could get things done with shell, and sometimes the result is simpler and more compact than with Java. It gives me a chance to look at a problem from another side, which is very valuable. The lesson I learned is that the more you know, the wider you can think. Lisp has more powerful features than other languages, so if you are skilled in Lisp you know more techniques than you could pick up from other languages.

But wait a minute. Sometimes knowledge isn't good for your thinking. There is a saying: if all you have is a hammer, everything looks like a nail. A familiar language (or familiar knowledge) can shape your thinking, and sometimes it leads you down the wrong road. You have a problem and you want to solve it with the knowledge you already have, to the point that you stop thinking about the essence of the problem. Yet we must have some knowledge to get things done. How do we avoid this blind spot? I think the answer is critical thinking. When we learn a language or anything else, we should ask some questions: what are its advantages, what are its disadvantages, what is it good for, and where does it not apply? If we know that, we can do the right things with the right tools.

Friday, April 3, 2009

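A Script for Batch-Downloading Files

I have recently been relearning English pronunciation, and Li Xiaolai's blog offers a lot of material on the subject. Some of the pages contain many wav audio files of word pronunciations, but my browser could not play them. So I decided to download them, but downloading them one by one would be far too slow, and it felt rather clumsy too. Naturally I thought of writing a script to do the job.
First I looked at the source of the HTML page that contains the audio files, and the lines with the audio files follow a very regular pattern.
Files like this are very easy to process with shell. Here is the simple script I wrote:
#!/bin/bash
# Usage: ./getWav.sh <page-url>
# Fetch the page that links to the audio files.
wget "$1" -O wav.html
# Keep the lines that mention a .wav file and take the text between the first pair of double quotes.
grep -E '\.wav\>' wav.html | awk -F '"' '{print $2}' > wav.list
# Download every URL in the list.
wget -i wav.list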

How is it used? Just like this:

star@star-desktop:~/Documents/English/pronounce/words$ ./getWav.sh http://www.xiaolai.net/index.php/archives/4047.html

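It could be improved further, for example by passing the download file names in as parameters as well. But this is already good enough for me.
All the files were downloaded in the blink of an eye, which is really efficient.
Getting tedious chores done efficiently with shell scripts on Linux is really cool.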
