How to Get an Empty Hadoop Configuration

Blame the JDK: the File class's delete() method cannot delete a non-empty directory. So I reached for Hadoop's API instead, which produced the following snippet:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fileSystem = FileSystem.newInstance(new Configuration())
val warehousePath = new Path("spark-warehouse")
if (fileSystem.exists(warehousePath))
  fileSystem.delete(warehousePath, true)
val metastoreDB = new Path("metastore_db")
if (fileSystem.exists(metastoreDB))
  fileSystem.delete(metastoreDB, true) // the one-arg delete(Path) is deprecated
```
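
To be fair to the JDK, java.nio.file (Java 7+) can do a recursive delete, just not in one call. Here is a minimal sketch, assuming Scala 2.12+ so the lambda converts to a java.util.function.Consumer:

```scala
import java.nio.file.{Files, Path, Paths}
import java.util.Comparator

// Walk the tree and delete children before their parents.
val root = Paths.get("spark-warehouse")
if (Files.exists(root)) {
  val stream = Files.walk(root)
  try stream.sorted(Comparator.reverseOrder[Path]()).forEach(p => Files.delete(p))
  finally stream.close() // Files.walk keeps directory handles open
}
```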

The problem is that new Configuration() by default loads core-default.xml and core-site.xml from the classpath. So I wondered: what if those files accidentally end up on the classpath one day? Say, when I want to test a connection to HDFS on another machine. Fortunately, the Configuration class offers a way to disable loading these two files, as the class's Javadoc says:

Unless explicitly turned off, Hadoop by default specifies two resources, loaded in-order from the classpath:

1. `core-default.xml`: Read-only defaults for hadoop.
2. `core-site.xml`: Site-specific configuration for a given hadoop installation.
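
A quick way to check what a given instance actually loaded: Configuration.toString() lists its resources. A small sketch (the printed list depends on what is on your classpath):

```scala
import org.apache.hadoop.conf.Configuration

val conf = new Configuration()
// Typically prints something like: Configuration: core-default.xml, core-site.xml
println(conf)
```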

To test this, I put a core-site.xml on the classpath, and sure enough the code failed with:

```
Exception in thread "main" java.lang.IllegalArgumentException: java.net.UnknownHostException: cdh
	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
	at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:320)
	at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
	at org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:2691)
	at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:420)
	at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:428)
	at sleepy.spark.SparkHiveExample$.main(SparkHiveExample.scala:45)
	at sleepy.spark.SparkHiveExample.main(SparkHiveExample.scala)
Caused by: java.net.UnknownHostException: cdh
	... 14 more
```

It looks like the program tried to reach the external HDFS and could not resolve cdh, which in fact is not a hostname at all but the value of dfs.nameservices.
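
For context, this is the kind of core-site.xml that triggers the error (a sketch, not my actual file): fs.defaultFS names the logical nameservice, and without the matching dfs.nameservices / dfs.ha.* mappings (normally in hdfs-site.xml) the client treats cdh as a plain hostname and hits DNS.

```xml
<configuration>
  <property>
    <!-- a logical HA nameservice, not a resolvable host -->
    <name>fs.defaultFS</name>
    <value>hdfs://cdh</value>
  </property>
</configuration>
```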

Fine, then let's use the Configuration(boolean) constructor, whose documentation says:

A new configuration where the behavior of reading from the default resources can be turned off. If the parameter loadDefaults is false, the new instance will not load resources from the default files.
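
And the constructor does what it says; a freshly built Configuration(false) is genuinely empty (the key counts below are illustrative):

```scala
import org.apache.hadoop.conf.Configuration

val empty = new Configuration(false)
println(empty.size()) // 0: neither core-default.xml nor core-site.xml was read
println(new Configuration(true).size()) // several hundred keys from the defaults
```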

But it still failed at runtime:

```
Exception in thread "main" java.io.IOException: failure to login
	at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:841)
	at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:777)
	at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:650)
	at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2828)
	at org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:2690)
	at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:420)
	at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:428)
	at sleepy.spark.SparkHiveExample$.main(SparkHiveExample.scala:47)
	at sleepy.spark.SparkHiveExample.main(SparkHiveExample.scala)
Caused by: javax.security.auth.login.LoginException: java.lang.IllegalArgumentException: Illegal principal name foo@FOO.COM: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to foo@FOO.COM
	at org.apache.hadoop.security.UserGroupInformation$HadoopLoginModule.commit(UserGroupInformation.java:201)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
	at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
	at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
	at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
	at javax.security.auth.login.LoginContext.login(LoginContext.java:588)
	at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:815)
	... 8 more
Caused by: java.lang.IllegalArgumentException: Illegal principal name foo@FOO: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to foo@FOO
	at org.apache.hadoop.security.User.<init>(User.java:51)
	at org.apache.hadoop.security.User.<init>(User.java:43)
	at org.apache.hadoop.security.UserGroupInformation$HadoopLoginModule.commit(UserGroupInformation.java:199)
	... 20 more
Caused by: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to foo@FOO
	at org.apache.hadoop.security.authentication.util.KerberosName.getShortName(KerberosName.java:400)
	at org.apache.hadoop.security.User.<init>(User.java:48)
	... 22 more
```

The root cause is that initializing the FileSystem ends up calling UserGroupInformation#ensureInitialized():

```java
private static void ensureInitialized() {
  if (conf == null) {
    synchronized(UserGroupInformation.class) {
      if (conf == null) { // someone might have beat us
        initialize(new Configuration(), false);
      }
    }
  }
}
```

It calls new Configuration() directly, and that instance does load the configuration files found on the classpath.

So the fix is simply to hand UGI a Configuration of our own:

```scala
import org.apache.hadoop.security.UserGroupInformation

val conf = new Configuration(false)
UserGroupInformation.setConfiguration(conf)
val fileSystem = FileSystem.newInstance(conf)
```
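
One caveat, as far as I can tell from the ensureInitialized() code above: setConfiguration has to run before anything triggers a login (such as FileSystem.newInstance), because once UGI has cached a login user derived from the classpath configuration, swapping in a new Configuration afterwards won't undo that.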