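1. Oozie is stateless; all state is stored in the DB.
2. A load balancer, a virtual IP, or DNS round-robin presents the servers to the outside as a single host.
3. ZooKeeper provides distributed locking for the case where users reach the same job through different servers. (In practice only the servers are registered in ZooKeeper; job-id state is not, because the shared backend DB already stores the state of every job. Through ZooKeeper, each Oozie server knows how many instances are currently running, and using a mod algorithm each Oozie server picks a subset of the Coordinator jobs to materialize.) Materializing a workflow of a Coordinator means first creating it from nothing (WAITING) and then taking it from existing to RUNNING.
4. The log of any job can be queried from any server. This currently works through log streaming (HTTP); later, an approach like the MapReduce JobHistoryServer may be considered, where the logs of completed jobs are stored in an HDFS directory.
Refer to: Oozie-615 Cloudera-Blog-O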
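To make the mod-based split of Coordinator jobs in point 3 concrete, here is a minimal sketch. It is not Oozie's actual code: the function name `jobs_for_server`, the use of CRC32 as the hash, and the sample server and job ids are all assumptions for illustration. The only idea taken from the notes above is that every server sees the same list of live servers (via ZooKeeper) and keeps the jobs whose hash, modulo the server count, equals its own position.

```python
import zlib


def jobs_for_server(job_ids, server_ids, my_id):
    """Return the coordinator job ids that server `my_id` should materialize.

    Hypothetical helper: `server_ids` stands in for the list of live Oozie
    servers that each server would read from ZooKeeper.
    """
    servers = sorted(server_ids)      # every server must agree on the ordering
    my_index = servers.index(my_id)   # raises ValueError if my_id is not registered
    n = len(servers)
    # CRC32 is stable across processes, so all servers compute the same
    # bucket for a given job id (Python's built-in hash() is salted per run).
    return [j for j in job_ids if zlib.crc32(j.encode("utf-8")) % n == my_index]


# Illustrative ids only; they are not taken from a real cluster.
jobs = ["%07d-coord-C" % i for i in range(10)]
servers = ["oozie-a", "oozie-b", "oozie-c"]
for s in servers:
    print(s, jobs_for_server(jobs, servers, s))
```

Because each job id lands in exactly one bucket, the servers never pick the same Coordinator job, and no job-level state has to be written to ZooKeeper. When a server joins or leaves, the buckets change, which is acceptable here since the job state itself lives in the shared DB.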


Conclusion_201310-201403 I have recently been working on the Hadoop team at company Y. These past few months have had a big influence on me, and I feel I have gained quite a lot: 1) Getting to grips with Hadoop/YARN again. Before I graduated from ICT, I spent a lot of time researching distributed computing frameworks; then, because my first job was on HBase, they were set aside. Now that I am back on the subject, I feel very excited and a little rusty: Hadoop security, YARN log policy, container lifetime, etc. 2) Gaining a lot of practice in maintaining and managing a large number of regular users, especially in a multi-tenant, data-sharing environment. 3) Getting to know some other components of the Hadoop ecosystem. Oozie – a very useful workflow and co