Wednesday 26 April 2017

'No such file or directory' on starting your domain

Today I was triggered by this question. Earlier I had a similar problem, where I searched, and searched and searched and found the last section on this blogpost. Yes indeed, that is how it works when you blog: you might find your self finding your own blogposts again.

What is it about? Last year I wrote a nice set of scripts on installing Fusion Middleware and creating Fusion Middleware Weblogic domains.

I created the domain-creation-script based on scripts of Edwin Biemond, to get the credits straight.

However, in those scripts the  managed servers as wel as the admin server get Java Arguments set.

Those arguments refer to the logsHome property. I found that if you set JavaArgs you need to set redirects for weblogic.Stdout and weblogic.Stderr as well, like:
'-XX:PermSize=256m -XX:MaxPermSize=512m -Xms1024m -Xmx1532m -Dweblogic.Stdout='+logsHome+'AdminServer.out -Dweblogic.Stderr='+logsHome+'AdminServer_err.out'

Where logFolder should be an absolute path. This has to do with the context in which the nodemanager is starting the server. From that context the relative reference apparently does not evaluate to the proper location. You can however, leave the Java args empty.

So I changed my scripts to not use the getServerJavaArgs function, anymore, but get them from the property file. I replaced the xxxJavaArgsBase with xxxJavaArgs variables. And left them empty. Or you could simply hash-out the lines:
  #cd('ServerStart/'+server)
  #print ('. Set Arguments to: '+javaArgs)
  #set('Arguments' , javaArgs)

for both the AdminServer and ManagedServers.
The configurator doesn't set the JavaArgs, it leaves them over to the setDomain.sh/.cmd and setStartupEnv.sh/.cmd or setUserOverrides.sh/.cmd . If you do so, you can use a relative path, and the servers will start properly.

Tuesday 18 April 2017

New and improved (re-)start and stop scripts for Fusion MiddleWare.

Last year I created a set of start and stop scripts for Weblogic/Fusion MiddleWare. Although I was quite happy with them at the time, I found that there was quite a lot of duplicate code. And I think I could improve them by combining at least the wlst scripts into one, making it a lot better maintainable, but also opening up for restart functionality. And doing so make the scripts more generic. I took my (re-)start and stop script for OHS components as an example and came up with an improved set of start/stop scripts.

For the property files (fmw12_env.sh, fmw.properties, user key files, etc. ) I refer to  start and stop scripts for Weblogic/Fusion MiddleWare.


WLST

As said I created just one wlst script, startStopDomain.py:
#############################################################################
# Start, Stop, Restart FMW Domain
#
# @author Martien van den Akker, Darwin-IT Professionals
# @version 1.1, 2017-04-20
#
#############################################################################
# Modify these values as necessary
import sys, traceback
scriptName = sys.argv[0]
#
#
lineSeperator='__________________________________________________________________________________'
#
#
def usage():
  print 'Call script as: '
  print 'Windows: wlst.cmd '+scriptName+' -loadProperties localhost.properties'
  print 'Linux: wlst.sh '+scriptName+' -loadProperties environment.properties'
  print 'Property file should contain the following properties: '
  print "adminUrl=localhost:7001"
  print "adminUser=weblogic"
  print "adminPwd=welcome1"
# 
#
def connectToNM():
  try:
    wlsDomainHome = wlsDomainsHome+'/'+wlsDomainName
    print(lineSeperator)
    print('Try to connect to the Node Manager')
    try:
      nmConnect(userConfigFile=usrCfgFile, userKeyFile=usrKeyFile, host=nmHost, port=nmPort, domainName=wlsDomainName, domainDir=wlsDomainHome, nmType=nmType)
    except NameError, e:
      print('Apparently user config properties usrCfgFile and usrKeyFile not set.')
      print('Try to connect to the NodeManager adminUser and adminPwd properties')
      nmConnect(username=adminUser, password=adminPwd, host=nmHost, port=nmPort, domainName=wlsDomainName, domainDir=wlsDomainHome, nmType=nmType)
    print('Connected to the Node Mananger')
  except WLSTException:
    message='Apparently NodeManager not Started!'
    print (message)
    raise Exception(message)
#
# Start Admin Server
def startAdminServer(adminServerName):
  # Set wlsDomainHome
  print ('Connect to the Node Manager')
  connectToNM()
  print('Start AdminServer')
  nmStart(adminServerName)
#
# Connect To the AdminServer
def connectToAdminServer(adminUrl, adminServerName):
  try:
    print(lineSeperator)
    print('Try to connect to the AdminServer')
    try:
      connect(userConfigFile=usrCfgFile, userKeyFile=usrKeyFile, url=adminUrl)
    except NameError, e:
      print('Apparently user config properties usrCfgFile and usrKeyFile not set.')
      print('Try to connect to the AdminServer adminUser and adminPwd properties')
      connect(adminUser, adminPwd, adminUrl)
  except WLSTException:
    print('Apparently AdminServer not Started!')
    startAdminServer(adminServerName)
    print('Retry to connect to the AdminServer')
    try:
      connect(userConfigFile=usrCfgFile, userKeyFile=usrKeyFile, url=adminUrl)
    except NameError, e:
      print('Apparently user config properties usrCfgFile and usrKeyFile not set.')
      print('Try to connect to the AdminServer adminUser and adminPwd properties')
      connect(adminUser, adminPwd, adminUrl)
#
# Get the Name of the AdminServer of Domain 
def getAdminServerName():
  serverConfig()
  cd('/')
  adminServerName=cmo.getAdminServerName()
  return adminServerName
#
# Get the Servers of Domain 
def getDomainServers():
  print(lineSeperator)
  print('\nGet Servers from domain')
  serverConfig()
  cd("/")
  servers = cmo.getServers()
  return servers
#
# Get the Servers of Cluster 
def getClusterServers(clustername):
  #Cluster config to be fetched from ServerConfig
  print(lineSeperator)
  print('\nGet Servers from cluster '+clustername)
  serverConfig()
  cluster = getMBean("/Clusters/" + clustername)
  if cluster is None:
    errorMsg= "Cluster " + clustername + " does not appear to exist!"
    print errorMsg
    raise(Exception(errorMsg))
  print "Found cluster "+ clustername+ "."
  servers = cluster.getServers()
  return servers
#
# Get the Domain Clusters 
def getClusters():
  #Cluster config to be fetched from ServerConfig
  print(lineSeperator)
  print('\nGet Clusters')
  cd('/')
  clusters = cmo.getClusters()
  return clusters
#
# Get status of a server
def serverStatus(serverName):
  serverRuntime=getMBean('/ServerRuntimes/'+serverName)
  if serverRuntime is None:
    print("Server Runtime for  " + serverName + " not found, server apparently SHUTDOWN")
    serverState="SHUTDOWN"
  else:
    print "Found Server Runtime for "+ serverName+ "."
    serverState = serverRuntime.getState()
  return serverState
#
# Start Server
# Expected to be in domainRuntime to get to the serverRuntimes.
def startMServer(serverName):
  print(lineSeperator)
  print('ServerName: '+serverName)
  serverState = serverStatus(serverName)
  print('Server '+serverName+': '+serverState)
  if serverState=="SHUTDOWN":
    print ('Server '+serverName+' is not running so start it.')
    start(serverName,'Server')
  elif serverState=="RUNNING":
    print ('Server '+serverName+' is already running')
  else:
    print ('Server '+serverName+' in state '+serverState+', not startable!')
#
# Start servers in a cluster one by one.
def startMServers(serverList):
  print(lineSeperator)
  print ('Start servers from list: '+serverList)
  servers=serverList.split(',')
  # Need to go to domainRuntime to get to the serverRuntimes.
  domainRuntime()
  #
  for serverName in servers:
    startMServer(serverName)
  #
  print ('\nFinished starting servers.')
#
# Start servers in a cluster one by one.
def startClusterServers(clusterName):
  print(lineSeperator)
  print ('Start servers for cluster: '+clusterName)
  servers=getClusterServers(clusterName)
  # Need to go to domainRuntime to get to the serverRuntimes.
  domainRuntime()
  #
  for server in servers:
    serverName = server.getName()
    startMServer(serverName)
  #
  print ('\nFinished starting servers.')
#
# Start servers in domain one by one.
def startDomainServers():
  print(lineSeperator)
  print ('Start servers for domain')
  servers=getDomainServers()
  adminServerName=getAdminServerName()
  # Need to go to domainRuntime to get to the serverRuntimes.
  domainRuntime()
  #
  for server in servers:
    serverName = server.getName()
    if (serverName!=adminServerName):
      startMServer(serverName)
    else:
      print('Skip '+adminServerName)
  #
  print ('\nFinished starting servers.')
#
# Stop Server
# Expected to be in domainRuntime to get to the serverRuntimes.
def stopMServer(serverName):
  print(lineSeperator)
  print('ServerName: '+serverName)
  serverState = serverStatus(serverName)
  print('Server '+serverName+': '+serverState)
  if serverState=="RUNNING":
    print ('Server '+serverName+' is running so shut it down.')
    shutdown(name=serverName, entityType='Server', force='true')
  elif serverState=="SHUTDOWN":
    print ('Server '+serverName+' is already down.')
  else:
    print ('Server '+serverName+' in state '+serverState+', but try to stop it!')
    shutdown(name=serverName, entityType='Server', force='true')
#
# Stop servers in a list
def stopMServers(serverList):
  print(lineSeperator)
  print ('Stop servers from list: '+serverList)
  servers=serverList.split(',')
  # Need to go to domainRuntime to get to the serverRuntimes.
  domainRuntime()
  #
  for serverName in servers:
    stopMServer(serverName)
  #
  print ('\nFinished stopping servers.')
#
# Stop servers in a cluster one by one.
def stopClusterServers(clusterName):
  print(lineSeperator)
  print ('Stop servers for cluster: '+clusterName)
  servers=getClusterServers(clusterName)
  # Need to go to domainRuntime to get to the serverRuntimes.
  domainRuntime()
  #
  for server in servers:
    serverName = server.getName()
    stopMServer(serverName)
  #
  print ('\nFinished stopping servers.')
#
# Stop servers in a domain one by one.
def stopDomainServers():
  print(lineSeperator)
  print ('Stop servers for domain')
  servers=getDomainServers()
  adminServerName=getAdminServerName()
  # Need to go to domainRuntime to get to the serverRuntimes.
  domainRuntime()
  #
  for server in servers:
    serverName = server.getName()
    if (serverName!=adminServerName):
      stopMServer(serverName)
    else:
      print('Skip '+adminServerName)
  #
  print ('\nFinished stopping servers.')
#
# Start cluster
def startCluster(clusterName):
  print(lineSeperator)
  print ('Start cluster: '+clusterName)
  try:
    start(clusterName,'Cluster')
  except WLSTException:
    print "Apparently Cluster in incompatible state!", sys.exc_info()[0], sys.exc_info()[1]
    startClusterServers(clusterName)
  state(clusterName,'Cluster')
  #
  print ('\nFinished starting cluster '+clusterName)
#
# Start clusters in a list
def startClusters(clusterList):
  print(lineSeperator)
  print ('Start clusters from list: '+clusterList)
  clusters=clusterList.split(',')
  #
  for clusterName in clusters:
    startCluster(clusterName)
  #
  print ('\nFinished starting clusters.')
#
# Start clusters
def startDomainClusters():
  print(lineSeperator)
  print ('Start clusters')
  clusters=getClusters()
  #
  for cluster in clusters:
    clusterName = cluster.getName()
    startCluster(clusterName)
  #
  print ('\nFinished starting clusters.')
#
# Stop cluster
def stopCluster(clusterName):
  print(lineSeperator)
  print ('Stop cluster: '+clusterName)
  try:
    #shutdown(clusterName,'Cluster')
    shutdown(name=serverName, entityType='Cluster', force='true')
    state(clusterName,'Cluster')
  except WLSTException:
    print "Apparently Cluster in incompatible state!", sys.exc_info()[0], sys.exc_info()[1]
    state(clusterName,'Cluster')
    print ('Try to stop servers for cluster: '+clusterName+', one by one')
    stopClusterServers(clusterName)
  #
  print ('\nFinished stopping cluster '+clusterName)
#
# Stop clusters in a list
def stopClusters(clusterList):
  print(lineSeperator)
  print ('Stop clusters from list: '+clusterList)
  clusters=clusterList.split(',')
  #
  for clusterName in clusters:
    stopCluster(clusterName)
  #
  print ('\nFinished stopping clusters.')
#
# Stop all clusters of the domain
def stopDomainClusters():
  print(lineSeperator)
  print ('Stop clusters')
  clusters=getClusters()
  #
  for cluster in clusters:
    clusterName = cluster.getName()
    stopCluster(clusterName)
  #
  print ('\nFinished stopping clusters.')
#
# StopAdmin
def stopAdmin():
  print(lineSeperator)
  print('Stop '+adminServerName)
  shutdown(force='true')
  serverState = serverStatus(serverName)
  print('State '+adminServerName+': '+ serverState)
  print('\nFinished stopping AdminServer: '+adminServerName)
#
# Start admin Server
def startAdmin():
  print(lineSeperator)
  print('Start and connect to '+adminServerName)
  connectToAdminServer(adminUrl, adminServerName)
  print('\nFinished starting AdminServer: '+adminServerName)
#
# Stop a domain
def stopDomain():
  print (lineSeperator)
  stopDomainClusters()
  #
  stopAdmin()
  print ('\nFinished stopping domain.')
  #
#
# Start a Domain
def startDomain():
  print (lineSeperator)
  #
  startAdmin()
  #
  startDomainClusters()
  #
  print ('\nFinished starting domain.')
  #
#
# (Re-)Start or Stop Domain
def startStopDomain(startStopOption):
  if startStopOption=="stop" or startStopOption=="restart":
    stopDomain()
  if startStopOption=="start" or startStopOption=="restart":
     startDomain()
#
# (Re-)Start or Stop Admin
def startStopAdmin(startStopOption):
  if startStopOption=="stop" or startStopOption=="restart":
    stopAdmin()
  if startStopOption=="start" or startStopOption=="restart":
     startAdmin()
#
# (Re-)Start or Stop Clusters
def startStopClusters(startStopOption, clusterList):
  if startStopOption=="stop" or startStopOption=="restart":
    if clusterList is None:
      stopDomainClusters()
    else:
      stopClusters(clusterList)
  if startStopOption=="start" or startStopOption=="restart":
    if clusterList is None:
      startDomainClusters()
    else:
      startClusters(clusterList)
#
# (Re-)Start or Stop Servers
def startStopServers(startStopOption, serverList):
  if startStopOption=="stop" or startStopOption=="restart":
    if serverList is None:
      stopDomainServers()
    else:
      stopMServers(serverList)
  if startStopOption=="start" or startStopOption=="restart":
    if serverList is None:
      startDomainServers()
    else:
      startMServers(serverList)
#
# Main
def main():
  try:
    print ('Args passed: '+ str(len(sys.argv) ))
    componentList = None
    idx = 0
    for arg in sys.argv:
      if idx == 0:
        print 'ScriptName: '+arg
      elif idx == 1:
        # startStopOption: start, stop, restart
        startStopOption=arg
      elif idx == 2:
        # componentType: AdminSvr, Domain, Clusters, Servers
        componentType=arg
      elif idx == 3:
        componentList=arg
      else:
        componentList = componentList+','+arg
      idx=idx+1
    #
    #componentName=sys.argv[3]
    # Set wlsDomainHome
    wlsDomainHome = wlsDomainsHome+'/'+wlsDomainName
    #
    print(lineSeperator)
    if componentList is None:
      componentStr = componentType
    else:
      componentStr = componentType+' '+componentList
    if startStopOption=="start" or startStopOption=="restart":
      print('(Re)Start '+componentStr+' in '+wlsDomainHome+', with option '+startStopOption)
    elif startStopOption=="stop":
      print('Stop '+componentStr+' in '+wlsDomainHome+', with option '+startStopOption)
    else:
      raise Exception("Unkown startStopOption: "+ startStopOption)
    #
    print(lineSeperator)
    print('\nConnect to the AdminServer: '+adminServerName)
    connectToAdminServer(adminUrl, adminServerName)
    #
    print('Connected, so proceed with: '+startStopOption+' of '+componentType)
    #
    if componentType=="AdminSvr":
      startStopAdmin(startStopOption)
    elif componentType=="Domain":
      startStopDomain(startStopOption)
    elif componentType=="Clusters":
      startStopClusters(startStopOption, componentList)
    elif componentType=="Servers":
      startStopServers(startStopOption, componentList)
    else:
      raise Exception('Undefined componentType: '+componentType);
    #
    print('\nExiting...')
    exit()
  except NameError, e:
    print('Apparently properties not set.')
    print "Please check the property: ", sys.exc_info()[0], sys.exc_info()[1]
    usage()
  except:
    apply(traceback.print_exception, sys.exc_info())
    exit(exitcode=1)
#call main()
main()
exit()

The script can be called with two parameters and a property file:

  • startStopOption: start, stop, restart
  • component: AdminSvr, Domain, Clusters, Servers
Based on the value of component the start, stop or restart option is issued on either the AdminServer, the Domain clusters, servers or the complete domain. The main function starts with connecting to the AdminServer. If the AdminServer is not started yet, it will connect to the NodeManager to start the AdminServer.

In case of a restart or a stop the particular components are stopped. And in case of a start or restart the particular components are started. 

The advancement of this script, in contrast with other start/stop scripts I've seen is that it will give a start command to a cluster. Doing so, all the servers in a cluster will get a start command simultaneously. For stopping the server in a cluster, the WLSException that can be raised in the stop command is caught, to try to stop try to stop the servers one by one. 

Another advancement of this script is that is determined which clusters there are in a domain, and these are traversed one by one.

Update 2017-04-20: Today I added also the possiblilty to start a list of clusters or servers. If you don't provide a list of clusters or servers, all the clusters or servers in the domain are (re-)started or stopped. Except for the AdminServer, by the way.

Bash

To call the wlst script I created a set of bash scripts.

To start the AdminServer, startAdmin.sh:
#!/bin/bash
#############################################################################
# Start AdminServer using wlst and connect to it.
#
# @author Martien van den Akker, Darwin-IT Professionals
# @version 2.2, 2016-04-18
#
#############################################################################
#  
. fmw12c_env.sh
echo
echo Start AdminServer
wlst.sh ./startStopDomain.py start AdminSvr -loadProperties fmw.properties

To stop or restart the AdminServer, replace start in 'wlst.sh ./startStopDomain.py start AdminSvr' by either stop or restart. And save the file to either stopAdmin.sh or restartAdmin.sh.

To create the corresponding files for Domain or Clusters replace AdminSvr in the same line with either Domains or Clusters.

I think you'll get the drift. And of course you should adapt the comments and echo's.

But maybe a more generic (re-)start or stop script is startStopDmn.sh:

#!/bin/bash
#############################################################################
# Start Domain components using wlst
#
# @author Martien van den Akker, Darwin-IT Professionals
# @version 1.0, 2017-04-19
#
#############################################################################
#
. fmw12c_env.sh
export START_STOP_OPTION=$1
export COMPONENT_TYPE=$2
export ENV=$3
export COMPONENT_NAME=$4
echo
echo "(Re-)Start or stop Domain components"
wlst.sh ./startStopDomain.py ${START_STOP_OPTION} ${COMPONENT_TYPE} "${COMPONENT_NAME}" -loadProperties ${ENV}.properties

Then you could do a startAdmin.sh like:
./startStopDmn.sh start AdminSvr fmw

And a stopAdmin.sh like:
./startStopDmn.sh stop AdminSvr fmw

Or a restartServers.sh like:
./startStopDmn.sh restart Servers fmw $1

Here you can provide a comma-separated list of servers, between double-quotes:
./restartServers "soa_server1,soa_server2"
etc. etc.
Advantage of this approach is that it allows you to use the same script set for multiple environemts, by duplicating and adapting the fmw.properties file to other environments (like domains).

Thursday 6 April 2017

BPM BAC Subversion Server refusing connections

These days I work on setting up several development lifecycle environments for BPM, SOA and OSB. What means that we setup servers for Development and test, culminating eventually in supporting the setup of the acceptance and production servers.

Since we want to have development resemble production as much as possible, we installed a dual-node clustered BPM environment, including BAM. However, since it is development, we have the two Managed Servers per cluster on the same phsyical (actually virtual) host.

In 12c BPM introduced a new component the Process Asset Manager, as described here. But installing BPM on a dual node cluster on a single host, will give problems with the underlying BAC/Subversion Component.

Errors you could encounter in the logs are for example:

Status of Start Up operation on target /Domain_o-bpm-1_domain/o-bpm-1_domain/BPMComposer is STATE_FAILED
[Deployer:149193]Operation "start" on application "BPMComposer" has failed on "bpm_server2".
java.lang.RuntimeException: weblogic.osgi.OSGiException: Could not find bundle with Bundle Symbolic Name of:org.tmatesoft.svn.core
               at weblogic.osgi.internal.OSGiAppDeploymentExtension.prepare(OSGiAppDeploymentExtension.java:273)
               at weblogic.application.internal.flow.AppDeploymentExtensionFlow.prepare(AppDeploymentExtensionFlow.java:25)
               at weblogic.application.internal.BaseDeployment$1.next(BaseDeployment.java:727)
               at weblogic.application.utils.StateMachineDriver.nextState(StateMachineDriver.java:45)
               at weblogic.application.internal.BaseDeployment.prepare(BaseDeployment.java:239)
               at weblogic.application.internal.EarDeployment.prepare(EarDeployment.java:66)
               at weblogic.application.internal.DeploymentStateChecker.prepare(DeploymentStateChecker.java:158)
               at weblogic.deploy.internal.targetserver.AppContainerInvoker.prepare(AppContainerInvoker.java:65)
               at weblogic.deploy.internal.targetserver.operations.ActivateOperation.createAndPrepareContainer(ActivateOperation.java:229)
               at weblogic.deploy.internal.targetserver.operations.StartOperation.createAndPrepareContainer(StartOperation.java:95)
               at weblogic.deploy.internal.targetserver.operations.StartOperation.doPrepare(StartOperation.java:108)
               at weblogic.deploy.internal.targetserver.operations.AbstractOperation.prepare(AbstractOperation.java:241)
               at weblogic.deploy.internal.targetserver.DeploymentManager.handleDeploymentPrepare(DeploymentManager.java:794)
               at weblogic.deploy.internal.targetserver.DeploymentManager.prepareDeploymentList(DeploymentManager.java:1340)
               at weblogic.deploy.internal.targetserver.DeploymentManager.handlePrepare(DeploymentManager.java:267)
               at weblogic.deploy.internal.targetserver.DeploymentServiceDispatcher.prepare(DeploymentServiceDispatcher.java:177)
               at weblogic.deploy.service.internal.targetserver.DeploymentReceiverCallbackDeliverer.doPrepareCallback(DeploymentReceiverCallbackDeliverer.java:186)
               at weblogic.deploy.service.internal.targetserver.DeploymentReceiverCallbackDeliverer.access$000(DeploymentReceiverCallbackDeliverer.java:14)
               at weblogic.deploy.service.internal.targetserver.DeploymentReceiverCallbackDeliverer$1.run(DeploymentReceiverCallbackDeliverer.java:47)
               at weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl.run(SelfTuningWorkManagerImpl.java:666)
               at weblogic.invocation.ComponentInvocationContextManager._runAs(ComponentInvocationContextManager.java:348)
               at weblogic.invocation.ComponentInvocationContextManager.runAs(ComponentInvocationContextManager.java:333)
               at weblogic.work.LivePartitionUtility.doRunWorkUnderContext(LivePartitionUtility.java:54)
               at weblogic.work.PartitionUtility.runWorkUnderContext(PartitionUtility.java:41)
               at weblogic.work.SelfTuningWorkManagerImpl.runWorkUnderContext(SelfTuningWorkManagerImpl.java:640)
               at weblogic.work.ExecuteThread.execute(ExecuteThread.java:406)
               at weblogic.work.ExecuteThread.run(ExecuteThread.java:346)
Caused by: java.lang.RuntimeException: Could not find bundle with Bundle Symbolic Name of:org.tmatesoft.svn.core
               at weblogic.osgi.internal.OSGiAppDeploymentExtension.attachToApplicationBundleClassloader(OSGiAppDeploymentExtension.java:161)
               at weblogic.osgi.internal.OSGiAppDeploymentExtension.prepare(OSGiAppDeploymentExtension.java:249)
               ... 26 more
 
[Deployer:149034]An exception occurred for task [Deployer:149026]start application BPMComposer on bpm_cluster.: weblogic.osgi.OSGiException: Could not find bundle with Bundle Symbolic Name of:org.tmatesoft.svn.core.

Or during deployment of OracleBPMBACServerApp on one of the managed servers:
[2015-02-04T07:04:05.848+00:00] [BPM_Ms1_2] [ERROR] [] [oracle.bpm.bac.svnserver] [tid: [ACTIVE].ExecuteThread: '29' for queue: 'weblogic.kernel.Default (self-tuning)'] [userId: ] [ecid: 88957d95-82b2-4365-aaea-ac834a54599d-00000003,0] [APP: OracleBPMBACServerApp] Unexpected error starting BAC SVN Server.[[
oracle.bpm.bac.subversion.server.exception.ServerException: java.net.BindException: Address already in use

WLS version: 12.1.3.0.0
authentication-provider: default
Servers: AdminServer+ local_bpel_ms1 + local_bpel_ms2
Cluster: local_bpel_cluster, cluster-messaging-mode: unicast

Configuration manager:
Protocol: PLAIN
Topology: ACTIVE_ACTIVE_CLUSTER


As described in Doc ID 1987120.1 on Oracle Suppport, this is due to the fact that by default in oracle.bpm.bac.server.mbeans.metadata MBean the Bac Admin node and the BacNode use the same ports. (I think the cause is a bit mis-phrased in the note....)

To solve this:
  1. Login to Enterprise Manager console : http://host:port/em
  2. Expand WebLogic Domain
  3. Right click on the domain and click on System MBean Browser
  4. In System MBean Browser search for: oracle.bpm.bac.server.mbeans.metadata and expand it
  5. Expand the first BacNode and check that the AdminPort and also the BacNode port are different:
  6. Do the same thing for the second BacNode:
  7. Make sure that all 4 ports are different.
  8. Do a netstat on the machine and check also that the ports used by BAC servers are not already in use. Change them if necessary.
Port check can be done by:
[oracle@darlin-vce-bpm logs]$ netstat -tulpn |grep 8323
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.1.239:8323      0.0.0.0:*               LISTEN      23940/java
tcp        0      0 192.168.1.240:8323      0.0.0.0:*               LISTEN      23988/java
[oracle@darlin-vce-bpm logs]$ netstat -tulpn |grep 8424
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.1.240:8424      0.0.0.0:*               LISTEN      23988/java
[oracle@darlin-vce-bpm logs]$ netstat -tulpn |grep 8423
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 0.0.0.0:8423            0.0.0.0:*               LISTEN      23940/java
[oracle@darlin-vce-bpm logs]$
You see that the latest port, the AdminPort belonging to the empty AdminAddress on the bpm_server1 is listening on 0.0.0.0, in affect on every address. Apparently, it is not possible to set an explicit address. Since it is on one actual host, the bpm servers are attached to different virtual IP addresses.

Now, this enabled us to start both the OracleBPMBACServerApp  as well as the BPMComposer.
But when we started using the BPM Composer and opening a project and process, there were issues. We found that when you got in bpm_server2, all went well. But when the load balancer directed you to bpm_server1, you could get a Internal server error

We encountered the following errors in the diagnostic log on bpm_server1:
[2017-04-05T03:25:27.270+02:00] [bpm_server1] [NOTIFICATION] [] [oracle.bpm.bac.svnserver.configuration] [tid: Active Sync Thread [/b91abb78-6b3d-4448-af6a-e82125f261f0/]] [userId: <anonymous>] [ecid: 41a5a471-1881-4527-8638-c344af778e7c-0000000a,0:497] [APP: OracleBPMBACServerApp] [partition-name: DOMAIN] [tenant-name: GLOBAL] Default repository path: /u02/oracle/products/fmw/user_projects/domains/o-bpm-1_domain/bpm/bac/bpm_server2/repositories
[2017-04-05T03:25:27.273+02:00] [bpm_server1] [NOTIFICATION] [] [oracle.bpm.bac.svnserver.configuration] [tid: Active Sync Thread [/b91abb78-6b3d-4448-af6a-e82125f261f0/]] [userId: <anonymous>] [ecid: 41a5a471-1881-4527-8638-c344af778e7c-0000000a,0:497] [APP: OracleBPMBACServerApp] [partition-name: DOMAIN] [tenant-name: GLOBAL] Default repository path: /u02/oracle/products/fmw/user_projects/domains/o-bpm-1_domain/bpm/bac/bpm_server1/repositories
[2017-04-05T03:25:27.277+02:00] [bpm_server1] [ERROR] [] [oracle.bpm.bac.svnserver.replication] [tid: Active Sync Thread [/b91abb78-6b3d-4448-af6a-e82125f261f0/]] [userId: <anonymous>] [ecid: 41a5a471-1881-4527-8638-c344af778e7c-0000000a,0:497] [APP: OracleBPMBACServerApp] [partition-name: DOMAIN] [tenant-name: GLOBAL] org.tmatesoft.svn.core.SVNException: svn: E210003: connection refused by the server[[
oracle.bpm.bac.subversion.server.repository.exceptions.RepositoryException: org.tmatesoft.svn.core.SVNException: svn: E210003: connection refused by the server
        at oracle.bpm.bac.subversion.server.repository.exceptions.RepositoryException.wrap(RepositoryException.java:56)
        at oracle.bpm.bac.subversion.server.repository.SVNKitRepositorySession.getRepositoryUUID(SVNKitRepositorySession.java:98)
        at oracle.bpm.bac.subversion.server.repository.RepositorySVNSync.sync(RepositorySVNSync.java:74)
        at oracle.bpm.bac.subversion.server.repository.RepositorySVNSync.sync(RepositorySVNSync.java:59)
        at oracle.bpm.bac.subversion.server.repository.ha.aa.ActiveAARepository$Synchronizer.runImpl(ActiveAARepository.java:360)
        at oracle.bpm.bac.subversion.server.repository.ha.aa.ActiveAARepository$Synchronizer.run(ActiveAARepository.java:304)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.tmatesoft.svn.core.SVNException: svn: E210003: connection refused by the server
        at org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:85)
        at org.tmatesoft.svn.core.internal.wc.SVNErrorManager.error(SVNErrorManager.java:69)
        at org.tmatesoft.svn.core.internal.io.svn.SVNPlainConnector.open(SVNPlainConnector.java:62)
        at org.tmatesoft.svn.core.internal.io.svn.SVNConnection.open(SVNConnection.java:77)
        at org.tmatesoft.svn.core.internal.io.svn.SVNRepositoryImpl.openConnection(SVNRepositoryImpl.java:1252)
        at org.tmatesoft.svn.core.internal.io.svn.SVNRepositoryImpl.testConnection(SVNRepositoryImpl.java:95)
        at org.tmatesoft.svn.core.io.SVNRepository.getRepositoryUUID(SVNRepository.java:280)
        at oracle.bpm.bac.subversion.server.repository.SVNKitRepositorySession.getRepositoryUUID(SVNKitRepositorySession.java:95)
        ... 5 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at org.tmatesoft.svn.core.internal.util.SVNSocketFactory.connect(SVNSocketFactory.java:112)
        at org.tmatesoft.svn.core.internal.util.SVNSocketFactory.createPlainSocket(SVNSocketFactory.java:68)
        at org.tmatesoft.svn.core.internal.io.svn.SVNPlainConnector.open(SVNPlainConnector.java:53)
        ... 10 more

]]

This occurs just on one of the two servers, in our case bpm_server1. Apparently, the Bac server can't run more than once on the same host. Or, in an active-active configuration, one of the bpm servers does not know how to connect. I don't know yet, how it is supposed to work internally.

Toggling the configuration to Single node won't work. Restarting the servers will cause the bpm composer unable to start at all. But switching to active passive in the BacConfigurationManager helped:

Then restarted the bpm servers and the BPM Composer and the underlying PAM/Bac/Subversion were working on both servers.


Tuesday 4 April 2017

Back from PaaSForum

So, and now we're back in business after a splendid week in Split, Croatia, to meet with about 200 great, enthusiastic con-colleagues from the fields of SOA and Weblogic. It was the yearly OPN Partner Community Forum, ran by the great Jürgen Kress. He did a formidable job to collect many informatic sessions on the latest news and products from Oracle. We could meet with product managers, eat  and drink with them. To my surprise I could shake hands with a Product Manager of PCS, with whom I worked closely on a try-out project at the end of last year.

It was at a great venu, Le Meridien, a short drive out side of Split. A nice hotel with fantastic views on the Adriatic Sea.

Last year on the 'OFM Forum' in Valencia, PaaS (Platform as a Service) as a 'MiddleWare Cloud' was an important subject. Oracle was heavily promoting  Cloud. This is why you could expect that this years forum was named 'PaaSForum'. And thus in one of the keynotes this was stressed by stating:


Now, our Dutch comedian Arjan Lubach made a video to introduce our country to president Trump:

So that made me introducing the statement:
Fun aside, I heard many nice things. To sum up a view:

  • On ICS side, there would be a REPL CLI script to move iar's (ICS Archives) between environments, created by the A-Team. I should look it up, but don't know if it already released it.
  • There was an introduction of the API platform, with also a workshop on it.
  • PCS and ICS are suggested to be combined into the Integration Cloud. Which would be a very logical idea, bringing much better, uniformed user experience. Because, at my latest cloud project, last year, we actually defined the practice to always integrate via ICS not directly from PCS.
  • There would also be a script, created by a Product Manager, to 'massage' BPM projects to transform them into a PCS compliant project. Hope to get my hands on that too, that could certainly be helpful. The otherway around would be not so bad either.
  • Then there was the mention of AI Apps, or Application Intelligent apps. Intelligence from verticals. This could drive PCS in a more intelligent way, based on knowledge from different verticals. 
But most interesting I found was the introduction of CMMN in PCS. Or otherwise put: Dynamic Processes in PCS. Where conventional BPMN processes in PCS run in a strict predefined way, with Dynamic Processes Case Management capabilities are introduced in PCS. Much like Adaptive Case Management in BPM Suite. However, build in a new way with a graphical modeller in the composer:

So here you see vertical lanes that represent phases in the process. In each phase you can define activities that can represent a Human Task, or for instance, a (sub-)process:
Activities can be repeatable, required or Manually Activated. Much like we can do in ACM.

Then using Decision rules you can decide under which condition the case is transferred to another phase, activities are activated or deactivated, etc. Or this can be done by the knowledge-worker self.


So I'm very much looking forward to get my fingers on this new functionality.

Another interesting new development is the introduction of Chat bots. They were so heavily mentioned and demoed, there hardly weren't any presentations without mentioning them. I almost felt I was one of the few that liked them, although the heavily relying on Facebook messenger. Even the, in my perception, breeding ground of chat bots, Twitter, wasn't free of criticism on the usecase of chatbots. I, on the other hand found a great non-functional use of chatbots: a text based adventure:

If we could combine PCS's dynamic processes with the Oracle Chat bot cloud service, we could develop a nice text adventure where you could chase beacons with in the process. Each phase in the process could represent a location, different processes can represent several areas on the map. But if we can have the process instances interact with eachother, for instance by registering their locations in a DBCS, or sending correlated signals (much nicer), then you could make it into a multiplayer text adventure. My colleague Rob introduced the idea to be able to upload pictures, of real-life locations. You could readout the Exif-data to check if the foto really represent a GPS-location.

So, that might be the next rolling-gag or app on the next forum in 2018, for which I already introduced the hash-tag: #ChatBotForum. In the aftermath of the week, at Jürgens afterparty, one of the others also mentioned text adventures. So, apparently, it wasn't such a far-fetched idea.

There's so much to think about, have it land, and much is already said. I'm looking forward to see how it all works out in the next year. The sun died on the #PaaSForum'17:


Working up to #ChatBotForum'18...