ETL scenario language specification

From Toolsverse Wiki
Jump to: navigation, search

ETL scenarios must be written in special XML-based language.

Contents

Language

  • ETL scenario must be a well formed XML. Check out official XML specification
  • For special characters which are not allowed in the well formed XML use correct XML substations
  • All node and attribute names are case sensitive
  • Values generally are not case sensitive except class names
  • Class names must be fully qualified. For example com.toolsverse.etl.driver.oracle.OracleDriver
  • Order of nodes in the node list matters. For examples if there are multiple <task> nodes under <tasks> they will be executed in order
  • For comments use
    <!-- text -->

Specification rules

  • If attribute has multiple possible values (for example parallel="false|true") the pipe ("|") character is used as a delimiter
  • First value in the pipe delimited string is a default value. For example in the pair "false|true" "false" is a default.
  • Most nodes and attributes are not required. The required nodes and attributes specifically marked using bold-italic font
  • If parent node is not required (for example <task>) but child is (for example <name>) it just means that you can skip entire section (for example <task>) but if it is there certain nodes and attributes must be there as well (for example <name>)
  • If node (or attribute) is not required but it is present and there is no value, the default value will be used. If there is no default value the node or attribute will be skipped
  • Some nodes have short and long forms, for example <driver>class_name</driver> vs <driver attributes />. You can use either short or long form but not both
  • Some nodes and attributes have multiple synonyms, for example <tasks|beforetasks> (node) and sql|code (attribute)
  • Attribute value "int" means that attribute value should be an integer
  • Attribute value "sql" means that attribute value should be SQL
  • Attribute value "code" means that attribute value should be code in one of the supported scripting languages (for example JavaScript) or SQL
  • For boolean attributes "false|true" it is also possible to use (case insensitive) yes, no, y, n, 1, 0

ETL scenario parser

ETL scenario parser is checking syntax before loading scenario. If there is a syntax error parser generates an exception. In most cases (but not all) it is able to identify position of the error.

  • If ETL scenario is not a well formed XML the parser generates an exception pointing out to the part of the XML which is not well formed
  • Parser always checks one error at the time, meaning if there are multiple errors it generates exception for the first found error
  • Parser checks all required nodes and attributes and if one of them is missing it generates an exception pointing out to the missing node or attribute
  • Parser checks all nodes and attributes defined by specification and if there is something it doesn't recognize (for example <dest> instead of <destination>) it generates an exception pointing out to the wrong node or attribute
  • In some cases (for example attributes and child nodes of the <variable_name> node) basically any attribute or parent node is allowed (including not defined by specification). In this case parser let it go and does not generate an exception
  • If wrong attribute value is used (for example varcharsize="abc" instead of varcharsize="123") or there is an attribute but there is no value (for example varcharsize="") the default value is used instead. If there is no default value the attribute is skipped
  • Is some cases, when node or attribute value is a class name the parser will try to load class immediately. If class does not exist the parser generates appropriate exception
  • If short form of the node is used (for example <driver>name</driver> instead of <driver attributes /> the default values for the attributes are used

Nodes and attributes

<?xml version="1.0" encoding="UTF-8"?>
<scenario parallel="false|true" requiresource="false|true" requiredest="false|true" notemp="false|true" >
   <name>scenario name</name>
   <description>scenario description</description>
   <script>script name</script>
   <driver>auto|driver_class_name</driver>
   <driver name="auto|driver_class_name" parent="parent_driver_class_name" literalsize="int" varcharsize="int" charsize="int" precision="int" scale="int" lineslimit="int" initsql="sql" />
   <onsave action="skip|save"/>
   <onpopulate action="skip|save"/>
   <onpersist action="skip|save"/>
   <onexecute action="|commit" />
   <allow>LOAD|EXTRACT|EXTRACT_LOAD</allow>
   <function>function_class_name<function>
   <metadata types=""/>
   <codegen>code_generator_class_name</codegen>
   <connection>connection name</connection>

   <variables>
      <variable_name name="variable name" value="value" label="label" sql="sql" type="type" global="true|false" attributes>
         <variable_nodes/>
      </variable_name>
   </variables>

   <execute>
      <scenario name="inner_scenario_name" action="LOAD|EXTRACT|EXTRACT_LOAD" parallel="false|true" loop="code" count="int" variable="var_name" pattern="pattern" field="field_name" looplang="SQL|JavaScript|Others" loop_connection="connection name" condition="code" conditionlang="SQL|JavaScript|Others" condition_connection="connection name" />
   </execute>

   <tasks|beforetasks>
      <task noconnection="false|true" commit="false|true">
         <name>task name</name>
         <tablename>table name</tablename>
         <class>task_class_name</class>
         <connection>task connection name</connection>
         <driver>auto|driver_class_name</driver>
         <driver name="auto|driver_class_name" parent="parent_driver_class_name" literalsize="int" varcharsize="int" charsize="int" precision="int" scale="int" lineslimit="int" initsql="sql" />
         <sql|code|cmd>code</sql|code|cmd>
         <using>using</using>
         <onexception action="continue|ignore|raise" mask="mask" />
         <variables>
            <variable_name name="variable name" value="value" attributes />
         </variables>
      </task>
   </tasks|beforetasks>

   <sources>
      <source independent="false|true" noconnection="false|true" empty="false|true" mandatory="false|true" encode="false|true" enabled="true|false" parallel="false|true" condition="code" conditionlang="SQL|JavaScript|Others" condition_connection="connection name" >
         <name>source name</name>
         <objectname|tablename>object or table name</objectname|tablename>
         <onpersist action="save|skip"/>
         <onpopulate action="save|skip"/>
         <source>linked source name</source>
         <extract keyname="key name" keyfield="key fields">

            <writer class="writer_class_name" attributes />
            <writer>writer_class_name</writer>

            <reader class="writer_class_name" attributes />
            <reader>reader_class_name</reader>

            <driver>auto|driver_class_name</driver>
            <driver name="auto|driver_class_name" parent="parent_driver_class_name" literalsize="int" varcharsize="int" charsize="int" precision="int" scale="int" lineslimit="int" initsql="sql" />

            <connection>source connection name</connection>
            <sql>sql</sql>
            <using>using</using>

            <onexception action="continue|ignore|raise|ignoreparseerror" mask="mask" />

            <variables>
               <variable_name name="variable name" value="value" attributes />
            </variables>
         </extract>

         <tasks>
            <task noconnection="false|true" commit="false|true" scope="after|pre|inline|before_etl">
               <name>task name</name>
               <tablename>table name</tablename>
               <class>task_class_name</class>
               <connection>task connection name</connection>
               <driver>auto|driver_class_name</driver>
               <driver name="auto|driver_class_name" parent="parent_driver_class_name" literalsize="int" varcharsize="int" charsize="int" precision="int" scale="int" lineslimit="int" initsql="sql" />
               <sql|code|cmd>code</sql|code|cmd>
               <using>using</using>
               <onexception action="continue|ignore|raise" mask="mask" />
               <variables>
                  <variable_name name="variable name" value="value" attributes />
               </variables>
            </task>
         </tasks>

      </source>
   </sources>

   <destinations>
      <destination encode="false|true" empty="false|true" tolerate="false|true" noconnection="false|true" type="regular|procedure|function|wait|table" enabled="true|false" parallel="false|true" scope="global|single" condition="code" conditionlang="SQL|JavaScript|Others" condition_connection="connection name">
         <name>destination name</name>
         <source>linked source name</source>
         <objectname|tablename>object or table name</objectname|tablename>
         <cursor table="table name" sql="sql" type="regular|temp|temporary" onfinish="keep|drop" />
         <metadata>true|false</metadata>
         <metadata indexes="false|true" suffix="suffix"/>
         <load stream="false|true" key="keys" action="insert|update|delete|merge">
            <connection>destination connection name</connection>

            <driver>auto|driver_class_name</driver>
            <driver name="auto|driver_class_name" parent="parent_driver_class_name" literalsize="int" varcharsize="int" charsize="int" precision="int" scale="int" lineslimit="int" initsql="sql" />

            <condition>sql</condition>
            <then>sql</then>
            <else>sql</else>
            <after>sql</after>
            <sql>sql</sql>

            <onexception action="continue|ignore|merge|raise" mask="mask" key="keys" savepoint="false|true" />

            <reader class="writer_class_name" attributes />
            <reader>reader_class_name</reader>

            <writer class="writer_class_name" attributes />
            <writer>writer_class_name</writer>

            <variables>
               <variable_name name="variable name" tablename="table name" function="function" class="function_class_name" field="field" value="value" sql|code="sql|code" lang="SQL|JavaScript|Others" type="sql" linked="linked var name" label="label" param="params" scope="before|after|runtime" destination="linked destination name" tolerate="false|true" include="true|false" exclude="false|true" add="false|true" global="false|true" declare="sql" sqltype="sqltype" nativetype="nativetype" attributes />
            </variables>

            <cache>cache_class_name</cache>
         </load>

         <tasks>
            <task noconnection="false|true" commit="false|true" scope="after|pre|inline|before_etl" >
               <name>task name</name>
               <tablename>table name</tablename>
               <class>task_class_name</class>
               <connection>task connection name</connection>
               <driver>auto|driver_class_name</driver>
               <driver name="auto|driver_class_name" parent="parent_driver_class_name" literalsize="int" varcharsize="int" charsize="int" precision="int" scale="int" lineslimit="int" initsql="sql" />
               <sql|code|cmd>code</sql|code|cmd>
               <using>using</using>
               <onexception action="continue|ignore|raise" mask="mask" />
               <variables>
                  <variable_name name="variable name" value="value" attributes />
               </variables>
            </task>
         </tasks>

      </destination>
   </destinations>

   <aftertasks>
      <task noconnection="false|true" commit="false|true">
         <name>task name</name>
         <tablename>table name</tablename>
         <class>task_class_name</class>
         <connection>task connection name</connection>
         <driver>auto|driver_class_name</driver>
         <driver name="auto|driver_class_name" parent="parent_driver_class_name" literalsize="int" varcharsize="int" charsize="int" precision="int" scale="int" lineslimit="int" initsql="sql" />
         <sql|code|cmd>code</sql|code|cmd>
         <using>using</using>
         <onexception action="continue|ignore|raise" mask="mask" />
         <variables>
            <variable_name name="variable name" value="value" attributes />
         </variables>
      </task>
   </aftertasks>

</scenario>

Personal tools
Namespaces

Variants
Actions
Navigation
Tools