
Commit 101e466

Merge branch 'gh-pages' into jobs-cpd35

2 parents e340cdf + 58b137c

File tree

9 files changed: +334 −24 lines

_includes/docHeader.html

Lines changed: 1 addition & 1 deletion

```diff
@@ -12,7 +12,7 @@
 <ul class="nav navbar-nav">
 <li><a href="http://ibmstreams.github.io" target="_blank">GitHub</a>
 </li>
-<li><a href="https://developer.ibm.com/streamsdev/"
+<li><a href="https://ibm.biz/streams-community"
 target="_blank">Community</a></li>
 <li><a
 href="http://www.ibm.com/support/knowledgecenter/SSCRJU/SSCRJU_welcome.html"
```

docs/4.3/qse-getting-started.markdown

Lines changed: 7 additions & 18 deletions

```diff
@@ -123,26 +123,24 @@ You can connect to external data sources using toolkits. A toolkit is a reusabl
 
 Streams includes toolkits that support the most popular systems like [HDFS](https://github.com/IBMStreams/streamsx.hdfs), [HBase](https://github.com/IBMStreams/streamsx.hbase), [Kafka](https://ibmstreams.github.io/streamsx.kafka/docs/user/overview/), Active MQ and more.
 
-Refer to the [Product Toolkits Overview](https://developer.ibm.com/streamsdev/docs/product-toolkits-overview/) for a full list of toolkits included in Streams.
+Refer to the [Product Toolkits Overview](https://www.ibm.com/support/knowledgecenter/de/SSCRJU_4.3.0/com.ibm.streams.ref.doc/doc/spltoolkits_intro.html) for a full list of toolkits included in Streams.
 
 **Find more toolkits on GitHub**
 
 In addition to the toolkits included in the install, [IBMStreams on GitHub](https://github.com/ibmstreams) includes open-source projects, a platform that enables Streams to rapidly add support for emerging technologies. It also includes sample applications and helpful utilities.
 
-For a list of open-source projects hosted on GitHub, see: [IBM Streams GitHub Projects Overview](https://developer.ibm.com/streamsdev/docs/github-projects-overview/).
-
 ### Streams and SPSS
 
 SPSS is analytic predictive software that enables you to build predictive models from your data. Your application can perform real-time predictive scoring by running these predictive models using the SPSS operators.
 
-To learn how Streams can integrate with SPSS: [Streams and SPSS Lab](https://developer.ibm.com/streamsdev/docs/spss-analytics-toolkit-lab/).
+To learn how Streams can integrate with SPSS: [Streams and SPSS Lab](https://ibmstreams.github.io/streamsx.documentation/docs/spss/spss-analytics/).
 
 
 ### Streams and Microsoft Excel
 
 <img src="/streamsx.documentation/images/qse/BargainIndex1.jpg" alt="Streams and Excel" style="width: 60%;"/>
 
-IBM Streams integrates with Microsoft Excel, allowing you to see, analyze and visualize live streaming data in an Excel worksheet. This article helps you get started: [Streams for Microsoft Excel](https://developer.ibm.com/streamsdev/docs/streams-4-0-streams-for-microsoft-excel/)
+IBM Streams integrates with Microsoft Excel, allowing you to see, analyze and visualize live streaming data in an Excel worksheet.
 
 In the following demo, we demonstrate how you may build a marketing dashboard from real-time data using Excel.
 
@@ -152,20 +150,14 @@ Video: Streams and Excel Demo
 
 ### Operational Decision Manager (ODM)
 
-IBM Streams integrates with ODM rules, allowing you to create business rules, construct rule flows, and create and deploy rules applications to analyze data and automate decisions in real-time. This article helps you get started: [ODM Toolkit Lab](https://developer.ibm.com/streamsdev/docs/rules-toolkit-lab/)
+IBM Streams integrates with ODM rules, allowing you to create business rules, construct rule flows, and create and deploy rules applications to analyze data and automate decisions in real-time. This article helps you get started: [ODM Toolkit Lab](https://community.ibm.com/community/user/cloudpakfordata/viewdocument/integrating-business-rules-in-real?CommunityKey=c0c16ff2-10ef-4b50-ae4c-57d769937235&tab=librarydocuments)
 
 
 ### Integration with IBM InfoSphere Data Governance Catalog
 
 With IBM InfoSphere Data Governance Catalog integration, developers can easily discover the data and schema that are available for use. By building data lineage with your Streams application, you can quickly see and control how data is consumed.
-To get started, see the [Streams Governance Quick Start Guide](../governance/governance-quickstart/).
-
-
-### SparkMLLib in Streams
-
-To get started, follow this development guide:
+To get started, see the [Streams Governance Quick Start Guide](https://ibmstreams.github.io/streamsx.documentation/docs/4.2/governance/governance-quickstart/).
 
-* [SparkMLLib Getting Started Guide](https://developer.ibm.com/streamsdev/docs/getting-started-with-the-spark-mllib-toolkit/)
 
 ### Apache Edgent (aka Open Embedded Streams) Integration
 
@@ -178,11 +170,8 @@ Gather local, real-time analytics from equipment, vehicles, systems, appliances,
 
 ## Streams Community
 The following Streams resources can help you connect with the Streams community and get support when you need it:
 
-* **[Streamsdev](https://developer.ibm.com/streamsdev/)** - This resource is a developer-to-developer website maintained by the Streams Development Team. It contains many useful articles and getting-started material. Check back often for new articles, tips and best practices.
-* **[Streams Forum](https://www.ibmdw.net/answers/questions/?community=streamsdev&sort=newest&refine=none)** - This forum enables you to ask, and get answers to, questions related to IBM Streams. If you have questions, start here.
-* **[IBMStreams on GitHub](http://ibmstreams.github.io)** - Streams is shipped with many useful toolkits out of the box. IBMStreams on GitHub contains many open-source toolkits. For a list of toolkits available on GitHub, see: [IBMStreams GitHub Toolkits](https://developer.ibm.com/streamsdev/docs/github-projects-overview/).
-* **[IBM Streams Support](http://www.ibm.com/support/entry/portal/Overview/Software/Information_Management/InfoSphere_Streams)** - This website provides information about IBM Streams downloads, technical support tools, documentation, and other resources.
-* **[IBM Streams Product Site](http://www.ibm.com/analytics/us/en/technology/stream-computing/)** - This website provides a broad range of information and resources about Streams and related topics.
+* **[Streams Community](https://ibm.biz/streams-community)** - This resource is a developer-to-developer website maintained by the Streams Development Team. It contains many useful articles and getting-started material.
+* **[IBMStreams on GitHub](http://ibmstreams.github.io)** - Streams is shipped with many useful toolkits out of the box. IBMStreams on GitHub contains many open-source toolkits.
 
 
 <!-- Modal -->
```

docs/python/1.6/python-appapi-devguide-4.md

Lines changed: 206 additions & 2 deletions
```diff
@@ -40,7 +40,7 @@ This section will discuss how to use the most common functions and transforms in
 - [Split the stream into dedicated streams](#split_func)
 * [Joining streams](#union)
 * [Sharing data between Streams applications](#publish)
-
+* [Defining a stream's schema](#schema)
 
 
 <a id="intro"></a>
```
```diff
@@ -292,7 +292,7 @@ Reading from a file or using a file within your Streams application can be done
 
 However, you must use `Topology.add_file_dependency` to ensure that the file or its containing directory will be available at runtime.
 
-Note: If you are using **IBM Cloud Pak for Data**, this [post discusses how to use a data set in your Streams Topology](https://developer.ibm.com/streamsdev/2019/04/23/tip-for-ibm-cloud-private-for-data-how-to-use-local-data-sets-in-your-streams-python-notebook/).
+Note: If you are using **IBM Cloud Pak for Data**, this [post discusses how to use a data set in your Streams Topology](https://community.ibm.com/community/user/cloudpakfordata/viewdocument/how-to-use-local-files-in-a-streams?CommunityKey=c0c16ff2-10ef-4b50-ae4c-57d769937235&tab=librarydocuments).
 
 ~~~python
 topo = Topology("ReadFromFile")
```
```diff
@@ -2255,3 +2255,207 @@ The contents of your output file look something like this:
 
 For more information, see [Publish-subscribe overview](https://streamsxtopology.readthedocs.io/en/stable/streamsx.topology.html#publish-subscribe-overview).
```

The remainder of this hunk appends the following new section:
<a id="schema"></a>
## Defining a stream's schema

A stream represents an unbounded flow of tuples with a declared schema, so that each tuple on the stream complies with the schema.

A stream's schema may be one of:

* **StreamsSchema** structured schema - a tuple is a sequence of attributes, and an attribute is a named value of a specific type.
* **Json** - a tuple is a JSON object.
* **String** - a tuple is a string.
* **Python** - a tuple is any Python object, effectively an untyped stream.

The application below uses the `Stream.map()` callable between a data source and a data sink callable:

![Stream schema](../../../../images/python/stream_schema.png)

The diagram contains labels for `stream1`, `stream2` and `outputSchema` because they are used in the code block and table below. Each SPL operator output port and its corresponding stream are defined by a schema. In a Python topology application, `CommonSchema.Python` is the default schema for Python operators.

In this sample the output schema is defined with the `schema` parameter of the `map()` function.

~~~python
outputSchema = CommonSchema.Python
stream2 = stream1.map(lambda t: t, schema=outputSchema)
~~~

The table below contains examples of schema definitions and the corresponding SPL schema generated by `streamsx.topology` when creating the application.

| Schema type | Schema in Python | Schema in generated SPL |
| ------------- | ------------- | ------------- |
| Python | `outputSchema = CommonSchema.Python` | `tuple<blob __spl_po>` |
| String | `outputSchema = CommonSchema.String` | `tuple<rstring string>` |
| Json | `outputSchema = CommonSchema.Json` | `tuple<rstring jsonString>` |
| StreamsSchema | `outputSchema = 'tuple<int64 intAttribute, rstring strAttribute>'` | `tuple<int64 intAttribute, rstring strAttribute>` |

So far in this *development guide* we have not used schemas explicitly, but in a large application it is good design to define structured schemas.

In certain cases, you must use a schema other than `CommonSchema.Python`:

* when writing an application that uses different kinds of callables (Streams SPL operators), because the Python schema is not supported by SPL Java primitive and SPL C++ primitive operators.
* when using **publish** and **subscribe** between different applications (if one application is **not** using Python operators).
* when creating a job as a service endpoint to consume/produce data via REST using **EndpointSink** or **EndpointSource** from [streamsx.service](https://streamsxtopology.readthedocs.io/en/stable/streamsx.service.html).
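As a rough illustration of the Python-to-SPL mapping in the table above, a named-tuple schema's SPL form can be derived by walking the class's type hints. The helper below is a hypothetical sketch, not part of `streamsx` - it assumes only the `str`→`rstring`, `int`→`int64` and `float`→`float64` mappings and ignores the many other SPL types:

```python
from typing import NamedTuple, get_type_hints

# Hypothetical subset of the real streamsx type mapping (illustration only).
_SPL_TYPES = {str: "rstring", int: "int64", float: "float64"}

def spl_schema(nt_class) -> str:
    """Sketch: derive an SPL 'tuple<...>' string from a NamedTuple class."""
    attrs = ", ".join(
        f"{_SPL_TYPES[py_type]} {name}"
        for name, py_type in get_type_hints(nt_class).items()
    )
    return f"tuple<{attrs}>"

class SampleSourceSchema(NamedTuple):
    id: str
    num: int

print(spl_schema(SampleSourceSchema))  # tuple<rstring id, int64 num>
```

This mirrors the last row of the table: a structured schema is just a named, typed attribute list on both the Python and the SPL side.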
### Structured Schema

A structured schema can be declared in a number of ways:

* An instance of `typing.NamedTuple`
* An instance of `StreamSchema`
* A string of the format `tuple<...>` defining the attribute names and types.
* A string containing a namespace-qualified SPL stream type (e.g. `com.ibm.streams.geospatial::FlightPathEncounterTypes.Observation3D`)

Structured schemas provide type safety and efficient network serialization compared with passing a dict on Python streams.

#### Topology.source()

* **No** support for an explicit schema definition
* Generates **CommonSchema.Python** by **default**
* Use a type hint on the "source" callable to generate a structured schema stream

In the sample below, the **type hint** `-> Iterable[SampleSourceSchema]` is added to the `__call__(self)` method of the class used as the source callable.
The structured schema `SampleSourceSchema` is defined as a named tuple.

~~~python
from streamsx.topology.topology import Topology
import streamsx.topology.context
from typing import Iterable, NamedTuple
import itertools, random

class SampleSourceSchema(NamedTuple):
    id: str
    num: int

# Callable of the source
class SampleSource(object):
    def __call__(self) -> Iterable[SampleSourceSchema]:
        for num in itertools.count(1):
            yield {"id": str(num), "num": random.randint(0, num)}

topo = Topology("sample-source-structured-stream")
src = topo.source(SampleSource())
src.print()
streamsx.topology.context.submit("STANDALONE", topo)
~~~
#### Structured schema passing styles (dict vs. named tuple)

In the previous example the source callable returned a *dict*. You can also return *named tuple* objects; in both cases tuples are passed to downstream callables in *named tuple* style.

~~~python
from streamsx.topology.topology import Topology
import streamsx.topology.context
from typing import Iterable, NamedTuple
import itertools, random

class SampleSourceSchema(NamedTuple):
    id: str
    num: int

# Callable of the source
class SampleSource(object):
    def __call__(self) -> Iterable[SampleSourceSchema]:
        for num in itertools.count(1):
            output_event = SampleSourceSchema(
                id=str(num),
                num=random.randint(0, num)
            )
            yield output_event

class SampleMapSchema(NamedTuple):
    idx: str
    number: int

def map_namedtuple_to_namedtuple(tpl) -> SampleMapSchema:
    out = SampleMapSchema(
        idx='x-' + tpl.id,
        number=tpl.num + 1
    )
    return out

topo = Topology("sample-namedtuple-structured-stream1")
stream1 = topo.source(SampleSource())
stream2 = stream1.map(map_namedtuple_to_namedtuple)
stream2.print()
streamsx.topology.context.submit("STANDALONE", topo)
~~~
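The practical difference between the two passing styles is how a callable accesses attributes: dict-style tuples use subscripting, named-tuple-style tuples use attribute access. A minimal standalone sketch of the two forms (plain Python, no Streams runtime needed, names reused from the sample above):

```python
from typing import NamedTuple

class SampleSourceSchema(NamedTuple):
    id: str
    num: int

# Named-tuple style: attribute access, as seen by a downstream callable
# when the upstream schema is a NamedTuple.
nt_event = SampleSourceSchema(id="1", num=7)
print(nt_event.id, nt_event.num)      # attribute access

# Dict style: subscript access, as used when tuples arrive as dicts.
d_event = {"id": "1", "num": 7}
print(d_event["id"], d_event["num"])  # key access

# Both forms carry the same data and convert into each other:
assert dict(nt_event._asdict()) == d_event
assert SampleSourceSchema(**d_event) == nt_event
```

Writing callables that assume the wrong style (e.g. `tpl['id']` on a named tuple) is a common source of runtime errors, so it helps to know which style the predecessor produces.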
*Does a type hint replace the `schema` parameter when calling the map transform?*

If `schema` is set, the return type is defined by the schema parameter. Otherwise, the return type hint on `func` defines the schema of the returned stream, defaulting to `CommonSchema.Python` if no type hints are present.

Below is the same sample using *dict* style in the source callable; the type hint with the *named tuple* schema still causes tuples to be passed in *named tuple* style to the `map()` callable.

~~~python
from streamsx.topology.topology import Topology
import streamsx.topology.context
from typing import Iterable, NamedTuple
import itertools, random

class SampleSourceSchema(NamedTuple):
    id: str
    num: int

# Callable of the source
class SampleSource(object):
    def __call__(self) -> Iterable[SampleSourceSchema]:
        for num in itertools.count(1):
            yield {"id": str(num), "num": random.randint(0, num)}

class SampleMapSchema(NamedTuple):
    idx: str
    number: int

def map_namedtuple_to_namedtuple(tpl) -> SampleMapSchema:
    out = SampleMapSchema(
        idx='x-' + tpl.id,
        number=tpl.num + 1
    )
    return out

topo = Topology("sample-namedtuple-structured-stream2")
stream1 = topo.source(SampleSource())
stream2 = stream1.map(map_namedtuple_to_namedtuple)
stream2.print()
streamsx.topology.context.submit("STANDALONE", topo)
~~~
The following sample uses an SPL operator, [streamsx.standard.utility.Sequence](https://streamsxstandard.readthedocs.io/en/latest/generated/streamsx.standard.utility.html#streamsx.standard.utility.Sequence), which generates the structured schema [streamsx.standard.utility.SEQUENCE_SCHEMA](https://streamsxstandard.readthedocs.io/en/latest/generated/streamsx.standard.utility.html#streamsx.standard.utility.SEQUENCE_SCHEMA).
The difference from the previous sample is that tuples are passed to the Python callable in *dict* style (see the `Delta()` class used in `stream1.map(Delta())`). Furthermore, this sample demonstrates how to extend a structured schema with the [streamsx.topology.schema.StreamSchema.extend](https://streamsxtopology.readthedocs.io/en/stable/streamsx.topology.schema.html#streamsx.topology.schema.StreamSchema.extend) function. In the `map()` callable the new attribute `d` is set.

~~~python
from streamsx.topology.topology import Topology
import streamsx.topology.context
import streamsx.standard.utility as U
from streamsx.topology.schema import StreamSchema

class Delta(object):
    def __init__(self):
        self._last = None
    def __call__(self, v):
        if v['seq'] == 0:
            self._last = v['ts']
            return None
        else:
            v['d'] = v['ts'].time() - self._last.time()
            return v

topo = Topology("sample-dict-structured-stream")
stream1 = topo.source(U.Sequence(iterations=50, period=0.2))  # output schema: tuple<uint64 seq, timestamp ts>
E = U.SEQUENCE_SCHEMA.extend(StreamSchema('tuple<float64 d>'))
stream2 = stream1.map(Delta(), schema=E)  # output schema: tuple<uint64 seq, timestamp ts, float64 d>
stream2.print()
streamsx.topology.context.submit("STANDALONE", topo)
~~~
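Conceptually, extending a schema just appends the new attributes to the existing attribute list. The helper below is a hypothetical stand-in for illustration only - the real `StreamSchema.extend` operates on schema objects, not strings:

```python
# Hypothetical sketch of schema extension as attribute-list concatenation;
# the real streamsx StreamSchema.extend works on schema objects.
def extend_schema(base: str, extension: str) -> str:
    # strip the 'tuple<' prefix and '>' suffix, then join the attribute lists
    base_attrs = base[len("tuple<"):-1]
    ext_attrs = extension[len("tuple<"):-1]
    return f"tuple<{base_attrs}, {ext_attrs}>"

SEQUENCE_SCHEMA = "tuple<uint64 seq, timestamp ts>"
extended = extend_schema(SEQUENCE_SCHEMA, "tuple<float64 d>")
print(extended)  # tuple<uint64 seq, timestamp ts, float64 d>
```

The printed string matches the output schema noted in the comments of the sample above.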
Summary:

- The passing style (dict or named tuple) in your callable depends on the predecessor callable/operator.
- When a **named tuple** schema is defined in the predecessor callable/operator, expect named tuple passing style in your Python callable.
- Use either a **named tuple** schema or a **StreamsSchema** between SPL operators and Python callables.

docs/python/1.6/python-appapi-devguide-6.md

Lines changed: 3 additions & 0 deletions

```diff
@@ -8,6 +8,9 @@ tag: py16
 prev:
   file: python-appapi-devguide-5
   title: "API features: Scalability, fault tolerance"
+next:
+  file: python-appapi-devguide-7
+  title: "Working with SPL toolkits"
 ---
 
 Depending on the problem at hand, a developer might choose to create an IBM Streams application in a particular programming language. To this end, the 'streamsx.topology' project supports APIs in Java, Scala, Python, and IBM Streams Processing Language (SPL). Regardless of the language used to develop and submit the application, however, it becomes necessary to monitor the application while it is running. By monitoring the application, you can observe runtime information regarding the application or its environment, for example:
```
