
Commit 101e466

Merge branch 'gh-pages' into jobs-cpd35

2 parents e340cdf + 58b137c

File tree

9 files changed: +334 −24 lines

_includes/docHeader.html

Lines changed: 1 addition & 1 deletion

```diff
@@ -12,7 +12,7 @@
 <ul class="nav navbar-nav">
 <li><a href="http://ibmstreams.github.io" target="_blank">GitHub</a>
 </li>
-<li><a href="https://developer.ibm.com/streamsdev/"
+<li><a href="https://ibm.biz/streams-community"
 target="_blank">Community</a></li>
 <li><a
 href="http://www.ibm.com/support/knowledgecenter/SSCRJU/SSCRJU_welcome.html"
```

docs/4.3/qse-getting-started.markdown

Lines changed: 7 additions & 18 deletions

```diff
@@ -123,26 +123,24 @@ You can connect to external data sources using toolkits. A toolkit is a reusabl
 
 Streams includes toolkits that support the most popular systems like [HDFS](https://github.com/IBMStreams/streamsx.hdfs), [HBase](https://github.com/IBMStreams/streamsx.hbase), [Kafka](https://ibmstreams.github.io/streamsx.kafka/docs/user/overview/), Active MQ and more.
 
-Refer to the [Product Toolkits Overview](https://developer.ibm.com/streamsdev/docs/product-toolkits-overview/) for a full list of toolkits included in Streams.
+Refer to the [Product Toolkits Overview](https://www.ibm.com/support/knowledgecenter/de/SSCRJU_4.3.0/com.ibm.streams.ref.doc/doc/spltoolkits_intro.html) for a full list of toolkits included in Streams.
 
 **Find more toolkits on GitHub**
 
 In addition to the toolkits included in the install, [IBMStreams on GitHub](https://github.com/ibmstreams) includes open-source projects, a platform that enables Streams to rapidly add support for emerging technologies. It also includes sample applications and helpful utilities.
 
-For a list of open-source projects hosted on GitHub, see: [IBM Streams GitHub Projects Overview](https://developer.ibm.com/streamsdev/docs/github-projects-overview/).
-
 ### Streams and SPSS
 
 SPSS is analytic predictive software that enables you to build predictive models from your data. Your application can perform real-time predictive scoring by running these predictive models using the SPSS operators.
 
-To learn how Streams can integrate with SPSS: [Streams and SPSS Lab](https://developer.ibm.com/streamsdev/docs/spss-analytics-toolkit-lab/).
+To learn how Streams can integrate with SPSS: [Streams and SPSS Lab](https://ibmstreams.github.io/streamsx.documentation/docs/spss/spss-analytics/).
 
 
 ### Streams and Microsoft Excel
 
 <img src="/streamsx.documentation/images/qse/BargainIndex1.jpg" alt="Streams and Excel" style="width: 60%;"/>
 
-IBM Streams integrates with Microsoft Excel, allowing you to see, analyze and visualize live streaming data in an Excel worksheet. This article helps you get started: [Streams for Microsoft Excel](https://developer.ibm.com/streamsdev/docs/streams-4-0-streams-for-microsoft-excel/)
+IBM Streams integrates with Microsoft Excel, allowing you to see, analyze and visualize live streaming data in an Excel worksheet.
 
 In the following demo, we demonstrate how you may build a marketing dashboard from real-time data using Excel.
 
@@ -152,20 +150,14 @@ Video: Streams and Excel Demo
 
 ### Operational Decision Manager (ODM)
 
-IBM Streams integrates with ODM rules, allowing you to create business rules, construct rule flows, and create and deploy rules applications to analyze data and automate decisions in real-time. This article helps you get started: [ODM Toolkit Lab](https://developer.ibm.com/streamsdev/docs/rules-toolkit-lab/)
+IBM Streams integrates with ODM rules, allowing you to create business rules, construct rule flows, and create and deploy rules applications to analyze data and automate decisions in real-time. This article helps you get started: [ODM Toolkit Lab](https://community.ibm.com/community/user/cloudpakfordata/viewdocument/integrating-business-rules-in-real?CommunityKey=c0c16ff2-10ef-4b50-ae4c-57d769937235&tab=librarydocuments)
 
 
 ### Integration with IBM InfoSphere Data Governance Catalog
 
 With IBM InfoSphere Data Governance Catalog integration, developers can easily discover the data and schema that are available for use. By building data lineage with your Streams application, you can quickly see and control how data is consumed.
-To get started, see the [Streams Governance Quick Start Guide](../governance/governance-quickstart/).
-
-
-### SparkMLLib in Streams
-
-To get started, follow this development guide:
+To get started, see the [Streams Governance Quick Start Guide](https://ibmstreams.github.io/streamsx.documentation/docs/4.2/governance/governance-quickstart/).
 
-* [SparkMLLib Getting Started Guide](https://developer.ibm.com/streamsdev/docs/getting-started-with-the-spark-mllib-toolkit/)
 
 ### Apache Edgent (aka Open Embedded Streams) Integration
 
@@ -178,11 +170,8 @@ Gather local, real-time analytics from equipment, vehicles, systems, appliances,
 
 ## Streams Community
 The following Streams resources can help you connect with the Streams community and get support when you need it:
 
-* **[Streamsdev](https://developer.ibm.com/streamsdev/)** - This resource is a developer-to-developer website maintained by the Streams Development Team. It contains many useful articles and getting-started material. Check back often for new articles, tips and best practices.
-* **[Streams Forum](https://www.ibmdw.net/answers/questions/?community=streamsdev&sort=newest&refine=none)** - This forum enables you to ask, and get answers to, questions related to IBM Streams. If you have questions, start here.
-* **[IBMStreams on GitHub](http://ibmstreams.github.io)** - Streams is shipped with many useful toolkits out of the box. IBMStreams on GitHub contains many open-source toolkits. For a list of toolkits available on GitHub, see: [IBMStreams GitHub Toolkits](https://developer.ibm.com/streamsdev/docs/github-projects-overview/).
-* **[IBM Streams Support](http://www.ibm.com/support/entry/portal/Overview/Software/Information_Management/InfoSphere_Streams)** - This website provides information about IBM Streams downloads, technical support tools, documentation, and other resources.
-* **[IBM Streams Product Site](http://www.ibm.com/analytics/us/en/technology/stream-computing/)** - This website provides a broad range of information and resources about Streams and related topics.
+* **[Streams Community](https://ibm.biz/streams-community)** - This resource is a developer-to-developer website maintained by the Streams Development Team. It contains many useful articles and getting-started material.
+* **[IBMStreams on GitHub](http://ibmstreams.github.io)** - Streams is shipped with many useful toolkits out of the box. IBMStreams on GitHub contains many open-source toolkits.
 
 
 <!-- Modal -->
```

docs/python/1.6/python-appapi-devguide-4.md

Lines changed: 206 additions & 2 deletions
```diff
@@ -40,7 +40,7 @@ This section will discuss how to use the most common functions and transforms in
 - [Split the stream into dedicated streams](#split_func)
 * [Joining streams](#union)
 * [Sharing data between Streams applications](#publish)
-
+* [Defining a stream's schema](#schema)
 
 
 <a id="intro"></a>
```
```diff
@@ -292,7 +292,7 @@ Reading from a file or using a file within your Streams application can be done
 
 However, you must use `Topology.add_file_dependency` to ensure that the file or its containing directory will be available at runtime.
 
-Note: If you are using **IBM Cloud Pak for Data**, this [post discusses how to use a data set in your Streams Topology](https://developer.ibm.com/streamsdev/2019/04/23/tip-for-ibm-cloud-private-for-data-how-to-use-local-data-sets-in-your-streams-python-notebook/).
+Note: If you are using **IBM Cloud Pak for Data**, this [post discusses how to use a data set in your Streams Topology](https://community.ibm.com/community/user/cloudpakfordata/viewdocument/how-to-use-local-files-in-a-streams?CommunityKey=c0c16ff2-10ef-4b50-ae4c-57d769937235&tab=librarydocuments).
 
 ~~~python
 topo = Topology("ReadFromFile")
```
```diff
@@ -2255,3 +2255,207 @@ The contents of your output file look something like this:
 
 For more information, see [Publish-subscribe overview](https://streamsxtopology.readthedocs.io/en/stable/streamsx.topology.html#publish-subscribe-overview).
```

The remainder of this hunk appends the following new section:
<a id="schema"></a>
## Defining a stream's schema

A stream represents an unbounded flow of tuples with a declared schema, so that each tuple on the stream complies with the schema.

A stream's schema may be one of:

* **StreamsSchema** structured schema - a tuple is a sequence of attributes, and an attribute is a named value of a specific type.
* **Json** - a tuple is a JSON object.
* **String** - a tuple is a string.
* **Python** - a tuple is any Python object, effectively an untyped stream.

The application below uses the `Stream.map()` callable between a data source and a data sink callable:

![Stream schema](../../../../images/python/stream_schema.png)

The diagram contains labels for `stream1`, `stream2` and `outputSchema` because they are used in the code block and table below. Each SPL operator output port and its corresponding stream are defined by a schema. In a Python topology application, `CommonSchema.Python` is the default schema for Python operators.

In this sample the output schema is defined with the `schema` parameter of the `map()` function.

~~~python
outputSchema = CommonSchema.Python
stream2 = stream1.map(lambda t: t, schema=outputSchema)
~~~

The table below contains examples of schema definitions and the corresponding SPL schema generated by `streamsx.topology` when creating the application.

| Schema type | Schema in Python | Schema in generated SPL |
| ------------- | ------------- | ------------- |
| Python | `outputSchema = CommonSchema.Python` | `tuple<blob __spl_po>` |
| String | `outputSchema = CommonSchema.String` | `tuple<rstring string>` |
| Json | `outputSchema = CommonSchema.Json` | `tuple<rstring jsonString>` |
| StreamsSchema | `outputSchema = 'tuple<int64 intAttribute, rstring strAttribute>'` | `tuple<int64 intAttribute, rstring strAttribute>` |

So far in this *development guide* we have not used schemas explicitly, but in a large application it is good design to define structured schemas.

In certain cases, you must use a schema other than `CommonSchema.Python`:

* when writing an application that uses different kinds of callables (Streams SPL operators), because the Python schema is not supported by SPL Java primitive and SPL C++ primitive operators.
* when using **publish** and **subscribe** between different applications (if one application is **not** using Python operators).
* when creating a job as a service endpoint to consume/produce data via REST using **EndpointSink** or **EndpointSource** from [streamsx.service](https://streamsxtopology.readthedocs.io/en/stable/streamsx.service.html).
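As a rough illustration of the Python-to-SPL mapping in the table above, a named-tuple schema's SPL form can be derived by walking the class's type hints. The helper below is a hypothetical sketch, not part of `streamsx` - it assumes only the `str`→`rstring`, `int`→`int64` and `float`→`float64` mappings and ignores the many other SPL types:

```python
from typing import NamedTuple, get_type_hints

# Hypothetical subset of the real streamsx type mapping (illustration only).
_SPL_TYPES = {str: "rstring", int: "int64", float: "float64"}

def spl_schema(nt_class) -> str:
    """Sketch: derive an SPL 'tuple<...>' string from a NamedTuple class."""
    attrs = ", ".join(
        f"{_SPL_TYPES[py_type]} {name}"
        for name, py_type in get_type_hints(nt_class).items()
    )
    return f"tuple<{attrs}>"

class SampleSourceSchema(NamedTuple):
    id: str
    num: int

print(spl_schema(SampleSourceSchema))  # tuple<rstring id, int64 num>
```

This mirrors the last row of the table: a structured schema is just a named, typed attribute list on both the Python and the SPL side.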
### Structured Schema

A structured schema can be declared in a number of ways:

* An instance of `typing.NamedTuple`
* An instance of `StreamSchema`
* A string of the format `tuple<...>` defining the attribute names and types.
* A string containing a namespace-qualified SPL stream type (e.g. `com.ibm.streams.geospatial::FlightPathEncounterTypes.Observation3D`)

Structured schemas provide type safety and efficient network serialization compared with passing a dict on Python streams.

#### Topology.source()

* **No** support for an explicit schema definition
* Generates **CommonSchema.Python** by **default**
* Use a type hint on the "source" callable to generate a structured schema stream

In the sample below, the **type hint** `-> Iterable[SampleSourceSchema]` is added to the `__call__(self)` method of the class used as the source callable.
The structured schema `SampleSourceSchema` is defined as a named tuple.

~~~python
from streamsx.topology.topology import Topology
import streamsx.topology.context
from typing import Iterable, NamedTuple
import itertools, random

class SampleSourceSchema(NamedTuple):
    id: str
    num: int

# Callable of the source
class SampleSource(object):
    def __call__(self) -> Iterable[SampleSourceSchema]:
        for num in itertools.count(1):
            yield {"id": str(num), "num": random.randint(0, num)}

topo = Topology("sample-source-structured-stream")
src = topo.source(SampleSource())
src.print()
streamsx.topology.context.submit("STANDALONE", topo)
~~~
#### Structured schema passing styles (dict vs. named tuple)

In the previous example the source callable returned a *dict*. You can also return *named tuple* objects; in both cases tuples are passed to downstream callables in *named tuple* style.

~~~python
from streamsx.topology.topology import Topology
import streamsx.topology.context
from typing import Iterable, NamedTuple
import itertools, random

class SampleSourceSchema(NamedTuple):
    id: str
    num: int

# Callable of the source
class SampleSource(object):
    def __call__(self) -> Iterable[SampleSourceSchema]:
        for num in itertools.count(1):
            output_event = SampleSourceSchema(
                id=str(num),
                num=random.randint(0, num)
            )
            yield output_event

class SampleMapSchema(NamedTuple):
    idx: str
    number: int

def map_namedtuple_to_namedtuple(tpl) -> SampleMapSchema:
    out = SampleMapSchema(
        idx='x-' + tpl.id,
        number=tpl.num + 1
    )
    return out

topo = Topology("sample-namedtuple-structured-stream1")
stream1 = topo.source(SampleSource())
stream2 = stream1.map(map_namedtuple_to_namedtuple)
stream2.print()
streamsx.topology.context.submit("STANDALONE", topo)
~~~
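The practical difference between the two passing styles is how a callable accesses attributes: dict-style tuples use subscripting, named-tuple-style tuples use attribute access. A minimal standalone sketch of the two forms (plain Python, no Streams runtime needed, names reused from the sample above):

```python
from typing import NamedTuple

class SampleSourceSchema(NamedTuple):
    id: str
    num: int

# Named-tuple style: attribute access, as seen by a downstream callable
# when the upstream schema is a NamedTuple.
nt_event = SampleSourceSchema(id="1", num=7)
print(nt_event.id, nt_event.num)      # attribute access

# Dict style: subscript access, as used when tuples arrive as dicts.
d_event = {"id": "1", "num": 7}
print(d_event["id"], d_event["num"])  # key access

# Both forms carry the same data and convert into each other:
assert dict(nt_event._asdict()) == d_event
assert SampleSourceSchema(**d_event) == nt_event
```

Writing callables that assume the wrong style (e.g. `tpl['id']` on a named tuple) is a common source of runtime errors, so it helps to know which style the predecessor produces.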
*Does a type hint replace the `schema` parameter when calling the map transform?*

If `schema` is set, the return type is defined by the schema parameter. Otherwise, the return type hint on `func` defines the schema of the returned stream, defaulting to `CommonSchema.Python` if no type hints are present.

Below is the same sample using *dict* style in the source callable; the type hint with the *named tuple* schema still causes tuples to be passed in *named tuple* style to the `map()` callable.

~~~python
from streamsx.topology.topology import Topology
import streamsx.topology.context
from typing import Iterable, NamedTuple
import itertools, random

class SampleSourceSchema(NamedTuple):
    id: str
    num: int

# Callable of the source
class SampleSource(object):
    def __call__(self) -> Iterable[SampleSourceSchema]:
        for num in itertools.count(1):
            yield {"id": str(num), "num": random.randint(0, num)}

class SampleMapSchema(NamedTuple):
    idx: str
    number: int

def map_namedtuple_to_namedtuple(tpl) -> SampleMapSchema:
    out = SampleMapSchema(
        idx='x-' + tpl.id,
        number=tpl.num + 1
    )
    return out

topo = Topology("sample-namedtuple-structured-stream2")
stream1 = topo.source(SampleSource())
stream2 = stream1.map(map_namedtuple_to_namedtuple)
stream2.print()
streamsx.topology.context.submit("STANDALONE", topo)
~~~
The following sample uses an SPL operator, [streamsx.standard.utility.Sequence](https://streamsxstandard.readthedocs.io/en/latest/generated/streamsx.standard.utility.html#streamsx.standard.utility.Sequence), which generates the structured schema [streamsx.standard.utility.SEQUENCE_SCHEMA](https://streamsxstandard.readthedocs.io/en/latest/generated/streamsx.standard.utility.html#streamsx.standard.utility.SEQUENCE_SCHEMA).
The difference from the previous sample is that tuples are passed to the Python callable in *dict* style (see the `Delta()` class used in `stream1.map(Delta())`). Furthermore, this sample demonstrates how to extend a structured schema with the [streamsx.topology.schema.StreamSchema.extend](https://streamsxtopology.readthedocs.io/en/stable/streamsx.topology.schema.html#streamsx.topology.schema.StreamSchema.extend) function. In the `map()` callable the new attribute `d` is set.

~~~python
from streamsx.topology.topology import Topology
import streamsx.topology.context
import streamsx.standard.utility as U
from streamsx.topology.schema import StreamSchema

class Delta(object):
    def __init__(self):
        self._last = None
    def __call__(self, v):
        if v['seq'] == 0:
            self._last = v['ts']
            return None
        else:
            v['d'] = v['ts'].time() - self._last.time()
            return v

topo = Topology("sample-dict-structured-stream")
stream1 = topo.source(U.Sequence(iterations=50, period=0.2))  # output schema: tuple<uint64 seq, timestamp ts>
E = U.SEQUENCE_SCHEMA.extend(StreamSchema('tuple<float64 d>'))
stream2 = stream1.map(Delta(), schema=E)  # output schema: tuple<uint64 seq, timestamp ts, float64 d>
stream2.print()
streamsx.topology.context.submit("STANDALONE", topo)
~~~
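Conceptually, extending a schema just appends the new attributes to the existing attribute list. The helper below is a hypothetical stand-in for illustration only - the real `StreamSchema.extend` operates on schema objects, not strings:

```python
# Hypothetical sketch of schema extension as attribute-list concatenation;
# the real streamsx StreamSchema.extend works on schema objects.
def extend_schema(base: str, extension: str) -> str:
    # strip the 'tuple<' prefix and '>' suffix, then join the attribute lists
    base_attrs = base[len("tuple<"):-1]
    ext_attrs = extension[len("tuple<"):-1]
    return f"tuple<{base_attrs}, {ext_attrs}>"

SEQUENCE_SCHEMA = "tuple<uint64 seq, timestamp ts>"
extended = extend_schema(SEQUENCE_SCHEMA, "tuple<float64 d>")
print(extended)  # tuple<uint64 seq, timestamp ts, float64 d>
```

The printed string matches the output schema noted in the comments of the sample above.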
Summary:

- The passing style (dict or named tuple) in your callable depends on the predecessor callable/operator.
- When a **named tuple** schema is defined in the predecessor callable/operator, expect named tuple passing style in your Python callable.
- Use either a **named tuple** schema or a **StreamsSchema** between SPL operators and Python callables.

docs/python/1.6/python-appapi-devguide-6.md

Lines changed: 3 additions & 0 deletions

```diff
@@ -8,6 +8,9 @@ tag: py16
 prev:
   file: python-appapi-devguide-5
   title: "API features: Scalability, fault tolerance"
+next:
+  file: python-appapi-devguide-7
+  title: "Working with SPL toolkits"
 ---
 
 Depending on the problem at hand, a developer might choose to create an IBM Streams application in a particular programming language. To this end, the 'streamsx.topology' project supports APIs in Java, Scala, Python, and IBM Streams Processing Language (SPL). Regardless of the language used to develop and submit the application, however, it becomes necessary to monitor the application while it is running. By monitoring the application, you can observe runtime information regarding the application or its environment, for example:
```
