Packaging Guide

Bundling Streaming Connectors

Java / Scala

The instructions below assume Maven (mvn) as the build tool. If you use sbt or another build tool, treat them as a reference and adapt them to produce the same package structure.

Streaming connectors are expected to be bundled as a single JAR (fat JAR). We recommend building the final JAR file with a plugin such as maven-shade-plugin.

We also recommend using maven-assembly-plugin to bundle the JAR and all other required files into a single distribution.

Here is a sample pom.xml that uses both maven-shade-plugin and maven-assembly-plugin:

pom.xml
<build>
    ...
    <plugins>
        ...
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.2.1</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <!-- Replace the main artifact with the shaded JAR instead of attaching it separately -->
                        <shadedArtifactAttached>false</shadedArtifactAttached>
                        <artifactSet>
                            <excludes>
                                <exclude>com.google.code.findbugs:jsr305</exclude>
                            </excludes>
                        </artifactSet>
                        <filters>
                            <filter>
                                <!-- Do not copy the signatures in the META-INF folder.
                                Otherwise, this might cause SecurityExceptions when using the JAR. -->
                                <artifact>*:*</artifact>
                                <excludes>
                                    <exclude>META-INF/*.SF</exclude>
                                    <exclude>META-INF/*.DSA</exclude>
                                    <exclude>META-INF/*.RSA</exclude>
                                </excludes>
                            </filter>
                        </filters>
                        <transformers>
                            <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <!-- Replace with your connector's entry-point class -->
                                <mainClass>org.sunbird.obsrv.connector.ExampleSourceConnector</mainClass>
                            </transformer>
                            <!-- Append (rather than overwrite) the reference.conf default configs of all dependencies -->
                            <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                <resource>reference.conf</resource>
                            </transformer>
                        </transformers>
                    </configuration>
                </execution>
            </executions>
        </plugin>
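        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <!-- Pin the plugin version; Maven warns when it is omitted -->
            <version>3.6.0</version>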
            <executions>
                <execution>
                    <id>distro-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                    <configuration>
                        <descriptors>
                            <descriptor>src/main/assembly/src.xml</descriptor>
                        </descriptors>
                    </configuration>
                </execution>
            </executions>
        </plugin>
        ...
    </plugins>
</build>

Next, create src/main/assembly/src.xml, the assembly descriptor referenced in the POM, to define the contents of the distribution.
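A minimal sketch of such a descriptor, assuming the distribution should contain the shaded connector JAR plus any extra files kept in a src/main/dist folder. That folder name, the descriptor id and the tar.gz format are assumptions for illustration, not Obsrv requirements; adjust them to match the layout Obsrv expects. Because both plugins are bound to the package phase and the shade plugin is declared first, the shaded JAR already exists when the assembly runs.

src/main/assembly/src.xml
```xml
<assembly xmlns="http://maven.apache.org/ASSEMBLY/2.1.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/ASSEMBLY/2.1.0 https://maven.apache.org/xsd/assembly-2.1.0.xsd">
    <!-- Assumed id; it is appended to the archive name, e.g. <finalName>-distribution.tar.gz -->
    <id>distribution</id>
    <formats>
        <!-- Assumed archive format -->
        <format>tar.gz</format>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
    <fileSets>
        <!-- The shaded JAR; with shadedArtifactAttached=false it replaces the main artifact -->
        <fileSet>
            <directory>${project.build.directory}</directory>
            <outputDirectory>.</outputDirectory>
            <includes>
                <include>${project.build.finalName}.jar</include>
            </includes>
        </fileSet>
        <!-- Hypothetical folder holding any other files the distribution needs -->
        <fileSet>
            <directory>${project.basedir}/src/main/dist</directory>
            <outputDirectory>.</outputDirectory>
        </fileSet>
    </fileSets>
</assembly>
```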

Run mvn clean package to generate the distribution.

The contents of the distribution should be in the following format

Bundling Batch Connectors

Java / Scala

Batch connectors are bundled as a standalone JAR file, and their dependent JARs are placed in a libs folder inside the bundle using maven-assembly-plugin.

Start by adding maven-assembly-plugin to your build.
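A minimal sketch of the plugin entry in pom.xml. It mirrors the assembly execution from the streaming example but leaves out maven-shade-plugin, since the dependencies ship as separate JARs rather than inside a fat JAR. The descriptor path reuses src/main/assembly/src.xml, and the pinned version 3.6.0 is an assumption; any recent 3.x release should behave the same way.

pom.xml
```xml
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <!-- Assumed version; pin whichever release your build uses -->
            <version>3.6.0</version>
            <executions>
                <execution>
                    <id>distro-assembly</id>
                    <!-- Runs after the connector JAR is built in the package phase -->
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                    <configuration>
                        <descriptors>
                            <descriptor>src/main/assembly/src.xml</descriptor>
                        </descriptors>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
```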

Then define the contents of the distribution in src/main/assembly/src.xml.
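One way to write that descriptor is sketched below, assuming the connector JAR sits at the top level of the archive and its dependencies go into libs. The id and the tar.gz format are assumptions. The dependencySet uses runtime scope, which collects compile and runtime dependencies but skips provided ones, so libraries supplied by the cluster (such as Spark, if you mark it provided) stay out of libs.

src/main/assembly/src.xml
```xml
<assembly xmlns="http://maven.apache.org/ASSEMBLY/2.1.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/ASSEMBLY/2.1.0 https://maven.apache.org/xsd/assembly-2.1.0.xsd">
    <!-- Assumed id and format -->
    <id>distribution</id>
    <formats>
        <format>tar.gz</format>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
    <fileSets>
        <!-- The connector's own JAR at the top level of the bundle -->
        <fileSet>
            <directory>${project.build.directory}</directory>
            <outputDirectory>.</outputDirectory>
            <includes>
                <include>${project.build.finalName}.jar</include>
            </includes>
        </fileSet>
    </fileSets>
    <dependencySets>
        <!-- Dependent JARs go into libs/; the project JAR is excluded here
             because the fileSet above already adds it -->
        <dependencySet>
            <outputDirectory>libs</outputDirectory>
            <useProjectArtifact>false</useProjectArtifact>
            <scope>runtime</scope>
        </dependencySet>
    </dependencySets>
</assembly>
```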

Run mvn clean package to generate the distribution.

The contents of the distribution should be in the following format

Python

Python batch connectors are built with PySpark, so the JARs they depend on must also be included in the distribution.

If you use Poetry as the package manager for Python connectors, follow these steps.

Add a build_dist.py script to the scripts folder. Briefly, the script does the following:

  • It exports the Python requirements to a requirements.txt file so that they can be installed in the runtime environment.

  • It uses mvn to download all the dependent JARs that must be included in the package.

Add the packaging instructions to the pyproject.toml file.

Add a pom.xml file to the project root that declares the dependent JARs required by the connector.
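A minimal sketch of such a POM, assuming it exists only to resolve JARs and builds no Java code (hence pom packaging). The project coordinates are hypothetical, and the PostgreSQL JDBC driver is just an example dependency; list whatever JARs your connector actually loads through PySpark. With a POM like this, a command such as mvn dependency:copy-dependencies -DoutputDirectory=libs copies the declared JARs and their transitive dependencies into a libs folder; how build_dist.py invokes mvn is not shown here.

pom.xml
```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <!-- Hypothetical coordinates; this POM only resolves dependent JARs -->
    <groupId>org.example</groupId>
    <artifactId>example-python-connector-deps</artifactId>
    <version>1.0.0</version>
    <packaging>pom</packaging>

    <dependencies>
        <!-- Example only: a JDBC driver the PySpark job might need on its classpath -->
        <dependency>
            <groupId>org.postgresql</groupId>
            <artifactId>postgresql</artifactId>
            <version>42.7.3</version>
        </dependency>
    </dependencies>
</project>
```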

Run poetry run package to build the distribution that can be installed on Obsrv.

The contents of the distribution should be in the following format