Common Problems
This page lists some common pitfalls you may encounter while working through this guide, together with their solutions.
download-departures fails because of a timeout
If you encounter an error that looks like this:
                        ┌─────┐
                        │start│
                        └─┬─┬─┘
                          │ │
                          │ └────────────────────┐
                          │                      │
                          v                      v
┌──────────────────────────────────────┐ ┌──────────────────────────────────────┐
│download-departures FAILED PT5.183334S│ │download-airports SUCCEEDED PT1.91309S│
└──────────────────────────────────────┘ └──────────────────────────────────────┘
[main]
Exception in thread "main" io.smartdatalake.util.dag.TaskFailedException: Task download-departures failed. Root cause is 'WebserviceException: Read timed out'
Both webservices used in this guide are publicly available on the internet, so they may be overloaded by traffic. If the download fails because of a timeout, either increase readTimeoutMs in the configuration of the data object ext-departures, or wait a couple of minutes and try again. If the download still fails (or only produces empty files), copy the contents of the folder data-fallback-download into your data folder.
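The fallback copy can be scripted as below. This is a minimal sketch that assumes you run it from the root of the getting-started project, with data-fallback-download and data side by side (an assumption about the folder layout); use_fallback_download is a hypothetical helper name, not part of SDLB.

```shell
# Sketch: copy the pre-downloaded fallback files into the data folder.
# Assumes it is run from the project root, where data-fallback-download and data live.
use_fallback_download() {
  local src="${1:-data-fallback-download}" dst="${2:-data}"
  if [ ! -d "$src" ]; then
    echo "fallback folder '${src}' not found" >&2
    return 1
  fi
  mkdir -p "$dst"
  # "$src/." copies the folder's contents (including hidden files), not the folder itself.
  cp -R "$src/." "$dst/"
}
```

Calling use_fallback_download without arguments uses the two folder names mentioned above; pass other paths if your layout differs.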
Configuration objects defined in multiple locations
When executing SDLB, you might get the following exception:
Exception in thread "main" io.smartdatalake.config.ConfigurationException:
Configuration parsing failed because of configuration objects defined in multiple locations:
Action~download-departures=HadoopConfigFile;
HadoopConfigFile DataObject~ext-departures=HadoopConfigFile;
HadoopConfigFile DataObject~stg-departures=HadoopConfigFile;
Note that in this getting-started guide, SDLB is started with the option --config /mnt/config, which points to the whole directory. SDLB therefore reads every .conf file in this directory and attempts to parse it. If you define an action or data object in two different files, you get this error because SDLB cannot determine which file takes precedence.
To solve the problem, either remove the .conf file extension from one of the files or move one of the files out of the directory.
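To find out which files define a duplicated object, a rough search like the following can help. It is a heuristic sketch that assumes each object is defined as "<id> {" at the start of a line, as in this guide's config files; find_definitions is a hypothetical helper name, and the host-side path ./config used below is an assumption (inside the container the directory is /mnt/config).

```shell
# Heuristic sketch: list the .conf files in a config directory that appear to
# define the object with the given id. Assumes definitions look like
# "<id> {" (or "<id> =" / "<id>:") at the start of a line, as in this guide.
# Ids containing regex metacharacters such as "." are not escaped.
find_definitions() {
  local config_dir="$1" object_id="$2"
  grep -l -E "^[[:space:]]*${object_id}[[:space:]]*[{=:]" "$config_dir"/*.conf 2>/dev/null | sort
}
```

For example, find_definitions ./config download-departures prints every .conf file that defines download-departures. Lines that merely reference the id (such as inputId = ext-departures) are not matched, and files without the .conf extension are ignored, just as SDLB ignores them.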
How to kill SDLB if it hangs
If your pipeline hangs and you want to terminate the process, use this docker command to list the running containers:
docker ps
While your feed is executing, the output of this command contains a container with the image name sdl-spark:latest. Use its container ID to stop the container:
docker container stop <container id>
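If you prefer not to copy the ID by hand, the two steps can be combined. This sketch assumes the container was started from the image sdl-spark:latest as described above; stop_sdl_containers is a hypothetical helper name, while docker ps --quiet --filter "ancestor=..." and docker stop are standard Docker CLI options.

```shell
# Sketch: stop every running container started from the SDLB image.
# The image name defaults to sdl-spark:latest, as used in this guide.
stop_sdl_containers() {
  local image="${1:-sdl-spark:latest}"
  local ids
  # --quiet prints only container IDs; the ancestor filter matches containers created from this image.
  ids=$(docker ps --quiet --filter "ancestor=${image}")
  if [ -z "$ids" ]; then
    echo "no running containers for image ${image}" >&2
    return 0
  fi
  # $ids is left unquoted on purpose so each ID becomes a separate argument.
  docker stop $ids
}
```

Note that this stops all containers from that image, so only use it if you are not running other SDLB containers you want to keep.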
Certificate of ourairports.com expired
If you get this error:
Caused by: sun.security.validator.ValidatorException: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
Caused by: java.security.cert.CertificateExpiredException: NotAfter: Wed Oct 09 20:03:30 UTC 2024
this means that the certificate of the website behind the URL https://ourairports.com/data/airports.csv has expired. This site is used by the data object ext-airports. In that case, switch ext-airports to the alternative URL that is commented out in its configuration:
https://davidmegginson.github.io/ourairports-data/airports.csv
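To confirm that an expired certificate is really the cause before changing the configuration, you can inspect the certificate the site presents. This is a sketch assuming the openssl command-line tool is installed on your host; cert_not_after and pem_is_expired are hypothetical helper names.

```shell
# Print the expiry date ("notAfter=...") of the certificate served by host on port 443.
cert_not_after() {
  local host="$1"
  # -servername sends SNI so the server returns the certificate for this host name;
  # </dev/null makes s_client exit right after the handshake.
  openssl s_client -connect "${host}:443" -servername "${host}" </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate
}

# Check a PEM certificate file.
# Returns 0 if it has expired (or cannot be parsed), 1 if it is still valid,
# 2 if the file cannot be read.
pem_is_expired() {
  local pem="$1"
  if [ ! -r "$pem" ]; then
    echo "cannot read ${pem}" >&2
    return 2
  fi
  # -checkend 0 exits 0 only if the certificate is still valid right now.
  if openssl x509 -checkend 0 -noout -in "$pem" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}
```

For example, cert_not_after ourairports.com prints a notAfter=... line; a date in the past confirms the problem. To run pem_is_expired on the live certificate, save it first, for instance with openssl s_client -connect ourairports.com:443 -servername ourairports.com </dev/null 2>/dev/null | openssl x509 > ourairports.pem.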