DatabeanStalk documentation

Welcome to the official DatabeanStalk documentation. DatabeanStalk is a fully collaborative notebook environment where you can run Python, PySpark, Scala, and R in your browser.

The DatabeanStalk data science and machine learning platform supports fully managed Apache Spark, which enables data teams to work together and solve the world's toughest data problems.

DatabeanStalk runs managed Spark on Kubernetes: you initialize a PySpark, Spark, or Spark SQL context, and DatabeanStalk manages the master and worker executors for you.
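
As a rough illustration of initializing a session from a notebook, the sketch below uses the standard PySpark `SparkSession` API. It assumes `pyspark` is importable in the notebook kernel; the application name `databeanstalk-demo`, the view name `events`, and the sample rows are hypothetical and not taken from DatabeanStalk itself.

```python
def run_demo():
    """Create (or reuse) a Spark session and run a small Spark SQL query."""
    # Imported inside the function so this file still loads where Spark is absent.
    from pyspark.sql import SparkSession

    # "databeanstalk-demo" is a hypothetical application name.
    spark = SparkSession.builder.appName("databeanstalk-demo").getOrCreate()

    # Build a tiny DataFrame from in-memory rows (illustrative data only).
    df = spark.createDataFrame([("login", 1), ("alert", 2)], ["event", "count"])

    # Register the DataFrame as a temporary view so Spark SQL can query it.
    df.createOrReplaceTempView("events")
    rows = spark.sql("SELECT event, count FROM events WHERE count > 1").collect()

    # Convert Row objects to plain tuples for display.
    return [(row["event"], row["count"]) for row in rows]
```

Calling `run_demo()` in a notebook cell returns the rows that match the filter. `getOrCreate()` reuses an existing session if one is already active, so the sketch does not stop the session when it finishes.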

DatabeanStalk supports data scientists, data engineers, and data analysts working on the following use cases.

  • Run advanced data analytics and solve complex machine learning problems at scale.

  • Make data warehousing easier using Python DataFrames and Spark SQL.

  • Detect and analyze threats in real time using Spark Streaming, data science libraries, and the Apache Airflow integration.

  • Analyze time-series data in real time or update dashboard charts.

  • Run scheduled extract, transform, and load (ETL) jobs using the Apache Airflow integration.
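
The scheduled extract, transform, and load use case above can be sketched in plain Python. This is a minimal illustration using only the standard library (`csv`, `io`, `sqlite3`), not DatabeanStalk's or Airflow's API; the sample data, the `readings` table, and all function names are hypothetical. In practice, a scheduler such as Airflow would call a function like `run_etl` on a timetable.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; one row has a malformed reading on purpose.
RAW_CSV = """sensor,reading
a,1.5
b,not-a-number
a,2.5
"""


def extract(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert readings to floats and drop rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            clean.append((row["sensor"], float(row["reading"])))
        except ValueError:
            continue  # skip malformed readings
    return clean


def load(conn, records):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (sensor TEXT, reading REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", records)
    conn.commit()


def run_etl(text, conn):
    """Run all three steps, then return the total reading per sensor."""
    load(conn, transform(extract(text)))
    query = "SELECT sensor, SUM(reading) FROM readings GROUP BY sensor ORDER BY sensor"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_etl(RAW_CSV, connection))
```

Keeping each step as a separate function means each one can be tested alone or mapped to its own scheduled task.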

DatabeanStalk offers managed Apache Airflow, so users can build, package, and deploy machine learning jobs and integrate any third-party library with Airflow.

Using the Grafana dashboard, users can monitor server resources and analyze application logs.

The following screenshot shows the DatabeanStalk landing page, which provides access to JupyterHub, Apache Airflow, and Grafana. Using these technologies, many users can solve the world's complex data problems.
