Python / PySpark Developer

LOCATION: HYBRID: Work-From-Home / 1031 Bank St. Ottawa, ON

TERM: Full-Time, Permanent


Overview

Under the direction of the Data Solutions Lead, the Python/PySpark Developer plays a critical role in designing, developing, and optimizing the company's data processing pipelines and systems, ensuring the efficiency, reliability, and scalability of its data infrastructure.

The Python/PySpark Developer contributes to the development and execution of data migration and integration processes between systems and creates comprehensive documentation to support data pipeline designs and development processes.

Leveraging their expertise in Python programming, particularly with the PySpark API, and their hands-on experience with Spark and cloud-based data platforms, the Python/PySpark Developer collaborates with cross-functional teams to translate business requirements into efficient data processing solutions.

The Python/PySpark Developer conducts performance tuning and optimization of Spark jobs, troubleshoots and resolves data processing and pipeline-related issues, and implements ETL/ELT processes using PySpark for seamless data integration. Additionally, proficiency in Azure data technologies, such as Azure Data Factory, Azure Synapse Analytics, and Azure Blob Storage, is vital in deploying innovative data solutions.
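
To make the above concrete, here is a minimal, illustrative PySpark ETL sketch of the kind of pipeline work this role involves. The storage account, container paths, and column names are hypothetical examples, not details from this posting:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hypothetical ETL job: ingest raw JSON events, clean them, write Parquet.
    spark = SparkSession.builder.appName("example-etl").getOrCreate()

    raw = spark.read.json("abfss://landing@examplestore.dfs.core.windows.net/events/")

    cleaned = (
        raw
        .filter(F.col("event_id").isNotNull())            # drop malformed records
        .dropDuplicates(["event_id"])                     # deduplicate on the key
        .withColumn("event_date", F.to_date("event_ts"))  # derive a partition column
    )

    (cleaned.write
        .mode("overwrite")
        .partitionBy("event_date")
        .parquet("abfss://curated@examplestore.dfs.core.windows.net/events/"))

Partitioning the output by a date column is a common design choice here, since it keeps downstream reads selective and incremental loads cheap.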

The Python/PySpark Developer possesses a solid understanding of data modeling within distributed computing environments and experience with data warehousing concepts. Moreover, familiarity with relational database concepts and basic T-SQL querying is beneficial for integrating with existing systems. Their strong problem-solving skills, attention to detail, and adherence to data governance and security principles ensure the quality, privacy, and compliance of the company's data.

Responsibilities

Data Engineering and Optimization:

  • Design, develop, and maintain complex PySpark applications and data pipelines for data processing, transformation, and analysis.
  • Conduct performance tuning and optimization of PySpark jobs to enhance processing speed and minimize resource consumption (see the sketch following this list).
  • Troubleshoot and resolve data processing and pipeline-related issues, ensuring data quality and consistency.
  • Implement ETL/ELT processes using PySpark for seamless data integration and transformation from various sources.
  • Demonstrate proficiency in data modeling and schema design within a distributed computing environment.
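
As referenced in the tuning bullet above, the following is a small sketch of one common Spark optimization: broadcasting a small dimension table so the large fact table is never shuffled. The paths, join key, and partition count are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("tuning-example").getOrCreate()

    facts = spark.read.parquet("/data/facts")   # large fact table
    dims = spark.read.parquet("/data/dims")     # small dimension table

    # Broadcasting the small side of the join avoids shuffling the large side.
    joined = facts.join(F.broadcast(dims), on="dim_id", how="left")

    # Controlling partitioning before the write keeps output files sized sensibly.
    joined.repartition(64, "event_date").write.mode("overwrite").parquet("/data/out")

Whether broadcasting helps depends on the dimension table fitting comfortably in executor memory; comparing job runs in the Spark UI is the usual check.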

Collaboration and Data Solutions:

  • Collaborate with data scientists, analysts, and other stakeholders to understand and translate business requirements into efficient data processing solutions.
  • Utilize expertise in Azure data technologies, including Microsoft Fabric, Azure SQL Server, Azure Data Factory, and Azure Blob Storage, to deploy robust data solutions.
  • Ensure compliance with data governance and data security principles to protect data privacy and maintain regulatory compliance.
  • Assist in the development and execution of data migration and integration processes between systems, including those involving relational databases.
  • Create and maintain comprehensive documentation for data pipelines, data models, and development processes.

Requirements

The requirements listed below are representative of the knowledge, skill and/or ability required. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.

Education and Experience:

  • Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience).
  • A minimum of 5 years of proven work experience as a Python/PySpark Developer or in a similar role, with hands-on experience with Spark, Databricks, and cloud-based data platforms.
  • Minimum 3 years of experience with Azure Data Factory and Azure Databricks.
  • Minimum 1 year of experience working with relational databases and SQL, preferably T-SQL.
  • Familiarity with data warehousing concepts and methodologies is an asset.
  • Familiarity with Microsoft Fabric is an asset.

Technical Skills:

  • Strong proficiency in Python programming and extensive experience with the PySpark API.
  • Deep understanding of distributed computing concepts and Spark architecture.
  • Demonstrated experience in performance tuning and optimization of Spark jobs.
  • Experience in data warehousing, data modeling, and ETL/ELT processes.
  • Experience with data formats such as Parquet and JSON.
  • Understanding of relational database concepts and basic T-SQL querying.
  • Experience in quality control/auditing to ensure accurate and appropriate use of data for integration testing and user acceptance testing.

Soft Skills:

  • Solid problem-solving skills with a keen attention to detail.
  • Good interpersonal and communication skills to collaborate effectively with cross-functional teams.
  • Ability to work independently and handle multiple tasks with a sense of urgency.

Other:

  • Must be legally entitled to work in Canada.

Compensation

  • Salary: $75,000 - $85,000
  • 5% Annual Performance Bonus
  • Health & Dental Benefits
  • Pension Plan
  • 3 Weeks Vacation
  • CAA Membership

Employment is contingent on a successful Criminal Background Check and references.

Job applicants who have disabilities shall be provided with reasonable accommodation throughout the recruiting process.
