- 2-3 years of experience with Big Data & Analytics solutions such as Hadoop, MapReduce, Hive, Spark, Storm, Amazon Kinesis, AWS EMR, AWS Redshift, and AWS Lambda
- Strong design and programming skills
- Fluent with data structures and algorithms
- Strong understanding of OOP concepts
- Hands-on experience with any of the following is valuable, though not mandatory; you must be willing to learn these technologies on an as-needed basis:
- Modern programming languages like Python and C++.
- Spring Framework, Redis, RDBMS, NoSQL a plus.
- Hadoop, Hive, Oozie, MapReduce, Spark, Sqoop, Kafka, Flume, etc.
- DevOps, with experience building and deploying to Cloud Foundry, Docker, and Kubernetes
- Infrastructure automation technologies like Docker, Vagrant, etc.
- Build automation technologies like Gradle, Jenkins, etc.
- Building APIs and services using REST, SOAP, etc.
- Scripting languages like Perl, Shell, Groovy, etc.
Job Duties & Required Skills
- You will be responsible for delivering high-value, next-generation products on aggressive deadlines, and will be required to write high-quality, highly optimized, high-performance, maintainable code
- Work on distributed/big-data systems to build, release, and maintain an always-on, scalable data processing and reporting platform
- Work on relational and NoSQL databases
- Build scalable architectures for data storage, transformation and analysis
- Experience building and supporting large-scale Hadoop environments, including design, capacity planning, cluster setup, performance tuning, and monitoring
- Strong understanding of Hadoop ecosystem components such as HDFS, MapReduce, Hadoop Streaming, Sqoop, Oozie, and Hive
- Managing available resources such as hardware, data, and personnel so that deadlines are met
- Analysing the ML algorithms that could be used to solve a given problem and ranking them by their success probability
- Exploring and visualizing data to gain an understanding of it, then identifying differences in data distribution that could affect performance when deploying the model in the real world
- Verifying data quality, and/or ensuring it via data cleaning
- Supervising the data acquisition process if more data is needed
- Finding available datasets online that could be used for training
- Defining validation strategies
- Defining the pre-processing or feature engineering to be done on a given dataset
- Defining data augmentation pipelines
- Training models and tuning their hyperparameters
- Analysing the errors of the model and designing strategies to overcome them
- Deploying models to production
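The workflow in the bullets above (a validation strategy, hyperparameter tuning, and evaluating the result) can be sketched in a few lines. This is a minimal, self-contained illustration using a hypothetical one-parameter threshold classifier on synthetic data; the dataset, grid, and classifier are all assumptions for the sake of the sketch, not a prescribed implementation:

```python
import random

random.seed(0)

# Synthetic dataset: x in [0, 1), labelled 1 when x > 0.6,
# with ~10% label noise (a stand-in for real data).
data = [(x, int(x > 0.6) if random.random() > 0.1 else int(x <= 0.6))
        for x in (random.random() for _ in range(200))]

# Validation strategy: a simple hold-out split.
split = int(0.8 * len(data))
train, valid = data[:split], data[split:]

def accuracy(threshold, rows):
    """Accuracy of a one-parameter threshold classifier."""
    return sum((x > threshold) == bool(y) for x, y in rows) / len(rows)

# Hyperparameter tuning: grid search over the threshold,
# scored on the training split only.
grid = [i / 20 for i in range(1, 20)]
best = max(grid, key=lambda t: accuracy(t, train))

# Report performance on held-out data.
print(f"best threshold={best:.2f}, valid accuracy={accuracy(best, valid):.2f}")
```

In practice the classifier, search space, and validation scheme (e.g. k-fold cross-validation) would come from the problem at hand; the point is only the separation between the data used to tune and the data used to validate.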
- Proficient in shell, Perl, and Python scripting
- Experience in the setup, configuration, and security management of Hadoop clusters