
Data Lake & Hive Tables v3.0.0

April 5th, 2021 | Version wv_3.0.0


The transactional and operational information generated on DL Freight™ is vital for Walmart Canada to create projections and data reports. Walmart Canada has a comprehensive data lake, an extensive reservoir of enterprise data stored across a cluster of commodity servers that run software such as the open-source Hadoop platform for distributed data analytics. This implementation transfers operational data from DLT systems, i.e., DL Freight, to Walmart Canada’s data lake. 

Requirements

  1. Analysis of the data lake for compatibility
  2. Data modeling for the conceptual flow
  3. Transfer of data without any transformation or alteration
  4. Final data from DL Freight to be saved in appropriate Hive tables (an illustrative table definition follows below)
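Requirement 4 calls for the final DL Freight data to land in Hive tables. Purely as an illustration under assumed names (the real database, columns, and partitioning scheme are not specified here), such a table could be declared through Spark SQL as in the following sketch:

    # Illustrative only: a possible Hive table for DL Freight load data.
    # Database, column names, and partitioning are assumptions, not the real schema.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    spark.sql("""
        CREATE TABLE IF NOT EXISTS raw_zone.dl_freight_loads (
            load_id    STRING,
            carrier_id STRING,
            created_ts TIMESTAMP,
            payload    STRING      -- raw record, stored without alteration
        )
        PARTITIONED BY (load_date DATE)
        STORED AS PARQUET
    """)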


Implementations in this release

  • Apache Cassandra, an open-source NoSQL distributed database offering a highly available service with no single point of failure, was selected as the source system from DLT Labs; it holds the data from the DL Freight application. The target system was Walmart Canada’s Data Lake, with access and permissions managed by Walmart’s Enterprise Architecture team.
  • Data from the source was transferred through a data pipeline with low-latency, loss-tolerant connections between applications over the internet.
  • A service was implemented to transfer data from source to target using scheduled jobs (ETL jobs); a minimal sketch of such a job follows this list.
  • The source data was modeled in a standard format and transferred to Walmart Canada’s data zone as Hive tables in Hadoop.
  • Once data is populated in the Hive tables in the raw zone, another job segregates the data into the categories below:
    • Load information
    • Stop information
    • Accessorial information
    • Fourkites data
    • Ebill information, in both directions: Carrier to Walmart Canada and Walmart Canada to Carrier
  • After segregation, the data is updated in the respective tables in Hadoop's staging zone.
  • The next job in the data pipeline then transfers the data from the staging zone to the target zone; a sketch of the segregation and promotion steps also follows this list.
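
The ETL jobs above are described only at a high level; the following is a minimal sketch of one such scheduled job, assuming PySpark with the spark-cassandra-connector available on the classpath. The keyspace, table, host, and target database names are hypothetical placeholders, not the production configuration.

    # Sketch of a scheduled ETL job: copy DL Freight records from the Cassandra
    # source into a raw-zone Hive table without transformation or alteration.
    # Requires the spark-cassandra-connector package; all names are placeholders.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("dlfreight-raw-zone-load")                     # hypothetical job name
        .config("spark.cassandra.connection.host", "cassandra.example.internal")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Read the operational data from the Cassandra source system (DL Freight).
    loads = (
        spark.read
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace="dl_freight", table="loads")          # placeholder names
        .load()
    )

    # Land the data as-is in the raw zone as a Hive table.
    loads.write.mode("append").saveAsTable("raw_zone.dl_freight_loads")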
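In the same spirit, here is a hedged sketch of the segregation step (raw zone to staging zone) and the follow-on promotion job (staging zone to target zone). The zone databases, table names, and the record_type column used to split the categories are assumptions for illustration, not the actual schema.

    # Sketch of the segregation and promotion jobs: split raw-zone data into
    # per-category staging tables, then promote staging data to the target zone.
    # All database, table, and column names are illustrative.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # Segregation: one INSERT per category (load, stop, accessorial, Fourkites, Ebill).
    for category, staging_table in [
        ("LOAD", "staging_zone.load_info"),
        ("STOP", "staging_zone.stop_info"),
        ("ACCESSORIAL", "staging_zone.accessorial_info"),
        ("FOURKITES", "staging_zone.fourkites_data"),
        ("EBILL", "staging_zone.ebill_info"),
    ]:
        spark.sql(f"""
            INSERT INTO {staging_table}
            SELECT * FROM raw_zone.dl_freight_loads
            WHERE record_type = '{category}'
        """)

    # Promotion: the next pipeline job moves staging data into the target zone.
    spark.sql("""
        INSERT INTO target_zone.load_info
        SELECT * FROM staging_zone.load_info
    """)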





For any questions, please email us at dlfreight.support@dltlabs.io

