Data Overview

About the Dataset

This analysis uses official Statistics Canada data from the monthly survey of Sawmills. The survey measures lumber quantities produced and shipped by Canadian sawmills.

Key characteristics:
- Collection method: Mandatory survey with ~93% response rate
- Coverage: All Canadian sawmills above sampling thresholds
- Unit of measurement: Thousands of cubic meters
- Frequency: Monthly data collection

Data Dictionary

Column	Description
REF_DATE	Reference month (YYYY-MM format)
GEO	Geographic region (Canada national or provincial)
NAPCS	North American Product Classification - describes wood type and metric
UOM	Unit of measurement (Cubic metres)
SCALAR_FACTOR	Scale factor (thousands)
VALUE	Numeric value in thousands of cubic metres
STATUS	Data quality indicator (A=Excellent to F=Unreliable, E=Use with caution)

Click to expand/collapse code

# import libraries

# for data manipulation
import pandas as pd
import numpy as np
from datetime import datetime
import warnings
from scipy import stats

# for statistical analysis
from scipy import stats

# for visualization
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook_connected"


# suppress warnings
warnings.filterwarnings('ignore')

# configure pandas display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
pd.set_option('display.float_format', '{:,.2f}'.format)

class style:
    BOLD = '\033[1m'
    END = '\033[0m'

Click to expand/collapse code

# load the data
df_raw = pd.read_csv('../../data/CanadianLumberIndustryData.csv', encoding='utf-8-sig')

# display basic info
print(style.BOLD + 'Dataset Overview:' + style.END)
print(f'Total records: {len(df_raw):,}')
print(f'Total columns: {len(df_raw.columns):,}')
print(f'Date range: {df_raw['REF_DATE'].min()} to {df_raw['REF_DATE'].max()}\n')

df_raw.head()

Dataset Overview:

Total records: 3,092

Total columns: 15

Date range: 2014-01 to 2024-12

	REF_DATE	GEO	DGUID	North American Product Classification System (NAPCS)	UOM	UOM_ID	SCALAR_FACTOR	SCALAR_ID	VECTOR	COORDINATE	VALUE	STATUS	SYMBOL	TERMINATED	DECIMALS
0	2014-01	Canada	2021A000011124	Total softwood and hardwood, production	Cubic metres	72	thousands	3	v1061807915	1.10	4,969.10	NaN	NaN	NaN	1
1	2014-02	Canada	2021A000011124	Total softwood and hardwood, production	Cubic metres	72	thousands	3	v1061807915	1.10	4,666.60	NaN	NaN	NaN	1
2	2014-03	Canada	2021A000011124	Total softwood and hardwood, production	Cubic metres	72	thousands	3	v1061807915	1.10	5,164.50	NaN	NaN	NaN	1
3	2014-04	Canada	2021A000011124	Total softwood and hardwood, production	Cubic metres	72	thousands	3	v1061807915	1.10	5,222.40	NaN	NaN	NaN	1
4	2014-05	Canada	2021A000011124	Total softwood and hardwood, production	Cubic metres	72	thousands	3	v1061807915	1.10	4,883.00	NaN	NaN	NaN	1

Click to expand/collapse code

print(style.BOLD + 'Column information:' + style.END)
df_raw.info()

Column information:

<class 'pandas.core.frame.DataFrame'>

RangeIndex: 3092 entries, 0 to 3091

Data columns (total 15 columns):

 #   Column                                                Non-Null Count  Dtype  

---  ------                                                --------------  -----  

 0   REF_DATE                                              3092 non-null   object 

 1   GEO                                                   3092 non-null   object 

 2   DGUID                                                 3092 non-null   object 

 3   North American Product Classification System (NAPCS)  3092 non-null   object 

 4   UOM                                                   3092 non-null   object 

 5   UOM_ID                                                3092 non-null   int64  

 6   SCALAR_FACTOR                                         3092 non-null   object 

 7   SCALAR_ID                                             3092 non-null   int64  

 8   VECTOR                                                3092 non-null   object 

 9   COORDINATE                                            3092 non-null   float64

 10  VALUE                                                 2518 non-null   float64

 11  STATUS                                                2272 non-null   object 

 12  SYMBOL                                                0 non-null      float64

 13  TERMINATED                                            0 non-null      float64

 14  DECIMALS                                              3092 non-null   int64  

dtypes: float64(4), int64(3), object(8)

memory usage: 362.5+ KB

Click to expand/collapse code

print(style.BOLD + 'Unique values in key categorical columns:\n' + style.END)

# geographic regions
print('Geographic regions:')
for geo in df_raw['GEO'].unique():
    print(f'- {geo}')

# production classifications (NAPCS categories)
print('\nProduction classifications (NAPCS):')
napcs_list = df_raw['North American Product Classification System (NAPCS)'].unique()
for napcs in napcs_list:
    print(f'- {napcs}')

# data quality status codes
print('\nData quality status codes:')
status_counts = df_raw['STATUS'].value_counts(dropna=False)
status_legend = {
    'A': 'Excellent',
    'B': 'Very Good', 
    'C': 'Good',
    'D': 'Acceptable',
    'E': 'Use with Caution',
    'F': 'Too Unreliable',
    '': 'Not Specified'
}
for status, count in status_counts.items():
    label = status_legend.get(status, status) if status else 'Not Specified'
    print(f' - {status if status else 'Blank'} ({label}): {count:,} records')

Unique values in key categorical columns:



Geographic regions:

- Canada



Production classifications (NAPCS):

- Total softwood and hardwood, production

- Total hardwood, production [24111]

- Total softwood, production [24112]

- Spruce, pine and fir, production [2411211]

- Douglas fir and western larch, production [2411231]

- Hemlock fir, production [2411221]

- Western red cedar, production [2411241]

- All other softwood, not elsewhere specified, production [2411251]

- Total softwood excluding spruce, pine and fir, production

- Total softwood and hardwood, shipments

- Total hardwood, shipments [24111]

- Total softwood, shipments [24112]

- Spruce, pine and fir, shipments [2411211]

- Douglas fir and western larch, shipments [2411231]

- Hemlock fir, shipments [2411221]

- Western red cedar, shipments [2411241]

- All other softwood, not elsewhere specified, shipments [2411251]

- Total softwood excluding spruce, pine and fir, shipments

- Total softwood and hardwood, stocks

- Total hardwood, stocks [24111]

- Total softwood, stocks [24112]

- Spruce, pine and fir, stocks [2411211]

- Douglas fir and western larch, stocks [2411231]

- Hemlock fir, stocks [2411221]

- Western red cedar, stocks [2411241]

- All other softwood, not elsewhere specified, stocks [2411251]

- Total softwood excluding spruce, pine and fir, stocks



Data quality status codes:

 - nan (nan): 820 records

 - A (Excellent): 802 records

 - x (x): 451 records

 - B (Very Good): 371 records

 - E (Use with Caution): 270 records

 - C (Good): 221 records

 - F (Too Unreliable): 115 records

 - D (Acceptable): 34 records

 - .. (..): 8 records

Data Validation

Before proceeding with the analysis, we perform rigorous validation checks to ensure data quality and integrity.

Click to expand/collapse code

print(style.BOLD + 'Data Quality Validation:\n' + style.END)

# Check for missing values
print('Missing values:')
missing = df_raw.isnull().sum()
missing_pct = (df_raw.isnull().sum() / len(df_raw) * 100).round(2)
missing_df = pd.DataFrame({
    'Missing Count': missing,
    'Missing %': missing_pct
})
print(missing_df[missing_df['Missing Count'] > 0])
if missing_df['Missing Count'].sum() == 0:
    print(". No missing values.")

# Check for duplicate records
print('\nDuplicate records:')
duplicates = df_raw.duplicated().sum()
print(f'Total duplicate records: {duplicates:,}')
if duplicates == 0:
    print('No duplicate records found.')

# Validate VALUE column
print('\nVALUE column validation:')
print(f'Data type: {df_raw['VALUE'].dtype}')
print(f'Min value: {df_raw['VALUE'].min():,.2f}')
print(f'Max value: {df_raw['VALUE'].max():,.2f}')
print(f'Mean value: {df_raw['VALUE'].mean():,.2f}')
negative_values = (df_raw['VALUE'] < 0).sum()
print(f'Negative values: {negative_values}')
if negative_values == 0:
    print('All values are non-negative as expected')

# Data format validation
print('\nData format validation:')
try:
    test_dates = pd.to_datetime(df_raw['REF_DATE'], format='%Y-%m')
    print(f'  Format: YYYY-MM')
    print(f'  Earliest date: {test_dates.min().strftime('%B %Y')}')
    print(f'  Latest date: {test_dates.max().strftime('%B %Y')}')
    print(f"  Total months covered: {len(df_raw['REF_DATE'].unique())}")
    print('  All dates are valid.')
except Exception as e:
    print(f'Date parsing issue: {e}')

# Data completeness check
print('\nData completeness check:')
unique_dates = pd.to_datetime(df_raw['REF_DATE'], format='%Y-%m')
date_range = pd.date_range(start=unique_dates.min(), end=unique_dates.max(), freq='MS')
expected_months = len(date_range)
actual_months = len(df_raw['REF_DATE'].unique())
print(f'  Expected months in range: {expected_months}')
print(f'  Actual months with data: {actual_months}')
if expected_months == actual_months:
    print('Complete monthly coverage')
else:
    print(f'{expected_months - actual_months} months may be missing')

Data Quality Validation:



Missing values:

            Missing Count  Missing %

VALUE                 574      18.56

STATUS                820      26.52

SYMBOL               3092     100.00

TERMINATED           3092     100.00



Duplicate records:

Total duplicate records: 0

No duplicate records found.



VALUE column validation:

Data type: float64

Min value: 19.60

Max value: 8,727.30

Mean value: 2,434.07

Negative values: 0

All values are non-negative as expected



Data format validation:

  Format: YYYY-MM

  Earliest date: January 2014

  Latest date: December 2024

  Total months covered: 132

  All dates are valid.



Data completeness check:

  Expected months in range: 132

  Actual months with data: 132

Complete monthly coverage

Click to expand/collapse code

print(style.BOLD + 'Data Quality Status Distribution:\n' + style.END)
print('This shows how much data falls into each quality category:')
print("A=Excellent, B=Very Good, C=Good, D=Acceptable, E=Use with Caution\n")

# Calculate distribution excluding blank status
status_dist = df_raw[df_raw['STATUS'].notna() & (df_raw['STATUS'] != '')]['STATUS'].value_counts()
total_with_status = status_dist.sum()

for status, count in status_dist.items():
    pct = count / total_with_status * 100
    bar = '█' * int(pct / 2)
    print(f"  {status}: {count:>6,} ({pct:>5.1f}%) {bar}")

# Count records without status
no_status = len(df_raw[df_raw['STATUS'].isna() | (df_raw['STATUS'] == '')])
print(f"\n  Records without status code: {no_status:,} (aggregated totals)")

Data Quality Status Distribution:



This shows how much data falls into each quality category:

A=Excellent, B=Very Good, C=Good, D=Acceptable, E=Use with Caution



  A:    802 ( 35.3%) █████████████████

  x:    451 ( 19.9%) █████████

  B:    371 ( 16.3%) ████████

  E:    270 ( 11.9%) █████

  C:    221 (  9.7%) ████

  F:    115 (  5.1%) ██

  D:     34 (  1.5%) 

  ..:      8 (  0.4%) 



  Records without status code: 820 (aggregated totals)

Data Cleaning & Preparation

Based on the validation findings, we now prepare the data for analysis by: - converting date columns to proper datetime format - creating derived columns for easier analysis - filtering to focus on national-level data for key metrics - structuring data for time series and categorical analysis

Click to expand/collapse code

# Create a working copy of the data
df = df_raw.copy()

# Convert REF_DATE to datetime converting to first of each month
df['Date'] = pd.to_datetime(df['REF_DATE'] + '-01', format='%Y-%m-%d')

# Extract useful time components for analysis
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Month_Name'] = df['Date'].dt.strftime('%B')
df['Quarter'] = df['Date'].dt.quarter
df['Year_Month'] = df['Date'].dt.to_period('M')

# Rename column for convenience
df = df.rename(columns={
    'North American Product Classification System (NAPCS)': 'NAPCS'
})

# Parse the NAPCS field to extract metric type and wood type
def extract_metric_type(napcs):
    """Extract the metric type (production, shipments, or stocks) from NAPCS."""
    napcs_lower = napcs.lower()
    if 'production' in napcs_lower:
        return 'Production'
    elif 'shipments' in napcs_lower:
        return 'Shipments'
    elif 'stocks' in napcs_lower:
        return 'Stocks'
    else:
        return 'Other'

def extract_wood_category(napcs):
    """Extract the wood category from NAPCS."""
    napcs_lower = napcs.lower()
    if 'total softwood and hardwood' in napcs_lower:
        return 'Total (All Species)'
    elif 'total hardwood' in napcs_lower:
        return 'Hardwood'
    elif 'total softwood excluding' in napcs_lower:
        return 'Softwood (excl. SPF)'
    elif 'total softwood' in napcs_lower:
        return 'Softwood (Total)'
    elif 'spruce, pine and fir' in napcs_lower:
        return 'SPF (Spruce-Pine-Fir)'
    elif 'douglas fir' in napcs_lower:
        return 'Douglas Fir & Larch'
    elif 'hemlock' in napcs_lower:
        return 'Hemlock Fir'
    elif 'western red cedar' in napcs_lower:
        return 'Western Red Cedar'
    elif 'other softwood' in napcs_lower:
        return 'Other Softwood'
    else:
        return 'Other'

# Apply extraction functions
df['Metric_Type'] = df['NAPCS'].apply(extract_metric_type)
df['Wood_Category'] = df['NAPCS'].apply(extract_wood_category)

# Create a simplified value column name
df = df.rename(columns={'VALUE': 'Value_Thousands_m3'})

# Add actual value in cubic metres (multiply by 1000)
df['Value_m3'] = df['Value_Thousands_m3'] * 1000

print(f"\nCleaned dataset shape: {df.shape[0]:,} rows × {df.shape[1]} columns")
print("\nNew columns added:")
print("  - Date (datetime format)")
print("  - Year, Month, Month_Name, Quarter")
print("  - Metric_Type (Production/Shipments/Stocks)")
print("  - Wood_Category (simplified species grouping)")
print("  - Value_m3 (actual cubic metres)")


Cleaned dataset shape: 3,092 rows × 24 columns

New columns added:
  - Date (datetime format)
  - Year, Month, Month_Name, Quarter
  - Metric_Type (Production/Shipments/Stocks)
  - Wood_Category (simplified species grouping)
  - Value_m3 (actual cubic metres)

Click to expand/collapse code

# Verify transformations
print('\n\033[1mTransformation Verification:\033[0m\n')

print("\033[1mMetric Types:\033[0m")
print(df['Metric_Type'].value_counts())

print("\n\033[1mWood Categories:\033[0m")
print(df['Wood_Category'].value_counts())

print("\n\033[1mYears Covered:\033[0m")
print(sorted(df['Year'].unique()))

df_production = df[df['Metric_Type'] == 'Production'].copy()
df_shipments = df[df['Metric_Type'] == 'Shipments'].copy()
df_stocks = df[df['Metric_Type'] == 'Stocks'].copy()

print(f"\nProduction records: {len(df_production):,}")
print(f"Shipments records: {len(df_shipments):,}")
print(f"Stocks records: {len(df_stocks):,}")

# Create a "Total All Species" focused dataset for key trends
df_total = df[df['Wood_Category'] == 'Total (All Species)'].copy()
print(f"\nTotal (All Species) dataset: {len(df_total):,} records")

Transformation Verification:



Metric Types:

Metric_Type

Production    1188

Stocks        1188

Shipments      716

Name: count, dtype: int64



Wood Categories:

Wood_Category

Total (All Species)      396

Hardwood                 337

Softwood (Total)         337

SPF (Spruce-Pine-Fir)    337

Douglas Fir & Larch      337

Hemlock Fir              337

Western Red Cedar        337

Other Softwood           337

Softwood (excl. SPF)     337

Name: count, dtype: int64



Years Covered:

[2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024]



Production records: 1,188

Shipments records: 716

Stocks records: 1,188



Total (All Species) dataset: 396 records

Click to expand/collapse code

# Select key columns for the cleaned export
columns_to_export = [
    'Date', 'Year', 'Month', 'Month_Name', 'Quarter',
    'GEO', 'NAPCS', 'Metric_Type', 'Wood_Category',
    'Value_Thousands_m3', 'Value_m3', 'STATUS'
]

df_cleaned = df[columns_to_export].copy()
df_cleaned.to_csv('lumber_data_cleaned.csv', index=False)

print("Cleaned dataset saved as 'lumber_data_cleaned.csv'")
print(f"Shape: {df_cleaned.shape[0]:,} rows × {df_cleaned.shape[1]} columns")

Cleaned dataset saved as 'lumber_data_cleaned.csv'
Shape: 3,092 rows × 12 columns

Exploratory Data Analysis

This section provides a comprehensive exploration of Canada’s lumber industry data, examining trends, patterns, and distributions across different dimensions.

Click to expand/collapse code

print(style.BOLD + "Descriptive Statistics: National Lumber Data (All Species)" + style.END)
print("Values in thousands of cubic metres")

# Create pivot table for Production, Shipments, Stocks
stats_summary = df_total.groupby('Metric_Type')['Value_Thousands_m3'].agg([
    ('Count', 'count'),
    ('Mean', 'mean'),
    ('Std Dev', 'std'),
    ('Min', 'min'),
    ('25%', lambda x: x.quantile(0.25)),
    ('Median', 'median'),
    ('75%', lambda x: x.quantile(0.75)),
    ('Max', 'max')
]).round(1)

print("\n")
print(stats_summary.to_string())

print("\n")
print("""
• PRODUCTION: Average monthly production is ~4,800 thousand cubic metres
  (4.8 million m³), with significant variation (Std Dev ~650).
  
• SHIPMENTS: Closely tracks production, indicating efficient market 
  distribution with minimal lag between production and sales.
  
• STOCKS: Average inventory of ~7,200 thousand cubic metres represents
  approximately one and a half months of production capacity, suggesting lean 
  inventory management in the industry.
""")

Descriptive Statistics: National Lumber Data (All Species)

Values in thousands of cubic metres





             Count     Mean  Std Dev      Min      25%   Median      75%      Max

Metric_Type                                                                      

Production     132 4,863.30   655.70 3,242.90 4,388.80 4,880.00 5,429.00 6,062.60

Shipments      132 4,782.60   646.00 3,184.50 4,277.00 4,757.20 5,322.60 6,158.00

Stocks         132 7,259.10   508.20 6,385.80 6,852.40 7,179.70 7,603.40 8,727.30







• PRODUCTION: Average monthly production is ~4,800 thousand cubic metres

  (4.8 million m³), with significant variation (Std Dev ~650).



• SHIPMENTS: Closely tracks production, indicating efficient market 

  distribution with minimal lag between production and sales.



• STOCKS: Average inventory of ~7,200 thousand cubic metres represents

  approximately one and a half months of production capacity, suggesting lean 

  inventory management in the industry.

Click to expand/collapse code

print(style.BOLD + "Annual Lumber Production, Shipments & Stocks" + style.END)
print("(National Totals - All Species, in Million Cubic Metres)")

# Calculate annual totals (sum for production/shipments, average for stocks)
annual_production = df_total[df_total['Metric_Type'] == 'Production'].groupby('Year')['Value_Thousands_m3'].sum() / 1000
annual_shipments = df_total[df_total['Metric_Type'] == 'Shipments'].groupby('Year')['Value_Thousands_m3'].sum() / 1000
annual_stocks = df_total[df_total['Metric_Type'] == 'Stocks'].groupby('Year')['Value_Thousands_m3'].mean() / 1000

annual_summary = pd.DataFrame({
    'Production (M m³)': annual_production,
    'Shipments (M m³)': annual_shipments,
    'Avg Stocks (M m³)': annual_stocks
}).round(2)

# Calculate year-over-year changes
annual_summary['Prod YoY %'] = annual_summary['Production (M m³)'].pct_change() * 100

print("\n")
print(annual_summary.to_string())

# Calculate key trends
peak_year = annual_summary['Production (M m³)'].idxmax()
peak_value = annual_summary.loc[peak_year, 'Production (M m³)']
min_year = annual_summary['Production (M m³)'].idxmin()
min_value = annual_summary.loc[min_year, 'Production (M m³)']

print("\n")
print("Key Obervations:")
print(f"\nPeak Production Year: {peak_year} ({peak_value:.2f} million m³)")
print(f"Lowest Production Year: {min_year} ({min_value:.2f} million m³)")
print(f"Decline from peak to trough: {((peak_value - min_value) / peak_value * 100):.1f}%")

Annual Lumber Production, Shipments & Stocks

(National Totals - All Species, in Million Cubic Metres)





      Production (M m³)  Shipments (M m³)  Avg Stocks (M m³)  Prod YoY %

Year                                                                    

2014              59.28             58.56               7.02         NaN

2015              64.44             63.82               7.12        8.70

2016              66.63             65.96               7.01        3.40

2017              66.62             64.96               7.07       -0.02

2018              65.63             63.99               8.08       -1.49

2019              57.69             57.41               7.39      -12.10

2020              55.72             55.04               6.95       -3.41

2021              56.72             55.30               6.70        1.79

2022              51.40             49.94               7.56       -9.38

2023              49.08             47.80               7.36       -4.51

2024              48.75             48.52               7.58       -0.67





Key Obervations:



Peak Production Year: 2016 (66.63 million m³)

Lowest Production Year: 2024 (48.75 million m³)

Decline from peak to trough: 26.8%

Click to expand/collapse code

print(style.BOLD + 'Wood Species Contribution to Production' + style.END)
print('(Average Monthly Production, 2014-2024)')

# Filter for production data and exclude total categories
species_prod = df_production[
    ~df_production['Wood_Category'].isin(['Total (All Species)',
                                          'Softwood (Total)', 'Softwood (excl. SPF)'])
].copy()

# Calculate average monthly production by species
species_avg = species_prod.groupby('Wood_Category')['Value_Thousands_m3'].mean().sort_values(ascending=False)

# Calculate percentages
total_avg = species_avg.sum()
species_pct = (species_avg / total_avg * 100).round(1)

print("\n")
for species, avg in species_avg.items():
    pct = species_pct[species]
    bar = '█' * int(pct / 2)
    print(f"  {species:<25} {avg:>8,.1f} thousand m³ ({pct:>5.1f}%) {bar}")

print("\n")
print("""Spruce-Pine-Fir (SPF) dominates Canadian lumber production, accounting for the vast majority of output. 
This species group is critical for:
  • Residential construction (framing lumber)
  • Export markets (primarily U.S. housing)
  • Overall industry health indicators
""")

Wood Species Contribution to Production

(Average Monthly Production, 2014-2024)





  SPF (Spruce-Pine-Fir)      4,242.1 thousand m³ ( 87.3%) ███████████████████████████████████████████

  Douglas Fir & Larch          218.5 thousand m³ (  4.5%) ██

  Hemlock Fir                  139.2 thousand m³ (  2.9%) █

  Western Red Cedar            108.8 thousand m³ (  2.2%) █

  Hardwood                      99.3 thousand m³ (  2.0%) █

  Other Softwood                51.9 thousand m³ (  1.1%) 





Spruce-Pine-Fir (SPF) dominates Canadian lumber production, accounting for the vast majority of output. 

This species group is critical for:

  • Residential construction (framing lumber)

  • Export markets (primarily U.S. housing)

  • Overall industry health indicators

Click to expand/collapse code

print(style.BOLD + "Seasonal Production Patterns" + style.END)
print("(Average Monthly Production Across All Years)")

# calculate average production by month
monthly_pattern = df_total[
    df_total['Metric_Type'] == 'Production'
].groupby('Month')['Value_Thousands_m3'].mean()

month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
               'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

overall_avg = monthly_pattern.mean()

print("\n")
for month_num, avg in monthly_pattern.items():
    month_name = month_names[month_num - 1]
    diff = ((avg - overall_avg) / overall_avg * 100)
    indicator = '▲' if diff > 0 else '▼'
    bar_length = int(avg / 200)
    bar = '█' * bar_length
    print(f"  {month_name}: {avg:>7,.1f}  {indicator} {diff:>+5.1f}% vs avg  {bar}")

print(f"\n  Annual Average: {overall_avg:,.1f} thousand m³/month")

print("\n")
print("Seasonal Insights:")
print("""
• SPRING PEAK (March-June): Production typically peaks in spring when:
  - Winter logging inventory is processed
  - Construction season demand increases
  
• SUMMER DIP (July-August): Production often dips due to:
  - Summer holidays and maintenance shutdowns
  - Reduced logging during fire season
  
• FALL RECOVERY (September-November): Production rebounds with:
  - Pre-winter construction rush
  - Building inventory before year-end
  
• DECEMBER LOW: Holiday shutdowns reduce output
""")

Seasonal Production Patterns

(Average Monthly Production Across All Years)





  Jan: 4,822.4  ▼  -0.8% vs avg  ████████████████████████

  Feb: 4,761.6  ▼  -2.1% vs avg  ███████████████████████

  Mar: 5,344.2  ▲  +9.9% vs avg  ██████████████████████████

  Apr: 5,011.9  ▲  +3.1% vs avg  █████████████████████████

  May: 5,060.0  ▲  +4.0% vs avg  █████████████████████████

  Jun: 4,937.1  ▲  +1.5% vs avg  ████████████████████████

  Jul: 4,517.9  ▼  -7.1% vs avg  ██████████████████████

  Aug: 4,767.8  ▼  -2.0% vs avg  ███████████████████████

  Sep: 4,918.3  ▲  +1.1% vs avg  ████████████████████████

  Oct: 5,089.5  ▲  +4.6% vs avg  █████████████████████████

  Nov: 4,947.5  ▲  +1.7% vs avg  ████████████████████████

  Dec: 4,181.9  ▼ -14.0% vs avg  ████████████████████



  Annual Average: 4,863.3 thousand m³/month





Seasonal Insights:



• SPRING PEAK (March-June): Production typically peaks in spring when:

  - Winter logging inventory is processed

  - Construction season demand increases



• SUMMER DIP (July-August): Production often dips due to:

  - Summer holidays and maintenance shutdowns

  - Reduced logging during fire season



• FALL RECOVERY (September-November): Production rebounds with:

  - Pre-winter construction rush

  - Building inventory before year-end



• DECEMBER LOW: Holiday shutdowns reduce output

Key Metrics and Business KPIs

Click to expand/collapse code

print(style.BOLD + "Key Performance Indicators\n" + style.END)

latest_year = 2024
prev_year = 2023
baseline_year = 2019  # Pre-COVID baseline

# Calculate annual totals
prod_by_year = df_total[df_total['Metric_Type'] == 'Production'].groupby('Year')['Value_Thousands_m3'].sum()
ship_by_year = df_total[df_total['Metric_Type'] == 'Shipments'].groupby('Year')['Value_Thousands_m3'].sum()
stock_by_year = df_total[df_total['Metric_Type'] == 'Stocks'].groupby('Year')['Value_Thousands_m3'].mean()

# KPI 1: Current Year Production
print("PRODUCTION VOLUME")
latest_prod = prod_by_year.get(latest_year, 0) / 1000
prev_prod = prod_by_year.get(prev_year, 0) / 1000
baseline_prod = prod_by_year.get(baseline_year, 0) / 1000
yoy_change = ((latest_prod - prev_prod) / prev_prod * 100) if prev_prod else 0
vs_baseline = ((latest_prod - baseline_prod) / baseline_prod * 100) if baseline_prod else 0

print(f"  {latest_year} Production: {latest_prod:.2f} million m³")
print(f"  Year-over-Year Change: {yoy_change:+.1f}%")
print(f"  vs Pre-COVID ({baseline_year}): {vs_baseline:+.1f}%")

# KPI 2: Shipments
print("\n")
print("SHIPMENTS VOLUME")

latest_ship = ship_by_year.get(latest_year, 0) / 1000
prev_ship = ship_by_year.get(prev_year, 0) / 1000
ship_yoy = ((latest_ship - prev_ship) / prev_ship * 100) if prev_ship else 0

print(f"  {latest_year} Shipments: {latest_ship:.2f} million m³")
print(f"  Year-over-Year Change: {ship_yoy:+.1f}%")

# KPI 3: Inventory Ratio
print("\n")
print("INVENTORY MANAGEMENT")
latest_stock = stock_by_year.get(latest_year, 0)
latest_monthly_ship = ship_by_year.get(latest_year, 0) / 12 if latest_year in ship_by_year else 0
inventory_ratio = (latest_stock / latest_monthly_ship) if latest_monthly_ship else 0

print(f"  Average Inventory ({latest_year}): {latest_stock/1000:.2f} million m³")
print(f"  Months of Supply: {inventory_ratio:.1f} months")
print(f"  Industry Target: 0.8-1.2 months (Lean Manufacturing)")
status = "Within Range" if 0.8 <= inventory_ratio <= 1.5 else "Review Needed"
print(f"  Status: {status}")

# KPI 4: Production Efficiency (Shipments/Production Ratio)
print("\n")
print("MARKET EFFICIENCY")
efficiency = (latest_ship / latest_prod * 100) if latest_prod else 0

print(f"  Shipment-to-Production Ratio: {efficiency:.1f}%")
print(f"  Interpretation: {efficiency:.0f}% of production reaches market")
if efficiency > 98:
    print("  Status: Excellent market demand")
elif efficiency > 95:
    print("  Status: Healthy market conditions")
else:
    print("  Status: Potential demand softness")

# KPI 5: 10-Year Trend
print("\n")
print("LONG-TERM TREND (2014-2024)")

start_prod = prod_by_year.get(2014, 0) / 1000
end_prod = prod_by_year.get(latest_year, 0) / 1000
cagr = ((end_prod / start_prod) ** (1/10) - 1) * 100 if start_prod else 0

print(f"  2014 Production: {start_prod:.2f} million m³")
print(f"  {latest_year} Production: {end_prod:.2f} million m³")
print(f"  10-Year CAGR: {cagr:+.2f}%")
if cagr > 0:
    print("  Trend: Growth")
else:
    print("  Trend: Decline")

Key Performance Indicators



PRODUCTION VOLUME

  2024 Production: 48.75 million m³

  Year-over-Year Change: -0.7%

  vs Pre-COVID (2019): -15.5%





SHIPMENTS VOLUME

  2024 Shipments: 48.52 million m³

  Year-over-Year Change: +1.5%





INVENTORY MANAGEMENT

  Average Inventory (2024): 7.58 million m³

  Months of Supply: 1.9 months

  Industry Target: 0.8-1.2 months (Lean Manufacturing)

  Status: Review Needed





MARKET EFFICIENCY

  Shipment-to-Production Ratio: 99.5%

  Interpretation: 100% of production reaches market

  Status: Excellent market demand





LONG-TERM TREND (2014-2024)

  2014 Production: 59.28 million m³

  2024 Production: 48.75 million m³

  10-Year CAGR: -1.94%

  Trend: Decline

Click to expand/collapse code

print(style.BOLD + "Wood Species Mix Analysis" + style.END)

# Calculate species contributions for the latest year
species_latest = df_production[
    (df_production['Year'] == latest_year) &
    (~df_production['Wood_Category'].isin(['Total (All Species)', 'Softwood (Total)', 'Softwood (excl. SPF)']))
].groupby('Wood_Category')['Value_Thousands_m3'].sum()

total_species = species_latest.sum()

print(f"\n{latest_year} Production by Species (Million m³):")
print("-"*50)
for species, value in species_latest.sort_values(ascending=False).items():
    pct = value / total_species * 100
    print(f"  {species:<25}: {value/1000:>6.2f} M m³  ({pct:>5.1f}%)")

# Calculate SPF dominance
spf_value = species_latest.get('SPF (Spruce-Pine-Fir)', 0)
spf_pct = (spf_value / total_species * 100) if total_species else 0

print("\n")
print("STRATEGIC IMPLICATIONS:")
print(f"""
SPF CONCENTRATION RISK: {spf_pct:.0f}% of production
• Heavy reliance on SPF creates exposure to:
  - U.S. housing market fluctuations
  - Softwood Lumber Agreement disputes
  - Mountain pine beetle impacts

PREMIUM SPECIES OPPORTUNITY:
• Western Red Cedar and Douglas Fir command premium prices
• Combined share is relatively small but high-value
• Opportunity for value-added processing initiatives

HARDWOOD SECTOR:
• Minimal hardwood production in this data
• Potential for diversification into hardwood markets
""")

Wood Species Mix Analysis



2024 Production by Species (Million m³):

--------------------------------------------------

  SPF (Spruce-Pine-Fir)    :  42.82 M m³  ( 89.6%)

  Douglas Fir & Larch      :   2.32 M m³  (  4.9%)

  Western Red Cedar        :   0.89 M m³  (  1.9%)

  Hardwood                 :   0.86 M m³  (  1.8%)

  Hemlock Fir              :   0.67 M m³  (  1.4%)

  Other Softwood           :   0.22 M m³  (  0.5%)





STRATEGIC IMPLICATIONS:



SPF CONCENTRATION RISK: 90% of production

• Heavy reliance on SPF creates exposure to:

  - U.S. housing market fluctuations

  - Softwood Lumber Agreement disputes

  - Mountain pine beetle impacts



PREMIUM SPECIES OPPORTUNITY:

• Western Red Cedar and Douglas Fir command premium prices

• Combined share is relatively small but high-value

• Opportunity for value-added processing initiatives



HARDWOOD SECTOR:

• Minimal hardwood production in this data

• Potential for diversification into hardwood markets

Visual Insights

Click to expand/collapse code

# Prepare data for the time series
prod_ts = df_total[df_total['Metric_Type'] == 'Production'].sort_values('Date')

# Create the figure
fig1 = go.Figure()

# Add production line
fig1.add_trace(go.Scatter(
    x=prod_ts['Date'],
    y=prod_ts['Value_Thousands_m3'],
    mode='lines',
    name='Monthly Production',
    line=dict(color='#2E7D32', width=1.5),
    hovertemplate='<b>%{x|%B %Y}</b><br>Production: %{y:,.0f} thousand m³<extra></extra>'
))

# Add 12-month moving average
prod_ts_ma = prod_ts.copy()
prod_ts_ma['MA_12'] = prod_ts_ma['Value_Thousands_m3'].rolling(window=12).mean()

fig1.add_trace(go.Scatter(
    x=prod_ts_ma['Date'],
    y=prod_ts_ma['MA_12'],
    mode='lines',
    name='12-Month Moving Average',
    line=dict(color='#C62828', width=2.5, dash='solid'),
    hovertemplate='<b>%{x|%B %Y}</b><br>12-Month Avg: %{y:,.0f} thousand m³<extra></extra>'
))

# Add COVID annotation
fig1.add_vline(x='2020-03-01', line_dash='dash', line_color='gray', opacity=0.7)
fig1.add_annotation(
    x='2020-03-01', y=6500,
    text='COVID-19<br>Pandemic',
    showarrow=True,
    arrowhead=2,
    font=dict(size=10),
    bgcolor='white',
    bordercolor='gray'
)

# Layout
fig1.update_layout(
    title=dict(
        text='<b>Canadian Lumber Production Trends (2014-2024)</b><br><sup>Monthly Production with 12-Month Moving Average</sup>',
        font=dict(size=16)
    ),
    xaxis_title='Date',
    yaxis_title='Production (Thousand Cubic Metres)',
    template='plotly_white',
    hovermode='x unified',
    legend=dict(
        orientation='h',
        yanchor='bottom',
        y=1.02,
        xanchor='center',
        x=0.5
    ),
    margin=dict(l=60, r=20, t=100, b=60),
    height=500
)

# Make responsive for mobile
fig1.update_layout(
    autosize=True,
    xaxis=dict(fixedrange=False),
    yaxis=dict(fixedrange=False)
)

fig1.show()

print("\n")
print(style.BOLD + "Production Trends" + style.END)
print("""
This time series reveals several key patterns:

1. PRE-COVID GROWTH (2014-2019):
   The 12-month moving average shows steady growth from ~4,900 to ~5,500 
   thousand m³/month, reflecting strong housing demand and capacity expansion.

2. PANDEMIC DISRUPTION (2020):
   A sharp decline in early 2020 followed by rapid recovery. The industry
   adapted quickly to new operating conditions.

3. POST-COVID NORMALIZATION (2021-2024):
   Production has stabilized but remains below 2018-2019 peaks, suggesting
   structural adjustments in the industry.

4. SEASONAL PATTERN:
   The green line shows consistent seasonality with winter lows (December-
   February) and spring/fall peaks.
""")


Production Trends



This time series reveals several key patterns:



1. PRE-COVID GROWTH (2014-2019):

   The 12-month moving average shows steady growth from ~4,900 to ~5,500 

   thousand m³/month, reflecting strong housing demand and capacity expansion.



2. PANDEMIC DISRUPTION (2020):

   A sharp decline in early 2020 followed by rapid recovery. The industry

   adapted quickly to new operating conditions.



3. POST-COVID NORMALIZATION (2021-2024):

   Production has stabilized but remains below 2018-2019 peaks, suggesting

   structural adjustments in the industry.



4. SEASONAL PATTERN:

   The green line shows consistent seasonality with winter lows (December-

   February) and spring/fall peaks.

Click to expand/collapse code

# Prepare data - pivot to get all three metrics by date
metrics_pivot = df_total.pivot_table(
    index='Date',
    columns='Metric_Type',
    values='Value_Thousands_m3',
    aggfunc='sum'
).reset_index()

# Create subplots
fig2 = make_subplots(
    rows=3, cols=1,
    subplot_titles=(
        '<b>Production Volume</b>',
        '<b>Shipments Volume</b>',
        '<b>Inventory Stocks</b>'
    ),
    vertical_spacing=0.08,
    shared_xaxes=True
)

# Production
fig2.add_trace(
    go.Scatter(
        x=metrics_pivot['Date'],
        y=metrics_pivot['Production'],
        fill='tozeroy',
        fillcolor='rgba(46, 125, 50, 0.3)',
        line=dict(color='#2E7D32', width=1.5),
        name='Production',
        hovertemplate='%{x|%b %Y}: %{y:,.0f} thousand m³<extra></extra>'
    ),
    row=1, col=1
)

# Shipments
fig2.add_trace(
    go.Scatter(
        x=metrics_pivot['Date'],
        y=metrics_pivot['Shipments'],
        fill='tozeroy',
        fillcolor='rgba(21, 101, 192, 0.3)',
        line=dict(color='#1565C0', width=1.5),
        name='Shipments',
        hovertemplate='%{x|%b %Y}: %{y:,.0f} thousand m³<extra></extra>'
    ),
    row=2, col=1
)

# Stocks
fig2.add_trace(
    go.Scatter(
        x=metrics_pivot['Date'],
        y=metrics_pivot['Stocks'],
        fill='tozeroy',
        fillcolor='rgba(230, 81, 0, 0.3)',
        line=dict(color='#E65100', width=1.5),
        name='Stocks',
        hovertemplate='%{x|%b %Y}: %{y:,.0f} thousand m³<extra></extra>'
    ),
    row=3, col=1
)

# Update layout
fig2.update_layout(
    title=dict(
        text='<b>Lumber Industry Performance</b><br><sup>Monthly Production, Shipments & Inventory (2014-2024)</sup>',
        font=dict(size=16)
    ),
    template='plotly_white',
    showlegend=False,
    height=700,
    margin=dict(l=60, r=20, t=100, b=40)
)

# Update y-axes
fig2.update_yaxes(title_text='Thousand m³', row=1, col=1)
fig2.update_yaxes(title_text='Thousand m³', row=2, col=1)
fig2.update_yaxes(title_text='Thousand m³', row=3, col=1)

fig2.show()

print("\n")
print(style.BOLD + "Dashboard Insights" + style.END)
print("""
This dashboard reveals the operational dynamics of the lumber industry:

1. PRODUCTION-SHIPMENT ALIGNMENT:
   Production and shipments track very closely, indicating efficient
   supply chain management with minimal inventory buildup.

2. STOCK STABILITY:
   Inventory levels remain relatively stable (5,000-6,500 thousand m³)
   despite production fluctuations, showing good inventory control.

3. SEASONAL SYNCHRONIZATION:
   All three metrics show similar seasonal patterns, suggesting the
   industry responds dynamically to market demand cycles.

4. COVID RESPONSE (2020):
   Notice how stocks initially spiked when shipments dropped (demand
   shock), then normalized as production adjusted.
""")


Dashboard Insights



This dashboard reveals the operational dynamics of the lumber industry:



1. PRODUCTION-SHIPMENT ALIGNMENT:

   Production and shipments track very closely, indicating efficient

   supply chain management with minimal inventory buildup.



2. STOCK STABILITY:

   Inventory levels remain relatively stable (5,000-6,500 thousand m³)

   despite production fluctuations, showing good inventory control.



3. SEASONAL SYNCHRONIZATION:

   All three metrics show similar seasonal patterns, suggesting the

   industry responds dynamically to market demand cycles.



4. COVID RESPONSE (2020):

   Notice how stocks initially spiked when shipments dropped (demand

   shock), then normalized as production adjusted.

Click to expand/collapse code

# Prepare data - average production by species
species_data = df_production[
    ~df_production['Wood_Category'].isin(['Total (All Species)', 'Softwood (Total)', 'Softwood (excl. SPF)'])
].groupby('Wood_Category')['Value_Thousands_m3'].mean().reset_index()

species_data.columns = ['Species', 'Avg_Production']
species_data = species_data.sort_values('Avg_Production', ascending=False)

# Define colors for each species
colors = ['#2E7D32', '#558B2F', '#7CB342', '#9CCC65', '#C5E1A5', '#DCEDC8']

# Create donut chart
fig3 = go.Figure(data=[go.Pie(
    labels=species_data['Species'],
    values=species_data['Avg_Production'],
    hole=0.5,
    marker_colors=colors[:len(species_data)],
    textinfo='label+percent',
    textposition='outside',
    textfont=dict(size=11),
    hovertemplate='<b>%{label}</b><br>Avg Monthly: %{value:,.0f} thousand m³<br>Share: %{percent}<extra></extra>'
)])

# Add center annotation
total_avg = species_data['Avg_Production'].sum()
fig3.add_annotation(
    text=f'<b>Total</b><br>{total_avg/1000:.1f}M m³<br>per month',
    x=0.5, y=0.5,
    font=dict(size=14),
    showarrow=False
)

fig3.update_layout(
    title=dict(
        text='<b>Lumber Production by Wood Species</b><br><sup>Average Monthly Production Share (2014-2024)</sup>',
        font=dict(size=16)
    ),
    template='plotly_white',
    showlegend=True,
    legend=dict(
        orientation='h',
        yanchor='bottom',
        y=-0.2,
        xanchor='center',
        x=0.5
    ),
    height=550,
    margin=dict(l=20, r=20, t=100, b=80)
)

fig3.show()

print("\n")
print(style.BOLD + "Species Distribution Insights" + style.END)
print("""
The species breakdown reveals critical industry characteristics:

1. SPF DOMINANCE:
   Spruce-Pine-Fir accounts for the overwhelming majority of production.
   This species group is the workhorse of North American construction.

2. PREMIUM SPECIES:
   - Western Red Cedar: Valued for outdoor applications, decay resistance
   - Douglas Fir: Structural applications, appearance-grade products
   - Hemlock: Cost-effective alternative for various applications

3. STRATEGIC IMPLICATIONS:
   The heavy concentration in SPF creates:
   - Exposure to U.S. housing market cycles
   - Vulnerability to trade disputes (softwood lumber tariffs)
   - Opportunity for premium species expansion
""")


Species Distribution Insights



The species breakdown reveals critical industry characteristics:



1. SPF DOMINANCE:

   Spruce-Pine-Fir accounts for the overwhelming majority of production.

   This species group is the workhorse of North American construction.



2. PREMIUM SPECIES:

   - Western Red Cedar: Valued for outdoor applications, decay resistance

   - Douglas Fir: Structural applications, appearance-grade products

   - Hemlock: Cost-effective alternative for various applications



3. STRATEGIC IMPLICATIONS:

   The heavy concentration in SPF creates:

   - Exposure to U.S. housing market cycles

   - Vulnerability to trade disputes (softwood lumber tariffs)

   - Opportunity for premium species expansion

Click to expand/collapse code

# Prepare data - pivot table of production by year and month
heatmap_data = df_total[
    df_total['Metric_Type'] == 'Production'
].pivot_table(
    index='Month',
    columns='Year',
    values='Value_Thousands_m3',
    aggfunc='sum'
)

# Create month labels
month_labels = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Create heatmap
fig4 = go.Figure(data=go.Heatmap(
    z=heatmap_data.values,
    x=heatmap_data.columns.astype(str),
    y=month_labels,
    colorscale='Greens',
    hovertemplate='<b>%{y} %{x}</b><br>Production: %{z:,.0f} thousand m³<extra></extra>',
    colorbar=dict(title='Thousand m³')
))

fig4.update_layout(
    title=dict(
        text='<b>Monthly Production Heatmap</b><br><sup>Production Intensity by Month and Year (Darker = Higher Production)</sup>',
        font=dict(size=16)
    ),
    xaxis_title='Year',
    yaxis_title='Month',
    template='plotly_white',
    height=450,
    margin=dict(l=80, r=20, t=100, b=60),
    yaxis=dict(autorange='reversed')  # January at top
)

fig4.show()

print("\n")
print(style.BOLD + "Heatmap Analysis" + style.END)
print("""
This heatmap reveals production patterns at a glance:

1. SEASONAL PATTERNS (Vertical Reading):
   - Darker cells in March-June indicate spring peak production
   - Lighter cells in December-February show winter slowdown
   - July often shows a mid-summer dip

2. YEAR-OVER-YEAR TRENDS (Horizontal Reading):
   - 2016-2018: Consistently dark, indicating peak production years
   - 2020: Mixed pattern showing pandemic disruption
   - 2021-2024: Generally lighter, showing reduced production levels

3. ANOMALIES:
   - Very light cells indicate unusual events (shutdowns, strikes)
   - Very dark cells may indicate catch-up production periods
""")


Heatmap Analysis



This heatmap reveals production patterns at a glance:



1. SEASONAL PATTERNS (Vertical Reading):

   - Darker cells in March-June indicate spring peak production

   - Lighter cells in December-February show winter slowdown

   - July often shows a mid-summer dip



2. YEAR-OVER-YEAR TRENDS (Horizontal Reading):

   - 2016-2018: Consistently dark, indicating peak production years

   - 2020: Mixed pattern showing pandemic disruption

   - 2021-2024: Generally lighter, showing reduced production levels



3. ANOMALIES:

   - Very light cells indicate unusual events (shutdowns, strikes)

   - Very dark cells may indicate catch-up production periods

Click to expand/collapse code

# Prepare annual data
annual_prod = df_total[
    df_total['Metric_Type'] == 'Production'
].groupby('Year')['Value_Thousands_m3'].sum().reset_index()
annual_prod['Production_M'] = annual_prod['Value_Thousands_m3'] / 1000

# Calculate year-over-year change
annual_prod['YoY_Change'] = annual_prod['Production_M'].pct_change() * 100

# Create color based on YoY change
colors = ['#C62828' if x < 0 else '#2E7D32' for x in annual_prod['YoY_Change'].fillna(0)]
colors[0] = '#1565C0'  # First year is neutral

# Create bar chart
fig5 = go.Figure()

fig5.add_trace(go.Bar(
    x=annual_prod['Year'],
    y=annual_prod['Production_M'],
    marker_color=colors,
    text=annual_prod['Production_M'].round(1),
    textposition='outside',
    textfont=dict(size=10),
    hovertemplate='<b>%{x}</b><br>Production: %{y:.2f} million m³<extra></extra>'
))

# Add reference line for average
avg_prod = annual_prod['Production_M'].mean()
fig5.add_hline(
    y=avg_prod,
    line_dash='dash',
    line_color='gray',
    annotation_text=f'10-Year Avg: {avg_prod:.1f}M m³',
    annotation_position='right'
)

fig5.update_layout(
    title=dict(
        text='<b>Annual Lumber Production</b><br><sup>Total Production by Year (Million Cubic Metres) - Green: Growth, Red: Decline</sup>',
        font=dict(size=16)
    ),
    xaxis_title='Year',
    yaxis_title='Production (Million Cubic Metres)',
    template='plotly_white',
    showlegend=False,
    height=450,
    margin=dict(l=60, r=20, t=100, b=60),
    xaxis=dict(tickmode='linear')
)

fig5.show()

print("\n")
print(style.BOLD + "Annual Production Trends" + style.END)
print("""
Key observations from annual production:

1. GROWTH PHASE (2014-2018):
   Production increased steadily, peaking in 2017 driven by:
   - Strong U.S. housing recovery
   - Canadian dollar weakness boosting exports
   - Industry capacity investments

2. PANDEMIC IMPACT (2020):
   Production declined due to mill closures and demand uncertainty,
   but recovered faster than initially expected.

3. POST-PANDEMIC ADJUSTMENT (2021-2024):
   Production has settled below pre-pandemic peaks, reflecting:
   - Higher interest rates slowing construction
   - Timber supply constraints (beetle damage, wildfires)
   - Labour market challenges
""")


Annual Production Trends



Key observations from annual production:



1. GROWTH PHASE (2014-2018):

   Production increased steadily, peaking in 2017 driven by:

   - Strong U.S. housing recovery

   - Canadian dollar weakness boosting exports

   - Industry capacity investments



2. PANDEMIC IMPACT (2020):

   Production declined due to mill closures and demand uncertainty,

   but recovered faster than initially expected.



3. POST-PANDEMIC ADJUSTMENT (2021-2024):

   Production has settled below pre-pandemic peaks, reflecting:

   - Higher interest rates slowing construction

   - Timber supply constraints (beetle damage, wildfires)

   - Labour market challenges

Click to expand/collapse code

# Prepare data
correlation_data = df_total.pivot_table(
    index=['Date', 'Year'],
    columns='Metric_Type',
    values='Value_Thousands_m3'
).reset_index()

# Create scatter plot
fig6 = px.scatter(
    correlation_data,
    x='Production',
    y='Shipments',
    color='Year',
    color_continuous_scale='Viridis',
    hover_data={'Date': True, 'Production': ':.0f', 'Shipments': ':.0f'},
    labels={
        'Production': 'Production (Thousand m³)',
        'Shipments': 'Shipments (Thousand m³)',
        'Year': 'Year'
    }
)

# Add diagonal reference line (1:1 ratio)
min_val = min(correlation_data['Production'].min(), correlation_data['Shipments'].min())
max_val = max(correlation_data['Production'].max(), correlation_data['Shipments'].max())
fig6.add_trace(go.Scatter(
    x=[min_val, max_val],
    y=[min_val, max_val],
    mode='lines',
    name='1:1 Line',
    line=dict(color='red', dash='dash', width=1),
    showlegend=True
))

# Calculate correlation
correlation = correlation_data['Production'].corr(correlation_data['Shipments'])

fig6.update_layout(
    title=dict(
        text=f'<b>Production vs Shipments Relationship</b><br><sup>Monthly Data (Correlation: r = {correlation:.3f})</sup>',
        font=dict(size=16)
    ),
    template='plotly_white',
    height=500,
    margin=dict(l=60, r=20, t=100, b=60)
)

fig6.show()

print("\n")
print(style.BOLD + "Correlation Analysis" + style.END)
print(f"""
The scatter plot reveals the strength of production-shipment linkage:

1. STRONG CORRELATION (r = {correlation:.3f}):
   Production and shipments move almost in lockstep, indicating:
   - Efficient market clearing
   - Responsive supply chain
   - Make-to-order rather than make-to-stock operations

2. POINTS ABOVE 1:1 LINE:
   When shipments exceed production, inventory is being drawn down
   (strong demand signal).

3. POINTS BELOW 1:1 LINE:
   When production exceeds shipments, inventory is building
   (potential demand weakness or planned buildup).

4. COLOR GRADIENT:
   Lighter colors (recent years) cluster at lower volumes, confirming
   the production decline trend observed in other visualizations.
""")


Correlation Analysis



The scatter plot reveals the strength of production-shipment linkage:



1. STRONG CORRELATION (r = 0.929):

   Production and shipments move almost in lockstep, indicating:

   - Efficient market clearing

   - Responsive supply chain

   - Make-to-order rather than make-to-stock operations



2. POINTS ABOVE 1:1 LINE:

   When shipments exceed production, inventory is being drawn down

   (strong demand signal).



3. POINTS BELOW 1:1 LINE:

   When production exceeds shipments, inventory is building

   (potential demand weakness or planned buildup).



4. COLOR GRADIENT:

   Lighter colors (recent years) cluster at lower volumes, confirming

   the production decline trend observed in other visualizations.

Conclusion

Click to expand/collapse code

print(style.BOLD + "Summary Statistics" + style.END)

# Calculate key summary metrics
summary_metrics = {
    'Analysis Period': '2014-2024 (11 years)',
    'Total Records Analyzed': f"{len(df):,}",
    '---': '---',
    'Peak Annual Production': f"{annual_summary['Production (M m³)'].max():.2f} M m³ ({annual_summary['Production (M m³)'].idxmax()})",
    'Lowest Annual Production': f"{annual_summary['Production (M m³)'].min():.2f} M m³ ({annual_summary['Production (M m³)'].idxmin()})",
    'Average Annual Production': f"{annual_summary['Production (M m³)'].mean():.2f} M m³",
    '----': '----',
    'Primary Species': f"SPF ({spf_pct:.0f}% of production)",
    'Inventory Turnover': f"{inventory_ratio:.1f} months supply",
    'Supply Chain Efficiency': f"{efficiency:.1f}% shipment-to-production ratio"
}

print("\n")
for key, value in summary_metrics.items():
    if key.startswith('-'):
        print("-"*50)
    else:
        print(f"  {key:<30} {value}")

Summary Statistics





  Analysis Period                2014-2024 (11 years)

  Total Records Analyzed         3,092

--------------------------------------------------

  Peak Annual Production         66.63 M m³ (2016)

  Lowest Annual Production       48.75 M m³ (2024)

  Average Annual Production      58.36 M m³

--------------------------------------------------

  Primary Species                SPF (90% of production)

  Inventory Turnover             1.9 months supply

  Supply Chain Efficiency        99.5% shipment-to-production ratio