pvmehta.com


Add new columns in dataframe

Posted on 30-Sep-2023 (updated 01-Oct-2023) by Admin

from pyspark.sql.functions import col, lit

# File location and type
file_location = "/FileStore/tables/sales_data_part1.csv"
file_type = "csv"

# CSV options (with inferSchema set to "false", every column is read as a string)
infer_schema = "false"
first_row_is_header = "true"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
    .option("inferSchema", infer_schema) \
    .option("header", first_row_is_header) \
    .option("sep", delimiter) \
    .load(file_location)

display(df)



# Adding a new column with a default value.
# Remember to import the lit function from pyspark.sql.functions.
# The following code adds a new column named Continent
# with a default value of "North America".
df2 = df.withColumn("Continent", lit("North America"))
df2.display()



# Adding a new column based on existing column values.
# The following code adds a new column TotalPrice
# by multiplying Quantity and UnitPrice.
df3 = df.withColumn("TotalPrice", col("Quantity") * col("UnitPrice"))
df3.display()



# Adding multiple columns.
# The following code adds 2 columns:
# TotalPrice = Quantity * UnitPrice, and
# Region with a default value of "India".
df4 = df.withColumn("TotalPrice", col("Quantity") * col("UnitPrice")) \
    .withColumn("Region", lit("India"))
df4.display()



# Adding a column using select.
# The following code creates a new DataFrame with a single column named Region,
# set to "India" in every row. The original columns are not carried over.
df5 = df.select(lit("India").alias("Region"))
df5.display()



# To keep all existing columns and add the new one, list them in the select.
df6 = df.select(
    col("InvoiceNo"), col("StockCode"), col("Description"), col("Quantity"),
    col("InvoiceDate"), col("UnitPrice"), col("CustomerID"), col("Country"),
    lit("India").alias("Region"))
df6.display()

Python/PySpark

Copyright © 2026 pvmehta.com.
