Quest for Data Science: FII and DII turnover with effect on Nifty Downloader

My thirst for statistics has been increasing. IV had another requirement, which would eventually be useful to me as well. He currently downloads FII and DII buy and sell values and its impact on Nifty manually in Excel. He suggested me to try and automate this process in R. Wow! Some more learning of R, which would eventually help me in building my strategies!

Unlike in NSE EOD Bhavcopy and BSE EOD Bhavcopy downloaders, I tried a different approach to download FII + DII stats from NSEIndia website (as if I had a choice), along with the index Nifty values and change over previous day. The National Stock Exchange of India has different structures for different pages. To download index values, you have to refer to the link http://www.nseindia.com/content/indices/ind_histvalues.htm whereas for FII and DII stats, you need to visit http://www.nseindia.com/content/equities/eq_fiidii_archives.htm.

Both of these pages are HTML forms, you have to enter parameters like index name, start and end dates. Another challenge is that each of these pages displays a maximum of 100 rows in table and if you need all rows (more than 100), you have to download a csv (dynamically generated in temp location). I faced problem in accessing the csv, as for older dates, the link generated was not valid. Hence, I decided to read the page itself, parse the table to be consumed as data frame. For instances, where the parameters generate more than 100 rows, I decided to use while loop.

OK, now here is the solution for it.

I am using the packages RHTMLForms, RCurl, and XML. While RCurl and XML packages are available at R repositories, RHTMLForms package is maintained by omegahat, and can be installed by using the following command

install.packages('RHTMLForms', repos = "http://www.omegahat.org/R")

Another peculiar problem faced is that NSE India website has enforced check for useragent, which if not specified explicitly in R, would not allow access to the desired data.

This can be achieved by defining RCurl settings before loading the library. The command is

options(RCurlOptions = list(useragent = "R"))
library(RCurl)

NOTE: 9-Aug-2011, I have modified the code a bit to enable this script with optional input parameters, like Start Date as 16-April-2007, from which both FII and DII stats are available, along with End Date as current date. Another change that has been made is to check, if file already exist; if it does not exist, the script creates the file, if it exists, it reads the last entry in the file, and starts downloading records beyond that to current date.

outputDir = "D:\\FII Stats"
filename = "FII-DII-Nifty.csv"

startDate = as.Date("2007-04-16", order="ymd")
endDate =Sys.Date()

The code can be downloaded from here.

#install.packages('RHTMLForms', repos = "http://www.omegahat.org/R")
 
library(RHTMLForms)
options(RCurlOptions = list(useragent = "R"))
library(RCurl) 
library(XML)  
library(timeDate)
 
blnfileExist=FALSE
 
######################################################################
# User Input
outputDir = "D:\\FII Stats"
filename = "FII-DII-Nifty.csv"
######################################################################
 
 
######################################################################
# Optional User Input
startDate = as.Date("2007-04-16", order="ymd")
endDate =Sys.Date()
 
######################################################################
 
 
 
# If file exists, read the file to retrieve dates for 
# which information already exists
 
 if (file.exists(file.path(outputDir, filename)))
 {
  existingStats <- read.csv(file.path(outputDir, filename), header=TRUE, sep=",")
 
  # Read Last date in the csv and add 1 to begin with next date range
  startDate = end(as.timeDate(existingStats$Date))
  startDate = as.Date(startDate)+1
  blnfileExist = TRUE
 }else
 {
  # User defined startDate 
  existingStats <- NULL
  blnfileExist = FALSE
 }
 
 
 # Read the HTML page since we cannot use htmlParse() directly
 # as it does not specify the user agent or an
 # Accept:*.*
 
urlNifty <- "http://www.nseindia.com/content/indices/ind_histvalues.htm";
urlFIIDIIEq <- "http://www.nseindia.com/content/equities/eq_fiidii_archives.htm"
 
contentNifty = getURLContent(urlNifty)
contentFIIDIIEq = getURLContent(urlFIIDIIEq)
 
 # Now that we have the page, parse it and use the RHTMLForms
 # package to create an R function that will act as an interface
 # to the form.
 
docNifty = htmlParse(contentNifty, asText = TRUE)
docFIIDIIEq = htmlParse(contentFIIDIIEq, asText = TRUE)
 
  # need to set the URL for this document since we read it from
  # text, rather than from the URL directly
 
docName(docNifty) = urlNifty
docName(docFIIDIIEq) = urlFIIDIIEq
 
  # Create the form description and generate the R
  # function "call" the
 
formNifty = getHTMLFormDescription(docNifty)[[1]]
funNifty = createFunction(formNifty)
 
formFIIDIIEq = getHTMLFormDescription(docFIIDIIEq)[[1]]
funFIIDIIEq = createFunction(formFIIDIIEq)
 
  # now we can invoke the form from R. We only need 2
  # inputs  - FromDate and ToDate
 
#Since the NSE URLs only display a maximum of 100 records, 
# we shall attempt the range in loop 
 
myStDt = startDate
 
while (myStDt <= endDate){
 
 if (endDate<= myStDt + 50){ 
  myEnDt = endDate
 }else
 {
  myEnDt = myStDt + 50
 }
 
 print(paste("Downloading from",myStDt,"to", myEnDt ))
 
 Nifty = funNifty(fromDate = as.character(myStDt-5, "%d-%m-%Y"), 
  toDate = as.character(myEnDt, "%d-%m-%Y"), indexType="S&P CNX NIFTY")
 FIIDIIEq = funFIIDIIEq (fromDate = as.character(myStDt, "%d-%m-%Y"), 
  toDate = as.character(myEnDt, "%d-%m-%Y"), category="all")
 
  # Having looked at the tables, I think we want the the 4th one.
 tableNifty = readHTMLTable(htmlParse(Nifty, asText = TRUE),
                        which = 4,
    skip.rows = 3, 
    trim=TRUE,
    as.data.frame = TRUE,
                        header = TRUE,
    stringsAsFactors = FALSE)
 
  # Having looked at the tables, I think we want the the 4th one.
 tableFIIDIIEq = readHTMLTable(htmlParse(FIIDIIEq, asText = TRUE),
                        which = 4,
    trim=TRUE,
    as.data.frame = TRUE,
                        header = TRUE,
    stringsAsFactors = FALSE)
 
 #Select only FII stats
 
 #Format Date Column from string to Date type
 tableFIIDIIEq$Date <- as.Date(tableFIIDIIEq$Date, format="%d-%b-%Y")
 
 #Order by Date in Ascending Order
 tableFIIDIIEq<-tableFIIDIIEq[order(tableFIIDIIEq$Date,decreasing = TRUE),]
 
 dfFIIEq <-subset(tableFIIDIIEq, Category=="FII")
 colnames(dfFIIEq)[3] <- "FIIBuyValue"
 colnames(dfFIIEq)[4] <- "FIISellValue"
 colnames(dfFIIEq)[5] <- "FIINetValue"
 
 #Select only DII stats
 dfDIIEq <-subset(tableFIIDIIEq, Category=="DII")
 colnames(dfDIIEq)[3] <- "DIIBuyValue"
 colnames(dfDIIEq)[4] <- "DIISellValue"
 colnames(dfDIIEq)[5] <- "DIINetValue"
 
 #Merge FII and DII Stats
 dfFIIDIIEq<-merge(dfFIIEq,dfDIIEq, by.x="Date", by.y="Date")
 
 #Convert into numeric for addition
 dfFIIDIIEq$DIINetValue <- as.numeric(dfFIIDIIEq$DIINetValue)
 dfFIIDIIEq$FIINetValue <- as.numeric(dfFIIDIIEq$FIINetValue)
 
 # Find Effective Inflow
 dfFIIDIIEq$EffectiveInflow <- dfFIIDIIEq$FIINetValue + dfFIIDIIEq$DIINetValue
 
 #Process Nifty
 dfNifty <- tableNifty 
 colnames(dfNifty)[6] <- "Volume"
 colnames(dfNifty)[7] <- "TurnoverInCr"
 
 #Format Date Column from string to Date type
 dfNifty$Date <- as.Date(dfNifty$Date, format="%d-%b-%Y")
 
 #Format CLose Column from string to Numeric
 dfNifty$Close <- as.numeric(dfNifty$Close)
 
 #Order by Date in Ascending Order
 dfNifty <-dfNifty[order(dfNifty$Date,decreasing = TRUE),]
 
 #Create a function to find out Change in Nifty Closing over previous day
 FUN=function(a) c(diff(a), NA)
 
 dfNifty$Change<- FUN(dfNifty$Close)*-1
 
 #Store the merged dataframe in temp and then join it with earlier results
 temp <- merge(dfNifty,dfFIIDIIEq, by.x="Date", by.y="Date")
 
 if (startDate == myStDt){ 
  dfNiftyFIIDIIEq = temp
 }else
 {
  dfNiftyFIIDIIEq <- rbind(dfNiftyFIIDIIEq,temp)
 }
 
 myStDt = myStDt + 51
}
 
closeAllConnections() 
 
######################################################################
#Get Rid of unwanted columns
 
 dfNiftyFIIDIIEq$Category.x <- NULL
 dfNiftyFIIDIIEq$Category.y <- NULL
# dfNiftyFIIDIIEq$Open <- NULL
# dfNiftyFIIDIIEq$High <- NULL
# dfNiftyFIIDIIEq$Low <- NULL
# dfNiftyFIIDIIEq$Volume <- NULL
# dfNiftyFIIDIIEq$TurnoverInCr<- NULL
 
# If file exists, merge the retrieved records
 
 if (blnfileExist)
 {
  print("Appending to existing file")
  #existingStats$Date <- as.Date(existingStats$Date, format = "%d-%m-%Y")
  dfNiftyFIIDIIEq$Date <- as.Date(dfNiftyFIIDIIEq$Date, format = "%d-%m-%Y")
  dfNiftyFIIDIIEq <- rbind(dfNiftyFIIDIIEq,existingStats)
  dfNiftyFIIDIIEq <- dfNiftyFIIDIIEq[order(dfNiftyFIIDIIEq$Date, decreasing=TRUE),]
 }else
 {
  print("Writing to a new file")
  dfNiftyFIIDIIEq <- dfNiftyFIIDIIEq[order(dfNiftyFIIDIIEq$Date, decreasing=TRUE),]
 } 
#Finally write the csv file
 write.csv(dfNiftyFIIDIIEq ,file=file.path(outputDir, filename),row.names = FALSE)

Created by Pretty R at inside-R.org

Next steps, add more indices in this list and try to figure out any co-relation.

6 comments:

BransonAug 26, 2011, 1:17:00 AM
In addition to the HTML table, shouldn't you also get the "Download file in csv format" link? Is it possible to directly download data from this link instead of parsing HTML table?
Market AnalyzerAug 27, 2011, 9:59:00 AM
@Branson: Thanks for visiting this blog.

NSE India generates a temp file after one has queried and generates a dynamic link. Technically, yes it is possible to download the file instead of parsing HTML table.

How to do it?
Make a call to the NSE site for Nifty Index(reqd to generate the link), and then join these bits

1. "http://www.nseindia.com/content/indices/histdata/S&P%20CNX%20NIFTY"
2. Start Date "dd-mm-yyyy"
3. "-"
4. End Date "dd-mm-yyyy"
5. ".csv"

Do note that this link cannot be called directly, and you will have to make a call to generate this page and link, hence, you can skip the part of parsing HTML

Similarly you can do it for FII and DII Stats.

I had not used this approach since at times the link generated was not returning any csv file; hence I had to look for alternates and went with parsing of HTML, and had to break a single call into multiple calls using loops.

You can refer to my other post for how to do it.

Let me know, if you have any further questions, ideas and suggestions.
BransonSep 1, 2011, 2:16:00 AM
Thank you very much for the detailed explanation. This is really helpful.
veepsAug 18, 2012, 11:07:00 AM
sir,
I do not know how to convert 'tableFIIDIIEq$Date' to class "Date" ?
Here is the output
-------------------------------------------------------------------R version 2.9.2 (2009-08-24)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> source("C:\\Users\\pri\\Desktop\\Extract FII Stats with loop.R")
Loading required package: bitops
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
Tag nobr invalid
Tag nobr invalid
Opening and ending tag mismatch: td and table
Opening and ending tag mismatch: tr and table
htmlParseEntityRef: no name
htmlParseEntityRef: no name
htmlParseEntityRef: no name
htmlParseEntityRef: no name
htmlParseEntityRef: no name
Opening and ending tag mismatch: td and font
Tag nobr invalid
Tag nobr invalid
[1] "Downloading from 2007-04-16 to 2007-06-05"
Error in as.Date.default(tableFIIDIIEq$Date, format = "%d-%b-%Y") :
do not know how to convert 'tableFIIDIIEq$Date' to class "Date"
>
--------------------------------------------------------------------
GraayMay 26, 2015, 4:20:00 PM
I was trying to use this script only for getting nifty50 data from
`http://nseindia.com/products/content/equities/indices/historical_index_data.htm`

This fails at lines where

'formNifty = getHTMLFormDescription(docNifty)[[1]]
funNifty = createFunction(formNifty)`

Gave the error:
For 1st line: Error in getHTMLFormDescription(docNifty)[[1]] : subscript out of bounds
For 2nd line: Error in writeFunction(formDescription, character(), url, con, verbose = verbose, :
You should provide a form description here. See getFormDescription().

Is it fixable? Thanks in advance.

Aug 4, 2011

FII and DII turnover with effect on Nifty Downloader

6 comments: