My thirst for statistics has been increasing. IV had another requirement, which would eventually be useful to me as well. He currently downloads FII and DII buy and sell values and its impact on Nifty manually in Excel. He suggested me to try and automate this process in R. Wow! Some more learning of R, which would eventually help me in building my strategies!
Unlike in NSE EOD Bhavcopy and BSE EOD Bhavcopy downloaders, I tried a different approach to download FII + DII stats from NSEIndia website (as if I had a choice), along with the index Nifty values and change over previous day. The National Stock Exchange of India has different structures for different pages. To download index values, you have to refer to the link http://www.nseindia.com/content/indices/ind_histvalues.htm whereas for FII and DII stats, you need to visit http://www.nseindia.com/content/equities/eq_fiidii_archives.htm.
Both of these pages are HTML forms, you have to enter parameters like index name, start and end dates. Another challenge is that each of these pages displays a maximum of 100 rows in table and if you need all rows (more than 100), you have to download a csv (dynamically generated in temp location). I faced problem in accessing the csv, as for older dates, the link generated was not valid. Hence, I decided to read the page itself, parse the table to be consumed as data frame. For instances, where the parameters generate more than 100 rows, I decided to use while loop.
I am using the packages RHTMLForms, RCurl, and XML. While RCurl and XML packages are available at R repositories, RHTMLForms package is maintained by omegahat, and can be installed by using the following command
install.packages('RHTMLForms', repos = "http://www.omegahat.org/R")
Another peculiar problem faced is that NSE India website has enforced check for useragent, which if not specified explicitly in R, would not allow access to the desired data.
This can be achieved by defining RCurl settings before loading the library. The command is
options(RCurlOptions = list(useragent = "R"))
library(RCurl)
NOTE: 9-Aug-2011, I have modified the code a bit to enable this script with optional input parameters, like Start Date as 16-April-2007, from which both FII and DII stats are available, along with End Date as current date. Another change that has been made is to check, if file already exist; if it does not exist, the script creates the file, if it exists, it reads the last entry in the file, and starts downloading records beyond that to current date.
outputDir = "D:\\FII Stats"
filename = "FII-DII-Nifty.csv"
filename = "FII-DII-Nifty.csv"
startDate = as.Date("2007-04-16", order="ymd")
endDate =Sys.Date()
The code can be downloaded from here.
#install.packages('RHTMLForms', repos = "http://www.omegahat.org/R") library(RHTMLForms) options(RCurlOptions = list(useragent = "R")) library(RCurl) library(XML) library(timeDate) blnfileExist=FALSE ###################################################################### # User Input outputDir = "D:\\FII Stats" filename = "FII-DII-Nifty.csv" ###################################################################### ###################################################################### # Optional User Input startDate = as.Date("2007-04-16", order="ymd") endDate =Sys.Date() ###################################################################### # If file exists, read the file to retrieve dates for # which information already exists if (file.exists(file.path(outputDir, filename))) { existingStats <- read.csv(file.path(outputDir, filename), header=TRUE, sep=",") # Read Last date in the csv and add 1 to begin with next date range startDate = end(as.timeDate(existingStats$Date)) startDate = as.Date(startDate)+1 blnfileExist = TRUE }else { # User defined startDate existingStats <- NULL blnfileExist = FALSE } # Read the HTML page since we cannot use htmlParse() directly # as it does not specify the user agent or an # Accept:*.* urlNifty <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"; urlFIIDIIEq <- "http://www.nseindia.com/content/equities/eq_fiidii_archives.htm" contentNifty = getURLContent(urlNifty) contentFIIDIIEq = getURLContent(urlFIIDIIEq) # Now that we have the page, parse it and use the RHTMLForms # package to create an R function that will act as an interface # to the form. docNifty = htmlParse(contentNifty, asText = TRUE) docFIIDIIEq = htmlParse(contentFIIDIIEq, asText = TRUE) # need to set the URL for this document since we read it from # text, rather than from the URL directly docName(docNifty) = urlNifty docName(docFIIDIIEq) = urlFIIDIIEq # Create the form description and generate the R # function "call" the formNifty = getHTMLFormDescription(docNifty)[[1]] funNifty = createFunction(formNifty) formFIIDIIEq = getHTMLFormDescription(docFIIDIIEq)[[1]] funFIIDIIEq = createFunction(formFIIDIIEq) # now we can invoke the form from R. We only need 2 # inputs - FromDate and ToDate #Since the NSE URLs only display a maximum of 100 records, # we shall attempt the range in loop myStDt = startDate while (myStDt <= endDate){ if (endDate<= myStDt + 50){ myEnDt = endDate }else { myEnDt = myStDt + 50 } print(paste("Downloading from",myStDt,"to", myEnDt )) Nifty = funNifty(fromDate = as.character(myStDt-5, "%d-%m-%Y"), toDate = as.character(myEnDt, "%d-%m-%Y"), indexType="S&P CNX NIFTY") FIIDIIEq = funFIIDIIEq (fromDate = as.character(myStDt, "%d-%m-%Y"), toDate = as.character(myEnDt, "%d-%m-%Y"), category="all") # Having looked at the tables, I think we want the the 4th one. tableNifty = readHTMLTable(htmlParse(Nifty, asText = TRUE), which = 4, skip.rows = 3, trim=TRUE, as.data.frame = TRUE, header = TRUE, stringsAsFactors = FALSE) # Having looked at the tables, I think we want the the 4th one. tableFIIDIIEq = readHTMLTable(htmlParse(FIIDIIEq, asText = TRUE), which = 4, trim=TRUE, as.data.frame = TRUE, header = TRUE, stringsAsFactors = FALSE) #Select only FII stats #Format Date Column from string to Date type tableFIIDIIEq$Date <- as.Date(tableFIIDIIEq$Date, format="%d-%b-%Y") #Order by Date in Ascending Order tableFIIDIIEq<-tableFIIDIIEq[order(tableFIIDIIEq$Date,decreasing = TRUE),] dfFIIEq <-subset(tableFIIDIIEq, Category=="FII") colnames(dfFIIEq)[3] <- "FIIBuyValue" colnames(dfFIIEq)[4] <- "FIISellValue" colnames(dfFIIEq)[5] <- "FIINetValue" #Select only DII stats dfDIIEq <-subset(tableFIIDIIEq, Category=="DII") colnames(dfDIIEq)[3] <- "DIIBuyValue" colnames(dfDIIEq)[4] <- "DIISellValue" colnames(dfDIIEq)[5] <- "DIINetValue" #Merge FII and DII Stats dfFIIDIIEq<-merge(dfFIIEq,dfDIIEq, by.x="Date", by.y="Date") #Convert into numeric for addition dfFIIDIIEq$DIINetValue <- as.numeric(dfFIIDIIEq$DIINetValue) dfFIIDIIEq$FIINetValue <- as.numeric(dfFIIDIIEq$FIINetValue) # Find Effective Inflow dfFIIDIIEq$EffectiveInflow <- dfFIIDIIEq$FIINetValue + dfFIIDIIEq$DIINetValue #Process Nifty dfNifty <- tableNifty colnames(dfNifty)[6] <- "Volume" colnames(dfNifty)[7] <- "TurnoverInCr" #Format Date Column from string to Date type dfNifty$Date <- as.Date(dfNifty$Date, format="%d-%b-%Y") #Format CLose Column from string to Numeric dfNifty$Close <- as.numeric(dfNifty$Close) #Order by Date in Ascending Order dfNifty <-dfNifty[order(dfNifty$Date,decreasing = TRUE),] #Create a function to find out Change in Nifty Closing over previous day FUN=function(a) c(diff(a), NA) dfNifty$Change<- FUN(dfNifty$Close)*-1 #Store the merged dataframe in temp and then join it with earlier results temp <- merge(dfNifty,dfFIIDIIEq, by.x="Date", by.y="Date") if (startDate == myStDt){ dfNiftyFIIDIIEq = temp }else { dfNiftyFIIDIIEq <- rbind(dfNiftyFIIDIIEq,temp) } myStDt = myStDt + 51 } closeAllConnections() ###################################################################### #Get Rid of unwanted columns dfNiftyFIIDIIEq$Category.x <- NULL dfNiftyFIIDIIEq$Category.y <- NULL # dfNiftyFIIDIIEq$Open <- NULL # dfNiftyFIIDIIEq$High <- NULL # dfNiftyFIIDIIEq$Low <- NULL # dfNiftyFIIDIIEq$Volume <- NULL # dfNiftyFIIDIIEq$TurnoverInCr<- NULL # If file exists, merge the retrieved records if (blnfileExist) { print("Appending to existing file") #existingStats$Date <- as.Date(existingStats$Date, format = "%d-%m-%Y") dfNiftyFIIDIIEq$Date <- as.Date(dfNiftyFIIDIIEq$Date, format = "%d-%m-%Y") dfNiftyFIIDIIEq <- rbind(dfNiftyFIIDIIEq,existingStats) dfNiftyFIIDIIEq <- dfNiftyFIIDIIEq[order(dfNiftyFIIDIIEq$Date, decreasing=TRUE),] }else { print("Writing to a new file") dfNiftyFIIDIIEq <- dfNiftyFIIDIIEq[order(dfNiftyFIIDIIEq$Date, decreasing=TRUE),] } #Finally write the csv file write.csv(dfNiftyFIIDIIEq ,file=file.path(outputDir, filename),row.names = FALSE)
Next steps, add more indices in this list and try to figure out any co-relation.
In addition to the HTML table, shouldn't you also get the "Download file in csv format" link? Is it possible to directly download data from this link instead of parsing HTML table?
ReplyDelete@Branson: Thanks for visiting this blog.
ReplyDeleteNSE India generates a temp file after one has queried and generates a dynamic link. Technically, yes it is possible to download the file instead of parsing HTML table.
How to do it?
Make a call to the NSE site for Nifty Index(reqd to generate the link), and then join these bits
1. "http://www.nseindia.com/content/indices/histdata/S&P%20CNX%20NIFTY"
2. Start Date "dd-mm-yyyy"
3. "-"
4. End Date "dd-mm-yyyy"
5. ".csv"
Do note that this link cannot be called directly, and you will have to make a call to generate this page and link, hence, you can skip the part of parsing HTML
Similarly you can do it for FII and DII Stats.
I had not used this approach since at times the link generated was not returning any csv file; hence I had to look for alternates and went with parsing of HTML, and had to break a single call into multiple calls using loops.
You can refer to my other post for how to do it.
Let me know, if you have any further questions, ideas and suggestions.
Thank you very much for the detailed explanation. This is really helpful.
ReplyDeletesir,
ReplyDeleteI do not know how to convert 'tableFIIDIIEq$Date' to class "Date" ?
Here is the output
-------------------------------------------------------------------R version 2.9.2 (2009-08-24)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> source("C:\\Users\\pri\\Desktop\\Extract FII Stats with loop.R")
Loading required package: bitops
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
Tag nobr invalid
Tag nobr invalid
Opening and ending tag mismatch: td and table
Opening and ending tag mismatch: tr and table
htmlParseEntityRef: no name
htmlParseEntityRef: no name
htmlParseEntityRef: no name
htmlParseEntityRef: no name
htmlParseEntityRef: no name
Opening and ending tag mismatch: td and font
Tag nobr invalid
Tag nobr invalid
[1] "Downloading from 2007-04-16 to 2007-06-05"
Error in as.Date.default(tableFIIDIIEq$Date, format = "%d-%b-%Y") :
do not know how to convert 'tableFIIDIIEq$Date' to class "Date"
>
--------------------------------------------------------------------
I was trying to use this script only for getting nifty50 data from
ReplyDelete`http://nseindia.com/products/content/equities/indices/historical_index_data.htm`
This fails at lines where
'formNifty = getHTMLFormDescription(docNifty)[[1]]
funNifty = createFunction(formNifty)`
Gave the error:
For 1st line: Error in getHTMLFormDescription(docNifty)[[1]] : subscript out of bounds
For 2nd line: Error in writeFunction(formDescription, character(), url, con, verbose = verbose, :
You should provide a form description here. See getFormDescription().
Is it fixable? Thanks in advance.
Hey Graay,
Deletenseindia had completely changed the website structure and does not use forms for the link you are trying.
However, have you tried quandl (https://www.quandl.com/data/NSE)?
Let me know, what are you trying exactly.
Cheers