I have attempted to filter out dates for specific files using Apache Spark, inside the file-to-RDD function.
I have attempted to do the following:
This should match the following:
Any idea how to achieve this?
Looking at the accepted answer, it seems to use some form of glob syntax. It also reveals that the API is an exposure of Hadoop's `FileInputFormat`. Searching reveals that paths supplied to `setInputPath` "may represent a file, a directory, or, by using glob, a collection of files and directories". Perhaps SparkContext also uses those APIs to set the path.
The syntax of the glob includes:
- `*` (matches zero or more characters)
- `?` (matches a single character)
- `[^ab]` (negated character class)
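As an illustration only, Python's `fnmatch` module implements a very similar glob dialect, so the three forms above can be tried locally without Spark. Note one spelling difference: `fnmatch` writes the negated class as `[!ab]`, whereas Hadoop's glob uses `[^ab]`. The file names below are made up for the example:

```python
from fnmatch import fnmatchcase

# Hypothetical file names, used only to demonstrate the glob forms.
files = [
    "report-2015-01-01.log",
    "report-2015-01-02.log",
    "report-2015-02-01.log",
    "notes.txt",
]

# `*` matches zero or more characters.
print([f for f in files if fnmatchcase(f, "report-2015-*.log")])

# `?` matches exactly one character.
print([f for f in files if fnmatchcase(f, "report-2015-01-0?.log")])

# Negated character class: Hadoop spells it [^1], fnmatch spells it [!1].
print([f for f in files if fnmatchcase(f, "report-2015-0[!1]-*.log")])
```

This only demonstrates the matching semantics; Spark/Hadoop evaluate the glob against the file system when listing input paths, not against an in-memory list.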
Following the example in the accepted answer, it is possible to write your path as:
It's not clear how alternation syntax can be used here, since comma is used to delimit a list of paths (as shown above). According to zero323's comment, no escaping is necessary:
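To see why the comma is unavailable for alternation, here is a minimal Python sketch, assuming (as the quote above suggests) that a comma-delimited path argument is split into independent patterns first. This is an illustration of that behavior, not Spark's or Hadoop's actual implementation; the helper name `match_paths` and the file names are hypothetical:

```python
from fnmatch import fnmatchcase

def match_paths(path_arg, candidates):
    # Each comma-separated entry is treated as an independent glob pattern,
    # which is why a brace alternation could not simply reuse the comma:
    # the splitter would consume it before the glob matcher ever saw it.
    patterns = [p.strip() for p in path_arg.split(",")]
    return [c for c in candidates if any(fnmatchcase(c, p) for p in patterns)]

# Hypothetical file listing.
files = ["data/2014/a.log", "data/2015/b.log", "data/2016/c.log"]

# Two patterns joined by a comma select two of the three years.
print(match_paths("data/2014/*,data/2015/*", files))
```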